
Cook'd and Bomb'd

A.I. Robot attempts to continue songs

Started by daf, October 25, 2020, 04:42:09 PM


daf

Rick Astley - Never Gonna Give You Up - It's Ghastly!
Aha - Take On Me - Uh oh!
Nirvana - Smells Like Teen Spirit - This Stinks!
Queen - Don't Stop Me Now - I'm having such a bad time, I'm ruining it all!

- - - -

I think the software that created these horrors is called 'OpenAI Jukebox' if anyone fancies having a go.

Captain Z

The Aha one seems to have some legitimately interesting ideas, especially the second attempt.

Is there a more detailed description of what is being fed into the AI here, and how the algorithm works?

Edit: https://openai.com/blog/jukebox/

Check out "Hot Tub Christmas"

daf

Here are some more stinkers:

Queen - Bohemian Rhapsody - any way the wee flows
Toto - Africa - Writhes like a limp-moose above the Serengeti
Michael Jackson - Thriller - The King of Poop
Oasis - Wonderwall - yeah . . . that's actually an improvement! (ZING!!)

Quote from: 'wazzpqazzza' in the YouTube comments
Take1: Liam_Gallagher_Wonderwall_original_demo_recording_(RARE 1995).wav.mp3
Take2: Oasis but played in a Mongolian vodka shack around a fire by people who only have the tabs and lyrics
Take3: Oasis but it's Eddie Vedder
Take4: Google translate Polish->Mandarin->Welsh (that scream at the end, tho)
Take5: Actually just a Smashing Pumpkins song
Take6: Oasis has two drummers and they cannot agree on how to play so they just stop and argue
Take7: Wonderwall - Kevin Shields
Take8: Liam Gallagher got drunk and forgot the lyrics

It's crazy how good these will be in a few years, can't imagine what it means for the music industry.

Tikwid

Quote from: Captain Z on October 25, 2020, 04:57:05 PM
Is there a more detailed description of what is being fed into the AI here, and how the algorithm works?
As CaB's resident machine learner, and somebody who's been doing a lot of experimenting with Jukebox (I won't doxx myself but you can find my experiments on YouTube) I feel qualified enough to give a basic rundown:

Jukebox is basically a custom-built AI that's been trained on a dataset of around 1.2 million songs, along with their lyrics and their artist and genre tags. What this essentially means is that it knows what particular artists and genres sound like at the raw audio level, and it can generate new variations of those patterns - it's not cutting up stems or acapellas or anything; it's literally generating entirely new audio based on what it knows and on the parameters you input. The main parameters are:

- Input audio, for continuations (running without input audio will create completely original music in the chosen style and genre)
- Input lyrics (running without input lyrics will either result in an instrumental, or in surreal wordless "singing in tongues")
- Genre (from this list - if your genre can be assembled from these single-word constituent parts, it'll accept it as valid)
- Artist (from this list - if you're doing a continuation and the original artist isn't on the list, setting the artist to "unknown" will base the results on the input audio alone, and in most cases results in a reasonable approximation of that artist's style)
- Length of generated audio (the longer the audio, the longer it'll take to generate)
- "Temperature" (essentially a measure of randomness, or how many risks the AI takes with what it's generating; with continuations, a lower temperature means the output will bear a closer resemblance to the input audio, while a higher temperature will quickly diverge)
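To make the "temperature" parameter a bit more concrete, here's a minimal Python sketch of temperature sampling as it's generally done in generative models. It is not Jukebox's own code; the function names and the example scores are made up for illustration. The idea is that the model's raw scores for each possible next token are divided by the temperature before being turned into probabilities, so a low temperature sharpens the distribution (safer picks, closer to the input) and a high one flattens it (riskier picks, quicker to wander off).

```python
import math
import random
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Turn raw model scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Pick the index of the next token according to the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # hypothetical scores for three possible next tokens
    for t in (0.5, 1.0, 1.5):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
    rng = random.Random(0)
    print([sample_token(logits, 1.5, rng) for _ in range(10)])
```

Running it shows the top choice hogging most of the probability at 0.5 and the options evening out at 1.5, which is roughly the "closer resemblance vs. quickly diverging" behaviour described above.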

To run Jukebox you either download the code onto your own machine and run it there, or open up a Google Colab page that allows you to remotely host it on one of Google's servers. The latter is what most people use, since Jukebox consumes an unfathomable amount of memory and most consumer-level GPUs wouldn't be able to handle it; using a Colab also allows you to link up your Google Drive, so you can easily add input audio and save your generated files.

Once the code's all set up and you've put in all the parameters you want, you can get generating. Because even a few seconds of audio contains so much data, Jukebox starts off generating new music at a very low resolution, known as level 2; this takes about thirty minutes to generate one minute of new audio, give or take. This preliminary result is very fuzzy but still listenable, and you can decide whether you're happy with it before progressing to upscaling. This stage rebuilds the new audio at a higher resolution, progressing through level 1 and then level 0, both of which take a lot longer than level 2 (two hours for a minute of new audio at level 1, up to nine hours at level 0) but result in much greater fidelity. Most of the popular videos on YouTube feature level 0 or level 1 generations.
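For a feel of the numbers involved, here's a back-of-the-envelope Python sketch. The per-minute timings are the rough Colab figures from the post above (treating "up to nine hours" at level 0 as the worst case), and the compression factors (44.1 kHz audio squeezed 128x, 32x and 8x at levels 2, 1 and 0) are as described in OpenAI's Jukebox write-up. The function names are made up for illustration; this doesn't touch Jukebox itself.

```python
# Rough per-level timings from the post: hours of compute per minute of audio.
# Anecdotal Colab figures, not benchmarks; level 0 is the "up to" worst case.
HOURS_PER_AUDIO_MINUTE = {2: 0.5, 1: 2.0, 0: 9.0}

# Compression per level as described in OpenAI's Jukebox write-up.
SAMPLE_RATE = 44_100
HOP_LENGTH = {2: 128, 1: 32, 0: 8}


def tokens_per_second(level: int) -> float:
    """Number of discrete codes the model has to produce per second of audio."""
    return SAMPLE_RATE / HOP_LENGTH[level]


def estimate_hours(audio_minutes: float, final_level: int = 0) -> float:
    """Total time to go from level 2 down to final_level, using the rough timings."""
    if final_level not in HOURS_PER_AUDIO_MINUTE:
        raise ValueError("final_level must be 0, 1 or 2")
    levels = [lvl for lvl in (2, 1, 0) if lvl >= final_level]
    return sum(HOURS_PER_AUDIO_MINUTE[lvl] * audio_minutes for lvl in levels)


if __name__ == "__main__":
    for level in (2, 1, 0):
        print(f"level {level}: {tokens_per_second(level):,.1f} codes per second of audio")
    print(f"1 minute up to level 1: ~{estimate_hours(1, final_level=1)} h")
    print(f"1 minute up to level 0: ~{estimate_hours(1, final_level=0)} h")
```

On these figures, taking one minute of audio all the way down to level 0 comes to roughly 11.5 hours in total, and level 0 has to produce 16 times as many codes per second as level 2.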

It's also possible to "co-compose" - generate a few very short level 2 segments, deciding which candidate you like, and repeating, in order to give you a little more control over the process - as well as saving checkpoints between levels and coming back to your generation later (I think modifying the parameters between checkpoints allows you to generate samples with multiple artists, such as the Sinatra/Fitzgerald duets on OpenAI's showcase site of generated samples).
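The co-composing loop described above boils down to something like this Python sketch. There's no real model here: `generate_candidates` is a hypothetical stand-in that returns random code sequences (a real model would condition on what's been kept so far), `choose` stands in for you listening to the candidates and picking one, and the vocabulary size and segment lengths are arbitrary.

```python
import random
from typing import Callable, List

Tokens = List[int]
VOCAB_SIZE = 2048  # arbitrary placeholder for the number of possible audio codes


def generate_candidates(context: Tokens, n: int, length: int,
                        rng: random.Random) -> List[Tokens]:
    """Hypothetical stand-in for the model: n random continuations.

    A real model would condition on `context`; this stub ignores it.
    """
    return [[rng.randrange(VOCAB_SIZE) for _ in range(length)] for _ in range(n)]


def co_compose(steps: int, n_candidates: int, segment_len: int,
               choose: Callable[[List[Tokens]], int], seed: int = 0) -> Tokens:
    """Grow a piece segment by segment, keeping only the candidate `choose` picks."""
    rng = random.Random(seed)
    piece: Tokens = []
    for _ in range(steps):
        candidates = generate_candidates(piece, n_candidates, segment_len, rng)
        pick = choose(candidates)  # in practice: listen, then pick your favourite
        piece.extend(candidates[pick])
    return piece


def favourite(candidates: List[Tokens]) -> int:
    """Toy 'listener' that prefers the candidate with the highest first code."""
    return max(range(len(candidates)), key=lambda i: candidates[i][0])


if __name__ == "__main__":
    result = co_compose(steps=4, n_candidates=3, segment_len=5, choose=favourite)
    print(len(result), result[:5])
```

The point of the structure is that you only ever pay for short low-resolution segments while steering, and the expensive upscaling happens once you've settled on a path.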

Hopefully what I've said makes sense as a basic primer; this video tutorial goes over the process in a bit more depth. And if you're ready to take the plunge, here's the Colab: https://colab.research.google.com/github/SMarioMan/jukebox/blob/master/jukebox/Interacting_with_Jukebox.ipynb

A few of my favourite experiments (none generated by me, just ones I've found elsewhere on the net):
- A variety of interpretations of Smash Mouth's All Star, including a groovy Parappa the Rapper-style beat, a Snoop Dogg guest feature, and a strangely anthemic "AHH AHH AHH AHH, THEY'RE OUT TO GET ME"
- Horrifying satanic Prince with an extremely fitting URL
- Genuinely beautiful Muppets takes on Twinkle Twinkle Little Star and Itsy Bitsy Spider (described by my girlfriend as "like listening to Jim Henson's dreams")
- Crazy In Love but Beyonce reveals her hidden screamo vocal talents
- The Ramones recast as a mariachi band, performing a jolly singsong to an audience of rabid chimpanzees
- Morphine-addled Bowie sings the countdown on Space Oddity
- A universe where Christmastime is a dark pagan winter ritual, and Frank Sinatra is its unholy acolyte...
- ...and another where he's just very very randy

Captain Z

Thanks for the info, appreciate you taking the time. I was interested in how the AI 'knows' to continue with the song's original lyrics in most of the above cases, and, if the full song is input in the first place, whether it's biased towards trying to follow the original structure. I'll probably look into it a bit more once I've gone through all these examples.

Quote from: Tikwid on October 25, 2020, 10:19:22 PM
- Genuinely beautiful Muppets takes on Twinkle Twinkle Little Star


Tikwid

A little treat for CaB - think these might genuinely be improvements over the original https://vocaroo.com/15T6pxprKbXU

daf

Love that - as you say, a definite improvement!

Chriddof

It took me a while to figure out what the original song was, but yeah, a massive improvement. I particularly like the Boredoms-esque chaotic breakdown in the second go.

PaulTMA


kngen

Quote from: Tikwid on October 25, 2020, 10:19:22 PM

- Morphine-addled Bowie sings the countdown on Space Oddity


I must have listened to this 100 times now, and each time I marvel at Bowie's singing countdown (almost Phil Cornwell-esque in its cartoonish accuracy). I don't know what kind of unexpected cerebral episode hits him when he reaches 'one', but what follows makes me laugh like nothing else has in years.

Tikwid

Quote from: Chriddof on November 16, 2020, 08:03:41 AM
I particularly like the Boredoms-esque chaotic breakdown in the second go.
You'll be pleased to hear one of your own masterpieces has suffered a similar fate: https://vocaroo.com/1nUhAQsurjIn

Chriddof

Incredible! I did have a go at messing up that track of mine myself, but it didn't come out nearly as well as that. I think I can hear an unidentified (female?) rock vocalist at the end growling "SHITTIING IN THE NINETEEN-FOOOOURTIIIIIIIES".

Gregory Torso

Quote from: Tikwid on October 25, 2020, 10:19:22 PM
- Horrifying satanic Prince with an extremely fitting URL
- Morphine-addled Bowie sings the countdown on Space Oddity

These are great. The progress of these AI learning machines is both fascinating and slightly worrying. With Talk to Transformer I could at least see how it was 'learning' from texts, but it's absolutely beyond my brain how a program is 'completing' songs or generating them in a style.

Tikwid

Quote from: Gregory Torso on November 17, 2020, 08:32:25 PM
It's both fascinating and slightly worrying about the progress of these AI learning machines. Talk to transformer I could at least see how it was 'learning' from texts, but it's absolutely beyond my brain how a program is 'completing' songs or generating them in a style.
Thing is, at the end of the day Jukebox and Talk to Transformer essentially use the same principles: the AI's given a sufficiently large dataset, it learns the patterns in the data and the connections between them, and once the training's done it should be able to create new material in the same space as that dataset. The songs in Jukebox's training data were tagged with their genre and artist, though, so it did have some help on that front. (If the jump from GPT-2 to GPT-3 was from 1.5 billion to 175 billion parameters, imagine what a similar increase would do to Jukebox's output...)
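As a toy illustration of that principle (learn the patterns in a tagged dataset, then generate new material conditioned on a tag), here's a tiny character-level model in Python. It's a simple bigram counter, nothing like the huge neural network Jukebox actually uses, and the tags and training strings are invented; the only point is that the tag steers which learned patterns get used, much as the genre and artist tags help Jukebox.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

START, END = "^", "$"  # markers for the beginning and end of each text

# tag -> previous character -> next character -> count
Model = Dict[str, Dict[str, Dict[str, int]]]


def train(corpus: List[Tuple[str, str]]) -> Model:
    """Count which character follows which, separately for each tag."""
    model: Model = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for tag, text in corpus:
        padded = START + text + END
        for prev, nxt in zip(padded, padded[1:]):
            model[tag][prev][nxt] += 1
    return model


def generate(model: Model, tag: str, rng: random.Random, max_len: int = 40) -> str:
    """Sample a new string one character at a time from the tag's learned counts."""
    if tag not in model:
        raise KeyError(f"no training data for tag {tag!r}")
    out: List[str] = []
    prev = START
    while len(out) < max_len:
        followers = model[tag][prev]
        chars = list(followers)
        nxt = rng.choices(chars, weights=[followers[c] for c in chars])[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


if __name__ == "__main__":
    corpus = [
        ("punk", "hey ho lets go"),
        ("punk", "hey hey hey"),
        ("crooner", "fly me to the moon"),
        ("crooner", "the moon and me"),
    ]
    model = train(corpus)
    rng = random.Random(3)
    for tag in ("punk", "crooner"):
        print(tag, [generate(model, tag, rng) for _ in range(3)])
```

The output is new strings that were never in the training data but are stitched together from patterns that were, and asking for "punk" versus "crooner" changes which patterns it draws on.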

Stoneage Dinosaurs

Isn't A. I. Robot the guy behind PC Music?