
Making this album with AI 'felt like wandering in a giant maze'

The fear is over and the fun can begin: that's how I think about creative endeavors involving artificial intelligence these days. I think we've moved past the exaggerated claims that AI would make human art redundant, and we can now enjoy all the possibilities this technology offers. In that light, Shadow Planet, a new album created as a three-way collaboration between two humans and an AI, shows just what kind of fun that can be.

Shadow Planet is the creation of author Robin Sloan, composer Jesse Solomon Clark, and Jukebox, a machine learning music program created by OpenAI. After an off-the-cuff Instagram conversation between Sloan and Clark about starting a band (named The Cotton Modules), the two began exchanging tapes of music. Clark, a veteran musician, sent seeds of songs to Sloan, who fed them into Jukebox, which is trained on a massive dataset of 1.2 million songs and tries to continue whatever audio it's given. The AI program, steered by Sloan, then built on Clark's ideas, and Sloan sent the results back to Clark to develop further.

The end result of this three-way trade is Shadow Planet, an atmospheric album in which fragments of folk songs and electronic hooks emerge like moss-covered logs from a murky quagmire of ambient loops and disintegrating samples. It's a world unto itself: a pocket musical universe to explore.

As Sloan explained to me in an interview over email, Shadow Planet's sound is in many ways a result of the limitations of the jukebox, which only outputs mono audio at 44.1kHz. "While making this album, I learned that this kind of AI model is absolutely an 'instrument' that you have to learn to play," he told me. "It's basically a tuba! A very...weird...and powerful...tuba..."

It is this kind of creative give-and-take, with machines and humans responding to each other's limitations and strengths, that makes AI art so interesting. Think about how the development from the harpsichord to the piano influenced styles of music, for example: the piano's ability to play louder or softer (rather than the harpsichord's single fixed dynamic) gave rise to new styles. Something similar, I think, is happening now with a whole range of AI models that are shaping creative output.

You can read my interview with Sloan below, and find out why working with machine learning made him feel like he was "wandering in a giant maze." And you can listen to Shadow Planet on Spotify, Apple Music, iTunes, Bandcamp, or on Sloan and Clark's website.

Hey Robin, thanks for taking the time to talk to me about this album. First, can you tell me a little about what material Jesse was sending you to start this collaboration? Were these original songs?

Yes! Jesse is a composer for commercials, movies, and physical installations; he wrote the generative soundtrack that plays inside Amazon's Spheres visitor center in Seattle. So he's well used to sitting down and making a bunch of musical choices. There were about a dozen short "songs" on each tape I received from him, some only 20-30 seconds long, others a few minutes, all separated by a short silence. So my first task was always to listen, decide what I liked best, and copy it onto the computer.

And then you fed them into the AI system. Can you tell me a bit about that program? What was it, and how does it work?

I used OpenAI's Jukebox model, which they trained on roughly 1.2 million songs, 600K of which were in English; it works on raw audio samples. That's a big part of the appeal for me; I find MIDI-focused AI systems a bit... polite? They have great respect for the grid! Sample-based systems (which I've used before, in various incarnations, to compose music for my previous novel's audiobook) are crunchier and more volatile, so I like them better.
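To make that raw-audio-versus-grid distinction concrete, here is a minimal sketch, not taken from Sloan's pipeline: the file name and note values are invented for illustration, and it assumes the librosa library is available.

```python
import librosa

# Raw-audio models like Jukebox consume the waveform itself: a long
# array of samples, loaded here as mono at 44.1 kHz (hypothetical file).
waveform, sr = librosa.load("seed_from_tape.wav", sr=44100, mono=True)
print(waveform.shape)  # roughly 1.3 million samples for a 30-second clip

# A MIDI-style representation, by contrast, is a tidy grid of note
# events: pitch, start time, duration, velocity.
midi_like_notes = [
    {"pitch": 60, "start": 0.0, "duration": 0.5, "velocity": 90},
    {"pitch": 64, "start": 0.5, "duration": 0.5, "velocity": 90},
]
```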

To sample from the Jukebox model, I used my own customized code. The technique OpenAI describes in their publication is more like, "Hey, Jukebox, play me a song that sounds like The Beatles," but I wanted to be able to make it stranger, so my sampling code allows me to specify several different genres and styles and interpolate between them, even if they have nothing in common.
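Sloan doesn't share his sampling code here, so the following is only a rough sketch of what "interpolating between styles" could look like in principle. The embed_style helper, the vector size, and the style names are hypothetical stand-ins, not part of Jukebox's actual API.

```python
import zlib
import numpy as np

def embed_style(name: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in for a model's learned genre/style embedding."""
    rng = np.random.default_rng(zlib.crc32(name.encode()))
    return rng.normal(size=dim)

def blend_styles(style_a: str, style_b: str, alpha: float) -> np.ndarray:
    """Linearly interpolate two conditioning vectors: alpha=0 is pure A, alpha=1 is pure B."""
    return (1.0 - alpha) * embed_style(style_a) + alpha * embed_style(style_b)

# Conditioning a sampler on a point "between" two styles that may have
# nothing in common, in the spirit of what Sloan describes.
conditioning = blend_styles("ambient folk", "early electronic", alpha=0.4)
```

The point is less the arithmetic than the affordance: a single knob (alpha) that slides the model between reference points it was never asked to combine.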

And that's all just setup. The sampling process itself is interactive. I always start with a "seed" from one of Jesse's tapes, which gives the model a direction, a path to follow. In a nutshell, I say to the model: "I want something that's a mix of genres X and Y, in the style of artists A and B, but it has to follow this introduction: <Jesse's playing of music>"
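As a way of picturing that loop, here is a toy sketch of "seed plus prompt, then continue." The encode_seed and generate_continuation functions are stubs standing in for a real model, and the numbers are invented; this is not OpenAI's Jukebox API.

```python
import numpy as np

def encode_seed(waveform: np.ndarray) -> np.ndarray:
    """Stub: stands in for compressing the seed audio into the model's internal codes."""
    return waveform[::1024]

def generate_continuation(codes: np.ndarray, conditioning: np.ndarray, seconds: float) -> np.ndarray:
    """Stub: appends placeholder codes; a real model would extend the seed under the conditioning."""
    rng = np.random.default_rng(0)
    new_codes = rng.normal(scale=0.1, size=int(seconds * 43))  # ~43 codes per second, invented
    return np.concatenate([codes, new_codes])

# The loop as Sloan describes it: take a seed from one of Jesse's tapes,
# point the model in a stylistic direction, and ask it to carry the idea forward.
seed = np.sin(np.linspace(0, 2_000, 44_100 * 5))  # 5 seconds of placeholder audio
style_vector = np.zeros(64)                       # e.g. the blended style vector above
continuation = generate_continuation(encode_seed(seed), style_vector, seconds=30.0)
```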
