A team of engineers at Google has unveiled a new music generation AI system called MusicLM. The model produces high-quality music from text descriptions such as "a calming violin melody backed by a distorted guitar riff." It works much like DALL-E, which generates images from text.
MusicLM uses AudioLM's multilevel autoregressive modeling as its generative component and extends it with text conditioning. To address the key problem of scarce paired audio-text data, the researchers applied MuLan, a joint music-text model trained to project music and its textual description onto nearby points in a shared embedding space.
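The idea behind such a joint embedding space can be sketched in a few lines. This is a minimal illustration, not MuLan's actual architecture: the projection matrices and feature dimensions below are hypothetical, and a real model would learn them with a contrastive objective over millions of music-caption pairs.

```python
import numpy as np

def embed(x, W):
    """Project raw features into the shared space and L2-normalize."""
    v = W @ x
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
dim_audio, dim_text, dim_shared = 128, 64, 32

# Hypothetical learned projections for each modality.
W_audio = rng.standard_normal((dim_shared, dim_audio))
W_text = rng.standard_normal((dim_shared, dim_text))

audio_features = rng.standard_normal(dim_audio)   # stand-in for audio features
text_features = rng.standard_normal(dim_text)     # stand-in for caption features

a = embed(audio_features, W_audio)
t = embed(text_features, W_text)

# Cosine similarity in the shared space: training pulls matching
# music/caption pairs toward 1 and pushes mismatched pairs apart.
similarity = float(a @ t)
print(similarity)
```

Because both modalities land in the same space, a text description can stand in for audio at generation time, which is how the paired-data shortage is sidestepped.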
MusicLM is trained on a large unlabeled music dataset; it treats conditional music generation as a hierarchical sequence modeling task and generates music at 24 kHz that remains consistent over several minutes. To address the lack of evaluation data, the developers released MusicCaps, a new high-quality music captioning dataset containing 5,500 music-text pairs prepared by professional musicians.
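The hierarchical sequence modeling mentioned above can be pictured as a coarse-to-fine token pipeline. The sketch below is a toy stand-in with hypothetical stage names and sampling logic; the real system runs Transformer decoders over learned semantic and acoustic token vocabularies, and a neural audio codec turns the final tokens into a 24 kHz waveform.

```python
import numpy as np

rng = np.random.default_rng(42)

def autoregressive_stage(conditioning, vocab_size, length):
    """Toy stand-in for one autoregressive stage: each token depends on
    the conditioning sequence and the tokens emitted so far."""
    tokens = []
    for _ in range(length):
        # A real model would run a Transformer here; we just sample.
        seed = (sum(conditioning) + sum(tokens)) % vocab_size
        tokens.append(int((seed + rng.integers(vocab_size)) % vocab_size))
    return tokens

# Stage 1: text conditioning -> coarse "semantic" tokens
# capturing long-term structure (melody, rhythm).
text_tokens = [3, 14, 15]  # hypothetical embedding-derived tokens
semantic = autoregressive_stage(text_tokens, vocab_size=1024, length=8)

# Stage 2: semantic tokens -> fine "acoustic" tokens carrying audio
# detail; a codec decoder would render these as a waveform.
acoustic = autoregressive_stage(text_tokens + semantic,
                                vocab_size=1024, length=32)
print(len(acoustic))
```

Splitting generation into stages like this is what lets the model keep a piece coherent over minutes: the coarse stage fixes the long-range structure before the fine stage fills in the sound.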
Experiments show that MusicLM outperforms previous systems in both audio quality and adherence to the text description. The model can also be conditioned on a melody in addition to text: it can transform a whistled or hummed tune into music in the style described by the caption.
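Conditioning on both a melody and a text description amounts to feeding the generator two signals side by side. The fragment below is only a schematic of that idea, with made-up shapes and token values; the real model extracts melody tokens from the hummed audio with a learned tokenizer.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical inputs: melody tokens extracted from a hummed recording,
# plus a text embedding for a style description.
melody_tokens = rng.integers(0, 512, size=16)
text_embedding = rng.standard_normal(32)

# The generator is conditioned on both signals at once: the melody
# tokens fix the tune, the text fixes style and instrumentation.
conditioning = np.concatenate([melody_tokens.astype(float),
                               text_embedding])
print(conditioning.shape)
```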
A demo of the model is available on the MusicLM project page.
The AI system learned to create music by training on a dataset of 5 million audio clips, representing 280,000 hours of music. MusicLM can create pieces of various lengths, from a quick riff to an entire song. It can even go further, generating longer works in which alternating sections create the feeling of a story, as in a symphony. The system also handles specific requests, such as particular instruments or genres, and can approximate the feel of vocals.
The creation of MusicLM is part of a broader wave of deep learning applications designed to replicate human abilities such as speaking, writing an essay, drawing a picture, taking a test, or proving a mathematical theorem.
The developers have announced that Google will not release the system publicly. Tests showed that about 1% of the music generated by the model directly copied music from real performers, so the company is wary of content theft and lawsuits.