ElevenLabs, Stability AI Drop New AI Music Models—Can They Catch Suno?
Music v2 brings genre-shifting and section-by-section composition to ElevenLabs. Stable Audio 3.0 ships open weights and six-minute tracks. Is either good enough to dethrone the category leader?
By Jose Antonio Lanz | Edited by Guillermo Jimenez | May 27, 2026 | 5 min read
In brief
- ElevenLabs launched Music v2, capable of switching genres mid-track, building songs section by section, and inpainting specific parts.
- Stability AI released Stable Audio 3.0, a four-model family with open weights for three variants, trained on licensed data, generating tracks up to six minutes and twenty seconds long.
- Both releases lean hard into licensed training data—but Suno, valued at $2.45 billion with roughly 100 million users, is still the platform most people reach for first.
Two significant AI music updates landed this week, and neither came from Suno.
ElevenLabs, the Polish-founded voice AI company sitting at an $11 billion valuation after a $500 million Series D in February, launched Music v2. Stability AI—the Stable Diffusion people—dropped Stable Audio 3.0, a four-model family with open weights and tracks that run past six minutes.
The backdrop is the Recording Industry Association of America’s 2024 copyright suits against Suno and Udio, which made "trained on licensed data" the most important phrase in any AI music announcement. Both ElevenLabs and Stability are leaning on it hard, pitching licensed training data as protection against copyright trouble over the outputs you generate.
Music v2: One track, opera to heavy metal, no breakdown
Music v2 is ElevenLabs' second music model, arriving roughly 10 months after the first. The core pitch is coherence under pressure. According to ElevenLabs, a single track can shift from opera to heavy metal and back, hold together through fast rap, and embed non-musical sound effects, all without the composition coming apart.
Generative audio tends to fall apart exactly when prompts get complicated, so this is the thing worth watching, especially in longer compositions.
Inpainting is now actually useful: select a section, regenerate it, leave everything else untouched. Users can also build songs section by section—intro, verse, chorus—with the model maintaining continuity throughout instead of treating each clip as a standalone generation. Multilingual support has improved too, though ElevenLabs didn't publish specifics.
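To make "regenerate it, leave everything else untouched" concrete, here is a minimal sketch of the bookkeeping around an inpainted region. It is not ElevenLabs' method: the function `splice_region`, the toy sample rate, and the crossfade length are all hypothetical, and the "regenerated" audio is a placeholder array rather than model output. It only shows that samples outside the selected span stay identical while the edges of the new section are blended in.

```python
import numpy as np

def splice_region(track, patch, start_s, sample_rate, fade_s=0.05):
    """Replace a time range of `track` with `patch`, crossfading at both edges.

    Hypothetical helper for illustration; a real model conditions on the
    surrounding audio when it generates the patch.
    """
    start = int(round(start_s * sample_rate))
    end = start + len(patch)
    if start < 0 or end > len(track):
        raise ValueError("patch does not fit inside the track")
    fade = min(int(round(fade_s * sample_rate)), len(patch) // 2)
    # Weight of the new audio: ramps 0 -> 1, holds at 1, ramps 1 -> 0.
    weight = np.ones(len(patch))
    if fade > 0:
        ramp = np.linspace(0.0, 1.0, fade, endpoint=False)
        weight[:fade] = ramp
        weight[-fade:] = ramp[::-1]
    out = track.copy()
    out[start:end] = weight * patch + (1.0 - weight) * track[start:end]
    return out

# Toy data: 10 s of silence, with a 2 s placeholder "regenerated" section at 4 s.
sample_rate = 1000                      # assumed toy rate, not a real audio rate
track = np.zeros(10 * sample_rate)
patch = np.ones(2 * sample_rate)
out = splice_region(track, patch, start_s=4.0, sample_rate=sample_rate)
```

Everything before the 4-second mark and after the 6-second mark in `out` is the original audio, sample for sample.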
The model powers three platforms: ElevenMusic for creators, ElevenAPI for developers, and ElevenCreative for brands. It's live on ElevenMusic and ElevenCreative now; API access is early-entry via the sales team.
ElevenLabs also cut Music v1 and v2 pricing by up to 50% for ElevenAPI and up to 40% for ElevenCreative self-serve. The company hit $500 million in annual recurring revenue in April 2026. Music is still a small slice of that—but ElevenMusic, which launched as a consumer app in April, is a direct shot at Suno's user base.
Stable Audio 3.0: Open weights, on-device, actually longer
Stable Audio 2.0 topped out at three minutes and was already behind Suno when it launched in 2024. Stable Audio 3.0 ships four models: Small SFX (on-device sound effects), Small (full music composition on-device), Medium (up to 6:20, stronger hardware), and Large (API-only). Three of the four have open weights on Hugging Face.
The Small models run at 459 million parameters each, with no GPU needed. (Parameter count is a rough measure of an AI model’s capacity.) Medium hits 1.4 billion parameters and generates its 6:20 output in about 1.31 seconds on an H200 GPU. Large, at 2.7 billion, is API-only for organizations with over $1 million in revenue. Per-second generation granularity means you get exactly the track length you asked for, not an approximation.
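For a sense of scale, the Medium figures quoted above work out to a speed multiple over real-time playback. The short calculation below uses only the two numbers from the article (6:20 of audio, about 1.31 seconds on an H200); the helper name `real_time_factor` is just for illustration.

```python
def real_time_factor(audio_seconds, wall_seconds):
    """How many seconds of audio are produced per second of compute."""
    return audio_seconds / wall_seconds

track_seconds = 6 * 60 + 20          # 6:20 expressed in seconds -> 380
rtf = real_time_factor(track_seconds, 1.31)
print(f"{track_seconds} s of audio in 1.31 s is about {rtf:.0f}x real time")
# prints: 380 s of audio in 1.31 s is about 290x real time
```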
It’s also supported in ComfyUI for local setups.
The architecture is new: a semantic-acoustic autoencoder Stability calls SAME, designed to hold melodic coherence over longer outputs. LoRA fine-tuning is supported, so artists can adapt the models to their own catalogs. Inpainting is in too—single-segment, multi-segment, and causal continuation to extend a track past its original endpoint.
For context, a LoRA (Low-Rank Adaptation) is a small add-on set of weights that steers how the full model generates its outputs. Train a LoRA on blues and the model will produce blues; train one on BB King recordings and the model will produce songs that sound like BB King. Inpainting lets you fix a small part of a finished generation. If the model hallucinates something at the 2:30 mark, for example, you can select a few seconds of the song, ask the model to change them into whatever you want, and it will generate a replacement that fits that time span and blends with the rest of the song.
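The "tiny model" idea can be shown in a few lines. What follows is a generic sketch of low-rank adaptation on a single weight matrix, not Stable Audio's training code: the layer size, rank, and scaling value are made-up numbers chosen for illustration. The base weight `W` stays frozen, and only two small matrices `A` and `B` would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 512, 512, 8, 16    # illustrative sizes, all assumed

W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, rank))                   # trainable, zero init

def lora_forward(x, W, A, B, alpha, rank):
    """Base layer output plus the scaled low-rank update (alpha / rank) * B @ A."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

x = rng.standard_normal((4, d_in))            # a batch of 4 input vectors
y = lora_forward(x, W, A, B, alpha, rank)

full_params = W.size                          # 512 * 512 = 262,144
lora_params = A.size + B.size                 # 8 * 512 + 512 * 8 = 8,192
print(f"adapter is {lora_params / full_params:.1%} of the layer's weights")
# prints: adapter is 3.1% of the layer's weights
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the original; training then moves only `A` and `B`, which is why a LoRA file is so much smaller than the model it modifies.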
Stability has been technically credible in AI music for years without breaking through commercially. The open-weight play is the Stable Diffusion strategy applied to audio—seed the developer community, see what gets built. The licensing is cleaner than anything Stable Audio has shipped before, with partnerships in place with Universal Music Group and Warner Music Group.
The target: Suno, the AI music king
If ChatGPT is the king of AI text, Suno is the king of AI music. The company behind the model hit a $2.45 billion valuation in November 2025 and crossed $300 million in annual recurring revenue, and its platform has been used by roughly 100 million people.
It generates around 7 million songs per day. Warner Music settled its suit against Suno in November 2025; Sony and UMG are still in federal court.
To avoid these copyright wars, ElevenLabs has licensing deals with Believe, Kobalt, and Merlin. Stability has Warner and Universal. Udio settled with all three majors and is now a walled garden—nothing you generate can leave the platform.
Stable Audio 3.0 Small and Medium are available on Hugging Face now. Large is live via the Stability AI API. Music v2 is free for ElevenMusic users, with commercial tiers through ElevenCreative and ElevenAPI.