NVIDIA (NVDA) has launched Fugatto, a new AI model designed to generate and modify music and audio. The model is aimed at creators in the music, film, and video game industries. Fugatto accepts any combination of text and audio files as input to produce or alter music and sounds. For instance, it can create music from a text prompt, add or remove instruments from an existing track, and alter the accent or emotion of a voice.
The model is particularly beneficial for advertising agencies, allowing them to adapt existing ads for different regions by incorporating varied accents and emotions in voiceovers. Video game developers can also leverage Fugatto to modify pre-recorded assets, making them responsive to players' actions.
Fugatto can also transform sounds creatively, such as making a trumpet sound like a barking dog or a saxophone like a meowing cat. Researchers found that, after fine-tuning on a small amount of singing data, Fugatto can handle tasks it was not pre-trained for, such as generating high-quality singing from text.
The full version of Fugatto has 2.5 billion parameters and was trained on NVIDIA DGX systems with 32 NVIDIA H100 Tensor Core GPUs. Development took over a year. Fugatto is expected to compete with similar technologies from startups such as Runway and from major companies such as Meta Platforms (META), which recently introduced Movie Gen, an AI model for creating realistic video and audio clips.