Nari Labs: Dia
Importance: 3 | # | tts
Dia is a 1.6B-parameter text-to-speech model created by Nari Labs.
Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal sounds such as laughter, coughing, and throat clearing.
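To make "dialogue from a transcript" concrete, here is a minimal sketch of assembling a multi-speaker script with a nonverbal cue. The `[S1]`/`[S2]` speaker tags and the parenthesized cue are my assumption about the input format, and `Turn` and `build_transcript` are hypothetical helpers, not part of Dia. Passing the string to the model is not shown.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: int         # 1-based speaker index (assumed tag scheme: [S1], [S2], ...)
    text: str            # the spoken words for this turn
    nonverbal: str = ""  # optional cue such as "laughs" (assumed syntax: "(laughs)")


def build_transcript(turns):
    """Join dialogue turns into a single tagged transcript string."""
    parts = []
    for turn in turns:
        line = f"[S{turn.speaker}] {turn.text.strip()}"
        if turn.nonverbal:
            # Append the cue after the spoken text, in parentheses.
            line += f" ({turn.nonverbal})"
        parts.append(line)
    return " ".join(parts)


if __name__ == "__main__":
    script = build_transcript([
        Turn(1, "Did you hear this was built by two people?"),
        Turn(2, "No way.", nonverbal="laughs"),
    ])
    print(script)
```

Keeping the script as structured turns until the last step makes it easy to reorder lines or swap cues before generating audio.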
Seriously impressive capabilities.
Insane how much low-hanging fruit there is for audio models right now. A team of two picking things up over a few months can build something that still competes with large players with tons of funding.