What are the key latency and accuracy metrics of CosyVoice 2.0?

Question

Answers ( 1 )

    2025-04-01T00:46:43+00:00

    - **Latency**: as low as 150 milliseconds to the first synthesized audio packet in streaming mode.
    - **Accuracy**: 30-50% fewer pronunciation errors than CosyVoice 1.0, with the lowest character error rate (CER) on the hard test set of the Seed-TTS evaluation set.
    - **MOS**: 5.53, on par with leading commercial models.
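
For anyone unsure how a character error rate is read, here is a minimal sketch of the standard definition: character-level edit distance between a reference text and a transcript, divided by the reference length. The function names and the sample strings are made up for illustration. This is not the Seed-TTS evaluation code; a TTS benchmark typically transcribes the synthesized audio with a speech recognizer first, and that step is left out here.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = character edits needed / number of reference characters."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Made-up strings, not benchmark data: one wrong character out of five.
    print(character_error_rate("hello", "hallo"))
```

A lower CER means the synthesized speech was transcribed back closer to the input text, which is why it serves as a proxy for pronunciation accuracy.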
