Answers (2)

    2025-03-31T21:36:09+00:00

    **Llasa 3b Tts** is an unofficial demo Space hosted on Hugging Face, created by **srinivasbilla**. It showcases the **Llasa-3B model**, a text-to-speech (TTS) system developed by the **Hong Kong University of Science and Technology (HKUST)**. In the Space, users can generate speech from text or clone a voice from a short audio sample, using the model's zero-shot voice cloning and bilingual (Chinese and English) TTS.

    2025-03-31T21:36:23+00:00

    The Space runs the **Llasa-3B model**, a **text-to-speech (TTS) system** built on the **LLaMA** language model and developed by **HKUST**. Key features of the model include:
    - **Training Data**: 250,000 hours of Chinese and English speech.
    - **Architecture**: Extends the LLaMA vocabulary with speech tokens from the **XCodec2 codebook** (65,536 tokens), so speech is generated as a sequence of discrete tokens.
    - **Capabilities**: Supports zero-shot voice cloning, bilingual (Chinese and English) TTS, and emotion and style matching in generated speech.
    The official model repository is hosted at [HKUSTAudio/Llasa-3B](https://huggingface.co/HKUSTAudio/Llasa-3B).
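
To make the codebook point concrete, here is a minimal sketch of how integer codec indices could be turned into text-like tokens that a language model can emit, and then parsed back. The `<|s_N|>` token format and both function names are assumptions chosen for illustration; check the official repository for the exact convention. Only the codebook size (65,536) comes from the answer above.

```python
# Codebook size quoted in the answer above; 65,536 == 2**16,
# so each speech token selects one of 2**16 codec entries.
CODEBOOK_SIZE = 65_536


def ids_to_speech_tokens(speech_ids):
    """Map integer codec indices to token strings (hypothetical format)."""
    tokens = []
    for speech_id in speech_ids:
        if not 0 <= speech_id < CODEBOOK_SIZE:
            raise ValueError(f"index {speech_id} is outside the codebook")
        tokens.append(f"<|s_{speech_id}|>")
    return tokens


def speech_tokens_to_ids(tokens):
    """Recover codec indices, skipping anything that is not a speech token."""
    ids = []
    for token in tokens:
        if token.startswith("<|s_") and token.endswith("|>"):
            # Strip the 4-character prefix "<|s_" and the 2-character suffix "|>".
            ids.append(int(token[4:-2]))
    return ids


if __name__ == "__main__":
    tokens = ids_to_speech_tokens([0, 42, 65535])
    print(tokens)
    print(speech_tokens_to_ids(tokens))
```

In a full pipeline, the recovered indices would be passed to the codec's decoder to produce a waveform; that step is omitted here because it depends on the actual XCodec2 API.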
