Babillage Dataset - A multimodal benchmark dataset for evaluating vision speech models.

## Purpose of Babillage Dataset

The Babillage Dataset is designed to serve as a benchmark for evaluating vision speech models, specifically their ability to handle spoken visual question-answering tasks in conversational formats.

## Source Datasets of Babillage Dataset

The Babillage Dataset is based on three existing datasets: COCO-Captions 2014, OCR-VQA, and VQAv2, which were transformed into conversational question-answer pairs.

## Subsets of Babillage Dataset

The Babillage Dataset consists of three subsets:

1. Conversational COCO (CoCOCO)
2. Conversational OCR-VQA (CoOCR-VQA)
3. Conversational VQAv2 (CoVQAv2)

## Sample Structure in Babillage Dataset

Each sample in the Babillage Dataset typically includes:

- sample_id (unique identifier)
- image_id (for CoOCR-VQA and CoCOCO)
- Question Audio (duration and content)
- Question Transcript
- Question Alignment (time alignment sequence)
- Answer Audio (duration and content)
- Answer Transcript
- Answer Alignment (time alignment sequence)

## Accessing Babillage Dataset

The Babillage Dataset can be loaded via Hugging Face's datasets library using the following commands:

- CoCOCO: `datasets.load_dataset("kyutai/Babillage", "coco", split=split)`
- CoOCR-VQA: `datasets.load_dataset("kyutai/Babillage", "ocrvqa", split=split)`
- CoVQAv2: `datasets.load_dataset("kyutai/Babillage", "vqav2", split=split)`

## License of Babillage Dataset

The Babillage Dataset is released under the CC-BY 4.0 license, which allows for sharing and adaptation with proper attribution.
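
The three load commands listed under "Accessing Babillage Dataset" differ only in the configuration name, so they can be wrapped in a small helper. The following is a minimal sketch, not an official API: the helper name `load_babillage_subset` and the `BABILLAGE_CONFIGS` mapping are hypothetical, and the available split names are not stated here, so the caller must supply one.

```python
# Configuration names copied from the load commands listed above.
# The mapping and the helper below are illustrative, not part of the dataset's API.
BABILLAGE_CONFIGS = {
    "CoCOCO": "coco",
    "CoOCR-VQA": "ocrvqa",
    "CoVQAv2": "vqav2",
}


def load_babillage_subset(subset, split):
    """Load one Babillage subset by its display name (hypothetical helper).

    `split` is passed through unchanged; check the dataset card for the
    split names that actually exist for each subset.
    """
    if subset not in BABILLAGE_CONFIGS:
        known = ", ".join(sorted(BABILLAGE_CONFIGS))
        raise ValueError(f"Unknown subset {subset!r}; expected one of: {known}")

    # Imported lazily so the mapping can be inspected without the
    # `datasets` package installed.
    import datasets

    return datasets.load_dataset(
        "kyutai/Babillage", BABILLAGE_CONFIGS[subset], split=split
    )


# Example usage (requires network access and the `datasets` package):
#   ds = load_babillage_subset("CoVQAv2", split=split)
#   print(ds.column_names)  # inspect the fields before relying on their names
```

Printing `column_names` first is a deliberate choice: the sample fields above are described informally, so the exact column keys should be read from the loaded dataset rather than assumed.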

## Supported Tasks by Babillage Dataset

The Babillage Dataset supports evaluation of:

- Image Description
- Visual Question Answering (VQA)
- Optical Character Recognition related QA (OCR-VQA)
- Performance assessment of multimodal dialogue systems

## Babillage Dataset and MoshiVis Connection

The Babillage Dataset was developed by the Kyutai team and is closely associated with the MoshiVis project, which is an open-source vision speech model supporting real-time voice conversations with visual understanding capabilities.

## Hosting Location of Babillage Dataset

The Babillage Dataset is officially hosted on Hugging Face at: https://huggingface.co/datasets/kyutai/babillage

## Audio Format in Babillage Dataset

The dataset stores audio files in ogg format, but provides code snippets to convert them to wav format if needed.

### Citation sources:

- [Babillage Dataset](https://huggingface.co/datasets/kyutai/babillage) - Official URL

Updated: 2025-04-01