What information is typically included in each sample of the Babillage Dataset?

    2025-04-01T15:03:13+00:00

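    Each sample in the Babillage Dataset typically includes:
    - sample_id: a unique identifier for the sample
    - image_id: the identifier of the associated image (present only for the CoOCR-VQA and CoCOCO subsets)
    - Question Audio: the spoken question, along with its duration
    - Question Transcript: the text of the spoken question
    - Question Alignment: a time-alignment sequence for the question
    - Answer Audio: the spoken answer, along with its duration
    - Answer Transcript: the text of the spoken answer
    - Answer Alignment: a time-alignment sequence for the answer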
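To make the layout more concrete, here is a minimal Python sketch of how one sample could be modeled in memory. Only `sample_id` and `image_id` come from the list above. The class names (`SpokenTurn`, `BabillageSample`), the other attribute names, the waveform-as-list representation, and the `(token, start, end)` alignment format are all hypothetical assumptions, not the dataset's actual schema. Check the dataset's own documentation before relying on any of them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical alignment entry: (token, start_seconds, end_seconds).
# The real dataset's alignment format may differ.
Alignment = List[Tuple[str, float, float]]


@dataclass
class SpokenTurn:
    """One side of the exchange (the question or the answer)."""
    audio_duration: float   # length of the audio in seconds
    audio: List[float]      # raw waveform samples (assumed representation)
    transcript: str
    alignment: Alignment = field(default_factory=list)

    def alignment_is_consistent(self) -> bool:
        """Return True if aligned spans are in order and fit inside the audio."""
        prev_end = 0.0
        for _token, start, end in self.alignment:
            if not (prev_end <= start <= end <= self.audio_duration):
                return False
            prev_end = end
        return True


@dataclass
class BabillageSample:
    sample_id: str
    question: SpokenTurn
    answer: SpokenTurn
    image_id: Optional[str] = None  # only set for CoOCR-VQA and CoCOCO samples


# Illustrative values only; not taken from the real dataset.
sample = BabillageSample(
    sample_id="example-0001",
    image_id="img-42",
    question=SpokenTurn(1.2, [0.0] * 10, "what is this",
                        [("what", 0.0, 0.3), ("is", 0.3, 0.5), ("this", 0.5, 1.0)]),
    answer=SpokenTurn(0.8, [0.0] * 10, "a cat",
                      [("a", 0.0, 0.2), ("cat", 0.2, 0.7)]),
)
print(sample.question.alignment_is_consistent(), sample.answer.alignment_is_consistent())
# prints: True True
```

Making `image_id` optional mirrors the note above that it is present only for the CoOCR-VQA and CoCOCO subsets.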
