What is multi-token prediction in the context of large language models?

Question

Answers ( 1 )

    2025-03-28T03:13:03+00:00

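    Multi-token prediction is a training method for large language models in which, at each position in the training corpus, the model predicts several future tokens at once (the next n tokens) instead of only the next one. In the common formulation, a shared model trunk feeds n independent output heads, one per future offset. Each head's cross-entropy loss is computed separately, and the per-head losses are summed into a single training objective. This extra training signal has been reported to improve sample efficiency and downstream performance.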
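    To make the loss concrete, here is a minimal sketch in plain Python. It assumes a toy causal "trunk" (a running mean of token embeddings standing in for a real transformer) and linear output heads. All function names (`toy_trunk`, `head_logits`, `multi_token_loss`) are hypothetical and chosen for illustration, not taken from any library. Head `i` (0-based) predicts the token `i + 1` steps ahead of position `t`. Each head's cross-entropy is accumulated on its own, and the heads are then summed.

```python
import math
import random

def log_softmax_ce(logits, target):
    """Cross-entropy of one target index under softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def head_logits(weights, h):
    """One output head: a linear map from a d-dim hidden vector to vocab logits."""
    return [sum(w * x for w, x in zip(row, h)) for row in weights]

def toy_trunk(embed, tokens):
    """Hypothetical causal trunk: hidden[t] is the mean embedding of tokens[0..t],
    so position t only sees tokens up to t (a stand-in for a transformer)."""
    d = len(embed[0])
    hidden, running = [], [0.0] * d
    for t, tok in enumerate(tokens):
        running = [r + e for r, e in zip(running, embed[tok])]
        hidden.append([r / (t + 1) for r in running])
    return hidden

def multi_token_loss(hidden, heads, tokens):
    """Head i predicts tokens[t + 1 + i] from the shared hidden[t].
    Per-head cross-entropies are summed over positions, then summed over heads."""
    per_head = [0.0] * len(heads)
    for t, h in enumerate(hidden):
        for i, w in enumerate(heads):
            target = t + 1 + i
            if target >= len(tokens):
                break  # no ground-truth token this far ahead; later heads are out of range too
            per_head[i] += log_softmax_ce(head_logits(w, h), tokens[target])
    return sum(per_head), per_head

random.seed(0)
VOCAB, D, N_HEADS = 5, 4, 3
embed = [[random.gauss(0, 1) for _ in range(D)] for _ in range(VOCAB)]
heads = [[[random.gauss(0, 0.1) for _ in range(D)] for _ in range(VOCAB)]
         for _ in range(N_HEADS)]
tokens = [0, 3, 1, 4, 2, 2, 1]

total, per_head = multi_token_loss(toy_trunk(embed, tokens), heads, tokens)
print(f"total={total:.3f}", [round(x, 3) for x in per_head])
```

With `N_HEADS = 1`, the same function reduces to ordinary next-token prediction, which makes the extra heads easy to see as additional loss terms on top of the standard objective.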
