What is multi-token prediction in the context of large language models?
Question
Answers (1)
Multi-token prediction is a training method for large language models in which, at each position in the training corpus, the model predicts the next several tokens at once rather than only the next one. Each future offset gets its own output head on top of a shared model trunk, and a cross-entropy loss is computed for each head independently and then summed. The method is reported to improve sample efficiency and downstream performance.
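To make the loss concrete, here is a minimal sketch in plain Python. It is only an illustration, not the implementation from any particular paper or library. The names `multi_token_loss` and `head_logits` are hypothetical. The sketch assumes that head k at position t predicts the token at position t + k + 1, and that each head's loss is averaged over its valid positions before the heads are summed.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one vocabulary-sized list of logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def multi_token_loss(head_logits, tokens):
    """Sum of per-head cross-entropy losses (hypothetical helper).

    head_logits[k][t] holds the vocabulary logits produced by output head k
    at position t. By assumption, head k is trained to predict
    tokens[t + k + 1], so head 0 is the usual next-token head.
    """
    total = 0.0
    for k, logits_k in enumerate(head_logits):
        offset = k + 1
        losses = []
        for t, logits in enumerate(logits_k):
            target_index = t + offset
            if target_index >= len(tokens):
                break  # no target this far ahead near the end of the sequence
            losses.append(-log_softmax(logits)[tokens[target_index]])
        if losses:
            # Each head's loss is computed independently, then added to the total.
            total += sum(losses) / len(losses)
    return total

# Toy example: vocabulary of 3 tokens, sequence of 4 tokens, 2 output heads.
tokens = [0, 1, 2, 1]
uniform = [[[0.0, 0.0, 0.0] for _ in tokens] for _ in range(2)]
loss = multi_token_loss(uniform, tokens)
```

With uniform logits every prediction costs log(3), so the two heads together give a loss of 2·log(3). In a real model the logits would come from separate output heads applied to the same trunk representation, and the summed loss would be backpropagated through all of them.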