Answers (1)

    2025-03-31T17:40:12+00:00

    To use QwQ-32B, follow these guidelines:
    - **Environment Requirements**: Use a recent version of the transformers library (versions below 4.37.0 do not recognize the model architecture and raise errors).
    - **Quick Start**: Load the model using AutoModelForCausalLM and AutoTokenizer from "Qwen/QwQ-32B".
    - **Forced Thought Output**: Ensure the model's reply begins with "<think>\n"; building the prompt with apply_chat_template and add_generation_prompt=True adds this opening automatically.
    - **Sampling Parameters**: Recommended settings are Temperature=0.6, TopP=0.95, MinP=0, and TopK between 20 and 40; raise presence_penalty (0 to 2) only if the output repeats itself.
    - **Multi-turn Dialogue**: Use apply_chat_template to build each turn, and keep the thought content (the <think>…</think> block) of earlier replies out of the history.
    - **Output Format Standardization**: For mathematical problems, provide step-by-step reasoning and box the final answer with \boxed{}. For multiple-choice questions, use JSON format and output only the option letter (e.g., "answer": "C").
    - **Long Input Handling**: Enable YaRN for prompts exceeding 8,192 tokens by adding specific configurations to config.json and using vLLM.
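For the long-input case, Qwen's model card suggests enabling YaRN by adding a rope_scaling entry to the model's config.json. A sketch of that fragment, with the values published in Qwen's guidance (verify against the current model card before use):

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

With factor 4.0 this stretches the 32,768-token base window roughly fourfold; vLLM is the recommended serving stack for such long prompts.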
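The loading, prompting, and sampling steps above can be sketched as follows. This is a minimal sketch based on the standard transformers API; the helper names are my own, and actually running `main()` requires downloading the 32B checkpoint and sufficient GPU memory:

```python
MODEL_NAME = "Qwen/QwQ-32B"


def recommended_generation_kwargs() -> dict:
    """Sampling settings recommended above for QwQ-32B."""
    return {
        "do_sample": True,
        "temperature": 0.6,
        "top_p": 0.95,
        "min_p": 0.0,
        "top_k": 40,  # anywhere in the 20-40 range is suggested
        "max_new_tokens": 32768,
    }


def main() -> None:
    # transformers >= 4.37.0 required; imported here so the sketch
    # can be read without the library installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype="auto", device_map="auto"
    )

    messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
    # add_generation_prompt=True appends the assistant header, so the
    # model's reply starts inside its "<think>" reasoning block.
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)

    output_ids = model.generate(**inputs, **recommended_generation_kwargs())
    # Decode only the newly generated tokens, not the echoed prompt.
    reply = tokenizer.decode(
        output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
    )
    print(reply)
```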
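For the multi-turn point, earlier assistant replies should go back into the history without their reasoning. A stdlib-only helper (the function names are my own) that strips the <think>…</think> block before re-appending a reply:

```python
import re

# Matches the reasoning block, including a trailing newline run.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_thought(reply: str) -> str:
    """Remove the <think>...</think> block, keeping only the visible answer."""
    return THINK_RE.sub("", reply).strip()


def append_assistant_turn(history: list, reply: str) -> list:
    """Store only the visible answer in the dialogue history."""
    history.append({"role": "assistant", "content": strip_thought(reply)})
    return history


# Usage
history = [{"role": "user", "content": "What is 2 + 2?"}]
raw_reply = "<think>\nTwo plus two is four.\n</think>\n\nThe answer is 4."
append_assistant_turn(history, raw_reply)
print(history[-1]["content"])  # -> The answer is 4.
```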
