# QwQ-32B
A 32.5 billion parameter causal language model by Qwen, specialized in reasoning tasks.
## Overview of QwQ-32B
QwQ-32B is a 32.5 billion parameter causal language model developed by Qwen. It is designed for reasoning tasks, excelling in text generation, question answering, and complex problem-solving. The model supports a context length of 131,072 tokens and uses advanced architectural components like RoPE, SwiGLU, and RMSNorm.
## Key Features of QwQ-32B
QwQ-32B is a causal language model with 32.5 billion parameters, including 31.0 billion non-embedding parameters. It features 64 layers and uses Grouped Query Attention (GQA) with 40 query heads and 8 key/value heads. The model supports a context length of 131,072 tokens and requires YaRN for inputs exceeding 8,192 tokens. Its architecture includes RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
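To verify these values locally, the model configuration can be inspected with Hugging Face transformers. The sketch below assumes only that the `Qwen/QwQ-32B` repository is reachable and that the field names follow the standard Qwen2-style configuration; the expected values in the comments come from the description above.

```python
from transformers import AutoConfig

# Download only the configuration for Qwen/QwQ-32B and print the
# architecture fields described above. The commented values are
# expectations from the model card, not guarantees.
config = AutoConfig.from_pretrained("Qwen/QwQ-32B")

print("hidden layers:       ", config.num_hidden_layers)       # expected 64
print("query heads:         ", config.num_attention_heads)     # expected 40
print("key/value heads:     ", config.num_key_value_heads)     # expected 8 (GQA)
print("max position embeds: ", config.max_position_embeddings)
```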
## Performance of QwQ-32B in Specific Tasks
QwQ-32B is suitable for text generation, question answering, and reasoning tasks, particularly those requiring logical deduction and complex problem-solving. It can handle long inputs using YaRN, making it ideal for extended dialogues or documents.
## Optimal Settings for QwQ-32B
The recommended sampling settings for QwQ-32B are a temperature of 0.6, TopP of 0.95, MinP of 0, TopK between 20 and 40, and a presence_penalty between 0 and 2. For inputs exceeding 8,192 tokens, YaRN should be enabled. In multi-turn conversations, the model's thinking content should be excluded from the history, keeping only the final responses. Math problems should be answered with step-by-step reasoning and the final answer placed within `\boxed{}`.
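As an illustration, these settings map directly onto vLLM's `SamplingParams`. The sketch below picks `top_k=40` and `presence_penalty=1.0` from the recommended ranges; those specific values, the `max_tokens` limit, and the example prompt are illustrative choices rather than official defaults.

```python
from vllm import LLM, SamplingParams

# Recommended sampling settings for QwQ-32B. top_k=40 and
# presence_penalty=1.0 are picks from the suggested ranges
# (TopK 20-40, presence_penalty 0-2), not fixed defaults.
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.95,
    min_p=0.0,
    top_k=40,
    presence_penalty=1.0,
    max_tokens=4096,
)

llm = LLM(model="Qwen/QwQ-32B")

# For best results the chat template should be applied to the prompt
# (see the transformers quickstart sketch later in this document);
# a raw prompt is used here only to keep the example short.
outputs = llm.generate(
    ['How many r\'s are in the word "strawberry"?'],
    sampling_params,
)
print(outputs[0].outputs[0].text)
```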
## Access Points for QwQ-32B
QwQ-32B can be accessed via the [QwQ-32B Demo](https://huggingface.co/spaces/Qwen/QwQ-32B-Demo) or [QwenChat](https://chat.qwen.ai). It requires Hugging Face transformers 4.37.0 or later (using the latest version is advised), and vLLM is the recommended deployment framework. Detailed deployment guidelines are available in the [vLLM Deployment Guide](https://qwen.readthedocs.io/en/latest/deployment/vllm.html).
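A minimal transformers quickstart might look like the sketch below. It follows the usual chat-template pattern for Qwen chat models rather than reproducing an official example; the generation settings are the recommended sampling values from above, and loading the full 32.5B-parameter model requires substantial GPU memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B"  # official repository

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard across available GPUs
)

messages = [
    {"role": "user", "content": 'How many r\'s are in the word "strawberry"?'}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

# Strip the prompt tokens before decoding the completion.
output_ids = generated[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```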
## Architecture of QwQ-32B
QwQ-32B uses a Transformer architecture with Rotary Position Embedding (RoPE), SwiGLU activation, RMSNorm normalization, and Attention QKV bias. It has 64 layers and employs Grouped Query Attention (GQA) with 40 query heads and 8 key/value heads.
## Context Length of QwQ-32B
QwQ-32B supports a context length of 131,072 tokens. For inputs exceeding 8,192 tokens, YaRN must be enabled to manage the extended context.
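One way to enable YaRN in supported frameworks is to attach a `rope_scaling` entry to the model configuration before loading. The factor of 4.0 over an original 32,768-token window shown below follows the pattern Qwen documents for its 131,072-token models, but it is an assumption here and should be confirmed against the official QwQ-32B card before use.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Assumption: YaRN with factor 4.0 over an original 32,768-token
# window yields the full 131,072-token context. Verify these values
# against the official QwQ-32B documentation.
config = AutoConfig.from_pretrained("Qwen/QwQ-32B")
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/QwQ-32B",
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```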
## Training Methods of QwQ-32B
QwQ-32B undergoes pretraining and post-training, which includes supervised fine-tuning and reinforcement learning. These stages enhance its reasoning capabilities and performance in downstream tasks.
## Formatting Math Problems in QwQ-32B
Responses to math problems should include step-by-step reasoning, with the final answer placed within `\boxed{}`. This ensures clarity and standardization in responses.
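In practice this is done by appending a standard instruction to the user prompt. The helper below is a hypothetical convenience wrapper, and the exact instruction wording mirrors the guideline commonly cited for Qwen reasoning models rather than a required string.

```python
# Build a chat message that asks QwQ-32B to show its reasoning and wrap
# the final answer in \boxed{}. `format_math_prompt` is an illustrative
# helper, not part of any official API.
def format_math_prompt(problem: str) -> list[dict]:
    instruction = (
        "Please reason step by step, and put your final answer within \\boxed{}."
    )
    return [{"role": "user", "content": f"{problem}\n\n{instruction}"}]


messages = format_math_prompt("What is the sum of the first 100 positive integers?")
# Pass `messages` to tokenizer.apply_chat_template(...) as in the
# quickstart sketch above before calling model.generate().
print(messages[0]["content"])
```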
## Licensing of QwQ-32B
QwQ-32B is licensed under the Apache-2.0 license. The license details can be found at [QwQ-32B License](https://huggingface.co/Qwen/QwQ-32B/blob/main/LICENSE).
### Citation sources:
- [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) - Official URL
Updated: 2025-03-31