# Skywork-R1V

An advanced AI model integrating text and visual reasoning capabilities.
## Purpose of Skywork-R1V
Skywork-R1V is designed to integrate text and visual reasoning, enabling advanced multimodal tasks such as solving complex mathematical problems and analyzing medical and scientific images. It transfers the reasoning ability of a text-only language model to visual tasks through a lightweight visual adapter.
## Technical Components of Skywork-R1V
- **Model Variants**: Skywork-R1V-38B and Skywork-R1V-38B-qwq.
- **Vision Encoder**: InternViT-6B-448px-V2_5.
- **Language Models**: DeepSeek-R1-Distill-Qwen-32B (for Skywork-R1V-38B) and QwQ-32B (for Skywork-R1V-38B-qwq).
- **Parameters**: 38.4 billion, stored as BF16 tensors.
- **Training Method**: A three-stage approach combining iterative supervised fine-tuning (SFT) with reinforcement learning via Group Relative Policy Optimization (GRPO).
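The Skywork-R1V training code is not reproduced here. As a rough illustration of the GRPO stage only, the central idea of Group Relative Policy Optimization can be sketched as computing group-normalized advantages: several responses are sampled for the same prompt, and each response's reward is compared with the rest of its group instead of with a learned value model. This is a generic sketch, not Skywork's implementation; the function name and the 0/1 correctness reward are hypothetical, and the choice of population standard deviation is an assumption (implementations vary).

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each sampled response's reward against its own group.

    Advantage of response i = (r_i - mean(group)) / (std(group) + eps).
    No separate critic/value model is needed to get a baseline.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical example: four answers to one prompt, scored 1.0 if correct else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, so the policy update pushes probability toward the better responses within each group.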
## Performance Benchmarks for Skywork-R1V
- **Reasoning**:
  - MATH-500: 94.0 pass@1
  - AIME 2024: 72.0 pass@1
  - GPQA: 61.6 pass@1
- **Vision**:
  - MathVista(mini): 67.5 pass@1
  - MMMU(Val): 69.0 pass@1
On these benchmarks the model is reported to be competitive with closed-source models such as GPT-4o and Claude 3.5 Sonnet.
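The scores above are reported as pass@1. The number of samples drawn per problem is not stated here, so the sketch below only illustrates the widely used unbiased pass@k estimator, pass@k = 1 - C(n - c, k) / C(n, k), which for k = 1 reduces to the fraction of correct samples, c / n. The per-problem data in the example are made up, and the helper names are hypothetical.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of sampled answers, c: number of correct answers, k: budget.
    Probability that at least one of k answers drawn without replacement
    from the n samples is correct.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains a correct answer
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: List[Tuple[int, int]]) -> float:
    """Average pass@1 over problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


# Hypothetical data: three problems, four samples each.
score = benchmark_pass_at_1([(4, 4), (4, 2), (4, 0)])
print(round(100 * score, 1))  # 50.0
```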
## Reasoning Approach of Skywork-R1V
Skywork-R1V uses **visual chain-of-thought reasoning**, breaking down complex image-based problems into multi-step logical analyses. Its lightweight visual adapter efficiently transfers text-model reasoning to visual tasks without extensive retraining.
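The adapter is described here only as lightweight. A common design for such connectors, assumed for illustration rather than taken from the Skywork code, is a small multilayer perceptron (MLP) that projects each vision-encoder patch feature into the language model's embedding space, so the LLM can consume image patches as if they were tokens. The sketch uses pure Python, toy dimensions, and random weights; the names `MLPAdapter`, `d_vision`, `d_hidden`, and `d_text` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def gelu(x: float) -> float:
    # Exact GELU using the Gaussian CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    # y = W x + b, with W stored as d_out rows of length len(x).
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]


def init_layer(d_in: int, d_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    # Uniform init scaled by 1/sqrt(d_in); biases start at zero.
    scale = 1.0 / math.sqrt(d_in)
    w = [[rng.uniform(-scale, scale) for _ in range(d_in)] for _ in range(d_out)]
    return w, [0.0] * d_out


class MLPAdapter:
    """Hypothetical two-layer MLP mapping vision features to LLM embeddings."""

    def __init__(self, d_vision: int, d_hidden: int, d_text: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_layer(d_vision, d_hidden, rng)
        self.w2, self.b2 = init_layer(d_hidden, d_text, rng)

    def __call__(self, patches: List[Vector]) -> List[Vector]:
        # Each patch feature becomes one "visual token" in the LLM's embedding space.
        return [
            linear([gelu(h) for h in linear(p, self.w1, self.b1)], self.w2, self.b2)
            for p in patches
        ]


# Toy sizes; the real encoder and LLM dimensions are far larger.
adapter = MLPAdapter(d_vision=8, d_hidden=16, d_text=4)
visual_tokens = adapter([[0.1] * 8 for _ in range(3)])
print(len(visual_tokens), len(visual_tokens[0]))  # 3 4
```

A connector like this is cheap to train because it can be fit while the vision encoder and language model stay frozen; whether Skywork-R1V freezes either component at each training stage is not stated here.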
## Deployment Instructions for Skywork-R1V
1. Clone the repository:
`git clone https://github.com/SkyworkAI/Skywork-R1V`
2. Install dependencies:
`pip install -r requirements.txt`
`pip install flash-attn --no-build-isolation`
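3. Run inference, replacing `[path]` with the local model path, `[image_path]` with the input image path, and `"your question"` with your prompt: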
`CUDA_VISIBLE_DEVICES="0,1" python inference_with_transformers.py --model_path [path] --image_paths [image_path] --question "your question"`
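The example command exposes two GPUs. A back-of-the-envelope calculation from the figures listed earlier (38.4 billion parameters stored in BF16, 2 bytes each) shows how much memory the weights alone occupy. This is only arithmetic on the weights: it assumes an idealized even split across devices and ignores activations, the KV cache, and framework overhead, so real usage is higher. The helper names are hypothetical.

```python
# Rough weight-memory estimate for a 38.4B-parameter model stored in BF16.
# Assumes 2 bytes per parameter and an even split across GPUs; activations,
# the KV cache, and framework overhead are ignored.

BYTES_PER_BF16 = 2


def weight_memory_gib(num_params: float, bytes_per_param: int = BYTES_PER_BF16) -> float:
    """Total size of the weights in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


def per_gpu_gib(num_params: float, num_gpus: int) -> float:
    """Weight memory per GPU if layers are spread evenly (an idealization)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    return weight_memory_gib(num_params) / num_gpus


total = weight_memory_gib(38.4e9)
print(f"total: {total:.1f} GiB, per GPU on 2 GPUs: {per_gpu_gib(38.4e9, 2):.1f} GiB")
# total: 71.5 GiB, per GPU on 2 GPUs: 35.8 GiB
```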
## Unique Features of Skywork-R1V
- **Lightweight Visual Adapter**: Efficiently transfers text-model reasoning to visual tasks.
- **High Accuracy in Niche Tasks**: Excels in visual mathematics (e.g., MathVista) and medical/scientific image analysis.
- **Open-Source Competitiveness**: Matches or outperforms closed-source models such as GPT-4o on specific benchmarks.
## Resource Access for Skywork-R1V
- **Model Weights & Code**: [Hugging Face](https://huggingface.co/Skywork/Skywork-R1V-38B)
- **GitHub Repository**: [Skywork-R1V GitHub](https://github.com/SkyworkAI/Skywork-R1V)
- **Vision Encoder**: [InternViT-6B-448px-V2_5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5)
- **Language Models**: [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) / [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B)
### Citation sources:
- [Skywork-R1V](https://github.com/SkyworkAI/Skywork-R1V) - Official URL
Updated: 2025-04-01