Skywork-R1V - An advanced AI model integrating text and visual reasoning capabilities.

## Purpose of Skywork-R1V

Skywork-R1V is designed to integrate text and visual reasoning, enabling advanced multimodal tasks such as solving complex mathematical problems and analyzing medical and scientific images. It carries the logical reasoning of text models over to visual tasks through a lightweight adapter.

## Technical Components of Skywork-R1V

- **Model Variants**: Skywork-R1V-38B and Skywork-R1V-38B-qwq.
- **Vision Encoder**: InternViT-6B-448px-V2_5.
- **Language Models**: DeepSeek-R1-Distill-Qwen-32B (for Skywork-R1V-38B) and QwQ-32B (for Skywork-R1V-38B-qwq).
- **Parameters**: 38.4 billion, stored in the BF16 tensor type.
- **Training Method**: Three-stage approach combining iterative supervised fine-tuning (SFT) and reinforcement learning (GRPO).

## Performance Benchmarks for Skywork-R1V

- **Reasoning**:
  - MATH-500: 94.0 pass@1
  - AIME 2024: 72.0 pass@1
  - GPQA: 61.6 pass@1
- **Vision**:
  - MathVista(mini): 67.5 pass@1
  - MMMU(Val): 69.0 pass@1

On these benchmarks the model is competitive with closed-source models such as GPT-4o and Claude 3.5 Sonnet.

## Reasoning Approach of Skywork-R1V

Skywork-R1V uses **visual chain-of-thought reasoning**, breaking down complex image-based problems into multi-step logical analyses. Its lightweight visual adapter transfers text-model reasoning to visual tasks without extensive retraining.

## Deployment Instructions for Skywork-R1V

1. Clone the repository: `git clone https://github.com/SkyworkAI/Skywork-R1V`
2. Install dependencies: `pip install -r requirements.txt`, then `pip install flash-attn --no-build-isolation`
3. Run inference: `CUDA_VISIBLE_DEVICES="0,1" python inference_with_transformers.py --model_path [path] --image_paths [image_path] --question "your question"`

## Unique Features of Skywork-R1V

- **Lightweight Visual Adapter**: Efficiently transfers text-model reasoning to visual tasks.
- **High Accuracy in Niche Tasks**: Excels in visual mathematics (e.g., MathVista) and medical/scientific image analysis.
- **Open-Source Competitiveness**: Matches or outperforms closed-source models like GPT-4o in specific benchmarks.

## Resource Access for Skywork-R1V

- **Model Weights & Code**: [Hugging Face](https://huggingface.co/Skywork/Skywork-R1V-38B)
- **GitHub Repository**: [Skywork-R1V GitHub](https://github.com/SkyworkAI/Skywork-R1V)
- **Vision Encoder**: [InternViT-6B-448px-V2_5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5)
- **Language Models**: [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) / [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B)

### Citation sources:

- [Skywork-R1V](https://github.com/SkyworkAI/Skywork-R1V) - Official URL

Updated: 2025-04-01
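
## Weight Memory Estimate for Skywork-R1V

The parameter count (38.4 billion) and BF16 tensor type listed under Technical Components give a rough idea of why the inference command exposes two GPUs. The sketch below is a back-of-the-envelope calculation, not a figure from the Skywork-R1V documentation. It counts the weights only, at 2 bytes per BF16 parameter, and ignores activations, the KV cache, and framework overhead. The even split across two devices is an assumption suggested by `CUDA_VISIBLE_DEVICES="0,1"`, and the function name `weight_memory_bytes` is hypothetical.

```python
def weight_memory_bytes(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed to hold the weights alone.

    BF16 stores each parameter in 16 bits, i.e. 2 bytes.
    """
    return num_params * bytes_per_param


NUM_PARAMS = 38.4e9  # parameter count stated under Technical Components
NUM_GPUS = 2         # assumption: even split across the two visible devices

total_bytes = weight_memory_bytes(NUM_PARAMS)
total_gb = total_bytes / 1e9        # decimal gigabytes
total_gib = total_bytes / 2**30     # binary gibibytes, as GPU tools usually report
per_gpu_gib = total_gib / NUM_GPUS  # weights only, before any runtime overhead

print(f"Weights: {total_gb:.1f} GB ({total_gib:.1f} GiB)")
print(f"Per GPU (even split over {NUM_GPUS}): {per_gpu_gib:.1f} GiB")
```

Under these assumptions the weights alone come to roughly 77 GB, so actual requirements during inference will be higher once activations and caches are included.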