DeepSeek-OCR WebUI

Convert documents to Markdown, extract raw text, and locate specific content with bounding boxes.

Modes

  • ⚡ Gundam: 1024 base + 640 tiles with cropping - Best balance
  • 🧩 Tiny: 512×512, no crop - Fastest
  • 📄 Small: 640×640, no crop - Quick
  • 📚 Base: 1024×1024, no crop - Standard
  • 🖼️ Large: 1280×1280, no crop - Highest quality
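The mode list above can be read as a small lookup table of resolution settings. The following is a minimal sketch of that table, not the WebUI's actual code: the names `ModeConfig`, `MODES`, and `get_mode` are hypothetical, and treating the single resolution of the non-crop modes as both the base size and the tile size is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeConfig:
    """Resolution settings for one mode (hypothetical structure)."""
    base_size: int   # side length of the global view, in pixels
    image_size: int  # side length of each tile, in pixels
    crop: bool       # whether the image is cropped into tiles


# Values taken from the mode list; for the non-crop modes the single
# listed resolution is assumed to apply to both sizes.
MODES = {
    "Gundam": ModeConfig(base_size=1024, image_size=640, crop=True),
    "Tiny": ModeConfig(base_size=512, image_size=512, crop=False),
    "Small": ModeConfig(base_size=640, image_size=640, crop=False),
    "Base": ModeConfig(base_size=1024, image_size=1024, crop=False),
    "Large": ModeConfig(base_size=1280, image_size=1280, crop=False),
}


def get_mode(name: str) -> ModeConfig:
    """Return the settings for a mode name, or raise ValueError if unknown."""
    try:
        return MODES[name]
    except KeyError:
        valid = ", ".join(MODES)
        raise ValueError(f"Unknown mode {name!r}; expected one of: {valid}") from None
```

Keeping the settings in one table means a new mode is a single added entry, and a mistyped mode name fails early with a readable error.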

Tasks

  • Markdown: Convert the document to structured Markdown (grounding ✅)
  • Tables: Extract only the tables, as Markdown (grounding ✅)
  • Locate: Find specific text in image (grounding ✅)
  • Describe: General image description
  • Custom: Your own prompt (add <|grounding|> for boxes)
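The grounding column in the task list can be sketched as a rule for when the `<|grounding|>` tag is added to a prompt. This is an illustration under stated assumptions, not the WebUI's implementation: the names `TASK_USES_GROUNDING`, `with_grounding`, and `wants_boxes` are hypothetical, and placing the tag at the start of the prompt is an assumption.

```python
GROUNDING_TAG = "<|grounding|>"

# Grounding flags as marked in the task list. Custom is absent on purpose:
# the user decides whether to include the tag.
TASK_USES_GROUNDING = {
    "Markdown": True,
    "Tables": True,
    "Locate": True,
    "Describe": False,
}


def with_grounding(prompt: str, task: str) -> str:
    """Add the grounding tag for tasks that use it (hypothetical helper).

    Custom prompts are returned unchanged, so boxes appear only if the
    user wrote the tag themselves.
    """
    if task == "Custom":
        return prompt
    if task not in TASK_USES_GROUNDING:
        raise ValueError(f"Unknown task {task!r}")
    if TASK_USES_GROUNDING[task] and GROUNDING_TAG not in prompt:
        # Assumption: the tag goes at the start of the prompt.
        return GROUNDING_TAG + prompt
    return prompt


def wants_boxes(prompt: str) -> bool:
    """True if the prompt asks for bounding boxes via the grounding tag."""
    return GROUNDING_TAG in prompt
```

For example, `with_grounding("Summarize the page.", "Custom")` leaves the prompt untouched, while the same call with task `"Markdown"` prepends the tag.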