
Unlimited OCR: One-Shot Long-Horizon Parsing

Hacker News · Jun 23, 2026, 11:35 AM

Key takeaways

  • Welcome the Era of One-shot Long-horizon Parsing.
  • [2026/06/23] 📄 Our paper is now available on arXiv. [2026/06/23] 🤝 Thanks to the ModelScope community for their support.
  • Install the local SGLang wheel first, then pin kernels==0.9.0 and install PyMuPDF for PDF-to-image conversion.

Welcome the Era of One-shot Long-horizon Parsing.

  • [2026/06/23] 📄 Our paper is now available on arXiv.
  • [2026/06/23] 🤝 Thanks to the ModelScope community for their support. Our model is now available on ModelScope.
  • [2026/06/22] 🚀 We present Unlimited-OCR, aiming to push DeepSeek-OCR one step further.

Inference

Transformers

Inference using Hugging Face transformers on NVIDIA GPUs. Requirements tested on Python 3.12.3 + CUDA 12.9:

    torch==2.10.0
    torchvision==0.25.0
    transformers==4.57.1
    Pillow==12.1.1
    matplotlib==3.10.8
    einops==0.8.2
    addict==2.4.0
    easydict==1.13
    pymupdf==1.27.2.2
    psutil==7.2.2

    import os
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_name = 'baidu/Unlimited-OCR'
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_name,
        trust_remote_code=True,
        use_safetensors=True,
        torch_dtype=torch.bfloat16,
    )
    model = model.eval().cuda()

    # --- Single image supports two configs: gundam or base ---
    # gundam: base_size=1024, image_size=640, crop_mode=True
    # base:   base_size=1024, image_size=1024, crop_mode=False
    model.infer(
        tokenizer,
        prompt='<image>document parsing.',
        image_file='your_image.jpg',
        output_path='your/output/dir',
        base_size=1024,
        image_size=640,
        crop_mode=True,
        max_length=32768,
        no_repeat_ngram_size=35,
        ngram_window=128,
        save_results=True,
    )

    # --- Multi page / PDF only uses base (image_size=1024) ---
    model.infer_multi(
        tokenizer,
        prompt='<image>Multi page parsing.',
        image_files=['page1.png', 'page2.png', 'page3.png'],
        output_path='your/output/dir',
        image_size=1024,
        max_length=32768,
        no_repeat_ngram_size=35,
        ngram_window=1024,
        save_results=True,
    )

    # --- PDF (convert pages to images, then multi-page parsing) ---
    import tempfile
    import fitz  # PyMuPDF

    def pdf_to_images(pdf_path, dpi=300):
        doc = fitz.open(pdf_path)
        tmp_dir = tempfile.mkdtemp(prefix='pdf_ocr_')
        # PDF user space is 72 points per inch, so dpi / 72 is the zoom factor.
        mat = fitz.Matrix(dpi / 72, dpi / 72)
        paths = []
        for i, page in enumerate(doc):
            out = os.path.join(tmp_dir, f'page_{i+1:04d}.png')
            page.get_pixmap(matrix=mat).save(out)
            paths.append(out)
        doc.close()
        return paths

    model.infer_multi(
        tokenizer,
        prompt='<image>Multi page parsing.',
        image_files=pdf_to_images('your_doc.pdf', dpi=300),
        output_path='your/output/dir',
        image_size=1024,
        max_length=32768,
        no_repeat_ngram_size=35,
        ngram_window=1024,
        save_results=True,
    )

SGLang

Set up the environment (uv-managed virtualenv). Install the local SGLang wheel first, then pin kernels==0.9.0 and install PyMuPDF for PDF-to-image conversion.
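Because both the Transformers requirements and the SGLang setup rely on exact version pins, it can help to confirm what is actually installed before loading the model. The following is a minimal sketch and not part of the Unlimited-OCR repository: the `PINNED` table and the `check_pins` helper are hypothetical names, the versions are copied from this article, and ignoring a local build suffix such as `+cu129` is an assumption about how some wheels are tagged.

```python
from importlib import metadata

# Pins copied from this article. The table name and its contents as a dict
# are illustrative; extend it with the remaining requirements as needed.
PINNED = {
    "torch": "2.10.0",
    "torchvision": "0.25.0",
    "transformers": "4.57.1",
    "pymupdf": "1.27.2.2",
    "kernels": "0.9.0",  # pin mentioned for the SGLang setup
}


def check_pins(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that does not match.

    `found` is None when the package is not installed. `get_version` is
    injectable so the check can be exercised without touching the real
    environment.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Assumption: a local build tag like "+cu129" should not count
        # as a version mismatch, so compare only the part before "+".
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Run inside the target virtualenv, the script prints one line per mismatched or missing package and nothing when every pin matches.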

Article preview: originally published by Hacker News. Full story at the source.