Download 665k Zip

A significant portion of the 665k dataset relies on external datasets like OCR-VQA. However, many original image URLs in these datasets are no longer active.
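Because so many of those URLs are dead, a download script is more useful if it records failures and keeps going instead of aborting on the first broken link. The failed IDs then tell you exactly which images to look for in a mirror. The sketch below assumes a hypothetical `{image_id: url}` mapping and a `.jpg` naming scheme; the real OCR-VQA metadata layout is not described here. The `fetch` parameter lets you swap in a different downloader (or a fake one for testing).

```python
import os
import urllib.request
from typing import Callable, Dict, List


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Download raw bytes from a URL; raises on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_images(
    urls: Dict[str, str],
    out_dir: str,
    fetch: Callable[[str], bytes] = fetch_url,
) -> List[str]:
    """Try every URL, save successes as <image_id>.jpg, return the IDs that failed."""
    os.makedirs(out_dir, exist_ok=True)
    failed: List[str] = []
    for image_id, url in urls.items():
        # Assumed naming scheme: one file per ID with a .jpg extension.
        dest = os.path.join(out_dir, f"{image_id}.jpg")
        if os.path.exists(dest):
            continue  # already downloaded on a previous run
        try:
            data = fetch(url)
        except (OSError, ValueError):
            # URLError/HTTPError are OSError subclasses; ValueError covers malformed URLs.
            failed.append(image_id)  # dead link: look for a community mirror instead
            continue
        if not data:
            failed.append(image_id)  # an empty response is treated as a failure
            continue
        with open(dest, "wb") as f:
            f.write(data)
    return failed
```

Re-running the function is cheap because files that already exist are skipped, so only the remaining failures are retried.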

If you are starting a vision-language project, downloading the 665k zip is highly recommended as a foundational step. However, it is vital to:

Verify the source of the zip to ensure it includes the images.

Consider using it in conjunction with newer, more specialized datasets if you are working with top-tier models like Qwen-VL.

Obtaining a complete copy is moderately difficult: because links in the original source are broken, you may need to search for community mirrors or zip archives.
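One way to verify that a zip actually includes the images is to check every image path referenced by the annotation file against the extracted folder. This is a minimal sketch, assuming the annotation file is a JSON list whose entries may carry an `"image"` key holding a path relative to an image root (the layout used by LLaVA-style `llava_v1_5_mix665k.json`; treat the file and field names as assumptions). Entries without an `"image"` key are treated as text-only and skipped.

```python
import json
import os
from typing import List, Set


def find_missing_images(annotation_path: str, image_root: str) -> List[str]:
    """Return image paths referenced in the annotation file that are absent under image_root."""
    with open(annotation_path, "r", encoding="utf-8") as f:
        samples = json.load(f)  # assumed: a JSON list of dicts
    missing: List[str] = []
    seen: Set[str] = set()
    for sample in samples:
        rel_path = sample.get("image")
        if not rel_path or rel_path in seen:
            continue  # text-only sample, or a path we already checked
        seen.add(rel_path)
        if not os.path.isfile(os.path.join(image_root, rel_path)):
            missing.append(rel_path)
    return missing
```

A call such as `find_missing_images("llava_v1_5_mix665k.json", "data")` (both paths hypothetical) returns a list that should be empty for a complete download; a long list of `ocr_vqa/...` paths, for example, would point to the broken-URL problem described above.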

Some distributed versions of the 665k zip store the images in Parquet files rather than as standard JPG/PNG files. While this is efficient for storage, it adds an extra conversion step before the images can be used by the many standard training pipelines that expect individual image files.
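The conversion step can be sketched as extracting each row's encoded image bytes back into a file. This sketch makes several assumptions: the Parquet file has an `"image"` column and an `"id"` column (both hypothetical names), and the image column holds either raw bytes or a Hugging Face-style struct `{"bytes": ..., "path": ...}`. The bytes are written unchanged, with the extension guessed from the file signature; re-encoding to a single format would need an imaging library such as Pillow. Reading the Parquet file uses the third-party `pyarrow` package, imported only when `parquet_to_images` is called.

```python
import os
from collections.abc import Mapping
from typing import Any, Iterable


def guess_extension(data: bytes) -> str:
    """Guess a file extension from well-known magic numbers."""
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith(b"GIF8"):
        return ".gif"
    return ".bin"  # unknown format: keep the bytes, flag with a neutral extension


def write_image_records(records: Iterable[Mapping[str, Any]], out_dir: str,
                        image_field: str = "image", name_field: str = "id",
                        start: int = 0) -> int:
    """Write each record's encoded image bytes to out_dir; return how many were written."""
    os.makedirs(out_dir, exist_ok=True)
    written = 0
    for index, rec in enumerate(records, start):
        img = rec.get(image_field)
        # Struct-style columns arrive as dicts; otherwise assume raw bytes.
        data = img.get("bytes") if isinstance(img, Mapping) else img
        if not data:
            continue  # no image stored in this row
        name = str(rec.get(name_field, index)).replace(os.sep, "_")
        with open(os.path.join(out_dir, name + guess_extension(data)), "wb") as f:
            f.write(data)
        written += 1
    return written


def parquet_to_images(parquet_path: str, out_dir: str, batch_size: int = 256, **kwargs) -> int:
    """Stream a Parquet file in batches and extract its images (requires pyarrow)."""
    import pyarrow.parquet as pq  # third-party dependency, imported lazily

    total_rows = 0
    written = 0
    for batch in pq.ParquetFile(parquet_path).iter_batches(batch_size=batch_size):
        rows = batch.to_pylist()
        # `start` keeps fallback names unique across batches when an ID is missing.
        written += write_image_records(rows, out_dir, start=total_rows, **kwargs)
        total_rows += len(rows)
    return written
```

Streaming with `iter_batches` keeps memory bounded, which matters for a dataset of this size; loading the whole table at once would also work for small shards.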

Fine-tuning on the 665k dataset consistently improves Average Relative Performance (ARP) for medium-sized models such as TinyLLaVA 2.0B.

Related: "add ocr vqa images" by Victorwz, Pull Request #1458 on GitHub.