Download 665k Zip
A significant portion of the 665k dataset relies on external datasets like OCR-VQA. However, many original image URLs in these datasets are no longer active.
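One way to see how much of the dataset is affected by dead links is to compare the annotation file against the images you actually have on disk. The sketch below assumes the annotations are a single JSON list in which each record may carry an "image" key holding a path relative to your image folder; the function name and both arguments are hypothetical, so adjust them to the files you downloaded.

```python
import json
from pathlib import Path


def find_missing_images(annotation_path, image_root):
    """Return sorted relative image paths that the annotations reference but that are absent on disk."""
    with open(annotation_path, "r", encoding="utf-8") as f:
        records = json.load(f)

    root = Path(image_root)
    missing = set()
    for record in records:
        # Assumed layout: an optional "image" key with a relative path.
        # Records without it (for example text-only samples) are skipped.
        rel_path = record.get("image")
        if rel_path and not (root / rel_path).is_file():
            missing.add(rel_path)
    return sorted(missing)
```

Grouping the returned paths by their top-level folder gives a quick view of which source dataset needs a mirror.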
If you are starting a vision-language project, downloading the 665k zip is highly recommended as a foundational step. However, it is vital to:
Consider using it in conjunction with newer, more specialized datasets if you are working with top-tier models like Qwen-VL.
Ease of access is moderate: broken links in the original sources mean you may need to search for community mirrors or zips.
Verify the source of the zip to ensure it includes the images.
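A quick way to check whether a downloaded zip actually contains images is to list its members without extracting anything. This is a minimal sketch using only the Python standard library; the function names and the set of extensions treated as images are assumptions chosen for illustration.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Assumed set of extensions that count as image files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def summarize_zip(zip_path):
    """Count archive members by lowercase file extension, skipping directories."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            counts[PurePosixPath(info.filename).suffix.lower()] += 1
    return counts


def count_images(zip_path):
    """Return how many members of the archive look like image files."""
    counts = summarize_zip(zip_path)
    return sum(counts[ext] for ext in IMAGE_EXTENSIONS)
```

If count_images returns 0 while summarize_zip reports mostly ".json" or ".parquet" entries, the archive likely holds annotations or packed data rather than loose image files.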
Some distributed versions of the 665k zip files use the Parquet format rather than standard JPG/PNG files. While efficient for storage, this requires an extra conversion step before the data can be used directly for training in many standard pipelines.
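The conversion step can be sketched roughly as follows. The column names ("image" and "id") and the cell layout (raw bytes, or a dict with a "bytes" key) are assumptions, not a documented schema of any particular 665k distribution, so inspect your Parquet file first. The convert_parquet helper relies on pandas with a Parquet engine such as pyarrow installed, and it loads the whole file into memory.

```python
from pathlib import Path


def extract_image_bytes(cell):
    """Return raw bytes from a cell that is either bytes or a {"bytes": ...} dict."""
    if isinstance(cell, (bytes, bytearray)):
        return bytes(cell)
    if isinstance(cell, dict) and cell.get("bytes") is not None:
        return bytes(cell["bytes"])
    return None


def guess_extension(data):
    """Pick a file extension from the PNG or JPEG magic bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    return ".bin"


def write_images(records, out_dir, image_column="image", id_column="id"):
    """Write each record's image bytes to out_dir and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, record in enumerate(records):
        data = extract_image_bytes(record.get(image_column))
        if data is None:
            continue  # no usable image payload in this row
        # Fall back to the row index when the assumed id column is absent.
        name = str(record.get(id_column, index)).replace("/", "_")
        target = out / f"{name}{guess_extension(data)}"
        target.write_bytes(data)
        written.append(target)
    return written


def convert_parquet(parquet_path, out_dir, **kwargs):
    """Read one Parquet file with pandas and dump its images to out_dir."""
    import pandas as pd  # needs pandas plus pyarrow or fastparquet

    frame = pd.read_parquet(parquet_path)
    return write_images(frame.to_dict(orient="records"), out_dir, **kwargs)
```

Keeping write_images independent of pandas means the same logic can be reused with any reader that yields dict-like rows, for example a batched reader for files too large to load at once.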
Fine-tuning on the 665k dataset consistently improves "Average Relative Performance" (ARP) for medium-sized models like TinyLLaVA 2.0B.
add ocr vqa images by Victorwz · Pull Request #1458 - GitHub