PDF to Image Ingestion & Vision Model Testing
This use case provides tooling for converting PDFs to images and testing vision model quality before ingesting into the knowledge graph system.
Purpose
- PDF to Images: Simple converter for preparing PDFs for multimodal ingestion
- Vision Model Testing: Scratch space for evaluating Granite Vision 3.3 2B quality
- Quality Verification: Assess description accuracy and performance before production use
Prerequisites
System Dependencies
# Install poppler-utils (required for PDF conversion)
sudo apt install poppler-utils # Debian/Ubuntu
# or
brew install poppler # macOS
Python Dependencies
# Install from requirements.txt
pip install -r requirements.txt
# Or install individually:
pip install pdf2image ollama Pillow
Ollama Setup
Make sure Ollama is running with Granite Vision model:
# Check Ollama status
docker ps | grep ollama
# Verify Granite Vision model is available
docker exec kg-ollama ollama list | grep granite3.3-vision
Quick Start
1. Convert PDF to Images
# Basic conversion (300 DPI, default)
python convert.py document.pdf
# Custom output directory
python convert.py document.pdf /path/to/output
# Higher quality (larger files)
python convert.py document.pdf --dpi 600
# Lower quality (smaller files, faster)
python convert.py document.pdf --dpi 150
Output: Ordered PNG files page-001.png, page-002.png, etc.
2. Test Vision Model Quality
# Test single image
python test_vision.py page-001.png
# Save description to file
python test_vision.py page-001.png --save-description
# Use custom prompt
python test_vision.py page-001.png --prompt "Extract all text and describe the layout"
Output: Markdown description + performance metrics
Workflow Example
End-to-End Testing
# 1. Convert PDF to images
python convert.py /path/to/document.pdf
# 2. Test vision model on sample pages
python test_vision.py document_images/page-001.png --save-description
python test_vision.py document_images/page-010.png --save-description
python test_vision.py document_images/page-050.png --save-description
# 3. Review descriptions and evaluate quality
cat document_images/page-001.txt
cat document_images/page-010.txt
cat document_images/page-050.txt
# 4. If quality is good, prepare for batch ingestion
# (Future: integrate with kg ingest image command)
DPI Recommendations
| DPI | Use Case | File Size | Quality |
|---|---|---|---|
| 150 | Quick preview, testing | Small (~50-100 KB) | Basic |
| 300 | Standard ingestion (recommended) | Medium (~200-400 KB) | Good |
| 600 | High-quality archival | Large (~1-2 MB) | Excellent |
Default: 300 DPI strikes a good balance between quality and file size.
Testing Notes
What to Look For
When evaluating Granite Vision descriptions:
- Text Accuracy: Does it capture all visible text verbatim?
- Structure Recognition: Does it identify headings, lists, tables?
- Visual Elements: Does it describe diagrams, charts, images?
- Relationships: Does it explain how elements relate to each other?
- Layout: Does it capture the organization and flow?
Performance Metrics
Expected performance on typical presentation slides (300 DPI):
- Image size: ~200-400 KB per page
- Processing time: 5-15 seconds per image
- Description length: 500-2000 characters
Quality Assessment
Good description (ready for ingestion): - Captures all text accurately - Identifies visual structure (headings, bullets) - Describes diagrams and charts meaningfully - Maintains logical flow
Poor description (needs adjustment): - Missing or incorrect text - Ignores visual structure - Generic diagram descriptions ("there is a box") - No logical organization
Common Issues
PDF Conversion Errors
Error: pdf2image: command not found
Fix: Install poppler-utils system dependency
Error: Permission denied
Fix: Make script executable: chmod +x convert.py
Vision Model Errors
Error: Connection refused to Ollama
Fix: Start Ollama container: docker start kg-ollama
Error: Model not found
Fix: Pull model: docker exec kg-ollama ollama pull ibm/granite3.3-vision:2b
Error: Out of memory Fix: Reduce image DPI or use smaller batch sizes
File Organization
pdf-to-images/
├── convert.py # PDF to images converter
├── test_vision.py # Vision model quality tester
├── requirements.txt # Python dependencies
├── README.md # This file
├── .gitignore # Ignore large files
└── [scratch space] # Your PDFs, images, test outputs (gitignored)
Integration with Knowledge Graph
Once you've verified vision model quality:
- Future CLI:
kg ingest image page-001.png -o "My Ontology" - Future Batch:
kg ingest images document_images/ -o "My Ontology" - Future API: POST
/ingest/imagewith image bytes
Example Test Data
Test with the EPOM (Enterprise Product Operating Model) presentation:
# 1. Convert EPOM PDF to images
python convert.py "/home/aaron/Projects/ai/data/etfm/Enterprise Product Operating Model.pdf"
# 2. Test vision model on sample slides
python test_vision.py "Enterprise Product Operating Model_images/page-001.png" --save-description
python test_vision.py "Enterprise Product Operating Model_images/page-010.png" --save-description
# 3. Review quality
cat "Enterprise Product Operating Model_images/page-001.txt"
Next Steps
After verifying vision model quality:
- Document findings: Note description quality, performance, issues
- Decide approach: Local (Granite) vs Cloud (GPT-4o/Claude)
- Implement ingestion: Build image ingestion into main pipeline
- Create API routes:
/ingest/imageendpoint - Add CLI commands:
kg ingest imagecommand - Test end-to-end: Full pipeline from PDF to concept graph
License
This tooling is part of the Knowledge Graph System (Apache 2.0).
Dependencies: - pdf2image: MIT License - poppler-utils: GPL (external tool, not linked) - ollama: MIT License - Pillow: HPND License (PIL Software License)