Overview
Each comparison run generates a timestamped output directory containing:
- Diff images - Visual representations of differences
- Extra page images - Pages that exist in only one PDF
- results.json - Machine-readable comparison report
Directory structure
The output directory structure is defined in pdf_visual_diff.py:14-17.
If the PDFs are identical, the directory is still created but contains only results.json.
Diff images
Diff images highlight visual differences between corresponding pages.
File naming
Diff images are named diff_page_N.png, where N is the page number (1-indexed).
Example:
- diff_page_1.png - Differences on page 1
- diff_page_3.png - Differences on page 3
- diff_page_10.png - Differences on page 10
How diff images are generated
The process involves multiple steps:
1. Render pages to images (pdf_visual_diff.py:31-43)
2. (pdf_visual_diff.py:59-60)
3. (pdf_visual_diff.py:62-63)
4. (pdf_visual_diff.py:65-69)
Visual characteristics
- Colors
- Resolution
- Thresholding
- Base image: Original content from PDF1
- Red overlay: Areas with differences (semi-transparent, 50% opacity)
- Unchanged areas: Original colors from PDF1
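The color scheme above can be illustrated with a minimal, dependency-free sketch. It assumes the pages are already rendered as equal-sized grids of RGB tuples, and that a pixel counts as different when any channel differs by 20 or more. The function names and the per-channel rule are illustrative assumptions, not the code in pdf_visual_diff.py.

```python
RED = (255, 0, 0)

def blend(base, overlay, opacity=0.5):
    """Mix two RGB tuples; opacity is the weight given to the overlay."""
    return tuple(round(b * (1 - opacity) + o * opacity)
                 for b, o in zip(base, overlay))

def overlay_differences(page1, page2, cutoff=20):
    """Return page1 with a 50% red overlay wherever page2 differs noticeably.

    page1 and page2 are lists of rows, each row a list of (r, g, b) tuples.
    The cutoff of 20 is assumed to apply per channel (illustrative choice).
    """
    result = []
    for row1, row2 in zip(page1, page2):
        out_row = []
        for p1, p2 in zip(row1, row2):
            changed = max(abs(a - b) for a, b in zip(p1, p2)) >= cutoff
            # Changed pixels get the red tint; unchanged pixels keep PDF1's color.
            out_row.append(blend(p1, RED) if changed else p1)
        result.append(out_row)
    return result

white, black = (255, 255, 255), (0, 0, 0)
page1 = [[white, white], [white, white]]
page2 = [[white, black], [white, white]]
diff_image = overlay_differences(page1, page2)
```

In this toy input only the top-right pixel differs, so it is the only one tinted red; every other pixel keeps its original color from the first page.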
Diff images are only generated when thresholded_diff.getbbox() returns a bounding box. If differences exist but are too subtle (every pixel difference is below 20), no image is saved, even if the SSIM is below the threshold.
Extra page images
When the PDFs have different page counts, extra pages are rendered as standalone images.
File naming
Extra page images are named extra_page_N_only_in_pdfX.png, where:
- N = Page number (1-indexed)
- X = Which PDF contains the extra page (1 or 2)
Examples:
- extra_page_5_only_in_pdf1.png - Page 5 exists only in the first PDF
- extra_page_9_only_in_pdf2.png - Page 9 exists only in the second PDF
Generation logic
For extra pages in PDF1: (pdf_visual_diff.py:74-81)
For extra pages in PDF2: (pdf_visual_diff.py:82-89)
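As a rough illustration of the logic described here, the sketch below works out which pages are extra from the two page counts and builds the matching file names. The helper names are hypothetical and the real script may be organized differently; only the file name pattern and the "PDF1"/"PDF2" labels come from this page.

```python
def extra_pages(pages_pdf1, pages_pdf2):
    """Return (page_numbers, source) for pages that exist in only one PDF.

    source is "PDF1", "PDF2", or None when the page counts match.
    """
    if pages_pdf1 == pages_pdf2:
        return [], None
    shorter, longer = sorted((pages_pdf1, pages_pdf2))
    source = "PDF1" if pages_pdf1 > pages_pdf2 else "PDF2"
    # Pages are 1-indexed, so the extras start right after the shorter PDF ends.
    return list(range(shorter + 1, longer + 1)), source

def extra_page_filename(page, source):
    """Build the image name; source[-1] is the trailing "1" or "2"."""
    return f"extra_page_{page}_only_in_pdf{source[-1]}.png"

pages, source = extra_pages(10, 8)
names = [extra_page_filename(p, source) for p in pages]
```

With a 10-page first PDF and an 8-page second PDF, pages 9 and 10 are reported as extra pages belonging to the first PDF.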
Results JSON file
Every comparison generates a results.json file with detailed metadata.
Schema
Implementation: pdf_visual_diff.py:109-126
Field reference
Timestamp when the comparison was run.
Format: YYYYDDMM_HHMMSS (year-day-month_hour-minute-second)
Example: "20260304_143052"
Comparison result status.
- "success" - PDFs are identical
- "error" - Differences or extra pages found
Human-readable summary of the comparison result.
Examples:
- "All pages are visually identical."
- "Visual differences found on pages: 1, 3, 5"
- "Visual differences found on pages: 2 Extra pages only in PDF1: 9, 10"
Absolute path to the first PDF file.
Example: "/home/user/documents/baseline.pdf"
Absolute path to the second PDF file.
Example: "/home/user/documents/updated.pdf"
Total number of pages in the first PDF.
Example: 10
Total number of pages in the second PDF.
Example: 8
SSIM threshold value used for the comparison.
Example: 0.999
Whether the PDFs are visually identical.
- true - No differences found
- false - Differences or extra pages exist
Implementation: pdf_visual_diff.py:118
Array of page numbers (1-indexed) with visual differences.
Examples:
- [] - No differences
- [1, 3, 5] - Pages 1, 3, and 5 have differences
Array of page numbers (1-indexed) that exist in only one PDF.
Examples:
- [] - Same page count
- [9, 10] - Pages 9 and 10 exist in only one PDF
Which PDF contains the extra pages.
- "PDF1" - First PDF has extra pages
- "PDF2" - Second PDF has extra pages
- null - PDFs have the same page count
Example JSON outputs
Programmatic usage
Parsing results in shell scripts
Parsing results in Python
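A minimal sketch of reading a report with the standard library. The key names `status` and `timestamp` are hypothetical placeholders, since the field names are not shown on this page; substitute the keys from your own results.json. The timestamp format follows the year-day-month order stated in the field reference; if the tool actually writes year-month-day, swap `%d` and `%m`.

```python
import json
from datetime import datetime
from pathlib import Path

# Hypothetical key names: replace with the keys in your results.json.
STATUS_KEY = "status"
TIMESTAMP_KEY = "timestamp"

# Year, day, month order, as stated in the field reference.
TIMESTAMP_FORMAT = "%Y%d%m_%H%M%S"

def load_report(output_dir):
    """Read results.json from a comparison output directory."""
    report_path = Path(output_dir) / "results.json"
    return json.loads(report_path.read_text(encoding="utf-8"))

def pdfs_match(report, status_key=STATUS_KEY):
    """Return True for "success" and False for "error"; reject anything else."""
    status = report[status_key]
    if status not in ("success", "error"):
        raise ValueError(f"unexpected status: {status!r}")
    return status == "success"

def run_time(report, timestamp_key=TIMESTAMP_KEY):
    """Parse the run timestamp into a datetime object."""
    return datetime.strptime(report[timestamp_key], TIMESTAMP_FORMAT)
```

A caller would pass the output directory to `load_report` and then branch on `pdfs_match(report)`, for example to exit with a non-zero code when differences were found.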
CI/CD integration example
File size considerations
Typical sizes
- Diff images: 100 KB to 5 MB per page (depends on page complexity)
- Extra page images: 50 KB to 3 MB per page
- results.json: under 1 KB
Storage management
Compress old results
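One possible approach, sketched with the standard library: archive each run directory that has not been modified for a given number of days, then remove the original. The layout (one sub-directory per run under a common root), the 30-day default, and the function name are assumptions made for illustration.

```python
import shutil
import time
from pathlib import Path

def compress_old_results(root, max_age_days=30):
    """Archive run directories older than max_age_days as .tar.gz files.

    Returns the list of archives created. Directories are removed only
    after their archive has been written.
    """
    root = Path(root).resolve()
    cutoff = time.time() - max_age_days * 86400
    archived = []
    # sorted() takes a snapshot, so new archives do not disturb the loop.
    for run_dir in sorted(root.iterdir()):
        if not run_dir.is_dir() or run_dir.stat().st_mtime >= cutoff:
            continue
        # Creates <run_dir>.tar.gz next to the directory.
        archive = shutil.make_archive(
            str(run_dir), "gztar",
            root_dir=str(run_dir.parent), base_dir=run_dir.name,
        )
        shutil.rmtree(run_dir)
        archived.append(Path(archive))
    return archived
```

Age is judged by the directory's modification time rather than by its name, so the sketch does not depend on how the timestamped directories are named.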
Keep only JSON reports
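A small sketch of one way to do this, assuming each run lives in its own sub-directory of a common root (a hypothetical layout): delete the PNG images and leave results.json in place.

```python
from pathlib import Path

def prune_images(root):
    """Delete PNG files in every run directory under root.

    results.json and any other non-PNG files are left untouched.
    Returns the number of files removed.
    """
    removed = 0
    # list() takes a snapshot so files are not deleted mid-iteration.
    for png in list(Path(root).glob("*/*.png")):
        png.unlink()
        removed += 1
    return removed
```

Because the diff and extra page images account for almost all of the output size, this keeps the machine-readable history while freeing most of the disk space.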
Limit output size
See also
- Command reference - Complete CLI documentation
- Configuration options - Threshold and output settings
- Basic comparison - Getting started guide