
Quick start

The most basic comparison requires two PDF files:
python pdf_visual_diff.py document1.pdf document2.pdf
This compares the two PDFs page by page and writes any differences to the default diff_output directory.

Understanding the output

When differences are detected, the tool will:
  1. Print a summary to the console
  2. Create a timestamped output directory
  3. Generate diff images highlighting visual differences
  4. Save a JSON report with detailed results

Console output example

When no visual differences are found, the console prints:
All pages are visually identical.

Specifying an output directory

To save results to a custom location, use the --output flag:
python pdf_visual_diff.py old.pdf new.pdf --output ./reports
The tool will create a timestamped subdirectory within your specified path:
reports/
└── 20260304_143052_diff/
    ├── diff_page_1.png
    ├── diff_page_3.png
    └── results.json
The timestamp format is YYYYMMDD_HHMMSS (to the second), so each comparison run gets its own output directory unless two runs start within the same second.
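If you script around these runs, you can locate the run directories programmatically. The sketch below assumes directory names follow the pattern in the example tree (`20260304_143052_diff`, which is `%Y%m%d_%H%M%S_diff` in `strftime` terms) and that diff images are named `diff_page_<N>.png`. `run_dir_name`, `latest_run`, and `diff_pages` are hypothetical helpers, not part of pdf_visual_diff.py. Parsing the timestamp instead of sorting raw names keeps unrelated folders out of the result, and sorting page numbers as integers puts `diff_page_10.png` after `diff_page_3.png`.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import List, Optional

# Assumed naming pattern, inferred from the example tree: YYYYMMDD_HHMMSS_diff
RUN_DIR_FORMAT = "%Y%m%d_%H%M%S_diff"
DIFF_PAGE_RE = re.compile(r"^diff_page_(\d+)\.png$")


def run_dir_name(when: datetime) -> str:
    """Build the run directory name for a given timestamp."""
    return when.strftime(RUN_DIR_FORMAT)


def latest_run(output_root: Path) -> Optional[Path]:
    """Return the newest run directory under output_root, or None if none exist."""
    if not output_root.is_dir():
        return None
    runs = []
    for child in output_root.iterdir():
        if not child.is_dir():
            continue
        try:
            runs.append((datetime.strptime(child.name, RUN_DIR_FORMAT), child))
        except ValueError:
            continue  # ignore folders that don't match the run-name pattern
    if not runs:
        return None
    return max(runs, key=lambda run: run[0])[1]


def diff_pages(run_dir: Path) -> List[int]:
    """Return the page numbers that have a diff image, in numeric order."""
    pages = []
    for image in run_dir.glob("diff_page_*.png"):
        match = DIFF_PAGE_RE.match(image.name)
        if match:
            pages.append(int(match.group(1)))
    return sorted(pages)
```

If the tree above were the only run in reports/, `latest_run(Path("reports"))` would return the `20260304_143052_diff` directory, and `diff_pages()` on it would return `[1, 3]`.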

Adjusting sensitivity

Control how strict the comparison is using the --threshold parameter:
# Strictest: require a pixel-perfect match (any difference is flagged)
python pdf_visual_diff.py doc1.pdf doc2.pdf --threshold 1

# More tolerant: ignore subtle differences
python pdf_visual_diff.py doc1.pdf doc2.pdf --threshold 0.95

# Most tolerant: flag only substantial differences
python pdf_visual_diff.py doc1.pdf doc2.pdf --threshold 0.85
The threshold value ranges from 0.0 to 1.0, where:
  • 1.0 = Pixel-perfect match required
  • 0.999 = Default value
  • Lower values = More tolerant of differences (fewer pages flagged)
See Configuration options for detailed threshold guidance.
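A minimal sketch of how a threshold decides which pages are flagged. It assumes each page has an SSIM score and that a page is flagged only when its score is strictly below the threshold; the tool's exact comparison operator is an assumption. `flag_pages` is a hypothetical helper.

```python
from typing import Dict, List

DEFAULT_THRESHOLD = 0.999  # default value from the threshold list above


def flag_pages(scores: Dict[int, float], threshold: float = DEFAULT_THRESHOLD) -> List[int]:
    """Return the page numbers whose similarity score falls below the threshold.

    Assumes scores are SSIM values keyed by 1-based page number and that a
    page is flagged only when its score is strictly less than the threshold.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return sorted(page for page, score in scores.items() if score < threshold)
```

With made-up scores `{1: 1.0, 2: 0.97, 3: 0.90}`, the default 0.999 flags pages `[2, 3]`, a threshold of 0.95 flags only `[3]`, and 0.85 flags nothing. Lowering the threshold therefore tolerates more variation.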

Common usage patterns

Compare two monthly reports with a slightly more tolerant threshold than the default:
python pdf_visual_diff.py report_2024-01.pdf report_2024-02.pdf \
  --output monthly_comparisons \
  --threshold 0.99

Check a PDF generator for unintended visual changes after modifying its code:
# Before code changes
python generate_pdf.py --output baseline.pdf

# After code changes
python generate_pdf.py --output updated.pdf

# Compare
python pdf_visual_diff.py baseline.pdf updated.pdf \
  --output test_results
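
Compare every PDF in a baseline/ directory against the same-named file in updated/:
#!/bin/bash
shopt -s nullglob  # skip the loop entirely if baseline/ has no PDFs
for file in baseline/*.pdf; do
  filename=$(basename "$file")
  if [[ ! -f "updated/$filename" ]]; then
    echo "Skipping $filename: no matching file in updated/" >&2
    continue
  fi
  python pdf_visual_diff.py \
    "baseline/$filename" \
    "updated/$filename" \
    --output "batch_results/${filename%.pdf}"
done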
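The same batch comparison can be driven from Python, for example inside a test suite. This sketch uses only the documented `--output` and `--threshold` flags. `build_diff_command` and `run_batch` are hypothetical helpers. Because the tool's exit-code convention is not documented here, the completed processes are returned for inspection rather than treated as pass/fail.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_diff_command(old_pdf: str, new_pdf: str,
                       output: Optional[str] = None,
                       threshold: Optional[float] = None,
                       script: str = "pdf_visual_diff.py") -> List[str]:
    """Build the argument list for one pdf_visual_diff.py run."""
    cmd = [sys.executable, script, old_pdf, new_pdf]
    if output is not None:
        cmd += ["--output", output]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    return cmd


def run_batch(baseline_dir: Path, updated_dir: Path,
              results_dir: Path) -> List[subprocess.CompletedProcess]:
    """Compare each baseline PDF with the same-named PDF in updated_dir."""
    results = []
    for old in sorted(baseline_dir.glob("*.pdf")):
        new = updated_dir / old.name
        if not new.is_file():
            print(f"Skipping {old.name}: no matching file in {updated_dir}", file=sys.stderr)
            continue
        # One output directory per PDF, named after the file without ".pdf"
        cmd = build_diff_command(str(old), str(new), output=str(results_dir / old.stem))
        # Collect the return code instead of interpreting it (convention unknown)
        results.append(subprocess.run(cmd, capture_output=True, text=True))
    return results
```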

What gets compared

The tool performs visual comparison by:
  1. Rendering each PDF page to an image at 144 DPI (2x zoom)
  2. Converting images to RGB arrays
  3. Computing the Structural Similarity Index Measure (SSIM) for each pair of corresponding pages
  4. Flagging pages whose SSIM score falls below the threshold
  5. Generating highlighted diff images for flagged pages
The comparison is purely visual. Changes to PDF metadata, embedded fonts, or internal structure are ignored unless they affect the rendered appearance.
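To illustrate step 3, here is the standard SSIM formula applied to a whole image as a single window, in plain Python. This is a simplification: common implementations such as scikit-image's `structural_similarity` compute SSIM over small sliding windows and average the results, and the tool compares RGB images, whereas this sketch first converts pixels to grayscale using the ITU-R BT.601 luma weights. Its scores will not match the tool's exactly. `to_gray` and `global_ssim` are illustrative names.

```python
from typing import List, Sequence, Tuple

# Standard SSIM stabilizing constants for 8-bit images (dynamic range L = 255)
K1, K2, L = 0.01, 0.03, 255
C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2


def to_gray(rgb_pixels: Sequence[Tuple[int, int, int]]) -> List[float]:
    """Convert (R, G, B) pixels to luma using ITU-R BT.601 weights."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_pixels]


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def _covariance(x: Sequence[float], y: Sequence[float], mean_x: float, mean_y: float) -> float:
    # Population covariance; when x is y this is the variance
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)


def global_ssim(x: Sequence[float], y: Sequence[float]) -> float:
    """SSIM over the whole image as one window (no sliding window, no averaging)."""
    if not x or len(x) != len(y):
        raise ValueError("images must be non-empty and have the same number of pixels")
    mx, my = _mean(x), _mean(y)
    var_x = _covariance(x, x, mx, mx)
    var_y = _covariance(y, y, my, my)
    cov_xy = _covariance(x, y, mx, my)
    luminance = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)
    contrast_structure = (2 * cov_xy + C2) / (var_x + var_y + C2)
    return luminance * contrast_structure
```

Identical inputs score 1.0 (up to floating-point rounding) and any change lowers the score, which is why a threshold of 1.0 demands a pixel-perfect match.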

Handling different page sizes

If PDFs have different dimensions, the tool automatically:
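  • Resizes the page images to matching dimensions for comparison (using LANCZOS resampling)
  • Continues the comparison without error
This lets you compare PDFs with different page sizes or orientations. A page whose aspect ratio differs from its counterpart's is stretched by the resize, however, so such pages will usually fall below the threshold.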
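Before comparing pages of different sizes, you can check whether a resize will change their shape. The sketch below uses hypothetical helpers `rendered_size` and `aspect_ratio_changes`: the first computes the pixel size of a page rendered at 144 DPI (2x the 72 points per inch of PDF page units), and the second reports whether resizing one page to the other's dimensions would distort it. The 1% tolerance is an arbitrary assumption.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF page dimensions are measured in points


def rendered_size(page_size_pt: Tuple[float, float], dpi: int = 144) -> Tuple[int, int]:
    """Pixel size of a page rendered at the given DPI (144 DPI = 2x zoom)."""
    zoom = dpi / POINTS_PER_INCH
    width, height = page_size_pt
    return round(width * zoom), round(height * zoom)


def aspect_ratio_changes(size_a: Tuple[int, int], size_b: Tuple[int, int],
                         tolerance: float = 0.01) -> bool:
    """True if the width/height ratios differ by more than tolerance (relative to size_a)."""
    ratio_a = size_a[0] / size_a[1]
    ratio_b = size_b[0] / size_b[1]
    return abs(ratio_a - ratio_b) / ratio_a > tolerance
```

An A4 page (595 x 842 pt) renders at 1190 x 1684 px and US Letter (612 x 792 pt) at 1224 x 1584 px, so `aspect_ratio_changes((1190, 1684), (1224, 1584))` is True, while the same page rendered at 72 and 144 DPI keeps its shape.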

Next steps

  • Command reference: complete documentation of all CLI options
  • Output formats: understanding the generated files and JSON reports