Quickstart - PDF Visual Regression

Basic usage

The PDF visual regression tester is invoked from the command line using the pdf_visual_diff.py script. At minimum, you need to provide two PDF files to compare.

Command syntax

python pdf_visual_diff.py <path/to/pdf1.pdf> <path/to/pdf2.pdf> [options]

Arguments

pdf1: Path to the first PDF file (typically the reference or expected version)
pdf2: Path to the second PDF file (typically the new or actual version)
--output: (Optional) Directory where diff images will be saved (default: diff_output)
--threshold: (Optional) Similarity threshold for SSIM comparison, from 0.0 to 1.0 (default: 1.0)

A threshold of 1.0 means pages must be identical. Lower values like 0.999 allow for minor rendering variations.
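
If you drive the tool from Python rather than a shell, the arguments above map onto a plain argument list. The sketch below is only an illustration: the helper name build_command is hypothetical, and it assumes pdf_visual_diff.py sits in the current working directory. It uses only the two options documented above.

```python
import sys


def build_command(pdf1, pdf2, output=None, threshold=None, script="pdf_visual_diff.py"):
    """Return the argument list for one comparison run (hypothetical helper)."""
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    # Positional arguments first: reference PDF, then the PDF under test.
    cmd = [sys.executable, script, str(pdf1), str(pdf2)]
    # Optional flags are only added when set, so the tool's defaults apply otherwise.
    if output is not None:
        cmd += ["--output", str(output)]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd) from the standard library.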

Your first comparison

Let’s run a simple comparison using the example PDFs included in the repository.

Navigate to the project directory

Make sure you’re in the project root and your virtual environment is activated:

cd pdf-visual-regression
source venv/bin/activate  # On Windows: venv\Scripts\activate

Run the comparison

Compare two example PDF files:

python pdf_visual_diff.py example-pdfs/example-working-gov-letter.pdf example-pdfs/example-broken-gov-letter.pdf

This command compares a working government letter template against a broken version.

Review the output

If differences are found, you’ll see console output like:

Visual differences found on pages: 1
Diff images saved to: /path/to/pdf-visual-regression/diff_output/20261202_171728_diff/

The output directory contains:

diff_page_1.png: Highlighted image showing differences
results.json: Detailed comparison metadata

Understanding the results

When differences are found

The tool generates annotated images with red highlights marking where the pages differ, and reports the results in three places:

Diff images: Named diff_page_N.png where N is the page number
Console output: Lists all pages with differences
JSON report: Contains structured data about the comparison
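
The naming scheme above makes it straightforward to check that every reported page has a matching image. This is a minimal sketch; the function names are hypothetical, and it assumes the diff images sit directly inside the timestamped run directory, as in the example output.

```python
from pathlib import Path


def diff_image_paths(run_dir, diff_pages):
    """Map page numbers to the diff_page_N.png names described above."""
    return [Path(run_dir) / f"diff_page_{page}.png" for page in diff_pages]


def missing_diff_images(run_dir, diff_pages):
    """Return the expected diff images that are not present on disk."""
    return [p for p in diff_image_paths(run_dir, diff_pages) if not p.is_file()]
```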

When PDFs are identical

If no differences are detected, you’ll see:

All pages are visually identical.

No diff images are generated, and the output directory will only contain results.json with "status": "success".

The output directory uses timestamps (e.g., 20261202_171728_diff) to preserve comparison history and prevent overwriting previous results.

Custom output directory

Specify a custom directory for saving diff images:

python pdf_visual_diff.py document_v1.pdf document_v2.pdf --output my_results

This creates timestamped subdirectories inside my_results/ (e.g., my_results/20261202_171728_diff/).
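
Because each run gets its own subdirectory, scripts usually need to find the most recent one. The sketch below assumes the directory names follow the YYYYMMDD_HHMMSS_diff pattern shown in the examples, so sorting by name also sorts by time. The helper name latest_run_dir is hypothetical.

```python
from pathlib import Path


def latest_run_dir(output_root):
    """Return the newest <timestamp>_diff directory, or None if there is none."""
    root = Path(output_root)
    if not root.is_dir():
        return None
    # Timestamped names sort lexicographically in chronological order.
    runs = sorted(p for p in root.glob("*_diff") if p.is_dir())
    return runs[-1] if runs else None
```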

Adjusting sensitivity

The --threshold parameter controls comparison sensitivity. The default value of 1.0 requires perfect matches.

Example: Allow minor rendering differences

python pdf_visual_diff.py reference.pdf generated.pdf --threshold 0.999

Lower threshold values are useful when:

Comparing PDFs generated on different systems
Minor font rendering variations are acceptable
Anti-aliasing differences should be ignored

Setting the threshold too low (e.g., below 0.95) may cause the tool to miss significant visual differences. Start with 0.999 and adjust as needed.
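
To build intuition for how a threshold close to 1.0 behaves, here is a simplified, single-window SSIM in pure Python. It is not the tool's implementation: production SSIM is normally computed over sliding windows by an imaging library, and the rule that a page passes when its score is at or above the threshold is an assumption.

```python
def global_ssim(a, b, dynamic_range=255.0):
    """Single-window SSIM over two equal-length sequences of grayscale values."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((x - mean_a) * (x - mean_a) for x in a) / n
    var_b = sum((y - mean_b) * (y - mean_b) for y in b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    # Standard stabilising constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mean_a * mean_b + c1) * (2 * cov + c2)
    denominator = (mean_a * mean_a + mean_b * mean_b + c1) * (var_a + var_b + c2)
    return numerator / denominator


def page_matches(score, threshold=1.0):
    # Assumption: a page passes when its score is at or above the threshold.
    return score >= threshold
```

In this sketch, changing one pixel of a four-pixel image from 255 to 250 yields a score just below 1.0, so the page fails at the default threshold but passes at 0.999.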

Working with the JSON output

Every comparison generates a results.json file with detailed metadata:

{
  "timestamp": "20261202_171728",
  "status": "error",
  "description": "Visual differences found on pages: 1",
  "pdf1": "/absolute/path/to/pdf1.pdf",
  "pdf2": "/absolute/path/to/pdf2.pdf",
  "pdf1_pages": 1,
  "pdf2_pages": 1,
  "threshold": 1.0,
  "identical": false,
  "diff_pages": [1],
  "extra_pages": [],
  "extra_pages_in": null
}

Key fields:

status: "success" if identical, "error" if differences found
identical: Boolean indicating if PDFs are visually the same
diff_pages: Array of page numbers with differences
extra_pages: Pages that exist in only one PDF
extra_pages_in: Which PDF has extra pages ("PDF1" or "PDF2"), or null when there are none

Parse this JSON file in your CI/CD pipeline to automatically fail builds when visual regressions are detected.
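
One way to do that parsing in Python is sketched below. It relies only on the fields shown above. The function name exit_code_for is hypothetical, and treating a missing field as a failure is a design choice of this sketch, not documented behavior of the tool.

```python
import json
from pathlib import Path


def exit_code_for(results_path):
    """Return 0 when the report says the PDFs match, 1 otherwise."""
    report = json.loads(Path(results_path).read_text(encoding="utf-8"))
    # Fail closed: anything other than an explicit success counts as a regression.
    if report.get("identical") is True and report.get("status") == "success":
        return 0
    pages = report.get("diff_pages") or []
    extra = report.get("extra_pages") or []
    print(f"Visual regression: diff pages {pages}, extra pages {extra}")
    return 1
```

A CI step can then end with sys.exit(exit_code_for(path_to_results_json)).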

Handling page count mismatches

When PDFs have different page counts, the tool compares up to the shorter document’s length:

python pdf_visual_diff.py short_doc.pdf long_doc.pdf

Output:

Warning: PDFs have different page counts. PDF1: 3 pages, PDF2: 5 pages.
Comparing up to the lower page count.
Extra pages only in PDF2: 4, 5
Diff images saved to: diff_output/20261202_171728_diff/

Extra pages are saved as separate images:

extra_page_4_only_in_pdf2.png
extra_page_5_only_in_pdf2.png
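
The page-count handling described above can be sketched as follows. The function names are hypothetical, and the file name for extra pages in PDF1 is an assumption extrapolated from the PDF2 examples.

```python
def extra_pages(pdf1_pages, pdf2_pages):
    """Return (compared_count, extra_page_numbers, which_pdf), with 1-based pages."""
    compared = min(pdf1_pages, pdf2_pages)
    if pdf1_pages == pdf2_pages:
        return compared, [], None
    which = "PDF1" if pdf1_pages > pdf2_pages else "PDF2"
    # Pages beyond the shorter document exist only in the longer one.
    extra = list(range(compared + 1, max(pdf1_pages, pdf2_pages) + 1))
    return compared, extra, which


def extra_page_filename(page, which):
    # Assumption: the PDF1 case mirrors the documented "_only_in_pdf2" pattern.
    return f"extra_page_{page}_only_in_{which.lower()}.png"
```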

Running tests

Verify your installation by running the included test suite:

make test

This command:

Sets up test PDFs using the create_test_pdfs.py script
Runs unit tests from tests/test_diff_script.py
Validates that the comparison logic works correctly

Clean up test files

Remove all generated test files and outputs:

make clean

This removes:

tests/test_output/ - Test comparison results
tests/test_pdfs/ - Generated test PDFs
diff_output/ - Any diff output from manual tests
__pycache__/ - Python cache files

Integration with CI/CD

Integrate the tool into your continuous integration pipeline:

#!/bin/bash
# Example CI script

python pdf_visual_diff.py expected_output.pdf generated_output.pdf --output test_results

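# Parse the JSON report of the most recent run only; older timestamped runs stay in test_results/
latest=$(ls -d test_results/*_diff | tail -n 1)
if grep -q '"status": "error"' "$latest/results.json"; then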
  echo "Visual regression detected!"
  exit 1
fi

echo "PDFs are visually identical"
exit 0

The tool prints results to stdout and generates JSON for programmatic access, making it easy to integrate with any CI/CD system.

Next steps

Now that you’ve run your first comparison, you can:

Integrate the tool into your testing workflow
Set up automated comparisons in your CI/CD pipeline
Adjust threshold values for your specific use case
Parse JSON output for custom reporting

For questions or issues, refer to the project repository or file an issue on GitHub.
