Basic usage
The PDF visual regression tester is invoked from the command line using the pdf_visual_diff.py script. At minimum, you need to provide two PDF files to compare.
Command syntax
Arguments
- pdf1: Path to the first PDF file (typically the reference or expected version)
- pdf2: Path to the second PDF file (typically the new or actual version)
- --output: (Optional) Directory where diff images will be saved (default: diff_output)
- --threshold: (Optional) Similarity threshold for SSIM comparison, from 0.0 to 1.0 (default: 1.0)
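To make the argument list concrete, here is a minimal sketch of how an interface with these arguments and defaults could be declared with Python's argparse. It is an illustration only and not the script's actual source; the function name build_parser and the sample file names are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented interface,
    # not taken from pdf_visual_diff.py itself.
    parser = argparse.ArgumentParser(prog="pdf_visual_diff.py")
    parser.add_argument("pdf1", help="reference (expected) PDF")
    parser.add_argument("pdf2", help="new (actual) PDF")
    parser.add_argument("--output", default="diff_output",
                        help="directory where diff images are saved")
    parser.add_argument("--threshold", type=float, default=1.0,
                        help="SSIM similarity threshold, from 0.0 to 1.0")
    return parser


# Parse an explicit argument list (placeholder file names) instead of sys.argv.
args = build_parser().parse_args(
    ["expected.pdf", "actual.pdf", "--threshold", "0.999"]
)
print(args.pdf1, args.pdf2, args.output, args.threshold)
```

Omitting both optional flags leaves --output at diff_output and --threshold at 1.0, matching the defaults listed above.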
A threshold of 1.0 means pages must be identical. Lower values like 0.999 allow for minor rendering variations.
Your first comparison
Let’s run a simple comparison using the example PDFs included in the repository.
Navigate to the project directory
Make sure you’re in the project root and your virtual environment is activated:
Run the comparison
Compare two example PDF files: a working government letter template and a broken version of it.
Understanding the results
When differences are found
The tool generates annotated images with red highlights marking the exact locations of differences:
- Diff images: Named diff_page_N.png, where N is the page number
- Console output: Lists all pages with differences
- JSON report: Contains structured data about the comparison
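As an illustration of working with these outputs, the sketch below collects the page numbers from the diff_page_N.png file names in an output directory. The helper name pages_with_diffs is hypothetical, and the sketch assumes the diff images sit directly inside the directory you pass in.

```python
import re
from pathlib import Path

# Matches the documented naming scheme diff_page_N.png.
DIFF_NAME = re.compile(r"^diff_page_(\d+)\.png$")


def pages_with_diffs(output_dir):
    """Return the sorted page numbers parsed from diff image file names."""
    pages = []
    for path in Path(output_dir).iterdir():
        match = DIFF_NAME.match(path.name)
        if match:
            pages.append(int(match.group(1)))
    return sorted(pages)
```

Sorting the parsed integers rather than the file names keeps page 10 after page 2.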
When PDFs are identical
If no differences are detected, the tool reports that the PDFs match and writes a results.json with "status": "success".
The output directory uses timestamps (e.g., 20261202_171728_diff) to preserve comparison history and prevent overwriting previous results.
Custom output directory
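Specify a custom directory for saving diff images with the --output option. Results are saved in a timestamped subdirectory of the directory you provide, such as my_results/ (e.g., my_results/20261202_171728_diff/).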
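The timestamped directory name in the example is consistent with a YYYYMMDD_HHMMSS pattern followed by _diff. The sketch below shows one way such a path could be built; it is an assumption about the naming scheme inferred from the example, not the tool's actual code, and timestamped_dir is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def timestamped_dir(base, now=None):
    """Build <base>/<YYYYMMDD_HHMMSS>_diff (naming inferred from the example)."""
    now = now or datetime.now()
    return Path(base) / f"{now:%Y%m%d_%H%M%S}_diff"


# A fixed timestamp reproduces the directory name used in the example above.
example = timestamped_dir("my_results", datetime(2026, 12, 2, 17, 17, 28))
print(example.as_posix())  # my_results/20261202_171728_diff
```

Because the name sorts lexicographically in chronological order, the most recent run is always the last entry in a sorted directory listing.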
Adjusting sensitivity
The --threshold parameter controls comparison sensitivity. The default value of 1.0 requires perfect matches.
Example: Allow minor rendering differences
Use a lower threshold when:
- Comparing PDFs generated on different systems
- Minor font rendering variations are acceptable
- Anti-aliasing differences should be ignored
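The effect of the threshold can be sketched as a simple comparison between a page's SSIM score and the configured value. This assumes a page passes when its score is greater than or equal to the threshold, which is consistent with 1.0 requiring identical pages but is not confirmed by the tool's source. The function name page_matches and the sample score are hypothetical.

```python
def page_matches(ssim_score: float, threshold: float = 1.0) -> bool:
    """Assumed rule: a page passes when its SSIM score reaches the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return ssim_score >= threshold


# An illustrative score for a page with slight anti-aliasing differences.
score = 0.9995
strict = page_matches(score)            # default threshold of 1.0
relaxed = page_matches(score, 0.999)    # relaxed threshold
print(strict, relaxed)
```

Under this assumption the same page fails at the default threshold and passes at 0.999, which is the behavior the scenarios above call for.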
Working with the JSON output
Every comparison generates a results.json file with detailed metadata. Key fields:
- status: "success" if identical, "error" if differences found
- identical: Boolean indicating if PDFs are visually the same
- diff_pages: Array of page numbers with differences
- extra_pages: Pages that exist in only one PDF
- extra_pages_in: Which PDF has extra pages ("PDF1" or "PDF2")
Parse this JSON file in your CI/CD pipeline to automatically fail builds when visual regressions are detected.
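A minimal sketch of such a check is shown below. It uses only the field names documented above; the sample JSON string is a made-up illustration of the shape, and the helper names exit_code_for and describe are hypothetical.

```python
import json


def exit_code_for(results: dict) -> int:
    """Return 0 when the report says the PDFs are identical, 1 otherwise."""
    if results.get("status") == "success" and results.get("identical") is True:
        return 0
    return 1


def describe(results: dict) -> str:
    """Summarize a report in one line for a build log."""
    if exit_code_for(results) == 0:
        return "PDFs are visually identical"
    parts = []
    if results.get("diff_pages"):
        parts.append(f"differences on pages {results['diff_pages']}")
    if results.get("extra_pages"):
        parts.append(
            f"extra pages {results['extra_pages']} in {results.get('extra_pages_in')}"
        )
    return "; ".join(parts) or "comparison reported an error"


# Illustrative report; real files may contain additional fields.
sample = json.loads(
    '{"status": "error", "identical": false, "diff_pages": [2],'
    ' "extra_pages": [], "extra_pages_in": null}'
)
print(exit_code_for(sample), describe(sample))
```

In a pipeline step, you would load the real results.json with json.load and pass the value of exit_code_for to sys.exit so that a visual regression fails the build.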
Handling page count mismatches
When PDFs have different page counts, the tool compares pages up to the shorter document’s length. Pages that exist in only one PDF are saved as separate images, for example:
- extra_page_4_only_in_pdf2.png
- extra_page_5_only_in_pdf2.png
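The logic described here can be sketched as follows: compare pages 1 through the shorter length, then name the remaining pages after the longer document. The pdf2 file name pattern comes from the example above; the matching pdf1 pattern and the helper name plan_comparison are assumptions.

```python
def plan_comparison(pages_pdf1: int, pages_pdf2: int):
    """Return (page numbers compared, expected extra-page image names)."""
    shared = min(pages_pdf1, pages_pdf2)
    compared = list(range(1, shared + 1))
    # The "only_in_pdf1" variant is assumed by symmetry with the pdf2 example.
    longer = "pdf1" if pages_pdf1 > pages_pdf2 else "pdf2"
    extra = [
        f"extra_page_{n}_only_in_{longer}.png"
        for n in range(shared + 1, max(pages_pdf1, pages_pdf2) + 1)
    ]
    return compared, extra


# A 3-page reference against a 5-page candidate.
compared, extra = plan_comparison(3, 5)
print(compared)  # [1, 2, 3]
print(extra)     # ['extra_page_4_only_in_pdf2.png', 'extra_page_5_only_in_pdf2.png']
```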
Running tests
Verify your installation by running the included test suite. The test run:
- Sets up test PDFs using the create_test_pdfs.py script
- Runs unit tests from tests/test_diff_script.py
- Validates that the comparison logic works correctly
Clean up test files
Remove all generated test files and outputs. This removes:
- tests/test_output/ - Test comparison results
- tests/test_pdfs/ - Generated test PDFs
- diff_output/ - Any diff output from manual tests
- __pycache__/ - Python cache files
Integration with CI/CD
Integrate the tool into your continuous integration pipeline.
The tool prints results to stdout and generates JSON for programmatic access, making it easy to integrate with any CI/CD system.
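One way to wire this into a pipeline step is sketched below. It builds the command from the documented arguments, runs it with subprocess, and then looks for the newest timestamped run directory. The helper names are hypothetical, and the sketch assumes that the script is run from the project root and that results.json is written inside the timestamped *_diff directory.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf1, pdf2, output="diff_output", threshold=1.0):
    """Assemble the CLI call from the documented arguments."""
    return [sys.executable, "pdf_visual_diff.py", str(pdf1), str(pdf2),
            "--output", str(output), "--threshold", str(threshold)]


def latest_results(output):
    """Return results.json in the newest *_diff run directory, or None."""
    # Timestamped names sort chronologically, so the last one is the newest.
    runs = sorted(p for p in Path(output).glob("*_diff") if p.is_dir())
    return runs[-1] / "results.json" if runs else None


def run_comparison(pdf1, pdf2, output="diff_output", threshold=1.0):
    """Run the tool and return (captured stdout, path to the latest report)."""
    completed = subprocess.run(
        build_command(pdf1, pdf2, output, threshold),
        capture_output=True, text=True,
    )
    return completed.stdout, latest_results(output)
```

The sketch deliberately does not rely on the script's exit code, since the documentation does not state what it returns when differences are found; the JSON report is the safer signal.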
Next steps
Now that you’ve run your first comparison, explore these next steps:
- Integrate the tool into your testing workflow
- Set up automated comparisons in your CI/CD pipeline
- Adjust threshold values for your specific use case
- Parse JSON output for custom reporting
For questions or issues, refer to the project repository or file an issue on GitHub.