Getting started
We welcome contributions to improve the PDF visual diff tool! Whether you’re fixing bugs, adding features, or improving documentation, your help is appreciated.
Prerequisites
Before contributing, ensure you have:
- Python 3.7 or higher
- Git for version control
- A text editor or IDE
- Basic understanding of Python and PDF processing
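A small script can confirm the first two prerequisites quickly. This is a sketch and is not part of the project. The check_prerequisites name is hypothetical, and the script only checks that the interpreter is 3.7 or newer and that a git executable is on your PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)  # minimum version listed under Prerequisites


def check_prerequisites():
    """Return whether the interpreter and Git meet the listed prerequisites.

    Hypothetical helper for local use; not part of the repository.
    """
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "git_found": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```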
Initial setup
Install dependencies
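Install the required Python packages. The make install target, listed under Makefile below, installs the project dependencies.
Dependencies from requirements.txt:
- PyMuPDF - PDF rendering
- scikit-image - SSIM algorithm
- Pillow - Image processing
- numpy - Numerical operations
- reportlab - Test PDF generation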
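Several of these packages are installed under one name and imported under another, which can confuse newcomers. The minimal sketch below reports which dependencies cannot be imported in the current environment. It assumes the usual import names: fitz for PyMuPDF, skimage for scikit-image, and PIL for Pillow. Newer PyMuPDF releases can also be imported as pymupdf. The missing_dependencies helper is hypothetical and is not part of the repository.

```python
import importlib.util

# Assumed mapping: distribution name in requirements.txt -> module name used in `import`.
REQUIRED_MODULES = {
    "PyMuPDF": "fitz",          # PDF rendering
    "scikit-image": "skimage",  # SSIM algorithm
    "Pillow": "PIL",            # Image processing
    "numpy": "numpy",           # Numerical operations
    "reportlab": "reportlab",   # Test PDF generation
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the distribution names whose top-level module cannot be found."""
    return [
        dist for dist, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```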
Codebase structure
The project follows a simple, focused structure:
Core modules
pdf_visual_diff.py
The main entry point containing all comparison logic.
Key functions:
- compare_pdfs(pdf1_path, pdf2_path, output_dir, threshold) - Core comparison function (lines 10-136), covering:
  - PDF loading and validation (lines 19-30)
  - Page rendering loop (lines 34-69)
  - Extra page handling (lines 72-89)
  - Results generation (lines 97-126)
- main() - CLI argument parsing and script entry (lines 137-148)
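The sketch below illustrates the stages of compare_pdfs. It is not the project's code. It assumes that loading and rendering have already happened, and it uses a similarity callable as a stand-in for SSIM. Two further details are assumptions: the rule that a page differs when its score falls below threshold, and the layout of the result dictionary. compare_page_sequences is a hypothetical name.

```python
def compare_page_sequences(pages1, pages2, similarity, threshold):
    """Sketch of the compare_pdfs control flow on already-rendered pages.

    pages1, pages2: sequences of rendered pages (any type `similarity` accepts).
    similarity: callable returning a score in [0, 1]; SSIM plays this role in the tool.
    threshold: assumed cut-off; pages scoring below it are reported as different.
    """
    results = []
    shared = min(len(pages1), len(pages2))

    # Rendering loop: score every page that exists in both documents.
    for index in range(shared):
        score = similarity(pages1[index], pages2[index])
        # Assumption: a score below the threshold means the page differs.
        results.append({"page": index + 1, "score": score, "differs": score < threshold})

    # Extra page handling: a page present in only one document always differs.
    for index in range(shared, max(len(pages1), len(pages2))):
        results.append({"page": index + 1, "score": None, "differs": True})

    # Results generation: overall verdict plus the per-page details.
    return {"identical": not any(page["differs"] for page in results), "pages": results}
```

You can pass a trivial similarity function, such as one that returns 1.0 for equal inputs and 0.0 otherwise. This lets you exercise the control flow without rendering any PDFs.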
tests/test_diff_script.py
Integration tests using subprocess to test the CLI.
Test class:
- TestPdfVisualDiff - Main test suite with three test methods:
  - test_identical_pdfs() - Verifies identical PDFs pass
  - test_different_text_pdfs() - Checks diff detection
  - test_different_page_count_pdfs() - Tests page count handling
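The integration tests drive the CLI through subprocess. The minimal sketch below shows that pattern. It launches scripts with the same interpreter that runs the tests (sys.executable), so the test and the tool see the same installed packages. run_cli is a hypothetical helper. The tool's real arguments and exit codes are not reproduced here, so the example runs an inline -c command instead.

```python
import subprocess
import sys


def run_cli(args, timeout=60):
    """Run `python <args...>` with the current interpreter and capture its output.

    Hypothetical helper: pass the script path and its arguments in `args`.
    """
    return subprocess.run(
        [sys.executable, *args],
        capture_output=True,  # available since Python 3.7
        text=True,            # decode stdout/stderr as str
        timeout=timeout,
    )


if __name__ == "__main__":
    result = run_cli(["-c", "print('ok')"])
    print(result.returncode, result.stdout.strip())
```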
tests/create_test_pdfs.py
Test fixture generator using ReportLab.
Functions:
- create_test_pdf(filename, text_content) - Creates a simple one-page PDF
- setup_test_files() - Generates all test fixtures
Makefile
Build automation with common development tasks.
Targets:
- make install - Install dependencies
- make test - Run test suite
- make setup - Generate test PDFs
- make clean - Remove generated files
Development workflow
Making changes
Create a feature branch
Always work on a separate branch, and give it a descriptive name, for example:
- feature/add-threshold-auto-detect
- fix/memory-leak-large-pdfs
- docs/improve-readme
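The example branch names above follow a type/kebab-case-summary pattern. The regular expression below encodes that reading of the examples, which you can use to check a name before pushing. The allowed prefixes and the lowercase rule are assumptions, not an enforced project policy. is_descriptive_branch_name is a hypothetical helper.

```python
import re

# Assumed convention inferred from the examples: <type>/<lowercase-kebab-case-summary>.
BRANCH_PATTERN = re.compile(r"(feature|fix|docs)/[a-z0-9]+(-[a-z0-9]+)*")


def is_descriptive_branch_name(name):
    """Return True if the whole of `name` matches the assumed prefix/kebab-case pattern."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```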
Make your changes
Edit the relevant files. Common areas:
- Core logic: Modify pdf_visual_diff.py
- Tests: Add or update tests/test_diff_script.py
- Test fixtures: Update tests/create_test_pdfs.py
- Dependencies: Update requirements.txt if needed
Commit your changes
Write clear, descriptive commit messages. Good commit messages explain why a change was made, not just what changed.
Code style guidelines
Follow these conventions to maintain consistency:
Python style
- Follow PEP 8 style guide
- Use 4 spaces for indentation (no tabs)
- Maximum line length: 100 characters
- Use descriptive variable names
- Add docstrings to all functions
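Two of these rules are mechanical enough to check with a few lines of standard-library Python. The sketch below flags tab indentation and lines longer than 100 characters. find_style_issues is a hypothetical helper. For full PEP 8 coverage, a linter such as flake8 run with --max-line-length=100 is the more practical choice.

```python
MAX_LINE_LENGTH = 100  # project limit from the style guide


def find_style_issues(source_text, max_length=MAX_LINE_LENGTH):
    """Return (line_number, message) pairs for tab indentation and over-long lines.

    Covers only two mechanical rules; it is not a PEP 8 checker.
    """
    issues = []
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        # Leading whitespace is everything lstrip() removed from the start of the line.
        indentation = line[: len(line) - len(line.lstrip())]
        if "\t" in indentation:
            issues.append((line_number, "tab used for indentation"))
        if len(line) > max_length:
            issues.append((line_number, f"line longer than {max_length} characters"))
    return issues
```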
Testing conventions
- Write tests for all new features
- Use descriptive test method names
- Include docstrings explaining what each test verifies
- Follow the Arrange-Act-Assert pattern
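The minimal unittest example below follows these conventions: descriptive method names, a docstring per test, and explicit Arrange, Act, and Assert steps. The page_differs function and its below-threshold rule are hypothetical and exist only to give the tests something to exercise. Run it with python -m unittest followed by the module name.

```python
import unittest


def page_differs(similarity_score, threshold):
    """Hypothetical rule for illustration: a page differs when its score is below threshold."""
    return similarity_score < threshold


class TestPageDiffers(unittest.TestCase):
    def test_score_below_threshold_is_reported_as_different(self):
        """A page scoring under the threshold must be flagged as different."""
        # Arrange
        score, threshold = 0.90, 0.95
        # Act
        result = page_differs(score, threshold)
        # Assert
        self.assertTrue(result)

    def test_score_equal_to_threshold_is_not_reported(self):
        """A score exactly at the threshold counts as a match in this sketch."""
        # Arrange
        score, threshold = 0.95, 0.95
        # Act
        result = page_differs(score, threshold)
        # Assert
        self.assertFalse(result)
```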
Common contribution areas
Feature additions
Potential features to implement:
- Multi-format support: Export diffs as PDF or HTML reports
- Threshold auto-tuning: Automatically determine optimal threshold
- Batch comparison: Compare multiple PDF pairs
- Ignore regions: Mask specific areas from comparison
- Performance optimization: Parallel page processing
- CI/CD integration: GitHub Actions workflow examples
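The sketch below is one starting point for the ignore-regions idea. It blanks out rectangular areas so that both pages compare as equal there. Apply it to both pages with the same regions before computing the score. To stay dependency-free, it represents grayscale pages as lists of rows. With numpy arrays (the input type scikit-image's SSIM functions take), the inner loops reduce to a slice assignment such as image[top:bottom, left:right] = fill. apply_ignore_regions and the (left, top, right, bottom) box format are hypothetical.

```python
def apply_ignore_regions(image, regions, fill=0):
    """Return a copy of `image` with each rectangular region overwritten by `fill`.

    image: list of rows of grayscale pixel values (stand-in for a numpy array).
    regions: hypothetical (left, top, right, bottom) boxes; right and bottom are exclusive.
    """
    masked = [list(row) for row in image]  # copy so the caller's page is untouched
    height = len(masked)
    width = len(masked[0]) if height else 0
    for left, top, right, bottom in regions:
        # Clamp each box to the image bounds so out-of-range boxes are harmless.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                masked[y][x] = fill
    return masked
```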
Bug fixes
When fixing bugs:
- Create a test that reproduces the bug
- Verify the test fails before your fix
- Implement the fix
- Verify the test passes
- Check that existing tests still pass
Documentation improvements
- Improve code comments
- Add usage examples to README
- Create troubleshooting guides
- Document edge cases
Reviewing code
When reviewing contributions, check for:
- Correctness: Does it solve the stated problem?
- Tests: Are there tests covering the new code?
- Style: Does it follow project conventions?
- Performance: Are there any obvious bottlenecks?
- Documentation: Are changes documented?
Release process
For maintainers releasing new versions:
Getting help
If you need assistance:
- Issues: Open a GitHub issue for bugs or feature requests
- Discussions: Use GitHub Discussions for questions
- Code review: Tag maintainers in your PR for review
Before opening an issue, search existing issues to avoid duplicates. Provide:
- Clear description of the problem
- Steps to reproduce
- Expected vs actual behavior
- System information (OS, Python version)
- Sample PDFs if applicable (without sensitive data)
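A short script can gather the system information listed above. It prints the OS, the Python version, and the installed versions of the project's dependencies. This is a sketch that relies on importlib.metadata, which was added to the standard library in Python 3.8. On Python 3.7, the script reports package versions as unknown. system_report and package_version are hypothetical helpers.

```python
import platform

PROJECT_PACKAGES = ["PyMuPDF", "scikit-image", "Pillow", "numpy", "reportlab"]


def package_version(name):
    """Return the installed version of distribution `name`, or a short explanation."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        return "unknown (importlib.metadata needs Python 3.8+)"
    try:
        return version(name)
    except PackageNotFoundError:
        return "not installed"


def system_report(packages=PROJECT_PACKAGES):
    """Collect OS, Python, and dependency versions to paste into an issue."""
    report = {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
    }
    for name in packages:
        report[name] = package_version(name)
    return report


if __name__ == "__main__":
    for key, value in system_report().items():
        print(f"{key}: {value}")
```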
Code of conduct
We expect all contributors to:
- Be respectful and constructive
- Welcome newcomers and help them get started
- Focus on what’s best for the project and community
- Accept constructive criticism gracefully
- Show empathy towards other community members