
Data Regression Testing Framework

A comprehensive framework for validating data integrity during code migration and system updates by comparing data outputs between Baseline (Production) and Target (Test) SQL Server databases.

Features

  • Automated Discovery - Scan databases and auto-generate configuration files
  • Multiple Comparison Types - Row counts, schema validation, aggregate sums
  • Investigation Queries - Execute diagnostic SQL queries from regression analysis
  • Flexible Configuration - YAML-based setup with extensive customization
  • Rich Reporting - HTML, CSV, and PDF reports with detailed results
  • Windows Authentication - Secure, credential-free database access
  • Read-Only Operations - All queries are SELECT-only for safety
  • Comprehensive Logging - Detailed execution logs with timestamps

🚀 Quick Start

Prerequisites

  • Python 3.9+
  • Microsoft ODBC Driver 17+ for SQL Server
  • Windows environment with domain authentication (or Linux with Kerberos)
  • Read access to SQL Server databases

Installation

# Clone the repository
git clone <repository-url>
cd data_regression_testing

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the framework
pip install -e .

# Verify installation
drt --version

Basic Usage

# 1. Discover tables from baseline database
drt discover --server <YOUR_SERVER> --database <YOUR_BASELINE_DB> --output config.yaml

# 2. Edit config.yaml to add target database connection

# 3. Validate configuration
drt validate --config config.yaml

# 4. Run comparison
drt compare --config config.yaml

# 5. (Optional) Investigate regression issues
drt investigate --analysis-dir analysis/output_<TIMESTAMP>/ --config config.yaml

📦 Platform-Specific Installation

Windows

  1. Install Python 3.9+ from https://www.python.org/downloads/
  2. Install the Microsoft ODBC Driver 17+ for SQL Server if it is not already present (it often ships with SQL Server tools such as SSMS)
  3. Install Framework:
    python -m venv venv
    venv\Scripts\activate
    pip install -e .
    

Linux (Debian/Ubuntu)

# Install ODBC Driver
curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | sudo gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg
curl https://packages.microsoft.com/config/debian/12/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 unixodbc-dev

# Install Kerberos for Windows Authentication
sudo apt-get install -y krb5-user

# Configure /etc/krb5.conf with your domain settings
# Then obtain ticket: kinit username@YOUR_DOMAIN.COM

# Install framework
python3 -m venv venv
source venv/bin/activate
pip install -e .

📋 Commands

Discovery

Automatically scan databases and generate configuration files.

drt discover --server <YOUR_SERVER> --database <YOUR_DATABASE> [OPTIONS]

Options:

  • --server TEXT - SQL Server hostname (required)
  • --database TEXT - Database name (required)
  • --output, -o TEXT - Output file (default: config_discovered.yaml)
  • --schemas TEXT - Specific schemas to include
  • --verbose, -v - Enable verbose output

Validate

Validate configuration file syntax and database connectivity.

drt validate --config <CONFIG_FILE> [OPTIONS]

Options:

  • --config, -c PATH - Configuration file (required)
  • --verbose, -v - Enable verbose output

Compare

Execute data comparison between baseline and target databases.

drt compare --config <CONFIG_FILE> [OPTIONS]

Options:

  • --config, -c PATH - Configuration file (required)
  • --verbose, -v - Enable verbose output
  • --dry-run - Show what would be compared without executing

Investigate

Execute diagnostic queries from regression analysis.

drt investigate --analysis-dir <ANALYSIS_DIR> --config <CONFIG_FILE> [OPTIONS]

Options:

  • --analysis-dir, -a PATH - Analysis output directory containing *_investigate.sql files (required)
  • --config, -c PATH - Configuration file (required)
  • --output-dir, -o PATH - Output directory for reports (default: ./investigation_reports)
  • --verbose, -v - Enable verbose output
  • --dry-run - Show what would be executed without running

Example:

drt investigate -a analysis/output_20251209_184032/ -c config.yaml
drt investigate -a analysis/output_20251209_184032/ -c config.yaml -o ./my_reports

What it does:

  • Discovers all *_investigate.sql files in the analysis directory
  • Parses SQL files (handles markdown and multiple queries per file; see the sketch below)
  • Executes queries on both baseline and target databases
  • Handles errors gracefully (continues on failures)
  • Generates HTML and CSV reports with side-by-side comparisons
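
The query parsing mentioned above is tolerant of markdown-formatted analysis output. A rough, hypothetical illustration of that kind of parsing (stripping markdown code-fence lines and splitting on GO batch separators), not the framework's actual sql_parser.py:

def split_investigation_queries(sql_text: str) -> list[str]:
    """Illustrative only: strip markdown fences and split on GO batch separators."""
    # Drop markdown code-fence lines such as ```sql and ```
    lines = [ln for ln in sql_text.splitlines() if not ln.strip().startswith("```")]
    batches, current = [], []
    for ln in lines:
        # A line containing only GO is the T-SQL batch separator
        if ln.strip().upper() == "GO":
            if current:
                batches.append("\n".join(current).strip())
                current = []
        else:
            current.append(ln)
    if current:
        batches.append("\n".join(current).strip())
    return [b for b in batches if b]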

⚙️ Configuration

Database Connections

database_pairs:
  - name: "DWH_Comparison"
    enabled: true
    baseline:
      server: "<YOUR_SERVER>"
      database: "<YOUR_BASELINE_DB>"
      timeout:
        connection: 30
        query: 300
    target:
      server: "<YOUR_SERVER>"
      database: "<YOUR_TARGET_DB>"

Comparison Settings

comparison:
  mode: "health_check"  # or "full"
  row_count:
    enabled: true
    tolerance_percent: 0.0
  schema:
    enabled: true
    checks:
      column_names: true
      data_types: true
  aggregates:
    enabled: true
    tolerance_percent: 0.01
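
For example, with tolerance_percent: 0.01 on aggregates (read as a relative difference from the baseline value), a baseline sum of 1,000,000 and a target sum of 1,000,050 differ by 0.005% and would pass, while a target sum of 1,000,200 (a 0.02% difference) would fail.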

Table Configuration

tables:
  - schema: "dbo"
    name: "FactTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns:
      - "Amount"
      - "Quantity"

Output Directories

reporting:
  output_dir: "./reports"
  investigation_dir: "./investigation_reports"

logging:
  output_dir: "./logs"

discovery:
  analysis_directory: "./analysis"

Benefits:

  • Centralized storage of all output files
  • Easy cleanup and management of generated files
  • Configuration flexibility via YAML
  • Backward compatibility with CLI overrides
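
As a minimal sketch of how that precedence might be resolved, assuming PyYAML and the documented defaults (the framework's actual loader may differ):

from typing import Optional

import yaml

def resolve_output_dir(config_path: str, cli_output_dir: Optional[str] = None) -> str:
    """Illustrative sketch: a CLI option wins, then the YAML value, then the documented default."""
    with open(config_path, "r", encoding="utf-8") as fh:
        cfg = yaml.safe_load(fh) or {}
    return cli_output_dir or cfg.get("reporting", {}).get("output_dir", "./reports")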

📊 Reports

Comparison Reports

The framework generates comprehensive reports in multiple formats:

  • HTML Report - Visual summary with color-coded results and detailed breakdowns
  • CSV Report - Machine-readable format for Excel or databases
  • PDF Report - Professional formatted output (requires weasyprint)

Reports are saved to ./reports/ with timestamps.

Investigation Reports

  • HTML Report - Interactive report with collapsible query results, side-by-side baseline vs target comparison
  • CSV Report - Flattened structure with one row per query execution

Investigation reports are saved to ./investigation_reports/ with timestamps.

🔄 Exit Codes

  • 0 - Success: all comparisons passed
  • 1 - Failures detected: one or more FAIL results
  • 2 - Execution error: configuration or connection issues
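
These codes make it easy to drive automation. A minimal Python sketch that wraps the CLI and propagates the result (assuming drt is on the PATH):

import subprocess
import sys

# Run the comparison and propagate its exit code to the calling pipeline
result = subprocess.run(["drt", "compare", "--config", "config.yaml"])
if result.returncode == 1:
    print("Comparison failures detected - check the reports directory")
elif result.returncode == 2:
    print("Execution error - check configuration and connectivity")
sys.exit(result.returncode)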

🧪 Testing

Docker Test Environment

# Start test SQL Server containers
bash test_data/setup_test_environment.sh

# Test discovery
drt discover --server localhost,1433 --database TestDB_Baseline --output test.yaml

# Test comparison
drt compare --config config.test.yaml

# Cleanup
docker-compose -f docker-compose.test.yml down -v

Manual Testing

# Connect to test databases (use SA_PASSWORD environment variable)
docker exec -it drt-sqlserver-baseline /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD"

# Run queries to verify data
SELECT COUNT(*) FROM dbo.FactTable1;

🚢 Deployment

Scheduled Execution

Windows Task Scheduler:

@echo off
cd C:\path\to\framework
call venv\Scripts\activate.bat
drt compare --config config.yaml
if %ERRORLEVEL% NEQ 0 (
    echo Test failed with exit code %ERRORLEVEL%
    exit /b %ERRORLEVEL%
)

Linux Cron:

# Run daily at 2 AM
0 2 * * * /path/to/venv/bin/drt compare --config /path/to/config.yaml >> /path/to/logs/cron.log 2>&1

Monitoring

# Watch logs
tail -f logs/drt_*.log

# Search for failures
grep -i "FAIL\|ERROR" logs/drt_*.log

🏗️ Architecture

src/drt/
├── cli/              # Command-line interface
│   └── commands/     # CLI commands (compare, discover, validate, investigate)
├── config/           # Configuration management
├── database/         # Database connectivity (READ ONLY)
├── models/           # Data models
├── reporting/        # Report generators
├── services/         # Business logic
│   ├── checkers/     # Comparison checkers
│   ├── investigation.py  # Investigation service
│   └── sql_parser.py     # SQL file parser
└── utils/            # Utilities

🔒 Security

  • Windows Authentication Only - No stored credentials
  • Read-Only Operations - All queries are SELECT-only
  • Minimal Permissions - Only requires db_datareader role
  • No Data Logging - Sensitive data never logged
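
As an illustration of the kind of guard the read-only rule implies (not necessarily the framework's actual check):

def assert_select_only(sql: str) -> None:
    """Illustrative sketch: reject anything that is not a plain SELECT (or CTE) statement."""
    first_word = sql.strip().split(None, 1)[0].upper() if sql.strip() else ""
    if first_word not in ("SELECT", "WITH"):  # WITH covers CTE-based SELECTs
        raise ValueError(f"Refusing to execute non-SELECT statement starting with {first_word!r}")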

🔧 Troubleshooting

Connection Failed

# Test connectivity
drt discover --server <YOUR_SERVER> --database master

# Verify ODBC driver
odbcinst -q -d

# Check permissions
# User needs db_datareader role on target databases

Query Timeout

Increase timeout in configuration:

baseline:
  timeout:
    query: 600  # 10 minutes

Linux Kerberos Issues

# Check ticket
klist

# Renew if expired
kinit username@YOUR_DOMAIN.COM

# Verify ticket is valid
klist

Performance

Diagnostic Logging

Enable verbose mode to see detailed timing:

drt compare --config config.yaml --verbose

This shows:

  • Per-check timing (existence, row count, schema, aggregates)
  • Query execution times
  • Parallelization opportunities

Optimization Tips

  • Disable aggregate checks for surrogate keys
  • Increase query timeouts for large tables
  • Use table filtering to focus on critical tables
  • Consider parallel execution for multiple database pairs
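
For the last tip, a thread pool over the configured pairs is one straightforward approach. A minimal sketch (compare_pair is a hypothetical callable, not part of the framework's public API):

from concurrent.futures import ThreadPoolExecutor

def compare_all_pairs(pairs, compare_pair, max_workers: int = 4):
    """Illustrative sketch: run one comparison per database pair concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # compare_pair(pair) is assumed to return one result object per database pair
        return list(pool.map(compare_pair, pairs))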

👨‍💻 Development

Getting Started

  1. Fork the repository
  2. Clone the repository (or your fork) locally:
    git clone https://git.informatica-polet.com/drt/drt.git
    cd data_regression_testing
    
  3. Create a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  4. Install dependencies:
    pip install -r requirements.txt
    pip install -e .
    
  5. Install development dependencies:
    pip install pytest pytest-cov black flake8 mypy
    

Development Workflow

1. Create a Branch

git checkout -b feature/your-feature-name
# or
git checkout -b bugfix/issue-description

2. Make Your Changes

  • Write clean, readable code
  • Follow the existing code style
  • Add docstrings to all functions and classes
  • Update documentation as needed

3. Run Tests

# All tests
pytest

# With coverage
pytest --cov=src/drt --cov-report=html

# Specific test file
pytest tests/test_models.py

4. Code Quality Checks

# Format code with black
black src/ tests/

# Check code style with flake8
flake8 src/ tests/

# Type checking with mypy
mypy src/

5. Commit Your Changes

Write clear, descriptive commit messages:

git add .
git commit -m "Add feature: description of your changes"

Commit message guidelines:

  • Use present tense ("Add feature" not "Added feature")
  • Use imperative mood ("Move cursor to..." not "Moves cursor to...")
  • Limit first line to 72 characters
  • Reference issues and pull requests when relevant

6. Push and Create Pull Request

git push origin feature/your-feature-name

Create a pull request with:

  • Clear title and description
  • Reference to related issues
  • Screenshots (if applicable)
  • Test results

Code Style Guidelines

Python Style

  • Follow PEP 8 style guide
  • Use type hints for function parameters and return values
  • Maximum line length: 100 characters
  • Use meaningful variable and function names

Example:

def calculate_row_count_difference(
    baseline_count: int,
    target_count: int,
    tolerance_percent: float
) -> tuple[bool, float]:
    """
    Calculate if row count difference is within tolerance.
    
    Args:
        baseline_count: Row count from baseline database
        target_count: Row count from target database
        tolerance_percent: Acceptable difference percentage
        
    Returns:
        Tuple of (is_within_tolerance, actual_difference_percent)
    """
    # Guard against division by zero when the baseline table is empty
    if baseline_count == 0:
        difference_percent = 0.0 if target_count == 0 else 100.0
    else:
        difference_percent = abs(target_count - baseline_count) / baseline_count * 100.0
    return difference_percent <= tolerance_percent, difference_percent

Documentation

  • Add docstrings to all public functions, classes, and modules
  • Use Google-style docstrings
  • Include examples in docstrings when helpful
  • Update README.md for user-facing changes

Testing

  • Write unit tests for all new functionality
  • Aim for >80% code coverage
  • Use descriptive test names
  • Follow AAA pattern (Arrange, Act, Assert)

Example:

def test_row_count_checker_exact_match():
    """Test row count checker with exact match"""
    # Arrange
    checker = RowCountChecker(tolerance_percent=0.0)
    
    # Act
    result = checker.check(baseline_count=1000, target_count=1000)
    
    # Assert
    assert result.status == Status.PASS
    assert result.baseline_value == 1000
    assert result.target_value == 1000

Adding New Features

New Checker Type

To add a new comparison checker:

  1. Create new checker in src/drt/services/checkers/
  2. Inherit from BaseChecker
  3. Implement check() method
  4. Add new CheckType enum value
  5. Register in ComparisonService
  6. Add tests in tests/test_checkers.py
  7. Update documentation
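
A hypothetical sketch of steps 2-3 (the real BaseChecker interface, import path, and return contract may differ; the names below are assumptions):

from drt.services.checkers.base import BaseChecker  # hypothetical import path

class NullCountChecker(BaseChecker):
    """Hypothetical checker: compare a per-column NULL count between baseline and target."""

    def check(self, baseline_value: int, target_value: int) -> bool:
        # Assumed contract: check() returns a pass/fail value the ComparisonService
        # can aggregate; the real signature may take richer context/result objects.
        return baseline_value == target_value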

New Report Format

To add a new report format:

  1. Create new reporter in src/drt/reporting/
  2. Implement generate() method
  3. Add format option to configuration
  4. Update ReportGenerator to use new format
  5. Add tests
  6. Update documentation
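
A hypothetical sketch of steps 1-2 for a JSON reporter (everything beyond the generate() method name is an assumption):

import json
from pathlib import Path

class JsonReporter:
    """Hypothetical reporter: write comparison results as a JSON document."""

    def generate(self, results: list[dict], output_dir: str = "./reports") -> Path:
        out = Path(output_dir)
        out.mkdir(parents=True, exist_ok=True)
        path = out / "comparison_results.json"
        path.write_text(json.dumps(results, indent=2, default=str), encoding="utf-8")
        return path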

Testing

Unit Tests

Run the test suite:

# All tests
pytest

# With coverage report
pytest --cov=src/drt --cov-report=html

# Specific test file
pytest tests/test_models.py -v

# Specific test function
pytest tests/test_models.py::test_status_enum -v

Integration Tests

Use the Docker test environment:

# Start test databases
bash test_data/setup_test_environment.sh

# Run integration tests
drt discover --server localhost,1433 --database TestDB_Baseline --output test.yaml
drt compare --config config.test.yaml

# Cleanup
docker-compose -f docker-compose.test.yml down -v

Manual Testing

# Test against real databases (requires access)
drt discover --server <YOUR_SERVER> --database <YOUR_DB> --output manual_test.yaml
drt validate --config manual_test.yaml
drt compare --config manual_test.yaml --dry-run

Reporting Issues

When reporting issues, please include:

  • Clear description of the problem
  • Steps to reproduce
  • Expected vs actual behavior
  • Environment details (OS, Python version, ODBC driver version)
  • Relevant logs or error messages
  • Configuration file (sanitized - remove server names/credentials)

Example:

**Description:** Row count comparison fails with timeout error

**Steps to Reproduce:**
1. Configure comparison for large table (>1M rows)
2. Run `drt compare --config config.yaml`
3. Observe timeout error

**Expected:** Comparison completes successfully
**Actual:** Query timeout after 300 seconds

**Environment:**
- OS: Windows 10
- Python: 3.9.7
- ODBC Driver: 17 for SQL Server

**Logs:**

ERROR: Query timeout on table dbo.FactTable1

Feature Requests

For feature requests, please:

  • Check if feature already exists or is planned
  • Describe the use case clearly
  • Explain why it would be valuable
  • Provide examples if possible

Code Review Process

All contributions go through code review:

  1. Automated checks must pass (tests, linting)
  2. At least one maintainer approval required
  3. Address review feedback promptly
  4. Keep pull requests focused and reasonably sized

Release Process

Releases follow semantic versioning (MAJOR.MINOR.PATCH):

  • MAJOR - Breaking changes
  • MINOR - New features (backward compatible)
  • PATCH - Bug fixes (backward compatible)

Development Tips

Debugging

# Enable verbose logging
drt compare --config config.yaml --verbose

# Use dry-run to test without execution
drt compare --config config.yaml --dry-run

# Check configuration validity
drt validate --config config.yaml

Performance Profiling

# Enable diagnostic logging
drt compare --config config.yaml --verbose

# Look for timing information in logs
grep "execution time" logs/drt_*.log

Docker Development

# Build and test in Docker
docker build -t drt:dev .
docker run -v $(pwd)/config.yaml:/app/config.yaml drt:dev compare --config /app/config.yaml

📝 License

MIT License - see LICENSE file for details

📞 Support

For issues and questions, please open an issue in the project repository.
