Initial commit
.gitignore (vendored, Normal file, 97 lines)
@@ -0,0 +1,97 @@
# Security: Sensitive Files and Credentials
# Add these patterns to your .gitignore to prevent accidental commits of sensitive data

# Environment variables
.env
.env.local
.env.*.local

# Configuration files with credentials
config.*.yaml
!config.example.yaml
!config.quickstart.yaml
!config.test.yaml

# Logs (may contain sensitive information)
logs/
*.log

# Reports and analysis output
reports/
investigation_reports/
analysis/

# IDE and editor files
.vscode/
.idea/
*.swp
*.swo
*~

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
ENV/
env/

# Testing
.pytest_cache/
.coverage
htmlcov/

# OS
.DS_Store
Thumbs.db

# Temporary files
*.tmp
*.bak
*.backup
*~

# Database files
*.db
*.sqlite
*.sqlite3

# Docker
.dockerignore
docker-compose.override.yml

# Credentials and secrets (CRITICAL)
**/secrets/
**/credentials/
**/.aws/
**/.azure/
**/.gcp/
**/private_key*
**/secret_key*
**/api_key*
**/token*
**/password*

# Configuration with real values
config.prod.yaml
config.production.yaml
config.live.yaml
LICENSE (Executable file, 21 lines)
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 QA Engineering Team

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md (Executable file, 741 lines)
@@ -0,0 +1,741 @@
# Data Regression Testing Framework

A comprehensive framework for validating data integrity during code migration and system updates by comparing data outputs between Baseline (Production) and Target (Test) SQL Server databases.

## ✨ Features

- **Automated Discovery** - Scan databases and auto-generate configuration files
- **Multiple Comparison Types** - Row counts, schema validation, aggregate sums
- **Investigation Queries** - Execute diagnostic SQL queries from regression analysis
- **Flexible Configuration** - YAML-based setup with extensive customization
- **Rich Reporting** - HTML, CSV, and PDF reports with detailed results
- **Windows Authentication** - Secure, credential-free database access
- **Read-Only Operations** - All queries are SELECT-only for safety
- **Comprehensive Logging** - Detailed execution logs with timestamps

## 🚀 Quick Start

### Prerequisites

- Python 3.9+
- Microsoft ODBC Driver 17+ for SQL Server
- Windows environment with domain authentication (or Linux with Kerberos)
- Read access to SQL Server databases

### Installation

```bash
# Clone the repository
git clone <repository-url>
cd data_regression_testing

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the framework
pip install -e .

# Verify installation
drt --version
```

### Basic Usage

```bash
# 1. Discover tables from baseline database
drt discover --server <YOUR_SERVER> --database <YOUR_BASELINE_DB> --output config.yaml

# 2. Edit config.yaml to add the target database connection

# 3. Validate configuration
drt validate --config config.yaml

# 4. Run comparison
drt compare --config config.yaml

# 5. (Optional) Investigate regression issues
drt investigate --analysis-dir analysis/output_<TIMESTAMP>/ --config config.yaml
```

## 📦 Platform-Specific Installation

### Windows

1. Install Python 3.9+ from https://www.python.org/downloads/
2. The ODBC Driver is usually pre-installed on Windows
3. Install the framework:
```cmd
python -m venv venv
venv\Scripts\activate
pip install -e .
```

### Linux (Debian/Ubuntu)

```bash
# Install ODBC Driver
curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | sudo gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg
curl https://packages.microsoft.com/config/debian/12/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 unixodbc-dev

# Install Kerberos for Windows Authentication
sudo apt-get install -y krb5-user

# Configure /etc/krb5.conf with your domain settings
# Then obtain a ticket: kinit username@YOUR_DOMAIN.COM

# Install framework
python3 -m venv venv
source venv/bin/activate
pip install -e .
```

## 📋 Commands

### Discovery

Automatically scan databases and generate configuration files.

```bash
drt discover --server <YOUR_SERVER> --database <YOUR_DATABASE> [OPTIONS]
```

**Options:**
- `--server TEXT` - SQL Server hostname (required)
- `--database TEXT` - Database name (required)
- `--output, -o TEXT` - Output file (default: config_discovered.yaml)
- `--schemas TEXT` - Specific schemas to include
- `--verbose, -v` - Enable verbose output

### Validate

Validate configuration file syntax and database connectivity.

```bash
drt validate --config <CONFIG_FILE> [OPTIONS]
```

**Options:**
- `--config, -c PATH` - Configuration file (required)
- `--verbose, -v` - Enable verbose output

### Compare

Execute data comparison between baseline and target databases.

```bash
drt compare --config <CONFIG_FILE> [OPTIONS]
```

**Options:**
- `--config, -c PATH` - Configuration file (required)
- `--verbose, -v` - Enable verbose output
- `--dry-run` - Show what would be compared without executing

### Investigate

Execute diagnostic queries from regression analysis.

```bash
drt investigate --analysis-dir <ANALYSIS_DIR> --config <CONFIG_FILE> [OPTIONS]
```

**Options:**
- `--analysis-dir, -a PATH` - Analysis output directory containing `*_investigate.sql` files (required)
- `--config, -c PATH` - Configuration file (required)
- `--output-dir, -o PATH` - Output directory for reports (default: ./investigation_reports)
- `--verbose, -v` - Enable verbose output
- `--dry-run` - Show what would be executed without running

**Example:**
```bash
drt investigate -a analysis/output_20251209_184032/ -c config.yaml
drt investigate -a analysis/output_20251209_184032/ -c config.yaml -o ./my_reports
```

**What it does** (sketched below):
- Discovers all `*_investigate.sql` files in the analysis directory
- Parses SQL files (handles markdown, multiple queries per file)
- Executes queries on both baseline and target databases
- Handles errors gracefully (continues on failures)
- Generates HTML and CSV reports with side-by-side comparisons
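
A rough sketch of that flow under stated assumptions (pyodbc-style connection objects with an `execute()` helper, and a deliberately naive query splitter; the real logic lives in `src/drt/services/investigation.py` and `sql_parser.py`):

```python
from pathlib import Path


def split_queries(text: str) -> list:
    """Naive splitter: the real parser also strips markdown fences."""
    return [q.strip() for q in text.split(";") if q.strip()]


def run_investigation(analysis_dir: Path, baseline_conn, target_conn) -> list:
    """Execute every *_investigate.sql query against both databases."""
    results = []
    for sql_file in sorted(analysis_dir.glob("*_investigate.sql")):
        for query in split_queries(sql_file.read_text()):
            for side, conn in (("baseline", baseline_conn), ("target", target_conn)):
                try:
                    rows = conn.execute(query).fetchall()
                    results.append({"file": sql_file.name, "side": side, "rows": rows})
                except Exception as exc:
                    # Continue on failures, per the behavior described above
                    results.append({"file": sql_file.name, "side": side, "error": str(exc)})
    return results
```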

## ⚙️ Configuration

### Database Connections

```yaml
database_pairs:
  - name: "DWH_Comparison"
    enabled: true
    baseline:
      server: "<YOUR_SERVER>"
      database: "<YOUR_BASELINE_DB>"
      timeout:
        connection: 30
        query: 300
    target:
      server: "<YOUR_SERVER>"
      database: "<YOUR_TARGET_DB>"
```

### Comparison Settings

```yaml
comparison:
  mode: "health_check"  # or "full"
  row_count:
    enabled: true
    tolerance_percent: 0.0
  schema:
    enabled: true
    checks:
      column_names: true
      data_types: true
  aggregates:
    enabled: true
    tolerance_percent: 0.01
```

### Table Configuration

```yaml
tables:
  - schema: "dbo"
    name: "FactTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns:
      - "Amount"
      - "Quantity"
```

### Output Directories

```yaml
reporting:
  output_dir: "./reports"
  investigation_dir: "./investigation_reports"

logging:
  output_dir: "./logs"

discovery:
  analysis_directory: "./analysis"
```

**Benefits:**
- Centralized storage of all output files
- Easy cleanup and management of generated files
- Configuration flexibility via YAML
- Backward compatibility with CLI overrides

## 📊 Reports

### Comparison Reports

The framework generates comprehensive reports in multiple formats:

- **HTML Report** - Visual summary with color-coded results and detailed breakdowns
- **CSV Report** - Machine-readable format for Excel or databases
- **PDF Report** - Professionally formatted output (requires weasyprint)

Reports are saved to `./reports/` with timestamps.

### Investigation Reports

- **HTML Report** - Interactive report with collapsible query results and side-by-side baseline vs. target comparison
- **CSV Report** - Flattened structure with one row per query execution

Investigation reports are saved to `./investigation_reports/` with timestamps.

## 🔄 Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success - all comparisons passed |
| 1 | Failures detected - one or more FAIL results |
| 2 | Execution error - configuration or connection issues |
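
These codes let schedulers and CI pipelines gate on the outcome. A minimal wrapper sketch, assuming only that the `drt` entry point from the installation above is on the PATH:

```python
import subprocess
import sys

# Run a comparison and branch on the documented exit codes.
result = subprocess.run(["drt", "compare", "--config", "config.yaml"])

if result.returncode == 0:
    print("All comparisons passed")
elif result.returncode == 1:
    print("Regression detected: one or more FAIL results")
else:
    print("Execution error: check configuration and connectivity")

sys.exit(result.returncode)
```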

## 🧪 Testing

### Docker Test Environment

```bash
# Start test SQL Server containers
bash test_data/setup_test_environment.sh

# Test discovery
drt discover --server localhost,1433 --database TestDB_Baseline --output test.yaml

# Test comparison
drt compare --config config.test.yaml

# Cleanup
docker-compose -f docker-compose.test.yml down -v
```

### Manual Testing

```bash
# Connect to test databases (use the SA_PASSWORD environment variable)
docker exec -it drt-sqlserver-baseline /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD"

# Run queries to verify data
SELECT COUNT(*) FROM dbo.FactTable1;
```

## 🚢 Deployment

### Scheduled Execution

**Windows Task Scheduler:**
```batch
@echo off
cd C:\path\to\framework
call venv\Scripts\activate.bat
drt compare --config config.yaml
if %ERRORLEVEL% NEQ 0 (
    echo Test failed with exit code %ERRORLEVEL%
    exit /b %ERRORLEVEL%
)
```

**Linux Cron:**
```bash
# Run daily at 2 AM
0 2 * * * /path/to/venv/bin/drt compare --config /path/to/config.yaml >> /path/to/logs/cron.log 2>&1
```

### Monitoring

```bash
# Watch logs
tail -f logs/drt_*.log

# Search for failures
grep -i "FAIL\|ERROR" logs/drt_*.log
```

## 🏗️ Architecture

```
src/drt/
├── cli/                  # Command-line interface
│   └── commands/         # CLI commands (compare, discover, validate, investigate)
├── config/               # Configuration management
├── database/             # Database connectivity (READ ONLY)
├── models/               # Data models
├── reporting/            # Report generators
├── services/             # Business logic
│   ├── checkers/         # Comparison checkers
│   ├── investigation.py  # Investigation service
│   └── sql_parser.py     # SQL file parser
└── utils/                # Utilities
```

## 🔒 Security

- **Windows Authentication Only** - No stored credentials
- **Read-Only Operations** - All queries are SELECT-only
- **Minimal Permissions** - Requires only the db_datareader role
- **No Data Logging** - Sensitive data is never logged

## 🔧 Troubleshooting

### Connection Failed

```bash
# Test connectivity
drt discover --server <YOUR_SERVER> --database master

# Verify ODBC driver
odbcinst -q -d

# Check permissions
# The user needs the db_datareader role on target databases
```

### Query Timeout

Increase the query timeout in the configuration:
```yaml
baseline:
  timeout:
    query: 600  # 10 minutes
```

### Linux Kerberos Issues

```bash
# Check ticket
klist

# Renew if expired
kinit username@YOUR_DOMAIN.COM

# Verify the ticket is valid
klist
```

## ⚡ Performance

### Diagnostic Logging

Enable verbose mode to see detailed timing:
```bash
drt compare --config config.yaml --verbose
```

This shows:
- Per-check timing (existence, row count, schema, aggregates)
- Query execution times
- Parallelization opportunities

### Optimization Tips

- Disable aggregate checks for surrogate keys
- Increase query timeouts for large tables
- Use table filtering to focus on critical tables
- Consider parallel execution for multiple database pairs

## 👨‍💻 Development

### Getting Started

1. Fork the repository on GitHub
2. Clone your fork locally:
```bash
git clone https://github.com/your-username/data_regression_testing.git
cd data_regression_testing
```
3. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
4. Install dependencies:
```bash
pip install -r requirements.txt
pip install -e .
```
5. Install development dependencies:
```bash
pip install pytest pytest-cov black flake8 mypy
```

### Development Workflow

#### 1. Create a Branch

```bash
git checkout -b feature/your-feature-name
# or
git checkout -b bugfix/issue-description
```

#### 2. Make Your Changes

- Write clean, readable code
- Follow the existing code style
- Add docstrings to all functions and classes
- Update documentation as needed

#### 3. Run Tests

```bash
# All tests
pytest

# With coverage
pytest --cov=src/drt --cov-report=html

# Specific test file
pytest tests/test_models.py
```

#### 4. Code Quality Checks

```bash
# Format code with black
black src/ tests/

# Check code style with flake8
flake8 src/ tests/

# Type checking with mypy
mypy src/
```

#### 5. Commit Your Changes

Write clear, descriptive commit messages:

```bash
git add .
git commit -m "Add feature: description of your changes"
```

**Commit message guidelines:**
- Use present tense ("Add feature" not "Added feature")
- Use imperative mood ("Move cursor to..." not "Moves cursor to...")
- Limit the first line to 72 characters
- Reference issues and pull requests when relevant

#### 6. Push and Create Pull Request

```bash
git push origin feature/your-feature-name
```

Create a pull request on GitHub with:
- Clear title and description
- Reference to related issues
- Screenshots (if applicable)
- Test results

### Code Style Guidelines

#### Python Style

- Follow the PEP 8 style guide
- Use type hints for function parameters and return values
- Maximum line length: 100 characters
- Use meaningful variable and function names

**Example:**
```python
def calculate_row_count_difference(
    baseline_count: int,
    target_count: int,
    tolerance_percent: float
) -> tuple[bool, float]:
    """
    Calculate if row count difference is within tolerance.

    Args:
        baseline_count: Row count from baseline database
        target_count: Row count from target database
        tolerance_percent: Acceptable difference percentage

    Returns:
        Tuple of (is_within_tolerance, actual_difference_percent)
    """
    if baseline_count == 0:
        difference_percent = 0.0 if target_count == 0 else 100.0
    else:
        difference_percent = abs(target_count - baseline_count) / baseline_count * 100.0
    return difference_percent <= tolerance_percent, difference_percent
```

#### Documentation

- Add docstrings to all public functions, classes, and modules
- Use Google-style docstrings
- Include examples in docstrings when helpful
- Update README.md for user-facing changes

#### Testing

- Write unit tests for all new functionality
- Aim for >80% code coverage
- Use descriptive test names
- Follow the AAA pattern (Arrange, Act, Assert)

**Example:**
```python
def test_row_count_checker_exact_match():
    """Test row count checker with exact match"""
    # Arrange
    checker = RowCountChecker(tolerance_percent=0.0)

    # Act
    result = checker.check(baseline_count=1000, target_count=1000)

    # Assert
    assert result.status == Status.PASS
    assert result.baseline_value == 1000
    assert result.target_value == 1000
```

### Adding New Features

#### New Checker Type

To add a new comparison checker (see the sketch after this list):

1. Create a new checker in `src/drt/services/checkers/`
2. Inherit from `BaseChecker`
3. Implement the `check()` method
4. Add a new `CheckType` enum value
5. Register it in `ComparisonService`
6. Add tests in `tests/test_checkers.py`
7. Update documentation
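
A minimal sketch of such a checker. The `CheckType` value, the `BaseChecker` import path, and the `check()` signature and result shape are assumptions to be aligned with the real base class before use:

```python
from drt.models.enums import CheckType, Status
from drt.services.checkers.base import BaseChecker  # module path is an assumption


class NullRatioChecker(BaseChecker):
    """Hypothetical checker: the target must not gain NULLs versus the baseline."""

    check_type = CheckType.NULL_RATIO  # assumes the enum value added in step 4

    def __init__(self, tolerance_percent: float = 0.0):
        self.tolerance_percent = tolerance_percent

    def check(self, baseline_ratio: float, target_ratio: float) -> dict:
        # PASS while the target NULL ratio stays within tolerance of the baseline.
        within = (target_ratio - baseline_ratio) <= self.tolerance_percent
        return {  # stand-in for the framework's real result model
            "status": Status.PASS if within else Status.FAIL,
            "baseline_value": baseline_ratio,
            "target_value": target_ratio,
        }
```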

#### New Report Format

To add a new report format (see the sketch after this list):

1. Create a new reporter in `src/drt/reporting/`
2. Implement the `generate()` method
3. Add the format option to the configuration
4. Update `ReportGenerator` to use the new format
5. Add tests
6. Update documentation
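
And a matching sketch for a reporter; the `generate()` signature used here (results plus an output path) is an assumption, so check it against the existing reporters in `src/drt/reporting/` before wiring it into `ReportGenerator`:

```python
import json
from pathlib import Path


class JsonReporter:
    """Hypothetical reporter that serializes comparison results to JSON."""

    def generate(self, results: list, output_path: Path) -> Path:
        # default=str keeps timestamps and Decimal values serializable
        output_path.write_text(json.dumps(results, indent=2, default=str))
        return output_path
```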

### Testing

#### Unit Tests

Run the test suite:

```bash
# All tests
pytest

# With coverage report
pytest --cov=src/drt --cov-report=html

# Specific test file
pytest tests/test_models.py -v

# Specific test function
pytest tests/test_models.py::test_status_enum -v
```

#### Integration Tests

Use the Docker test environment:

```bash
# Start test databases
bash test_data/setup_test_environment.sh

# Run integration tests
drt discover --server localhost,1433 --database TestDB_Baseline --output test.yaml
drt compare --config config.test.yaml

# Cleanup
docker-compose -f docker-compose.test.yml down -v
```

#### Manual Testing

```bash
# Test against real databases (requires access)
drt discover --server <YOUR_SERVER> --database <YOUR_DB> --output manual_test.yaml
drt validate --config manual_test.yaml
drt compare --config manual_test.yaml --dry-run
```

### Reporting Issues

When reporting issues, please include:

- A clear description of the problem
- Steps to reproduce
- Expected vs. actual behavior
- Environment details (OS, Python version, ODBC driver version)
- Relevant logs or error messages
- Configuration file (sanitized - remove server names/credentials)

**Example:**
````markdown
**Description:** Row count comparison fails with timeout error

**Steps to Reproduce:**
1. Configure a comparison for a large table (>1M rows)
2. Run `drt compare --config config.yaml`
3. Observe the timeout error

**Expected:** Comparison completes successfully
**Actual:** Query timeout after 300 seconds

**Environment:**
- OS: Windows 10
- Python: 3.9.7
- ODBC Driver: 17 for SQL Server

**Logs:**
```
ERROR: Query timeout on table dbo.FactTable1
```
````

### Feature Requests

For feature requests, please:

- Check whether the feature already exists or is planned
- Describe the use case clearly
- Explain why it would be valuable
- Provide examples if possible

### Code Review Process

All contributions go through code review:

1. Automated checks must pass (tests, linting)
2. At least one maintainer approval is required
3. Address review feedback promptly
4. Keep pull requests focused and reasonably sized

### Release Process

Releases follow semantic versioning (MAJOR.MINOR.PATCH):

- **MAJOR** - Breaking changes
- **MINOR** - New features (backward compatible)
- **PATCH** - Bug fixes (backward compatible)

### Development Tips

#### Debugging

```bash
# Enable verbose logging
drt compare --config config.yaml --verbose

# Use dry-run to test without execution
drt compare --config config.yaml --dry-run

# Check configuration validity
drt validate --config config.yaml
```

#### Performance Profiling

```bash
# Enable diagnostic logging
drt compare --config config.yaml --verbose

# Look for timing information in logs
grep "execution time" logs/drt_*.log
```

#### Docker Development

```bash
# Build and test in Docker
docker build -t drt:dev .
docker run -v $(pwd)/config.yaml:/app/config.yaml drt:dev compare --config /app/config.yaml
```

## 📝 License

MIT License - see the LICENSE file for details

## 📞 Support

For issues and questions:
- GitHub Issues: <repository-url>/issues
- Check logs in `./logs/`
- Review the configuration with `drt validate`
- Test connectivity with `drt discover`

## 👥 Authors

QA Engineering Team

## 📌 Version

Current version: 1.0.0
config.example.yaml (Executable file, 286 lines)
@@ -0,0 +1,286 @@
# Data Regression Testing Framework - Example Configuration
# This file demonstrates all available configuration options

# ============================================================================
# DATABASE PAIRS
# Define baseline (production) and target (test) database connections
# ============================================================================
database_pairs:
  # Example 1: Data Warehouse Comparison
  - name: "DWH_Comparison"
    enabled: true
    description: "Compare production and test data warehouse"
    baseline:
      server: "<YOUR_SERVER_NAME>"
      database: "<YOUR_BASELINE_DB>"
      timeout:
        connection: 30  # seconds
        query: 300      # seconds (5 minutes)
    target:
      server: "<YOUR_SERVER_NAME>"
      database: "<YOUR_TARGET_DB>"
      timeout:
        connection: 30
        query: 300

  # Example 2: Operational Database Comparison (disabled)
  - name: "OPS_Comparison"
    enabled: false
    description: "Compare operational databases (currently disabled)"
    baseline:
      server: "<YOUR_SERVER_NAME>"
      database: "<YOUR_BASELINE_DB_2>"
    target:
      server: "<YOUR_SERVER_NAME>"
      database: "<YOUR_TARGET_DB_2>"

# ============================================================================
# COMPARISON SETTINGS
# Configure what types of comparisons to perform
# ============================================================================
comparison:
  # Comparison mode: "health_check" or "full"
  # - health_check: Quick validation (row counts, schema)
  # - full: Comprehensive validation (includes aggregates)
  mode: "health_check"

  # Row Count Comparison
  row_count:
    enabled: true
    tolerance_percent: 0.0  # 0% = exact match required
    # Examples:
    #   0.0 = exact match
    #   0.1 = allow 0.1% difference
    #   1.0 = allow 1% difference

  # Schema Comparison
  schema:
    enabled: true
    checks:
      column_names: true  # Verify column names match
      data_types: true    # Verify data types match
      nullable: true      # Verify nullable constraints match
      primary_keys: true  # Verify primary keys match

  # Aggregate Comparison (sums of numeric columns)
  aggregates:
    enabled: true
    tolerance_percent: 0.01  # 0.01% tolerance for rounding differences
    # Note: Only applies when mode is "full"

# ============================================================================
# TABLES TO COMPARE
# List all tables to include in comparison
# ============================================================================
tables:
  # Example 1: Fact table with aggregates
  - schema: "dbo"
    name: "FactTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns:
      - "Amount1"
      - "Amount2"
      - "Amount3"
      - "Quantity"
    notes: "Example fact table with numeric aggregates"

  # Example 2: Dimension table without aggregates
  - schema: "dbo"
    name: "DimTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns: []
    notes: "Example dimension table - no numeric aggregates"

  # Example 3: Table expected to be missing in target
  - schema: "dbo"
    name: "TempTable1"
    enabled: true
    expected_in_target: false
    aggregate_columns: []
    notes: "Example temporary table - should not exist in target"

  # Example 4: Disabled table (skipped during comparison)
  - schema: "dbo"
    name: "Table4"
    enabled: false
    expected_in_target: true
    aggregate_columns: []
    notes: "Example disabled table - excluded from comparison"

  # Example 5: Table in a non-default schema
  - schema: "staging"
    name: "StagingTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns:
      - "Amount"
    notes: "Example staging table"

  # Example 6: Large fact table
  - schema: "dbo"
    name: "FactTable2"
    enabled: true
    expected_in_target: true
    aggregate_columns:
      - "Amount"
      - "Fee"
      - "NetAmount"
    notes: "Example high-volume fact table"

  # Example 7: Reference data table
  - schema: "ref"
    name: "RefTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns: []
    notes: "Example reference data table"

# ============================================================================
# REPORTING SETTINGS
# Configure report generation and output
# ============================================================================
reporting:
  # Output directory for reports (use relative path or set via environment variable)
  output_dir: "./reports"

  # Output directory for investigation reports (use relative path or set via environment variable)
  investigation_dir: "./investigation_reports"

  # Report formats to generate
  formats:
    html: true  # Rich HTML report with styling
    csv: true   # CSV report for Excel/analysis
    pdf: false  # PDF report (requires weasyprint)

  # Report naming
  filename_prefix: "regression_test"
  include_timestamp: true  # Append YYYYMMDD_HHMMSS to filename

  # Report content options
  include_passed: true    # Include passed checks in report
  include_warnings: true  # Include warnings in report
  summary_only: false     # Only show summary (no details)

# ============================================================================
# LOGGING SETTINGS
# Configure logging behavior
# ============================================================================
logging:
  # Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL
  level: "INFO"

  # Log output directory (use relative path or set via environment variable)
  output_dir: "./logs"

  # Log file naming
  filename_prefix: "drt"
  include_timestamp: true

  # Console output
  console:
    enabled: true
    level: "INFO"
    colored: true  # Use colored output (if terminal supports it)

  # File output
  file:
    enabled: true
    level: "DEBUG"
    max_size_mb: 10  # Rotate after 10MB
    backup_count: 5  # Keep 5 backup files

# ============================================================================
# EXECUTION SETTINGS
# Configure execution behavior
# ============================================================================
execution:
  # Parallel execution (future feature)
  parallel:
    enabled: false
    max_workers: 4

  # Retry settings for transient failures
  retry:
    enabled: true
    max_attempts: 3
    delay_seconds: 5

  # Performance settings
  performance:
    batch_size: 1000  # Rows per batch for large queries
    use_nolock: true  # Use NOLOCK hints (read uncommitted)
    connection_pooling: true

# ============================================================================
# FILTERS
# Global filters applied to all tables
# ============================================================================
filters:
  # Schema filters (include/exclude patterns)
  schemas:
    include:
      - "dbo"
      - "staging"
      - "ref"
    exclude:
      - "sys"
      - "temp"

  # Table name filters (wildcard patterns)
  tables:
    include:
      - "*"          # Include all tables
    exclude:
      - "tmp_*"      # Exclude temporary tables
      - "backup_*"   # Exclude backup tables
      - "archive_*"  # Exclude archive tables

  # Column filters for aggregate comparisons
  columns:
    exclude_patterns:
      - "*_id"       # Exclude ID columns
      - "*_key"      # Exclude key columns
      - "created_*"  # Exclude audit columns
      - "modified_*" # Exclude audit columns

# ============================================================================
# NOTIFICATIONS (future feature)
# Configure notifications for test results
# ============================================================================
notifications:
  enabled: false

  # Email notifications
  email:
    enabled: false
    smtp_server: "smtp.company.com"
    smtp_port: 587
    from_address: "drt@company.com"
    to_addresses:
      - "qa-team@company.com"
    on_failure_only: true

  # Slack notifications
  slack:
    enabled: false
    webhook_url: "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
    channel: "#qa-alerts"
    on_failure_only: true

# ============================================================================
# METADATA
# Optional metadata about this configuration
# ============================================================================
metadata:
  version: "1.0"
  created_by: "QA Team"
  created_date: "2024-01-15"
  description: "Standard regression test configuration for DWH migration"
  project: "DWH Migration Phase 2"
  environment: "UAT"
  tags:
    - "migration"
    - "data-quality"
    - "regression"
config.quickstart.yaml (Executable file, 46 lines)
@@ -0,0 +1,46 @@
# Quick Start Configuration
# Minimal configuration to get started quickly

database_pairs:
  - name: "Quick_Test"
    enabled: true
    baseline:
      server: "YOUR_SERVER_NAME"
      database: "YOUR_BASELINE_DB"
    target:
      server: "YOUR_SERVER_NAME"
      database: "YOUR_TARGET_DB"

comparison:
  mode: "health_check"
  row_count:
    enabled: true
    tolerance_percent: 0.0
  schema:
    enabled: true
    checks:
      column_names: true
      data_types: true
  aggregates:
    enabled: false

tables:
  # Add your tables here after running discovery
  # Example:
  # - schema: "dbo"
  #   name: "YourTable"
  #   enabled: true
  #   expected_in_target: true
  #   aggregate_columns: []

reporting:
  output_dir: "./reports"
  investigation_dir: "./investigation_reports"
  formats:
    html: true
    csv: true
    pdf: false

logging:
  level: "INFO"
  output_dir: "./logs"
config.test.yaml (Executable file, 83 lines)
@@ -0,0 +1,83 @@
# Test Configuration for Docker SQL Server Environment
# Use this configuration with the Docker test environment

database_pairs:
  - name: "Docker_Test_Comparison"
    enabled: true
    description: "Compare Docker test databases"
    baseline:
      server: "localhost,1433"
      database: "TestDB_Baseline"
      # Use environment variables for credentials: DRT_DB_USERNAME, DRT_DB_PASSWORD
      # username: "${DRT_DB_USERNAME}"
      # password: "${DRT_DB_PASSWORD}"
      timeout:
        connection: 30
        query: 300
    target:
      server: "localhost,1434"
      database: "TestDB_Target"
      # Use environment variables for credentials: DRT_DB_USERNAME, DRT_DB_PASSWORD
      # username: "${DRT_DB_USERNAME}"
      # password: "${DRT_DB_PASSWORD}"
      timeout:
        connection: 30
        query: 300

comparison:
  mode: "health_check"
  row_count:
    enabled: true
    tolerance_percent: 0.0
  schema:
    enabled: true
    checks:
      column_names: true
      data_types: true
  aggregates:
    enabled: true
    tolerance_percent: 0.01

tables:
  - schema: "dbo"
    name: "DimTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns: []
    notes: "Example dimension table"

  - schema: "dbo"
    name: "DimTable2"
    enabled: true
    expected_in_target: true
    aggregate_columns: []
    notes: "Example dimension table with schema differences"

  - schema: "dbo"
    name: "FactTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns:
      - "Quantity"
      - "Amount"
      - "Tax"
    notes: "Example fact table with numeric aggregates"

  - schema: "dbo"
    name: "TempTable1"
    enabled: true
    expected_in_target: false
    aggregate_columns: []
    notes: "Example temporary table - only exists in target"

reporting:
  output_directory: "/home/user/reports"
  investigation_directory: "/home/user/investigation_reports"
  formats: ["html", "csv"]
  filename_template: "test_regression_{timestamp}"

logging:
  level: "INFO"
  directory: "/home/user/logs"
  filename_template: "drt_test_{timestamp}.log"
  console: true
config/.gitkeep (Executable file, empty)
docker-compose.test.yml (Executable file, 52 lines)
@@ -0,0 +1,52 @@
version: '3.8'

services:
  # SQL Server 2022 - Baseline (Production)
  sqlserver-baseline:
    image: mcr.microsoft.com/mssql/server:2022-latest
    container_name: drt-sqlserver-baseline
    environment:
      - ACCEPT_EULA=Y
      - SA_PASSWORD=${SA_PASSWORD:-YourStrong!Passw0rd}
      - MSSQL_PID=Developer
    ports:
      - "1433:1433"
    volumes:
      - ./test_data/init_baseline.sql:/docker-entrypoint-initdb.d/init.sql
      - sqlserver_baseline_data:/var/opt/mssql
    healthcheck:
      test: /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P ${SA_PASSWORD:-YourStrong!Passw0rd} -Q "SELECT 1"
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - drt-network

  # SQL Server 2022 - Target (Test)
  sqlserver-target:
    image: mcr.microsoft.com/mssql/server:2022-latest
    container_name: drt-sqlserver-target
    environment:
      - ACCEPT_EULA=Y
      - SA_PASSWORD=${SA_PASSWORD:-YourStrong!Passw0rd}
      - MSSQL_PID=Developer
    ports:
      - "1434:1433"
    volumes:
      - ./test_data/init_target.sql:/docker-entrypoint-initdb.d/init.sql
      - sqlserver_target_data:/var/opt/mssql
    healthcheck:
      test: /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P ${SA_PASSWORD:-YourStrong!Passw0rd} -Q "SELECT 1"
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - drt-network

volumes:
  sqlserver_baseline_data:
  sqlserver_target_data:

networks:
  drt-network:
    driver: bridge
install_docker_debian.sh (Executable file, 121 lines)
@@ -0,0 +1,121 @@
#!/bin/bash
# Docker Installation Script for Debian 12

set -e

echo "=========================================="
echo "Docker Installation for Debian 12"
echo "=========================================="
echo ""

# Check if running as root
if [ "$EUID" -ne 0 ]; then
    echo "Please run with sudo: sudo bash install_docker_debian.sh"
    exit 1
fi

# Detect OS
if [ -f /etc/os-release ]; then
    . /etc/os-release
    OS=$ID
    VER=$VERSION_ID
    echo "Detected OS: $PRETTY_NAME"
else
    echo "Cannot detect OS version"
    exit 1
fi

# Remove old versions
echo ""
echo "Step 1: Removing old Docker versions (if any)..."
apt-get remove -y docker docker-engine docker.io containerd runc 2>/dev/null || true

# Install prerequisites
echo ""
echo "Step 2: Installing prerequisites..."
apt-get update
apt-get install -y \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

# Add Docker's official GPG key
echo ""
echo "Step 3: Adding Docker GPG key..."
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg

# Set up Docker repository
echo ""
echo "Step 4: Adding Docker repository..."
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
echo ""
echo "Step 5: Installing Docker Engine..."
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Start Docker service
echo ""
echo "Step 6: Starting Docker service..."
systemctl start docker
systemctl enable docker

# Add current user to docker group (if not root)
if [ -n "$SUDO_USER" ]; then
    echo ""
    echo "Step 7: Adding user $SUDO_USER to docker group..."
    usermod -aG docker $SUDO_USER
    echo "Note: You'll need to log out and back in for group changes to take effect"
fi

# Verify installation
echo ""
echo "Step 8: Verifying Docker installation..."
if docker --version; then
    echo "✓ Docker installed successfully"
else
    echo "✗ Docker installation failed"
    exit 1
fi

if docker compose version; then
    echo "✓ Docker Compose installed successfully"
else
    echo "✗ Docker Compose installation failed"
    exit 1
fi

# Test Docker
echo ""
echo "Step 9: Testing Docker..."
if docker run --rm hello-world > /dev/null 2>&1; then
    echo "✓ Docker is working correctly"
else
    echo "⚠ Docker test failed - you may need to log out and back in"
fi

echo ""
echo "=========================================="
echo "Installation completed successfully!"
echo "=========================================="
echo ""
echo "Docker version:"
docker --version
echo ""
echo "Docker Compose version:"
docker compose version
echo ""
echo "IMPORTANT: If you're not root, log out and back in for group changes to take effect"
echo ""
echo "Next steps:"
echo "1. Log out and back in (or run: newgrp docker)"
echo "2. Test Docker: docker run hello-world"
echo "3. Set up test environment: bash test_data/setup_test_environment.sh"
echo ""
install_odbc_debian.sh (Executable file, 112 lines)
@@ -0,0 +1,112 @@
#!/bin/bash
# ODBC Driver Installation Script for Debian 12
# This script installs Microsoft ODBC Driver 18 for SQL Server

set -e

echo "=========================================="
echo "ODBC Driver Installation for Debian 12"
echo "=========================================="
echo ""

# Check if running as root
if [ "$EUID" -ne 0 ]; then
    echo "Please run with sudo: sudo bash install_odbc_debian.sh"
    exit 1
fi

# Detect OS
if [ -f /etc/os-release ]; then
    . /etc/os-release
    OS=$ID
    VER=$VERSION_ID
    echo "Detected OS: $PRETTY_NAME"
else
    echo "Cannot detect OS version"
    exit 1
fi

# Clean up any corrupted repository files
echo ""
echo "Step 1: Cleaning up any previous installation attempts..."
if [ -f /etc/apt/sources.list.d/mssql-release.list ]; then
    echo "Removing corrupted mssql-release.list..."
    rm -f /etc/apt/sources.list.d/mssql-release.list
fi

# Install prerequisites
echo ""
echo "Step 2: Installing prerequisites..."
apt-get update
apt-get install -y curl gnupg2 apt-transport-https ca-certificates

# Add Microsoft GPG key
echo ""
echo "Step 3: Adding Microsoft GPG key..."
curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg

# Add Microsoft repository based on OS
echo ""
echo "Step 4: Adding Microsoft repository..."
if [ "$OS" = "debian" ]; then
    if [ "$VER" = "12" ]; then
        curl https://packages.microsoft.com/config/debian/12/prod.list | tee /etc/apt/sources.list.d/mssql-release.list
    elif [ "$VER" = "11" ]; then
        curl https://packages.microsoft.com/config/debian/11/prod.list | tee /etc/apt/sources.list.d/mssql-release.list
    else
        echo "Unsupported Debian version: $VER"
        exit 1
    fi
elif [ "$OS" = "ubuntu" ]; then
    curl https://packages.microsoft.com/config/ubuntu/$VER/prod.list | tee /etc/apt/sources.list.d/mssql-release.list
else
    echo "Unsupported OS: $OS"
    exit 1
fi

# Update package list
echo ""
echo "Step 5: Updating package list..."
apt-get update

# Install ODBC Driver
echo ""
echo "Step 6: Installing ODBC Driver 18 for SQL Server..."
ACCEPT_EULA=Y apt-get install -y msodbcsql18

# Install unixODBC development headers
echo ""
echo "Step 7: Installing unixODBC development headers..."
apt-get install -y unixodbc-dev

# Verify installation
echo ""
echo "Step 8: Verifying installation..."
if odbcinst -q -d -n "ODBC Driver 18 for SQL Server" > /dev/null 2>&1; then
    echo "✓ ODBC Driver 18 for SQL Server installed successfully"
    odbcinst -q -d -n "ODBC Driver 18 for SQL Server"
else
    echo "✗ ODBC Driver installation failed"
    exit 1
fi

# Check for ODBC Driver 17 as fallback
if odbcinst -q -d -n "ODBC Driver 17 for SQL Server" > /dev/null 2>&1; then
    echo "✓ ODBC Driver 17 for SQL Server also available"
fi

echo ""
echo "=========================================="
echo "Installation completed successfully!"
echo "=========================================="
echo ""
echo "Next steps:"
echo "1. Install Python dependencies: pip install -r requirements.txt"
echo "2. Install the framework: pip install -e ."
echo "3. Test the installation: drt --version"
echo ""
echo "For Windows Authentication, you'll also need to:"
echo "1. Install Kerberos: apt-get install -y krb5-user"
echo "2. Configure /etc/krb5.conf with your domain settings"
echo "3. Get a Kerberos ticket: kinit username@YOUR_DOMAIN.COM"
echo ""
pyproject.toml (Executable file, 73 lines)
@@ -0,0 +1,73 @@
[project]
|
||||||
|
name = "data-regression-tester"
|
||||||
|
version = "1.0.0"
|
||||||
|
description = "Data Regression Testing Framework for SQL Server"
|
||||||
|
readme = "README.md"
|
||||||
|
requires-python = ">=3.9"
|
||||||
|
license = {text = "MIT"}
|
||||||
|
authors = [
|
||||||
|
{name = "QA Engineering Team"}
|
||||||
|
]
|
||||||
|
keywords = ["data", "regression", "testing", "sql-server", "comparison"]
|
||||||
|
classifiers = [
|
||||||
|
"Development Status :: 4 - Beta",
|
||||||
|
"Environment :: Console",
|
||||||
|
"Intended Audience :: Developers",
|
||||||
|
"Operating System :: Microsoft :: Windows",
|
||||||
|
"Programming Language :: Python :: 3.9",
|
||||||
|
"Programming Language :: Python :: 3.10",
|
||||||
|
"Programming Language :: Python :: 3.11",
|
||||||
|
"Programming Language :: Python :: 3.12",
|
||||||
|
"Topic :: Database",
|
||||||
|
"Topic :: Software Development :: Testing",
|
||||||
|
]
|
||||||
|
|
||||||
|
dependencies = [
|
||||||
|
"pandas>=2.0",
|
||||||
|
"sqlalchemy>=2.0",
|
||||||
|
"pyodbc>=4.0",
|
||||||
|
"pyyaml>=6.0",
|
||||||
|
"pydantic>=2.0",
|
||||||
|
"click>=8.0",
|
||||||
|
"rich>=13.0",
|
||||||
|
"jinja2>=3.0",
|
||||||
|
"weasyprint>=60.0",
|
||||||
|
]
|
||||||
|
|
||||||
|
[project.optional-dependencies]
|
||||||
|
dev = [
|
||||||
|
"pytest>=7.0",
|
||||||
|
"pytest-cov>=4.0",
|
||||||
|
"black>=23.0",
|
||||||
|
"ruff>=0.1.0",
|
||||||
|
"mypy>=1.0",
|
||||||
|
"pre-commit>=3.0",
|
||||||
|
]
|
||||||
|
|
||||||
|
[project.scripts]
|
||||||
|
drt = "drt.cli.main:cli"
|
||||||
|
|
||||||
|
[build-system]
|
||||||
|
requires = ["setuptools>=61.0", "wheel"]
|
||||||
|
build-backend = "setuptools.build_meta"
|
||||||
|
|
||||||
|
[tool.setuptools.packages.find]
|
||||||
|
where = ["src"]
|
||||||
|
|
||||||
|
[tool.black]
|
||||||
|
line-length = 100
|
||||||
|
target-version = ["py39", "py310", "py311", "py312"]
|
||||||
|
|
||||||
|
[tool.ruff]
|
||||||
|
line-length = 100
|
||||||
|
select = ["E", "F", "W", "I", "N", "UP", "B", "C4"]
|
||||||
|
|
||||||
|
[tool.mypy]
|
||||||
|
python_version = "3.9"
|
||||||
|
warn_return_any = true
|
||||||
|
warn_unused_configs = true
|
||||||
|
ignore_missing_imports = true
|
||||||
|
|
||||||
|
[tool.pytest.ini_options]
|
||||||
|
testpaths = ["tests"]
|
||||||
|
addopts = "-v --cov=drt --cov-report=term-missing"
|
||||||
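The [project.scripts] entry above is what makes the `drt` command available after `pip install -e .`; the generated console script simply imports and calls the Click group. A minimal sketch of the equivalent Python:

# Equivalent of running `drt` on the command line,
# per the "drt = drt.cli.main:cli" entry point above.
from drt.cli.main import cli

cli()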
14
pytest.ini
Executable file
@@ -0,0 +1,14 @@
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts =
    -v
    --strict-markers
    --tb=short
    --disable-warnings
markers =
    unit: Unit tests
    integration: Integration tests
    slow: Slow running tests
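Note that when both pytest.ini and [tool.pytest.ini_options] in pyproject.toml are present, pytest.ini takes precedence, so the addopts above (not the coverage flags in pyproject.toml) are what actually apply. The registered markers are used like this; a hypothetical test module for illustration (--strict-markers rejects any marker not declared above):

import pytest

@pytest.mark.unit
def test_example():
    # placeholder body; real tests live under tests/
    assert True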
9
requirements.txt
Executable file
@@ -0,0 +1,9 @@
pandas>=2.0
sqlalchemy>=2.0
pyodbc>=4.0
pyyaml>=6.0
pydantic>=2.0
click>=8.0
rich>=13.0
jinja2>=3.0
weasyprint>=60.0
14
src/drt/__init__.py
Executable file
@@ -0,0 +1,14 @@
"""
Data Regression Testing Framework

A comprehensive framework for validating data integrity during code migration
and system updates by comparing data outputs between Baseline (Production)
and Target (Test) SQL Server databases.
"""

__version__ = "1.0.0"
__author__ = "QA Engineering Team"

from drt.models.enums import Status, CheckType

__all__ = ["__version__", "__author__", "Status", "CheckType"]
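With the package installed, the public surface re-exported above is importable directly; a minimal sketch (Status members such as PASS, FAIL, and SKIP are referenced later in this commit, in the executor module):

from drt import __version__, Status, CheckType

print(__version__)  # "1.0.0"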
11
src/drt/__main__.py
Executable file
@@ -0,0 +1,11 @@
"""
Entry point for running the framework as a module.

Usage:
    python -m drt <command> [options]
"""

from drt.cli.main import cli

if __name__ == "__main__":
    cli()
5
src/drt/cli/__init__.py
Executable file
@@ -0,0 +1,5 @@
"""Command-line interface for the framework."""

from drt.cli.main import cli

__all__ = ["cli"]
5
src/drt/cli/commands/__init__.py
Executable file
@@ -0,0 +1,5 @@
"""CLI commands."""

from drt.cli.commands import discover, compare, validate, investigate

__all__ = ["discover", "compare", "validate", "investigate"]
137
src/drt/cli/commands/compare.py
Executable file
@@ -0,0 +1,137 @@
"""Compare command implementation."""

import click
import sys
from drt.config.loader import load_config
from drt.services.comparison import ComparisonService
from drt.reporting.generator import ReportGenerator
from drt.utils.logging import setup_logging, get_logger

logger = get_logger(__name__)


@click.command()
@click.option('--config', '-c', required=True, type=click.Path(exists=True), help='Configuration file path')
@click.option('--verbose', '-v', is_flag=True, help='Enable verbose output')
@click.option('--dry-run', is_flag=True, help='Show what would be compared without executing')
def compare(config, verbose, dry_run):
    """
    Execute comparison between Baseline and Target databases.

    Compares configured tables between baseline and target databases,
    checking for data regression issues.

    Example:
        drt compare --config ./config.yaml
    """
    # Load config first to get log directory
    cfg = load_config(config)

    # Setup logging using config
    log_level = "DEBUG" if verbose else "INFO"
    log_dir = cfg.logging.directory
    setup_logging(log_level=log_level, log_dir=log_dir, log_to_file=not dry_run)

    click.echo("=" * 60)
    click.echo("Data Regression Testing Framework")
    click.echo("=" * 60)
    click.echo()

    try:
        click.echo(f"Configuration: {config}")
        click.echo("✓ Configuration loaded")
        click.echo(f"  Database pairs: {len(cfg.database_pairs)}")
        click.echo(f"  Tables configured: {len(cfg.tables)}")
        click.echo()

        if dry_run:
            click.echo("=" * 60)
            click.echo("DRY RUN - Preview Only")
            click.echo("=" * 60)

            for pair in cfg.database_pairs:
                if not pair.enabled:
                    continue

                click.echo(f"\nDatabase Pair: {pair.name}")
                click.echo(f"  Baseline: {pair.baseline.server}.{pair.baseline.database}")
                click.echo(f"  Target: {pair.target.server}.{pair.target.database}")

            # Count enabled tables
            enabled_tables = [t for t in cfg.tables if t.enabled]
            click.echo(f"  Tables to compare: {len(enabled_tables)}")

            click.echo("\n" + "=" * 60)
            click.echo("Use without --dry-run to execute comparison")
            click.echo("=" * 60)
            sys.exit(0)

        # Execute comparison for each database pair
        all_summaries = []

        for pair in cfg.database_pairs:
            if not pair.enabled:
                click.echo(f"Skipping disabled pair: {pair.name}")
                continue

            click.echo(f"Comparing: {pair.name}")
            click.echo(f"  Baseline: {pair.baseline.server}.{pair.baseline.database}")
            click.echo(f"  Target: {pair.target.server}.{pair.target.database}")
            click.echo()

            # Run comparison
            comparison_service = ComparisonService(cfg)
            summary = comparison_service.run_comparison(pair)
            all_summaries.append(summary)

            click.echo()

        # Generate reports for all summaries
        if all_summaries:
            click.echo("=" * 60)
            click.echo("Generating Reports")
            click.echo("=" * 60)

            report_gen = ReportGenerator(cfg)

            for summary in all_summaries:
                report_files = report_gen.generate_reports(summary)

                for filepath in report_files:
                    click.echo(f"  ✓ {filepath}")

            click.echo()

        # Display final summary
        click.echo("=" * 60)
        click.echo("EXECUTION COMPLETE")
        click.echo("=" * 60)

        total_passed = sum(s.passed for s in all_summaries)
        total_failed = sum(s.failed for s in all_summaries)
        total_warnings = sum(s.warnings for s in all_summaries)
        total_errors = sum(s.errors for s in all_summaries)

        click.echo(f"  PASS: {total_passed:3d}")
        click.echo(f"  FAIL: {total_failed:3d}")
        click.echo(f"  WARNING: {total_warnings:3d}")
        click.echo(f"  ERROR: {total_errors:3d}")
        click.echo("=" * 60)

        # Exit with appropriate code
        if total_errors > 0 or total_failed > 0:
            click.echo("Status: FAILED ❌")
            sys.exit(1)
        else:
            click.echo("Status: PASSED ✓")
            sys.exit(0)

    except Exception as e:
        logger.error(f"Comparison failed: {e}", exc_info=verbose)
        click.echo(f"✗ Error: {e}", err=True)
        sys.exit(2)
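The command can be exercised without a real shell using Click's test runner; a hedged sketch, assuming a valid ./config.yaml exists (the command calls sys.exit, so the runner reports exit_code rather than raising):

from click.testing import CliRunner
from drt.cli.commands.compare import compare

runner = CliRunner()
result = runner.invoke(compare, ["--config", "./config.yaml", "--dry-run"])
print(result.exit_code)  # 0 for a successful dry run
print(result.output)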
118
src/drt/cli/commands/discover.py
Executable file
@@ -0,0 +1,118 @@
"""Discovery command implementation."""

import click
import sys
from drt.services.discovery import DiscoveryService
from drt.config.models import ConnectionConfig, Config
from drt.config.loader import save_config
from drt.utils.logging import setup_logging, get_logger

logger = get_logger(__name__)


@click.command()
@click.option('--server', required=True, help='SQL Server hostname or instance')
@click.option('--database', required=True, help='Database name to discover')
@click.option('--output', '-o', default='./config_discovered.yaml', help='Output configuration file')
@click.option('--schemas', multiple=True, help='Specific schemas to include (can specify multiple)')
@click.option('--verbose', '-v', is_flag=True, help='Enable verbose output')
def discover(server, database, output, schemas, verbose):
    """
    Discover tables and generate configuration file.

    Scans the specified database and automatically generates a configuration
    file with all discovered tables, columns, and metadata.

    Example:
        drt discover --server SQLSERVER01 --database ORBIS_DWH_PROD
    """
    # Setup logging
    log_level = "DEBUG" if verbose else "INFO"
    setup_logging(log_level=log_level)

    click.echo("=" * 60)
    click.echo("Data Regression Testing Framework - Discovery Mode")
    click.echo("=" * 60)
    click.echo()

    try:
        # Create connection config
        conn_config = ConnectionConfig(
            server=server,
            database=database
        )

        # Create base config with schema filters if provided
        config = Config()
        if schemas:
            config.discovery.include_schemas = list(schemas)

        # Initialize discovery service
        click.echo(f"Connecting to {server}.{database}...")
        discovery_service = DiscoveryService(conn_config, config)

        # Test connection
        if not discovery_service.conn_mgr.test_connection():
            click.echo("✗ Connection failed", err=True)
            sys.exit(2)

        click.echo("✓ Connected (Windows Authentication)")
        click.echo()

        # Discover tables
        click.echo("Scanning tables...")
        tables = discovery_service.discover_tables()

        if not tables:
            click.echo("⚠ No tables found", err=True)
            sys.exit(0)

        click.echo(f"✓ Found {len(tables)} tables")
        click.echo()

        # Generate configuration
        click.echo("Generating configuration...")
        generated_config = discovery_service.generate_config(tables)

        # Save configuration
        save_config(generated_config, output)
        click.echo(f"✓ Configuration saved to: {output}")
        click.echo()

        # Display summary
        click.echo("=" * 60)
        click.echo("Discovery Summary")
        click.echo("=" * 60)
        click.echo(f"  Tables discovered: {len(tables)}")

        # Count columns
        total_cols = sum(len(t.columns) for t in tables)
        click.echo(f"  Total columns: {total_cols}")

        # Count numeric columns
        numeric_cols = sum(len(t.aggregate_columns) for t in tables)
        click.echo(f"  Numeric columns: {numeric_cols}")

        # Show largest tables
        if tables:
            largest = sorted(tables, key=lambda t: t.estimated_row_count, reverse=True)[:3]
            click.echo()
            click.echo("  Largest tables:")
            for table in largest:
                click.echo(f"    • {table.full_name:40s} {table.estimated_row_count:>12,} rows")

        click.echo()
        click.echo("=" * 60)
        click.echo("Next Steps:")
        click.echo(f"  1. Review {output}")
        click.echo("  2. Configure target database connection")
        click.echo("  3. Set 'expected_in_target: false' for tables being removed")
        click.echo(f"  4. Run: drt compare --config {output}")
        click.echo("=" * 60)

        sys.exit(0)

    except Exception as e:
        logger.error(f"Discovery failed: {e}", exc_info=verbose)
        click.echo(f"✗ Error: {e}", err=True)
        sys.exit(2)
177
src/drt/cli/commands/investigate.py
Normal file
@@ -0,0 +1,177 @@
"""Investigate command implementation."""

import click
import sys
from pathlib import Path
from drt.config.loader import load_config
from drt.services.investigation import InvestigationService
from drt.reporting.investigation_report import (
    InvestigationHTMLReportGenerator,
    InvestigationCSVReportGenerator
)
from drt.utils.logging import setup_logging, get_logger
from drt.utils.timestamps import get_timestamp

logger = get_logger(__name__)


@click.command()
@click.option('--analysis-dir', '-a', required=True, type=click.Path(exists=True),
              help='Analysis output directory containing *_investigate.sql files')
@click.option('--config', '-c', required=True, type=click.Path(exists=True),
              help='Configuration file path')
@click.option('--output-dir', '-o', default=None,
              help='Output directory for reports (overrides config setting)')
@click.option('--verbose', '-v', is_flag=True, help='Enable verbose output')
@click.option('--dry-run', is_flag=True, help='Show what would be executed without running')
def investigate(analysis_dir, config, output_dir, verbose, dry_run):
    """
    Execute investigation queries from regression analysis.

    Processes all *_investigate.sql files in the analysis directory,
    executes queries on both baseline and target databases, and
    generates comprehensive reports.

    Example:
        drt investigate -a /home/user/analysis/output_20251209_184032/ -c config.yaml
    """
    # Load config first to get log directory
    cfg = load_config(config)

    # Setup logging using config
    log_level = "DEBUG" if verbose else "INFO"
    log_dir = cfg.logging.directory
    setup_logging(log_level=log_level, log_dir=log_dir, log_to_file=not dry_run)

    click.echo("=" * 60)
    click.echo("Data Regression Testing Framework - Investigation")
    click.echo("=" * 60)
    click.echo()

    try:
        # Use output_dir from CLI if provided, otherwise use config
        if output_dir is None:
            output_dir = cfg.reporting.investigation_directory

        click.echo("✓ Configuration loaded")
        click.echo(f"  Database pairs: {len(cfg.database_pairs)}")
        click.echo()

        # Convert paths
        analysis_path = Path(analysis_dir)
        output_path = Path(output_dir)

        # Create output directory
        output_path.mkdir(parents=True, exist_ok=True)

        if dry_run:
            click.echo("=" * 60)
            click.echo("DRY RUN - Preview Only")
            click.echo("=" * 60)

            # Discover SQL files
            from drt.services.sql_parser import discover_sql_files
            sql_files = discover_sql_files(analysis_path)

            click.echo(f"\nAnalysis Directory: {analysis_path}")
            click.echo(f"Found {len(sql_files)} investigation SQL files")

            if sql_files:
                click.echo("\nTables with investigation queries:")
                for schema, table, sql_path in sql_files[:10]:  # Show first 10
                    click.echo(f"  • {schema}.{table}")

                if len(sql_files) > 10:
                    click.echo(f"  ... and {len(sql_files) - 10} more")

            for pair in cfg.database_pairs:
                if not pair.enabled:
                    continue

                click.echo(f"\nDatabase Pair: {pair.name}")
                click.echo(f"  Baseline: {pair.baseline.server}.{pair.baseline.database}")
                click.echo(f"  Target: {pair.target.server}.{pair.target.database}")

            click.echo(f"\nReports would be saved to: {output_path}")
            click.echo("\n" + "=" * 60)
            click.echo("Use without --dry-run to execute investigation")
            click.echo("=" * 60)
            sys.exit(0)

        # Execute investigation for each database pair
        all_summaries = []

        for pair in cfg.database_pairs:
            if not pair.enabled:
                click.echo(f"Skipping disabled pair: {pair.name}")
                continue

            click.echo(f"Investigating: {pair.name}")
            click.echo(f"  Baseline: {pair.baseline.server}.{pair.baseline.database}")
            click.echo(f"  Target: {pair.target.server}.{pair.target.database}")
            click.echo()

            # Run investigation
            investigation_service = InvestigationService(cfg)
            summary = investigation_service.run_investigation(analysis_path, pair)
            all_summaries.append(summary)

            click.echo()

        # Generate reports for all summaries
        if all_summaries:
            click.echo("=" * 60)
            click.echo("Generating Reports")
            click.echo("=" * 60)

            for summary in all_summaries:
                timestamp = get_timestamp()

                # Generate HTML report
                html_gen = InvestigationHTMLReportGenerator(cfg)
                html_path = output_path / f"investigation_report_{timestamp}.html"
                html_gen.generate(summary, html_path)
                click.echo(f"  ✓ HTML: {html_path}")

                # Generate CSV report
                csv_gen = InvestigationCSVReportGenerator(cfg)
                csv_path = output_path / f"investigation_report_{timestamp}.csv"
                csv_gen.generate(summary, csv_path)
                click.echo(f"  ✓ CSV: {csv_path}")

            click.echo()

        # Display final summary
        click.echo("=" * 60)
        click.echo("INVESTIGATION COMPLETE")
        click.echo("=" * 60)

        total_processed = sum(s.tables_processed for s in all_summaries)
        total_successful = sum(s.tables_successful for s in all_summaries)
        total_partial = sum(s.tables_partial for s in all_summaries)
        total_failed = sum(s.tables_failed for s in all_summaries)
        total_queries = sum(s.total_queries_executed for s in all_summaries)

        click.echo(f"  Tables Processed: {total_processed:3d}")
        click.echo(f"  Successful: {total_successful:3d}")
        click.echo(f"  Partial: {total_partial:3d}")
        click.echo(f"  Failed: {total_failed:3d}")
        click.echo(f"  Total Queries: {total_queries:3d}")
        click.echo("=" * 60)

        # Exit with appropriate code
        if total_failed > 0:
            click.echo("Status: COMPLETED WITH FAILURES ⚠️")
            sys.exit(1)
        elif total_partial > 0:
            click.echo("Status: COMPLETED WITH PARTIAL RESULTS ◐")
            sys.exit(0)
        else:
            click.echo("Status: SUCCESS ✓")
            sys.exit(0)

    except Exception as e:
        logger.error(f"Investigation failed: {e}", exc_info=verbose)
        click.echo(f"✗ Error: {e}", err=True)
        sys.exit(2)
92
src/drt/cli/commands/validate.py
Executable file
@@ -0,0 +1,92 @@
"""Validate command implementation."""

import click
import sys
from drt.config.loader import load_config
from drt.config.validator import validate_config
from drt.utils.logging import setup_logging, get_logger

logger = get_logger(__name__)


@click.command()
@click.option('--config', '-c', required=True, type=click.Path(exists=True), help='Configuration file path')
@click.option('--verbose', '-v', is_flag=True, help='Enable verbose output')
def validate(config, verbose):
    """
    Validate configuration file without running comparison.

    Checks configuration for completeness and correctness, reporting
    any errors or warnings.

    Example:
        drt validate --config ./config.yaml
    """
    # Setup logging
    log_level = "DEBUG" if verbose else "INFO"
    setup_logging(log_level=log_level, log_to_console=True, log_to_file=False)

    click.echo("=" * 60)
    click.echo("Configuration Validation")
    click.echo("=" * 60)
    click.echo()

    try:
        # Load configuration
        click.echo(f"Loading: {config}")
        cfg = load_config(config)
        click.echo("✓ YAML syntax valid")
        click.echo("✓ Configuration structure valid")
        click.echo()

        # Validate configuration
        click.echo("Validating configuration...")
        is_valid, errors = validate_config(cfg)

        if errors:
            click.echo()
            click.echo("Validation Errors:")
            for error in errors:
                click.echo(f"  ✗ {error}", err=True)
            click.echo()

        # Display configuration summary
        click.echo("=" * 60)
        click.echo("Configuration Summary")
        click.echo("=" * 60)
        click.echo(f"  Database pairs: {len(cfg.database_pairs)}")
        click.echo(f"  Tables configured: {len(cfg.tables)}")
        click.echo(f"  Enabled tables: {sum(1 for t in cfg.tables if t.enabled)}")
        click.echo(f"  Disabled tables: {sum(1 for t in cfg.tables if not t.enabled)}")
        click.echo()

        # Check for tables not expected in target
        not_expected = sum(1 for t in cfg.tables if not t.expected_in_target)
        if not_expected > 0:
            click.echo(f"  ⚠ {not_expected} table(s) marked as expected_in_target: false")

        # Display database pairs
        click.echo()
        click.echo("Database Pairs:")
        for pair in cfg.database_pairs:
            status = "✓" if pair.enabled else "○"
            click.echo(f"  {status} {pair.name}")
            click.echo(f"    Baseline: {pair.baseline.server}.{pair.baseline.database}")
            click.echo(f"    Target: {pair.target.server}.{pair.target.database}")

        click.echo()
        click.echo("=" * 60)

        if is_valid:
            click.echo("Configuration is VALID ✓")
            click.echo("=" * 60)
            sys.exit(0)
        else:
            click.echo("Configuration is INVALID ✗")
            click.echo("=" * 60)
            sys.exit(1)

    except Exception as e:
        logger.error(f"Validation failed: {e}", exc_info=verbose)
        click.echo(f"✗ Error: {e}", err=True)
        sys.exit(2)
52
src/drt/cli/main.py
Executable file
@@ -0,0 +1,52 @@
"""Main CLI entry point."""

import click
from drt import __version__
from drt.cli.commands import discover, compare, validate, investigate
from drt.utils.logging import setup_logging


@click.group()
@click.version_option(version=__version__, prog_name="drt")
@click.option('--verbose', '-v', is_flag=True, help='Enable verbose output')
@click.pass_context
def cli(ctx, verbose):
    """
    Data Regression Testing Framework

    A comprehensive framework for validating data integrity during code migration
    and system updates by comparing data outputs between Baseline (Production)
    and Target (Test) SQL Server databases.
    """
    ctx.ensure_object(dict)
    ctx.obj['verbose'] = verbose

    # Setup logging
    log_level = "DEBUG" if verbose else "INFO"
    setup_logging(log_level=log_level, log_to_console=True, log_to_file=False)


@cli.command()
def version():
    """Display version information."""
    import platform

    click.echo("=" * 60)
    click.echo("Data Regression Testing Framework")
    click.echo("=" * 60)
    click.echo(f"Version: {__version__}")
    click.echo(f"Python: {platform.python_version()}")
    click.echo(f"Platform: {platform.platform()}")
    click.echo("=" * 60)


# Register commands
cli.add_command(discover.discover)
cli.add_command(compare.compare)
cli.add_command(validate.validate)
cli.add_command(investigate.investigate)


if __name__ == '__main__':
    cli()
7
src/drt/config/__init__.py
Executable file
@@ -0,0 +1,7 @@
"""Configuration management for the framework."""

from drt.config.loader import load_config
from drt.config.validator import validate_config
from drt.config.models import Config

__all__ = ["load_config", "validate_config", "Config"]
84
src/drt/config/loader.py
Executable file
@@ -0,0 +1,84 @@
"""Configuration file loader."""

import yaml
from pathlib import Path
from typing import Union
from drt.config.models import Config
from drt.utils.logging import get_logger

logger = get_logger(__name__)


def load_config(config_path: Union[str, Path]) -> Config:
    """
    Load configuration from YAML file.

    Args:
        config_path: Path to configuration file

    Returns:
        Parsed configuration object

    Raises:
        FileNotFoundError: If config file doesn't exist
        yaml.YAMLError: If YAML is invalid
        ValueError: If configuration is invalid
    """
    config_path = Path(config_path)

    if not config_path.exists():
        raise FileNotFoundError(f"Configuration file not found: {config_path}")

    logger.info(f"Loading configuration from: {config_path}")

    try:
        with open(config_path, "r", encoding="utf-8") as f:
            config_data = yaml.safe_load(f)

        if not config_data:
            raise ValueError("Configuration file is empty")

        # Parse with Pydantic
        config = Config(**config_data)

        logger.info("Configuration loaded successfully")
        logger.info(f"  Database pairs: {len(config.database_pairs)}")
        logger.info(f"  Tables configured: {len(config.tables)}")

        return config

    except yaml.YAMLError as e:
        logger.error(f"YAML parsing error: {e}")
        raise
    except Exception as e:
        logger.error(f"Configuration loading error: {e}")
        raise


def save_config(config: Config, output_path: Union[str, Path]) -> None:
    """
    Save configuration to YAML file.

    Args:
        config: Configuration object to save
        output_path: Path where to save the configuration
    """
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    logger.info(f"Saving configuration to: {output_path}")

    # Convert to dict and save as YAML
    config_dict = config.model_dump(exclude_none=True)

    with open(output_path, "w", encoding="utf-8") as f:
        yaml.dump(
            config_dict,
            f,
            default_flow_style=False,
            sort_keys=False,
            allow_unicode=True,
            width=100,
        )

    logger.info("Configuration saved successfully")
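A minimal load/save round trip; a sketch assuming a valid config.yaml with at least one database pair exists in the working directory:

from drt.config.loader import load_config, save_config

cfg = load_config("config.yaml")           # raises FileNotFoundError / yaml.YAMLError on bad input
save_config(cfg, "config_roundtrip.yaml")  # exclude_none=True drops unset optional fields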
199
src/drt/config/models.py
Executable file
@@ -0,0 +1,199 @@
"""Pydantic models for configuration."""

from typing import List, Optional, Dict, Any
from pydantic import BaseModel, Field, field_validator


class ConnectionConfig(BaseModel):
    """Database connection configuration."""
    server: str
    database: str
    username: Optional[str] = None
    password: Optional[str] = None
    timeout: Dict[str, int] = Field(default_factory=lambda: {"connection": 30, "query": 300})


class DatabasePairConfig(BaseModel):
    """Configuration for a database pair to compare."""
    name: str
    enabled: bool = True
    baseline: ConnectionConfig
    target: ConnectionConfig


class RowCountConfig(BaseModel):
    """Row count comparison configuration."""
    enabled: bool = True
    tolerance_percent: float = 0.0


class SchemaConfig(BaseModel):
    """Schema comparison configuration."""
    enabled: bool = True
    checks: Dict[str, bool] = Field(default_factory=lambda: {
        "column_names": True,
        "data_types": True,
        "nullability": False,
        "column_order": False
    })
    severity: Dict[str, str] = Field(default_factory=lambda: {
        "missing_column_in_target": "FAIL",
        "extra_column_in_target": "WARNING",
        "data_type_mismatch": "WARNING"
    })


class AggregatesConfig(BaseModel):
    """Aggregate comparison configuration."""
    enabled: bool = True
    tolerance_percent: float = 0.01
    large_table_threshold: int = 10000000
    sample_size: int = 100000


class TableExistenceConfig(BaseModel):
    """Table existence check configuration."""
    missing_table_default: str = "FAIL"
    extra_table_action: str = "INFO"


class ComparisonConfig(BaseModel):
    """Comparison settings."""
    mode: str = "health_check"
    row_count: RowCountConfig = Field(default_factory=RowCountConfig)
    schema_config: SchemaConfig = Field(default_factory=SchemaConfig, alias="schema")
    aggregates: AggregatesConfig = Field(default_factory=AggregatesConfig)
    table_existence: TableExistenceConfig = Field(default_factory=TableExistenceConfig)

    @property
    def schema(self) -> SchemaConfig:
        """Return schema config for backward compatibility."""
        return self.schema_config

    class Config:
        populate_by_name = True


class ExecutionConfig(BaseModel):
    """Execution settings."""
    continue_on_error: bool = True
    retry: Dict[str, int] = Field(default_factory=lambda: {"attempts": 3, "delay_seconds": 5})


class TableFilterConfig(BaseModel):
    """Table filtering configuration."""
    mode: str = "all"
    include_list: List[Dict[str, str]] = Field(default_factory=list)
    exclude_patterns: List[str] = Field(default_factory=lambda: [
        "*_TEMP", "*_TMP", "*_BAK", "*_BACKUP", "*_OLD", "tmp*", "temp*", "#*"
    ])
    exclude_schemas: List[str] = Field(default_factory=lambda: [
        "sys", "INFORMATION_SCHEMA", "guest"
    ])


class TableConfig(BaseModel):
    """Individual table configuration."""
    schema_name: str = Field(..., alias="schema")
    name: str
    enabled: bool = True
    expected_in_target: bool = True
    estimated_row_count: int = 0
    primary_key_columns: List[str] = Field(default_factory=list)
    aggregate_columns: List[str] = Field(default_factory=list)
    notes: str = ""

    @property
    def schema(self) -> str:
        """Return schema name for backward compatibility."""
        return self.schema_name

    class Config:
        populate_by_name = True


class ReportingConfig(BaseModel):
    """Reporting configuration."""
    output_directory: str = "./reports"
    investigation_directory: str = "./investigation_reports"
    formats: List[str] = Field(default_factory=lambda: ["html", "csv"])
    filename_template: str = "regression_report_{timestamp}"
    html: Dict[str, Any] = Field(default_factory=lambda: {
        "embed_styles": True,
        "include_charts": True,
        "colors": {
            "pass": "#28a745",
            "fail": "#dc3545",
            "warning": "#ffc107",
            "error": "#6f42c1",
            "info": "#17a2b8",
            "skip": "#6c757d"
        }
    })
    csv: Dict[str, Any] = Field(default_factory=lambda: {
        "delimiter": ",",
        "include_header": True,
        "encoding": "utf-8-sig"
    })
    pdf: Dict[str, str] = Field(default_factory=lambda: {
        "page_size": "A4",
        "orientation": "landscape"
    })


class LoggingConfig(BaseModel):
    """Logging configuration."""
    level: str = "INFO"
    directory: str = "./logs"
    filename_template: str = "drt_{timestamp}.log"
    console: bool = True
    format: str = "%(asctime)s | %(levelname)-8s | %(name)-20s | %(message)s"
    date_format: str = "%Y%m%d_%H%M%S"


class DiscoveryConfig(BaseModel):
    """Discovery settings."""
    output_file: str = "./config_discovered.yaml"
    analysis_directory: str = "./analysis"
    include_schemas: List[str] = Field(default_factory=list)
    exclude_schemas: List[str] = Field(default_factory=lambda: [
        "sys", "INFORMATION_SCHEMA", "guest"
    ])
    exclude_patterns: List[str] = Field(default_factory=lambda: [
        "*_TEMP", "*_TMP", "*_BAK", "#*"
    ])
    include_row_counts: bool = True
    include_column_details: bool = True
    detect_numeric_columns: bool = True
    detect_primary_keys: bool = True
    default_expected_in_target: bool = True


class MetadataConfig(BaseModel):
    """Configuration metadata."""
    config_version: str = "1.0"
    generated_date: Optional[str] = None
    generated_by: Optional[str] = None
    framework_version: str = "1.0.0"


class Config(BaseModel):
    """Main configuration model."""
    metadata: MetadataConfig = Field(default_factory=MetadataConfig)
    connections: Dict[str, ConnectionConfig] = Field(default_factory=dict)
    database_pairs: List[DatabasePairConfig] = Field(default_factory=list)
    comparison: ComparisonConfig = Field(default_factory=ComparisonConfig)
    execution: ExecutionConfig = Field(default_factory=ExecutionConfig)
    table_filters: TableFilterConfig = Field(default_factory=TableFilterConfig)
    tables: List[TableConfig] = Field(default_factory=list)
    reporting: ReportingConfig = Field(default_factory=ReportingConfig)
    logging: LoggingConfig = Field(default_factory=LoggingConfig)
    discovery: DiscoveryConfig = Field(default_factory=DiscoveryConfig)

    @field_validator('database_pairs')
    @classmethod
    def validate_database_pairs(cls, v):
        """Ensure at least one database pair is configured."""
        if not v:
            raise ValueError("At least one database pair must be configured")
        return v
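Because TableConfig declares schema_name with alias="schema" and populate_by_name, the YAML key "schema" populates the field and the property mirrors it back; a small sketch with illustrative table names:

from drt.config.models import TableConfig

t = TableConfig(**{"schema": "dbo", "name": "Orders", "primary_key_columns": ["OrderID"]})
print(t.schema_name, t.schema)  # "dbo" twice - the property returns the aliased field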
79
src/drt/config/validator.py
Executable file
@@ -0,0 +1,79 @@
"""Configuration validator."""

from typing import List, Tuple
from drt.config.models import Config
from drt.utils.logging import get_logger

logger = get_logger(__name__)


def validate_config(config: Config) -> Tuple[bool, List[str]]:
    """
    Validate configuration for completeness and correctness.

    Args:
        config: Configuration to validate

    Returns:
        Tuple of (is_valid, list_of_errors)
    """
    errors = []
    warnings = []

    # Check database pairs
    if not config.database_pairs:
        errors.append("No database pairs configured")

    for pair in config.database_pairs:
        if not pair.baseline.server or not pair.baseline.database:
            errors.append(f"Database pair '{pair.name}': Baseline connection incomplete")
        if not pair.target.server or not pair.target.database:
            errors.append(f"Database pair '{pair.name}': Target connection incomplete")

    # Check comparison mode
    valid_modes = ["health_check", "detailed"]
    if config.comparison.mode not in valid_modes:
        errors.append(f"Invalid comparison mode: {config.comparison.mode}. Must be one of {valid_modes}")

    # Check table configuration
    if config.table_filters.mode == "include_list" and not config.table_filters.include_list:
        warnings.append("Table filter mode is 'include_list' but include_list is empty")

    # Check for tables marked as not expected in target
    not_expected_count = sum(1 for t in config.tables if not t.expected_in_target)
    if not_expected_count > 0:
        warnings.append(f"{not_expected_count} table(s) marked as expected_in_target: false")

    # Check for disabled tables
    disabled_count = sum(1 for t in config.tables if not t.enabled)
    if disabled_count > 0:
        warnings.append(f"{disabled_count} table(s) disabled (enabled: false)")

    # Check reporting formats
    valid_formats = ["html", "csv", "pdf"]
    for fmt in config.reporting.formats:
        if fmt not in valid_formats:
            errors.append(f"Invalid report format: {fmt}. Must be one of {valid_formats}")

    # Check logging level
    valid_levels = ["DEBUG", "INFO", "WARNING", "ERROR"]
    if config.logging.level.upper() not in valid_levels:
        errors.append(f"Invalid logging level: {config.logging.level}. Must be one of {valid_levels}")

    # Log results
    if errors:
        logger.error(f"Configuration validation failed with {len(errors)} error(s)")
        for error in errors:
            logger.error(f"  ❌ {error}")

    if warnings:
        logger.warning(f"Configuration has {len(warnings)} warning(s)")
        for warning in warnings:
            logger.warning(f"  ⚠️ {warning}")

    if not errors and not warnings:
        logger.info("✓ Configuration is valid")
    elif not errors:
        logger.info("✓ Configuration is valid (with warnings)")

    return len(errors) == 0, errors
7
src/drt/database/__init__.py
Executable file
@@ -0,0 +1,7 @@
"""Database access layer."""

from drt.database.connection import ConnectionManager
from drt.database.executor import QueryExecutor
from drt.database.queries import SQLQueries

__all__ = ["ConnectionManager", "QueryExecutor", "SQLQueries"]
176
src/drt/database/connection.py
Executable file
@@ -0,0 +1,176 @@
"""Database connection management."""

import pyodbc
import platform
from typing import Optional
from contextlib import contextmanager
from drt.config.models import ConnectionConfig
from drt.utils.logging import get_logger

logger = get_logger(__name__)


def get_odbc_driver() -> str:
    """
    Detect available ODBC driver for SQL Server.

    Returns:
        ODBC driver name
    """
    # Get list of available drivers
    drivers = [driver for driver in pyodbc.drivers() if 'SQL Server' in driver]

    # Prefer newer drivers
    preferred_order = [
        'ODBC Driver 18 for SQL Server',
        'ODBC Driver 17 for SQL Server',
        'ODBC Driver 13 for SQL Server',
        'SQL Server Native Client 11.0',
        'SQL Server'
    ]

    for preferred in preferred_order:
        if preferred in drivers:
            logger.debug(f"Using ODBC driver: {preferred}")
            return preferred

    # Fallback to first available
    if drivers:
        logger.warning(f"Using fallback driver: {drivers[0]}")
        return drivers[0]

    # Default fallback
    logger.warning("No SQL Server ODBC driver found, using default")
    return 'ODBC Driver 17 for SQL Server'


class ConnectionManager:
    """Manages database connections using Windows Authentication."""

    def __init__(self, config: ConnectionConfig):
        """
        Initialize connection manager.

        Args:
            config: Connection configuration
        """
        self.config = config
        self._connection: Optional[pyodbc.Connection] = None

    def connect(self) -> pyodbc.Connection:
        """
        Establish database connection using Windows or SQL Authentication.

        Returns:
            Database connection

        Raises:
            pyodbc.Error: If connection fails
        """
        if self._connection and not self._connection.closed:
            return self._connection

        try:
            # Detect available ODBC driver
            driver = get_odbc_driver()

            # Build connection string
            conn_str_parts = [
                f"DRIVER={{{driver}}}",
                f"SERVER={self.config.server}",
                f"DATABASE={self.config.database}",
                f"Connection Timeout={self.config.timeout.get('connection', 30)}"
            ]

            # Check if username/password are provided for SQL Authentication
            if self.config.username:
                conn_str_parts.append(f"UID={self.config.username}")
                conn_str_parts.append(f"PWD={self.config.password}")
                auth_type = "SQL Authentication"
            else:
                # Use Windows Authentication
                conn_str_parts.append("Trusted_Connection=yes")
                auth_type = "Windows Authentication"

            # Add TrustServerCertificate on Linux for self-signed certs
            if platform.system() != 'Windows':
                conn_str_parts.append("TrustServerCertificate=yes")

            conn_str = ";".join(conn_str_parts) + ";"

            logger.info(f"Connecting to {self.config.server}.{self.config.database}")

            # Redact the server name, and the password when one is set, before
            # logging (calling str.replace with an empty pattern would corrupt
            # the string, so only redact a non-empty password)
            safe_conn_str = conn_str.replace(self.config.server, 'SERVER')
            if self.config.password:
                safe_conn_str = safe_conn_str.replace(self.config.password, '***')
            logger.debug(f"Connection string: {safe_conn_str}")

            self._connection = pyodbc.connect(conn_str)

            # Set query timeout
            query_timeout = self.config.timeout.get('query', 300)
            self._connection.timeout = query_timeout

            logger.info(f"✓ Connected ({auth_type})")
            return self._connection

        except pyodbc.Error as e:
            logger.error(f"Connection failed: {e}")
            raise

    def disconnect(self) -> None:
        """Close database connection."""
        if self._connection and not self._connection.closed:
            self._connection.close()
            logger.info("Connection closed")
        self._connection = None

    @contextmanager
    def get_connection(self):
        """
        Context manager for database connections.

        Yields:
            Database connection

        Example:
            with conn_mgr.get_connection() as conn:
                cursor = conn.cursor()
                cursor.execute("SELECT 1")
        """
        conn = self.connect()
        try:
            yield conn
        finally:
            # Don't close connection here - reuse it
            pass

    def test_connection(self) -> bool:
        """
        Test database connectivity.

        Returns:
            True if connection successful, False otherwise
        """
        try:
            with self.get_connection() as conn:
                cursor = conn.cursor()
                cursor.execute("SELECT 1")
                cursor.fetchone()
                return True
        except Exception as e:
            logger.error(f"Connection test failed: {e}")
            return False

    @property
    def is_connected(self) -> bool:
        """Check if connection is active."""
        return self._connection is not None and not self._connection.closed

    def __enter__(self):
        """Context manager entry."""
        self.connect()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Context manager exit."""
        self.disconnect()

    def __del__(self):
        """Cleanup on deletion."""
        self.disconnect()
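ConnectionManager works both as a plain object and as a context manager; a hedged usage sketch (SQLSERVER01 is a placeholder server name, not part of this commit):

from drt.config.models import ConnectionConfig
from drt.database.connection import ConnectionManager

cfg = ConnectionConfig(server="SQLSERVER01", database="master")
with ConnectionManager(cfg) as mgr:   # __enter__ connects, __exit__ disconnects
    print(mgr.test_connection())      # runs SELECT 1 and reports True/False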
267
src/drt/database/executor.py
Executable file
@@ -0,0 +1,267 @@
|
|||||||
|
"""Query executor for READ ONLY database operations."""
|
||||||
|
|
||||||
|
import pandas as pd
|
||||||
|
import time
|
||||||
|
from typing import Any, Dict, List, Optional, Tuple
|
||||||
|
from drt.database.connection import ConnectionManager
|
||||||
|
from drt.database.queries import SQLQueries
|
||||||
|
from drt.models.enums import Status
|
||||||
|
from drt.utils.logging import get_logger
|
||||||
|
|
||||||
|
logger = get_logger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class QueryExecutor:
|
||||||
|
"""Executes READ ONLY queries against the database."""
|
||||||
|
|
||||||
|
def __init__(self, connection_manager: ConnectionManager):
|
||||||
|
"""
|
||||||
|
Initialize query executor.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
connection_manager: Connection manager instance
|
||||||
|
"""
|
||||||
|
self.conn_mgr = connection_manager
|
||||||
|
|
||||||
|
def execute_query(self, query: str, params: tuple = None) -> pd.DataFrame:
|
||||||
|
"""
|
||||||
|
Execute a SELECT query and return results as DataFrame.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
query: SQL query string (SELECT only)
|
||||||
|
params: Query parameters
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Query results as pandas DataFrame
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
ValueError: If query is not a SELECT statement
|
||||||
|
Exception: If query execution fails
|
||||||
|
"""
|
||||||
|
# Safety check - only allow SELECT queries
|
||||||
|
query_upper = query.strip().upper()
|
||||||
|
if not query_upper.startswith('SELECT'):
|
||||||
|
raise ValueError("Only SELECT queries are allowed (READ ONLY)")
|
||||||
|
|
||||||
|
try:
|
||||||
|
with self.conn_mgr.get_connection() as conn:
|
||||||
|
if params:
|
||||||
|
df = pd.read_sql(query, conn, params=params)
|
||||||
|
else:
|
||||||
|
df = pd.read_sql(query, conn)
|
||||||
|
return df
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Query execution failed: {e}")
|
||||||
|
logger.debug(f"Query: {query}")
|
||||||
|
raise
|
||||||
|
|
||||||
|
def execute_scalar(self, query: str, params: tuple = None) -> Any:
|
||||||
|
"""
|
||||||
|
Execute query and return single scalar value.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
query: SQL query string
|
||||||
|
params: Query parameters
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Single scalar value
|
||||||
|
"""
|
||||||
|
df = self.execute_query(query, params)
|
||||||
|
if df.empty:
|
||||||
|
return None
|
||||||
|
return df.iloc[0, 0]
|
||||||
|
|
||||||
|
def get_row_count(self, schema: str, table: str) -> int:
|
||||||
|
"""
|
||||||
|
Get row count for a table.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
schema: Schema name
|
||||||
|
table: Table name
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Row count
|
||||||
|
"""
|
||||||
|
query = SQLQueries.build_row_count_query(schema, table)
|
||||||
|
count = self.execute_scalar(query)
|
||||||
|
return int(count) if count is not None else 0
|
||||||
|
|
||||||
|
def table_exists(self, schema: str, table: str) -> bool:
|
||||||
|
"""
|
||||||
|
Check if table exists.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
schema: Schema name
|
||||||
|
table: Table name
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if table exists, False otherwise
|
||||||
|
"""
|
||||||
|
count = self.execute_scalar(SQLQueries.CHECK_TABLE_EXISTS, (schema, table))
|
||||||
|
return int(count) > 0 if count is not None else False
|
||||||
|
|
||||||
|
def get_all_tables(self) -> List[Dict[str, Any]]:
|
||||||
|
"""
|
||||||
|
Get list of all user tables in the database.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of table information dictionaries
|
||||||
|
"""
|
||||||
|
df = self.execute_query(SQLQueries.GET_ALL_TABLES)
|
||||||
|
return df.to_dict('records')
|
||||||
|
|
||||||
|
def get_columns(self, schema: str, table: str) -> List[Dict[str, Any]]:
|
||||||
|
"""
|
||||||
|
Get column information for a table.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
schema: Schema name
|
||||||
|
table: Table name
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of column information dictionaries
|
||||||
|
"""
|
||||||
|
df = self.execute_query(SQLQueries.GET_COLUMNS, (schema, table))
|
||||||
|
return df.to_dict('records')
|
||||||
|
|
||||||
|
def get_primary_keys(self, schema: str, table: str) -> List[str]:
|
||||||
|
"""
|
||||||
|
Get primary key columns for a table.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
schema: Schema name
|
||||||
|
table: Table name
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of primary key column names
|
||||||
|
"""
|
||||||
|
# Diagnostic: Check what columns are available in CONSTRAINT_COLUMN_USAGE
|
||||||
|
try:
|
||||||
|
logger.debug("Checking CONSTRAINT_COLUMN_USAGE schema...")
|
||||||
|
constraint_cols_df = self.execute_query(SQLQueries.GET_CONSTRAINT_COLUMNS_SCHEMA)
|
||||||
|
logger.debug(f"CONSTRAINT_COLUMN_USAGE columns: {constraint_cols_df['COLUMN_NAME'].tolist()}")
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug(f"Could not query CONSTRAINT_COLUMN_USAGE schema: {e}")
|
||||||
|
|
||||||
|
# Diagnostic: Check what columns are available in KEY_COLUMN_USAGE
|
||||||
|
try:
|
||||||
|
logger.debug("Checking KEY_COLUMN_USAGE schema...")
|
            key_cols_df = self.execute_query(SQLQueries.GET_KEY_COLUMNS_SCHEMA)
            logger.debug(f"KEY_COLUMN_USAGE columns: {key_cols_df['COLUMN_NAME'].tolist()}")
        except Exception as e:
            logger.debug(f"Could not query KEY_COLUMN_USAGE schema: {e}")

        df = self.execute_query(SQLQueries.GET_PRIMARY_KEYS, (schema, table))
        return df['COLUMN_NAME'].tolist() if not df.empty else []

    def get_aggregate_sums(self, schema: str, table: str, columns: List[str]) -> Dict[str, float]:
        """
        Get aggregate sums for numeric columns.

        Args:
            schema: Schema name
            table: Table name
            columns: List of column names to aggregate

        Returns:
            Dictionary mapping column names to their sums
        """
        if not columns:
            return {}

        query = SQLQueries.build_aggregate_query(schema, table, columns)
        if not query:
            return {}

        df = self.execute_query(query)
        if df.empty:
            return {col: 0.0 for col in columns}

        # Extract results
        results = {}
        for col in columns:
            sum_col = f"{col}_sum"
            if sum_col in df.columns:
                value = df.iloc[0][sum_col]
                results[col] = float(value) if pd.notna(value) else 0.0
            else:
                results[col] = 0.0

        return results

    def execute_investigation_query(
        self,
        query: str,
        timeout: Optional[int] = None
    ) -> Tuple[Status, Optional[pd.DataFrame], Optional[str], int]:
        """
        Execute investigation query with comprehensive error handling.

        This method is specifically for investigation queries and does NOT
        enforce the SELECT-only restriction. It handles errors gracefully
        and returns detailed status information.

        Args:
            query: SQL query to execute
            timeout: Query timeout in seconds (optional)

        Returns:
            Tuple of (status, result_df, error_message, execution_time_ms)
        """
        start_time = time.time()

        try:
            # Execute query
            with self.conn_mgr.get_connection() as conn:
                if timeout:
                    # Set query timeout if supported
                    try:
                        cursor = conn.cursor()
                        cursor.execute(f"SET QUERY_TIMEOUT {timeout}")
                    except Exception:
                        # Timeout setting not supported, continue anyway
                        pass

                df = pd.read_sql(query, conn)

            execution_time = int((time.time() - start_time) * 1000)

            return (Status.PASS, df, None, execution_time)

        except Exception as e:
            execution_time = int((time.time() - start_time) * 1000)
            error_msg = str(e)
            error_type = type(e).__name__

            # Categorize error
            if any(phrase in error_msg.lower() for phrase in [
                'does not exist',
                'invalid object name',
                'could not find',
                'not found'
            ]):
                status = Status.SKIP
                message = f"Object not found: {error_msg}"
            elif 'timeout' in error_msg.lower():
                status = Status.FAIL
                message = f"Query timeout: {error_msg}"
            elif any(phrase in error_msg.lower() for phrase in [
                'syntax error',
                'incorrect syntax'
            ]):
                status = Status.FAIL
                message = f"Syntax error: {error_msg}"
            elif 'permission' in error_msg.lower():
                status = Status.FAIL
                message = f"Permission denied: {error_msg}"
            else:
                status = Status.FAIL
                message = f"{error_type}: {error_msg}"

            logger.debug(f"Query execution failed: {message}")
            return (status, None, message, execution_time)
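A minimal usage sketch of execute_investigation_query, assuming an already-constructed QueryExecutor named `executor` and an installed `drt` package; the ad-hoc SQL and object names are invented:

from drt.models.enums import Status

# Hypothetical ad-hoc query; any statement shape is tolerated here because
# the SELECT-only restriction is deliberately not enforced for investigations.
status, df, error, elapsed_ms = executor.execute_investigation_query(
    "SELECT TOP 10 * FROM dbo.Orders WHERE OrderDate >= '2024-01-01'",
    timeout=30,
)
if status == Status.PASS:
    print(f"Fetched {len(df)} rows in {elapsed_ms}ms")
else:
    print(f"{status}: {error}")  # SKIP for missing objects, FAIL otherwise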
128
src/drt/database/queries.py
Executable file
@@ -0,0 +1,128 @@
"""SQL query templates for database operations."""

from typing import Optional


class SQLQueries:
    """Collection of SQL query templates (READ ONLY)."""

    # Table discovery queries
    GET_ALL_TABLES = """
        SELECT
            s.name AS schema_name,
            t.name AS table_name,
            SUM(p.rows) AS estimated_rows
        FROM sys.tables t WITH (NOLOCK)
        INNER JOIN sys.schemas s WITH (NOLOCK) ON t.schema_id = s.schema_id
        INNER JOIN sys.partitions p WITH (NOLOCK) ON t.object_id = p.object_id
        WHERE t.type = 'U'
          AND p.index_id IN (0, 1)
        GROUP BY s.name, t.name
        ORDER BY s.name, t.name
    """

    GET_COLUMNS = """
        SELECT
            COLUMN_NAME,
            DATA_TYPE,
            CHARACTER_MAXIMUM_LENGTH,
            NUMERIC_PRECISION,
            NUMERIC_SCALE,
            IS_NULLABLE,
            ORDINAL_POSITION
        FROM INFORMATION_SCHEMA.COLUMNS WITH (NOLOCK)
        WHERE TABLE_SCHEMA = ?
          AND TABLE_NAME = ?
        ORDER BY ORDINAL_POSITION
    """

    # Diagnostic query to check available columns in CONSTRAINT_COLUMN_USAGE
    GET_CONSTRAINT_COLUMNS_SCHEMA = """
        SELECT COLUMN_NAME
        FROM INFORMATION_SCHEMA.COLUMNS WITH (NOLOCK)
        WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'
          AND TABLE_NAME = 'CONSTRAINT_COLUMN_USAGE'
        ORDER BY ORDINAL_POSITION
    """

    # Diagnostic query to check available columns in KEY_COLUMN_USAGE
    GET_KEY_COLUMNS_SCHEMA = """
        SELECT COLUMN_NAME
        FROM INFORMATION_SCHEMA.COLUMNS WITH (NOLOCK)
        WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'
          AND TABLE_NAME = 'KEY_COLUMN_USAGE'
        ORDER BY ORDINAL_POSITION
    """

    GET_PRIMARY_KEYS = """
        SELECT
            c.COLUMN_NAME
        FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc WITH (NOLOCK)
        INNER JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE c WITH (NOLOCK)
            ON tc.CONSTRAINT_NAME = c.CONSTRAINT_NAME
        WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
          AND tc.TABLE_SCHEMA = ?
          AND tc.TABLE_NAME = ?
    """

    # Comparison queries
    GET_ROW_COUNT = """
        SELECT COUNT(*) AS row_count
        FROM [{schema}].[{table}] WITH (NOLOCK)
    """

    CHECK_TABLE_EXISTS = """
        SELECT COUNT(*) AS table_exists
        FROM INFORMATION_SCHEMA.TABLES WITH (NOLOCK)
        WHERE TABLE_SCHEMA = ?
          AND TABLE_NAME = ?
    """

    GET_AGGREGATE_SUMS = """
        SELECT {column_expressions}
        FROM [{schema}].[{table}] WITH (NOLOCK)
    """

    @staticmethod
    def build_row_count_query(schema: str, table: str) -> str:
        """Build row count query for a specific table."""
        return SQLQueries.GET_ROW_COUNT.format(schema=schema, table=table)

    @staticmethod
    def build_aggregate_query(schema: str, table: str, columns: list[str]) -> Optional[str]:
        """
        Build aggregate query for numeric columns.

        Args:
            schema: Schema name
            table: Table name
            columns: List of column names to aggregate

        Returns:
            SQL query string, or None if no columns were given
        """
        if not columns:
            return None

        # Build column expressions
        column_expressions = []
        for col in columns:
            # Cast to FLOAT to handle different numeric types
            expr = f"SUM(CAST([{col}] AS FLOAT)) AS [{col}_sum]"
            column_expressions.append(expr)

        column_expr_str = ",\n    ".join(column_expressions)

        return SQLQueries.GET_AGGREGATE_SUMS.format(
            schema=schema,
            table=table,
            column_expressions=column_expr_str
        )

    @staticmethod
    def is_numeric_type(data_type: str) -> bool:
        """Check if a data type is numeric."""
        numeric_types = {
            'int', 'bigint', 'smallint', 'tinyint',
            'decimal', 'numeric', 'float', 'real',
            'money', 'smallmoney'
        }
        return data_type.lower() in numeric_types
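To make the template expansion concrete, a small sketch (the schema, table, and column names are invented):

from drt.database.queries import SQLQueries

query = SQLQueries.build_aggregate_query("dbo", "Orders", ["Quantity", "Total"])
# Expands to roughly:
#   SELECT SUM(CAST([Quantity] AS FLOAT)) AS [Quantity_sum],
#       SUM(CAST([Total] AS FLOAT)) AS [Total_sum]
#   FROM [dbo].[Orders] WITH (NOLOCK)
assert SQLQueries.build_aggregate_query("dbo", "Orders", []) is None
assert SQLQueries.is_numeric_type("DECIMAL") and not SQLQueries.is_numeric_type("varchar")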
16
src/drt/models/__init__.py
Executable file
@@ -0,0 +1,16 @@
"""Data models for the regression testing framework."""

from drt.models.enums import Status, CheckType
from drt.models.table import TableInfo, ColumnInfo
from drt.models.results import ComparisonResult, CheckResult
from drt.models.summary import ExecutionSummary

__all__ = [
    "Status",
    "CheckType",
    "TableInfo",
    "ColumnInfo",
    "ComparisonResult",
    "CheckResult",
    "ExecutionSummary",
]
49
src/drt/models/enums.py
Executable file
@@ -0,0 +1,49 @@
"""Enumerations for status and check types."""

from enum import Enum


class Status(str, Enum):
    """Result status enumeration."""

    PASS = "PASS"
    FAIL = "FAIL"
    WARNING = "WARNING"
    ERROR = "ERROR"
    INFO = "INFO"
    SKIP = "SKIP"

    def __str__(self) -> str:
        return self.value

    @property
    def severity(self) -> int:
        """Return severity level for comparison (higher = more severe)."""
        severity_map = {
            Status.ERROR: 6,
            Status.FAIL: 5,
            Status.WARNING: 4,
            Status.INFO: 3,
            Status.PASS: 2,
            Status.SKIP: 1,
        }
        return severity_map[self]

    @classmethod
    def most_severe(cls, statuses: list["Status"]) -> "Status":
        """Return the most severe status from a list."""
        if not statuses:
            return cls.SKIP
        return max(statuses, key=lambda s: s.severity)


class CheckType(str, Enum):
    """Type of comparison check."""

    EXISTENCE = "TABLE_EXISTENCE"
    ROW_COUNT = "ROW_COUNT"
    SCHEMA = "SCHEMA"
    AGGREGATE = "AGGREGATE"

    def __str__(self) -> str:
        return self.value
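A short sketch of how the severity ordering resolves mixed results (values follow the severity_map above):

from drt.models.enums import Status

assert Status.most_severe([Status.PASS, Status.WARNING, Status.SKIP]) == Status.WARNING
assert Status.most_severe([]) == Status.SKIP           # empty input falls back to SKIP
assert Status.ERROR.severity > Status.FAIL.severity    # ERROR outranks FAIL
assert str(Status.FAIL) == "FAIL"                      # str() yields the raw value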
70
src/drt/models/investigation.py
Normal file
@@ -0,0 +1,70 @@
"""Data models for investigation feature."""

from dataclasses import dataclass, field
from typing import List, Optional
import pandas as pd
from drt.models.enums import Status


@dataclass
class QueryExecutionResult:
    """Result of executing a single query."""
    query_number: int
    query_text: str
    status: Status
    execution_time_ms: int
    result_data: Optional[pd.DataFrame] = None
    error_message: Optional[str] = None
    row_count: int = 0


@dataclass
class TableInvestigationResult:
    """Results for all queries in a table's investigation."""
    schema: str
    table: str
    sql_file_path: str
    baseline_results: List[QueryExecutionResult]
    target_results: List[QueryExecutionResult]
    overall_status: Status
    timestamp: str

    @property
    def full_name(self) -> str:
        """Get full table name."""
        return f"{self.schema}.{self.table}"

    @property
    def total_queries(self) -> int:
        """Get total number of queries."""
        return len(self.baseline_results)

    @property
    def successful_queries(self) -> int:
        """Get number of successful queries."""
        all_results = self.baseline_results + self.target_results
        return sum(1 for r in all_results if r.status == Status.PASS)


@dataclass
class InvestigationSummary:
    """Overall investigation execution summary."""
    start_time: str
    end_time: str
    duration_seconds: int
    analysis_directory: str
    baseline_info: str
    target_info: str
    tables_processed: int
    tables_successful: int
    tables_partial: int
    tables_failed: int
    total_queries_executed: int
    results: List[TableInvestigationResult] = field(default_factory=list)

    @property
    def success_rate(self) -> float:
        """Calculate success rate percentage."""
        if self.tables_processed == 0:
            return 0.0
        return (self.tables_successful / self.tables_processed) * 100
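For example, a run over four tables with three fully successful yields a 75% success rate; all field values below are invented:

from drt.models.investigation import InvestigationSummary

summary = InvestigationSummary(
    start_time="2024-01-01 10:00:00", end_time="2024-01-01 10:05:00",
    duration_seconds=300, analysis_directory="analysis/",
    baseline_info="baseline-db", target_info="target-db",
    tables_processed=4, tables_successful=3, tables_partial=1,
    tables_failed=0, total_queries_executed=20,
)
assert summary.success_rate == 75.0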
49
src/drt/models/results.py
Executable file
@@ -0,0 +1,49 @@
"""Result models for comparison operations."""

from typing import Any, Dict, Optional
from pydantic import BaseModel, Field
from drt.models.enums import Status, CheckType
from drt.models.table import TableInfo


class CheckResult(BaseModel):
    """Result of a single check operation."""

    check_type: CheckType
    status: Status
    baseline_value: Any = None
    target_value: Any = None
    difference: Any = None
    message: str = ""
    details: Dict[str, Any] = Field(default_factory=dict)

    class Config:
        arbitrary_types_allowed = True


class ComparisonResult(BaseModel):
    """Result of comparing a single table."""

    table: TableInfo
    overall_status: Status
    check_results: list[CheckResult] = Field(default_factory=list)
    execution_time_ms: int = 0
    error_message: str = ""
    timestamp: str = ""

    def add_check(self, check_result: CheckResult) -> None:
        """Add a check result and update overall status."""
        self.check_results.append(check_result)
        # Update overall status to most severe
        all_statuses = [cr.status for cr in self.check_results]
        self.overall_status = Status.most_severe(all_statuses)

    def get_check(self, check_type: CheckType) -> Optional[CheckResult]:
        """Get check result by type."""
        for check in self.check_results:
            if check.check_type == check_type:
                return check
        return None

    class Config:
        arbitrary_types_allowed = True
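A sketch of the escalation behavior of add_check (the table is a placeholder, built through the 'schema' alias on TableInfo):

from drt.models.results import CheckResult, ComparisonResult
from drt.models.table import TableInfo
from drt.models.enums import Status, CheckType

result = ComparisonResult(
    table=TableInfo(schema="dbo", name="Orders"),
    overall_status=Status.PASS,
)
result.add_check(CheckResult(check_type=CheckType.ROW_COUNT, status=Status.PASS))
result.add_check(CheckResult(check_type=CheckType.AGGREGATE, status=Status.FAIL))
assert result.overall_status == Status.FAIL           # most severe check wins
assert result.get_check(CheckType.SCHEMA) is None     # not recorded yet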
65
src/drt/models/summary.py
Executable file
@@ -0,0 +1,65 @@
"""Execution summary model."""

from typing import List
from pydantic import BaseModel, Field
from drt.models.results import ComparisonResult
from drt.models.enums import Status


class ExecutionSummary(BaseModel):
    """Summary of an entire test execution."""

    start_time: str
    end_time: str
    duration_seconds: int
    total_tables: int = 0
    passed: int = 0
    failed: int = 0
    warnings: int = 0
    errors: int = 0
    skipped: int = 0
    info: int = 0
    results: List[ComparisonResult] = Field(default_factory=list)
    config_file: str = ""
    baseline_info: str = ""
    target_info: str = ""

    def add_result(self, result: ComparisonResult) -> None:
        """Add a comparison result and update counters."""
        self.results.append(result)
        self.total_tables += 1

        # Update status counters
        status = result.overall_status
        if status == Status.PASS:
            self.passed += 1
        elif status == Status.FAIL:
            self.failed += 1
        elif status == Status.WARNING:
            self.warnings += 1
        elif status == Status.ERROR:
            self.errors += 1
        elif status == Status.INFO:
            self.info += 1
        elif status == Status.SKIP:
            self.skipped += 1

    @property
    def has_failures(self) -> bool:
        """Check if there are any failures."""
        return self.failed > 0

    @property
    def has_errors(self) -> bool:
        """Check if there are any errors."""
        return self.errors > 0

    @property
    def success_rate(self) -> float:
        """Calculate success rate percentage."""
        if self.total_tables == 0:
            return 0.0
        return (self.passed / self.total_tables) * 100

    class Config:
        arbitrary_types_allowed = True
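Sketch of the counter bookkeeping performed by add_result (timestamps and names are placeholders):

from drt.models.summary import ExecutionSummary
from drt.models.results import ComparisonResult
from drt.models.table import TableInfo
from drt.models.enums import Status

summary = ExecutionSummary(
    start_time="2024-01-01 10:00:00",
    end_time="2024-01-01 10:01:00",
    duration_seconds=60,
)
summary.add_result(ComparisonResult(
    table=TableInfo(schema="dbo", name="Orders"),
    overall_status=Status.PASS,
))
assert summary.total_tables == 1 and summary.passed == 1
assert summary.success_rate == 100.0 and not summary.has_failures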
53
src/drt/models/table.py
Executable file
@@ -0,0 +1,53 @@
"""Table and column information models."""

from typing import List, Optional
from pydantic import BaseModel, Field


class ColumnInfo(BaseModel):
    """Information about a database column."""

    name: str
    data_type: str
    max_length: Optional[int] = None
    precision: Optional[int] = None
    scale: Optional[int] = None
    is_nullable: bool = True
    is_numeric: bool = False
    ordinal_position: int

    class Config:
        frozen = True


class TableInfo(BaseModel):
    """Information about a database table."""

    schema_name: str = Field(..., alias="schema")
    name: str
    estimated_row_count: int = 0
    columns: List[ColumnInfo] = Field(default_factory=list)
    primary_key_columns: List[str] = Field(default_factory=list)
    enabled: bool = True
    expected_in_target: bool = True
    aggregate_columns: List[str] = Field(default_factory=list)
    notes: str = ""

    @property
    def schema(self) -> str:
        """Return schema name for backward compatibility."""
        return self.schema_name

    @property
    def full_name(self) -> str:
        """Return fully qualified table name."""
        return f"{self.schema_name}.{self.name}"

    @property
    def numeric_columns(self) -> List[ColumnInfo]:
        """Return list of numeric columns."""
        return [col for col in self.columns if col.is_numeric]

    class Config:
        frozen = False
        populate_by_name = True  # Allow both 'schema' and 'schema_name'
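Because the field is aliased and populate_by_name is enabled, both spellings construct the same model; the names below are illustrative:

from drt.models.table import TableInfo

t1 = TableInfo(schema="dbo", name="Orders")        # via the 'schema' alias
t2 = TableInfo(schema_name="dbo", name="Orders")   # via the field name
assert t1.full_name == t2.full_name == "dbo.Orders"
assert t1.numeric_columns == []                    # no columns discovered yet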
7
src/drt/reporting/__init__.py
Executable file
@@ -0,0 +1,7 @@
"""Reporting module for generating test reports."""

from drt.reporting.generator import ReportGenerator
from drt.reporting.html import HTMLReportGenerator
from drt.reporting.csv import CSVReportGenerator

__all__ = ["ReportGenerator", "HTMLReportGenerator", "CSVReportGenerator"]
97
src/drt/reporting/csv.py
Executable file
@@ -0,0 +1,97 @@
"""CSV report generator."""

import csv
from pathlib import Path
from drt.models.summary import ExecutionSummary
from drt.models.enums import CheckType
from drt.config.models import Config
from drt.utils.logging import get_logger

logger = get_logger(__name__)


class CSVReportGenerator:
    """Generates CSV format reports."""

    def __init__(self, config: Config):
        """
        Initialize CSV generator.

        Args:
            config: Configuration object
        """
        self.config = config

    def generate(self, summary: ExecutionSummary, filepath: Path) -> None:
        """
        Generate CSV report.

        Args:
            summary: Execution summary
            filepath: Output file path
        """
        csv_config = self.config.reporting.csv
        delimiter = csv_config.get("delimiter", ",")
        encoding = csv_config.get("encoding", "utf-8-sig")

        with open(filepath, "w", newline="", encoding=encoding) as f:
            writer = csv.writer(f, delimiter=delimiter)

            # Write header
            writer.writerow([
                "Timestamp",
                "Schema",
                "Table",
                "Overall_Status",
                "Existence_Status",
                "RowCount_Status",
                "Baseline_Rows",
                "Target_Rows",
                "Row_Difference",
                "Row_Diff_Pct",
                "Schema_Status",
                "Schema_Details",
                "Aggregate_Status",
                "Aggregate_Details",
                "Expected_In_Target",
                "Notes",
                "Execution_Time_Ms"
            ])

            # Write data rows
            for result in summary.results:
                # Get check results
                existence = result.get_check(CheckType.EXISTENCE)
                row_count = result.get_check(CheckType.ROW_COUNT)
                schema = result.get_check(CheckType.SCHEMA)
                aggregate = result.get_check(CheckType.AGGREGATE)

                # Extract values
                baseline_rows = row_count.baseline_value if row_count else "N/A"
                target_rows = row_count.target_value if row_count else "N/A"
                row_diff = row_count.difference if row_count else "N/A"
                row_diff_pct = ""
                if row_count and row_count.baseline_value and row_count.baseline_value > 0:
                    row_diff_pct = f"{(row_count.difference / row_count.baseline_value * 100):.2f}%"

                writer.writerow([
                    result.timestamp,
                    result.table.schema,
                    result.table.name,
                    result.overall_status.value,
                    existence.status.value if existence else "N/A",
                    row_count.status.value if row_count else "N/A",
                    baseline_rows,
                    target_rows,
                    row_diff,
                    row_diff_pct,
                    schema.status.value if schema else "N/A",
                    schema.message if schema else "",
                    aggregate.status.value if aggregate else "N/A",
                    aggregate.message if aggregate else "",
                    result.table.expected_in_target,
                    result.table.notes,
                    result.execution_time_ms
                ])

        logger.debug(f"CSV report written to {filepath}")
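A hedged usage sketch: driving the generator directly with an already-loaded Config whose reporting.csv dict supplies delimiter and encoding; the path and the `config`/`summary` objects are illustrative:

from pathlib import Path
from drt.reporting.csv import CSVReportGenerator

generator = CSVReportGenerator(config)   # `config` and `summary` built elsewhere
generator.generate(summary, Path("reports/regression_run.csv"))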
84
src/drt/reporting/generator.py
Executable file
@@ -0,0 +1,84 @@
"""Report generator orchestrator."""

from pathlib import Path
from typing import List
from drt.models.summary import ExecutionSummary
from drt.config.models import Config
from drt.reporting.html import HTMLReportGenerator
from drt.reporting.csv import CSVReportGenerator
from drt.utils.logging import get_logger
from drt.utils.timestamps import get_timestamp

logger = get_logger(__name__)


class ReportGenerator:
    """Orchestrates report generation in multiple formats."""

    def __init__(self, config: Config):
        """
        Initialize report generator.

        Args:
            config: Configuration object
        """
        self.config = config
        # Use absolute path from config
        self.output_dir = Path(config.reporting.output_directory).expanduser().resolve()
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def generate_reports(self, summary: ExecutionSummary) -> List[str]:
        """
        Generate reports in all configured formats.

        Args:
            summary: Execution summary

        Returns:
            List of generated report file paths
        """
        logger.info("Generating reports...")

        generated_files = []
        timestamp = summary.start_time

        # Generate filename
        filename_base = self.config.reporting.filename_template.format(
            timestamp=timestamp,
            config_name="regression"
        )

        for fmt in self.config.reporting.formats:
            try:
                if fmt == "html":
                    filepath = self._generate_html(summary, filename_base)
                    generated_files.append(filepath)
                elif fmt == "csv":
                    filepath = self._generate_csv(summary, filename_base)
                    generated_files.append(filepath)
                elif fmt == "pdf":
                    logger.warning("PDF generation not yet implemented")
                else:
                    logger.warning(f"Unknown report format: {fmt}")

            except Exception as e:
                logger.error(f"Failed to generate {fmt} report: {e}")

        logger.info(f"Generated {len(generated_files)} report(s)")
        return generated_files

    def _generate_html(self, summary: ExecutionSummary, filename_base: str) -> str:
        """Generate HTML report."""
        generator = HTMLReportGenerator(self.config)
        filepath = self.output_dir / f"{filename_base}.html"
        generator.generate(summary, filepath)
        logger.info(f"✓ HTML: {filepath}")
        return str(filepath)

    def _generate_csv(self, summary: ExecutionSummary, filename_base: str) -> str:
        """Generate CSV report."""
        generator = CSVReportGenerator(self.config)
        filepath = self.output_dir / f"{filename_base}.csv"
        generator.generate(summary, filepath)
        logger.info(f"✓ CSV: {filepath}")
        return str(filepath)
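Callers normally only touch the orchestrator. A sketch, assuming config.reporting.formats lists "html" and "csv" and filename_template contains {timestamp} and {config_name} placeholders; `config` and `summary` are built elsewhere:

from drt.reporting.generator import ReportGenerator

reporter = ReportGenerator(config)
for path in reporter.generate_reports(summary):
    print(f"wrote {path}")   # e.g. an .html and a .csv under reporting.output_directory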
239
src/drt/reporting/html.py
Executable file
@@ -0,0 +1,239 @@
"""HTML report generator."""

from pathlib import Path
from drt.models.summary import ExecutionSummary
from drt.models.enums import Status, CheckType
from drt.config.models import Config
from drt.utils.logging import get_logger
from drt.utils.timestamps import format_duration

logger = get_logger(__name__)


class HTMLReportGenerator:
    """Generates HTML format reports."""

    def __init__(self, config: Config):
        """
        Initialize HTML generator.

        Args:
            config: Configuration object
        """
        self.config = config
        self.colors = config.reporting.html.get("colors", {})

    def generate(self, summary: ExecutionSummary, filepath: Path) -> None:
        """
        Generate HTML report.

        Args:
            summary: Execution summary
            filepath: Output file path
        """
        html_content = self._build_html(summary)

        with open(filepath, "w", encoding="utf-8") as f:
            f.write(html_content)

        logger.debug(f"HTML report written to {filepath}")

    def _build_html(self, summary: ExecutionSummary) -> str:
        """Build complete HTML document."""
        return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Data Regression Test Report - {summary.start_time}</title>
{self._get_styles()}
</head>
<body>
<div class="container">
{self._build_header(summary)}
{self._build_summary(summary)}
{self._build_failures(summary)}
{self._build_warnings(summary)}
{self._build_detailed_results(summary)}
{self._build_footer(summary)}
</div>
</body>
</html>"""

    def _get_styles(self) -> str:
        """Get embedded CSS styles."""
        return """<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; background: #f5f5f5; padding: 20px; }
.container { max-width: 1400px; margin: 0 auto; background: white; padding: 30px; border-radius: 8px; box-shadow: 0 2px 10px rgba(0,0,0,0.1); }
h1 { color: #333; border-bottom: 3px solid #007bff; padding-bottom: 10px; margin-bottom: 20px; }
h2 { color: #555; margin-top: 30px; margin-bottom: 15px; border-left: 4px solid #007bff; padding-left: 10px; }
.header { background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 20px; border-radius: 8px; margin-bottom: 30px; }
.header h1 { color: white; border: none; }
.info-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 15px; margin: 20px 0; }
.info-box { background: #f8f9fa; padding: 15px; border-radius: 5px; border-left: 4px solid #007bff; }
.info-label { font-weight: bold; color: #666; font-size: 0.9em; }
.info-value { color: #333; font-size: 1.1em; margin-top: 5px; }
.summary-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(150px, 1fr)); gap: 15px; margin: 20px 0; }
.summary-box { padding: 20px; border-radius: 8px; text-align: center; color: white; }
.summary-box.pass { background: #28a745; }
.summary-box.fail { background: #dc3545; }
.summary-box.warning { background: #ffc107; color: #333; }
.summary-box.error { background: #6f42c1; }
.summary-box.info { background: #17a2b8; }
.summary-box.skip { background: #6c757d; }
.summary-number { font-size: 2.5em; font-weight: bold; }
.summary-label { font-size: 0.9em; margin-top: 5px; }
.summary-percent { font-size: 0.8em; opacity: 0.9; }
table { width: 100%; border-collapse: collapse; margin: 20px 0; }
th { background: #007bff; color: white; padding: 12px; text-align: left; font-weight: 600; }
td { padding: 10px 12px; border-bottom: 1px solid #dee2e6; }
tr:hover { background: #f8f9fa; }
.status-badge { display: inline-block; padding: 4px 12px; border-radius: 12px; font-size: 0.85em; font-weight: 600; }
.status-PASS { background: #d4edda; color: #155724; }
.status-FAIL { background: #f8d7da; color: #721c24; }
.status-WARNING { background: #fff3cd; color: #856404; }
.status-ERROR { background: #e7d6f5; color: #4a148c; }
.status-INFO { background: #d1ecf1; color: #0c5460; }
.status-SKIP { background: #e2e3e5; color: #383d41; }
.failure-box { background: #fff5f5; border: 1px solid #feb2b2; border-radius: 5px; padding: 15px; margin: 10px 0; }
.failure-title { font-weight: bold; color: #c53030; margin-bottom: 8px; }
.failure-detail { color: #666; margin: 5px 0; font-size: 0.95em; }
.footer { margin-top: 40px; padding-top: 20px; border-top: 1px solid #dee2e6; text-align: center; color: #666; font-size: 0.9em; }
</style>"""

    def _build_header(self, summary: ExecutionSummary) -> str:
        """Build report header."""
        return f"""<div class="header">
<h1>📊 Data Regression Test Report</h1>
<p>Generated: {summary.start_time}</p>
</div>

<div class="info-grid">
<div class="info-box">
<div class="info-label">Start Time</div>
<div class="info-value">{summary.start_time}</div>
</div>
<div class="info-box">
<div class="info-label">End Time</div>
<div class="info-value">{summary.end_time}</div>
</div>
<div class="info-box">
<div class="info-label">Duration</div>
<div class="info-value">{format_duration(summary.duration_seconds)}</div>
</div>
<div class="info-box">
<div class="info-label">Baseline</div>
<div class="info-value">{summary.baseline_info}</div>
</div>
<div class="info-box">
<div class="info-label">Target</div>
<div class="info-value">{summary.target_info}</div>
</div>
<div class="info-box">
<div class="info-label">Total Tables</div>
<div class="info-value">{summary.total_tables}</div>
</div>
</div>"""

    def _build_summary(self, summary: ExecutionSummary) -> str:
        """Build summary section."""
        return f"""<h2>Summary</h2>
<div class="summary-grid">
<div class="summary-box pass">
<div class="summary-number">{summary.passed}</div>
<div class="summary-label">PASS</div>
<div class="summary-percent">{(summary.passed/summary.total_tables*100) if summary.total_tables > 0 else 0:.1f}%</div>
</div>
<div class="summary-box fail">
<div class="summary-number">{summary.failed}</div>
<div class="summary-label">FAIL</div>
<div class="summary-percent">{(summary.failed/summary.total_tables*100) if summary.total_tables > 0 else 0:.1f}%</div>
</div>
<div class="summary-box warning">
<div class="summary-number">{summary.warnings}</div>
<div class="summary-label">WARNING</div>
<div class="summary-percent">{(summary.warnings/summary.total_tables*100) if summary.total_tables > 0 else 0:.1f}%</div>
</div>
<div class="summary-box error">
<div class="summary-number">{summary.errors}</div>
<div class="summary-label">ERROR</div>
<div class="summary-percent">{(summary.errors/summary.total_tables*100) if summary.total_tables > 0 else 0:.1f}%</div>
</div>
<div class="summary-box info">
<div class="summary-number">{summary.info}</div>
<div class="summary-label">INFO</div>
<div class="summary-percent">{(summary.info/summary.total_tables*100) if summary.total_tables > 0 else 0:.1f}%</div>
</div>
<div class="summary-box skip">
<div class="summary-number">{summary.skipped}</div>
<div class="summary-label">SKIP</div>
<div class="summary-percent">{(summary.skipped/summary.total_tables*100) if summary.total_tables > 0 else 0:.1f}%</div>
</div>
</div>"""

    def _build_failures(self, summary: ExecutionSummary) -> str:
        """Build failures section."""
        failures = [r for r in summary.results if r.overall_status == Status.FAIL]

        if not failures:
            return ""

        html = '<h2>❌ Failures (Immediate Action Required)</h2>'

        for result in failures:
            html += f"""<div class="failure-box">
<div class="failure-title">{result.table.full_name}</div>"""

            for check in result.check_results:
                if check.status == Status.FAIL:
                    html += f'<div class="failure-detail">• {check.check_type.value}: {check.message}</div>'

            html += '</div>'

        return html

    def _build_warnings(self, summary: ExecutionSummary) -> str:
        """Build warnings section."""
        warnings = [r for r in summary.results if r.overall_status == Status.WARNING]

        if not warnings:
            return ""

        html = '<h2>⚠️ Warnings</h2><ul>'

        for result in warnings:
            for check in result.check_results:
                if check.status == Status.WARNING:
                    html += f'<li><strong>{result.table.full_name}</strong>: {check.message}</li>'

        html += '</ul>'
        return html

    def _build_detailed_results(self, summary: ExecutionSummary) -> str:
        """Build detailed results table."""
        html = '<h2>Detailed Results</h2><table><thead><tr>'
        html += '<th>Table</th><th>Status</th><th>Row Count</th><th>Schema</th><th>Aggregates</th><th>Time (ms)</th>'
        html += '</tr></thead><tbody>'

        for result in summary.results:
            row_count = result.get_check(CheckType.ROW_COUNT)
            schema = result.get_check(CheckType.SCHEMA)
            aggregate = result.get_check(CheckType.AGGREGATE)

            html += f'<tr><td>{result.table.full_name}</td>'
            html += f'<td><span class="status-badge status-{result.overall_status.value}">{result.overall_status.value}</span></td>'
            html += f'<td><span class="status-badge status-{row_count.status.value if row_count else "SKIP"}">{row_count.status.value if row_count else "SKIP"}</span></td>'
            html += f'<td><span class="status-badge status-{schema.status.value if schema else "SKIP"}">{schema.status.value if schema else "SKIP"}</span></td>'
            html += f'<td><span class="status-badge status-{aggregate.status.value if aggregate else "SKIP"}">{aggregate.status.value if aggregate else "SKIP"}</span></td>'
            html += f'<td>{result.execution_time_ms}</td></tr>'

        html += '</tbody></table>'
        return html

    def _build_footer(self, summary: ExecutionSummary) -> str:
        """Build report footer."""
        return f"""<div class="footer">
<p>Generated by Data Regression Testing Framework v1.0.0</p>
<p>Success Rate: {summary.success_rate:.1f}%</p>
</div>"""
357
src/drt/reporting/investigation_report.py
Normal file
@@ -0,0 +1,357 @@
"""Investigation report generators for HTML and CSV formats."""

import csv
from pathlib import Path
from typing import Optional
from drt.models.investigation import InvestigationSummary, QueryExecutionResult
from drt.models.enums import Status
from drt.config.models import Config
from drt.utils.logging import get_logger
from drt.utils.timestamps import format_duration

logger = get_logger(__name__)


class InvestigationHTMLReportGenerator:
    """Generates HTML format investigation reports."""

    def __init__(self, config: Config):
        """
        Initialize HTML generator.

        Args:
            config: Configuration object
        """
        self.config = config
        self.max_rows = 100  # Limit rows displayed in HTML

    def generate(self, summary: InvestigationSummary, filepath: Path) -> None:
        """
        Generate HTML investigation report.

        Args:
            summary: Investigation summary
            filepath: Output file path
        """
        html_content = self._build_html(summary)

        with open(filepath, "w", encoding="utf-8") as f:
            f.write(html_content)

        logger.debug(f"Investigation HTML report written to {filepath}")

    def _build_html(self, summary: InvestigationSummary) -> str:
        """Build complete HTML document."""
        return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Investigation Report - {summary.start_time}</title>
{self._get_styles()}
{self._get_scripts()}
</head>
<body>
<div class="container">
{self._build_header(summary)}
{self._build_summary(summary)}
{self._build_table_results(summary)}
{self._build_footer(summary)}
</div>
</body>
</html>"""

    def _get_styles(self) -> str:
        """Get embedded CSS styles."""
        return """<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; background: #f5f5f5; padding: 20px; }
.container { max-width: 1600px; margin: 0 auto; background: white; padding: 30px; border-radius: 8px; box-shadow: 0 2px 10px rgba(0,0,0,0.1); }
h1 { color: #333; border-bottom: 3px solid #007bff; padding-bottom: 10px; margin-bottom: 20px; }
h2 { color: #555; margin-top: 30px; margin-bottom: 15px; border-left: 4px solid #007bff; padding-left: 10px; }
h3 { color: #666; margin-top: 20px; margin-bottom: 10px; }
.header { background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 20px; border-radius: 8px; margin-bottom: 30px; }
.header h1 { color: white; border: none; }
.info-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 15px; margin: 20px 0; }
.info-box { background: #f8f9fa; padding: 15px; border-radius: 5px; border-left: 4px solid #007bff; }
.info-label { font-weight: bold; color: #666; font-size: 0.9em; }
.info-value { color: #333; font-size: 1.1em; margin-top: 5px; }
.summary-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(150px, 1fr)); gap: 15px; margin: 20px 0; }
.summary-box { padding: 20px; border-radius: 8px; text-align: center; color: white; }
.summary-box.success { background: #28a745; }
.summary-box.partial { background: #ffc107; color: #333; }
.summary-box.failed { background: #dc3545; }
.summary-number { font-size: 2.5em; font-weight: bold; }
.summary-label { font-size: 0.9em; margin-top: 5px; }
.table-card { background: #fff; border: 1px solid #dee2e6; border-radius: 8px; margin: 20px 0; overflow: hidden; }
.table-header { background: #f8f9fa; padding: 15px; border-bottom: 2px solid #dee2e6; cursor: pointer; }
.table-header:hover { background: #e9ecef; }
.table-name { font-size: 1.2em; font-weight: bold; color: #333; }
.table-status { display: inline-block; padding: 4px 12px; border-radius: 12px; font-size: 0.85em; font-weight: 600; margin-left: 10px; }
.status-SUCCESS { background: #d4edda; color: #155724; }
.status-PASS { background: #d4edda; color: #155724; }
.status-FAIL { background: #f8d7da; color: #721c24; }
.status-WARNING { background: #fff3cd; color: #856404; }
.status-SKIP { background: #e2e3e5; color: #383d41; }
.table-content { padding: 20px; display: none; }
.table-content.active { display: block; }
.query-section { margin: 20px 0; padding: 15px; background: #f8f9fa; border-radius: 5px; }
.query-header { font-weight: bold; margin-bottom: 10px; color: #555; }
.comparison-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 20px; margin: 15px 0; }
.env-section { background: white; padding: 15px; border-radius: 5px; border: 1px solid #dee2e6; }
.env-title { font-weight: bold; color: #007bff; margin-bottom: 10px; }
.query-code { background: #2d2d2d; color: #f8f8f2; padding: 15px; border-radius: 5px; overflow-x: auto; font-family: 'Courier New', monospace; font-size: 0.9em; margin: 10px 0; }
.result-table { width: 100%; border-collapse: collapse; margin: 10px 0; font-size: 0.9em; }
.result-table th { background: #007bff; color: white; padding: 8px; text-align: left; }
.result-table td { padding: 8px; border-bottom: 1px solid #dee2e6; }
.result-table tr:hover { background: #f8f9fa; }
.error-box { background: #fff5f5; border: 1px solid #feb2b2; border-radius: 5px; padding: 15px; margin: 10px 0; color: #c53030; }
.result-meta { display: flex; gap: 20px; margin: 10px 0; font-size: 0.9em; color: #666; }
.footer { margin-top: 40px; padding-top: 20px; border-top: 1px solid #dee2e6; text-align: center; color: #666; font-size: 0.9em; }
.toggle-icon { float: right; transition: transform 0.3s; }
.toggle-icon.active { transform: rotate(180deg); }
</style>"""

    def _get_scripts(self) -> str:
        """Get embedded JavaScript."""
        return """<script>
function toggleTable(id) {
    const content = document.getElementById('content-' + id);
    const icon = document.getElementById('icon-' + id);
    content.classList.toggle('active');
    icon.classList.toggle('active');
}
</script>"""

    def _build_header(self, summary: InvestigationSummary) -> str:
        """Build report header."""
        return f"""<div class="header">
<h1>🔍 Investigation Report</h1>
<p>Analysis Directory: {summary.analysis_directory}</p>
</div>

<div class="info-grid">
<div class="info-box">
<div class="info-label">Start Time</div>
<div class="info-value">{summary.start_time}</div>
</div>
<div class="info-box">
<div class="info-label">End Time</div>
<div class="info-value">{summary.end_time}</div>
</div>
<div class="info-box">
<div class="info-label">Duration</div>
<div class="info-value">{format_duration(summary.duration_seconds)}</div>
</div>
<div class="info-box">
<div class="info-label">Baseline</div>
<div class="info-value">{summary.baseline_info}</div>
</div>
<div class="info-box">
<div class="info-label">Target</div>
<div class="info-value">{summary.target_info}</div>
</div>
<div class="info-box">
<div class="info-label">Total Queries</div>
<div class="info-value">{summary.total_queries_executed}</div>
</div>
</div>"""

    def _build_summary(self, summary: InvestigationSummary) -> str:
        """Build summary section."""
        return f"""<h2>Summary</h2>
<div class="summary-grid">
<div class="summary-box success">
<div class="summary-number">{summary.tables_successful}</div>
<div class="summary-label">Successful</div>
</div>
<div class="summary-box partial">
<div class="summary-number">{summary.tables_partial}</div>
<div class="summary-label">Partial</div>
</div>
<div class="summary-box failed">
<div class="summary-number">{summary.tables_failed}</div>
<div class="summary-label">Failed</div>
</div>
</div>"""

    def _build_table_results(self, summary: InvestigationSummary) -> str:
        """Build table-by-table results."""
        html = '<h2>Investigation Results</h2>'

        for idx, table_result in enumerate(summary.results):
            html += f"""<div class="table-card">
<div class="table-header" onclick="toggleTable({idx})">
<span class="table-name">{table_result.full_name}</span>
<span class="table-status status-{table_result.overall_status.value}">{table_result.overall_status.value}</span>
<span class="toggle-icon" id="icon-{idx}">▼</span>
</div>
<div class="table-content" id="content-{idx}">
<p><strong>SQL File:</strong> {table_result.sql_file_path}</p>
<p><strong>Total Queries:</strong> {table_result.total_queries}</p>
<p><strong>Successful Queries:</strong> {table_result.successful_queries}</p>
{self._build_queries(table_result)}
</div>
</div>"""

        return html

    def _build_queries(self, table_result) -> str:
        """Build query results for a table."""
        html = ""

        for i, (baseline_result, target_result) in enumerate(zip(
            table_result.baseline_results,
            table_result.target_results
        ), 1):
            html += f"""<div class="query-section">
<div class="query-header">Query {baseline_result.query_number}</div>
<details>
<summary>View SQL</summary>
<div class="query-code">{self._escape_html(baseline_result.query_text)}</div>
</details>
<div class="comparison-grid">
{self._build_query_result(baseline_result, "Baseline")}
{self._build_query_result(target_result, "Target")}
</div>
</div>"""

        return html

    def _build_query_result(self, result: QueryExecutionResult, env: str) -> str:
        """Build single query result."""
        html = f"""<div class="env-section">
<div class="env-title">{env}</div>
<span class="table-status status-{result.status.value}">{result.status.value}</span>
<div class="result-meta">
<span>⏱️ {result.execution_time_ms}ms</span>
<span>📊 {result.row_count} rows</span>
</div>"""

        if result.error_message:
            html += f'<div class="error-box">❌ {self._escape_html(result.error_message)}</div>'
        elif result.result_data is not None and not result.result_data.empty:
            html += self._build_result_table(result)

        html += '</div>'
        return html

    def _build_result_table(self, result: QueryExecutionResult) -> str:
        """Build HTML table from DataFrame."""
        df = result.result_data

        if df is None or df.empty:
            return '<p>No data returned</p>'

        # Limit rows
        display_df = df.head(self.max_rows)

        html = '<table class="result-table"><thead><tr>'
        for col in display_df.columns:
            html += f'<th>{self._escape_html(str(col))}</th>'
        html += '</tr></thead><tbody>'

        for _, row in display_df.iterrows():
            html += '<tr>'
            for val in row:
                html += f'<td>{self._escape_html(str(val))}</td>'
            html += '</tr>'

        html += '</tbody></table>'

        if len(df) > self.max_rows:
            html += f'<p><em>Showing first {self.max_rows} of {len(df)} rows</em></p>'

        return html

    def _escape_html(self, text: str) -> str:
        """Escape HTML special characters."""
        return (text
                .replace('&', '&amp;')
                .replace('<', '&lt;')
                .replace('>', '&gt;')
                .replace('"', '&quot;')
                .replace("'", '&#39;'))

    def _build_footer(self, summary: InvestigationSummary) -> str:
        """Build report footer."""
        return f"""<div class="footer">
<p>Generated by Data Regression Testing Framework - Investigation Module</p>
<p>Success Rate: {summary.success_rate:.1f}%</p>
</div>"""


class InvestigationCSVReportGenerator:
    """Generates CSV format investigation reports."""

    def __init__(self, config: Config):
        """
        Initialize CSV generator.

        Args:
            config: Configuration object
        """
        self.config = config

    def generate(self, summary: InvestigationSummary, filepath: Path) -> None:
        """
        Generate CSV investigation report.

        Args:
            summary: Investigation summary
            filepath: Output file path
        """
        csv_config = self.config.reporting.csv
        delimiter = csv_config.get("delimiter", ",")
        encoding = csv_config.get("encoding", "utf-8-sig")

        with open(filepath, "w", newline="", encoding=encoding) as f:
            writer = csv.writer(f, delimiter=delimiter)

            # Write header
            writer.writerow([
                "Timestamp",
                "Schema",
                "Table",
                "Query_Number",
                "Environment",
                "Status",
                "Row_Count",
                "Execution_Time_Ms",
                "Error_Message",
                "SQL_File_Path"
            ])

            # Write data rows
            for table_result in summary.results:
                # Baseline results
                for query_result in table_result.baseline_results:
                    writer.writerow([
                        table_result.timestamp,
                        table_result.schema,
                        table_result.table,
                        query_result.query_number,
                        "baseline",
                        query_result.status.value,
                        query_result.row_count,
                        query_result.execution_time_ms,
                        query_result.error_message or "",
                        table_result.sql_file_path
                    ])

                # Target results
                for query_result in table_result.target_results:
                    writer.writerow([
                        table_result.timestamp,
                        table_result.schema,
                        table_result.table,
                        query_result.query_number,
                        "target",
                        query_result.status.value,
                        query_result.row_count,
                        query_result.execution_time_ms,
                        query_result.error_message or "",
                        table_result.sql_file_path
                    ])

        logger.debug(f"Investigation CSV report written to {filepath}")
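A quick check of the escaping applied when query text and error messages are embedded in the HTML; the generator instance and `config` are hypothetical, and `_escape_html` is internal, so this is purely illustrative:

gen = InvestigationHTMLReportGenerator(config)
escaped = gen._escape_html('SELECT * FROM t WHERE a < 5 AND b = "x"')
assert escaped == 'SELECT * FROM t WHERE a &lt; 5 AND b = &quot;x&quot;'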
6
src/drt/services/__init__.py
Executable file
@@ -0,0 +1,6 @@
"""Business logic services."""

from drt.services.discovery import DiscoveryService
from drt.services.comparison import ComparisonService

__all__ = ["DiscoveryService", "ComparisonService"]
15
src/drt/services/checkers/__init__.py
Executable file
@@ -0,0 +1,15 @@
"""Comparison checkers."""

from drt.services.checkers.base import BaseChecker
from drt.services.checkers.existence import ExistenceChecker
from drt.services.checkers.row_count import RowCountChecker
from drt.services.checkers.schema import SchemaChecker
from drt.services.checkers.aggregate import AggregateChecker

__all__ = [
    "BaseChecker",
    "ExistenceChecker",
    "RowCountChecker",
    "SchemaChecker",
    "AggregateChecker",
]
111
src/drt/services/checkers/aggregate.py
Executable file
@@ -0,0 +1,111 @@
|
|||||||
|
"""Aggregate checker."""
|
||||||
|
|
||||||
|
import time
|
||||||
|
from drt.services.checkers.base import BaseChecker
|
||||||
|
from drt.models.results import CheckResult
|
||||||
|
from drt.models.table import TableInfo
|
||||||
|
from drt.models.enums import Status, CheckType
|
||||||
|
from drt.utils.logging import get_logger
|
||||||
|
|
||||||
|
logger = get_logger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class AggregateChecker(BaseChecker):
|
||||||
|
"""Checks aggregate sums for numeric columns."""
|
||||||
|
|
||||||
|
def check(self, table: TableInfo) -> CheckResult:
|
||||||
|
"""
|
||||||
|
Check aggregate sums.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
table: Table information
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Check result
|
||||||
|
"""
|
||||||
|
if not self.config.comparison.aggregates.enabled:
|
||||||
|
return CheckResult(
|
||||||
|
check_type=CheckType.AGGREGATE,
|
||||||
|
status=Status.SKIP,
|
||||||
|
message="Aggregate check disabled"
|
||||||
|
)
|
||||||
|
|
||||||
|
if not table.aggregate_columns:
|
||||||
|
return CheckResult(
|
||||||
|
check_type=CheckType.AGGREGATE,
|
||||||
|
status=Status.SKIP,
|
||||||
|
message="No aggregate columns configured"
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Time baseline query
|
||||||
|
baseline_start = time.time()
|
||||||
|
baseline_sums = self.baseline_executor.get_aggregate_sums(
|
||||||
|
table.schema, table.name, table.aggregate_columns
|
||||||
|
)
|
||||||
|
baseline_time = (time.time() - baseline_start) * 1000
|
||||||
|
logger.debug(f" └─ Baseline aggregate query: {baseline_time:.0f}ms")
|
||||||
|
|
||||||
|
# Time target query
|
||||||
|
target_start = time.time()
|
||||||
|
target_sums = self.target_executor.get_aggregate_sums(
|
||||||
|
table.schema, table.name, table.aggregate_columns
|
||||||
|
)
|
||||||
|
target_time = (time.time() - target_start) * 1000
|
||||||
|
logger.debug(f" └─ Target aggregate query: {target_time:.0f}ms")
|
||||||
|
logger.debug(f" └─ Total aggregate time: {baseline_time + target_time:.0f}ms (could be parallelized)")
|
||||||
|
|
||||||
|
tolerance_pct = self.config.comparison.aggregates.tolerance_percent
|
||||||
|
issues = []
|
||||||
|
statuses = []
|
||||||
|
|
||||||
|
for col in table.aggregate_columns:
|
||||||
|
baseline_val = baseline_sums.get(col, 0.0)
|
||||||
|
target_val = target_sums.get(col, 0.0)
|
||||||
|
|
||||||
|
if baseline_val == target_val:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Calculate percentage difference
|
||||||
|
if baseline_val != 0:
|
||||||
|
pct_diff = abs((target_val - baseline_val) / baseline_val * 100)
|
||||||
|
else:
|
||||||
|
pct_diff = 100.0 if target_val != 0 else 0.0
|
||||||
|
|
||||||
|
if pct_diff > tolerance_pct:
|
||||||
|
statuses.append(Status.FAIL)
|
||||||
|
issues.append(
|
||||||
|
f"Column '{col}': SUM differs by {pct_diff:.2f}% "
|
||||||
|
f"(Baseline: {baseline_val:,.2f}, Target: {target_val:,.2f})"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Determine overall status
|
||||||
|
if not statuses:
|
||||||
|
status = Status.PASS
|
||||||
|
message = f"All {len(table.aggregate_columns)} aggregate(s) match"
|
||||||
|
else:
|
||||||
|
status = Status.most_severe(statuses)
|
||||||
|
message = "; ".join(issues)
|
||||||
|
|
||||||
|
return CheckResult(
|
||||||
|
check_type=CheckType.AGGREGATE,
|
||||||
|
status=status,
|
||||||
|
baseline_value=baseline_sums,
|
||||||
|
target_value=target_sums,
|
||||||
|
message=message,
|
||||||
|
details={
|
||||||
|
"baseline_sums": baseline_sums,
|
||||||
|
"target_sums": target_sums,
|
||||||
|
"tolerance_percent": tolerance_pct,
|
||||||
|
"columns_checked": table.aggregate_columns,
|
||||||
|
"issues": issues
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Aggregate check failed for {table.full_name}: {e}")
|
||||||
|
return CheckResult(
|
||||||
|
check_type=CheckType.AGGREGATE,
|
||||||
|
status=Status.ERROR,
|
||||||
|
message=f"Aggregate check error: {str(e)}"
|
||||||
|
)
|
||||||
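A minimal, self-contained sketch of the percentage-difference rule AggregateChecker applies above (values are illustrative, not from this commit):

def pct_diff(baseline_val: float, target_val: float) -> float:
    # Mirrors the branch above: a zero baseline with a nonzero target counts as 100%.
    if baseline_val != 0:
        return abs((target_val - baseline_val) / baseline_val * 100)
    return 100.0 if target_val != 0 else 0.0

assert pct_diff(1000.0, 1005.0) == 0.5   # 0.5% drift: within a 1% tolerance
assert pct_diff(1000.0, 900.0) == 10.0   # 10% drift: exceeds a 1% tolerance
assert pct_diff(0.0, 42.0) == 100.0      # zero baseline, nonzero target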
42
src/drt/services/checkers/base.py
Executable file
@@ -0,0 +1,42 @@
"""Base checker class."""
|
||||||
|
|
||||||
|
from abc import ABC, abstractmethod
|
||||||
|
from drt.models.results import CheckResult
|
||||||
|
from drt.models.table import TableInfo
|
||||||
|
from drt.database.executor import QueryExecutor
|
||||||
|
from drt.config.models import Config
|
||||||
|
|
||||||
|
|
||||||
|
class BaseChecker(ABC):
|
||||||
|
"""Abstract base class for all checkers."""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
baseline_executor: QueryExecutor,
|
||||||
|
target_executor: QueryExecutor,
|
||||||
|
config: Config
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Initialize checker.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
baseline_executor: Query executor for baseline database
|
||||||
|
target_executor: Query executor for target database
|
||||||
|
config: Configuration object
|
||||||
|
"""
|
||||||
|
self.baseline_executor = baseline_executor
|
||||||
|
self.target_executor = target_executor
|
||||||
|
self.config = config
|
||||||
|
|
||||||
|
@abstractmethod
|
||||||
|
def check(self, table: TableInfo) -> CheckResult:
|
||||||
|
"""
|
||||||
|
Perform the check.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
table: Table information
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Check result
|
||||||
|
"""
|
||||||
|
pass
|
||||||
78
src/drt/services/checkers/existence.py
Executable file
@@ -0,0 +1,78 @@
"""Table existence checker."""
|
||||||
|
|
||||||
|
import time
|
||||||
|
from drt.services.checkers.base import BaseChecker
|
||||||
|
from drt.models.results import CheckResult
|
||||||
|
from drt.models.table import TableInfo
|
||||||
|
from drt.models.enums import Status, CheckType
|
||||||
|
from drt.utils.logging import get_logger
|
||||||
|
|
||||||
|
logger = get_logger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class ExistenceChecker(BaseChecker):
|
||||||
|
"""Checks if table exists in both baseline and target."""
|
||||||
|
|
||||||
|
def check(self, table: TableInfo) -> CheckResult:
|
||||||
|
"""
|
||||||
|
Check table existence.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
table: Table information
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Check result
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
# Time baseline query
|
||||||
|
baseline_start = time.time()
|
||||||
|
baseline_exists = self.baseline_executor.table_exists(table.schema, table.name)
|
||||||
|
baseline_time = (time.time() - baseline_start) * 1000
|
||||||
|
logger.debug(f" └─ Baseline existence query: {baseline_time:.0f}ms")
|
||||||
|
|
||||||
|
# Time target query
|
||||||
|
target_start = time.time()
|
||||||
|
target_exists = self.target_executor.table_exists(table.schema, table.name)
|
||||||
|
target_time = (time.time() - target_start) * 1000
|
||||||
|
logger.debug(f" └─ Target existence query: {target_time:.0f}ms")
|
||||||
|
logger.debug(f" └─ Total existence time: {baseline_time + target_time:.0f}ms (could be parallelized)")
|
||||||
|
|
||||||
|
# Determine status
|
||||||
|
if baseline_exists and target_exists:
|
||||||
|
status = Status.PASS
|
||||||
|
message = "Table exists in both databases"
|
||||||
|
elif baseline_exists and not target_exists:
|
||||||
|
# Table missing in target
|
||||||
|
if table.expected_in_target:
|
||||||
|
status = Status.FAIL
|
||||||
|
message = "Table exists in Baseline but missing in Target (REGRESSION)"
|
||||||
|
else:
|
||||||
|
status = Status.INFO
|
||||||
|
message = "Table removed from Target (expected per configuration)"
|
||||||
|
elif not baseline_exists and target_exists:
|
||||||
|
status = Status.INFO
|
||||||
|
message = "Table exists only in Target (new table)"
|
||||||
|
else:
|
||||||
|
status = Status.ERROR
|
||||||
|
message = "Table does not exist in either database"
|
||||||
|
|
||||||
|
return CheckResult(
|
||||||
|
check_type=CheckType.EXISTENCE,
|
||||||
|
status=status,
|
||||||
|
baseline_value=baseline_exists,
|
||||||
|
target_value=target_exists,
|
||||||
|
message=message,
|
||||||
|
details={
|
||||||
|
"baseline_exists": baseline_exists,
|
||||||
|
"target_exists": target_exists,
|
||||||
|
"expected_in_target": table.expected_in_target
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Existence check failed for {table.full_name}: {e}")
|
||||||
|
return CheckResult(
|
||||||
|
check_type=CheckType.EXISTENCE,
|
||||||
|
status=Status.ERROR,
|
||||||
|
message=f"Existence check error: {str(e)}"
|
||||||
|
)
|
||||||
90
src/drt/services/checkers/row_count.py
Executable file
@@ -0,0 +1,90 @@
"""Row count checker."""
|
||||||
|
|
||||||
|
import time
|
||||||
|
from drt.services.checkers.base import BaseChecker
|
||||||
|
from drt.models.results import CheckResult
|
||||||
|
from drt.models.table import TableInfo
|
||||||
|
from drt.models.enums import Status, CheckType
|
||||||
|
from drt.utils.logging import get_logger
|
||||||
|
|
||||||
|
logger = get_logger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class RowCountChecker(BaseChecker):
|
||||||
|
"""Checks row count differences between baseline and target."""
|
||||||
|
|
||||||
|
def check(self, table: TableInfo) -> CheckResult:
|
||||||
|
"""
|
||||||
|
Check row counts.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
table: Table information
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Check result
|
||||||
|
"""
|
||||||
|
if not self.config.comparison.row_count.enabled:
|
||||||
|
return CheckResult(
|
||||||
|
check_type=CheckType.ROW_COUNT,
|
||||||
|
status=Status.SKIP,
|
||||||
|
message="Row count check disabled"
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Time baseline query
|
||||||
|
baseline_start = time.time()
|
||||||
|
baseline_count = self.baseline_executor.get_row_count(table.schema, table.name)
|
||||||
|
baseline_time = (time.time() - baseline_start) * 1000
|
||||||
|
logger.debug(f" └─ Baseline row count query: {baseline_time:.0f}ms")
|
||||||
|
|
||||||
|
# Time target query
|
||||||
|
target_start = time.time()
|
||||||
|
target_count = self.target_executor.get_row_count(table.schema, table.name)
|
||||||
|
target_time = (time.time() - target_start) * 1000
|
||||||
|
logger.debug(f" └─ Target row count query: {target_time:.0f}ms")
|
||||||
|
logger.debug(f" └─ Total row count time: {baseline_time + target_time:.0f}ms (could be parallelized)")
|
||||||
|
|
||||||
|
difference = target_count - baseline_count
|
||||||
|
tolerance_pct = self.config.comparison.row_count.tolerance_percent
|
||||||
|
|
||||||
|
# Determine status
|
||||||
|
if baseline_count == target_count:
|
||||||
|
status = Status.PASS
|
||||||
|
message = f"Row counts match: {baseline_count:,}"
|
||||||
|
elif target_count > baseline_count:
|
||||||
|
pct_diff = (difference / baseline_count * 100) if baseline_count > 0 else 0
|
||||||
|
status = Status.WARNING
|
||||||
|
message = f"Target has {difference:,} more rows (+{pct_diff:.2f}%)"
|
||||||
|
else: # target_count < baseline_count
|
||||||
|
pct_diff = abs(difference / baseline_count * 100) if baseline_count > 0 else 0
|
||||||
|
|
||||||
|
if pct_diff <= tolerance_pct:
|
||||||
|
status = Status.WARNING
|
||||||
|
message = f"Target has {abs(difference):,} fewer rows (-{pct_diff:.2f}%) - within tolerance"
|
||||||
|
else:
|
||||||
|
status = Status.FAIL
|
||||||
|
message = f"Target missing {abs(difference):,} rows (-{pct_diff:.2f}%) - REGRESSION"
|
||||||
|
|
||||||
|
return CheckResult(
|
||||||
|
check_type=CheckType.ROW_COUNT,
|
||||||
|
status=status,
|
||||||
|
baseline_value=baseline_count,
|
||||||
|
target_value=target_count,
|
||||||
|
difference=difference,
|
||||||
|
message=message,
|
||||||
|
details={
|
||||||
|
"baseline_count": baseline_count,
|
||||||
|
"target_count": target_count,
|
||||||
|
"difference": difference,
|
||||||
|
"percent_difference": (difference / baseline_count * 100) if baseline_count > 0 else 0,
|
||||||
|
"tolerance_percent": tolerance_pct
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Row count check failed for {table.full_name}: {e}")
|
||||||
|
return CheckResult(
|
||||||
|
check_type=CheckType.ROW_COUNT,
|
||||||
|
status=Status.ERROR,
|
||||||
|
message=f"Row count check error: {str(e)}"
|
||||||
|
)
|
||||||
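The status rules above are deliberately asymmetric: extra rows in Target only warn, while missing rows fail once the shortfall exceeds the tolerance. A standalone restatement of that decision table (illustrative values, not framework code):

def classify(baseline: int, target: int, tolerance_pct: float) -> str:
    # Restates RowCountChecker's decision table in isolation.
    if baseline == target:
        return "PASS"
    if target > baseline:
        return "WARNING"  # extra rows never fail outright
    pct = abs((target - baseline) / baseline * 100) if baseline > 0 else 0
    return "WARNING" if pct <= tolerance_pct else "FAIL"

assert classify(1000, 1000, 1.0) == "PASS"
assert classify(1000, 1200, 1.0) == "WARNING"  # +20% is still only a warning
assert classify(1000, 995, 1.0) == "WARNING"   # -0.5% is within tolerance
assert classify(1000, 900, 1.0) == "FAIL"      # -10% is a regression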
132
src/drt/services/checkers/schema.py
Executable file
@@ -0,0 +1,132 @@
"""Schema checker."""
|
||||||
|
|
||||||
|
import time
|
||||||
|
from typing import Set
|
||||||
|
from drt.services.checkers.base import BaseChecker
|
||||||
|
from drt.models.results import CheckResult
|
||||||
|
from drt.models.table import TableInfo
|
||||||
|
from drt.models.enums import Status, CheckType
|
||||||
|
from drt.utils.logging import get_logger
|
||||||
|
|
||||||
|
logger = get_logger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class SchemaChecker(BaseChecker):
|
||||||
|
"""Checks schema differences between baseline and target."""
|
||||||
|
|
||||||
|
def check(self, table: TableInfo) -> CheckResult:
|
||||||
|
"""
|
||||||
|
Check schema compatibility.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
table: Table information
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Check result
|
||||||
|
"""
|
||||||
|
if not self.config.comparison.schema.enabled:
|
||||||
|
return CheckResult(
|
||||||
|
check_type=CheckType.SCHEMA,
|
||||||
|
status=Status.SKIP,
|
||||||
|
message="Schema check disabled"
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Time baseline query
|
||||||
|
baseline_start = time.time()
|
||||||
|
baseline_cols = self.baseline_executor.get_columns(table.schema, table.name)
|
||||||
|
baseline_time = (time.time() - baseline_start) * 1000
|
||||||
|
logger.debug(f" └─ Baseline schema query: {baseline_time:.0f}ms")
|
||||||
|
|
||||||
|
# Time target query
|
||||||
|
target_start = time.time()
|
||||||
|
target_cols = self.target_executor.get_columns(table.schema, table.name)
|
||||||
|
target_time = (time.time() - target_start) * 1000
|
||||||
|
logger.debug(f" └─ Target schema query: {target_time:.0f}ms")
|
||||||
|
logger.debug(f" └─ Total schema time: {baseline_time + target_time:.0f}ms (could be parallelized)")
|
||||||
|
|
||||||
|
baseline_col_names = {col['COLUMN_NAME'] for col in baseline_cols}
|
||||||
|
target_col_names = {col['COLUMN_NAME'] for col in target_cols}
|
||||||
|
|
||||||
|
missing_in_target = baseline_col_names - target_col_names
|
||||||
|
extra_in_target = target_col_names - baseline_col_names
|
||||||
|
|
||||||
|
issues = []
|
||||||
|
statuses = []
|
||||||
|
|
||||||
|
# Check for missing columns
|
||||||
|
if missing_in_target:
|
||||||
|
severity = self.config.comparison.schema.severity.get(
|
||||||
|
"missing_column_in_target", "FAIL"
|
||||||
|
)
|
||||||
|
statuses.append(Status[severity])
|
||||||
|
issues.append(f"Missing columns in Target: {', '.join(sorted(missing_in_target))}")
|
||||||
|
|
||||||
|
# Check for extra columns
|
||||||
|
if extra_in_target:
|
||||||
|
severity = self.config.comparison.schema.severity.get(
|
||||||
|
"extra_column_in_target", "WARNING"
|
||||||
|
)
|
||||||
|
statuses.append(Status[severity])
|
||||||
|
issues.append(f"Extra columns in Target: {', '.join(sorted(extra_in_target))}")
|
||||||
|
|
||||||
|
# Check data types for matching columns
|
||||||
|
if self.config.comparison.schema.checks.get("data_types", True):
|
||||||
|
type_mismatches = self._check_data_types(baseline_cols, target_cols)
|
||||||
|
if type_mismatches:
|
||||||
|
severity = self.config.comparison.schema.severity.get(
|
||||||
|
"data_type_mismatch", "WARNING"
|
||||||
|
)
|
||||||
|
statuses.append(Status[severity])
|
||||||
|
issues.extend(type_mismatches)
|
||||||
|
|
||||||
|
# Determine overall status
|
||||||
|
if not statuses:
|
||||||
|
status = Status.PASS
|
||||||
|
message = f"Schema matches: {len(baseline_col_names)} columns"
|
||||||
|
else:
|
||||||
|
status = Status.most_severe(statuses)
|
||||||
|
message = "; ".join(issues)
|
||||||
|
|
||||||
|
return CheckResult(
|
||||||
|
check_type=CheckType.SCHEMA,
|
||||||
|
status=status,
|
||||||
|
baseline_value=len(baseline_col_names),
|
||||||
|
target_value=len(target_col_names),
|
||||||
|
message=message,
|
||||||
|
details={
|
||||||
|
"baseline_columns": sorted(baseline_col_names),
|
||||||
|
"target_columns": sorted(target_col_names),
|
||||||
|
"missing_in_target": sorted(missing_in_target),
|
||||||
|
"extra_in_target": sorted(extra_in_target),
|
||||||
|
"issues": issues
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Schema check failed for {table.full_name}: {e}")
|
||||||
|
return CheckResult(
|
||||||
|
check_type=CheckType.SCHEMA,
|
||||||
|
status=Status.ERROR,
|
||||||
|
message=f"Schema check error: {str(e)}"
|
||||||
|
)
|
||||||
|
|
||||||
|
def _check_data_types(self, baseline_cols: list, target_cols: list) -> list:
|
||||||
|
"""Check for data type mismatches."""
|
||||||
|
mismatches = []
|
||||||
|
|
||||||
|
# Create lookup dictionaries
|
||||||
|
baseline_types = {col['COLUMN_NAME']: col['DATA_TYPE'] for col in baseline_cols}
|
||||||
|
target_types = {col['COLUMN_NAME']: col['DATA_TYPE'] for col in target_cols}
|
||||||
|
|
||||||
|
# Check common columns
|
||||||
|
common_cols = set(baseline_types.keys()) & set(target_types.keys())
|
||||||
|
|
||||||
|
for col in sorted(common_cols):
|
||||||
|
if baseline_types[col] != target_types[col]:
|
||||||
|
mismatches.append(
|
||||||
|
f"Column '{col}': type mismatch "
|
||||||
|
f"(Baseline: {baseline_types[col]}, Target: {target_types[col]})"
|
||||||
|
)
|
||||||
|
|
||||||
|
return mismatches
|
||||||
250
src/drt/services/comparison.py
Executable file
@@ -0,0 +1,250 @@
"""Comparison service for executing database comparisons."""
|
||||||
|
|
||||||
|
import time
|
||||||
|
from typing import List
|
||||||
|
from drt.database.connection import ConnectionManager
|
||||||
|
from drt.database.executor import QueryExecutor
|
||||||
|
from drt.config.models import Config, DatabasePairConfig
|
||||||
|
from drt.models.table import TableInfo
|
||||||
|
from drt.models.results import ComparisonResult
|
||||||
|
from drt.models.summary import ExecutionSummary
|
||||||
|
from drt.models.enums import Status
|
||||||
|
from drt.services.checkers import (
|
||||||
|
ExistenceChecker,
|
||||||
|
RowCountChecker,
|
||||||
|
SchemaChecker,
|
||||||
|
AggregateChecker
|
||||||
|
)
|
||||||
|
from drt.utils.logging import get_logger
|
||||||
|
from drt.utils.timestamps import get_timestamp
|
||||||
|
from drt.utils.patterns import matches_pattern
|
||||||
|
|
||||||
|
logger = get_logger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class ComparisonService:
|
||||||
|
"""Service for comparing baseline and target databases."""
|
||||||
|
|
||||||
|
def __init__(self, config: Config):
|
||||||
|
"""
|
||||||
|
Initialize comparison service.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
config: Configuration object
|
||||||
|
"""
|
||||||
|
self.config = config
|
||||||
|
|
||||||
|
def run_comparison(self, db_pair: DatabasePairConfig) -> ExecutionSummary:
|
||||||
|
"""
|
||||||
|
Run comparison for a database pair.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
db_pair: Database pair configuration
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Execution summary with results
|
||||||
|
"""
|
||||||
|
start_time = get_timestamp()
|
||||||
|
start_ts = time.time()
|
||||||
|
|
||||||
|
logger.info("=" * 60)
|
||||||
|
logger.info(f"Starting comparison: {db_pair.name}")
|
||||||
|
logger.info("=" * 60)
|
||||||
|
|
||||||
|
# Initialize connections
|
||||||
|
baseline_mgr = ConnectionManager(db_pair.baseline)
|
||||||
|
target_mgr = ConnectionManager(db_pair.target)
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Connect to databases
|
||||||
|
baseline_mgr.connect()
|
||||||
|
target_mgr.connect()
|
||||||
|
|
||||||
|
# Create executors
|
||||||
|
baseline_executor = QueryExecutor(baseline_mgr)
|
||||||
|
target_executor = QueryExecutor(target_mgr)
|
||||||
|
|
||||||
|
# Initialize checkers
|
||||||
|
existence_checker = ExistenceChecker(baseline_executor, target_executor, self.config)
|
||||||
|
row_count_checker = RowCountChecker(baseline_executor, target_executor, self.config)
|
||||||
|
schema_checker = SchemaChecker(baseline_executor, target_executor, self.config)
|
||||||
|
aggregate_checker = AggregateChecker(baseline_executor, target_executor, self.config)
|
||||||
|
|
||||||
|
# Get tables to compare
|
||||||
|
tables = self._get_tables_to_compare()
|
||||||
|
logger.info(f"Tables to compare: {len(tables)}")
|
||||||
|
|
||||||
|
# Create summary
|
||||||
|
summary = ExecutionSummary(
|
||||||
|
start_time=start_time,
|
||||||
|
end_time="",
|
||||||
|
duration_seconds=0,
|
||||||
|
config_file=self.config.metadata.generated_date or "",
|
||||||
|
baseline_info=f"{db_pair.baseline.server}.{db_pair.baseline.database}",
|
||||||
|
target_info=f"{db_pair.target.server}.{db_pair.target.database}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Compare each table
|
||||||
|
for idx, table in enumerate(tables, 1):
|
||||||
|
if not table.enabled:
|
||||||
|
logger.info(f"[{idx:3d}/{len(tables)}] {table.full_name:40s} SKIP (disabled)")
|
||||||
|
result = self._create_skipped_result(table)
|
||||||
|
summary.add_result(result)
|
||||||
|
continue
|
||||||
|
|
||||||
|
logger.info(f"[{idx:3d}/{len(tables)}] {table.full_name:40s} ...", extra={'end': ''})
|
||||||
|
|
||||||
|
result = self._compare_table(
|
||||||
|
table,
|
||||||
|
existence_checker,
|
||||||
|
row_count_checker,
|
||||||
|
schema_checker,
|
||||||
|
aggregate_checker
|
||||||
|
)
|
||||||
|
|
||||||
|
summary.add_result(result)
|
||||||
|
|
||||||
|
# Log result
|
||||||
|
status_symbol = self._get_status_symbol(result.overall_status)
|
||||||
|
logger.info(f" {status_symbol} {result.overall_status.value}")
|
||||||
|
|
||||||
|
if not self.config.execution.continue_on_error and result.overall_status == Status.ERROR:
|
||||||
|
logger.error("Stopping due to error (continue_on_error=False)")
|
||||||
|
break
|
||||||
|
|
||||||
|
# Finalize summary
|
||||||
|
end_time = get_timestamp()
|
||||||
|
duration = int(time.time() - start_ts)
|
||||||
|
summary.end_time = end_time
|
||||||
|
summary.duration_seconds = duration
|
||||||
|
|
||||||
|
# Log summary
|
||||||
|
self._log_summary(summary)
|
||||||
|
|
||||||
|
return summary
|
||||||
|
|
||||||
|
finally:
|
||||||
|
baseline_mgr.disconnect()
|
||||||
|
target_mgr.disconnect()
|
||||||
|
|
||||||
|
def _compare_table(
|
||||||
|
self,
|
||||||
|
table: TableInfo,
|
||||||
|
existence_checker: ExistenceChecker,
|
||||||
|
row_count_checker: RowCountChecker,
|
||||||
|
schema_checker: SchemaChecker,
|
||||||
|
aggregate_checker: AggregateChecker
|
||||||
|
) -> ComparisonResult:
|
||||||
|
"""Compare a single table."""
|
||||||
|
start_ms = time.time() * 1000
|
||||||
|
|
||||||
|
result = ComparisonResult(
|
||||||
|
table=table,
|
||||||
|
overall_status=Status.PASS,
|
||||||
|
timestamp=get_timestamp()
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Check existence first
|
||||||
|
check_start = time.time()
|
||||||
|
existence_result = existence_checker.check(table)
|
||||||
|
existence_time = (time.time() - check_start) * 1000
|
||||||
|
logger.debug(f" └─ Existence check: {existence_time:.0f}ms")
|
||||||
|
result.add_check(existence_result)
|
||||||
|
|
||||||
|
# Only proceed with other checks if table exists in both
|
||||||
|
if existence_result.status == Status.PASS:
|
||||||
|
# Row count check
|
||||||
|
check_start = time.time()
|
||||||
|
row_count_result = row_count_checker.check(table)
|
||||||
|
row_count_time = (time.time() - check_start) * 1000
|
||||||
|
logger.debug(f" └─ Row count check: {row_count_time:.0f}ms")
|
||||||
|
result.add_check(row_count_result)
|
||||||
|
|
||||||
|
# Schema check
|
||||||
|
check_start = time.time()
|
||||||
|
schema_result = schema_checker.check(table)
|
||||||
|
schema_time = (time.time() - check_start) * 1000
|
||||||
|
logger.debug(f" └─ Schema check: {schema_time:.0f}ms")
|
||||||
|
result.add_check(schema_result)
|
||||||
|
|
||||||
|
# Aggregate check
|
||||||
|
check_start = time.time()
|
||||||
|
aggregate_result = aggregate_checker.check(table)
|
||||||
|
aggregate_time = (time.time() - check_start) * 1000
|
||||||
|
logger.debug(f" └─ Aggregate check: {aggregate_time:.0f}ms")
|
||||||
|
result.add_check(aggregate_result)
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Comparison failed for {table.full_name}: {e}")
|
||||||
|
result.overall_status = Status.ERROR
|
||||||
|
result.error_message = str(e)
|
||||||
|
|
||||||
|
result.execution_time_ms = int(time.time() * 1000 - start_ms)
|
||||||
|
logger.debug(f" └─ Total table time: {result.execution_time_ms}ms")
|
||||||
|
return result
|
||||||
|
|
||||||
|
def _get_tables_to_compare(self) -> List[TableInfo]:
|
||||||
|
"""Get list of tables to compare based on configuration."""
|
||||||
|
tables = []
|
||||||
|
|
||||||
|
for table_config in self.config.tables:
|
||||||
|
table = TableInfo(
|
||||||
|
schema=table_config.schema,
|
||||||
|
name=table_config.name,
|
||||||
|
enabled=table_config.enabled,
|
||||||
|
expected_in_target=table_config.expected_in_target,
|
||||||
|
estimated_row_count=table_config.estimated_row_count,
|
||||||
|
primary_key_columns=table_config.primary_key_columns,
|
||||||
|
aggregate_columns=table_config.aggregate_columns,
|
||||||
|
notes=table_config.notes
|
||||||
|
)
|
||||||
|
tables.append(table)
|
||||||
|
|
||||||
|
# Apply filters
|
||||||
|
if self.config.table_filters.mode == "include_list":
|
||||||
|
if self.config.table_filters.include_list:
|
||||||
|
include_names = {f"{t['schema']}.{t['name']}" for t in self.config.table_filters.include_list}
|
||||||
|
tables = [t for t in tables if t.full_name in include_names]
|
||||||
|
|
||||||
|
# Apply exclusions
|
||||||
|
tables = [
|
||||||
|
t for t in tables
|
||||||
|
if not matches_pattern(t.name, self.config.table_filters.exclude_patterns)
|
||||||
|
and t.schema not in self.config.table_filters.exclude_schemas
|
||||||
|
]
|
||||||
|
|
||||||
|
return tables
|
||||||
|
|
||||||
|
def _create_skipped_result(self, table: TableInfo) -> ComparisonResult:
|
||||||
|
"""Create a skipped result for disabled tables."""
|
||||||
|
return ComparisonResult(
|
||||||
|
table=table,
|
||||||
|
overall_status=Status.SKIP,
|
||||||
|
timestamp=get_timestamp()
|
||||||
|
)
|
||||||
|
|
||||||
|
def _get_status_symbol(self, status: Status) -> str:
|
||||||
|
"""Get symbol for status."""
|
||||||
|
symbols = {
|
||||||
|
Status.PASS: "✓",
|
||||||
|
Status.FAIL: "✗",
|
||||||
|
Status.WARNING: "⚠",
|
||||||
|
Status.ERROR: "🔴",
|
||||||
|
Status.INFO: "ℹ",
|
||||||
|
Status.SKIP: "○"
|
||||||
|
}
|
||||||
|
return symbols.get(status, "?")
|
||||||
|
|
||||||
|
def _log_summary(self, summary: ExecutionSummary) -> None:
|
||||||
|
"""Log execution summary."""
|
||||||
|
logger.info("=" * 60)
|
||||||
|
logger.info("COMPARISON SUMMARY")
|
||||||
|
logger.info("=" * 60)
|
||||||
|
logger.info(f" PASS: {summary.passed:3d} | FAIL: {summary.failed:3d}")
|
||||||
|
logger.info(f" WARNING: {summary.warnings:3d} | ERROR: {summary.errors:3d}")
|
||||||
|
logger.info(f" INFO: {summary.info:3d} | SKIP: {summary.skipped:3d}")
|
||||||
|
logger.info("=" * 60)
|
||||||
|
logger.info(f"Duration: {summary.duration_seconds} seconds")
|
||||||
|
logger.info(f"Success Rate: {summary.success_rate:.1f}%")
|
||||||
|
logger.info("=" * 60)
|
||||||
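For orientation, a hypothetical driver for the service above. How the real entry point builds Config and DatabasePairConfig is not shown in this commit, so both are assumed to be supplied by the caller:

from drt.services.comparison import ComparisonService

def run_once(config, db_pair):
    # `config` and `db_pair` are assumed to be pre-built drt objects.
    service = ComparisonService(config)
    summary = service.run_comparison(db_pair)
    print(f"{summary.passed} passed, {summary.failed} failed, "
          f"{summary.duration_seconds}s total")
    return summary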
192
src/drt/services/discovery.py
Executable file
@@ -0,0 +1,192 @@
"""Discovery service for auto-generating configuration."""
|
||||||
|
|
||||||
|
from typing import List
|
||||||
|
from drt.database.connection import ConnectionManager
|
||||||
|
from drt.database.executor import QueryExecutor
|
||||||
|
from drt.database.queries import SQLQueries
|
||||||
|
from drt.models.table import TableInfo, ColumnInfo
|
||||||
|
from drt.config.models import Config, TableConfig, MetadataConfig, ConnectionConfig
|
||||||
|
from drt.utils.logging import get_logger
|
||||||
|
from drt.utils.timestamps import get_timestamp
|
||||||
|
from drt.utils.patterns import matches_pattern
|
||||||
|
|
||||||
|
logger = get_logger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class DiscoveryService:
|
||||||
|
"""Service for discovering database tables and generating configuration."""
|
||||||
|
|
||||||
|
def __init__(self, connection_config: ConnectionConfig, config: Config = None):
|
||||||
|
"""
|
||||||
|
Initialize discovery service.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
connection_config: Connection configuration for baseline database
|
||||||
|
config: Optional existing configuration for discovery settings
|
||||||
|
"""
|
||||||
|
self.conn_config = connection_config
|
||||||
|
self.config = config or Config()
|
||||||
|
self.conn_mgr = ConnectionManager(connection_config)
|
||||||
|
self.executor = QueryExecutor(self.conn_mgr)
|
||||||
|
|
||||||
|
def discover_tables(self) -> List[TableInfo]:
|
||||||
|
"""
|
||||||
|
Discover all tables in the database.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of discovered tables
|
||||||
|
"""
|
||||||
|
logger.info("Starting table discovery...")
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Get all tables
|
||||||
|
tables_data = self.executor.get_all_tables()
|
||||||
|
logger.info(f"Found {len(tables_data)} tables")
|
||||||
|
|
||||||
|
discovered_tables = []
|
||||||
|
|
||||||
|
for table_data in tables_data:
|
||||||
|
schema = table_data['schema_name']
|
||||||
|
name = table_data['table_name']
|
||||||
|
estimated_rows = table_data.get('estimated_rows', 0)
|
||||||
|
|
||||||
|
# Apply filters
|
||||||
|
if self._should_exclude_table(schema, name):
|
||||||
|
logger.debug(f"Excluding table: {schema}.{name}")
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Get column information
|
||||||
|
columns = self._discover_columns(schema, name)
|
||||||
|
|
||||||
|
# Get primary keys
|
||||||
|
pk_columns = self.executor.get_primary_keys(schema, name)
|
||||||
|
|
||||||
|
# Identify numeric columns for aggregation
|
||||||
|
aggregate_cols = [
|
||||||
|
col.name for col in columns
|
||||||
|
if col.is_numeric and self.config.discovery.detect_numeric_columns
|
||||||
|
]
|
||||||
|
|
||||||
|
table_info = TableInfo(
|
||||||
|
schema=schema,
|
||||||
|
name=name,
|
||||||
|
estimated_row_count=estimated_rows,
|
||||||
|
columns=columns,
|
||||||
|
primary_key_columns=pk_columns,
|
||||||
|
enabled=True,
|
||||||
|
expected_in_target=self.config.discovery.default_expected_in_target,
|
||||||
|
aggregate_columns=aggregate_cols,
|
||||||
|
notes=""
|
||||||
|
)
|
||||||
|
|
||||||
|
discovered_tables.append(table_info)
|
||||||
|
logger.debug(f"Discovered: {table_info.full_name} ({estimated_rows:,} rows)")
|
||||||
|
|
||||||
|
logger.info(f"Discovery complete: {len(discovered_tables)} tables discovered")
|
||||||
|
return discovered_tables
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Discovery failed: {e}")
|
||||||
|
raise
|
||||||
|
|
||||||
|
def _discover_columns(self, schema: str, table: str) -> List[ColumnInfo]:
|
||||||
|
"""Discover columns for a table."""
|
||||||
|
import math
|
||||||
|
columns_data = self.executor.get_columns(schema, table)
|
||||||
|
columns = []
|
||||||
|
|
||||||
|
for idx, col_data in enumerate(columns_data, 1):
|
||||||
|
is_numeric = SQLQueries.is_numeric_type(col_data['DATA_TYPE'])
|
||||||
|
|
||||||
|
# Convert nan to None for Pydantic validation
|
||||||
|
# Pandas converts SQL NULL to nan, but Pydantic v2 rejects nan for Optional[int]
|
||||||
|
max_length = col_data.get('CHARACTER_MAXIMUM_LENGTH')
|
||||||
|
if isinstance(max_length, float) and math.isnan(max_length):
|
||||||
|
max_length = None
|
||||||
|
|
||||||
|
precision = col_data.get('NUMERIC_PRECISION')
|
||||||
|
if isinstance(precision, float) and math.isnan(precision):
|
||||||
|
precision = None
|
||||||
|
|
||||||
|
scale = col_data.get('NUMERIC_SCALE')
|
||||||
|
if isinstance(scale, float) and math.isnan(scale):
|
||||||
|
scale = None
|
||||||
|
|
||||||
|
# DEBUG: Log converted values to verify fix
|
||||||
|
logger.debug(f"Column {col_data['COLUMN_NAME']}: max_length={max_length} (converted from {col_data.get('CHARACTER_MAXIMUM_LENGTH')}), "
|
||||||
|
f"precision={precision}, scale={scale}, data_type={col_data['DATA_TYPE']}")
|
||||||
|
|
||||||
|
column = ColumnInfo(
|
||||||
|
name=col_data['COLUMN_NAME'],
|
||||||
|
data_type=col_data['DATA_TYPE'],
|
||||||
|
max_length=max_length,
|
||||||
|
precision=precision,
|
||||||
|
scale=scale,
|
||||||
|
is_nullable=col_data['IS_NULLABLE'] == 'YES',
|
||||||
|
is_numeric=is_numeric,
|
||||||
|
ordinal_position=col_data.get('ORDINAL_POSITION', idx)
|
||||||
|
)
|
||||||
|
columns.append(column)
|
||||||
|
|
||||||
|
return columns
|
||||||
|
|
||||||
|
def _should_exclude_table(self, schema: str, table: str) -> bool:
|
||||||
|
"""Check if table should be excluded based on filters."""
|
||||||
|
# Check schema exclusions
|
||||||
|
if schema in self.config.discovery.exclude_schemas:
|
||||||
|
return True
|
||||||
|
|
||||||
|
# Check table name patterns
|
||||||
|
if matches_pattern(table, self.config.discovery.exclude_patterns):
|
||||||
|
return True
|
||||||
|
|
||||||
|
# Check schema inclusions (if specified)
|
||||||
|
if self.config.discovery.include_schemas:
|
||||||
|
if schema not in self.config.discovery.include_schemas:
|
||||||
|
return True
|
||||||
|
|
||||||
|
return False
|
||||||
|
|
||||||
|
def generate_config(self, tables: List[TableInfo]) -> Config:
|
||||||
|
"""
|
||||||
|
Generate configuration from discovered tables.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
tables: List of discovered tables
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Generated configuration
|
||||||
|
"""
|
||||||
|
logger.info("Generating configuration...")
|
||||||
|
|
||||||
|
# Create table configs
|
||||||
|
table_configs = [
|
||||||
|
TableConfig(
|
||||||
|
schema=table.schema,
|
||||||
|
name=table.name,
|
||||||
|
enabled=table.enabled,
|
||||||
|
expected_in_target=table.expected_in_target,
|
||||||
|
estimated_row_count=table.estimated_row_count,
|
||||||
|
primary_key_columns=table.primary_key_columns,
|
||||||
|
aggregate_columns=table.aggregate_columns,
|
||||||
|
notes=table.notes
|
||||||
|
)
|
||||||
|
for table in tables
|
||||||
|
]
|
||||||
|
|
||||||
|
# Update metadata
|
||||||
|
metadata = MetadataConfig(
|
||||||
|
config_version="1.0",
|
||||||
|
generated_date=get_timestamp(),
|
||||||
|
generated_by="discovery",
|
||||||
|
framework_version="1.0.0"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Create new config with discovered tables
|
||||||
|
config = Config(
|
||||||
|
metadata=metadata,
|
||||||
|
tables=table_configs
|
||||||
|
)
|
||||||
|
|
||||||
|
logger.info(f"Configuration generated with {len(table_configs)} tables")
|
||||||
|
return config
|
||||||
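A sketch of the intended discovery flow, assuming a ConnectionConfig is already in hand. Whether the service manages its own connection lifecycle is not shown in this file, so connect/disconnect are called explicitly here:

from drt.services.discovery import DiscoveryService

def discover_and_generate(connection_config):
    service = DiscoveryService(connection_config)
    service.conn_mgr.connect()  # ConnectionManager.connect(), as used by ComparisonService
    try:
        tables = service.discover_tables()
        return service.generate_config(tables)  # Config with one TableConfig per table
    finally:
        service.conn_mgr.disconnect()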
297
src/drt/services/investigation.py
Normal file
@@ -0,0 +1,297 @@
"""Investigation service for executing investigation queries."""
|
||||||
|
|
||||||
|
import time
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import List, Tuple
|
||||||
|
from drt.database.connection import ConnectionManager
|
||||||
|
from drt.database.executor import QueryExecutor
|
||||||
|
from drt.config.models import Config, DatabasePairConfig
|
||||||
|
from drt.models.investigation import (
|
||||||
|
QueryExecutionResult,
|
||||||
|
TableInvestigationResult,
|
||||||
|
InvestigationSummary
|
||||||
|
)
|
||||||
|
from drt.models.enums import Status
|
||||||
|
from drt.services.sql_parser import SQLParser, discover_sql_files
|
||||||
|
from drt.utils.logging import get_logger
|
||||||
|
from drt.utils.timestamps import get_timestamp
|
||||||
|
|
||||||
|
logger = get_logger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class InvestigationService:
|
||||||
|
"""Service for executing investigation queries."""
|
||||||
|
|
||||||
|
def __init__(self, config: Config):
|
||||||
|
"""
|
||||||
|
Initialize investigation service.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
config: Configuration object
|
||||||
|
"""
|
||||||
|
self.config = config
|
||||||
|
self.parser = SQLParser()
|
||||||
|
|
||||||
|
def run_investigation(
|
||||||
|
self,
|
||||||
|
analysis_dir: Path,
|
||||||
|
db_pair: DatabasePairConfig
|
||||||
|
) -> InvestigationSummary:
|
||||||
|
"""
|
||||||
|
Run investigation for all SQL files in analysis directory.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
analysis_dir: Path to analysis output directory
|
||||||
|
db_pair: Database pair configuration
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Investigation summary with all results
|
||||||
|
"""
|
||||||
|
start_time = get_timestamp()
|
||||||
|
start_ts = time.time()
|
||||||
|
|
||||||
|
logger.info("=" * 60)
|
||||||
|
logger.info(f"Starting investigation: {analysis_dir.name}")
|
||||||
|
logger.info("=" * 60)
|
||||||
|
|
||||||
|
# Initialize connections
|
||||||
|
baseline_mgr = ConnectionManager(db_pair.baseline)
|
||||||
|
target_mgr = ConnectionManager(db_pair.target)
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Connect to databases
|
||||||
|
baseline_mgr.connect()
|
||||||
|
target_mgr.connect()
|
||||||
|
|
||||||
|
# Create executors
|
||||||
|
baseline_executor = QueryExecutor(baseline_mgr)
|
||||||
|
target_executor = QueryExecutor(target_mgr)
|
||||||
|
|
||||||
|
# Discover SQL files
|
||||||
|
sql_files = discover_sql_files(analysis_dir)
|
||||||
|
logger.info(f"Found {len(sql_files)} investigation files")
|
||||||
|
|
||||||
|
# Create summary
|
||||||
|
summary = InvestigationSummary(
|
||||||
|
start_time=start_time,
|
||||||
|
end_time="",
|
||||||
|
duration_seconds=0,
|
||||||
|
analysis_directory=str(analysis_dir),
|
||||||
|
baseline_info=f"{db_pair.baseline.server}.{db_pair.baseline.database}",
|
||||||
|
target_info=f"{db_pair.target.server}.{db_pair.target.database}",
|
||||||
|
tables_processed=0,
|
||||||
|
tables_successful=0,
|
||||||
|
tables_partial=0,
|
||||||
|
tables_failed=0,
|
||||||
|
total_queries_executed=0,
|
||||||
|
results=[]
|
||||||
|
)
|
||||||
|
|
||||||
|
# Process each SQL file
|
||||||
|
for idx, (schema, table, sql_path) in enumerate(sql_files, 1):
|
||||||
|
logger.info(f"[{idx:3d}/{len(sql_files)}] {schema}.{table:40s} ...")
|
||||||
|
|
||||||
|
result = self._investigate_table(
|
||||||
|
schema,
|
||||||
|
table,
|
||||||
|
sql_path,
|
||||||
|
baseline_executor,
|
||||||
|
target_executor
|
||||||
|
)
|
||||||
|
|
||||||
|
summary.results.append(result)
|
||||||
|
summary.tables_processed += 1
|
||||||
|
|
||||||
|
# Update counters
|
||||||
|
if result.overall_status == Status.PASS:
|
||||||
|
summary.tables_successful += 1
|
||||||
|
elif result.overall_status == Status.SKIP:
|
||||||
|
# Don't count skipped tables in partial/failed
|
||||||
|
pass
|
||||||
|
elif result.overall_status in [Status.WARNING, Status.INFO]:
|
||||||
|
# Treat WARNING/INFO as partial success
|
||||||
|
summary.tables_partial += 1
|
||||||
|
elif self._is_partial_status(result):
|
||||||
|
summary.tables_partial += 1
|
||||||
|
else:
|
||||||
|
summary.tables_failed += 1
|
||||||
|
|
||||||
|
# Count queries
|
||||||
|
summary.total_queries_executed += len(result.baseline_results)
|
||||||
|
summary.total_queries_executed += len(result.target_results)
|
||||||
|
|
||||||
|
logger.info(f" {self._get_status_symbol(result.overall_status)} "
|
||||||
|
f"{result.overall_status.value}")
|
||||||
|
|
||||||
|
# Finalize summary
|
||||||
|
end_time = get_timestamp()
|
||||||
|
duration = int(time.time() - start_ts)
|
||||||
|
summary.end_time = end_time
|
||||||
|
summary.duration_seconds = duration
|
||||||
|
|
||||||
|
self._log_summary(summary)
|
||||||
|
|
||||||
|
return summary
|
||||||
|
|
||||||
|
finally:
|
||||||
|
baseline_mgr.disconnect()
|
||||||
|
target_mgr.disconnect()
|
||||||
|
|
||||||
|
def _investigate_table(
|
||||||
|
self,
|
||||||
|
schema: str,
|
||||||
|
table: str,
|
||||||
|
sql_path: Path,
|
||||||
|
baseline_executor: QueryExecutor,
|
||||||
|
target_executor: QueryExecutor
|
||||||
|
) -> TableInvestigationResult:
|
||||||
|
"""Execute investigation queries for a single table."""
|
||||||
|
|
||||||
|
# Parse SQL file
|
||||||
|
queries = self.parser.parse_sql_file(sql_path)
|
||||||
|
|
||||||
|
if not queries:
|
||||||
|
logger.warning(f"No valid queries found in {sql_path.name}")
|
||||||
|
return TableInvestigationResult(
|
||||||
|
schema=schema,
|
||||||
|
table=table,
|
||||||
|
sql_file_path=str(sql_path),
|
||||||
|
baseline_results=[],
|
||||||
|
target_results=[],
|
||||||
|
overall_status=Status.SKIP,
|
||||||
|
timestamp=get_timestamp()
|
||||||
|
)
|
||||||
|
|
||||||
|
logger.debug(f" └─ Executing {len(queries)} queries")
|
||||||
|
|
||||||
|
# Execute on baseline
|
||||||
|
baseline_results = self._execute_queries(
|
||||||
|
queries,
|
||||||
|
baseline_executor,
|
||||||
|
"baseline"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Execute on target
|
||||||
|
target_results = self._execute_queries(
|
||||||
|
queries,
|
||||||
|
target_executor,
|
||||||
|
"target"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Determine overall status
|
||||||
|
overall_status = self._determine_overall_status(
|
||||||
|
baseline_results,
|
||||||
|
target_results
|
||||||
|
)
|
||||||
|
|
||||||
|
return TableInvestigationResult(
|
||||||
|
schema=schema,
|
||||||
|
table=table,
|
||||||
|
sql_file_path=str(sql_path),
|
||||||
|
baseline_results=baseline_results,
|
||||||
|
target_results=target_results,
|
||||||
|
overall_status=overall_status,
|
||||||
|
timestamp=get_timestamp()
|
||||||
|
)
|
||||||
|
|
||||||
|
def _execute_queries(
|
||||||
|
self,
|
||||||
|
queries: List[Tuple[int, str]],
|
||||||
|
executor: QueryExecutor,
|
||||||
|
environment: str
|
||||||
|
) -> List[QueryExecutionResult]:
|
||||||
|
"""Execute list of queries on one environment."""
|
||||||
|
results = []
|
||||||
|
|
||||||
|
for query_num, query_text in queries:
|
||||||
|
logger.debug(f" └─ Query {query_num} on {environment}")
|
||||||
|
|
||||||
|
status, result_df, error_msg, exec_time = \
|
||||||
|
executor.execute_investigation_query(query_text)
|
||||||
|
|
||||||
|
result = QueryExecutionResult(
|
||||||
|
query_number=query_num,
|
||||||
|
query_text=query_text,
|
||||||
|
status=status,
|
||||||
|
execution_time_ms=exec_time,
|
||||||
|
result_data=result_df,
|
||||||
|
error_message=error_msg,
|
||||||
|
row_count=len(result_df) if result_df is not None else 0
|
||||||
|
)
|
||||||
|
|
||||||
|
results.append(result)
|
||||||
|
|
||||||
|
logger.debug(f" └─ {status.value} ({exec_time}ms, "
|
||||||
|
f"{result.row_count} rows)")
|
||||||
|
|
||||||
|
return results
|
||||||
|
|
||||||
|
def _determine_overall_status(
|
||||||
|
self,
|
||||||
|
baseline_results: List[QueryExecutionResult],
|
||||||
|
target_results: List[QueryExecutionResult]
|
||||||
|
) -> Status:
|
||||||
|
"""Determine overall status for table investigation."""
|
||||||
|
|
||||||
|
all_results = baseline_results + target_results
|
||||||
|
|
||||||
|
if not all_results:
|
||||||
|
return Status.SKIP
|
||||||
|
|
||||||
|
success_count = sum(1 for r in all_results if r.status == Status.PASS)
|
||||||
|
failed_count = sum(1 for r in all_results if r.status == Status.FAIL)
|
||||||
|
skipped_count = sum(1 for r in all_results if r.status == Status.SKIP)
|
||||||
|
|
||||||
|
# All successful
|
||||||
|
if success_count == len(all_results):
|
||||||
|
return Status.PASS
|
||||||
|
|
||||||
|
# All failed
|
||||||
|
if failed_count == len(all_results):
|
||||||
|
return Status.FAIL
|
||||||
|
|
||||||
|
# All skipped
|
||||||
|
if skipped_count == len(all_results):
|
||||||
|
return Status.SKIP
|
||||||
|
|
||||||
|
# Mixed results - use WARNING to indicate partial success
|
||||||
|
if success_count > 0:
|
||||||
|
return Status.WARNING
|
||||||
|
else:
|
||||||
|
return Status.FAIL
|
||||||
|
|
||||||
|
def _is_partial_status(self, result: TableInvestigationResult) -> bool:
|
||||||
|
"""Check if result represents partial success."""
|
||||||
|
all_results = result.baseline_results + result.target_results
|
||||||
|
if not all_results:
|
||||||
|
return False
|
||||||
|
|
||||||
|
success_count = sum(1 for r in all_results if r.status == Status.PASS)
|
||||||
|
return 0 < success_count < len(all_results)
|
||||||
|
|
||||||
|
def _get_status_symbol(self, status: Status) -> str:
|
||||||
|
"""Get symbol for status."""
|
||||||
|
symbols = {
|
||||||
|
Status.PASS: "✓",
|
||||||
|
Status.FAIL: "✗",
|
||||||
|
Status.WARNING: "◐",
|
||||||
|
Status.SKIP: "○",
|
||||||
|
Status.ERROR: "🔴",
|
||||||
|
Status.INFO: "ℹ"
|
||||||
|
}
|
||||||
|
return symbols.get(status, "?")
|
||||||
|
|
||||||
|
def _log_summary(self, summary: InvestigationSummary) -> None:
|
||||||
|
"""Log investigation summary."""
|
||||||
|
logger.info("=" * 60)
|
||||||
|
logger.info("INVESTIGATION SUMMARY")
|
||||||
|
logger.info("=" * 60)
|
||||||
|
logger.info(f" Tables Processed: {summary.tables_processed}")
|
||||||
|
logger.info(f" Successful: {summary.tables_successful}")
|
||||||
|
logger.info(f" Partial: {summary.tables_partial}")
|
||||||
|
logger.info(f" Failed: {summary.tables_failed}")
|
||||||
|
logger.info(f" Total Queries: {summary.total_queries_executed}")
|
||||||
|
logger.info("=" * 60)
|
||||||
|
logger.info(f"Duration: {summary.duration_seconds} seconds")
|
||||||
|
logger.info(f"Success Rate: {summary.success_rate:.1f}%")
|
||||||
|
logger.info("=" * 60)
|
||||||
173
src/drt/services/sql_parser.py
Normal file
@@ -0,0 +1,173 @@
"""SQL file parser for investigation queries."""
|
||||||
|
|
||||||
|
import re
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import List, Tuple
|
||||||
|
from drt.utils.logging import get_logger
|
||||||
|
|
||||||
|
logger = get_logger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class SQLParser:
|
||||||
|
"""Parser for investigation SQL files."""
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def parse_sql_file(file_path: Path) -> List[Tuple[int, str]]:
|
||||||
|
"""
|
||||||
|
Parse SQL file into individual queries with their numbers.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
file_path: Path to SQL file
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of tuples (query_number, query_text)
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> queries = SQLParser.parse_sql_file(Path("investigate.sql"))
|
||||||
|
>>> for num, query in queries:
|
||||||
|
... print(f"Query {num}: {query[:50]}...")
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
content = file_path.read_text(encoding='utf-8')
|
||||||
|
|
||||||
|
# Step 1: Remove markdown code blocks
|
||||||
|
content = SQLParser._remove_markdown(content)
|
||||||
|
|
||||||
|
# Step 2: Split into queries
|
||||||
|
queries = SQLParser._split_queries(content)
|
||||||
|
|
||||||
|
# Step 3: Clean and validate
|
||||||
|
cleaned_queries = []
|
||||||
|
for num, query in queries:
|
||||||
|
cleaned = SQLParser._clean_query(query)
|
||||||
|
if cleaned and SQLParser._is_valid_query(cleaned):
|
||||||
|
cleaned_queries.append((num, cleaned))
|
||||||
|
else:
|
||||||
|
logger.debug(f"Skipped invalid query {num} in {file_path.name}")
|
||||||
|
|
||||||
|
logger.info(f"Parsed {len(cleaned_queries)} queries from {file_path.name}")
|
||||||
|
return cleaned_queries
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Failed to parse {file_path}: {e}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _remove_markdown(content: str) -> str:
|
||||||
|
"""Remove markdown code blocks from content."""
|
||||||
|
# Remove opening ```sql
|
||||||
|
content = re.sub(r'```sql\s*\n?', '', content, flags=re.IGNORECASE)
|
||||||
|
# Remove closing ```
|
||||||
|
content = re.sub(r'```\s*\n?', '', content)
|
||||||
|
return content
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _split_queries(content: str) -> List[Tuple[int, str]]:
|
||||||
|
"""
|
||||||
|
Split content into individual queries.
|
||||||
|
|
||||||
|
Looks for patterns like:
|
||||||
|
-- Query 1: Description
|
||||||
|
-- Query 2: Description
|
||||||
|
"""
|
||||||
|
queries = []
|
||||||
|
current_query = []
|
||||||
|
current_number = 0
|
||||||
|
|
||||||
|
for line in content.split('\n'):
|
||||||
|
# Check if line is a query separator
|
||||||
|
match = re.match(r'^\s*--\s*Query\s+(\d+):', line, re.IGNORECASE)
|
||||||
|
|
||||||
|
if match:
|
||||||
|
# Save previous query if exists
|
||||||
|
if current_query and current_number > 0:
|
||||||
|
query_text = '\n'.join(current_query).strip()
|
||||||
|
if query_text:
|
||||||
|
queries.append((current_number, query_text))
|
||||||
|
|
||||||
|
# Start new query
|
||||||
|
current_number = int(match.group(1))
|
||||||
|
current_query = []
|
||||||
|
else:
|
||||||
|
# Add line to current query
|
||||||
|
current_query.append(line)
|
||||||
|
|
||||||
|
# Don't forget the last query
|
||||||
|
if current_query and current_number > 0:
|
||||||
|
query_text = '\n'.join(current_query).strip()
|
||||||
|
if query_text:
|
||||||
|
queries.append((current_number, query_text))
|
||||||
|
|
||||||
|
return queries
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _clean_query(query: str) -> str:
|
||||||
|
"""Clean query text."""
|
||||||
|
# Remove leading/trailing whitespace
|
||||||
|
query = query.strip()
|
||||||
|
|
||||||
|
# Remove comment-only lines at start
|
||||||
|
lines = query.split('\n')
|
||||||
|
while lines and lines[0].strip().startswith('--'):
|
||||||
|
lines.pop(0)
|
||||||
|
|
||||||
|
# Remove empty lines at start and end
|
||||||
|
while lines and not lines[0].strip():
|
||||||
|
lines.pop(0)
|
||||||
|
while lines and not lines[-1].strip():
|
||||||
|
lines.pop()
|
||||||
|
|
||||||
|
return '\n'.join(lines)
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _is_valid_query(query: str) -> bool:
|
||||||
|
"""Check if query is valid (not empty, not just comments)."""
|
||||||
|
if not query:
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Remove all comments and whitespace
|
||||||
|
cleaned = re.sub(r'--.*$', '', query, flags=re.MULTILINE)
|
||||||
|
cleaned = cleaned.strip()
|
||||||
|
|
||||||
|
# Must have some SQL content
|
||||||
|
return len(cleaned) > 0
|
||||||
|
|
||||||
|
|
||||||
|
def discover_sql_files(analysis_dir: Path) -> List[Tuple[str, str, Path]]:
|
||||||
|
"""
|
||||||
|
Discover all *_investigate.sql files in analysis directory.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
analysis_dir: Root analysis directory
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of tuples (schema, table, file_path)
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> files = discover_sql_files(Path("analysis/output_20251209_184032"))
|
||||||
|
>>> for schema, table, path in files:
|
||||||
|
... print(f"{schema}.{table}: {path}")
|
||||||
|
"""
|
||||||
|
sql_files = []
|
||||||
|
|
||||||
|
# Pattern: dbo.TableName/dbo.TableName_investigate.sql
|
||||||
|
pattern = "**/*_investigate.sql"
|
||||||
|
|
||||||
|
for sql_file in analysis_dir.glob(pattern):
|
||||||
|
# Extract schema and table from filename
|
||||||
|
# Example: dbo.A_COREC_NACES2008_investigate.sql
|
||||||
|
filename = sql_file.stem # Remove .sql
|
||||||
|
|
||||||
|
if filename.endswith('_investigate'):
|
||||||
|
# Remove _investigate suffix
|
||||||
|
full_name = filename[:-12] # len('_investigate') = 12
|
||||||
|
|
||||||
|
# Split schema.table
|
||||||
|
if '.' in full_name:
|
||||||
|
schema, table = full_name.split('.', 1)
|
||||||
|
sql_files.append((schema, table, sql_file))
|
||||||
|
else:
|
||||||
|
logger.warning(f"Could not parse schema.table from {filename}")
|
||||||
|
|
||||||
|
logger.info(f"Discovered {len(sql_files)} investigation SQL files")
|
||||||
|
return sql_files
|
||||||
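The parser keys on `-- Query N:` separator comments. A small round-trip example of the file format it accepts (the file name follows the `<schema>.<table>_investigate.sql` convention used by discover_sql_files):

from pathlib import Path
from drt.services.sql_parser import SQLParser

sample = """\
-- Query 1: Row count
SELECT COUNT(*) FROM dbo.FactSales;

-- Query 2: Total amount
SELECT SUM(TotalAmount) FROM dbo.FactSales;
"""

path = Path("dbo.FactSales_investigate.sql")
path.write_text(sample, encoding="utf-8")
for num, query in SQLParser.parse_sql_file(path):
    # Separator comments and surrounding blank lines are stripped by _clean_query.
    print(num, query.splitlines()[0])
# 1 SELECT COUNT(*) FROM dbo.FactSales;
# 2 SELECT SUM(TotalAmount) FROM dbo.FactSales;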
7
src/drt/utils/__init__.py
Executable file
@@ -0,0 +1,7 @@
"""Utility functions and helpers."""
|
||||||
|
|
||||||
|
from drt.utils.timestamps import get_timestamp, format_duration
|
||||||
|
from drt.utils.patterns import matches_pattern
|
||||||
|
from drt.utils.logging import setup_logging
|
||||||
|
|
||||||
|
__all__ = ["get_timestamp", "format_duration", "matches_pattern", "setup_logging"]
|
||||||
75
src/drt/utils/logging.py
Executable file
@@ -0,0 +1,75 @@
"""Logging configuration and setup."""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
from drt.utils.timestamps import get_timestamp
|
||||||
|
|
||||||
|
|
||||||
|
def setup_logging(
|
||||||
|
log_level: str = "INFO",
|
||||||
|
log_dir: str = "./logs",
|
||||||
|
log_to_console: bool = True,
|
||||||
|
log_to_file: bool = True,
|
||||||
|
) -> logging.Logger:
|
||||||
|
"""
|
||||||
|
Configure logging for the framework.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
log_level: Logging level (DEBUG, INFO, WARNING, ERROR)
|
||||||
|
log_dir: Directory for log files
|
||||||
|
log_to_console: Whether to log to console
|
||||||
|
log_to_file: Whether to log to file
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Configured logger instance
|
||||||
|
"""
|
||||||
|
# Create logger
|
||||||
|
logger = logging.getLogger("drt")
|
||||||
|
logger.setLevel(getattr(logging, log_level.upper()))
|
||||||
|
|
||||||
|
# Remove existing handlers
|
||||||
|
logger.handlers.clear()
|
||||||
|
|
||||||
|
# Create formatter
|
||||||
|
log_format = "%(asctime)s | %(levelname)-8s | %(name)-20s | %(message)s"
|
||||||
|
date_format = "%Y%m%d_%H%M%S"
|
||||||
|
formatter = logging.Formatter(log_format, datefmt=date_format)
|
||||||
|
|
||||||
|
# Console handler
|
||||||
|
if log_to_console:
|
||||||
|
console_handler = logging.StreamHandler(sys.stdout)
|
||||||
|
console_handler.setLevel(getattr(logging, log_level.upper()))
|
||||||
|
console_handler.setFormatter(formatter)
|
||||||
|
logger.addHandler(console_handler)
|
||||||
|
|
||||||
|
# File handler
|
||||||
|
if log_to_file:
|
||||||
|
log_path = Path(log_dir)
|
||||||
|
log_path.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
timestamp = get_timestamp()
|
||||||
|
log_file = log_path / f"drt_{timestamp}.log"
|
||||||
|
|
||||||
|
file_handler = logging.FileHandler(log_file, encoding="utf-8")
|
||||||
|
file_handler.setLevel(logging.DEBUG) # Always log everything to file
|
||||||
|
file_handler.setFormatter(formatter)
|
||||||
|
logger.addHandler(file_handler)
|
||||||
|
|
||||||
|
logger.info(f"Logging to file: {log_file}")
|
||||||
|
|
||||||
|
return logger
|
||||||
|
|
||||||
|
|
||||||
|
def get_logger(name: str) -> logging.Logger:
|
||||||
|
"""
|
||||||
|
Get a logger instance for a specific module.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
name: Logger name (typically __name__)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Logger instance
|
||||||
|
"""
|
||||||
|
return logging.getLogger(f"drt.{name}")
|
||||||
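Typical wiring, assuming the application entry point calls setup_logging once; module-level loggers created via get_logger then propagate to the configured "drt" root:

from drt.utils.logging import setup_logging, get_logger

setup_logging(log_level="DEBUG", log_dir="./logs")
logger = get_logger("example")  # named "drt.example", inherits the root handlers
logger.info("console honors log_level; the file handler always captures DEBUG")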
58
src/drt/utils/patterns.py
Executable file
@@ -0,0 +1,58 @@
"""Pattern matching utilities for wildcard support."""
|
||||||
|
|
||||||
|
import fnmatch
|
||||||
|
from typing import List
|
||||||
|
|
||||||
|
|
||||||
|
def matches_pattern(text: str, patterns: List[str]) -> bool:
|
||||||
|
"""
|
||||||
|
Check if text matches any of the given wildcard patterns.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
text: Text to match
|
||||||
|
patterns: List of wildcard patterns (e.g., "*_TEMP", "tmp*")
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if text matches any pattern, False otherwise
|
||||||
|
|
||||||
|
Examples:
|
||||||
|
>>> matches_pattern("Orders_TEMP", ["*_TEMP", "*_TMP"])
|
||||||
|
True
|
||||||
|
>>> matches_pattern("Orders", ["*_TEMP", "*_TMP"])
|
||||||
|
False
|
||||||
|
"""
|
||||||
|
if not patterns:
|
||||||
|
return False
|
||||||
|
|
||||||
|
for pattern in patterns:
|
||||||
|
if fnmatch.fnmatch(text.upper(), pattern.upper()):
|
||||||
|
return True
|
||||||
|
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
def filter_by_patterns(
|
||||||
|
items: List[str], include_patterns: List[str] = None, exclude_patterns: List[str] = None
|
||||||
|
) -> List[str]:
|
||||||
|
"""
|
||||||
|
Filter items by include and exclude patterns.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
items: List of items to filter
|
||||||
|
include_patterns: Patterns to include (if None, include all)
|
||||||
|
exclude_patterns: Patterns to exclude
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Filtered list of items
|
||||||
|
"""
|
||||||
|
result = items.copy()
|
||||||
|
|
||||||
|
# Apply include patterns if specified
|
||||||
|
if include_patterns:
|
||||||
|
result = [item for item in result if matches_pattern(item, include_patterns)]
|
||||||
|
|
||||||
|
# Apply exclude patterns
|
||||||
|
if exclude_patterns:
|
||||||
|
result = [item for item in result if not matches_pattern(item, exclude_patterns)]
|
||||||
|
|
||||||
|
return result
|
||||||
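filter_by_patterns has no doctest of its own; a quick illustration of the include-then-exclude ordering (table names invented for the example):

from drt.utils.patterns import filter_by_patterns

tables = ["FactSales", "FactSales_TEMP", "DimCustomer", "tmp_Staging"]
kept = filter_by_patterns(
    tables,
    include_patterns=["Fact*", "Dim*"],
    exclude_patterns=["*_TEMP"],
)
print(kept)  # ['FactSales', 'DimCustomer'] - includes applied first, then excludes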
59
src/drt/utils/timestamps.py
Executable file
@@ -0,0 +1,59 @@
"""Timestamp utilities using YYYYMMDD_HHMMSS format."""
|
||||||
|
|
||||||
|
from datetime import datetime
|
||||||
|
|
||||||
|
|
||||||
|
def get_timestamp() -> str:
|
||||||
|
"""
|
||||||
|
Get current timestamp in YYYYMMDD_HHMMSS format.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Formatted timestamp string
|
||||||
|
"""
|
||||||
|
return datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||||
|
|
||||||
|
|
||||||
|
def format_duration(seconds: int) -> str:
|
||||||
|
"""
|
||||||
|
Format duration in seconds to human-readable string.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
seconds: Duration in seconds
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Formatted duration string (e.g., "4 minutes 38 seconds")
|
||||||
|
"""
|
||||||
|
if seconds < 60:
|
||||||
|
return f"{seconds} second{'s' if seconds != 1 else ''}"
|
||||||
|
|
||||||
|
minutes = seconds // 60
|
||||||
|
remaining_seconds = seconds % 60
|
||||||
|
|
||||||
|
if minutes < 60:
|
||||||
|
if remaining_seconds == 0:
|
||||||
|
return f"{minutes} minute{'s' if minutes != 1 else ''}"
|
||||||
|
return f"{minutes} minute{'s' if minutes != 1 else ''} {remaining_seconds} second{'s' if remaining_seconds != 1 else ''}"
|
||||||
|
|
||||||
|
hours = minutes // 60
|
||||||
|
remaining_minutes = minutes % 60
|
||||||
|
|
||||||
|
parts = [f"{hours} hour{'s' if hours != 1 else ''}"]
|
||||||
|
if remaining_minutes > 0:
|
||||||
|
parts.append(f"{remaining_minutes} minute{'s' if remaining_minutes != 1 else ''}")
|
||||||
|
if remaining_seconds > 0:
|
||||||
|
parts.append(f"{remaining_seconds} second{'s' if remaining_seconds != 1 else ''}")
|
||||||
|
|
||||||
|
return " ".join(parts)
|
||||||
|
|
||||||
|
|
||||||
|
def parse_timestamp(timestamp_str: str) -> datetime:
|
||||||
|
"""
|
||||||
|
Parse timestamp string in YYYYMMDD_HHMMSS format.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
timestamp_str: Timestamp string to parse
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
datetime object
|
||||||
|
"""
|
||||||
|
return datetime.strptime(timestamp_str, "%Y%m%d_%H%M%S")
|
||||||
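The docstring's "4 minutes 38 seconds" example corresponds to 278 seconds; a few more data points, plus the get_timestamp/parse_timestamp round trip:

from drt.utils.timestamps import format_duration, get_timestamp, parse_timestamp

print(format_duration(1))     # 1 second
print(format_duration(278))   # 4 minutes 38 seconds
print(format_duration(3600))  # 1 hour
print(format_duration(3725))  # 1 hour 2 minutes 5 seconds

ts = get_timestamp()
assert parse_timestamp(ts).strftime("%Y%m%d_%H%M%S") == ts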
117
test_data/init_baseline.sql
Executable file
@@ -0,0 +1,117 @@
-- Baseline Database Initialization Script
-- This creates a sample database structure for testing

USE master;
GO

-- Create test database
IF NOT EXISTS (SELECT name FROM sys.databases WHERE name = 'TestDB_Baseline')
BEGIN
    CREATE DATABASE TestDB_Baseline;
END
GO

USE TestDB_Baseline;
GO

-- Create sample tables

-- Dimension: Customers
CREATE TABLE dbo.DimCustomer (
    CustomerID INT PRIMARY KEY IDENTITY(1,1),
    CustomerName NVARCHAR(100) NOT NULL,
    Email NVARCHAR(100),
    City NVARCHAR(50),
    Country NVARCHAR(50),
    CreatedDate DATETIME DEFAULT GETDATE()
);

-- Dimension: Products
CREATE TABLE dbo.DimProduct (
    ProductID INT PRIMARY KEY IDENTITY(1,1),
    ProductName NVARCHAR(100) NOT NULL,
    Category NVARCHAR(50),
    UnitPrice DECIMAL(10,2),
    IsActive BIT DEFAULT 1
);

-- Fact: Sales
CREATE TABLE dbo.FactSales (
    SaleID INT PRIMARY KEY IDENTITY(1,1),
    CustomerID INT,
    ProductID INT,
    SaleDate DATE,
    Quantity INT,
    UnitPrice DECIMAL(10,2),
    TotalAmount DECIMAL(10,2),
    TaxAmount DECIMAL(10,2),
    FOREIGN KEY (CustomerID) REFERENCES dbo.DimCustomer(CustomerID),
    FOREIGN KEY (ProductID) REFERENCES dbo.DimProduct(ProductID)
);

-- Insert sample data (TEST DATA ONLY - NOT REAL CUSTOMERS)

-- Customers
INSERT INTO dbo.DimCustomer (CustomerName, Email, City, Country) VALUES
('TestCustomer1', 'test1@test.local', 'City1', 'Country1'),
('TestCustomer2', 'test2@test.local', 'City2', 'Country2'),
('TestCustomer3', 'test3@test.local', 'City3', 'Country3'),
('TestCustomer4', 'test4@test.local', 'City4', 'Country4'),
('TestCustomer5', 'test5@test.local', 'City5', 'Country5');

-- Products
INSERT INTO dbo.DimProduct (ProductName, Category, UnitPrice, IsActive) VALUES
('Laptop', 'Electronics', 999.99, 1),
('Mouse', 'Electronics', 29.99, 1),
('Keyboard', 'Electronics', 79.99, 1),
('Monitor', 'Electronics', 299.99, 1),
('Desk Chair', 'Furniture', 199.99, 1),
('Desk', 'Furniture', 399.99, 1),
('Notebook', 'Stationery', 4.99, 1),
('Pen Set', 'Stationery', 12.99, 1);

-- Sales (100 records)
DECLARE @i INT = 1;
WHILE @i <= 100
BEGIN
    INSERT INTO dbo.FactSales (CustomerID, ProductID, SaleDate, Quantity, UnitPrice, TotalAmount, TaxAmount)
    VALUES (
        (ABS(CHECKSUM(NEWID())) % 5) + 1,                        -- Random CustomerID 1-5
        (ABS(CHECKSUM(NEWID())) % 8) + 1,                        -- Random ProductID 1-8
        DATEADD(DAY, -ABS(CHECKSUM(NEWID())) % 365, GETDATE()),  -- Random date in last year
        (ABS(CHECKSUM(NEWID())) % 10) + 1,                       -- Random Quantity 1-10
        (ABS(CHECKSUM(NEWID())) % 900) + 100.00,                 -- Random price 100-1000
        0,                                                       -- Will be calculated
        0                                                        -- Will be calculated
    );

    -- Calculate amounts
    UPDATE dbo.FactSales
    SET TotalAmount = Quantity * UnitPrice,
        TaxAmount = Quantity * UnitPrice * 0.1
    WHERE SaleID = @i;

    SET @i = @i + 1;
END
GO

-- Create some views for testing
CREATE VIEW dbo.vw_SalesSummary AS
SELECT
    c.CustomerName,
    p.ProductName,
    s.SaleDate,
    s.Quantity,
    s.TotalAmount
FROM dbo.FactSales s
JOIN dbo.DimCustomer c ON s.CustomerID = c.CustomerID
JOIN dbo.DimProduct p ON s.ProductID = p.ProductID;
GO

-- Create statistics
CREATE STATISTICS stat_sales_date ON dbo.FactSales(SaleDate);
CREATE STATISTICS stat_customer_country ON dbo.DimCustomer(Country);
|
||||||
|
GO
|
||||||
|
|
||||||
|
PRINT 'Baseline database initialized successfully';
|
||||||
|
GO
|
||||||
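A minimal post-setup sanity check, sketched with pyodbc. The port, sa user, and SA_PASSWORD variable come from the setup script below; the client library and driver name are assumptions (any SQL Server client works), and TrustServerCertificate mirrors sqlcmd's -C flag used there.

import os
import pyodbc  # assumed client; requires the Microsoft ODBC driver

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=localhost,1433;DATABASE=TestDB_Baseline;"
    f"UID=sa;PWD={os.environ['SA_PASSWORD']};"
    "TrustServerCertificate=yes"  # the test containers use a self-signed cert
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM dbo.FactSales")
assert cursor.fetchone()[0] == 100  # the loop above inserts exactly 100 rows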
131
test_data/init_target.sql
Executable file
@@ -0,0 +1,131 @@
-- Target Database Initialization Script
-- This creates a similar structure with some intentional differences for testing

USE master;
GO

-- Create test database
IF NOT EXISTS (SELECT name FROM sys.databases WHERE name = 'TestDB_Target')
BEGIN
    CREATE DATABASE TestDB_Target;
END
GO

USE TestDB_Target;
GO

-- Create sample tables (similar to baseline with some differences)

-- Dimension: Customers (same structure)
CREATE TABLE dbo.DimCustomer (
    CustomerID INT PRIMARY KEY IDENTITY(1,1),
    CustomerName NVARCHAR(100) NOT NULL,
    Email NVARCHAR(100),
    City NVARCHAR(50),
    Country NVARCHAR(50),
    CreatedDate DATETIME DEFAULT GETDATE()
);

-- Dimension: Products (slightly different - added column)
CREATE TABLE dbo.DimProduct (
    ProductID INT PRIMARY KEY IDENTITY(1,1),
    ProductName NVARCHAR(100) NOT NULL,
    Category NVARCHAR(50),
    UnitPrice DECIMAL(10,2),
    IsActive BIT DEFAULT 1,
    LastModified DATETIME DEFAULT GETDATE()  -- Extra column for testing
);

-- Fact: Sales (same structure)
CREATE TABLE dbo.FactSales (
    SaleID INT PRIMARY KEY IDENTITY(1,1),
    CustomerID INT,
    ProductID INT,
    SaleDate DATE,
    Quantity INT,
    UnitPrice DECIMAL(10,2),
    TotalAmount DECIMAL(10,2),
    TaxAmount DECIMAL(10,2),
    FOREIGN KEY (CustomerID) REFERENCES dbo.DimCustomer(CustomerID),
    FOREIGN KEY (ProductID) REFERENCES dbo.DimProduct(ProductID)
);

-- Insert sample data (TEST DATA ONLY - NOT REAL CUSTOMERS)

-- Customers
INSERT INTO dbo.DimCustomer (CustomerName, Email, City, Country) VALUES
('TestCustomer1', 'test1@test.local', 'City1', 'Country1'),
('TestCustomer2', 'test2@test.local', 'City2', 'Country2'),
('TestCustomer3', 'test3@test.local', 'City3', 'Country3'),
('TestCustomer4', 'test4@test.local', 'City4', 'Country4'),
('TestCustomer5', 'test5@test.local', 'City5', 'Country5');

-- Products (with LastModified)
INSERT INTO dbo.DimProduct (ProductName, Category, UnitPrice, IsActive, LastModified) VALUES
('Laptop', 'Electronics', 999.99, 1, GETDATE()),
('Mouse', 'Electronics', 29.99, 1, GETDATE()),
('Keyboard', 'Electronics', 79.99, 1, GETDATE()),
('Monitor', 'Electronics', 299.99, 1, GETDATE()),
('Desk Chair', 'Furniture', 199.99, 1, GETDATE()),
('Desk', 'Furniture', 399.99, 1, GETDATE()),
('Notebook', 'Stationery', 4.99, 1, GETDATE()),
('Pen Set', 'Stationery', 12.99, 1, GETDATE());

-- Sales (95 records - 5 fewer than baseline for testing)
DECLARE @i INT = 1;
WHILE @i <= 95
BEGIN
    INSERT INTO dbo.FactSales (CustomerID, ProductID, SaleDate, Quantity, UnitPrice, TotalAmount, TaxAmount)
    VALUES (
        (ABS(CHECKSUM(NEWID())) % 5) + 1,
        (ABS(CHECKSUM(NEWID())) % 8) + 1,
        DATEADD(DAY, -ABS(CHECKSUM(NEWID())) % 365, GETDATE()),
        (ABS(CHECKSUM(NEWID())) % 10) + 1,
        (ABS(CHECKSUM(NEWID())) % 900) + 100.00,
        0,
        0
    );

    -- Calculate amounts
    UPDATE dbo.FactSales
    SET TotalAmount = Quantity * UnitPrice,
        TaxAmount = Quantity * UnitPrice * 0.1
    WHERE SaleID = @i;

    SET @i = @i + 1;
END
GO

-- Create the same view
CREATE VIEW dbo.vw_SalesSummary AS
SELECT
    c.CustomerName,
    p.ProductName,
    s.SaleDate,
    s.Quantity,
    s.TotalAmount
FROM dbo.FactSales s
JOIN dbo.DimCustomer c ON s.CustomerID = c.CustomerID
JOIN dbo.DimProduct p ON s.ProductID = p.ProductID;
GO

-- Create an extra table that doesn't exist in baseline
CREATE TABLE dbo.TempProcessing (
    ProcessID INT PRIMARY KEY IDENTITY(1,1),
    ProcessName NVARCHAR(100),
    Status NVARCHAR(20),
    CreatedDate DATETIME DEFAULT GETDATE()
);

INSERT INTO dbo.TempProcessing (ProcessName, Status) VALUES
('DataLoad', 'Completed'),
('Validation', 'In Progress');
GO

-- Create statistics
CREATE STATISTICS stat_sales_date ON dbo.FactSales(SaleDate);
CREATE STATISTICS stat_customer_country ON dbo.DimCustomer(Country);
GO

PRINT 'Target database initialized successfully';
GO
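The intentional drift here (95 vs. 100 FactSales rows, the extra DimProduct.LastModified column, the extra TempProcessing table) is exactly what the framework's checks should surface. A hand-rolled sketch of the row-count check, reusing the assumed pyodbc setup from the baseline note and the ports from the setup script below:

import os
import pyodbc  # assumed client, as in the baseline sanity check

def count_fact_sales(port: int, database: str) -> int:
    # Each test server is published on its own host port (1433/1434).
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER=localhost,{port};DATABASE={database};"
        f"UID=sa;PWD={os.environ['SA_PASSWORD']};TrustServerCertificate=yes"
    )
    return conn.cursor().execute("SELECT COUNT(*) FROM dbo.FactSales").fetchone()[0]

baseline = count_fact_sales(1433, "TestDB_Baseline")  # expected: 100
target = count_fact_sales(1434, "TestDB_Target")      # expected: 95
print(f"FactSales row drift: {baseline - target}")    # expected: 5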
97
test_data/setup_test_environment.sh
Executable file
@@ -0,0 +1,97 @@
#!/bin/bash
# Setup script for test SQL Server environment

set -e

echo "=========================================="
echo "SQL Server Test Environment Setup"
echo "=========================================="
echo ""

# Check if Docker is installed
if ! command -v docker &> /dev/null; then
    echo "Error: Docker is not installed"
    echo "Please install Docker first: https://docs.docker.com/get-docker/"
    exit 1
fi

# Check if Docker Compose is available (either standalone or plugin)
if ! command -v docker-compose &> /dev/null && ! docker compose version &> /dev/null; then
    echo "Error: Docker Compose is not installed"
    echo "Please install Docker Compose first"
    exit 1
fi

# Determine which compose command to use
if docker compose version &> /dev/null; then
    COMPOSE_CMD="docker compose"
else
    COMPOSE_CMD="docker-compose"
fi

echo "Step 1: Starting SQL Server containers..."
$COMPOSE_CMD -f docker-compose.test.yml up -d

echo ""
echo "Step 2: Waiting for SQL Server to be ready..."
echo "This may take 30-60 seconds..."

# Set default password if not provided
SA_PASSWORD=${SA_PASSWORD:-YourStrong!Passw0rd}

# Wait for baseline server
echo -n "Waiting for baseline server"
for i in {1..30}; do
    if docker exec drt-sqlserver-baseline /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -C -Q "SELECT 1" &> /dev/null; then
        echo " ✓"
        break
    fi
    echo -n "."
    sleep 2
done

# Wait for target server
echo -n "Waiting for target server"
for i in {1..30}; do
    if docker exec drt-sqlserver-target /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -C -Q "SELECT 1" &> /dev/null; then
        echo " ✓"
        break
    fi
    echo -n "."
    sleep 2
done

echo ""
echo "Step 3: Initializing baseline database..."
docker exec -i drt-sqlserver-baseline /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -C < test_data/init_baseline.sql

echo ""
echo "Step 4: Initializing target database..."
docker exec -i drt-sqlserver-target /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -C < test_data/init_target.sql

echo ""
echo "=========================================="
echo "Setup completed successfully!"
echo "=========================================="
echo ""
echo "SQL Server instances are running:"
echo "  Baseline: localhost:1433"
echo "  Target:   localhost:1434"
echo ""
echo "Credentials:"
echo "  Username: sa"
echo "  Password: (set via SA_PASSWORD environment variable)"
echo ""
echo "Test databases:"
echo "  Baseline: TestDB_Baseline"
echo "  Target:   TestDB_Target"
echo ""
echo "To test the connection:"
echo "  drt discover --server localhost --database TestDB_Baseline --output config_test.yaml"
echo ""
echo "To stop the servers:"
echo "  $COMPOSE_CMD -f docker-compose.test.yml down"
echo ""
echo "To stop and remove all data:"
echo "  $COMPOSE_CMD -f docker-compose.test.yml down -v"
echo ""
3
tests/__init__.py
Executable file
@@ -0,0 +1,3 @@
"""
Test suite for Data Regression Testing Framework
"""
207
tests/test_config.py
Executable file
@@ -0,0 +1,207 @@
"""
Unit tests for configuration management
"""
import pytest
from pathlib import Path
from drt.config.models import (
    DatabaseConnection,
    DatabasePair,
    ComparisonSettings,
    RowCountSettings,
    SchemaSettings,
    AggregateSettings,
    ReportingSettings,
    LoggingSettings,
    Config
)


class TestDatabaseConnection:
    """Test DatabaseConnection model"""

    def test_database_connection_minimal(self):
        """Test creating a minimal database connection"""
        conn = DatabaseConnection(
            server="SQLSERVER01",
            database="TestDB"
        )
        assert conn.server == "SQLSERVER01"
        assert conn.database == "TestDB"
        assert conn.timeout.connection == 30
        assert conn.timeout.query == 300

    def test_database_connection_with_timeout(self):
        """Test database connection with custom timeout"""
        conn = DatabaseConnection(
            server="SQLSERVER01",
            database="TestDB",
            timeout={"connection": 60, "query": 600}
        )
        assert conn.timeout.connection == 60
        assert conn.timeout.query == 600


class TestDatabasePair:
    """Test DatabasePair model"""

    def test_database_pair_creation(self):
        """Test creating a database pair"""
        pair = DatabasePair(
            name="Test_Pair",
            enabled=True,
            baseline=DatabaseConnection(
                server="SQLSERVER01",
                database="PROD_DB"
            ),
            target=DatabaseConnection(
                server="SQLSERVER01",
                database="TEST_DB"
            )
        )
        assert pair.name == "Test_Pair"
        assert pair.enabled is True
        assert pair.baseline.database == "PROD_DB"
        assert pair.target.database == "TEST_DB"


class TestComparisonSettings:
    """Test ComparisonSettings model"""

    def test_comparison_settings_health_check(self):
        """Test health check mode settings"""
        settings = ComparisonSettings(
            mode="health_check",
            row_count=RowCountSettings(enabled=True, tolerance_percent=0.0),
            schema=SchemaSettings(
                enabled=True,
                checks={
                    "column_names": True,
                    "data_types": True
                }
            ),
            aggregates=AggregateSettings(enabled=False)
        )
        assert settings.mode == "health_check"
        assert settings.row_count.enabled is True
        assert settings.aggregates.enabled is False

    def test_comparison_settings_full_mode(self):
        """Test full mode settings"""
        settings = ComparisonSettings(
            mode="full",
            row_count=RowCountSettings(enabled=True, tolerance_percent=0.0),
            schema=SchemaSettings(enabled=True),
            aggregates=AggregateSettings(enabled=True, tolerance_percent=0.01)
        )
        assert settings.mode == "full"
        assert settings.aggregates.enabled is True
        assert settings.aggregates.tolerance_percent == 0.01


class TestReportingSettings:
    """Test ReportingSettings model"""

    def test_reporting_settings_defaults(self):
        """Test default reporting settings"""
        settings = ReportingSettings()
        assert settings.output_dir == "./reports"
        assert settings.formats.html is True
        assert settings.formats.csv is True
        assert settings.formats.pdf is False
        assert settings.include_timestamp is True

    def test_reporting_settings_custom(self):
        """Test custom reporting settings"""
        settings = ReportingSettings(
            output_dir="./custom_reports",
            filename_prefix="custom_test",
            formats={"html": True, "csv": False, "pdf": True}
        )
        assert settings.output_dir == "./custom_reports"
        assert settings.filename_prefix == "custom_test"
        assert settings.formats.pdf is True


class TestLoggingSettings:
    """Test LoggingSettings model"""

    def test_logging_settings_defaults(self):
        """Test default logging settings"""
        settings = LoggingSettings()
        assert settings.level == "INFO"
        assert settings.output_dir == "./logs"
        assert settings.console.enabled is True
        assert settings.file.enabled is True

    def test_logging_settings_custom(self):
        """Test custom logging settings"""
        settings = LoggingSettings(
            level="DEBUG",
            console={"enabled": True, "level": "WARNING"}
        )
        assert settings.level == "DEBUG"
        assert settings.console.level == "WARNING"


class TestConfig:
    """Test Config model"""

    def test_config_minimal(self):
        """Test creating a minimal config"""
        config = Config(
            database_pairs=[
                DatabasePair(
                    name="Test",
                    enabled=True,
                    baseline=DatabaseConnection(
                        server="SERVER01",
                        database="PROD"
                    ),
                    target=DatabaseConnection(
                        server="SERVER01",
                        database="TEST"
                    )
                )
            ],
            comparison=ComparisonSettings(
                mode="health_check",
                row_count=RowCountSettings(enabled=True),
                schema=SchemaSettings(enabled=True),
                aggregates=AggregateSettings(enabled=False)
            ),
            tables=[]
        )
        assert len(config.database_pairs) == 1
        assert config.comparison.mode == "health_check"
        assert len(config.tables) == 0

    def test_config_with_tables(self):
        """Test config with table definitions"""
        from drt.models.table import TableInfo

        config = Config(
            database_pairs=[
                DatabasePair(
                    name="Test",
                    enabled=True,
                    baseline=DatabaseConnection(server="S1", database="D1"),
                    target=DatabaseConnection(server="S1", database="D2")
                )
            ],
            comparison=ComparisonSettings(
                mode="health_check",
                row_count=RowCountSettings(enabled=True),
                schema=SchemaSettings(enabled=True),
                aggregates=AggregateSettings(enabled=False)
            ),
            tables=[
                TableInfo(
                    schema="dbo",
                    name="TestTable",
                    enabled=True,
                    expected_in_target=True
                )
            ]
        )
        assert len(config.tables) == 1
        assert config.tables[0].name == "TestTable"
186
tests/test_models.py
Executable file
@@ -0,0 +1,186 @@
"""
Unit tests for data models
"""
import pytest
from drt.models.enums import Status, CheckType
from drt.models.table import TableInfo, ColumnInfo
from drt.models.results import CheckResult, ComparisonResult


class TestStatus:
    """Test Status enum"""

    def test_status_values(self):
        """Test status enum values"""
        assert Status.PASS.value == "PASS"
        assert Status.FAIL.value == "FAIL"
        assert Status.WARNING.value == "WARNING"
        assert Status.ERROR.value == "ERROR"
        assert Status.INFO.value == "INFO"
        assert Status.SKIP.value == "SKIP"

    def test_status_severity(self):
        """Test status severity comparison"""
        assert Status.FAIL.severity > Status.WARNING.severity
        assert Status.WARNING.severity > Status.PASS.severity
        assert Status.ERROR.severity > Status.FAIL.severity


class TestCheckType:
    """Test CheckType enum"""

    def test_check_type_values(self):
        """Test check type enum values"""
        assert CheckType.TABLE_EXISTENCE.value == "TABLE_EXISTENCE"
        assert CheckType.ROW_COUNT.value == "ROW_COUNT"
        assert CheckType.SCHEMA.value == "SCHEMA"
        assert CheckType.AGGREGATE.value == "AGGREGATE"


class TestTableInfo:
    """Test TableInfo model"""

    def test_table_info_creation(self):
        """Test creating a TableInfo instance"""
        table = TableInfo(
            schema="dbo",
            name="TestTable",
            enabled=True,
            expected_in_target=True
        )
        assert table.schema == "dbo"
        assert table.name == "TestTable"
        assert table.enabled is True
        assert table.expected_in_target is True
        assert table.aggregate_columns == []

    def test_table_info_with_aggregates(self):
        """Test TableInfo with aggregate columns"""
        table = TableInfo(
            schema="dbo",
            name="FactSales",
            enabled=True,
            expected_in_target=True,
            aggregate_columns=["Amount", "Quantity"]
        )
        assert len(table.aggregate_columns) == 2
        assert "Amount" in table.aggregate_columns


class TestColumnInfo:
    """Test ColumnInfo model"""

    def test_column_info_creation(self):
        """Test creating a ColumnInfo instance"""
        column = ColumnInfo(
            name="CustomerID",
            data_type="int",
            is_nullable=False,
            is_primary_key=True
        )
        assert column.name == "CustomerID"
        assert column.data_type == "int"
        assert column.is_nullable is False
        assert column.is_primary_key is True


class TestCheckResult:
    """Test CheckResult model"""

    def test_check_result_pass(self):
        """Test creating a passing check result"""
        result = CheckResult(
            check_type=CheckType.ROW_COUNT,
            status=Status.PASS,
            message="Row counts match",
            baseline_value=1000,
            target_value=1000
        )
        assert result.status == Status.PASS
        assert result.baseline_value == 1000
        assert result.target_value == 1000

    def test_check_result_fail(self):
        """Test creating a failing check result"""
        result = CheckResult(
            check_type=CheckType.ROW_COUNT,
            status=Status.FAIL,
            message="Row count mismatch",
            baseline_value=1000,
            target_value=950
        )
        assert result.status == Status.FAIL
        assert result.baseline_value != result.target_value


class TestComparisonResult:
    """Test ComparisonResult model"""

    def test_comparison_result_creation(self):
        """Test creating a ComparisonResult instance"""
        result = ComparisonResult(
            schema="dbo",
            table="TestTable"
        )
        assert result.schema == "dbo"
        assert result.table == "TestTable"
        assert len(result.checks) == 0

    def test_add_check_result(self):
        """Test adding check results"""
        comparison = ComparisonResult(
            schema="dbo",
            table="TestTable"
        )

        check = CheckResult(
            check_type=CheckType.ROW_COUNT,
            status=Status.PASS,
            message="Row counts match"
        )

        comparison.checks.append(check)
        assert len(comparison.checks) == 1
        assert comparison.checks[0].status == Status.PASS

    def test_overall_status_all_pass(self):
        """Test overall status when all checks pass"""
        comparison = ComparisonResult(
            schema="dbo",
            table="TestTable"
        )

        comparison.checks.append(CheckResult(
            check_type=CheckType.TABLE_EXISTENCE,
            status=Status.PASS,
            message="Table exists"
        ))

        comparison.checks.append(CheckResult(
            check_type=CheckType.ROW_COUNT,
            status=Status.PASS,
            message="Row counts match"
        ))

        assert comparison.overall_status == Status.PASS

    def test_overall_status_with_failure(self):
        """Test overall status when one check fails"""
        comparison = ComparisonResult(
            schema="dbo",
            table="TestTable"
        )

        comparison.checks.append(CheckResult(
            check_type=CheckType.TABLE_EXISTENCE,
            status=Status.PASS,
            message="Table exists"
        ))

        comparison.checks.append(CheckResult(
            check_type=CheckType.ROW_COUNT,
            status=Status.FAIL,
            message="Row count mismatch"
        ))

        assert comparison.overall_status == Status.FAIL
83
tests/test_utils.py
Executable file
@@ -0,0 +1,83 @@
"""
Unit tests for utility functions
"""
import pytest
from datetime import datetime
from drt.utils.timestamps import get_timestamp, parse_timestamp, format_duration
from drt.utils.patterns import matches_pattern


class TestTimestamps:
    """Test timestamp utilities"""

    def test_get_timestamp_format(self):
        """Test current timestamp formatting"""
        formatted = get_timestamp()
        # Should be in YYYYMMDD_HHMMSS format
        assert len(formatted) == 15
        assert formatted[8] == "_"

    def test_parse_timestamp(self):
        """Test timestamp parsing"""
        parsed = parse_timestamp("20240115_143045")
        assert parsed == datetime(2024, 1, 15, 14, 30, 45)

    def test_format_duration_seconds(self):
        """Test duration formatting for seconds"""
        assert format_duration(45) == "45 seconds"

    def test_format_duration_minutes(self):
        """Test duration formatting for minutes"""
        assert format_duration(125) == "2 minutes 5 seconds"

    def test_format_duration_hours(self):
        """Test duration formatting for hours"""
        assert format_duration(3725) == "1 hour 2 minutes 5 seconds"


class TestPatterns:
    """Test pattern matching utilities"""

    def test_exact_match(self):
        """Test exact pattern matching"""
        assert matches_pattern("TestTable", "TestTable") is True
        assert matches_pattern("TestTable", "OtherTable") is False

    def test_wildcard_star(self):
        """Test wildcard * pattern"""
        assert matches_pattern("TestTable", "Test*") is True
        assert matches_pattern("TestTable", "*Table") is True
        assert matches_pattern("TestTable", "*est*") is True
        assert matches_pattern("TestTable", "Other*") is False

    def test_wildcard_question(self):
        """Test wildcard ? pattern"""
        assert matches_pattern("Test1", "Test?") is True
        assert matches_pattern("TestA", "Test?") is True
        assert matches_pattern("Test12", "Test?") is False
        assert matches_pattern("Test", "Test?") is False

    def test_combined_wildcards(self):
        """Test combined wildcard patterns"""
        assert matches_pattern("Test_Table_01", "Test_*_??") is True
        assert matches_pattern("Test_Table_1", "Test_*_??") is False

    def test_case_sensitivity(self):
        """Test case-sensitive matching"""
        assert matches_pattern("TestTable", "testtable") is False
        assert matches_pattern("TestTable", "TestTable") is True

    def test_empty_pattern(self):
        """Test empty pattern"""
        assert matches_pattern("TestTable", "") is False
        assert matches_pattern("", "") is True

    def test_special_characters(self):
        """Test patterns with special characters"""
        assert matches_pattern("Test.Table", "Test.Table") is True
        assert matches_pattern("Test_Table", "Test_*") is True
        assert matches_pattern("Test-Table", "Test-*") is True
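To run the whole suite, pytest tests/ -v from the repository root; the equivalent invocation from Python, for completeness:

import pytest

# pytest.main returns the exit code; non-zero means failures or errors.
raise SystemExit(pytest.main(["tests", "-v"]))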