Initial commit

Author: git
Date: 2026-01-03 22:05:49 +07:00
Commit: 2f8859dbe8
63 changed files with 6708 additions and 0 deletions

97
.gitignore vendored Normal file

@@ -0,0 +1,97 @@
# Security: Sensitive Files and Credentials
# These patterns prevent accidental commits of sensitive data
# Environment variables
.env
.env.local
.env.*.local
# Configuration files with credentials
config.*.yaml
!config.example.yaml
!config.quickstart.yaml
!config.test.yaml
# Logs (may contain sensitive information)
logs/
*.log
# Reports and analysis output
reports/
investigation_reports/
analysis/
# IDE and editor files
.vscode/
.idea/
*.swp
*.swo
*~
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# Virtual environments
venv/
ENV/
env/
# Testing
.pytest_cache/
.coverage
htmlcov/
# OS
.DS_Store
Thumbs.db
# Temporary files
*.tmp
*.bak
*.backup
*~
# Database files
*.db
*.sqlite
*.sqlite3
# Docker
.dockerignore
docker-compose.override.yml
# Credentials and secrets (CRITICAL)
**/secrets/
**/credentials/
**/.aws/
**/.azure/
**/.gcp/
**/private_key*
**/secret_key*
**/api_key*
**/token*
**/password*
# Configuration with real values
config.prod.yaml
config.production.yaml
config.live.yaml

21
LICENSE Executable file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2024 QA Engineering Team
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

741
README.md Executable file

@@ -0,0 +1,741 @@
# Data Regression Testing Framework
A comprehensive framework for validating data integrity during code migration and system updates by comparing data outputs between Baseline (Production) and Target (Test) SQL Server databases.
## ✨ Features
- **Automated Discovery** - Scan databases and auto-generate configuration files
- **Multiple Comparison Types** - Row counts, schema validation, aggregate sums
- **Investigation Queries** - Execute diagnostic SQL queries from regression analysis
- **Flexible Configuration** - YAML-based setup with extensive customization
- **Rich Reporting** - HTML, CSV, and PDF reports with detailed results
- **Windows Authentication** - Secure, credential-free database access
- **Read-Only Operations** - All queries are SELECT-only for safety
- **Comprehensive Logging** - Detailed execution logs with timestamps
## 🚀 Quick Start
### Prerequisites
- Python 3.9+
- Microsoft ODBC Driver 17+ for SQL Server
- Windows environment with domain authentication (or Linux with Kerberos)
- Read access to SQL Server databases
### Installation
```bash
# Clone the repository
git clone <repository-url>
cd data_regression_testing
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install the framework
pip install -e .
# Verify installation
drt --version
```
### Basic Usage
```bash
# 1. Discover tables from baseline database
drt discover --server <YOUR_SERVER> --database <YOUR_BASELINE_DB> --output config.yaml
# 2. Edit config.yaml to add target database connection
# 3. Validate configuration
drt validate --config config.yaml
# 4. Run comparison
drt compare --config config.yaml
# 5. (Optional) Investigate regression issues
drt investigate --analysis-dir analysis/output_<TIMESTAMP>/ --config config.yaml
```
## 📦 Platform-Specific Installation
### Windows
1. Install Python 3.9+ from https://www.python.org/downloads/
2. ODBC Driver is usually pre-installed on Windows
3. Install Framework:
```cmd
python -m venv venv
venv\Scripts\activate
pip install -e .
```
### Linux (Debian/Ubuntu)
```bash
# Install ODBC Driver
curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | sudo gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg
curl https://packages.microsoft.com/config/debian/12/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 unixodbc-dev
# Install Kerberos for Windows Authentication
sudo apt-get install -y krb5-user
# Configure /etc/krb5.conf with your domain settings
# Then obtain ticket: kinit username@YOUR_DOMAIN.COM
# Install framework
python3 -m venv venv
source venv/bin/activate
pip install -e .
```
## 📋 Commands
### Discovery
Automatically scan databases and generate configuration files.
```bash
drt discover --server <YOUR_SERVER> --database <YOUR_DATABASE> [OPTIONS]
```
**Options:**
- `--server TEXT` - SQL Server hostname (required)
- `--database TEXT` - Database name (required)
- `--output, -o TEXT` - Output file (default: config_discovered.yaml)
- `--schemas TEXT` - Specific schemas to include
- `--verbose, -v` - Enable verbose output
### Validate
Validate configuration file syntax and database connectivity.
```bash
drt validate --config <CONFIG_FILE> [OPTIONS]
```
**Options:**
- `--config, -c PATH` - Configuration file (required)
- `--verbose, -v` - Enable verbose output
### Compare
Execute data comparison between baseline and target databases.
```bash
drt compare --config <CONFIG_FILE> [OPTIONS]
```
**Options:**
- `--config, -c PATH` - Configuration file (required)
- `--verbose, -v` - Enable verbose output
- `--dry-run` - Show what would be compared without executing
### Investigate
Execute diagnostic queries from regression analysis.
```bash
drt investigate --analysis-dir <ANALYSIS_DIR> --config <CONFIG_FILE> [OPTIONS]
```
**Options:**
- `--analysis-dir, -a PATH` - Analysis output directory containing `*_investigate.sql` files (required)
- `--config, -c PATH` - Configuration file (required)
- `--output-dir, -o PATH` - Output directory for reports (default: ./investigation_reports)
- `--verbose, -v` - Enable verbose output
- `--dry-run` - Show what would be executed without running
**Example:**
```bash
drt investigate -a analysis/output_20251209_184032/ -c config.yaml
drt investigate -a analysis/output_20251209_184032/ -c config.yaml -o ./my_reports
```
**What it does** (see the sketch after this list):
- Discovers all `*_investigate.sql` files in the analysis directory
- Parses SQL files (handles markdown, multiple queries per file)
- Executes queries on both baseline and target databases
- Handles errors gracefully (continues on failures)
- Generates HTML and CSV reports with side-by-side comparisons
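The framework's own parser lives in `src/drt/services/sql_parser.py` and is not reproduced here; the sketch below only illustrates the discover-and-split behaviour described above, with hypothetical helper names:
```python
from pathlib import Path

def discover_investigation_files(analysis_dir: str) -> list:
    # Recursively collect every *_investigate.sql file under the analysis output.
    return sorted(Path(analysis_dir).rglob("*_investigate.sql"))

def split_queries(sql_text: str) -> list:
    # Skip markdown code-fence lines that sometimes wrap generated SQL, then
    # split statements on GO batch separators or trailing semicolons.
    queries, current = [], []
    for line in sql_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("`"):
            continue
        if stripped.upper() == "GO":
            if current:
                queries.append("\n".join(current).strip())
                current = []
            continue
        current.append(line)
        if stripped.endswith(";"):
            queries.append("\n".join(current).strip())
            current = []
    if any(l.strip() for l in current):
        queries.append("\n".join(current).strip())
    return [q for q in queries if q]

for sql_file in discover_investigation_files("analysis/output_20251209_184032"):
    print(sql_file, "->", len(split_queries(sql_file.read_text(encoding="utf-8"))), "queries")
```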
## ⚙️ Configuration
### Database Connections
```yaml
database_pairs:
  - name: "DWH_Comparison"
    enabled: true
    baseline:
      server: "<YOUR_SERVER>"
      database: "<YOUR_BASELINE_DB>"
      timeout:
        connection: 30
        query: 300
    target:
      server: "<YOUR_SERVER>"
      database: "<YOUR_TARGET_DB>"
```
### Comparison Settings
```yaml
comparison:
  mode: "health_check"  # or "full"
  row_count:
    enabled: true
    tolerance_percent: 0.0
  schema:
    enabled: true
    checks:
      column_names: true
      data_types: true
  aggregates:
    enabled: true
    tolerance_percent: 0.01
```
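How `tolerance_percent` is most naturally interpreted is a relative difference against the baseline value; the framework's exact formula lives in its checkers, so treat this as an illustrative assumption:
```python
def within_tolerance(baseline: float, target: float, tolerance_percent: float):
    # Difference is measured relative to the baseline; a zero baseline only
    # passes when the target is also zero.
    if baseline == 0:
        return target == 0, 0.0 if target == 0 else 100.0
    diff_percent = abs(target - baseline) / abs(baseline) * 100.0
    return diff_percent <= tolerance_percent, diff_percent

print(within_tolerance(1000, 1000, 0.0))           # row counts, exact match -> (True, 0.0)
print(within_tolerance(1000, 999, 0.0))            # one missing row -> (False, 0.1)
print(within_tolerance(12345.67, 12345.68, 0.01))  # aggregate rounding noise -> passes
```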
### Table Configuration
```yaml
tables:
  - schema: "dbo"
    name: "FactTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns:
      - "Amount"
      - "Quantity"
```
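The `aggregate_columns` list drives the aggregate-sum check; one plausible way to turn it into a single SELECT is shown below (the framework's actual query builder may differ):
```python
def build_aggregate_query(schema: str, table: str, aggregate_columns: list) -> str:
    # One SUM per configured numeric column, computed in a single table scan.
    select_list = ",\n       ".join(f"SUM([{c}]) AS [sum_{c}]" for c in aggregate_columns)
    return f"SELECT {select_list}\nFROM [{schema}].[{table}];"

print(build_aggregate_query("dbo", "FactTable1", ["Amount", "Quantity"]))
```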
### Output Directories
```yaml
reporting:
  output_dir: "./reports"
  investigation_dir: "./investigation_reports"
logging:
  output_dir: "./logs"
discovery:
  analysis_directory: "./analysis"
```
**Benefits:**
- Centralized storage of all output files
- Easy cleanup and management of generated files
- Configuration flexibility via YAML
- Backward compatibility with CLI overrides (see the sketch below)
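A minimal sketch of the precedence implied above (CLI flag over YAML value over built-in default); the names are illustrative, not the framework's internals:
```python
from pathlib import Path

DEFAULT_REPORT_DIR = "./reports"

def resolve_output_dir(cli_value, config_value) -> Path:
    # CLI override wins, then the config.yaml setting, then the default.
    chosen = Path(cli_value or config_value or DEFAULT_REPORT_DIR)
    chosen.mkdir(parents=True, exist_ok=True)
    return chosen

print(resolve_output_dir(None, "./reports"))            # value from config.yaml
print(resolve_output_dir("./my_reports", "./reports"))  # CLI flag overrides config
```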
## 📊 Reports
### Comparison Reports
The framework generates comprehensive reports in multiple formats:
- **HTML Report** - Visual summary with color-coded results and detailed breakdowns
- **CSV Report** - Machine-readable format for Excel or databases
- **PDF Report** - Professional formatted output (requires weasyprint)
Reports are saved to `./reports/` with timestamps.
### Investigation Reports
- **HTML Report** - Interactive report with collapsible query results, side-by-side baseline vs target comparison
- **CSV Report** - Flattened structure with one row per query execution
Investigation reports are saved to `./investigation_reports/` with timestamps.
## 🔄 Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Success - all comparisons passed |
| 1 | Failures detected - one or more FAIL results |
| 2 | Execution error - configuration or connection issues |
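A CI job or wrapper script can branch on these codes; a minimal sketch, assuming `drt` is installed on the PATH:
```python
import subprocess
import sys

def run_regression(config_path: str) -> int:
    # drt's exit code carries the overall result: 0 pass, 1 failures, 2 execution error.
    result = subprocess.run(["drt", "compare", "--config", config_path])
    if result.returncode == 0:
        print("All comparisons passed")
    elif result.returncode == 1:
        print("Failures detected - review the HTML report under ./reports/")
    else:
        print("Execution error - check configuration and connectivity")
    return result.returncode

if __name__ == "__main__":
    sys.exit(run_regression("config.yaml"))
```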
## 🧪 Testing
### Docker Test Environment
```bash
# Start test SQL Server containers
bash test_data/setup_test_environment.sh
# Test discovery
drt discover --server localhost,1433 --database TestDB_Baseline --output test.yaml
# Test comparison
drt compare --config config.test.yaml
# Cleanup
docker-compose -f docker-compose.test.yml down -v
```
### Manual Testing
```bash
# Connect to test databases (use SA_PASSWORD environment variable)
docker exec -it drt-sqlserver-baseline /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD"
# Run queries to verify data
SELECT COUNT(*) FROM dbo.FactTable1;
```
## 🚢 Deployment
### Scheduled Execution
**Windows Task Scheduler:**
```batch
@echo off
cd C:\path\to\framework
call venv\Scripts\activate.bat
drt compare --config config.yaml
if %ERRORLEVEL% NEQ 0 (
echo Test failed with exit code %ERRORLEVEL%
exit /b %ERRORLEVEL%
)
```
**Linux Cron:**
```bash
# Run daily at 2 AM
0 2 * * * /path/to/venv/bin/drt compare --config /path/to/config.yaml >> /path/to/logs/cron.log 2>&1
```
### Monitoring
```bash
# Watch logs
tail -f logs/drt_*.log
# Search for failures
grep -i "FAIL\|ERROR" logs/drt_*.log
```
## 🏗️ Architecture
```
src/drt/
├── cli/                     # Command-line interface
│   └── commands/            # CLI commands (compare, discover, validate, investigate)
├── config/                  # Configuration management
├── database/                # Database connectivity (READ ONLY)
├── models/                  # Data models
├── reporting/               # Report generators
├── services/                # Business logic
│   ├── checkers/            # Comparison checkers
│   ├── investigation.py     # Investigation service
│   └── sql_parser.py        # SQL file parser
└── utils/                   # Utilities
```
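The `models/` package defines the result objects that checkers produce and reporters render. The real definitions are not shown in this README; the stand-ins below only mirror the attributes used by the testing examples later in this document:
```python
from dataclasses import dataclass
from enum import Enum
from typing import Any

class Status(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    WARNING = "WARNING"
    ERROR = "ERROR"

@dataclass
class CheckResult:
    status: Status
    baseline_value: Any
    target_value: Any
    message: str = ""

result = CheckResult(Status.FAIL, baseline_value=1000, target_value=998,
                     message="Row count mismatch on dbo.FactTable1")
print(result.status.value, result.baseline_value, result.target_value)
```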
## 🔒 Security
- **Windows Authentication Only** - No stored credentials
- **Read-Only Operations** - All queries are SELECT-only (see the sketch below)
- **Minimal Permissions** - Only requires db_datareader role
- **No Data Logging** - Sensitive data never logged
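A simple way to enforce a SELECT-only policy before a query reaches the database looks roughly like this; it is an illustrative guard, not the framework's actual enforcement code:
```python
import re

READ_ONLY = re.compile(r"^\s*(SELECT|WITH)\b", re.IGNORECASE)
FORBIDDEN = re.compile(r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|TRUNCATE|EXEC)\b", re.IGNORECASE)

def assert_read_only(sql: str) -> None:
    # Reject anything that is not a plain SELECT / CTE statement.
    if not READ_ONLY.match(sql) or FORBIDDEN.search(sql):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql[:60]!r}")

assert_read_only("SELECT COUNT(*) FROM dbo.FactTable1")  # passes silently
try:
    assert_read_only("DELETE FROM dbo.FactTable1")
except ValueError as exc:
    print(exc)
```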
## 🔧 Troubleshooting
### Connection Failed
```bash
# Test connectivity
drt discover --server <YOUR_SERVER> --database master
# Verify ODBC driver
odbcinst -q -d
# Check permissions
# User needs db_datareader role on target databases
```
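If the CLI cannot connect, a quick standalone probe with `pyodbc` can help isolate whether the problem is the driver, authentication, or permissions. Adjust the driver name to whatever `odbcinst -q -d` reports; this snippet is a diagnostic sketch, not part of the framework:
```python
import pyodbc

def probe(server: str, database: str) -> bool:
    # Windows Authentication via Trusted_Connection; no credentials stored.
    conn_str = (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={server};DATABASE={database};"
        "Trusted_Connection=yes;TrustServerCertificate=yes;"
    )
    try:
        conn = pyodbc.connect(conn_str, timeout=30)
        try:
            conn.cursor().execute("SELECT 1").fetchone()
            return True
        finally:
            conn.close()
    except pyodbc.Error as exc:
        print(f"Connection failed: {exc}")
        return False

print(probe("<YOUR_SERVER>", "master"))
```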
### Query Timeout
Increase timeout in configuration:
```yaml
baseline:
  timeout:
    query: 600  # 10 minutes
```
### Linux Kerberos Issues
```bash
# Check ticket
klist
# Renew if expired
kinit username@YOUR_DOMAIN.COM
# Verify ticket is valid
klist
```
## ⚡ Performance
### Diagnostic Logging
Enable verbose mode to see detailed timing:
```bash
drt compare --config config.yaml --verbose
```
This shows:
- Per-check timing (existence, row count, schema, aggregates)
- Query execution times
- Parallelization opportunities
### Optimization Tips
- Disable aggregate checks for surrogate keys
- Increase query timeouts for large tables
- Use table filtering to focus on critical tables
- Consider parallel execution for multiple database pairs (sketched below)
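Parallel execution across database pairs is flagged as a future feature in the example configuration; a rough sketch of the shape it could take (the `compare_pair` body is a placeholder):
```python
from concurrent.futures import ThreadPoolExecutor

def compare_pair(pair_name: str) -> str:
    # Placeholder for the real per-pair work: connect, run checks, summarise.
    return f"{pair_name}: completed"

pairs = ["DWH_Comparison", "OPS_Comparison"]
with ThreadPoolExecutor(max_workers=4) as pool:
    for outcome in pool.map(compare_pair, pairs):
        print(outcome)
```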
## 👨‍💻 Development
### Getting Started
1. Fork the repository on GitHub
2. Clone your fork locally:
```bash
git clone https://github.com/your-username/data_regression_testing.git
cd data_regression_testing
```
3. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
4. Install dependencies:
```bash
pip install -r requirements.txt
pip install -e .
```
5. Install development dependencies:
```bash
pip install pytest pytest-cov black flake8 mypy
```
### Development Workflow
#### 1. Create a Branch
```bash
git checkout -b feature/your-feature-name
# or
git checkout -b bugfix/issue-description
```
#### 2. Make Your Changes
- Write clean, readable code
- Follow the existing code style
- Add docstrings to all functions and classes
- Update documentation as needed
#### 3. Run Tests
```bash
# All tests
pytest
# With coverage
pytest --cov=src/drt --cov-report=html
# Specific test file
pytest tests/test_models.py
```
#### 4. Code Quality Checks
```bash
# Format code with black
black src/ tests/
# Check code style with flake8
flake8 src/ tests/
# Type checking with mypy
mypy src/
```
#### 5. Commit Your Changes
Write clear, descriptive commit messages:
```bash
git add .
git commit -m "Add feature: description of your changes"
```
**Commit message guidelines:**
- Use present tense ("Add feature" not "Added feature")
- Use imperative mood ("Move cursor to..." not "Moves cursor to...")
- Limit first line to 72 characters
- Reference issues and pull requests when relevant
#### 6. Push and Create Pull Request
```bash
git push origin feature/your-feature-name
```
Create a pull request on GitHub with:
- Clear title and description
- Reference to related issues
- Screenshots (if applicable)
- Test results
### Code Style Guidelines
#### Python Style
- Follow PEP 8 style guide
- Use type hints for function parameters and return values
- Maximum line length: 100 characters
- Use meaningful variable and function names
**Example:**
```python
def calculate_row_count_difference(
    baseline_count: int,
    target_count: int,
    tolerance_percent: float
) -> tuple[bool, float]:
    """
    Calculate if row count difference is within tolerance.

    Args:
        baseline_count: Row count from baseline database
        target_count: Row count from target database
        tolerance_percent: Acceptable difference percentage

    Returns:
        Tuple of (is_within_tolerance, actual_difference_percent)
    """
    # Implementation here
    pass
```
#### Documentation
- Add docstrings to all public functions, classes, and modules
- Use Google-style docstrings
- Include examples in docstrings when helpful
- Update README.md for user-facing changes
#### Testing
- Write unit tests for all new functionality
- Aim for >80% code coverage
- Use descriptive test names
- Follow AAA pattern (Arrange, Act, Assert)
**Example:**
```python
def test_row_count_checker_exact_match():
    """Test row count checker with exact match"""
    # Arrange
    checker = RowCountChecker(tolerance_percent=0.0)

    # Act
    result = checker.check(baseline_count=1000, target_count=1000)

    # Assert
    assert result.status == Status.PASS
    assert result.baseline_value == 1000
    assert result.target_value == 1000
```
### Adding New Features
#### New Checker Type
To add a new comparison checker (a sketch follows this list):
1. Create new checker in `src/drt/services/checkers/`
2. Inherit from `BaseChecker`
3. Implement `check()` method
4. Add new `CheckType` enum value
5. Register in `ComparisonService`
6. Add tests in `tests/test_checkers.py`
7. Update documentation
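A hypothetical checker, sketched without the framework's real `BaseChecker` or `CheckResult` (stand-in classes are included so the example runs on its own):
```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PASS = "PASS"
    FAIL = "FAIL"

@dataclass
class CheckResult:
    status: Status
    baseline_value: int
    target_value: int

class BaseChecker:  # stand-in for the real base class in services/checkers/
    def check(self, **kwargs) -> CheckResult:
        raise NotImplementedError

class NullCountChecker(BaseChecker):
    """Hypothetical checker: compare NULL counts for a column in both databases."""

    def check(self, baseline_nulls: int, target_nulls: int) -> CheckResult:
        status = Status.PASS if baseline_nulls == target_nulls else Status.FAIL
        return CheckResult(status, baseline_nulls, target_nulls)

print(NullCountChecker().check(baseline_nulls=5, target_nulls=7).status)
```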
#### New Report Format
To add a new report format (a sketch follows this list):
1. Create new reporter in `src/drt/reporting/`
2. Implement `generate()` method
3. Add format option to configuration
4. Update `ReportGenerator` to use new format
5. Add tests
6. Update documentation
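A hypothetical reporter with the `generate()` entry point described above, writing a JSON summary; the real reporters and their exact interface live in `src/drt/reporting/`:
```python
import json
from pathlib import Path

class JsonReporter:
    """Hypothetical reporter: dump the comparison summary as a JSON document."""

    def generate(self, summary: dict, output_path: Path) -> Path:
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
        return output_path

report = JsonReporter().generate(
    {"pair": "DWH_Comparison", "passed": 12, "failed": 1},
    Path("reports/regression_test.json"),
)
print(f"Wrote {report}")
```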
### Testing
#### Unit Tests
Run the test suite:
```bash
# All tests
pytest
# With coverage report
pytest --cov=src/drt --cov-report=html
# Specific test file
pytest tests/test_models.py -v
# Specific test function
pytest tests/test_models.py::test_status_enum -v
```
#### Integration Tests
Use the Docker test environment:
```bash
# Start test databases
bash test_data/setup_test_environment.sh
# Run integration tests
drt discover --server localhost,1433 --database TestDB_Baseline --output test.yaml
drt compare --config config.test.yaml
# Cleanup
docker-compose -f docker-compose.test.yml down -v
```
#### Manual Testing
```bash
# Test against real databases (requires access)
drt discover --server <YOUR_SERVER> --database <YOUR_DB> --output manual_test.yaml
drt validate --config manual_test.yaml
drt compare --config manual_test.yaml --dry-run
```
### Reporting Issues
When reporting issues, please include:
- Clear description of the problem
- Steps to reproduce
- Expected vs actual behavior
- Environment details (OS, Python version, ODBC driver version)
- Relevant logs or error messages
- Configuration file (sanitized - remove server names/credentials)
**Example:**
````markdown
**Description:** Row count comparison fails with timeout error

**Steps to Reproduce:**
1. Configure comparison for large table (>1M rows)
2. Run `drt compare --config config.yaml`
3. Observe timeout error

**Expected:** Comparison completes successfully
**Actual:** Query timeout after 300 seconds

**Environment:**
- OS: Windows 10
- Python: 3.9.7
- ODBC Driver: 17 for SQL Server

**Logs:**
```
ERROR: Query timeout on table dbo.FactTable1
```
````
### Feature Requests
For feature requests, please:
- Check if feature already exists or is planned
- Describe the use case clearly
- Explain why it would be valuable
- Provide examples if possible
### Code Review Process
All contributions go through code review:
1. Automated checks must pass (tests, linting)
2. At least one maintainer approval required
3. Address review feedback promptly
4. Keep pull requests focused and reasonably sized
### Release Process
Releases follow semantic versioning (MAJOR.MINOR.PATCH):
- **MAJOR** - Breaking changes
- **MINOR** - New features (backward compatible)
- **PATCH** - Bug fixes (backward compatible)
### Development Tips
#### Debugging
```bash
# Enable verbose logging
drt compare --config config.yaml --verbose
# Use dry-run to test without execution
drt compare --config config.yaml --dry-run
# Check configuration validity
drt validate --config config.yaml
```
#### Performance Profiling
```bash
# Enable diagnostic logging
drt compare --config config.yaml --verbose
# Look for timing information in logs
grep "execution time" logs/drt_*.log
```
#### Docker Development
```bash
# Build and test in Docker
docker build -t drt:dev .
docker run -v $(pwd)/config.yaml:/app/config.yaml drt:dev compare --config /app/config.yaml
```
## 📝 License
MIT License - see LICENSE file for details
## 📞 Support
For issues and questions:
- GitHub Issues: <repository-url>/issues
- Check logs in `./logs/`
- Review configuration with `drt validate`
- Test connectivity with `drt discover`
## 👥 Authors
QA Engineering Team
## 📌 Version
Current version: 1.0.0

286
config.example.yaml Executable file

@@ -0,0 +1,286 @@
# Data Regression Testing Framework - Example Configuration
# This file demonstrates all available configuration options
# ============================================================================
# DATABASE PAIRS
# Define baseline (production) and target (test) database connections
# ============================================================================
database_pairs:
  # Example 1: Data Warehouse Comparison
  - name: "DWH_Comparison"
    enabled: true
    description: "Compare production and test data warehouse"
    baseline:
      server: "<YOUR_SERVER_NAME>"
      database: "<YOUR_BASELINE_DB>"
      timeout:
        connection: 30   # seconds
        query: 300       # seconds (5 minutes)
    target:
      server: "<YOUR_SERVER_NAME>"
      database: "<YOUR_TARGET_DB>"
      timeout:
        connection: 30
        query: 300

  # Example 2: Operational Database Comparison (disabled)
  - name: "OPS_Comparison"
    enabled: false
    description: "Compare operational databases (currently disabled)"
    baseline:
      server: "<YOUR_SERVER_NAME>"
      database: "<YOUR_BASELINE_DB_2>"
    target:
      server: "<YOUR_SERVER_NAME>"
      database: "<YOUR_TARGET_DB_2>"

# ============================================================================
# COMPARISON SETTINGS
# Configure what types of comparisons to perform
# ============================================================================
comparison:
  # Comparison mode: "health_check" or "full"
  # - health_check: Quick validation (row counts, schema)
  # - full: Comprehensive validation (includes aggregates)
  mode: "health_check"

  # Row Count Comparison
  row_count:
    enabled: true
    tolerance_percent: 0.0   # 0% = exact match required
    # Examples:
    #   0.0 = exact match
    #   0.1 = allow 0.1% difference
    #   1.0 = allow 1% difference

  # Schema Comparison
  schema:
    enabled: true
    checks:
      column_names: true   # Verify column names match
      data_types: true     # Verify data types match
      nullable: true       # Verify nullable constraints match
      primary_keys: true   # Verify primary keys match

  # Aggregate Comparison (sums of numeric columns)
  aggregates:
    enabled: true
    tolerance_percent: 0.01   # 0.01% tolerance for rounding differences
    # Note: Only applies when mode is "full"

# ============================================================================
# TABLES TO COMPARE
# List all tables to include in comparison
# ============================================================================
tables:
  # Example 1: Fact table with aggregates
  - schema: "dbo"
    name: "FactTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns:
      - "Amount1"
      - "Amount2"
      - "Amount3"
      - "Quantity"
    notes: "Example fact table with numeric aggregates"

  # Example 2: Dimension table without aggregates
  - schema: "dbo"
    name: "DimTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns: []
    notes: "Example dimension table - no numeric aggregates"

  # Example 3: Table expected to be missing in target
  - schema: "dbo"
    name: "TempTable1"
    enabled: true
    expected_in_target: false
    aggregate_columns: []
    notes: "Example temporary table - should not exist in target"

  # Example 4: Disabled table (skipped during comparison)
  - schema: "dbo"
    name: "Table4"
    enabled: false
    expected_in_target: true
    aggregate_columns: []
    notes: "Example disabled table - excluded from comparison"

  # Example 5: Table with multiple schemas
  - schema: "staging"
    name: "StagingTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns:
      - "Amount"
    notes: "Example staging table"

  # Example 6: Large fact table
  - schema: "dbo"
    name: "FactTable2"
    enabled: true
    expected_in_target: true
    aggregate_columns:
      - "Amount"
      - "Fee"
      - "NetAmount"
    notes: "Example high-volume fact table"

  # Example 7: Reference data table
  - schema: "ref"
    name: "RefTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns: []
    notes: "Example reference data table"

# ============================================================================
# REPORTING SETTINGS
# Configure report generation and output
# ============================================================================
reporting:
  # Output directory for reports (use relative path or set via environment variable)
  output_dir: "./reports"
  # Output directory for investigation reports (use relative path or set via environment variable)
  investigation_dir: "./investigation_reports"

  # Report formats to generate
  formats:
    html: true    # Rich HTML report with styling
    csv: true     # CSV report for Excel/analysis
    pdf: false    # PDF report (requires weasyprint)

  # Report naming
  filename_prefix: "regression_test"
  include_timestamp: true   # Append YYYYMMDD_HHMMSS to filename

  # Report content options
  include_passed: true      # Include passed checks in report
  include_warnings: true    # Include warnings in report
  summary_only: false       # Only show summary (no details)

# ============================================================================
# LOGGING SETTINGS
# Configure logging behavior
# ============================================================================
logging:
  # Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL
  level: "INFO"
  # Log output directory (use relative path or set via environment variable)
  output_dir: "./logs"

  # Log file naming
  filename_prefix: "drt"
  include_timestamp: true

  # Console output
  console:
    enabled: true
    level: "INFO"
    colored: true   # Use colored output (if terminal supports it)

  # File output
  file:
    enabled: true
    level: "DEBUG"
    max_size_mb: 10   # Rotate after 10MB
    backup_count: 5   # Keep 5 backup files

# ============================================================================
# EXECUTION SETTINGS
# Configure execution behavior
# ============================================================================
execution:
  # Parallel execution (future feature)
  parallel:
    enabled: false
    max_workers: 4

  # Retry settings for transient failures
  retry:
    enabled: true
    max_attempts: 3
    delay_seconds: 5

  # Performance settings
  performance:
    batch_size: 1000          # Rows per batch for large queries
    use_nolock: true          # Use NOLOCK hints (read uncommitted)
    connection_pooling: true

# ============================================================================
# FILTERS
# Global filters applied to all tables
# ============================================================================
filters:
  # Schema filters (include/exclude patterns)
  schemas:
    include:
      - "dbo"
      - "staging"
      - "ref"
    exclude:
      - "sys"
      - "temp"

  # Table name filters (wildcard patterns)
  tables:
    include:
      - "*"           # Include all tables
    exclude:
      - "tmp_*"       # Exclude temporary tables
      - "backup_*"    # Exclude backup tables
      - "archive_*"   # Exclude archive tables

  # Column filters for aggregate comparisons
  columns:
    exclude_patterns:
      - "*_id"        # Exclude ID columns
      - "*_key"       # Exclude key columns
      - "created_*"   # Exclude audit columns
      - "modified_*"  # Exclude audit columns

# ============================================================================
# NOTIFICATIONS (future feature)
# Configure notifications for test results
# ============================================================================
notifications:
  enabled: false

  # Email notifications
  email:
    enabled: false
    smtp_server: "smtp.company.com"
    smtp_port: 587
    from_address: "drt@company.com"
    to_addresses:
      - "qa-team@company.com"
    on_failure_only: true

  # Slack notifications
  slack:
    enabled: false
    webhook_url: "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
    channel: "#qa-alerts"
    on_failure_only: true

# ============================================================================
# METADATA
# Optional metadata about this configuration
# ============================================================================
metadata:
  version: "1.0"
  created_by: "QA Team"
  created_date: "2024-01-15"
  description: "Standard regression test configuration for DWH migration"
  project: "DWH Migration Phase 2"
  environment: "UAT"
  tags:
    - "migration"
    - "data-quality"
    - "regression"

46
config.quickstart.yaml Executable file

@@ -0,0 +1,46 @@
# Quick Start Configuration
# Minimal configuration to get started quickly
database_pairs:
  - name: "Quick_Test"
    enabled: true
    baseline:
      server: "YOUR_SERVER_NAME"
      database: "YOUR_BASELINE_DB"
    target:
      server: "YOUR_SERVER_NAME"
      database: "YOUR_TARGET_DB"

comparison:
  mode: "health_check"
  row_count:
    enabled: true
    tolerance_percent: 0.0
  schema:
    enabled: true
    checks:
      column_names: true
      data_types: true
  aggregates:
    enabled: false

tables:
  # Add your tables here after running discovery
  # Example:
  # - schema: "dbo"
  #   name: "YourTable"
  #   enabled: true
  #   expected_in_target: true
  #   aggregate_columns: []

reporting:
  output_dir: "./reports"
  investigation_dir: "./investigation_reports"
  formats:
    html: true
    csv: true
    pdf: false

logging:
  level: "INFO"
  output_dir: "./logs"

83
config.test.yaml Executable file

@@ -0,0 +1,83 @@
# Test Configuration for Docker SQL Server Environment
# Use this configuration with the Docker test environment
database_pairs:
  - name: "Docker_Test_Comparison"
    enabled: true
    description: "Compare Docker test databases"
    baseline:
      server: "localhost,1433"
      database: "TestDB_Baseline"
      # Use environment variables for credentials: DRT_DB_USERNAME, DRT_DB_PASSWORD
      # username: "${DRT_DB_USERNAME}"
      # password: "${DRT_DB_PASSWORD}"
      timeout:
        connection: 30
        query: 300
    target:
      server: "localhost,1434"
      database: "TestDB_Target"
      # Use environment variables for credentials: DRT_DB_USERNAME, DRT_DB_PASSWORD
      # username: "${DRT_DB_USERNAME}"
      # password: "${DRT_DB_PASSWORD}"
      timeout:
        connection: 30
        query: 300

comparison:
  mode: "health_check"
  row_count:
    enabled: true
    tolerance_percent: 0.0
  schema:
    enabled: true
    checks:
      column_names: true
      data_types: true
  aggregates:
    enabled: true
    tolerance_percent: 0.01

tables:
  - schema: "dbo"
    name: "DimTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns: []
    notes: "Example dimension table"
  - schema: "dbo"
    name: "DimTable2"
    enabled: true
    expected_in_target: true
    aggregate_columns: []
    notes: "Example dimension table with schema differences"
  - schema: "dbo"
    name: "FactTable1"
    enabled: true
    expected_in_target: true
    aggregate_columns:
      - "Quantity"
      - "Amount"
      - "Tax"
    notes: "Example fact table with numeric aggregates"
  - schema: "dbo"
    name: "TempTable1"
    enabled: true
    expected_in_target: false
    aggregate_columns: []
    notes: "Example temporary table - only exists in target"

reporting:
  output_directory: "/home/user/reports"
  investigation_directory: "/home/user/investigation_reports"
  formats: ["html", "csv"]
  filename_template: "test_regression_{timestamp}"

logging:
  level: "INFO"
  directory: "/home/user/logs"
  filename_template: "drt_test_{timestamp}.log"
  console: true

0
config/.gitkeep Executable file

52
docker-compose.test.yml Executable file

@@ -0,0 +1,52 @@
version: '3.8'

services:
  # SQL Server 2022 - Baseline (Production)
  sqlserver-baseline:
    image: mcr.microsoft.com/mssql/server:2022-latest
    container_name: drt-sqlserver-baseline
    environment:
      - ACCEPT_EULA=Y
      - SA_PASSWORD=${SA_PASSWORD:-YourStrong!Passw0rd}
      - MSSQL_PID=Developer
    ports:
      - "1433:1433"
    volumes:
      - ./test_data/init_baseline.sql:/docker-entrypoint-initdb.d/init.sql
      - sqlserver_baseline_data:/var/opt/mssql
    healthcheck:
      test: /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P ${SA_PASSWORD:-YourStrong!Passw0rd} -Q "SELECT 1"
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - drt-network

  # SQL Server 2022 - Target (Test)
  sqlserver-target:
    image: mcr.microsoft.com/mssql/server:2022-latest
    container_name: drt-sqlserver-target
    environment:
      - ACCEPT_EULA=Y
      - SA_PASSWORD=${SA_PASSWORD:-YourStrong!Passw0rd}
      - MSSQL_PID=Developer
    ports:
      - "1434:1433"
    volumes:
      - ./test_data/init_target.sql:/docker-entrypoint-initdb.d/init.sql
      - sqlserver_target_data:/var/opt/mssql
    healthcheck:
      test: /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P ${SA_PASSWORD:-YourStrong!Passw0rd} -Q "SELECT 1"
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - drt-network

volumes:
  sqlserver_baseline_data:
  sqlserver_target_data:

networks:
  drt-network:
    driver: bridge

121
install_docker_debian.sh Executable file

@@ -0,0 +1,121 @@
#!/bin/bash
# Docker Installation Script for Debian 12
set -e
echo "=========================================="
echo "Docker Installation for Debian 12"
echo "=========================================="
echo ""
# Check if running as root
if [ "$EUID" -ne 0 ]; then
echo "Please run with sudo: sudo bash install_docker_debian.sh"
exit 1
fi
# Detect OS
if [ -f /etc/os-release ]; then
. /etc/os-release
OS=$ID
VER=$VERSION_ID
echo "Detected OS: $PRETTY_NAME"
else
echo "Cannot detect OS version"
exit 1
fi
# Remove old versions
echo ""
echo "Step 1: Removing old Docker versions (if any)..."
apt-get remove -y docker docker-engine docker.io containerd runc 2>/dev/null || true
# Install prerequisites
echo ""
echo "Step 2: Installing prerequisites..."
apt-get update
apt-get install -y \
ca-certificates \
curl \
gnupg \
lsb-release
# Add Docker's official GPG key
echo ""
echo "Step 3: Adding Docker GPG key..."
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg
# Set up Docker repository
echo ""
echo "Step 4: Adding Docker repository..."
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine
echo ""
echo "Step 5: Installing Docker Engine..."
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Start Docker service
echo ""
echo "Step 6: Starting Docker service..."
systemctl start docker
systemctl enable docker
# Add current user to docker group (if not root)
if [ -n "$SUDO_USER" ]; then
echo ""
echo "Step 7: Adding user $SUDO_USER to docker group..."
usermod -aG docker $SUDO_USER
echo "Note: You'll need to log out and back in for group changes to take effect"
fi
# Verify installation
echo ""
echo "Step 8: Verifying Docker installation..."
if docker --version; then
echo "✓ Docker installed successfully"
else
echo "✗ Docker installation failed"
exit 1
fi
if docker compose version; then
echo "✓ Docker Compose installed successfully"
else
echo "✗ Docker Compose installation failed"
exit 1
fi
# Test Docker
echo ""
echo "Step 9: Testing Docker..."
if docker run --rm hello-world > /dev/null 2>&1; then
echo "✓ Docker is working correctly"
else
echo "⚠ Docker test failed - you may need to log out and back in"
fi
echo ""
echo "=========================================="
echo "Installation completed successfully!"
echo "=========================================="
echo ""
echo "Docker version:"
docker --version
echo ""
echo "Docker Compose version:"
docker compose version
echo ""
echo "IMPORTANT: If you're not root, log out and back in for group changes to take effect"
echo ""
echo "Next steps:"
echo "1. Log out and back in (or run: newgrp docker)"
echo "2. Test Docker: docker run hello-world"
echo "3. Set up test environment: bash test_data/setup_test_environment.sh"
echo ""

112
install_odbc_debian.sh Executable file

@@ -0,0 +1,112 @@
#!/bin/bash
# ODBC Driver Installation Script for Debian 12
# This script installs Microsoft ODBC Driver 18 for SQL Server
set -e
echo "=========================================="
echo "ODBC Driver Installation for Debian 12"
echo "=========================================="
echo ""
# Check if running as root
if [ "$EUID" -ne 0 ]; then
echo "Please run with sudo: sudo bash install_odbc_debian.sh"
exit 1
fi
# Detect OS
if [ -f /etc/os-release ]; then
. /etc/os-release
OS=$ID
VER=$VERSION_ID
echo "Detected OS: $PRETTY_NAME"
else
echo "Cannot detect OS version"
exit 1
fi
# Clean up any corrupted repository files
echo ""
echo "Step 1: Cleaning up any previous installation attempts..."
if [ -f /etc/apt/sources.list.d/mssql-release.list ]; then
echo "Removing corrupted mssql-release.list..."
rm -f /etc/apt/sources.list.d/mssql-release.list
fi
# Install prerequisites
echo ""
echo "Step 2: Installing prerequisites..."
apt-get update
apt-get install -y curl gnupg2 apt-transport-https ca-certificates
# Add Microsoft GPG key
echo ""
echo "Step 3: Adding Microsoft GPG key..."
curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg
# Add Microsoft repository based on OS
echo ""
echo "Step 4: Adding Microsoft repository..."
if [ "$OS" = "debian" ]; then
if [ "$VER" = "12" ]; then
curl https://packages.microsoft.com/config/debian/12/prod.list | tee /etc/apt/sources.list.d/mssql-release.list
elif [ "$VER" = "11" ]; then
curl https://packages.microsoft.com/config/debian/11/prod.list | tee /etc/apt/sources.list.d/mssql-release.list
else
echo "Unsupported Debian version: $VER"
exit 1
fi
elif [ "$OS" = "ubuntu" ]; then
curl https://packages.microsoft.com/config/ubuntu/$VER/prod.list | tee /etc/apt/sources.list.d/mssql-release.list
else
echo "Unsupported OS: $OS"
exit 1
fi
# Update package list
echo ""
echo "Step 5: Updating package list..."
apt-get update
# Install ODBC Driver
echo ""
echo "Step 6: Installing ODBC Driver 18 for SQL Server..."
ACCEPT_EULA=Y apt-get install -y msodbcsql18
# Install unixODBC development headers
echo ""
echo "Step 7: Installing unixODBC development headers..."
apt-get install -y unixodbc-dev
# Verify installation
echo ""
echo "Step 8: Verifying installation..."
if odbcinst -q -d -n "ODBC Driver 18 for SQL Server" > /dev/null 2>&1; then
echo "✓ ODBC Driver 18 for SQL Server installed successfully"
odbcinst -q -d -n "ODBC Driver 18 for SQL Server"
else
echo "✗ ODBC Driver installation failed"
exit 1
fi
# Check for ODBC Driver 17 as fallback
if odbcinst -q -d -n "ODBC Driver 17 for SQL Server" > /dev/null 2>&1; then
echo "✓ ODBC Driver 17 for SQL Server also available"
fi
echo ""
echo "=========================================="
echo "Installation completed successfully!"
echo "=========================================="
echo ""
echo "Next steps:"
echo "1. Install Python dependencies: pip install -r requirements.txt"
echo "2. Install the framework: pip install -e ."
echo "3. Test the installation: drt --version"
echo ""
echo "For Windows Authentication, you'll also need to:"
echo "1. Install Kerberos: apt-get install -y krb5-user"
echo "2. Configure /etc/krb5.conf with your domain settings"
echo "3. Get a Kerberos ticket: kinit username@YOUR_DOMAIN.COM"
echo ""

73
pyproject.toml Executable file

@@ -0,0 +1,73 @@
[project]
name = "data-regression-tester"
version = "1.0.0"
description = "Data Regression Testing Framework for SQL Server"
readme = "README.md"
requires-python = ">=3.9"
license = {text = "MIT"}
authors = [
{name = "QA Engineering Team"}
]
keywords = ["data", "regression", "testing", "sql-server", "comparison"]
classifiers = [
"Development Status :: 4 - Beta",
"Environment :: Console",
"Intended Audience :: Developers",
"Operating System :: Microsoft :: Windows",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Topic :: Database",
"Topic :: Software Development :: Testing",
]
dependencies = [
"pandas>=2.0",
"sqlalchemy>=2.0",
"pyodbc>=4.0",
"pyyaml>=6.0",
"pydantic>=2.0",
"click>=8.0",
"rich>=13.0",
"jinja2>=3.0",
"weasyprint>=60.0",
]
[project.optional-dependencies]
dev = [
"pytest>=7.0",
"pytest-cov>=4.0",
"black>=23.0",
"ruff>=0.1.0",
"mypy>=1.0",
"pre-commit>=3.0",
]
[project.scripts]
drt = "drt.cli.main:cli"
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"
[tool.setuptools.packages.find]
where = ["src"]
[tool.black]
line-length = 100
target-version = ["py39", "py310", "py311", "py312"]
[tool.ruff]
line-length = 100
select = ["E", "F", "W", "I", "N", "UP", "B", "C4"]
[tool.mypy]
python_version = "3.9"
warn_return_any = true
warn_unused_configs = true
ignore_missing_imports = true
[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-v --cov=drt --cov-report=term-missing"

14
pytest.ini Executable file

@@ -0,0 +1,14 @@
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts =
    -v
    --strict-markers
    --tb=short
    --disable-warnings
markers =
    unit: Unit tests
    integration: Integration tests
    slow: Slow running tests

9
requirements.txt Executable file

@@ -0,0 +1,9 @@
pandas>=2.0
sqlalchemy>=2.0
pyodbc>=4.0
pyyaml>=6.0
pydantic>=2.0
click>=8.0
rich>=13.0
jinja2>=3.0
weasyprint>=60.0

14
src/drt/__init__.py Executable file

@@ -0,0 +1,14 @@
"""
Data Regression Testing Framework
A comprehensive framework for validating data integrity during code migration
and system updates by comparing data outputs between Baseline (Production)
and Target (Test) SQL Server databases.
"""
__version__ = "1.0.0"
__author__ = "QA Engineering Team"
from drt.models.enums import Status, CheckType
__all__ = ["__version__", "__author__", "Status", "CheckType"]

11
src/drt/__main__.py Executable file

@@ -0,0 +1,11 @@
"""
Entry point for running the framework as a module.
Usage:
python -m drt <command> [options]
"""
from drt.cli.main import cli
if __name__ == "__main__":
    cli()

5
src/drt/cli/__init__.py Executable file

@@ -0,0 +1,5 @@
"""Command-line interface for the framework."""
from drt.cli.main import cli
__all__ = ["cli"]

5
src/drt/cli/commands/__init__.py Executable file

@@ -0,0 +1,5 @@
"""CLI commands."""
from drt.cli.commands import discover, compare, validate, investigate
__all__ = ["discover", "compare", "validate", "investigate"]

137
src/drt/cli/commands/compare.py Executable file

@@ -0,0 +1,137 @@
"""Compare command implementation."""
import click
import sys
from pathlib import Path
from drt.config.loader import load_config
from drt.services.comparison import ComparisonService
from drt.reporting.generator import ReportGenerator
from drt.utils.logging import setup_logging, get_logger
from drt.utils.timestamps import format_duration
logger = get_logger(__name__)
@click.command()
@click.option('--config', '-c', required=True, type=click.Path(exists=True), help='Configuration file path')
@click.option('--verbose', '-v', is_flag=True, help='Enable verbose output')
@click.option('--dry-run', is_flag=True, help='Show what would be compared without executing')
def compare(config, verbose, dry_run):
"""
Execute comparison between Baseline and Target databases.
Compares configured tables between baseline and target databases,
checking for data regression issues.
Example:
drt compare --config ./config.yaml
"""
# Load config first to get log directory
from drt.config.loader import load_config
cfg = load_config(config)
# Setup logging using config
log_level = "DEBUG" if verbose else "INFO"
log_dir = cfg.logging.directory
setup_logging(log_level=log_level, log_dir=log_dir, log_to_file=not dry_run)
click.echo("=" * 60)
click.echo("Data Regression Testing Framework")
click.echo("=" * 60)
click.echo()
try:
# Load configuration
click.echo(f"Loading configuration: {config}")
cfg = load_config(config)
click.echo(f"✓ Configuration loaded")
click.echo(f" Database pairs: {len(cfg.database_pairs)}")
click.echo(f" Tables configured: {len(cfg.tables)}")
click.echo()
if dry_run:
click.echo("=" * 60)
click.echo("DRY RUN - Preview Only")
click.echo("=" * 60)
for pair in cfg.database_pairs:
if not pair.enabled:
continue
click.echo(f"\nDatabase Pair: {pair.name}")
click.echo(f" Baseline: {pair.baseline.server}.{pair.baseline.database}")
click.echo(f" Target: {pair.target.server}.{pair.target.database}")
# Count enabled tables
enabled_tables = [t for t in cfg.tables if t.enabled]
click.echo(f" Tables to compare: {len(enabled_tables)}")
click.echo("\n" + "=" * 60)
click.echo("Use without --dry-run to execute comparison")
click.echo("=" * 60)
sys.exit(0)
# Execute comparison for each database pair
all_summaries = []
for pair in cfg.database_pairs:
if not pair.enabled:
click.echo(f"Skipping disabled pair: {pair.name}")
continue
click.echo(f"Comparing: {pair.name}")
click.echo(f" Baseline: {pair.baseline.server}.{pair.baseline.database}")
click.echo(f" Target: {pair.target.server}.{pair.target.database}")
click.echo()
# Run comparison
comparison_service = ComparisonService(cfg)
summary = comparison_service.run_comparison(pair)
all_summaries.append(summary)
click.echo()
# Generate reports for all summaries
if all_summaries:
click.echo("=" * 60)
click.echo("Generating Reports")
click.echo("=" * 60)
report_gen = ReportGenerator(cfg)
for summary in all_summaries:
report_files = report_gen.generate_reports(summary)
for filepath in report_files:
click.echo(f"{filepath}")
click.echo()
# Display final summary
click.echo("=" * 60)
click.echo("EXECUTION COMPLETE")
click.echo("=" * 60)
total_passed = sum(s.passed for s in all_summaries)
total_failed = sum(s.failed for s in all_summaries)
total_warnings = sum(s.warnings for s in all_summaries)
total_errors = sum(s.errors for s in all_summaries)
click.echo(f" PASS: {total_passed:3d}")
click.echo(f" FAIL: {total_failed:3d}")
click.echo(f" WARNING: {total_warnings:3d}")
click.echo(f" ERROR: {total_errors:3d}")
click.echo("=" * 60)
# Exit with appropriate code
if total_errors > 0 or total_failed > 0:
click.echo("Status: FAILED ❌")
sys.exit(1)
else:
click.echo("Status: PASSED ✓")
sys.exit(0)
except Exception as e:
logger.error(f"Comparison failed: {e}", exc_info=verbose)
click.echo(f"✗ Error: {e}", err=True)
sys.exit(2)

118
src/drt/cli/commands/discover.py Executable file

@@ -0,0 +1,118 @@
"""Discovery command implementation."""
import click
import sys
from drt.services.discovery import DiscoveryService
from drt.config.models import ConnectionConfig, Config
from drt.config.loader import save_config
from drt.utils.logging import setup_logging, get_logger
logger = get_logger(__name__)
@click.command()
@click.option('--server', required=True, help='SQL Server hostname or instance')
@click.option('--database', required=True, help='Database name to discover')
@click.option('--output', '-o', default='./config_discovered.yaml', help='Output configuration file')
@click.option('--schemas', multiple=True, help='Specific schemas to include (can specify multiple)')
@click.option('--verbose', '-v', is_flag=True, help='Enable verbose output')
def discover(server, database, output, schemas, verbose):
"""
Discover tables and generate configuration file.
Scans the specified database and automatically generates a configuration
file with all discovered tables, columns, and metadata.
Example:
drt discover --server SQLSERVER01 --database ORBIS_DWH_PROD
"""
# Setup logging
log_level = "DEBUG" if verbose else "INFO"
setup_logging(log_level=log_level)
click.echo("=" * 60)
click.echo("Data Regression Testing Framework - Discovery Mode")
click.echo("=" * 60)
click.echo()
try:
# Create connection config
conn_config = ConnectionConfig(
server=server,
database=database
)
# Create base config with schema filters if provided
config = Config()
if schemas:
config.discovery.include_schemas = list(schemas)
# Initialize discovery service
click.echo(f"Connecting to {server}.{database}...")
discovery_service = DiscoveryService(conn_config, config)
# Test connection
if not discovery_service.conn_mgr.test_connection():
click.echo("✗ Connection failed", err=True)
sys.exit(2)
click.echo("✓ Connected (Windows Authentication)")
click.echo()
# Discover tables
click.echo("Scanning tables...")
tables = discovery_service.discover_tables()
if not tables:
click.echo("⚠ No tables found", err=True)
sys.exit(0)
click.echo(f"✓ Found {len(tables)} tables")
click.echo()
# Generate configuration
click.echo("Generating configuration...")
generated_config = discovery_service.generate_config(tables)
# Save configuration
save_config(generated_config, output)
click.echo(f"✓ Configuration saved to: {output}")
click.echo()
# Display summary
click.echo("=" * 60)
click.echo("Discovery Summary")
click.echo("=" * 60)
click.echo(f" Tables discovered: {len(tables)}")
# Count columns
total_cols = sum(len(t.columns) for t in tables)
click.echo(f" Total columns: {total_cols}")
# Count numeric columns
numeric_cols = sum(len(t.aggregate_columns) for t in tables)
click.echo(f" Numeric columns: {numeric_cols}")
# Show largest tables
if tables:
largest = sorted(tables, key=lambda t: t.estimated_row_count, reverse=True)[:3]
click.echo()
click.echo(" Largest tables:")
for table in largest:
click.echo(f"{table.full_name:40s} {table.estimated_row_count:>12,} rows")
click.echo()
click.echo("=" * 60)
click.echo("Next Steps:")
click.echo(f" 1. Review {output}")
click.echo(" 2. Configure target database connection")
click.echo(" 3. Set 'expected_in_target: false' for tables being removed")
click.echo(f" 4. Run: drt compare --config {output}")
click.echo("=" * 60)
sys.exit(0)
except Exception as e:
logger.error(f"Discovery failed: {e}", exc_info=verbose)
click.echo(f"✗ Error: {e}", err=True)
sys.exit(2)

177
src/drt/cli/commands/investigate.py Executable file

@@ -0,0 +1,177 @@
"""Investigate command implementation."""
import click
import sys
from pathlib import Path
from drt.config.loader import load_config
from drt.services.investigation import InvestigationService
from drt.reporting.investigation_report import (
InvestigationHTMLReportGenerator,
InvestigationCSVReportGenerator
)
from drt.utils.logging import setup_logging, get_logger
from drt.utils.timestamps import get_timestamp
logger = get_logger(__name__)
@click.command()
@click.option('--analysis-dir', '-a', required=True, type=click.Path(exists=True),
help='Analysis output directory containing *_investigate.sql files')
@click.option('--config', '-c', required=True, type=click.Path(exists=True),
help='Configuration file path')
@click.option('--output-dir', '-o', default=None,
help='Output directory for reports (overrides config setting)')
@click.option('--verbose', '-v', is_flag=True, help='Enable verbose output')
@click.option('--dry-run', is_flag=True, help='Show what would be executed without running')
def investigate(analysis_dir, config, output_dir, verbose, dry_run):
"""
Execute investigation queries from regression analysis.
Processes all *_investigate.sql files in the analysis directory,
executes queries on both baseline and target databases, and
generates comprehensive reports.
Example:
drt investigate -a /home/user/analysis/output_20251209_184032/ -c config.yaml
"""
# Load config first to get log directory
from drt.config.loader import load_config
cfg = load_config(config)
# Setup logging using config
log_level = "DEBUG" if verbose else "INFO"
log_dir = cfg.logging.directory
setup_logging(log_level=log_level, log_dir=log_dir, log_to_file=not dry_run)
click.echo("=" * 60)
click.echo("Data Regression Testing Framework - Investigation")
click.echo("=" * 60)
click.echo()
try:
# Use output_dir from CLI if provided, otherwise use config
if output_dir is None:
output_dir = cfg.reporting.investigation_directory
click.echo(f"✓ Configuration loaded")
click.echo(f" Database pairs: {len(cfg.database_pairs)}")
click.echo()
# Convert paths
analysis_path = Path(analysis_dir)
output_path = Path(output_dir)
# Create output directory
output_path.mkdir(parents=True, exist_ok=True)
if dry_run:
click.echo("=" * 60)
click.echo("DRY RUN - Preview Only")
click.echo("=" * 60)
# Discover SQL files
from drt.services.sql_parser import discover_sql_files
sql_files = discover_sql_files(analysis_path)
click.echo(f"\nAnalysis Directory: {analysis_path}")
click.echo(f"Found {len(sql_files)} investigation SQL files")
if sql_files:
click.echo("\nTables with investigation queries:")
for schema, table, sql_path in sql_files[:10]: # Show first 10
click.echo(f"{schema}.{table}")
if len(sql_files) > 10:
click.echo(f" ... and {len(sql_files) - 10} more")
for pair in cfg.database_pairs:
if not pair.enabled:
continue
click.echo(f"\nDatabase Pair: {pair.name}")
click.echo(f" Baseline: {pair.baseline.server}.{pair.baseline.database}")
click.echo(f" Target: {pair.target.server}.{pair.target.database}")
click.echo(f"\nReports would be saved to: {output_path}")
click.echo("\n" + "=" * 60)
click.echo("Use without --dry-run to execute investigation")
click.echo("=" * 60)
sys.exit(0)
# Execute investigation for each database pair
all_summaries = []
for pair in cfg.database_pairs:
if not pair.enabled:
click.echo(f"Skipping disabled pair: {pair.name}")
continue
click.echo(f"Investigating: {pair.name}")
click.echo(f" Baseline: {pair.baseline.server}.{pair.baseline.database}")
click.echo(f" Target: {pair.target.server}.{pair.target.database}")
click.echo()
# Run investigation
investigation_service = InvestigationService(cfg)
summary = investigation_service.run_investigation(analysis_path, pair)
all_summaries.append(summary)
click.echo()
# Generate reports for all summaries
if all_summaries:
click.echo("=" * 60)
click.echo("Generating Reports")
click.echo("=" * 60)
for summary in all_summaries:
timestamp = get_timestamp()
# Generate HTML report
html_gen = InvestigationHTMLReportGenerator(cfg)
html_path = output_path / f"investigation_report_{timestamp}.html"
html_gen.generate(summary, html_path)
click.echo(f" ✓ HTML: {html_path}")
# Generate CSV report
csv_gen = InvestigationCSVReportGenerator(cfg)
csv_path = output_path / f"investigation_report_{timestamp}.csv"
csv_gen.generate(summary, csv_path)
click.echo(f" ✓ CSV: {csv_path}")
click.echo()
# Display final summary
click.echo("=" * 60)
click.echo("INVESTIGATION COMPLETE")
click.echo("=" * 60)
total_processed = sum(s.tables_processed for s in all_summaries)
total_successful = sum(s.tables_successful for s in all_summaries)
total_partial = sum(s.tables_partial for s in all_summaries)
total_failed = sum(s.tables_failed for s in all_summaries)
total_queries = sum(s.total_queries_executed for s in all_summaries)
click.echo(f" Tables Processed: {total_processed:3d}")
click.echo(f" Successful: {total_successful:3d}")
click.echo(f" Partial: {total_partial:3d}")
click.echo(f" Failed: {total_failed:3d}")
click.echo(f" Total Queries: {total_queries:3d}")
click.echo("=" * 60)
# Exit with appropriate code
if total_failed > 0:
click.echo("Status: COMPLETED WITH FAILURES ⚠️")
sys.exit(1)
elif total_partial > 0:
click.echo("Status: COMPLETED WITH PARTIAL RESULTS ◐")
sys.exit(0)
else:
click.echo("Status: SUCCESS ✓")
sys.exit(0)
except Exception as e:
logger.error(f"Investigation failed: {e}", exc_info=verbose)
click.echo(f"✗ Error: {e}", err=True)
sys.exit(2)

92
src/drt/cli/commands/validate.py Executable file

@@ -0,0 +1,92 @@
"""Validate command implementation."""
import click
import sys
from drt.config.loader import load_config
from drt.config.validator import validate_config
from drt.utils.logging import setup_logging, get_logger
logger = get_logger(__name__)
@click.command()
@click.option('--config', '-c', required=True, type=click.Path(exists=True), help='Configuration file path')
@click.option('--verbose', '-v', is_flag=True, help='Enable verbose output')
def validate(config, verbose):
"""
Validate configuration file without running comparison.
Checks configuration for completeness and correctness, reporting
any errors or warnings.
Example:
drt validate --config ./config.yaml
"""
# Setup logging
log_level = "DEBUG" if verbose else "INFO"
setup_logging(log_level=log_level, log_to_console=True, log_to_file=False)
click.echo("=" * 60)
click.echo("Configuration Validation")
click.echo("=" * 60)
click.echo()
try:
# Load configuration
click.echo(f"Loading: {config}")
cfg = load_config(config)
click.echo("✓ YAML syntax valid")
click.echo("✓ Configuration structure valid")
click.echo()
# Validate configuration
click.echo("Validating configuration...")
is_valid, errors = validate_config(cfg)
if errors:
click.echo()
click.echo("Validation Errors:")
for error in errors:
click.echo(f"{error}", err=True)
click.echo()
# Display configuration summary
click.echo("=" * 60)
click.echo("Configuration Summary")
click.echo("=" * 60)
click.echo(f" Database pairs: {len(cfg.database_pairs)}")
click.echo(f" Tables configured: {len(cfg.tables)}")
click.echo(f" Enabled tables: {sum(1 for t in cfg.tables if t.enabled)}")
click.echo(f" Disabled tables: {sum(1 for t in cfg.tables if not t.enabled)}")
click.echo()
# Check for tables not expected in target
not_expected = sum(1 for t in cfg.tables if not t.expected_in_target)
if not_expected > 0:
click.echo(f"{not_expected} table(s) marked as expected_in_target: false")
# Display database pairs
click.echo()
click.echo("Database Pairs:")
for pair in cfg.database_pairs:
status = "" if pair.enabled else ""
click.echo(f" {status} {pair.name}")
click.echo(f" Baseline: {pair.baseline.server}.{pair.baseline.database}")
click.echo(f" Target: {pair.target.server}.{pair.target.database}")
click.echo()
click.echo("=" * 60)
if is_valid:
click.echo("Configuration is VALID ✓")
click.echo("=" * 60)
sys.exit(0)
else:
click.echo("Configuration is INVALID ✗")
click.echo("=" * 60)
sys.exit(1)
except Exception as e:
logger.error(f"Validation failed: {e}", exc_info=verbose)
click.echo(f"✗ Error: {e}", err=True)
sys.exit(2)

52
src/drt/cli/main.py Executable file

@@ -0,0 +1,52 @@
"""Main CLI entry point."""
import click
import sys
from drt import __version__
from drt.cli.commands import discover, compare, validate, investigate
from drt.utils.logging import setup_logging
@click.group()
@click.version_option(version=__version__, prog_name="drt")
@click.option('--verbose', '-v', is_flag=True, help='Enable verbose output')
@click.pass_context
def cli(ctx, verbose):
"""
Data Regression Testing Framework
A comprehensive framework for validating data integrity during code migration
and system updates by comparing data outputs between Baseline (Production)
and Target (Test) SQL Server databases.
"""
ctx.ensure_object(dict)
ctx.obj['verbose'] = verbose
# Setup logging
log_level = "DEBUG" if verbose else "INFO"
setup_logging(log_level=log_level, log_to_console=True, log_to_file=False)
@cli.command()
def version():
"""Display version information."""
import platform
click.echo("=" * 60)
click.echo("Data Regression Testing Framework")
click.echo("=" * 60)
click.echo(f"Version: {__version__}")
click.echo(f"Python: {platform.python_version()}")
click.echo(f"Platform: {platform.platform()}")
click.echo("=" * 60)
# Register commands
cli.add_command(discover.discover)
cli.add_command(compare.compare)
cli.add_command(validate.validate)
cli.add_command(investigate.investigate)
if __name__ == '__main__':
cli()
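For unit tests, the command group can also be driven in-process with click's CliRunner rather than a shell; a minimal sketch (not part of main.py):

from click.testing import CliRunner
from drt.cli.main import cli

runner = CliRunner()
result = runner.invoke(cli, ["version"])   # exercises the 'version' subcommand registered above
assert result.exit_code == 0
print(result.output)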

src/drt/config/__init__.py Executable file

@@ -0,0 +1,7 @@
"""Configuration management for the framework."""
from drt.config.loader import load_config
from drt.config.validator import validate_config
from drt.config.models import Config
__all__ = ["load_config", "validate_config", "Config"]

src/drt/config/loader.py Executable file

@@ -0,0 +1,84 @@
"""Configuration file loader."""
import yaml
from pathlib import Path
from typing import Union
from drt.config.models import Config
from drt.utils.logging import get_logger
logger = get_logger(__name__)
def load_config(config_path: Union[str, Path]) -> Config:
"""
Load configuration from YAML file.
Args:
config_path: Path to configuration file
Returns:
Parsed configuration object
Raises:
FileNotFoundError: If config file doesn't exist
yaml.YAMLError: If YAML is invalid
ValueError: If configuration is invalid
"""
config_path = Path(config_path)
if not config_path.exists():
raise FileNotFoundError(f"Configuration file not found: {config_path}")
logger.info(f"Loading configuration from: {config_path}")
try:
with open(config_path, "r", encoding="utf-8") as f:
config_data = yaml.safe_load(f)
if not config_data:
raise ValueError("Configuration file is empty")
# Parse with Pydantic
config = Config(**config_data)
logger.info(f"Configuration loaded successfully")
logger.info(f" Database pairs: {len(config.database_pairs)}")
logger.info(f" Tables configured: {len(config.tables)}")
return config
except yaml.YAMLError as e:
logger.error(f"YAML parsing error: {e}")
raise
except Exception as e:
logger.error(f"Configuration loading error: {e}")
raise
def save_config(config: Config, output_path: Union[str, Path]) -> None:
"""
Save configuration to YAML file.
Args:
config: Configuration object to save
output_path: Path where to save the configuration
"""
output_path = Path(output_path)
output_path.parent.mkdir(parents=True, exist_ok=True)
logger.info(f"Saving configuration to: {output_path}")
# Convert to dict and save as YAML
config_dict = config.model_dump(exclude_none=True)
with open(output_path, "w", encoding="utf-8") as f:
yaml.dump(
config_dict,
f,
default_flow_style=False,
sort_keys=False,
allow_unicode=True,
width=100,
)
logger.info(f"Configuration saved successfully")

src/drt/config/models.py Executable file

@@ -0,0 +1,199 @@
"""Pydantic models for configuration."""
from typing import List, Optional, Dict, Any
from pydantic import BaseModel, Field, field_validator
class ConnectionConfig(BaseModel):
"""Database connection configuration."""
server: str
database: str
username: Optional[str] = None
password: Optional[str] = None
timeout: Dict[str, int] = Field(default_factory=lambda: {"connection": 30, "query": 300})
class DatabasePairConfig(BaseModel):
"""Configuration for a database pair to compare."""
name: str
enabled: bool = True
baseline: ConnectionConfig
target: ConnectionConfig
class RowCountConfig(BaseModel):
"""Row count comparison configuration."""
enabled: bool = True
tolerance_percent: float = 0.0
class SchemaConfig(BaseModel):
"""Schema comparison configuration."""
enabled: bool = True
checks: Dict[str, bool] = Field(default_factory=lambda: {
"column_names": True,
"data_types": True,
"nullability": False,
"column_order": False
})
severity: Dict[str, str] = Field(default_factory=lambda: {
"missing_column_in_target": "FAIL",
"extra_column_in_target": "WARNING",
"data_type_mismatch": "WARNING"
})
class AggregatesConfig(BaseModel):
"""Aggregate comparison configuration."""
enabled: bool = True
tolerance_percent: float = 0.01
large_table_threshold: int = 10000000
sample_size: int = 100000
class TableExistenceConfig(BaseModel):
"""Table existence check configuration."""
missing_table_default: str = "FAIL"
extra_table_action: str = "INFO"
class ComparisonConfig(BaseModel):
"""Comparison settings."""
mode: str = "health_check"
row_count: RowCountConfig = Field(default_factory=RowCountConfig)
schema_config: SchemaConfig = Field(default_factory=SchemaConfig, alias="schema")
aggregates: AggregatesConfig = Field(default_factory=AggregatesConfig)
table_existence: TableExistenceConfig = Field(default_factory=TableExistenceConfig)
@property
def schema(self) -> SchemaConfig:
"""Return schema config for backward compatibility."""
return self.schema_config
class Config:
populate_by_name = True
class ExecutionConfig(BaseModel):
"""Execution settings."""
continue_on_error: bool = True
retry: Dict[str, int] = Field(default_factory=lambda: {"attempts": 3, "delay_seconds": 5})
class TableFilterConfig(BaseModel):
"""Table filtering configuration."""
mode: str = "all"
include_list: List[Dict[str, str]] = Field(default_factory=list)
exclude_patterns: List[str] = Field(default_factory=lambda: [
"*_TEMP", "*_TMP", "*_BAK", "*_BACKUP", "*_OLD", "tmp*", "temp*", "#*"
])
exclude_schemas: List[str] = Field(default_factory=lambda: [
"sys", "INFORMATION_SCHEMA", "guest"
])
class TableConfig(BaseModel):
"""Individual table configuration."""
schema_name: str = Field(..., alias="schema")
name: str
enabled: bool = True
expected_in_target: bool = True
estimated_row_count: int = 0
primary_key_columns: List[str] = Field(default_factory=list)
aggregate_columns: List[str] = Field(default_factory=list)
notes: str = ""
@property
def schema(self) -> str:
"""Return schema name for backward compatibility."""
return self.schema_name
class Config:
populate_by_name = True
class ReportingConfig(BaseModel):
"""Reporting configuration."""
output_directory: str = "./reports"
investigation_directory: str = "./investigation_reports"
formats: List[str] = Field(default_factory=lambda: ["html", "csv"])
filename_template: str = "regression_report_{timestamp}"
html: Dict[str, Any] = Field(default_factory=lambda: {
"embed_styles": True,
"include_charts": True,
"colors": {
"pass": "#28a745",
"fail": "#dc3545",
"warning": "#ffc107",
"error": "#6f42c1",
"info": "#17a2b8",
"skip": "#6c757d"
}
})
csv: Dict[str, Any] = Field(default_factory=lambda: {
"delimiter": ",",
"include_header": True,
"encoding": "utf-8-sig"
})
pdf: Dict[str, str] = Field(default_factory=lambda: {
"page_size": "A4",
"orientation": "landscape"
})
class LoggingConfig(BaseModel):
"""Logging configuration."""
level: str = "INFO"
directory: str = "./logs"
filename_template: str = "drt_{timestamp}.log"
console: bool = True
format: str = "%(asctime)s | %(levelname)-8s | %(name)-20s | %(message)s"
date_format: str = "%Y%m%d_%H%M%S"
class DiscoveryConfig(BaseModel):
"""Discovery settings."""
output_file: str = "./config_discovered.yaml"
analysis_directory: str = "./analysis"
include_schemas: List[str] = Field(default_factory=list)
exclude_schemas: List[str] = Field(default_factory=lambda: [
"sys", "INFORMATION_SCHEMA", "guest"
])
exclude_patterns: List[str] = Field(default_factory=lambda: [
"*_TEMP", "*_TMP", "*_BAK", "#*"
])
include_row_counts: bool = True
include_column_details: bool = True
detect_numeric_columns: bool = True
detect_primary_keys: bool = True
default_expected_in_target: bool = True
class MetadataConfig(BaseModel):
"""Configuration metadata."""
config_version: str = "1.0"
generated_date: Optional[str] = None
generated_by: Optional[str] = None
framework_version: str = "1.0.0"
class Config(BaseModel):
"""Main configuration model."""
metadata: MetadataConfig = Field(default_factory=MetadataConfig)
connections: Dict[str, ConnectionConfig] = Field(default_factory=dict)
database_pairs: List[DatabasePairConfig] = Field(default_factory=list)
comparison: ComparisonConfig = Field(default_factory=ComparisonConfig)
execution: ExecutionConfig = Field(default_factory=ExecutionConfig)
table_filters: TableFilterConfig = Field(default_factory=TableFilterConfig)
tables: List[TableConfig] = Field(default_factory=list)
reporting: ReportingConfig = Field(default_factory=ReportingConfig)
logging: LoggingConfig = Field(default_factory=LoggingConfig)
discovery: DiscoveryConfig = Field(default_factory=DiscoveryConfig)
@field_validator('database_pairs')
@classmethod
def validate_database_pairs(cls, v):
"""Ensure at least one database pair is configured."""
if not v:
raise ValueError("At least one database pair must be configured")
return v
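To illustrate how these models fit together, the sketch below validates a minimal configuration dict; the server, database, table, and column names are placeholders:

from drt.config.models import Config

raw = {
    "database_pairs": [{
        "name": "example_pair",
        "baseline": {"server": "BASELINE-SQL01", "database": "SalesDB"},
        "target": {"server": "TARGET-SQL01", "database": "SalesDB"},
    }],
    "tables": [{
        "schema": "dbo",                      # parsed into schema_name via the field alias
        "name": "Orders",
        "aggregate_columns": ["TotalAmount"],
    }],
}

cfg = Config(**raw)                           # raises a ValidationError if the structure is wrong
print(cfg.tables[0].schema, cfg.comparison.row_count.tolerance_percent)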

src/drt/config/validator.py Executable file

@@ -0,0 +1,79 @@
"""Configuration validator."""
from typing import List, Tuple
from drt.config.models import Config
from drt.utils.logging import get_logger
logger = get_logger(__name__)
def validate_config(config: Config) -> Tuple[bool, List[str]]:
"""
Validate configuration for completeness and correctness.
Args:
config: Configuration to validate
Returns:
Tuple of (is_valid, list_of_errors)
"""
errors = []
warnings = []
# Check database pairs
if not config.database_pairs:
errors.append("No database pairs configured")
for pair in config.database_pairs:
if not pair.baseline.server or not pair.baseline.database:
errors.append(f"Database pair '{pair.name}': Baseline connection incomplete")
if not pair.target.server or not pair.target.database:
errors.append(f"Database pair '{pair.name}': Target connection incomplete")
# Check comparison mode
valid_modes = ["health_check", "detailed"]
if config.comparison.mode not in valid_modes:
errors.append(f"Invalid comparison mode: {config.comparison.mode}. Must be one of {valid_modes}")
# Check table configuration
if config.table_filters.mode == "include_list" and not config.table_filters.include_list:
warnings.append("Table filter mode is 'include_list' but include_list is empty")
# Check for tables marked as not expected in target
not_expected_count = sum(1 for t in config.tables if not t.expected_in_target)
if not_expected_count > 0:
warnings.append(f"{not_expected_count} table(s) marked as expected_in_target: false")
# Check for disabled tables
disabled_count = sum(1 for t in config.tables if not t.enabled)
if disabled_count > 0:
warnings.append(f"{disabled_count} table(s) disabled (enabled: false)")
# Check reporting formats
valid_formats = ["html", "csv", "pdf"]
for fmt in config.reporting.formats:
if fmt not in valid_formats:
errors.append(f"Invalid report format: {fmt}. Must be one of {valid_formats}")
# Check logging level
valid_levels = ["DEBUG", "INFO", "WARNING", "ERROR"]
if config.logging.level.upper() not in valid_levels:
errors.append(f"Invalid logging level: {config.logging.level}. Must be one of {valid_levels}")
# Log results
if errors:
logger.error(f"Configuration validation failed with {len(errors)} error(s)")
for error in errors:
logger.error(f"{error}")
if warnings:
logger.warning(f"Configuration has {len(warnings)} warning(s)")
for warning in warnings:
logger.warning(f" ⚠️ {warning}")
if not errors and not warnings:
logger.info("✓ Configuration is valid")
elif not errors:
logger.info("✓ Configuration is valid (with warnings)")
return len(errors) == 0, errors
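Exercised on its own, the validator pairs naturally with the loader; a short sketch with a hypothetical config path:

from drt.config.loader import load_config
from drt.config.validator import validate_config

cfg = load_config("./config.yaml")
is_valid, errors = validate_config(cfg)
if not is_valid:
    for err in errors:
        print(f"config error: {err}")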

src/drt/database/__init__.py Executable file

@@ -0,0 +1,7 @@
"""Database access layer."""
from drt.database.connection import ConnectionManager
from drt.database.executor import QueryExecutor
from drt.database.queries import SQLQueries
__all__ = ["ConnectionManager", "QueryExecutor", "SQLQueries"]

src/drt/database/connection.py Executable file

@@ -0,0 +1,176 @@
"""Database connection management."""
import pyodbc
import platform
from typing import Optional
from contextlib import contextmanager
from drt.config.models import ConnectionConfig
from drt.utils.logging import get_logger
logger = get_logger(__name__)
def get_odbc_driver() -> str:
"""
Detect available ODBC driver for SQL Server.
Returns:
ODBC driver name
"""
# Get list of available drivers
drivers = [driver for driver in pyodbc.drivers() if 'SQL Server' in driver]
# Prefer newer drivers
preferred_order = [
'ODBC Driver 18 for SQL Server',
'ODBC Driver 17 for SQL Server',
'ODBC Driver 13 for SQL Server',
'SQL Server Native Client 11.0',
'SQL Server'
]
for preferred in preferred_order:
if preferred in drivers:
logger.debug(f"Using ODBC driver: {preferred}")
return preferred
# Fallback to first available
if drivers:
logger.warning(f"Using fallback driver: {drivers[0]}")
return drivers[0]
# Default fallback
logger.warning("No SQL Server ODBC driver found, using default")
return 'ODBC Driver 17 for SQL Server'
class ConnectionManager:
"""Manages database connections using Windows Authentication."""
def __init__(self, config: ConnectionConfig):
"""
Initialize connection manager.
Args:
config: Connection configuration
"""
self.config = config
self._connection: Optional[pyodbc.Connection] = None
def connect(self) -> pyodbc.Connection:
"""
Establish database connection using Windows or SQL Authentication.
Returns:
Database connection
Raises:
pyodbc.Error: If connection fails
"""
# disconnect() resets _connection to None; pyodbc connections do not expose a .closed attribute
if self._connection is not None:
return self._connection
try:
# Detect available ODBC driver
driver = get_odbc_driver()
# Build connection string
conn_str_parts = [
f"DRIVER={{{driver}}}",
f"SERVER={self.config.server}",
f"DATABASE={self.config.database}",
f"Connection Timeout={self.config.timeout.get('connection', 30)}"
]
# Check if username/password are provided for SQL Authentication
if hasattr(self.config, 'username') and self.config.username:
conn_str_parts.append(f"UID={self.config.username}")
conn_str_parts.append(f"PWD={self.config.password}")
auth_type = "SQL Authentication"
else:
# Use Windows Authentication
conn_str_parts.append("Trusted_Connection=yes")
auth_type = "Windows Authentication"
# Add TrustServerCertificate on Linux for self-signed certs
if platform.system() != 'Windows':
conn_str_parts.append("TrustServerCertificate=yes")
conn_str = ";".join(conn_str_parts) + ";"
logger.info(f"Connecting to {self.config.server}.{self.config.database}")
logger.debug(f"Connection string: {conn_str.replace(self.config.server, 'SERVER').replace(self.config.password if hasattr(self.config, 'password') and self.config.password else '', '***')}")
self._connection = pyodbc.connect(conn_str)
# Set query timeout
query_timeout = self.config.timeout.get('query', 300)
self._connection.timeout = query_timeout
logger.info(f"✓ Connected ({auth_type})")
return self._connection
except pyodbc.Error as e:
logger.error(f"Connection failed: {e}")
raise
def disconnect(self) -> None:
"""Close database connection."""
if self._connection is not None:
self._connection.close()
logger.info("Connection closed")
self._connection = None
@contextmanager
def get_connection(self):
"""
Context manager for database connections.
Yields:
Database connection
Example:
with conn_mgr.get_connection() as conn:
cursor = conn.cursor()
cursor.execute("SELECT 1")
"""
conn = self.connect()
try:
yield conn
finally:
# Don't close connection here - reuse it
pass
def test_connection(self) -> bool:
"""
Test database connectivity.
Returns:
True if connection successful, False otherwise
"""
try:
with self.get_connection() as conn:
cursor = conn.cursor()
cursor.execute("SELECT 1")
cursor.fetchone()
return True
except Exception as e:
logger.error(f"Connection test failed: {e}")
return False
@property
def is_connected(self) -> bool:
"""Check if connection is active."""
return self._connection is not None
def __enter__(self):
"""Context manager entry."""
self.connect()
return self
def __exit__(self, exc_type, exc_val, exc_tb):
"""Context manager exit."""
self.disconnect()
def __del__(self):
"""Cleanup on deletion."""
self.disconnect()
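A usage sketch for ConnectionManager; the server and database names are hypothetical and it relies only on the methods defined above:

from drt.config.models import ConnectionConfig
from drt.database.connection import ConnectionManager

conn_cfg = ConnectionConfig(server="BASELINE-SQL01", database="SalesDB")   # placeholder values

with ConnectionManager(conn_cfg) as conn_mgr:      # __enter__ connects, __exit__ disconnects
    if conn_mgr.test_connection():
        with conn_mgr.get_connection() as conn:
            cursor = conn.cursor()
            cursor.execute("SELECT @@VERSION")
            print(cursor.fetchone()[0])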

src/drt/database/executor.py Executable file

@@ -0,0 +1,267 @@
"""Query executor for READ ONLY database operations."""
import pandas as pd
import time
from typing import Any, Dict, List, Optional, Tuple
from drt.database.connection import ConnectionManager
from drt.database.queries import SQLQueries
from drt.models.enums import Status
from drt.utils.logging import get_logger
logger = get_logger(__name__)
class QueryExecutor:
"""Executes READ ONLY queries against the database."""
def __init__(self, connection_manager: ConnectionManager):
"""
Initialize query executor.
Args:
connection_manager: Connection manager instance
"""
self.conn_mgr = connection_manager
def execute_query(self, query: str, params: tuple = None) -> pd.DataFrame:
"""
Execute a SELECT query and return results as DataFrame.
Args:
query: SQL query string (SELECT only)
params: Query parameters
Returns:
Query results as pandas DataFrame
Raises:
ValueError: If query is not a SELECT statement
Exception: If query execution fails
"""
# Safety check - only allow SELECT queries
query_upper = query.strip().upper()
if not query_upper.startswith('SELECT'):
raise ValueError("Only SELECT queries are allowed (READ ONLY)")
try:
with self.conn_mgr.get_connection() as conn:
if params:
df = pd.read_sql(query, conn, params=params)
else:
df = pd.read_sql(query, conn)
return df
except Exception as e:
logger.error(f"Query execution failed: {e}")
logger.debug(f"Query: {query}")
raise
def execute_scalar(self, query: str, params: tuple = None) -> Any:
"""
Execute query and return single scalar value.
Args:
query: SQL query string
params: Query parameters
Returns:
Single scalar value
"""
df = self.execute_query(query, params)
if df.empty:
return None
return df.iloc[0, 0]
def get_row_count(self, schema: str, table: str) -> int:
"""
Get row count for a table.
Args:
schema: Schema name
table: Table name
Returns:
Row count
"""
query = SQLQueries.build_row_count_query(schema, table)
count = self.execute_scalar(query)
return int(count) if count is not None else 0
def table_exists(self, schema: str, table: str) -> bool:
"""
Check if table exists.
Args:
schema: Schema name
table: Table name
Returns:
True if table exists, False otherwise
"""
count = self.execute_scalar(SQLQueries.CHECK_TABLE_EXISTS, (schema, table))
return int(count) > 0 if count is not None else False
def get_all_tables(self) -> List[Dict[str, Any]]:
"""
Get list of all user tables in the database.
Returns:
List of table information dictionaries
"""
df = self.execute_query(SQLQueries.GET_ALL_TABLES)
return df.to_dict('records')
def get_columns(self, schema: str, table: str) -> List[Dict[str, Any]]:
"""
Get column information for a table.
Args:
schema: Schema name
table: Table name
Returns:
List of column information dictionaries
"""
df = self.execute_query(SQLQueries.GET_COLUMNS, (schema, table))
return df.to_dict('records')
def get_primary_keys(self, schema: str, table: str) -> List[str]:
"""
Get primary key columns for a table.
Args:
schema: Schema name
table: Table name
Returns:
List of primary key column names
"""
# Diagnostic: Check what columns are available in CONSTRAINT_COLUMN_USAGE
try:
logger.debug("Checking CONSTRAINT_COLUMN_USAGE schema...")
constraint_cols_df = self.execute_query(SQLQueries.GET_CONSTRAINT_COLUMNS_SCHEMA)
logger.debug(f"CONSTRAINT_COLUMN_USAGE columns: {constraint_cols_df['COLUMN_NAME'].tolist()}")
except Exception as e:
logger.debug(f"Could not query CONSTRAINT_COLUMN_USAGE schema: {e}")
# Diagnostic: Check what columns are available in KEY_COLUMN_USAGE
try:
logger.debug("Checking KEY_COLUMN_USAGE schema...")
key_cols_df = self.execute_query(SQLQueries.GET_KEY_COLUMNS_SCHEMA)
logger.debug(f"KEY_COLUMN_USAGE columns: {key_cols_df['COLUMN_NAME'].tolist()}")
except Exception as e:
logger.debug(f"Could not query KEY_COLUMN_USAGE schema: {e}")
df = self.execute_query(SQLQueries.GET_PRIMARY_KEYS, (schema, table))
return df['COLUMN_NAME'].tolist() if not df.empty else []
def get_aggregate_sums(self, schema: str, table: str, columns: List[str]) -> Dict[str, float]:
"""
Get aggregate sums for numeric columns.
Args:
schema: Schema name
table: Table name
columns: List of column names to aggregate
Returns:
Dictionary mapping column names to their sums
"""
if not columns:
return {}
query = SQLQueries.build_aggregate_query(schema, table, columns)
if not query:
return {}
df = self.execute_query(query)
if df.empty:
return {col: 0.0 for col in columns}
# Extract results
results = {}
for col in columns:
sum_col = f"{col}_sum"
if sum_col in df.columns:
value = df.iloc[0][sum_col]
results[col] = float(value) if pd.notna(value) else 0.0
else:
results[col] = 0.0
return results
def execute_investigation_query(
self,
query: str,
timeout: Optional[int] = None
) -> Tuple[Status, Optional[pd.DataFrame], Optional[str], int]:
"""
Execute investigation query with comprehensive error handling.
This method is specifically for investigation queries and does NOT
enforce the SELECT-only restriction. It handles errors gracefully
and returns detailed status information.
Args:
query: SQL query to execute
timeout: Query timeout in seconds (optional)
Returns:
Tuple of (status, result_df, error_message, execution_time_ms)
"""
start_time = time.time()
try:
# Execute query
with self.conn_mgr.get_connection() as conn:
if timeout:
# Apply the timeout via pyodbc's per-connection attribute (the same mechanism used in
# ConnectionManager.connect); T-SQL has no SET QUERY_TIMEOUT statement
try:
conn.timeout = timeout
except Exception:
# Timeout setting not supported by the driver, continue anyway
pass
df = pd.read_sql(query, conn)
execution_time = int((time.time() - start_time) * 1000)
return (Status.PASS, df, None, execution_time)
except Exception as e:
execution_time = int((time.time() - start_time) * 1000)
error_msg = str(e)
error_type = type(e).__name__
# Categorize error
if any(phrase in error_msg.lower() for phrase in [
'does not exist',
'invalid object name',
'could not find',
'not found'
]):
status = Status.SKIP
message = f"Object not found: {error_msg}"
elif 'timeout' in error_msg.lower():
status = Status.FAIL
message = f"Query timeout: {error_msg}"
elif any(phrase in error_msg.lower() for phrase in [
'syntax error',
'incorrect syntax'
]):
status = Status.FAIL
message = f"Syntax error: {error_msg}"
elif 'permission' in error_msg.lower():
status = Status.FAIL
message = f"Permission denied: {error_msg}"
else:
status = Status.FAIL
message = f"{error_type}: {error_msg}"
logger.debug(f"Query execution failed: {message}")
return (status, None, message, execution_time)
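Wiring the executor to a connection manager might look like the sketch below; the schema, table, and column names are placeholders:

from drt.config.models import ConnectionConfig
from drt.database.connection import ConnectionManager
from drt.database.executor import QueryExecutor

conn_mgr = ConnectionManager(ConnectionConfig(server="BASELINE-SQL01", database="SalesDB"))
executor = QueryExecutor(conn_mgr)

if executor.table_exists("dbo", "Orders"):
    rows = executor.get_row_count("dbo", "Orders")
    sums = executor.get_aggregate_sums("dbo", "Orders", ["TotalAmount"])
    print(f"dbo.Orders: {rows} rows, SUM(TotalAmount) = {sums.get('TotalAmount')}")

conn_mgr.disconnect()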

src/drt/database/queries.py Executable file

@@ -0,0 +1,128 @@
"""SQL query templates for database operations."""
class SQLQueries:
"""Collection of SQL query templates (READ ONLY)."""
# Table discovery queries
GET_ALL_TABLES = """
SELECT
s.name AS schema_name,
t.name AS table_name,
SUM(p.rows) AS estimated_rows
FROM sys.tables t WITH (NOLOCK)
INNER JOIN sys.schemas s WITH (NOLOCK) ON t.schema_id = s.schema_id
INNER JOIN sys.partitions p WITH (NOLOCK) ON t.object_id = p.object_id
WHERE t.type = 'U'
AND p.index_id IN (0, 1)
GROUP BY s.name, t.name
ORDER BY s.name, t.name
"""
GET_COLUMNS = """
SELECT
COLUMN_NAME,
DATA_TYPE,
CHARACTER_MAXIMUM_LENGTH,
NUMERIC_PRECISION,
NUMERIC_SCALE,
IS_NULLABLE,
ORDINAL_POSITION
FROM INFORMATION_SCHEMA.COLUMNS WITH (NOLOCK)
WHERE TABLE_SCHEMA = ?
AND TABLE_NAME = ?
ORDER BY ORDINAL_POSITION
"""
# Diagnostic query to check available columns in CONSTRAINT_COLUMN_USAGE
GET_CONSTRAINT_COLUMNS_SCHEMA = """
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS WITH (NOLOCK)
WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'
AND TABLE_NAME = 'CONSTRAINT_COLUMN_USAGE'
ORDER BY ORDINAL_POSITION
"""
# Diagnostic query to check available columns in KEY_COLUMN_USAGE
GET_KEY_COLUMNS_SCHEMA = """
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS WITH (NOLOCK)
WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'
AND TABLE_NAME = 'KEY_COLUMN_USAGE'
ORDER BY ORDINAL_POSITION
"""
GET_PRIMARY_KEYS = """
SELECT
c.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc WITH (NOLOCK)
INNER JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE c WITH (NOLOCK)
ON tc.CONSTRAINT_NAME = c.CONSTRAINT_NAME
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
AND tc.TABLE_SCHEMA = ?
AND tc.TABLE_NAME = ?
"""
# Comparison queries
GET_ROW_COUNT = """
SELECT COUNT(*) AS row_count
FROM [{schema}].[{table}] WITH (NOLOCK)
"""
CHECK_TABLE_EXISTS = """
SELECT COUNT(*) AS table_exists
FROM INFORMATION_SCHEMA.TABLES WITH (NOLOCK)
WHERE TABLE_SCHEMA = ?
AND TABLE_NAME = ?
"""
GET_AGGREGATE_SUMS = """
SELECT {column_expressions}
FROM [{schema}].[{table}] WITH (NOLOCK)
"""
@staticmethod
def build_row_count_query(schema: str, table: str) -> str:
"""Build row count query for a specific table."""
return SQLQueries.GET_ROW_COUNT.format(schema=schema, table=table)
@staticmethod
def build_aggregate_query(schema: str, table: str, columns: list[str]) -> str:
"""
Build aggregate query for numeric columns.
Args:
schema: Schema name
table: Table name
columns: List of column names to aggregate
Returns:
SQL query string
"""
if not columns:
return None
# Build column expressions
column_expressions = []
for col in columns:
# Cast to FLOAT to handle different numeric types
expr = f"SUM(CAST([{col}] AS FLOAT)) AS [{col}_sum]"
column_expressions.append(expr)
column_expr_str = ",\n ".join(column_expressions)
return SQLQueries.GET_AGGREGATE_SUMS.format(
schema=schema,
table=table,
column_expressions=column_expr_str
)
@staticmethod
def is_numeric_type(data_type: str) -> bool:
"""Check if a data type is numeric."""
numeric_types = {
'int', 'bigint', 'smallint', 'tinyint',
'decimal', 'numeric', 'float', 'real',
'money', 'smallmoney'
}
return data_type.lower() in numeric_types
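For example, build_aggregate_query for two hypothetical columns produces a single SUM statement roughly of this shape:

from drt.database.queries import SQLQueries

sql = SQLQueries.build_aggregate_query("dbo", "Orders", ["TotalAmount", "Quantity"])
print(sql)
# SELECT SUM(CAST([TotalAmount] AS FLOAT)) AS [TotalAmount_sum],
#        SUM(CAST([Quantity] AS FLOAT)) AS [Quantity_sum]
# FROM [dbo].[Orders] WITH (NOLOCK)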

src/drt/models/__init__.py Executable file

@@ -0,0 +1,16 @@
"""Data models for the regression testing framework."""
from drt.models.enums import Status, CheckType
from drt.models.table import TableInfo, ColumnInfo
from drt.models.results import ComparisonResult, CheckResult
from drt.models.summary import ExecutionSummary
__all__ = [
"Status",
"CheckType",
"TableInfo",
"ColumnInfo",
"ComparisonResult",
"CheckResult",
"ExecutionSummary",
]

src/drt/models/enums.py Executable file

@@ -0,0 +1,49 @@
"""Enumerations for status and check types."""
from enum import Enum
class Status(str, Enum):
"""Result status enumeration."""
PASS = "PASS"
FAIL = "FAIL"
WARNING = "WARNING"
ERROR = "ERROR"
INFO = "INFO"
SKIP = "SKIP"
def __str__(self) -> str:
return self.value
@property
def severity(self) -> int:
"""Return severity level for comparison (higher = more severe)."""
severity_map = {
Status.ERROR: 6,
Status.FAIL: 5,
Status.WARNING: 4,
Status.INFO: 3,
Status.PASS: 2,
Status.SKIP: 1,
}
return severity_map[self]
@classmethod
def most_severe(cls, statuses: list["Status"]) -> "Status":
"""Return the most severe status from a list."""
if not statuses:
return cls.SKIP
return max(statuses, key=lambda s: s.severity)
class CheckType(str, Enum):
"""Type of comparison check."""
EXISTENCE = "TABLE_EXISTENCE"
ROW_COUNT = "ROW_COUNT"
SCHEMA = "SCHEMA"
AGGREGATE = "AGGREGATE"
def __str__(self) -> str:
return self.value
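The severity ordering is what lets per-check statuses roll up into a single table-level status; for instance:

from drt.models.enums import Status

checks = [Status.PASS, Status.WARNING, Status.PASS]
print(Status.most_severe(checks))                       # WARNING - the most severe status wins
print(Status.FAIL.severity > Status.WARNING.severity)   # True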


@@ -0,0 +1,70 @@
"""Data models for investigation feature."""
from dataclasses import dataclass, field
from typing import List, Optional
import pandas as pd
from drt.models.enums import Status
@dataclass
class QueryExecutionResult:
"""Result of executing a single query."""
query_number: int
query_text: str
status: Status
execution_time_ms: int
result_data: Optional[pd.DataFrame] = None
error_message: Optional[str] = None
row_count: int = 0
@dataclass
class TableInvestigationResult:
"""Results for all queries in a table's investigation."""
schema: str
table: str
sql_file_path: str
baseline_results: List[QueryExecutionResult]
target_results: List[QueryExecutionResult]
overall_status: Status
timestamp: str
@property
def full_name(self) -> str:
"""Get full table name."""
return f"{self.schema}.{self.table}"
@property
def total_queries(self) -> int:
"""Get total number of queries."""
return len(self.baseline_results)
@property
def successful_queries(self) -> int:
"""Get number of successful queries."""
all_results = self.baseline_results + self.target_results
return sum(1 for r in all_results if r.status == Status.PASS)
@dataclass
class InvestigationSummary:
"""Overall investigation execution summary."""
start_time: str
end_time: str
duration_seconds: int
analysis_directory: str
baseline_info: str
target_info: str
tables_processed: int
tables_successful: int
tables_partial: int
tables_failed: int
total_queries_executed: int
results: List[TableInvestigationResult] = field(default_factory=list)
@property
def success_rate(self) -> float:
"""Calculate success rate percentage."""
if self.tables_processed == 0:
return 0.0
return (self.tables_successful / self.tables_processed) * 100

src/drt/models/results.py Executable file

@@ -0,0 +1,49 @@
"""Result models for comparison operations."""
from typing import Any, Dict, Optional
from pydantic import BaseModel, Field
from drt.models.enums import Status, CheckType
from drt.models.table import TableInfo
class CheckResult(BaseModel):
"""Result of a single check operation."""
check_type: CheckType
status: Status
baseline_value: Any = None
target_value: Any = None
difference: Any = None
message: str = ""
details: Dict[str, Any] = Field(default_factory=dict)
class Config:
arbitrary_types_allowed = True
class ComparisonResult(BaseModel):
"""Result of comparing a single table."""
table: TableInfo
overall_status: Status
check_results: list[CheckResult] = Field(default_factory=list)
execution_time_ms: int = 0
error_message: str = ""
timestamp: str = ""
def add_check(self, check_result: CheckResult) -> None:
"""Add a check result and update overall status."""
self.check_results.append(check_result)
# Update overall status to most severe
all_statuses = [cr.status for cr in self.check_results]
self.overall_status = Status.most_severe(all_statuses)
def get_check(self, check_type: CheckType) -> Optional[CheckResult]:
"""Get check result by type."""
for check in self.check_results:
if check.check_type == check_type:
return check
return None
class Config:
arbitrary_types_allowed = True
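A small sketch of how a ComparisonResult accumulates checks and recomputes its overall status; the table and values are placeholders:

from drt.models.enums import Status, CheckType
from drt.models.results import CheckResult, ComparisonResult
from drt.models.table import TableInfo

result = ComparisonResult(table=TableInfo(schema="dbo", name="Orders"), overall_status=Status.SKIP)
result.add_check(CheckResult(check_type=CheckType.ROW_COUNT, status=Status.PASS,
                             baseline_value=1000, target_value=1000, difference=0))
result.add_check(CheckResult(check_type=CheckType.SCHEMA, status=Status.WARNING,
                             message="Extra column in target"))
print(result.overall_status)    # WARNING - most severe of the recorded checks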

src/drt/models/summary.py Executable file

@@ -0,0 +1,65 @@
"""Execution summary model."""
from typing import List
from pydantic import BaseModel, Field
from drt.models.results import ComparisonResult
from drt.models.enums import Status
class ExecutionSummary(BaseModel):
"""Summary of an entire test execution."""
start_time: str
end_time: str
duration_seconds: int
total_tables: int = 0
passed: int = 0
failed: int = 0
warnings: int = 0
errors: int = 0
skipped: int = 0
info: int = 0
results: List[ComparisonResult] = Field(default_factory=list)
config_file: str = ""
baseline_info: str = ""
target_info: str = ""
def add_result(self, result: ComparisonResult) -> None:
"""Add a comparison result and update counters."""
self.results.append(result)
self.total_tables += 1
# Update status counters
status = result.overall_status
if status == Status.PASS:
self.passed += 1
elif status == Status.FAIL:
self.failed += 1
elif status == Status.WARNING:
self.warnings += 1
elif status == Status.ERROR:
self.errors += 1
elif status == Status.INFO:
self.info += 1
elif status == Status.SKIP:
self.skipped += 1
@property
def has_failures(self) -> bool:
"""Check if there are any failures."""
return self.failed > 0
@property
def has_errors(self) -> bool:
"""Check if there are any errors."""
return self.errors > 0
@property
def success_rate(self) -> float:
"""Calculate success rate percentage."""
if self.total_tables == 0:
return 0.0
return (self.passed / self.total_tables) * 100
class Config:
arbitrary_types_allowed = True
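An ExecutionSummary then aggregates those per-table results and keeps its counters in sync; a minimal sketch with placeholder values:

from drt.models.enums import Status
from drt.models.results import ComparisonResult
from drt.models.summary import ExecutionSummary
from drt.models.table import TableInfo

summary = ExecutionSummary(start_time="20240101_080000", end_time="20240101_081000",
                           duration_seconds=600)
summary.add_result(ComparisonResult(table=TableInfo(schema="dbo", name="Orders"),
                                    overall_status=Status.PASS))
print(summary.total_tables, summary.passed, f"{summary.success_rate:.1f}%")   # 1 1 100.0%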

src/drt/models/table.py Executable file

@@ -0,0 +1,53 @@
"""Table and column information models."""
from typing import List, Optional
from pydantic import BaseModel, Field
class ColumnInfo(BaseModel):
"""Information about a database column."""
name: str
data_type: str
max_length: Optional[int] = None
precision: Optional[int] = None
scale: Optional[int] = None
is_nullable: bool = True
is_numeric: bool = False
ordinal_position: int
class Config:
frozen = True
class TableInfo(BaseModel):
"""Information about a database table."""
schema_name: str = Field(..., alias="schema")
name: str
estimated_row_count: int = 0
columns: List[ColumnInfo] = Field(default_factory=list)
primary_key_columns: List[str] = Field(default_factory=list)
enabled: bool = True
expected_in_target: bool = True
aggregate_columns: List[str] = Field(default_factory=list)
notes: str = ""
@property
def schema(self) -> str:
"""Return schema name for backward compatibility."""
return self.schema_name
@property
def full_name(self) -> str:
"""Return fully qualified table name."""
return f"{self.schema_name}.{self.name}"
@property
def numeric_columns(self) -> List[ColumnInfo]:
"""Return list of numeric columns."""
return [col for col in self.columns if col.is_numeric]
class Config:
frozen = False
populate_by_name = True # Allow both 'schema' and 'schema_name'

src/drt/reporting/__init__.py Executable file

@@ -0,0 +1,7 @@
"""Reporting module for generating test reports."""
from drt.reporting.generator import ReportGenerator
from drt.reporting.html import HTMLReportGenerator
from drt.reporting.csv import CSVReportGenerator
__all__ = ["ReportGenerator", "HTMLReportGenerator", "CSVReportGenerator"]

src/drt/reporting/csv.py Executable file

@@ -0,0 +1,97 @@
"""CSV report generator."""
import csv
from pathlib import Path
from drt.models.summary import ExecutionSummary
from drt.models.enums import CheckType
from drt.config.models import Config
from drt.utils.logging import get_logger
logger = get_logger(__name__)
class CSVReportGenerator:
"""Generates CSV format reports."""
def __init__(self, config: Config):
"""
Initialize CSV generator.
Args:
config: Configuration object
"""
self.config = config
def generate(self, summary: ExecutionSummary, filepath: Path) -> None:
"""
Generate CSV report.
Args:
summary: Execution summary
filepath: Output file path
"""
csv_config = self.config.reporting.csv
delimiter = csv_config.get("delimiter", ",")
encoding = csv_config.get("encoding", "utf-8-sig")
with open(filepath, "w", newline="", encoding=encoding) as f:
writer = csv.writer(f, delimiter=delimiter)
# Write header
writer.writerow([
"Timestamp",
"Schema",
"Table",
"Overall_Status",
"Existence_Status",
"RowCount_Status",
"Baseline_Rows",
"Target_Rows",
"Row_Difference",
"Row_Diff_Pct",
"Schema_Status",
"Schema_Details",
"Aggregate_Status",
"Aggregate_Details",
"Expected_In_Target",
"Notes",
"Execution_Time_Ms"
])
# Write data rows
for result in summary.results:
# Get check results
existence = result.get_check(CheckType.EXISTENCE)
row_count = result.get_check(CheckType.ROW_COUNT)
schema = result.get_check(CheckType.SCHEMA)
aggregate = result.get_check(CheckType.AGGREGATE)
# Extract values
baseline_rows = row_count.baseline_value if row_count else "N/A"
target_rows = row_count.target_value if row_count else "N/A"
row_diff = row_count.difference if row_count else "N/A"
row_diff_pct = ""
if row_count and row_count.baseline_value and row_count.baseline_value > 0:
row_diff_pct = f"{(row_count.difference / row_count.baseline_value * 100):.2f}%"
writer.writerow([
result.timestamp,
result.table.schema,
result.table.name,
result.overall_status.value,
existence.status.value if existence else "N/A",
row_count.status.value if row_count else "N/A",
baseline_rows,
target_rows,
row_diff,
row_diff_pct,
schema.status.value if schema else "N/A",
schema.message if schema else "",
aggregate.status.value if aggregate else "N/A",
aggregate.message if aggregate else "",
result.table.expected_in_target,
result.table.notes,
result.execution_time_ms
])
logger.debug(f"CSV report written to {filepath}")

src/drt/reporting/generator.py Executable file

@@ -0,0 +1,84 @@
"""Report generator orchestrator."""
from pathlib import Path
from typing import List
from drt.models.summary import ExecutionSummary
from drt.config.models import Config
from drt.reporting.html import HTMLReportGenerator
from drt.reporting.csv import CSVReportGenerator
from drt.utils.logging import get_logger
from drt.utils.timestamps import get_timestamp
logger = get_logger(__name__)
class ReportGenerator:
"""Orchestrates report generation in multiple formats."""
def __init__(self, config: Config):
"""
Initialize report generator.
Args:
config: Configuration object
"""
self.config = config
# Use absolute path from config
self.output_dir = Path(config.reporting.output_directory).expanduser().resolve()
self.output_dir.mkdir(parents=True, exist_ok=True)
def generate_reports(self, summary: ExecutionSummary) -> List[str]:
"""
Generate reports in all configured formats.
Args:
summary: Execution summary
Returns:
List of generated report file paths
"""
logger.info("Generating reports...")
generated_files = []
timestamp = summary.start_time
# Generate filename
filename_base = self.config.reporting.filename_template.format(
timestamp=timestamp,
config_name="regression"
)
for fmt in self.config.reporting.formats:
try:
if fmt == "html":
filepath = self._generate_html(summary, filename_base)
generated_files.append(filepath)
elif fmt == "csv":
filepath = self._generate_csv(summary, filename_base)
generated_files.append(filepath)
elif fmt == "pdf":
logger.warning("PDF generation not yet implemented")
else:
logger.warning(f"Unknown report format: {fmt}")
except Exception as e:
logger.error(f"Failed to generate {fmt} report: {e}")
logger.info(f"Generated {len(generated_files)} report(s)")
return generated_files
def _generate_html(self, summary: ExecutionSummary, filename_base: str) -> str:
"""Generate HTML report."""
generator = HTMLReportGenerator(self.config)
filepath = self.output_dir / f"{filename_base}.html"
generator.generate(summary, filepath)
logger.info(f"✓ HTML: {filepath}")
return str(filepath)
def _generate_csv(self, summary: ExecutionSummary, filename_base: str) -> str:
"""Generate CSV report."""
generator = CSVReportGenerator(self.config)
filepath = self.output_dir / f"{filename_base}.csv"
generator.generate(summary, filepath)
logger.info(f"✓ CSV: {filepath}")
return str(filepath)
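End to end, report generation takes a loaded Config and a populated ExecutionSummary; a sketch with placeholder inputs:

from drt.config.loader import load_config
from drt.models.summary import ExecutionSummary
from drt.reporting.generator import ReportGenerator

cfg = load_config("./config.yaml")                          # hypothetical path
summary = ExecutionSummary(start_time="20240101_080000",    # normally produced by the comparison run
                           end_time="20240101_081000", duration_seconds=600)
for path in ReportGenerator(cfg).generate_reports(summary):
    print(f"Report written: {path}")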

src/drt/reporting/html.py Executable file

@@ -0,0 +1,239 @@
"""HTML report generator."""
from pathlib import Path
from drt.models.summary import ExecutionSummary
from drt.models.enums import Status, CheckType
from drt.config.models import Config
from drt.utils.logging import get_logger
from drt.utils.timestamps import format_duration
logger = get_logger(__name__)
class HTMLReportGenerator:
"""Generates HTML format reports."""
def __init__(self, config: Config):
"""
Initialize HTML generator.
Args:
config: Configuration object
"""
self.config = config
self.colors = config.reporting.html.get("colors", {})
def generate(self, summary: ExecutionSummary, filepath: Path) -> None:
"""
Generate HTML report.
Args:
summary: Execution summary
filepath: Output file path
"""
html_content = self._build_html(summary)
with open(filepath, "w", encoding="utf-8") as f:
f.write(html_content)
logger.debug(f"HTML report written to {filepath}")
def _build_html(self, summary: ExecutionSummary) -> str:
"""Build complete HTML document."""
return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Data Regression Test Report - {summary.start_time}</title>
{self._get_styles()}
</head>
<body>
<div class="container">
{self._build_header(summary)}
{self._build_summary(summary)}
{self._build_failures(summary)}
{self._build_warnings(summary)}
{self._build_detailed_results(summary)}
{self._build_footer(summary)}
</div>
</body>
</html>"""
def _get_styles(self) -> str:
"""Get embedded CSS styles."""
return """<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; background: #f5f5f5; padding: 20px; }
.container { max-width: 1400px; margin: 0 auto; background: white; padding: 30px; border-radius: 8px; box-shadow: 0 2px 10px rgba(0,0,0,0.1); }
h1 { color: #333; border-bottom: 3px solid #007bff; padding-bottom: 10px; margin-bottom: 20px; }
h2 { color: #555; margin-top: 30px; margin-bottom: 15px; border-left: 4px solid #007bff; padding-left: 10px; }
.header { background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 20px; border-radius: 8px; margin-bottom: 30px; }
.header h1 { color: white; border: none; }
.info-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 15px; margin: 20px 0; }
.info-box { background: #f8f9fa; padding: 15px; border-radius: 5px; border-left: 4px solid #007bff; }
.info-label { font-weight: bold; color: #666; font-size: 0.9em; }
.info-value { color: #333; font-size: 1.1em; margin-top: 5px; }
.summary-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(150px, 1fr)); gap: 15px; margin: 20px 0; }
.summary-box { padding: 20px; border-radius: 8px; text-align: center; color: white; }
.summary-box.pass { background: #28a745; }
.summary-box.fail { background: #dc3545; }
.summary-box.warning { background: #ffc107; color: #333; }
.summary-box.error { background: #6f42c1; }
.summary-box.info { background: #17a2b8; }
.summary-box.skip { background: #6c757d; }
.summary-number { font-size: 2.5em; font-weight: bold; }
.summary-label { font-size: 0.9em; margin-top: 5px; }
.summary-percent { font-size: 0.8em; opacity: 0.9; }
table { width: 100%; border-collapse: collapse; margin: 20px 0; }
th { background: #007bff; color: white; padding: 12px; text-align: left; font-weight: 600; }
td { padding: 10px 12px; border-bottom: 1px solid #dee2e6; }
tr:hover { background: #f8f9fa; }
.status-badge { display: inline-block; padding: 4px 12px; border-radius: 12px; font-size: 0.85em; font-weight: 600; }
.status-PASS { background: #d4edda; color: #155724; }
.status-FAIL { background: #f8d7da; color: #721c24; }
.status-WARNING { background: #fff3cd; color: #856404; }
.status-ERROR { background: #e7d6f5; color: #4a148c; }
.status-INFO { background: #d1ecf1; color: #0c5460; }
.status-SKIP { background: #e2e3e5; color: #383d41; }
.failure-box { background: #fff5f5; border: 1px solid #feb2b2; border-radius: 5px; padding: 15px; margin: 10px 0; }
.failure-title { font-weight: bold; color: #c53030; margin-bottom: 8px; }
.failure-detail { color: #666; margin: 5px 0; font-size: 0.95em; }
.footer { margin-top: 40px; padding-top: 20px; border-top: 1px solid #dee2e6; text-align: center; color: #666; font-size: 0.9em; }
</style>"""
def _build_header(self, summary: ExecutionSummary) -> str:
"""Build report header."""
return f"""<div class="header">
<h1>📊 Data Regression Test Report</h1>
<p>Generated: {summary.start_time}</p>
</div>
<div class="info-grid">
<div class="info-box">
<div class="info-label">Start Time</div>
<div class="info-value">{summary.start_time}</div>
</div>
<div class="info-box">
<div class="info-label">End Time</div>
<div class="info-value">{summary.end_time}</div>
</div>
<div class="info-box">
<div class="info-label">Duration</div>
<div class="info-value">{format_duration(summary.duration_seconds)}</div>
</div>
<div class="info-box">
<div class="info-label">Baseline</div>
<div class="info-value">{summary.baseline_info}</div>
</div>
<div class="info-box">
<div class="info-label">Target</div>
<div class="info-value">{summary.target_info}</div>
</div>
<div class="info-box">
<div class="info-label">Total Tables</div>
<div class="info-value">{summary.total_tables}</div>
</div>
</div>"""
def _build_summary(self, summary: ExecutionSummary) -> str:
"""Build summary section."""
return f"""<h2>Summary</h2>
<div class="summary-grid">
<div class="summary-box pass">
<div class="summary-number">{summary.passed}</div>
<div class="summary-label">PASS</div>
<div class="summary-percent">{(summary.passed/summary.total_tables*100) if summary.total_tables > 0 else 0:.1f}%</div>
</div>
<div class="summary-box fail">
<div class="summary-number">{summary.failed}</div>
<div class="summary-label">FAIL</div>
<div class="summary-percent">{(summary.failed/summary.total_tables*100) if summary.total_tables > 0 else 0:.1f}%</div>
</div>
<div class="summary-box warning">
<div class="summary-number">{summary.warnings}</div>
<div class="summary-label">WARNING</div>
<div class="summary-percent">{(summary.warnings/summary.total_tables*100) if summary.total_tables > 0 else 0:.1f}%</div>
</div>
<div class="summary-box error">
<div class="summary-number">{summary.errors}</div>
<div class="summary-label">ERROR</div>
<div class="summary-percent">{(summary.errors/summary.total_tables*100) if summary.total_tables > 0 else 0:.1f}%</div>
</div>
<div class="summary-box info">
<div class="summary-number">{summary.info}</div>
<div class="summary-label">INFO</div>
<div class="summary-percent">{(summary.info/summary.total_tables*100) if summary.total_tables > 0 else 0:.1f}%</div>
</div>
<div class="summary-box skip">
<div class="summary-number">{summary.skipped}</div>
<div class="summary-label">SKIP</div>
<div class="summary-percent">{(summary.skipped/summary.total_tables*100) if summary.total_tables > 0 else 0:.1f}%</div>
</div>
</div>"""
def _build_failures(self, summary: ExecutionSummary) -> str:
"""Build failures section."""
failures = [r for r in summary.results if r.overall_status == Status.FAIL]
if not failures:
return ""
html = '<h2>❌ Failures (Immediate Action Required)</h2>'
for result in failures:
html += f"""<div class="failure-box">
<div class="failure-title">{result.table.full_name}</div>"""
for check in result.check_results:
if check.status == Status.FAIL:
html += f'<div class="failure-detail">• {check.check_type.value}: {check.message}</div>'
html += '</div>'
return html
def _build_warnings(self, summary: ExecutionSummary) -> str:
"""Build warnings section."""
warnings = [r for r in summary.results if r.overall_status == Status.WARNING]
if not warnings:
return ""
html = '<h2>⚠️ Warnings</h2><ul>'
for result in warnings:
for check in result.check_results:
if check.status == Status.WARNING:
html += f'<li><strong>{result.table.full_name}</strong>: {check.message}</li>'
html += '</ul>'
return html
def _build_detailed_results(self, summary: ExecutionSummary) -> str:
"""Build detailed results table."""
html = '<h2>Detailed Results</h2><table><thead><tr>'
html += '<th>Table</th><th>Status</th><th>Row Count</th><th>Schema</th><th>Aggregates</th><th>Time (ms)</th>'
html += '</tr></thead><tbody>'
for result in summary.results:
row_count = result.get_check(CheckType.ROW_COUNT)
schema = result.get_check(CheckType.SCHEMA)
aggregate = result.get_check(CheckType.AGGREGATE)
html += f'<tr><td>{result.table.full_name}</td>'
html += f'<td><span class="status-badge status-{result.overall_status.value}">{result.overall_status.value}</span></td>'
html += f'<td><span class="status-badge status-{row_count.status.value if row_count else "SKIP"}">{row_count.status.value if row_count else "SKIP"}</span></td>'
html += f'<td><span class="status-badge status-{schema.status.value if schema else "SKIP"}">{schema.status.value if schema else "SKIP"}</span></td>'
html += f'<td><span class="status-badge status-{aggregate.status.value if aggregate else "SKIP"}">{aggregate.status.value if aggregate else "SKIP"}</span></td>'
html += f'<td>{result.execution_time_ms}</td></tr>'
html += '</tbody></table>'
return html
def _build_footer(self, summary: ExecutionSummary) -> str:
"""Build report footer."""
return f"""<div class="footer">
<p>Generated by Data Regression Testing Framework v1.0.0</p>
<p>Success Rate: {summary.success_rate:.1f}%</p>
</div>"""


@@ -0,0 +1,357 @@
"""Investigation report generators for HTML and CSV formats."""
import csv
from pathlib import Path
from typing import Optional
from drt.models.investigation import InvestigationSummary, QueryExecutionResult
from drt.models.enums import Status
from drt.config.models import Config
from drt.utils.logging import get_logger
from drt.utils.timestamps import format_duration
logger = get_logger(__name__)
class InvestigationHTMLReportGenerator:
"""Generates HTML format investigation reports."""
def __init__(self, config: Config):
"""
Initialize HTML generator.
Args:
config: Configuration object
"""
self.config = config
self.max_rows = 100 # Limit rows displayed in HTML
def generate(self, summary: InvestigationSummary, filepath: Path) -> None:
"""
Generate HTML investigation report.
Args:
summary: Investigation summary
filepath: Output file path
"""
html_content = self._build_html(summary)
with open(filepath, "w", encoding="utf-8") as f:
f.write(html_content)
logger.debug(f"Investigation HTML report written to {filepath}")
def _build_html(self, summary: InvestigationSummary) -> str:
"""Build complete HTML document."""
return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Investigation Report - {summary.start_time}</title>
{self._get_styles()}
{self._get_scripts()}
</head>
<body>
<div class="container">
{self._build_header(summary)}
{self._build_summary(summary)}
{self._build_table_results(summary)}
{self._build_footer(summary)}
</div>
</body>
</html>"""
def _get_styles(self) -> str:
"""Get embedded CSS styles."""
return """<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; background: #f5f5f5; padding: 20px; }
.container { max-width: 1600px; margin: 0 auto; background: white; padding: 30px; border-radius: 8px; box-shadow: 0 2px 10px rgba(0,0,0,0.1); }
h1 { color: #333; border-bottom: 3px solid #007bff; padding-bottom: 10px; margin-bottom: 20px; }
h2 { color: #555; margin-top: 30px; margin-bottom: 15px; border-left: 4px solid #007bff; padding-left: 10px; }
h3 { color: #666; margin-top: 20px; margin-bottom: 10px; }
.header { background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 20px; border-radius: 8px; margin-bottom: 30px; }
.header h1 { color: white; border: none; }
.info-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 15px; margin: 20px 0; }
.info-box { background: #f8f9fa; padding: 15px; border-radius: 5px; border-left: 4px solid #007bff; }
.info-label { font-weight: bold; color: #666; font-size: 0.9em; }
.info-value { color: #333; font-size: 1.1em; margin-top: 5px; }
.summary-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(150px, 1fr)); gap: 15px; margin: 20px 0; }
.summary-box { padding: 20px; border-radius: 8px; text-align: center; color: white; }
.summary-box.success { background: #28a745; }
.summary-box.partial { background: #ffc107; color: #333; }
.summary-box.failed { background: #dc3545; }
.summary-number { font-size: 2.5em; font-weight: bold; }
.summary-label { font-size: 0.9em; margin-top: 5px; }
.table-card { background: #fff; border: 1px solid #dee2e6; border-radius: 8px; margin: 20px 0; overflow: hidden; }
.table-header { background: #f8f9fa; padding: 15px; border-bottom: 2px solid #dee2e6; cursor: pointer; }
.table-header:hover { background: #e9ecef; }
.table-name { font-size: 1.2em; font-weight: bold; color: #333; }
.table-status { display: inline-block; padding: 4px 12px; border-radius: 12px; font-size: 0.85em; font-weight: 600; margin-left: 10px; }
.status-SUCCESS { background: #d4edda; color: #155724; }
.status-PASS { background: #d4edda; color: #155724; }
.status-FAIL { background: #f8d7da; color: #721c24; }
.status-WARNING { background: #fff3cd; color: #856404; }
.status-SKIP { background: #e2e3e5; color: #383d41; }
.table-content { padding: 20px; display: none; }
.table-content.active { display: block; }
.query-section { margin: 20px 0; padding: 15px; background: #f8f9fa; border-radius: 5px; }
.query-header { font-weight: bold; margin-bottom: 10px; color: #555; }
.comparison-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 20px; margin: 15px 0; }
.env-section { background: white; padding: 15px; border-radius: 5px; border: 1px solid #dee2e6; }
.env-title { font-weight: bold; color: #007bff; margin-bottom: 10px; }
.query-code { background: #2d2d2d; color: #f8f8f2; padding: 15px; border-radius: 5px; overflow-x: auto; font-family: 'Courier New', monospace; font-size: 0.9em; margin: 10px 0; }
.result-table { width: 100%; border-collapse: collapse; margin: 10px 0; font-size: 0.9em; }
.result-table th { background: #007bff; color: white; padding: 8px; text-align: left; }
.result-table td { padding: 8px; border-bottom: 1px solid #dee2e6; }
.result-table tr:hover { background: #f8f9fa; }
.error-box { background: #fff5f5; border: 1px solid #feb2b2; border-radius: 5px; padding: 15px; margin: 10px 0; color: #c53030; }
.result-meta { display: flex; gap: 20px; margin: 10px 0; font-size: 0.9em; color: #666; }
.footer { margin-top: 40px; padding-top: 20px; border-top: 1px solid #dee2e6; text-align: center; color: #666; font-size: 0.9em; }
.toggle-icon { float: right; transition: transform 0.3s; }
.toggle-icon.active { transform: rotate(180deg); }
</style>"""
def _get_scripts(self) -> str:
"""Get embedded JavaScript."""
return """<script>
function toggleTable(id) {
const content = document.getElementById('content-' + id);
const icon = document.getElementById('icon-' + id);
content.classList.toggle('active');
icon.classList.toggle('active');
}
</script>"""
def _build_header(self, summary: InvestigationSummary) -> str:
"""Build report header."""
return f"""<div class="header">
<h1>🔍 Investigation Report</h1>
<p>Analysis Directory: {summary.analysis_directory}</p>
</div>
<div class="info-grid">
<div class="info-box">
<div class="info-label">Start Time</div>
<div class="info-value">{summary.start_time}</div>
</div>
<div class="info-box">
<div class="info-label">End Time</div>
<div class="info-value">{summary.end_time}</div>
</div>
<div class="info-box">
<div class="info-label">Duration</div>
<div class="info-value">{format_duration(summary.duration_seconds)}</div>
</div>
<div class="info-box">
<div class="info-label">Baseline</div>
<div class="info-value">{summary.baseline_info}</div>
</div>
<div class="info-box">
<div class="info-label">Target</div>
<div class="info-value">{summary.target_info}</div>
</div>
<div class="info-box">
<div class="info-label">Total Queries</div>
<div class="info-value">{summary.total_queries_executed}</div>
</div>
</div>"""
def _build_summary(self, summary: InvestigationSummary) -> str:
"""Build summary section."""
return f"""<h2>Summary</h2>
<div class="summary-grid">
<div class="summary-box success">
<div class="summary-number">{summary.tables_successful}</div>
<div class="summary-label">Successful</div>
</div>
<div class="summary-box partial">
<div class="summary-number">{summary.tables_partial}</div>
<div class="summary-label">Partial</div>
</div>
<div class="summary-box failed">
<div class="summary-number">{summary.tables_failed}</div>
<div class="summary-label">Failed</div>
</div>
</div>"""
def _build_table_results(self, summary: InvestigationSummary) -> str:
"""Build table-by-table results."""
html = '<h2>Investigation Results</h2>'
for idx, table_result in enumerate(summary.results):
html += f"""<div class="table-card">
<div class="table-header" onclick="toggleTable({idx})">
<span class="table-name">{table_result.full_name}</span>
<span class="table-status status-{table_result.overall_status.value}">{table_result.overall_status.value}</span>
<span class="toggle-icon" id="icon-{idx}">▼</span>
</div>
<div class="table-content" id="content-{idx}">
<p><strong>SQL File:</strong> {table_result.sql_file_path}</p>
<p><strong>Total Queries:</strong> {table_result.total_queries}</p>
<p><strong>Successful Queries:</strong> {table_result.successful_queries}</p>
{self._build_queries(table_result)}
</div>
</div>"""
return html
def _build_queries(self, table_result) -> str:
"""Build query results for a table."""
html = ""
for i, (baseline_result, target_result) in enumerate(zip(
table_result.baseline_results,
table_result.target_results
), 1):
html += f"""<div class="query-section">
<div class="query-header">Query {baseline_result.query_number}</div>
<details>
<summary>View SQL</summary>
<div class="query-code">{self._escape_html(baseline_result.query_text)}</div>
</details>
<div class="comparison-grid">
{self._build_query_result(baseline_result, "Baseline")}
{self._build_query_result(target_result, "Target")}
</div>
</div>"""
return html
def _build_query_result(self, result: QueryExecutionResult, env: str) -> str:
"""Build single query result."""
html = f"""<div class="env-section">
<div class="env-title">{env}</div>
<span class="table-status status-{result.status.value}">{result.status.value}</span>
<div class="result-meta">
<span>⏱️ {result.execution_time_ms}ms</span>
<span>📊 {result.row_count} rows</span>
</div>"""
if result.error_message:
html += f'<div class="error-box">❌ {self._escape_html(result.error_message)}</div>'
elif result.result_data is not None and not result.result_data.empty:
html += self._build_result_table(result)
html += '</div>'
return html
def _build_result_table(self, result: QueryExecutionResult) -> str:
"""Build HTML table from DataFrame."""
df = result.result_data
if df is None or df.empty:
return '<p>No data returned</p>'
# Limit rows
display_df = df.head(self.max_rows)
html = '<table class="result-table"><thead><tr>'
for col in display_df.columns:
html += f'<th>{self._escape_html(str(col))}</th>'
html += '</tr></thead><tbody>'
for _, row in display_df.iterrows():
html += '<tr>'
for val in row:
html += f'<td>{self._escape_html(str(val))}</td>'
html += '</tr>'
html += '</tbody></table>'
if len(df) > self.max_rows:
html += f'<p><em>Showing first {self.max_rows} of {len(df)} rows</em></p>'
return html
def _escape_html(self, text: str) -> str:
"""Escape HTML special characters."""
return (text
.replace('&', '&amp;')
.replace('<', '&lt;')
.replace('>', '&gt;')
.replace('"', '&quot;')
.replace("'", '&#39;'))
def _build_footer(self, summary: InvestigationSummary) -> str:
"""Build report footer."""
return f"""<div class="footer">
<p>Generated by Data Regression Testing Framework - Investigation Module</p>
<p>Success Rate: {summary.success_rate:.1f}%</p>
</div>"""
class InvestigationCSVReportGenerator:
"""Generates CSV format investigation reports."""
def __init__(self, config: Config):
"""
Initialize CSV generator.
Args:
config: Configuration object
"""
self.config = config
def generate(self, summary: InvestigationSummary, filepath: Path) -> None:
"""
Generate CSV investigation report.
Args:
summary: Investigation summary
filepath: Output file path
"""
csv_config = self.config.reporting.csv
delimiter = csv_config.get("delimiter", ",")
encoding = csv_config.get("encoding", "utf-8-sig")
with open(filepath, "w", newline="", encoding=encoding) as f:
writer = csv.writer(f, delimiter=delimiter)
# Write header
writer.writerow([
"Timestamp",
"Schema",
"Table",
"Query_Number",
"Environment",
"Status",
"Row_Count",
"Execution_Time_Ms",
"Error_Message",
"SQL_File_Path"
])
# Write data rows
for table_result in summary.results:
# Baseline results
for query_result in table_result.baseline_results:
writer.writerow([
table_result.timestamp,
table_result.schema,
table_result.table,
query_result.query_number,
"baseline",
query_result.status.value,
query_result.row_count,
query_result.execution_time_ms,
query_result.error_message or "",
table_result.sql_file_path
])
# Target results
for query_result in table_result.target_results:
writer.writerow([
table_result.timestamp,
table_result.schema,
table_result.table,
query_result.query_number,
"target",
query_result.status.value,
query_result.row_count,
query_result.execution_time_ms,
query_result.error_message or "",
table_result.sql_file_path
])
logger.debug(f"Investigation CSV report written to {filepath}")

6
src/drt/services/__init__.py Executable file
View File

@@ -0,0 +1,6 @@
"""Business logic services."""
from drt.services.discovery import DiscoveryService
from drt.services.comparison import ComparisonService
__all__ = ["DiscoveryService", "ComparisonService"]

15
src/drt/services/checkers/__init__.py Executable file
View File

@@ -0,0 +1,15 @@
"""Comparison checkers."""
from drt.services.checkers.base import BaseChecker
from drt.services.checkers.existence import ExistenceChecker
from drt.services.checkers.row_count import RowCountChecker
from drt.services.checkers.schema import SchemaChecker
from drt.services.checkers.aggregate import AggregateChecker
__all__ = [
"BaseChecker",
"ExistenceChecker",
"RowCountChecker",
"SchemaChecker",
"AggregateChecker",
]

111
src/drt/services/checkers/aggregate.py Executable file
View File

@@ -0,0 +1,111 @@
"""Aggregate checker."""
import time
from drt.services.checkers.base import BaseChecker
from drt.models.results import CheckResult
from drt.models.table import TableInfo
from drt.models.enums import Status, CheckType
from drt.utils.logging import get_logger
logger = get_logger(__name__)
class AggregateChecker(BaseChecker):
"""Checks aggregate sums for numeric columns."""
def check(self, table: TableInfo) -> CheckResult:
"""
Check aggregate sums.
Args:
table: Table information
Returns:
Check result
"""
if not self.config.comparison.aggregates.enabled:
return CheckResult(
check_type=CheckType.AGGREGATE,
status=Status.SKIP,
message="Aggregate check disabled"
)
if not table.aggregate_columns:
return CheckResult(
check_type=CheckType.AGGREGATE,
status=Status.SKIP,
message="No aggregate columns configured"
)
try:
# Time baseline query
baseline_start = time.time()
baseline_sums = self.baseline_executor.get_aggregate_sums(
table.schema, table.name, table.aggregate_columns
)
baseline_time = (time.time() - baseline_start) * 1000
logger.debug(f" └─ Baseline aggregate query: {baseline_time:.0f}ms")
# Time target query
target_start = time.time()
target_sums = self.target_executor.get_aggregate_sums(
table.schema, table.name, table.aggregate_columns
)
target_time = (time.time() - target_start) * 1000
logger.debug(f" └─ Target aggregate query: {target_time:.0f}ms")
logger.debug(f" └─ Total aggregate time: {baseline_time + target_time:.0f}ms (could be parallelized)")
tolerance_pct = self.config.comparison.aggregates.tolerance_percent
issues = []
statuses = []
for col in table.aggregate_columns:
baseline_val = baseline_sums.get(col, 0.0)
target_val = target_sums.get(col, 0.0)
if baseline_val == target_val:
continue
# Calculate percentage difference
if baseline_val != 0:
pct_diff = abs((target_val - baseline_val) / baseline_val * 100)
else:
pct_diff = 100.0 if target_val != 0 else 0.0
if pct_diff > tolerance_pct:
statuses.append(Status.FAIL)
issues.append(
f"Column '{col}': SUM differs by {pct_diff:.2f}% "
f"(Baseline: {baseline_val:,.2f}, Target: {target_val:,.2f})"
)
# Determine overall status
if not statuses:
status = Status.PASS
message = f"All {len(table.aggregate_columns)} aggregate(s) match"
else:
status = Status.most_severe(statuses)
message = "; ".join(issues)
return CheckResult(
check_type=CheckType.AGGREGATE,
status=status,
baseline_value=baseline_sums,
target_value=target_sums,
message=message,
details={
"baseline_sums": baseline_sums,
"target_sums": target_sums,
"tolerance_percent": tolerance_pct,
"columns_checked": table.aggregate_columns,
"issues": issues
}
)
except Exception as e:
logger.error(f"Aggregate check failed for {table.full_name}: {e}")
return CheckResult(
check_type=CheckType.AGGREGATE,
status=Status.ERROR,
message=f"Aggregate check error: {str(e)}"
)
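To make the tolerance rule above concrete, here is a small self-contained restatement of the same percentage arithmetic with two illustrative values (not taken from any real run):

```python
def exceeds_tolerance(baseline_val: float, target_val: float, tolerance_pct: float) -> bool:
    """Mirror of the per-column tolerance rule used by AggregateChecker."""
    if baseline_val == target_val:
        return False
    if baseline_val != 0:
        pct_diff = abs((target_val - baseline_val) / baseline_val * 100)
    else:
        pct_diff = 100.0 if target_val != 0 else 0.0
    return pct_diff > tolerance_pct

# With a 0.01% tolerance: a 0.005% drift passes, a 0.2% drift fails.
assert exceeds_tolerance(1_000_000.00, 1_000_050.00, tolerance_pct=0.01) is False
assert exceeds_tolerance(1_000_000.00, 998_000.00, tolerance_pct=0.01) is True
```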

42
src/drt/services/checkers/base.py Executable file
View File

@@ -0,0 +1,42 @@
"""Base checker class."""
from abc import ABC, abstractmethod
from drt.models.results import CheckResult
from drt.models.table import TableInfo
from drt.database.executor import QueryExecutor
from drt.config.models import Config
class BaseChecker(ABC):
"""Abstract base class for all checkers."""
def __init__(
self,
baseline_executor: QueryExecutor,
target_executor: QueryExecutor,
config: Config
):
"""
Initialize checker.
Args:
baseline_executor: Query executor for baseline database
target_executor: Query executor for target database
config: Configuration object
"""
self.baseline_executor = baseline_executor
self.target_executor = target_executor
self.config = config
@abstractmethod
def check(self, table: TableInfo) -> CheckResult:
"""
Perform the check.
Args:
table: Table information
Returns:
Check result
"""
pass

78
src/drt/services/checkers/existence.py Executable file
View File

@@ -0,0 +1,78 @@
"""Table existence checker."""
import time
from drt.services.checkers.base import BaseChecker
from drt.models.results import CheckResult
from drt.models.table import TableInfo
from drt.models.enums import Status, CheckType
from drt.utils.logging import get_logger
logger = get_logger(__name__)
class ExistenceChecker(BaseChecker):
"""Checks if table exists in both baseline and target."""
def check(self, table: TableInfo) -> CheckResult:
"""
Check table existence.
Args:
table: Table information
Returns:
Check result
"""
try:
# Time baseline query
baseline_start = time.time()
baseline_exists = self.baseline_executor.table_exists(table.schema, table.name)
baseline_time = (time.time() - baseline_start) * 1000
logger.debug(f" └─ Baseline existence query: {baseline_time:.0f}ms")
# Time target query
target_start = time.time()
target_exists = self.target_executor.table_exists(table.schema, table.name)
target_time = (time.time() - target_start) * 1000
logger.debug(f" └─ Target existence query: {target_time:.0f}ms")
logger.debug(f" └─ Total existence time: {baseline_time + target_time:.0f}ms (could be parallelized)")
# Determine status
if baseline_exists and target_exists:
status = Status.PASS
message = "Table exists in both databases"
elif baseline_exists and not target_exists:
# Table missing in target
if table.expected_in_target:
status = Status.FAIL
message = "Table exists in Baseline but missing in Target (REGRESSION)"
else:
status = Status.INFO
message = "Table removed from Target (expected per configuration)"
elif not baseline_exists and target_exists:
status = Status.INFO
message = "Table exists only in Target (new table)"
else:
status = Status.ERROR
message = "Table does not exist in either database"
return CheckResult(
check_type=CheckType.EXISTENCE,
status=status,
baseline_value=baseline_exists,
target_value=target_exists,
message=message,
details={
"baseline_exists": baseline_exists,
"target_exists": target_exists,
"expected_in_target": table.expected_in_target
}
)
except Exception as e:
logger.error(f"Existence check failed for {table.full_name}: {e}")
return CheckResult(
check_type=CheckType.EXISTENCE,
status=Status.ERROR,
message=f"Existence check error: {str(e)}"
)

90
src/drt/services/checkers/row_count.py Executable file
View File

@@ -0,0 +1,90 @@
"""Row count checker."""
import time
from drt.services.checkers.base import BaseChecker
from drt.models.results import CheckResult
from drt.models.table import TableInfo
from drt.models.enums import Status, CheckType
from drt.utils.logging import get_logger
logger = get_logger(__name__)
class RowCountChecker(BaseChecker):
"""Checks row count differences between baseline and target."""
def check(self, table: TableInfo) -> CheckResult:
"""
Check row counts.
Args:
table: Table information
Returns:
Check result
"""
if not self.config.comparison.row_count.enabled:
return CheckResult(
check_type=CheckType.ROW_COUNT,
status=Status.SKIP,
message="Row count check disabled"
)
try:
# Time baseline query
baseline_start = time.time()
baseline_count = self.baseline_executor.get_row_count(table.schema, table.name)
baseline_time = (time.time() - baseline_start) * 1000
logger.debug(f" └─ Baseline row count query: {baseline_time:.0f}ms")
# Time target query
target_start = time.time()
target_count = self.target_executor.get_row_count(table.schema, table.name)
target_time = (time.time() - target_start) * 1000
logger.debug(f" └─ Target row count query: {target_time:.0f}ms")
logger.debug(f" └─ Total row count time: {baseline_time + target_time:.0f}ms (could be parallelized)")
difference = target_count - baseline_count
tolerance_pct = self.config.comparison.row_count.tolerance_percent
# Determine status
if baseline_count == target_count:
status = Status.PASS
message = f"Row counts match: {baseline_count:,}"
elif target_count > baseline_count:
pct_diff = (difference / baseline_count * 100) if baseline_count > 0 else 0
status = Status.WARNING
message = f"Target has {difference:,} more rows (+{pct_diff:.2f}%)"
else: # target_count < baseline_count
pct_diff = abs(difference / baseline_count * 100) if baseline_count > 0 else 0
if pct_diff <= tolerance_pct:
status = Status.WARNING
message = f"Target has {abs(difference):,} fewer rows (-{pct_diff:.2f}%) - within tolerance"
else:
status = Status.FAIL
message = f"Target missing {abs(difference):,} rows (-{pct_diff:.2f}%) - REGRESSION"
return CheckResult(
check_type=CheckType.ROW_COUNT,
status=status,
baseline_value=baseline_count,
target_value=target_count,
difference=difference,
message=message,
details={
"baseline_count": baseline_count,
"target_count": target_count,
"difference": difference,
"percent_difference": (difference / baseline_count * 100) if baseline_count > 0 else 0,
"tolerance_percent": tolerance_pct
}
)
except Exception as e:
logger.error(f"Row count check failed for {table.full_name}: {e}")
return CheckResult(
check_type=CheckType.ROW_COUNT,
status=Status.ERROR,
message=f"Row count check error: {str(e)}"
)
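The asymmetric status rule above (extra rows only warn, missing rows fail once they exceed the tolerance) can be restated as a small self-contained function with illustrative counts:

```python
def row_count_status(baseline: int, target: int, tolerance_pct: float) -> str:
    """Mirror of the status decision used by RowCountChecker."""
    if baseline == target:
        return "PASS"
    if target > baseline:
        return "WARNING"                      # extra rows are flagged, not failed
    pct_diff = abs((target - baseline) / baseline * 100) if baseline > 0 else 0
    return "WARNING" if pct_diff <= tolerance_pct else "FAIL"

assert row_count_status(100, 100, tolerance_pct=0.0) == "PASS"
assert row_count_status(100, 105, tolerance_pct=0.0) == "WARNING"
assert row_count_status(1_000_000, 999_950, tolerance_pct=0.01) == "WARNING"  # -0.005%
assert row_count_status(100, 95, tolerance_pct=0.0) == "FAIL"                 # -5% regression
```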

132
src/drt/services/checkers/schema.py Executable file
View File

@@ -0,0 +1,132 @@
"""Schema checker."""
import time
from typing import Set
from drt.services.checkers.base import BaseChecker
from drt.models.results import CheckResult
from drt.models.table import TableInfo
from drt.models.enums import Status, CheckType
from drt.utils.logging import get_logger
logger = get_logger(__name__)
class SchemaChecker(BaseChecker):
"""Checks schema differences between baseline and target."""
def check(self, table: TableInfo) -> CheckResult:
"""
Check schema compatibility.
Args:
table: Table information
Returns:
Check result
"""
if not self.config.comparison.schema.enabled:
return CheckResult(
check_type=CheckType.SCHEMA,
status=Status.SKIP,
message="Schema check disabled"
)
try:
# Time baseline query
baseline_start = time.time()
baseline_cols = self.baseline_executor.get_columns(table.schema, table.name)
baseline_time = (time.time() - baseline_start) * 1000
logger.debug(f" └─ Baseline schema query: {baseline_time:.0f}ms")
# Time target query
target_start = time.time()
target_cols = self.target_executor.get_columns(table.schema, table.name)
target_time = (time.time() - target_start) * 1000
logger.debug(f" └─ Target schema query: {target_time:.0f}ms")
logger.debug(f" └─ Total schema time: {baseline_time + target_time:.0f}ms (could be parallelized)")
baseline_col_names = {col['COLUMN_NAME'] for col in baseline_cols}
target_col_names = {col['COLUMN_NAME'] for col in target_cols}
missing_in_target = baseline_col_names - target_col_names
extra_in_target = target_col_names - baseline_col_names
issues = []
statuses = []
# Check for missing columns
if missing_in_target:
severity = self.config.comparison.schema.severity.get(
"missing_column_in_target", "FAIL"
)
statuses.append(Status[severity])
issues.append(f"Missing columns in Target: {', '.join(sorted(missing_in_target))}")
# Check for extra columns
if extra_in_target:
severity = self.config.comparison.schema.severity.get(
"extra_column_in_target", "WARNING"
)
statuses.append(Status[severity])
issues.append(f"Extra columns in Target: {', '.join(sorted(extra_in_target))}")
# Check data types for matching columns
if self.config.comparison.schema.checks.get("data_types", True):
type_mismatches = self._check_data_types(baseline_cols, target_cols)
if type_mismatches:
severity = self.config.comparison.schema.severity.get(
"data_type_mismatch", "WARNING"
)
statuses.append(Status[severity])
issues.extend(type_mismatches)
# Determine overall status
if not statuses:
status = Status.PASS
message = f"Schema matches: {len(baseline_col_names)} columns"
else:
status = Status.most_severe(statuses)
message = "; ".join(issues)
return CheckResult(
check_type=CheckType.SCHEMA,
status=status,
baseline_value=len(baseline_col_names),
target_value=len(target_col_names),
message=message,
details={
"baseline_columns": sorted(baseline_col_names),
"target_columns": sorted(target_col_names),
"missing_in_target": sorted(missing_in_target),
"extra_in_target": sorted(extra_in_target),
"issues": issues
}
)
except Exception as e:
logger.error(f"Schema check failed for {table.full_name}: {e}")
return CheckResult(
check_type=CheckType.SCHEMA,
status=Status.ERROR,
message=f"Schema check error: {str(e)}"
)
def _check_data_types(self, baseline_cols: list, target_cols: list) -> list:
"""Check for data type mismatches."""
mismatches = []
# Create lookup dictionaries
baseline_types = {col['COLUMN_NAME']: col['DATA_TYPE'] for col in baseline_cols}
target_types = {col['COLUMN_NAME']: col['DATA_TYPE'] for col in target_cols}
# Check common columns
common_cols = set(baseline_types.keys()) & set(target_types.keys())
for col in sorted(common_cols):
if baseline_types[col] != target_types[col]:
mismatches.append(
f"Column '{col}': type mismatch "
f"(Baseline: {baseline_types[col]}, Target: {target_types[col]})"
)
return mismatches
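The column comparison above boils down to two set differences plus a type check over the intersection; a standalone sketch with made-up column names:

```python
# Illustrative column sets only; the real names come from the executor's column metadata.
baseline_cols = {"CustomerID", "CustomerName", "Email"}
target_cols = {"CustomerID", "CustomerName", "LastModified"}

missing_in_target = baseline_cols - target_cols   # {'Email'}        -> FAIL by default severity
extra_in_target = target_cols - baseline_cols     # {'LastModified'} -> WARNING by default severity
common_cols = baseline_cols & target_cols         # only these are compared for data-type mismatches
```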

250
src/drt/services/comparison.py Executable file
View File

@@ -0,0 +1,250 @@
"""Comparison service for executing database comparisons."""
import time
from typing import List
from drt.database.connection import ConnectionManager
from drt.database.executor import QueryExecutor
from drt.config.models import Config, DatabasePairConfig
from drt.models.table import TableInfo
from drt.models.results import ComparisonResult
from drt.models.summary import ExecutionSummary
from drt.models.enums import Status
from drt.services.checkers import (
ExistenceChecker,
RowCountChecker,
SchemaChecker,
AggregateChecker
)
from drt.utils.logging import get_logger
from drt.utils.timestamps import get_timestamp
from drt.utils.patterns import matches_pattern
logger = get_logger(__name__)
class ComparisonService:
"""Service for comparing baseline and target databases."""
def __init__(self, config: Config):
"""
Initialize comparison service.
Args:
config: Configuration object
"""
self.config = config
def run_comparison(self, db_pair: DatabasePairConfig) -> ExecutionSummary:
"""
Run comparison for a database pair.
Args:
db_pair: Database pair configuration
Returns:
Execution summary with results
"""
start_time = get_timestamp()
start_ts = time.time()
logger.info("=" * 60)
logger.info(f"Starting comparison: {db_pair.name}")
logger.info("=" * 60)
# Initialize connections
baseline_mgr = ConnectionManager(db_pair.baseline)
target_mgr = ConnectionManager(db_pair.target)
try:
# Connect to databases
baseline_mgr.connect()
target_mgr.connect()
# Create executors
baseline_executor = QueryExecutor(baseline_mgr)
target_executor = QueryExecutor(target_mgr)
# Initialize checkers
existence_checker = ExistenceChecker(baseline_executor, target_executor, self.config)
row_count_checker = RowCountChecker(baseline_executor, target_executor, self.config)
schema_checker = SchemaChecker(baseline_executor, target_executor, self.config)
aggregate_checker = AggregateChecker(baseline_executor, target_executor, self.config)
# Get tables to compare
tables = self._get_tables_to_compare()
logger.info(f"Tables to compare: {len(tables)}")
# Create summary
summary = ExecutionSummary(
start_time=start_time,
end_time="",
duration_seconds=0,
config_file=self.config.metadata.generated_date or "",
baseline_info=f"{db_pair.baseline.server}.{db_pair.baseline.database}",
target_info=f"{db_pair.target.server}.{db_pair.target.database}"
)
# Compare each table
for idx, table in enumerate(tables, 1):
if not table.enabled:
logger.info(f"[{idx:3d}/{len(tables)}] {table.full_name:40s} SKIP (disabled)")
result = self._create_skipped_result(table)
summary.add_result(result)
continue
logger.info(f"[{idx:3d}/{len(tables)}] {table.full_name:40s} ...", extra={'end': ''})
result = self._compare_table(
table,
existence_checker,
row_count_checker,
schema_checker,
aggregate_checker
)
summary.add_result(result)
# Log result
status_symbol = self._get_status_symbol(result.overall_status)
logger.info(f" {status_symbol} {result.overall_status.value}")
if not self.config.execution.continue_on_error and result.overall_status == Status.ERROR:
logger.error("Stopping due to error (continue_on_error=False)")
break
# Finalize summary
end_time = get_timestamp()
duration = int(time.time() - start_ts)
summary.end_time = end_time
summary.duration_seconds = duration
# Log summary
self._log_summary(summary)
return summary
finally:
baseline_mgr.disconnect()
target_mgr.disconnect()
def _compare_table(
self,
table: TableInfo,
existence_checker: ExistenceChecker,
row_count_checker: RowCountChecker,
schema_checker: SchemaChecker,
aggregate_checker: AggregateChecker
) -> ComparisonResult:
"""Compare a single table."""
start_ms = time.time() * 1000
result = ComparisonResult(
table=table,
overall_status=Status.PASS,
timestamp=get_timestamp()
)
try:
# Check existence first
check_start = time.time()
existence_result = existence_checker.check(table)
existence_time = (time.time() - check_start) * 1000
logger.debug(f" └─ Existence check: {existence_time:.0f}ms")
result.add_check(existence_result)
# Only proceed with other checks if table exists in both
if existence_result.status == Status.PASS:
# Row count check
check_start = time.time()
row_count_result = row_count_checker.check(table)
row_count_time = (time.time() - check_start) * 1000
logger.debug(f" └─ Row count check: {row_count_time:.0f}ms")
result.add_check(row_count_result)
# Schema check
check_start = time.time()
schema_result = schema_checker.check(table)
schema_time = (time.time() - check_start) * 1000
logger.debug(f" └─ Schema check: {schema_time:.0f}ms")
result.add_check(schema_result)
# Aggregate check
check_start = time.time()
aggregate_result = aggregate_checker.check(table)
aggregate_time = (time.time() - check_start) * 1000
logger.debug(f" └─ Aggregate check: {aggregate_time:.0f}ms")
result.add_check(aggregate_result)
except Exception as e:
logger.error(f"Comparison failed for {table.full_name}: {e}")
result.overall_status = Status.ERROR
result.error_message = str(e)
result.execution_time_ms = int(time.time() * 1000 - start_ms)
logger.debug(f" └─ Total table time: {result.execution_time_ms}ms")
return result
def _get_tables_to_compare(self) -> List[TableInfo]:
"""Get list of tables to compare based on configuration."""
tables = []
for table_config in self.config.tables:
table = TableInfo(
schema=table_config.schema,
name=table_config.name,
enabled=table_config.enabled,
expected_in_target=table_config.expected_in_target,
estimated_row_count=table_config.estimated_row_count,
primary_key_columns=table_config.primary_key_columns,
aggregate_columns=table_config.aggregate_columns,
notes=table_config.notes
)
tables.append(table)
# Apply filters
if self.config.table_filters.mode == "include_list":
if self.config.table_filters.include_list:
include_names = {f"{t['schema']}.{t['name']}" for t in self.config.table_filters.include_list}
tables = [t for t in tables if t.full_name in include_names]
# Apply exclusions
tables = [
t for t in tables
if not matches_pattern(t.name, self.config.table_filters.exclude_patterns)
and t.schema not in self.config.table_filters.exclude_schemas
]
return tables
def _create_skipped_result(self, table: TableInfo) -> ComparisonResult:
"""Create a skipped result for disabled tables."""
return ComparisonResult(
table=table,
overall_status=Status.SKIP,
timestamp=get_timestamp()
)
def _get_status_symbol(self, status: Status) -> str:
"""Get symbol for status."""
symbols = {
Status.PASS: "",
Status.FAIL: "",
Status.WARNING: "",
Status.ERROR: "🔴",
Status.INFO: "",
Status.SKIP: ""
}
return symbols.get(status, "?")
def _log_summary(self, summary: ExecutionSummary) -> None:
"""Log execution summary."""
logger.info("=" * 60)
logger.info("COMPARISON SUMMARY")
logger.info("=" * 60)
logger.info(f" PASS: {summary.passed:3d} | FAIL: {summary.failed:3d}")
logger.info(f" WARNING: {summary.warnings:3d} | ERROR: {summary.errors:3d}")
logger.info(f" INFO: {summary.info:3d} | SKIP: {summary.skipped:3d}")
logger.info("=" * 60)
logger.info(f"Duration: {summary.duration_seconds} seconds")
logger.info(f"Success Rate: {summary.success_rate:.1f}%")
logger.info("=" * 60)

192
src/drt/services/discovery.py Executable file
View File

@@ -0,0 +1,192 @@
"""Discovery service for auto-generating configuration."""
from typing import List, Optional
from drt.database.connection import ConnectionManager
from drt.database.executor import QueryExecutor
from drt.database.queries import SQLQueries
from drt.models.table import TableInfo, ColumnInfo
from drt.config.models import Config, TableConfig, MetadataConfig, ConnectionConfig
from drt.utils.logging import get_logger
from drt.utils.timestamps import get_timestamp
from drt.utils.patterns import matches_pattern
logger = get_logger(__name__)
class DiscoveryService:
"""Service for discovering database tables and generating configuration."""
def __init__(self, connection_config: ConnectionConfig, config: Optional[Config] = None):
"""
Initialize discovery service.
Args:
connection_config: Connection configuration for baseline database
config: Optional existing configuration for discovery settings
"""
self.conn_config = connection_config
self.config = config or Config()
self.conn_mgr = ConnectionManager(connection_config)
self.executor = QueryExecutor(self.conn_mgr)
def discover_tables(self) -> List[TableInfo]:
"""
Discover all tables in the database.
Returns:
List of discovered tables
"""
logger.info("Starting table discovery...")
try:
# Get all tables
tables_data = self.executor.get_all_tables()
logger.info(f"Found {len(tables_data)} tables")
discovered_tables = []
for table_data in tables_data:
schema = table_data['schema_name']
name = table_data['table_name']
estimated_rows = table_data.get('estimated_rows', 0)
# Apply filters
if self._should_exclude_table(schema, name):
logger.debug(f"Excluding table: {schema}.{name}")
continue
# Get column information
columns = self._discover_columns(schema, name)
# Get primary keys
pk_columns = self.executor.get_primary_keys(schema, name)
# Identify numeric columns for aggregation
aggregate_cols = [
col.name for col in columns
if col.is_numeric and self.config.discovery.detect_numeric_columns
]
table_info = TableInfo(
schema=schema,
name=name,
estimated_row_count=estimated_rows,
columns=columns,
primary_key_columns=pk_columns,
enabled=True,
expected_in_target=self.config.discovery.default_expected_in_target,
aggregate_columns=aggregate_cols,
notes=""
)
discovered_tables.append(table_info)
logger.debug(f"Discovered: {table_info.full_name} ({estimated_rows:,} rows)")
logger.info(f"Discovery complete: {len(discovered_tables)} tables discovered")
return discovered_tables
except Exception as e:
logger.error(f"Discovery failed: {e}")
raise
def _discover_columns(self, schema: str, table: str) -> List[ColumnInfo]:
"""Discover columns for a table."""
import math
columns_data = self.executor.get_columns(schema, table)
columns = []
for idx, col_data in enumerate(columns_data, 1):
is_numeric = SQLQueries.is_numeric_type(col_data['DATA_TYPE'])
# Convert nan to None for Pydantic validation
# Pandas converts SQL NULL to nan, but Pydantic v2 rejects nan for Optional[int]
max_length = col_data.get('CHARACTER_MAXIMUM_LENGTH')
if isinstance(max_length, float) and math.isnan(max_length):
max_length = None
precision = col_data.get('NUMERIC_PRECISION')
if isinstance(precision, float) and math.isnan(precision):
precision = None
scale = col_data.get('NUMERIC_SCALE')
if isinstance(scale, float) and math.isnan(scale):
scale = None
# DEBUG: Log converted values to verify fix
logger.debug(f"Column {col_data['COLUMN_NAME']}: max_length={max_length} (converted from {col_data.get('CHARACTER_MAXIMUM_LENGTH')}), "
f"precision={precision}, scale={scale}, data_type={col_data['DATA_TYPE']}")
column = ColumnInfo(
name=col_data['COLUMN_NAME'],
data_type=col_data['DATA_TYPE'],
max_length=max_length,
precision=precision,
scale=scale,
is_nullable=col_data['IS_NULLABLE'] == 'YES',
is_numeric=is_numeric,
ordinal_position=col_data.get('ORDINAL_POSITION', idx)
)
columns.append(column)
return columns
def _should_exclude_table(self, schema: str, table: str) -> bool:
"""Check if table should be excluded based on filters."""
# Check schema exclusions
if schema in self.config.discovery.exclude_schemas:
return True
# Check table name patterns
if matches_pattern(table, self.config.discovery.exclude_patterns):
return True
# Check schema inclusions (if specified)
if self.config.discovery.include_schemas:
if schema not in self.config.discovery.include_schemas:
return True
return False
def generate_config(self, tables: List[TableInfo]) -> Config:
"""
Generate configuration from discovered tables.
Args:
tables: List of discovered tables
Returns:
Generated configuration
"""
logger.info("Generating configuration...")
# Create table configs
table_configs = [
TableConfig(
schema=table.schema,
name=table.name,
enabled=table.enabled,
expected_in_target=table.expected_in_target,
estimated_row_count=table.estimated_row_count,
primary_key_columns=table.primary_key_columns,
aggregate_columns=table.aggregate_columns,
notes=table.notes
)
for table in tables
]
# Update metadata
metadata = MetadataConfig(
config_version="1.0",
generated_date=get_timestamp(),
generated_by="discovery",
framework_version="1.0.0"
)
# Create new config with discovered tables
config = Config(
metadata=metadata,
tables=table_configs
)
logger.info(f"Configuration generated with {len(table_configs)} tables")
return config
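A hypothetical end-to-end discovery sketch: discover the baseline tables and turn them into a fresh configuration. Persisting the resulting `Config` to YAML is handled elsewhere in the framework and is not shown:

```python
from drt.services.discovery import DiscoveryService

def build_config_from_baseline(connection_config):
    """connection_config: a ConnectionConfig pointing at the baseline database."""
    service = DiscoveryService(connection_config)
    tables = service.discover_tables()
    return service.generate_config(tables)
```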

297
src/drt/services/investigation.py Executable file
View File

@@ -0,0 +1,297 @@
"""Investigation service for executing investigation queries."""
import time
from pathlib import Path
from typing import List, Tuple
from drt.database.connection import ConnectionManager
from drt.database.executor import QueryExecutor
from drt.config.models import Config, DatabasePairConfig
from drt.models.investigation import (
QueryExecutionResult,
TableInvestigationResult,
InvestigationSummary
)
from drt.models.enums import Status
from drt.services.sql_parser import SQLParser, discover_sql_files
from drt.utils.logging import get_logger
from drt.utils.timestamps import get_timestamp
logger = get_logger(__name__)
class InvestigationService:
"""Service for executing investigation queries."""
def __init__(self, config: Config):
"""
Initialize investigation service.
Args:
config: Configuration object
"""
self.config = config
self.parser = SQLParser()
def run_investigation(
self,
analysis_dir: Path,
db_pair: DatabasePairConfig
) -> InvestigationSummary:
"""
Run investigation for all SQL files in analysis directory.
Args:
analysis_dir: Path to analysis output directory
db_pair: Database pair configuration
Returns:
Investigation summary with all results
"""
start_time = get_timestamp()
start_ts = time.time()
logger.info("=" * 60)
logger.info(f"Starting investigation: {analysis_dir.name}")
logger.info("=" * 60)
# Initialize connections
baseline_mgr = ConnectionManager(db_pair.baseline)
target_mgr = ConnectionManager(db_pair.target)
try:
# Connect to databases
baseline_mgr.connect()
target_mgr.connect()
# Create executors
baseline_executor = QueryExecutor(baseline_mgr)
target_executor = QueryExecutor(target_mgr)
# Discover SQL files
sql_files = discover_sql_files(analysis_dir)
logger.info(f"Found {len(sql_files)} investigation files")
# Create summary
summary = InvestigationSummary(
start_time=start_time,
end_time="",
duration_seconds=0,
analysis_directory=str(analysis_dir),
baseline_info=f"{db_pair.baseline.server}.{db_pair.baseline.database}",
target_info=f"{db_pair.target.server}.{db_pair.target.database}",
tables_processed=0,
tables_successful=0,
tables_partial=0,
tables_failed=0,
total_queries_executed=0,
results=[]
)
# Process each SQL file
for idx, (schema, table, sql_path) in enumerate(sql_files, 1):
logger.info(f"[{idx:3d}/{len(sql_files)}] {schema}.{table:40s} ...")
result = self._investigate_table(
schema,
table,
sql_path,
baseline_executor,
target_executor
)
summary.results.append(result)
summary.tables_processed += 1
# Update counters
if result.overall_status == Status.PASS:
summary.tables_successful += 1
elif result.overall_status == Status.SKIP:
# Don't count skipped tables in partial/failed
pass
elif result.overall_status in [Status.WARNING, Status.INFO]:
# Treat WARNING/INFO as partial success
summary.tables_partial += 1
elif self._is_partial_status(result):
summary.tables_partial += 1
else:
summary.tables_failed += 1
# Count queries
summary.total_queries_executed += len(result.baseline_results)
summary.total_queries_executed += len(result.target_results)
logger.info(f" {self._get_status_symbol(result.overall_status)} "
f"{result.overall_status.value}")
# Finalize summary
end_time = get_timestamp()
duration = int(time.time() - start_ts)
summary.end_time = end_time
summary.duration_seconds = duration
self._log_summary(summary)
return summary
finally:
baseline_mgr.disconnect()
target_mgr.disconnect()
def _investigate_table(
self,
schema: str,
table: str,
sql_path: Path,
baseline_executor: QueryExecutor,
target_executor: QueryExecutor
) -> TableInvestigationResult:
"""Execute investigation queries for a single table."""
# Parse SQL file
queries = self.parser.parse_sql_file(sql_path)
if not queries:
logger.warning(f"No valid queries found in {sql_path.name}")
return TableInvestigationResult(
schema=schema,
table=table,
sql_file_path=str(sql_path),
baseline_results=[],
target_results=[],
overall_status=Status.SKIP,
timestamp=get_timestamp()
)
logger.debug(f" └─ Executing {len(queries)} queries")
# Execute on baseline
baseline_results = self._execute_queries(
queries,
baseline_executor,
"baseline"
)
# Execute on target
target_results = self._execute_queries(
queries,
target_executor,
"target"
)
# Determine overall status
overall_status = self._determine_overall_status(
baseline_results,
target_results
)
return TableInvestigationResult(
schema=schema,
table=table,
sql_file_path=str(sql_path),
baseline_results=baseline_results,
target_results=target_results,
overall_status=overall_status,
timestamp=get_timestamp()
)
def _execute_queries(
self,
queries: List[Tuple[int, str]],
executor: QueryExecutor,
environment: str
) -> List[QueryExecutionResult]:
"""Execute list of queries on one environment."""
results = []
for query_num, query_text in queries:
logger.debug(f" └─ Query {query_num} on {environment}")
status, result_df, error_msg, exec_time = \
executor.execute_investigation_query(query_text)
result = QueryExecutionResult(
query_number=query_num,
query_text=query_text,
status=status,
execution_time_ms=exec_time,
result_data=result_df,
error_message=error_msg,
row_count=len(result_df) if result_df is not None else 0
)
results.append(result)
logger.debug(f" └─ {status.value} ({exec_time}ms, "
f"{result.row_count} rows)")
return results
def _determine_overall_status(
self,
baseline_results: List[QueryExecutionResult],
target_results: List[QueryExecutionResult]
) -> Status:
"""Determine overall status for table investigation."""
all_results = baseline_results + target_results
if not all_results:
return Status.SKIP
success_count = sum(1 for r in all_results if r.status == Status.PASS)
failed_count = sum(1 for r in all_results if r.status == Status.FAIL)
skipped_count = sum(1 for r in all_results if r.status == Status.SKIP)
# All successful
if success_count == len(all_results):
return Status.PASS
# All failed
if failed_count == len(all_results):
return Status.FAIL
# All skipped
if skipped_count == len(all_results):
return Status.SKIP
# Mixed results - use WARNING to indicate partial success
if success_count > 0:
return Status.WARNING
else:
return Status.FAIL
def _is_partial_status(self, result: TableInvestigationResult) -> bool:
"""Check if result represents partial success."""
all_results = result.baseline_results + result.target_results
if not all_results:
return False
success_count = sum(1 for r in all_results if r.status == Status.PASS)
return 0 < success_count < len(all_results)
def _get_status_symbol(self, status: Status) -> str:
"""Get symbol for status."""
symbols = {
Status.PASS: "",
Status.FAIL: "",
Status.WARNING: "",
Status.SKIP: "",
Status.ERROR: "🔴",
Status.INFO: ""
}
return symbols.get(status, "?")
def _log_summary(self, summary: InvestigationSummary) -> None:
"""Log investigation summary."""
logger.info("=" * 60)
logger.info("INVESTIGATION SUMMARY")
logger.info("=" * 60)
logger.info(f" Tables Processed: {summary.tables_processed}")
logger.info(f" Successful: {summary.tables_successful}")
logger.info(f" Partial: {summary.tables_partial}")
logger.info(f" Failed: {summary.tables_failed}")
logger.info(f" Total Queries: {summary.total_queries_executed}")
logger.info("=" * 60)
logger.info(f"Duration: {summary.duration_seconds} seconds")
logger.info(f"Success Rate: {summary.success_rate:.1f}%")
logger.info("=" * 60)

173
src/drt/services/sql_parser.py Executable file
View File

@@ -0,0 +1,173 @@
"""SQL file parser for investigation queries."""
import re
from pathlib import Path
from typing import List, Tuple
from drt.utils.logging import get_logger
logger = get_logger(__name__)
class SQLParser:
"""Parser for investigation SQL files."""
@staticmethod
def parse_sql_file(file_path: Path) -> List[Tuple[int, str]]:
"""
Parse SQL file into individual queries with their numbers.
Args:
file_path: Path to SQL file
Returns:
List of tuples (query_number, query_text)
Example:
>>> queries = SQLParser.parse_sql_file(Path("investigate.sql"))
>>> for num, query in queries:
... print(f"Query {num}: {query[:50]}...")
"""
try:
content = file_path.read_text(encoding='utf-8')
# Step 1: Remove markdown code blocks
content = SQLParser._remove_markdown(content)
# Step 2: Split into queries
queries = SQLParser._split_queries(content)
# Step 3: Clean and validate
cleaned_queries = []
for num, query in queries:
cleaned = SQLParser._clean_query(query)
if cleaned and SQLParser._is_valid_query(cleaned):
cleaned_queries.append((num, cleaned))
else:
logger.debug(f"Skipped invalid query {num} in {file_path.name}")
logger.info(f"Parsed {len(cleaned_queries)} queries from {file_path.name}")
return cleaned_queries
except Exception as e:
logger.error(f"Failed to parse {file_path}: {e}")
return []
@staticmethod
def _remove_markdown(content: str) -> str:
"""Remove markdown code blocks from content."""
# Remove opening ```sql
content = re.sub(r'```sql\s*\n?', '', content, flags=re.IGNORECASE)
# Remove closing ```
content = re.sub(r'```\s*\n?', '', content)
return content
@staticmethod
def _split_queries(content: str) -> List[Tuple[int, str]]:
"""
Split content into individual queries.
Looks for patterns like:
-- Query 1: Description
-- Query 2: Description
"""
queries = []
current_query = []
current_number = 0
for line in content.split('\n'):
# Check if line is a query separator
match = re.match(r'^\s*--\s*Query\s+(\d+):', line, re.IGNORECASE)
if match:
# Save previous query if exists
if current_query and current_number > 0:
query_text = '\n'.join(current_query).strip()
if query_text:
queries.append((current_number, query_text))
# Start new query
current_number = int(match.group(1))
current_query = []
else:
# Add line to current query
current_query.append(line)
# Don't forget the last query
if current_query and current_number > 0:
query_text = '\n'.join(current_query).strip()
if query_text:
queries.append((current_number, query_text))
return queries
@staticmethod
def _clean_query(query: str) -> str:
"""Clean query text."""
# Remove leading/trailing whitespace
query = query.strip()
# Remove comment-only lines at start
lines = query.split('\n')
while lines and lines[0].strip().startswith('--'):
lines.pop(0)
# Remove empty lines at start and end
while lines and not lines[0].strip():
lines.pop(0)
while lines and not lines[-1].strip():
lines.pop()
return '\n'.join(lines)
@staticmethod
def _is_valid_query(query: str) -> bool:
"""Check if query is valid (not empty, not just comments)."""
if not query:
return False
# Remove all comments and whitespace
cleaned = re.sub(r'--.*$', '', query, flags=re.MULTILINE)
cleaned = cleaned.strip()
# Must have some SQL content
return len(cleaned) > 0
def discover_sql_files(analysis_dir: Path) -> List[Tuple[str, str, Path]]:
"""
Discover all *_investigate.sql files in analysis directory.
Args:
analysis_dir: Root analysis directory
Returns:
List of tuples (schema, table, file_path)
Example:
>>> files = discover_sql_files(Path("analysis/output_20251209_184032"))
>>> for schema, table, path in files:
... print(f"{schema}.{table}: {path}")
"""
sql_files = []
# Pattern: dbo.TableName/dbo.TableName_investigate.sql
pattern = "**/*_investigate.sql"
for sql_file in analysis_dir.glob(pattern):
# Extract schema and table from filename
# Example: dbo.A_COREC_NACES2008_investigate.sql
filename = sql_file.stem # Remove .sql
if filename.endswith('_investigate'):
# Remove _investigate suffix
full_name = filename[:-12] # len('_investigate') = 12
# Split schema.table
if '.' in full_name:
schema, table = full_name.split('.', 1)
sql_files.append((schema, table, sql_file))
else:
logger.warning(f"Could not parse schema.table from {filename}")
logger.info(f"Discovered {len(sql_files)} investigation SQL files")
return sql_files
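A self-contained illustration of the file format the parser expects — numbered `-- Query N:` separators, optionally wrapped in markdown fences (omitted here) — written to a temporary file and parsed back:

```python
import tempfile
from pathlib import Path

from drt.services.sql_parser import SQLParser

sample = (
    "-- Query 1: Row counts per customer\n"
    "SELECT CustomerID, COUNT(*) AS cnt FROM dbo.FactSales GROUP BY CustomerID;\n"
    "\n"
    "-- Query 2: Total sales amount\n"
    "SELECT SUM(TotalAmount) AS total FROM dbo.FactSales;\n"
)

with tempfile.NamedTemporaryFile("w", suffix="_investigate.sql", delete=False) as f:
    f.write(sample)
    tmp_path = Path(f.name)

queries = SQLParser.parse_sql_file(tmp_path)
assert [num for num, _ in queries] == [1, 2]
tmp_path.unlink()
```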

7
src/drt/utils/__init__.py Executable file
View File

@@ -0,0 +1,7 @@
"""Utility functions and helpers."""
from drt.utils.timestamps import get_timestamp, format_duration
from drt.utils.patterns import matches_pattern
from drt.utils.logging import setup_logging
__all__ = ["get_timestamp", "format_duration", "matches_pattern", "setup_logging"]

75
src/drt/utils/logging.py Executable file
View File

@@ -0,0 +1,75 @@
"""Logging configuration and setup."""
import logging
import sys
from pathlib import Path
from typing import Optional
from drt.utils.timestamps import get_timestamp
def setup_logging(
log_level: str = "INFO",
log_dir: str = "./logs",
log_to_console: bool = True,
log_to_file: bool = True,
) -> logging.Logger:
"""
Configure logging for the framework.
Args:
log_level: Logging level (DEBUG, INFO, WARNING, ERROR)
log_dir: Directory for log files
log_to_console: Whether to log to console
log_to_file: Whether to log to file
Returns:
Configured logger instance
"""
# Create logger
logger = logging.getLogger("drt")
logger.setLevel(getattr(logging, log_level.upper()))
# Remove existing handlers
logger.handlers.clear()
# Create formatter
log_format = "%(asctime)s | %(levelname)-8s | %(name)-20s | %(message)s"
date_format = "%Y%m%d_%H%M%S"
formatter = logging.Formatter(log_format, datefmt=date_format)
# Console handler
if log_to_console:
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setLevel(getattr(logging, log_level.upper()))
console_handler.setFormatter(formatter)
logger.addHandler(console_handler)
# File handler
if log_to_file:
log_path = Path(log_dir)
log_path.mkdir(parents=True, exist_ok=True)
timestamp = get_timestamp()
log_file = log_path / f"drt_{timestamp}.log"
file_handler = logging.FileHandler(log_file, encoding="utf-8")
file_handler.setLevel(logging.DEBUG) # Always log everything to file
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
logger.info(f"Logging to file: {log_file}")
return logger
def get_logger(name: str) -> logging.Logger:
"""
Get a logger instance for a specific module.
Args:
name: Logger name (typically __name__)
Returns:
Logger instance
"""
return logging.getLogger(f"drt.{name}")

58
src/drt/utils/patterns.py Executable file
View File

@@ -0,0 +1,58 @@
"""Pattern matching utilities for wildcard support."""
import fnmatch
from typing import List, Optional
def matches_pattern(text: str, patterns: List[str]) -> bool:
"""
Check if text matches any of the given wildcard patterns.
Args:
text: Text to match
patterns: List of wildcard patterns (e.g., "*_TEMP", "tmp*")
Returns:
True if text matches any pattern, False otherwise
Examples:
>>> matches_pattern("Orders_TEMP", ["*_TEMP", "*_TMP"])
True
>>> matches_pattern("Orders", ["*_TEMP", "*_TMP"])
False
"""
if not patterns:
return False
for pattern in patterns:
if fnmatch.fnmatch(text.upper(), pattern.upper()):
return True
return False
def filter_by_patterns(
items: List[str], include_patterns: Optional[List[str]] = None, exclude_patterns: Optional[List[str]] = None
) -> List[str]:
"""
Filter items by include and exclude patterns.
Args:
items: List of items to filter
include_patterns: Patterns to include (if None, include all)
exclude_patterns: Patterns to exclude
Returns:
Filtered list of items
"""
result = items.copy()
# Apply include patterns if specified
if include_patterns:
result = [item for item in result if matches_pattern(item, include_patterns)]
# Apply exclude patterns
if exclude_patterns:
result = [item for item in result if not matches_pattern(item, exclude_patterns)]
return result

59
src/drt/utils/timestamps.py Executable file
View File

@@ -0,0 +1,59 @@
"""Timestamp utilities using YYYYMMDD_HHMMSS format."""
from datetime import datetime
def get_timestamp() -> str:
"""
Get current timestamp in YYYYMMDD_HHMMSS format.
Returns:
Formatted timestamp string
"""
return datetime.now().strftime("%Y%m%d_%H%M%S")
def format_duration(seconds: int) -> str:
"""
Format duration in seconds to human-readable string.
Args:
seconds: Duration in seconds
Returns:
Formatted duration string (e.g., "4 minutes 38 seconds")
"""
if seconds < 60:
return f"{seconds} second{'s' if seconds != 1 else ''}"
minutes = seconds // 60
remaining_seconds = seconds % 60
if minutes < 60:
if remaining_seconds == 0:
return f"{minutes} minute{'s' if minutes != 1 else ''}"
return f"{minutes} minute{'s' if minutes != 1 else ''} {remaining_seconds} second{'s' if remaining_seconds != 1 else ''}"
hours = minutes // 60
remaining_minutes = minutes % 60
parts = [f"{hours} hour{'s' if hours != 1 else ''}"]
if remaining_minutes > 0:
parts.append(f"{remaining_minutes} minute{'s' if remaining_minutes != 1 else ''}")
if remaining_seconds > 0:
parts.append(f"{remaining_seconds} second{'s' if remaining_seconds != 1 else ''}")
return " ".join(parts)
def parse_timestamp(timestamp_str: str) -> datetime:
"""
Parse timestamp string in YYYYMMDD_HHMMSS format.
Args:
timestamp_str: Timestamp string to parse
Returns:
datetime object
"""
return datetime.strptime(timestamp_str, "%Y%m%d_%H%M%S")

117
test_data/init_baseline.sql Executable file
View File

@@ -0,0 +1,117 @@
-- Baseline Database Initialization Script
-- This creates a sample database structure for testing
USE master;
GO
-- Create test database
IF NOT EXISTS (SELECT name FROM sys.databases WHERE name = 'TestDB_Baseline')
BEGIN
CREATE DATABASE TestDB_Baseline;
END
GO
USE TestDB_Baseline;
GO
-- Create sample tables
-- Dimension: Customers
CREATE TABLE dbo.DimCustomer (
CustomerID INT PRIMARY KEY IDENTITY(1,1),
CustomerName NVARCHAR(100) NOT NULL,
Email NVARCHAR(100),
City NVARCHAR(50),
Country NVARCHAR(50),
CreatedDate DATETIME DEFAULT GETDATE()
);
-- Dimension: Products
CREATE TABLE dbo.DimProduct (
ProductID INT PRIMARY KEY IDENTITY(1,1),
ProductName NVARCHAR(100) NOT NULL,
Category NVARCHAR(50),
UnitPrice DECIMAL(10,2),
IsActive BIT DEFAULT 1
);
-- Fact: Sales
CREATE TABLE dbo.FactSales (
SaleID INT PRIMARY KEY IDENTITY(1,1),
CustomerID INT,
ProductID INT,
SaleDate DATE,
Quantity INT,
UnitPrice DECIMAL(10,2),
TotalAmount DECIMAL(10,2),
TaxAmount DECIMAL(10,2),
FOREIGN KEY (CustomerID) REFERENCES dbo.DimCustomer(CustomerID),
FOREIGN KEY (ProductID) REFERENCES dbo.DimProduct(ProductID)
);
-- Insert sample data (TEST DATA ONLY - NOT REAL CUSTOMERS)
-- Customers
INSERT INTO dbo.DimCustomer (CustomerName, Email, City, Country) VALUES
('TestCustomer1', 'test1@test.local', 'City1', 'Country1'),
('TestCustomer2', 'test2@test.local', 'City2', 'Country2'),
('TestCustomer3', 'test3@test.local', 'City3', 'Country3'),
('TestCustomer4', 'test4@test.local', 'City4', 'Country4'),
('TestCustomer5', 'test5@test.local', 'City5', 'Country5');
-- Products
INSERT INTO dbo.DimProduct (ProductName, Category, UnitPrice, IsActive) VALUES
('Laptop', 'Electronics', 999.99, 1),
('Mouse', 'Electronics', 29.99, 1),
('Keyboard', 'Electronics', 79.99, 1),
('Monitor', 'Electronics', 299.99, 1),
('Desk Chair', 'Furniture', 199.99, 1),
('Desk', 'Furniture', 399.99, 1),
('Notebook', 'Stationery', 4.99, 1),
('Pen Set', 'Stationery', 12.99, 1);
-- Sales (100 records)
DECLARE @i INT = 1;
WHILE @i <= 100
BEGIN
INSERT INTO dbo.FactSales (CustomerID, ProductID, SaleDate, Quantity, UnitPrice, TotalAmount, TaxAmount)
VALUES (
(ABS(CHECKSUM(NEWID())) % 5) + 1, -- Random CustomerID 1-5
(ABS(CHECKSUM(NEWID())) % 8) + 1, -- Random ProductID 1-8
DATEADD(DAY, -ABS(CHECKSUM(NEWID())) % 365, GETDATE()), -- Random date in last year
(ABS(CHECKSUM(NEWID())) % 10) + 1, -- Random Quantity 1-10
(ABS(CHECKSUM(NEWID())) % 900) + 100.00, -- Random price 100-999
0, -- Will be calculated
0 -- Will be calculated
);
-- Calculate amounts
UPDATE dbo.FactSales
SET TotalAmount = Quantity * UnitPrice,
TaxAmount = Quantity * UnitPrice * 0.1
WHERE SaleID = @i;
SET @i = @i + 1;
END
GO
-- Create some views for testing
CREATE VIEW dbo.vw_SalesSummary AS
SELECT
c.CustomerName,
p.ProductName,
s.SaleDate,
s.Quantity,
s.TotalAmount
FROM dbo.FactSales s
JOIN dbo.DimCustomer c ON s.CustomerID = c.CustomerID
JOIN dbo.DimProduct p ON s.ProductID = p.ProductID;
GO
-- Create statistics
CREATE STATISTICS stat_sales_date ON dbo.FactSales(SaleDate);
CREATE STATISTICS stat_customer_country ON dbo.DimCustomer(Country);
GO
PRINT 'Baseline database initialized successfully';
GO

131
test_data/init_target.sql Executable file
View File

@@ -0,0 +1,131 @@
-- Target Database Initialization Script
-- This creates a similar structure with some intentional differences for testing
USE master;
GO
-- Create test database
IF NOT EXISTS (SELECT name FROM sys.databases WHERE name = 'TestDB_Target')
BEGIN
CREATE DATABASE TestDB_Target;
END
GO
USE TestDB_Target;
GO
-- Create sample tables (similar to baseline with some differences)
-- Dimension: Customers (same structure)
CREATE TABLE dbo.DimCustomer (
CustomerID INT PRIMARY KEY IDENTITY(1,1),
CustomerName NVARCHAR(100) NOT NULL,
Email NVARCHAR(100),
City NVARCHAR(50),
Country NVARCHAR(50),
CreatedDate DATETIME DEFAULT GETDATE()
);
-- Dimension: Products (slightly different - added column)
CREATE TABLE dbo.DimProduct (
ProductID INT PRIMARY KEY IDENTITY(1,1),
ProductName NVARCHAR(100) NOT NULL,
Category NVARCHAR(50),
UnitPrice DECIMAL(10,2),
IsActive BIT DEFAULT 1,
LastModified DATETIME DEFAULT GETDATE() -- Extra column for testing
);
-- Fact: Sales (same structure)
CREATE TABLE dbo.FactSales (
SaleID INT PRIMARY KEY IDENTITY(1,1),
CustomerID INT,
ProductID INT,
SaleDate DATE,
Quantity INT,
UnitPrice DECIMAL(10,2),
TotalAmount DECIMAL(10,2),
TaxAmount DECIMAL(10,2),
FOREIGN KEY (CustomerID) REFERENCES dbo.DimCustomer(CustomerID),
FOREIGN KEY (ProductID) REFERENCES dbo.DimProduct(ProductID)
);
-- Insert sample data (TEST DATA ONLY - NOT REAL CUSTOMERS)
-- Customers
INSERT INTO dbo.DimCustomer (CustomerName, Email, City, Country) VALUES
('TestCustomer1', 'test1@test.local', 'City1', 'Country1'),
('TestCustomer2', 'test2@test.local', 'City2', 'Country2'),
('TestCustomer3', 'test3@test.local', 'City3', 'Country3'),
('TestCustomer4', 'test4@test.local', 'City4', 'Country4'),
('TestCustomer5', 'test5@test.local', 'City5', 'Country5');
-- Products (with LastModified)
INSERT INTO dbo.DimProduct (ProductName, Category, UnitPrice, IsActive, LastModified) VALUES
('Laptop', 'Electronics', 999.99, 1, GETDATE()),
('Mouse', 'Electronics', 29.99, 1, GETDATE()),
('Keyboard', 'Electronics', 79.99, 1, GETDATE()),
('Monitor', 'Electronics', 299.99, 1, GETDATE()),
('Desk Chair', 'Furniture', 199.99, 1, GETDATE()),
('Desk', 'Furniture', 399.99, 1, GETDATE()),
('Notebook', 'Stationery', 4.99, 1, GETDATE()),
('Pen Set', 'Stationery', 12.99, 1, GETDATE());
-- Sales (95 records - 5 fewer than baseline for testing)
DECLARE @i INT = 1;
WHILE @i <= 95
BEGIN
INSERT INTO dbo.FactSales (CustomerID, ProductID, SaleDate, Quantity, UnitPrice, TotalAmount, TaxAmount)
VALUES (
(ABS(CHECKSUM(NEWID())) % 5) + 1,
(ABS(CHECKSUM(NEWID())) % 8) + 1,
DATEADD(DAY, -ABS(CHECKSUM(NEWID())) % 365, GETDATE()),
(ABS(CHECKSUM(NEWID())) % 10) + 1,
(ABS(CHECKSUM(NEWID())) % 900) + 100.00,
0,
0
);
-- Calculate amounts
UPDATE dbo.FactSales
SET TotalAmount = Quantity * UnitPrice,
TaxAmount = Quantity * UnitPrice * 0.1
WHERE SaleID = @i;
SET @i = @i + 1;
END
GO
-- Create the same view
CREATE VIEW dbo.vw_SalesSummary AS
SELECT
c.CustomerName,
p.ProductName,
s.SaleDate,
s.Quantity,
s.TotalAmount
FROM dbo.FactSales s
JOIN dbo.DimCustomer c ON s.CustomerID = c.CustomerID
JOIN dbo.DimProduct p ON s.ProductID = p.ProductID;
GO
-- Create an extra table that doesn't exist in baseline
CREATE TABLE dbo.TempProcessing (
ProcessID INT PRIMARY KEY IDENTITY(1,1),
ProcessName NVARCHAR(100),
Status NVARCHAR(20),
CreatedDate DATETIME DEFAULT GETDATE()
);
INSERT INTO dbo.TempProcessing (ProcessName, Status) VALUES
('DataLoad', 'Completed'),
('Validation', 'In Progress');
GO
-- Create statistics
CREATE STATISTICS stat_sales_date ON dbo.FactSales(SaleDate);
CREATE STATISTICS stat_customer_country ON dbo.DimCustomer(Country);
GO
PRINT 'Target database initialized successfully';
GO

View File

@@ -0,0 +1,97 @@
#!/bin/bash
# Setup script for test SQL Server environment
set -e
echo "=========================================="
echo "SQL Server Test Environment Setup"
echo "=========================================="
echo ""
# Check if Docker is installed
if ! command -v docker &> /dev/null; then
echo "Error: Docker is not installed"
echo "Please install Docker first: https://docs.docker.com/get-docker/"
exit 1
fi
# Check if Docker Compose is available (either standalone or plugin)
if ! command -v docker-compose &> /dev/null && ! docker compose version &> /dev/null; then
echo "Error: Docker Compose is not installed"
echo "Please install Docker Compose first"
exit 1
fi
# Determine which compose command to use
if docker compose version &> /dev/null; then
COMPOSE_CMD="docker compose"
else
COMPOSE_CMD="docker-compose"
fi
echo "Step 1: Starting SQL Server containers..."
$COMPOSE_CMD -f docker-compose.test.yml up -d
echo ""
echo "Step 2: Waiting for SQL Server to be ready..."
echo "This may take 30-60 seconds..."
# Set default password if not provided
SA_PASSWORD=${SA_PASSWORD:-YourStrong!Passw0rd}
# Wait for baseline server
echo -n "Waiting for baseline server"
for i in {1..30}; do
if docker exec drt-sqlserver-baseline /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -C -Q "SELECT 1" &> /dev/null; then
echo " ✓"
break
fi
echo -n "."
sleep 2
done
# Wait for target server
echo -n "Waiting for target server"
for i in {1..30}; do
if docker exec drt-sqlserver-target /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -C -Q "SELECT 1" &> /dev/null; then
echo " ✓"
break
fi
echo -n "."
sleep 2
done
echo ""
echo "Step 3: Initializing baseline database..."
docker exec -i drt-sqlserver-baseline /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -C < test_data/init_baseline.sql
echo ""
echo "Step 4: Initializing target database..."
docker exec -i drt-sqlserver-target /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -C < test_data/init_target.sql
echo ""
echo "=========================================="
echo "Setup completed successfully!"
echo "=========================================="
echo ""
echo "SQL Server instances are running:"
echo " Baseline: localhost:1433"
echo " Target: localhost:1434"
echo ""
echo "Credentials:"
echo " Username: sa"
echo " Password: (set via SA_PASSWORD environment variable)"
echo ""
echo "Test databases:"
echo " Baseline: TestDB_Baseline"
echo " Target: TestDB_Target"
echo ""
echo "To test the connection:"
echo " drt discover --server localhost --database TestDB_Baseline --output config_test.yaml"
echo ""
echo "To stop the servers:"
echo " $COMPOSE_CMD -f docker-compose.test.yml down"
echo ""
echo "To stop and remove all data:"
echo " $COMPOSE_CMD -f docker-compose.test.yml down -v"
echo ""

3
tests/__init__.py Executable file
View File

@@ -0,0 +1,3 @@
"""
Test suite for Data Regression Testing Framework
"""

207
tests/test_config.py Executable file
View File

@@ -0,0 +1,207 @@
"""
Unit tests for configuration management
"""
import pytest
from pathlib import Path
from drt.config.models import (
DatabaseConnection,
DatabasePair,
ComparisonSettings,
RowCountSettings,
SchemaSettings,
AggregateSettings,
ReportingSettings,
LoggingSettings,
Config
)
class TestDatabaseConnection:
"""Test DatabaseConnection model"""
def test_database_connection_minimal(self):
"""Test creating a minimal database connection"""
conn = DatabaseConnection(
server="SQLSERVER01",
database="TestDB"
)
assert conn.server == "SQLSERVER01"
assert conn.database == "TestDB"
assert conn.timeout.connection == 30
assert conn.timeout.query == 300
def test_database_connection_with_timeout(self):
"""Test database connection with custom timeout"""
conn = DatabaseConnection(
server="SQLSERVER01",
database="TestDB",
timeout={"connection": 60, "query": 600}
)
assert conn.timeout.connection == 60
assert conn.timeout.query == 600


class TestDatabasePair:
    """Test DatabasePair model"""

    def test_database_pair_creation(self):
        """Test creating a database pair"""
        pair = DatabasePair(
            name="Test_Pair",
            enabled=True,
            baseline=DatabaseConnection(
                server="SQLSERVER01",
                database="PROD_DB"
            ),
            target=DatabaseConnection(
                server="SQLSERVER01",
                database="TEST_DB"
            )
        )
        assert pair.name == "Test_Pair"
        assert pair.enabled is True
        assert pair.baseline.database == "PROD_DB"
        assert pair.target.database == "TEST_DB"


class TestComparisonSettings:
    """Test ComparisonSettings model"""

    def test_comparison_settings_health_check(self):
        """Test health check mode settings"""
        settings = ComparisonSettings(
            mode="health_check",
            row_count=RowCountSettings(enabled=True, tolerance_percent=0.0),
            schema=SchemaSettings(
                enabled=True,
                checks={
                    "column_names": True,
                    "data_types": True
                }
            ),
            aggregates=AggregateSettings(enabled=False)
        )
        assert settings.mode == "health_check"
        assert settings.row_count.enabled is True
        assert settings.aggregates.enabled is False

    def test_comparison_settings_full_mode(self):
        """Test full mode settings"""
        settings = ComparisonSettings(
            mode="full",
            row_count=RowCountSettings(enabled=True, tolerance_percent=0.0),
            schema=SchemaSettings(enabled=True),
            aggregates=AggregateSettings(enabled=True, tolerance_percent=0.01)
        )
        assert settings.mode == "full"
        assert settings.aggregates.enabled is True
        assert settings.aggregates.tolerance_percent == 0.01


class TestReportingSettings:
    """Test ReportingSettings model"""

    def test_reporting_settings_defaults(self):
        """Test default reporting settings"""
        settings = ReportingSettings()
        assert settings.output_dir == "./reports"
        assert settings.formats.html is True
        assert settings.formats.csv is True
        assert settings.formats.pdf is False
        assert settings.include_timestamp is True

    def test_reporting_settings_custom(self):
        """Test custom reporting settings"""
        settings = ReportingSettings(
            output_dir="./custom_reports",
            filename_prefix="custom_test",
            formats={"html": True, "csv": False, "pdf": True}
        )
        assert settings.output_dir == "./custom_reports"
        assert settings.filename_prefix == "custom_test"
        assert settings.formats.pdf is True


class TestLoggingSettings:
    """Test LoggingSettings model"""

    def test_logging_settings_defaults(self):
        """Test default logging settings"""
        settings = LoggingSettings()
        assert settings.level == "INFO"
        assert settings.output_dir == "./logs"
        assert settings.console.enabled is True
        assert settings.file.enabled is True

    def test_logging_settings_custom(self):
        """Test custom logging settings"""
        settings = LoggingSettings(
            level="DEBUG",
            console={"enabled": True, "level": "WARNING"}
        )
        assert settings.level == "DEBUG"
        assert settings.console.level == "WARNING"


class TestConfig:
    """Test Config model"""

    def test_config_minimal(self):
        """Test creating a minimal config"""
        config = Config(
            database_pairs=[
                DatabasePair(
                    name="Test",
                    enabled=True,
                    baseline=DatabaseConnection(
                        server="SERVER01",
                        database="PROD"
                    ),
                    target=DatabaseConnection(
                        server="SERVER01",
                        database="TEST"
                    )
                )
            ],
            comparison=ComparisonSettings(
                mode="health_check",
                row_count=RowCountSettings(enabled=True),
                schema=SchemaSettings(enabled=True),
                aggregates=AggregateSettings(enabled=False)
            ),
            tables=[]
        )
        assert len(config.database_pairs) == 1
        assert config.comparison.mode == "health_check"
        assert len(config.tables) == 0

    def test_config_with_tables(self):
        """Test config with table definitions"""
        from drt.models.table import TableInfo

        config = Config(
            database_pairs=[
                DatabasePair(
                    name="Test",
                    enabled=True,
                    baseline=DatabaseConnection(server="S1", database="D1"),
                    target=DatabaseConnection(server="S1", database="D2")
                )
            ],
            comparison=ComparisonSettings(
                mode="health_check",
                row_count=RowCountSettings(enabled=True),
                schema=SchemaSettings(enabled=True),
                aggregates=AggregateSettings(enabled=False)
            ),
            tables=[
                TableInfo(
                    schema="dbo",
                    name="TestTable",
                    enabled=True,
                    expected_in_target=True
                )
            ]
        )
        assert len(config.tables) == 1
        assert config.tables[0].name == "TestTable"
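
# For reference, the models exercised above map naturally onto a YAML layout along these
# lines (illustrative sketch; field names mirror the Pydantic models, but the shipped
# config.example.yaml may arrange things differently):
#
#   database_pairs:
#     - name: Test_Pair
#       enabled: true
#       baseline:
#         server: SQLSERVER01
#         database: PROD_DB
#       target:
#         server: SQLSERVER01
#         database: TEST_DB
#   comparison:
#     mode: health_check
#     row_count:
#       enabled: true
#       tolerance_percent: 0.0
#     aggregates:
#       enabled: false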

186
tests/test_models.py Executable file
View File

@@ -0,0 +1,186 @@
"""
Unit tests for data models
"""
import pytest
from drt.models.enums import Status, CheckType
from drt.models.table import TableInfo, ColumnInfo
from drt.models.results import CheckResult, ComparisonResult
class TestStatus:
"""Test Status enum"""
def test_status_values(self):
"""Test status enum values"""
assert Status.PASS.value == "PASS"
assert Status.FAIL.value == "FAIL"
assert Status.WARNING.value == "WARNING"
assert Status.ERROR.value == "ERROR"
assert Status.INFO.value == "INFO"
assert Status.SKIP.value == "SKIP"
def test_status_severity(self):
"""Test status severity comparison"""
assert Status.FAIL.severity > Status.WARNING.severity
assert Status.WARNING.severity > Status.PASS.severity
assert Status.ERROR.severity > Status.FAIL.severity
class TestCheckType:
"""Test CheckType enum"""
def test_check_type_values(self):
"""Test check type enum values"""
assert CheckType.TABLE_EXISTENCE.value == "TABLE_EXISTENCE"
assert CheckType.ROW_COUNT.value == "ROW_COUNT"
assert CheckType.SCHEMA.value == "SCHEMA"
assert CheckType.AGGREGATE.value == "AGGREGATE"
class TestTableInfo:
"""Test TableInfo model"""
def test_table_info_creation(self):
"""Test creating a TableInfo instance"""
table = TableInfo(
schema="dbo",
name="TestTable",
enabled=True,
expected_in_target=True
)
assert table.schema == "dbo"
assert table.name == "TestTable"
assert table.enabled is True
assert table.expected_in_target is True
assert table.aggregate_columns == []
def test_table_info_with_aggregates(self):
"""Test TableInfo with aggregate columns"""
table = TableInfo(
schema="dbo",
name="FactSales",
enabled=True,
expected_in_target=True,
aggregate_columns=["Amount", "Quantity"]
)
assert len(table.aggregate_columns) == 2
assert "Amount" in table.aggregate_columns
class TestColumnInfo:
"""Test ColumnInfo model"""
def test_column_info_creation(self):
"""Test creating a ColumnInfo instance"""
column = ColumnInfo(
name="CustomerID",
data_type="int",
is_nullable=False,
is_primary_key=True
)
assert column.name == "CustomerID"
assert column.data_type == "int"
assert column.is_nullable is False
assert column.is_primary_key is True
class TestCheckResult:
"""Test CheckResult model"""
def test_check_result_pass(self):
"""Test creating a passing check result"""
result = CheckResult(
check_type=CheckType.ROW_COUNT,
status=Status.PASS,
message="Row counts match",
baseline_value=1000,
target_value=1000
)
assert result.status == Status.PASS
assert result.baseline_value == 1000
assert result.target_value == 1000
def test_check_result_fail(self):
"""Test creating a failing check result"""
result = CheckResult(
check_type=CheckType.ROW_COUNT,
status=Status.FAIL,
message="Row count mismatch",
baseline_value=1000,
target_value=950
)
assert result.status == Status.FAIL
assert result.baseline_value != result.target_value
class TestComparisonResult:
"""Test ComparisonResult model"""
def test_comparison_result_creation(self):
"""Test creating a ComparisonResult instance"""
result = ComparisonResult(
schema="dbo",
table="TestTable"
)
assert result.schema == "dbo"
assert result.table == "TestTable"
assert len(result.checks) == 0
def test_add_check_result(self):
"""Test adding check results"""
comparison = ComparisonResult(
schema="dbo",
table="TestTable"
)
check = CheckResult(
check_type=CheckType.ROW_COUNT,
status=Status.PASS,
message="Row counts match"
)
comparison.checks.append(check)
assert len(comparison.checks) == 1
assert comparison.checks[0].status == Status.PASS
def test_overall_status_all_pass(self):
"""Test overall status when all checks pass"""
comparison = ComparisonResult(
schema="dbo",
table="TestTable"
)
comparison.checks.append(CheckResult(
check_type=CheckType.TABLE_EXISTENCE,
status=Status.PASS,
message="Table exists"
))
comparison.checks.append(CheckResult(
check_type=CheckType.ROW_COUNT,
status=Status.PASS,
message="Row counts match"
))
assert comparison.overall_status == Status.PASS
def test_overall_status_with_failure(self):
"""Test overall status when one check fails"""
comparison = ComparisonResult(
schema="dbo",
table="TestTable"
)
comparison.checks.append(CheckResult(
check_type=CheckType.TABLE_EXISTENCE,
status=Status.PASS,
message="Table exists"
))
comparison.checks.append(CheckResult(
check_type=CheckType.ROW_COUNT,
status=Status.FAIL,
message="Row count mismatch"
))
assert comparison.overall_status == Status.FAIL
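
# For reference: the two overall_status tests above are consistent with a
# "highest-severity check wins" aggregation. A sketch of one implementation that would
# satisfy them (not necessarily the framework's own):
#
#     @property
#     def overall_status(self) -> Status:
#         if not self.checks:
#             return Status.PASS  # empty case is an assumption; not exercised above
#         return max((c.status for c in self.checks), key=lambda s: s.severity)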

83
tests/test_utils.py Executable file
View File

@@ -0,0 +1,83 @@
"""
Unit tests for utility functions
"""
import pytest
from datetime import datetime
from drt.utils.timestamps import format_timestamp, format_duration
from drt.utils.patterns import matches_pattern
class TestTimestamps:
"""Test timestamp utilities"""
def test_format_timestamp(self):
"""Test timestamp formatting"""
dt = datetime(2024, 1, 15, 14, 30, 45)
formatted = format_timestamp(dt)
assert formatted == "20240115_143045"
def test_format_timestamp_current(self):
"""Test formatting current timestamp"""
formatted = format_timestamp()
# Should be in YYYYMMDD_HHMMSS format
assert len(formatted) == 15
assert formatted[8] == "_"
def test_format_duration_seconds(self):
"""Test duration formatting for seconds"""
duration = format_duration(45.5)
assert duration == "45.50s"
def test_format_duration_minutes(self):
"""Test duration formatting for minutes"""
duration = format_duration(125.0)
assert duration == "2m 5.00s"
def test_format_duration_hours(self):
"""Test duration formatting for hours"""
duration = format_duration(3725.0)
assert duration == "1h 2m 5.00s"


class TestPatterns:
    """Test pattern matching utilities"""

    def test_exact_match(self):
        """Test exact pattern matching"""
        assert matches_pattern("TestTable", "TestTable") is True
        assert matches_pattern("TestTable", "OtherTable") is False

    def test_wildcard_star(self):
        """Test wildcard * pattern"""
        assert matches_pattern("TestTable", "Test*") is True
        assert matches_pattern("TestTable", "*Table") is True
        assert matches_pattern("TestTable", "*est*") is True
        assert matches_pattern("TestTable", "Other*") is False

    def test_wildcard_question(self):
        """Test wildcard ? pattern"""
        assert matches_pattern("Test1", "Test?") is True
        assert matches_pattern("TestA", "Test?") is True
        assert matches_pattern("Test12", "Test?") is False
        assert matches_pattern("Test", "Test?") is False

    def test_combined_wildcards(self):
        """Test combined wildcard patterns"""
        assert matches_pattern("Test_Table_01", "Test_*_??") is True
        assert matches_pattern("Test_Table_1", "Test_*_??") is False

    def test_case_sensitivity(self):
        """Test case-sensitive matching"""
        assert matches_pattern("TestTable", "testtable") is False
        assert matches_pattern("TestTable", "TestTable") is True

    def test_empty_pattern(self):
        """Test empty pattern"""
        assert matches_pattern("TestTable", "") is False
        assert matches_pattern("", "") is True

    def test_special_characters(self):
        """Test patterns with special characters"""
        assert matches_pattern("Test.Table", "Test.Table") is True
        assert matches_pattern("Test_Table", "Test_*") is True
        assert matches_pattern("Test-Table", "Test-*") is True