Add parallel checker execution with connection pooling

Implements Level 2 parallelization: the row_count, schema, and
aggregate checkers now run concurrently, giving a 2-3x speedup
for tables with multiple enabled checks.

Changes:
- Add max_workers config option (default: 4)
- Add ConnectionPool module with SQLAlchemy QueuePool
- Add URL encoding for connection strings
- Implement parallel checker execution with ThreadPoolExecutor
- Add fail-fast behavior on checker errors
- Update executor for SQLAlchemy 2.0 compatibility
- Fix engine disposal resource leak
- Cache pooled engines in ConnectionManager
- Add disconnect() cleanup for pooled engines
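
A minimal sketch of the parallel, fail-fast execution described in the
list above. It is not the code from this commit: `run_checkers`, the
`fail_fast` flag, and the idea of passing checkers as zero-argument
callables are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_checkers(checkers, max_workers=4, fail_fast=True):
    """Run checker callables in parallel and collect their results.

    `checkers` maps a checker name to a zero-argument callable
    (hypothetical interface). Returns a dict of name -> result; when
    fail_fast is False, a failed checker maps to its exception instead.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in checkers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                if fail_fast:
                    # Cancel checkers that have not started yet, then
                    # re-raise. Checkers already running finish before
                    # the executor shuts down.
                    for pending in futures:
                        pending.cancel()
                    raise
                results[name] = exc
    return results
```

With `fail_fast=False` every checker runs to completion and errors are
returned alongside results, so the caller decides how to report them.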

Performance:
- Sequential: 3 checkers × 100ms = 300ms
- Parallel: 3 checkers ≈ 100ms in the ideal case (3x); 2-3x speedup
  expected in practice
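
The arithmetic above can be reproduced with a small timing sketch. The
100ms `time.sleep` stands in for a database round trip, and
`fake_checker`, `time_sequential`, and `time_parallel` are hypothetical
names, not part of this commit.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_checker(delay=0.1):
    """Simulate one checker whose cost is dominated by I/O wait."""
    time.sleep(delay)
    return delay


def time_sequential(n=3, delay=0.1):
    """Run n simulated checkers one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fake_checker(delay)
    return time.perf_counter() - start


def time_parallel(n=3, delay=0.1, max_workers=4):
    """Run n simulated checkers on a thread pool; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda _: fake_checker(delay), range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {time_sequential():.3f}s")
    print(f"parallel:   {time_parallel():.3f}s")
```

Threads help here because the simulated work is waiting, not computing.
Real checkers also pay for connection checkout and result processing,
which is one plausible reason the observed gain is below the ideal 3x.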

Configuration:
  execution:
    max_workers: 4  # Controls parallel checker execution
    continue_on_error: true
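
A sketch of how `max_workers` could drive pool sizing and how
credentials could be URL-encoded, following the "pool size =
max_workers + 2" rule in the config comments. The helper names, the
`mssql+pyodbc` driver, and the dict-shaped config are assumptions; the
actual ConnectionPool module may differ.

```python
from urllib.parse import quote_plus


def build_url(user, password, host, database, driver="mssql+pyodbc"):
    """Build a connection URL with percent-encoded credentials.

    Encoding keeps characters such as '@', ':' and '/' in a password
    from being parsed as URL delimiters. The driver name is an assumption.
    """
    return (
        f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{quote_plus(database)}"
    )


def pool_settings(execution_cfg):
    """Derive the pool size from the `execution` config section.

    Applies the rule from the config comments: max_workers + 2.
    """
    max_workers = int(execution_cfg.get("max_workers", 4))
    return {"pool_size": max_workers + 2}


if __name__ == "__main__":
    cfg = {"max_workers": 4, "continue_on_error": True}
    print(build_url("svc_user", "p@ss:w/rd", "db-host", "warehouse"))
    print(pool_settings(cfg))
```

The returned `pool_size` would be passed to SQLAlchemy's
`create_engine` as a keyword argument. The two extra connections leave
headroom beyond the worker threads.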
DevOps Team
2026-02-11 21:46:10 +07:00
parent f5b190c91d
commit 40bc615bf7
8 changed files with 643 additions and 47 deletions

@@ -195,22 +195,19 @@ logging:
# Configure execution behavior
# ============================================================================
execution:
# Parallel execution (future feature)
parallel:
enabled: false
max_workers: 4
# Continue execution even if a table check fails
continue_on_error: true
# Maximum number of parallel workers for checker execution
# Higher values = more parallel execution, but more database connections
# Recommended: 4 for most scenarios, 8 for high-performance servers
# Connection pool size = max_workers + 2
max_workers: 4
# Retry settings for transient failures
retry:
enabled: true
max_attempts: 3
attempts: 3
delay_seconds: 5
# Performance settings
performance:
batch_size: 1000 # Rows per batch for large queries
use_nolock: true # Use NOLOCK hints (read uncommitted)
connection_pooling: true
# ============================================================================
# FILTERS