# Energy Test Data

Preparation of test data for energy trading strategy demo.

## Overview

This project generates and processes realistic test data for energy trading strategies, including:

- **Electricity Prices**: Day-ahead and real-time market prices for European regions (FR, BE, DE, NL, UK)
- **Battery Capacity**: Storage system states with charge/discharge cycles
- **Renewable Generation**: Solar, wind, and hydro generation with forecast errors
- **Conventional Generation**: Gas, coal, and nuclear plant outputs
- **Load Profiles**: Regional electricity demand with weather correlations
- **Data Centers**: Power demand profiles including mining client
- **Mining**: Hashrate, price (EUR), power efficiency, demand, revenue, and profit per MWh
- **Transmission Capacity**: Region-to-region interconnector limits and efficiency
- **Transmission Cost**: Transmission costs including losses, congestion charges, and fees

## Project Structure

```
energy-test-data/
├── data/
│   ├── processed/                  # Final Parquet files (<200MB total)
│   ├── raw/                        # Unprocessed source data
│   └── metadata/                   # Data documentation and reports
├── scripts/
│   ├── 01_generate_synthetic.py    # Generate synthetic data
│   ├── 02_fetch_historical.py      # Fetch historical data
│   ├── 03_process_merge.py         # Process and compress
│   └── 04_validate.py              # Validate and report
├── config/
│   ├── data_config.yaml            # Configuration parameters
│   └── schema.yaml                 # Data schema definitions
├── requirements.txt
└── README.md
```

## Installation

```bash
pip install -r requirements.txt
```

## Usage

### Generate all test data

Run scripts in sequence:

```bash
python scripts/01_generate_synthetic.py
python scripts/02_fetch_historical.py
python scripts/03_process_merge.py
python scripts/04_validate.py
```

Or run all at once:

```bash
python scripts/01_generate_synthetic.py && \
python scripts/02_fetch_historical.py && \
python scripts/03_process_merge.py && \
python scripts/04_validate.py
```

### Individual scripts

**01_generate_synthetic.py**:
Creates synthetic data for battery systems, renewable generation, conventional generation, data centers, and transmission capacity/cost.

**02_fetch_historical.py**: Fetches electricity prices, mining data (with EUR pricing and power metrics), and load profiles from public APIs (or generates realistic synthetic data when APIs are unavailable).

**03_process_merge.py**: Merges datasets, optimizes memory usage, and saves to compressed Parquet format.

**04_validate.py**: Validates data quality, checks for missing values and outliers, and generates validation reports.

## Configuration

Edit `config/data_config.yaml` to customize:

- **Time range**: Start/end dates and granularity
- **Regions**: Market regions to include
- **Data sources**: Synthetic vs historical for each dataset
- **Generation parameters**: Noise levels, outlier rates, missing value rates
- **Battery parameters**: Capacity ranges, efficiency, degradation
- **Generation parameters**: Plant capacities, marginal costs
- **Mining parameters**: Hashrate ranges, power efficiency
- **Transmission parameters**: Capacity ranges, efficiency, congestion surcharges, fees

## Data Specifications

| Dataset | Rows | Actual Size |
|---------|------|-------------|
| electricity_prices | 72,005 | ~2.0 MB |
| battery_capacity | 144,010 | ~4.0 MB |
| renewable_generation | 216,015 | ~5.4 MB |
| conventional_generation | 144,010 | ~3.0 MB |
| load_profiles | 72,005 | ~1.7 MB |
| data_centers | 72,005 | ~1.0 MB |
| mining | 14,401 | ~0.5 MB |
| transmission_capacity | 20 | ~0.01 MB |
| transmission_cost | 20 | ~0.01 MB |
| **Total** | **734,491** | **~17.9 MB** |

## Output Format

All processed datasets are saved as Parquet files with Snappy compression in `data/processed/`.

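
As a rough illustration of the "optimize memory usage, then save as Snappy-compressed Parquet" step, the sketch below downcasts numeric columns and converts string columns to categories before writing. It is not the code in `03_process_merge.py`; the helper names `optimize_dtypes` and `save_snappy` and the sample column names are hypothetical, and writing requires a Parquet engine such as `pyarrow`.

```python
import pandas as pd


def optimize_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where pandas considers it safe."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_float_dtype(out[col]):
            # Downcast float64 to float32 when values survive within tolerance
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif pd.api.types.is_integer_dtype(out[col]):
            # Pick the smallest integer type that holds every value
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_string_dtype(out[col]):
            # Repeated labels (e.g. region codes) are stored once as categories
            out[col] = out[col].astype("category")
    return out


def save_snappy(df: pd.DataFrame, path: str) -> None:
    """Write df to Parquet with Snappy compression (needs pyarrow or fastparquet)."""
    df.to_parquet(path, compression="snappy", index=False)


# Hypothetical sample frame; the column names are illustrative only
sample = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=4, freq="h"),
    "region": ["FR", "BE", "DE", "NL"],
    "price_eur_mwh": [52.1, 48.7, 50.3, 49.9],
})
compact = optimize_dtypes(sample)
print(compact.dtypes)
```

Calling `save_snappy(compact, "example.parquet")` would then write the frame to disk. Categories only pay off on columns with many repeated values, so the benefit on a four-row sample like this one is negligible.
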
To read a dataset:

```python
import pandas as pd

df = pd.read_parquet('data/processed/electricity_prices.parquet')
print(df.head())
```

## Data Sources

- **Electricity Prices**: Hybrid (synthetic patterns based on EPEX Spot market characteristics)
- **Mining**: Hybrid (mempool.space API + synthetic patterns)
- **Load Profiles**: Hybrid (ENTSO-E transparency platform patterns + synthetic)

## Validation Reports

After processing, validation reports are generated in `data/metadata/`:

- `validation_report.json`: Data quality checks, missing values, range violations
- `final_metadata.json`: Dataset sizes, row counts, processing details
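
The kind of check summarized in a validation report (missing values and range violations) can be sketched with the standard library alone. This is an assumption-laden illustration, not the logic of `04_validate.py` and not the actual schema of `validation_report.json`: the function `validate_column`, the sample records, and the bounds are all hypothetical.

```python
import json
import math


def validate_column(rows, column, lower, upper):
    """Count missing values and range violations for one numeric column."""
    missing = 0
    out_of_range = 0
    for row in rows:
        value = row.get(column)
        # Treat both None and NaN as missing
        if value is None or (isinstance(value, float) and math.isnan(value)):
            missing += 1
        elif not lower <= value <= upper:
            out_of_range += 1
    return {
        "column": column,
        "rows": len(rows),
        "missing": missing,
        "range_violations": out_of_range,
    }


# Hypothetical records and bounds, chosen only for illustration
rows = [
    {"region": "FR", "price_eur_mwh": 52.1},
    {"region": "BE", "price_eur_mwh": None},
    {"region": "DE", "price_eur_mwh": 9999.0},
    {"region": "NL", "price_eur_mwh": float("nan")},
]
report = validate_column(rows, "price_eur_mwh", lower=-500.0, upper=3000.0)
print(json.dumps(report, indent=2))
```

For these four records the report counts two missing values (`None` and `NaN`) and one range violation (`9999.0`).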