- Add Transmission Capacity and Transmission Cost to overview
- Update mining description to reflect EUR pricing and power metrics
- Update script descriptions to include transmission data
- Add transmission parameters to configuration section
- Update data specifications table with actual values and 2 new datasets
Energy Test Data
Preparation of test data for an energy trading strategy demo.
Overview
This project generates and processes realistic test data for energy trading strategies, including:
- Electricity Prices: Day-ahead and real-time market prices for European regions (FR, BE, DE, NL, UK)
- Battery Capacity: Storage system states with charge/discharge cycles
- Renewable Generation: Solar, wind, and hydro generation with forecast errors
- Conventional Generation: Gas, coal, and nuclear plant outputs
- Load Profiles: Regional electricity demand with weather correlations
- Data Centers: Power demand profiles, including a mining client
- Mining: Hashrate, price (EUR), power efficiency, demand, revenue, and profit per MWh
- Transmission Capacity: Region-to-region interconnector limits and efficiency
- Transmission Cost: Transmission costs including losses, congestion charges, and fees
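As a rough illustration of how the transmission cost components listed above (losses, congestion charges, and fees) could combine into a single per-MWh figure, consider the sketch below. The function name, its parameters, and the formula itself are assumptions made for this example; the project's generator may compute costs differently.

```python
def transmission_cost_per_mwh(
    energy_price_eur: float,
    efficiency: float,
    congestion_charge_eur: float = 0.0,
    fee_eur: float = 0.0,
) -> float:
    """Hypothetical cost (EUR) of delivering 1 MWh over an interconnector.

    Assumption: losses are valued at the source-region energy price, so
    delivering 1 MWh requires sending 1 / efficiency MWh.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    # Cost of the extra energy that is lost in transit.
    loss_cost = energy_price_eur * (1.0 / efficiency - 1.0)
    return loss_cost + congestion_charge_eur + fee_eur


# Illustrative inputs only: 50 EUR/MWh source price, 98% efficient link,
# 2 EUR/MWh congestion charge, 0.5 EUR/MWh fee.
cost = transmission_cost_per_mwh(50.0, 0.98, congestion_charge_eur=2.0, fee_eur=0.5)
print(f"{cost:.2f} EUR/MWh")
```

Valuing losses at the source price is only one possible convention; a destination-price or average-price convention would change the loss term but not the overall structure.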
Project Structure
energy-test-data/
├── data/
│ ├── processed/ # Final Parquet files (<200MB total)
│ ├── raw/ # Unprocessed source data
│ └── metadata/ # Data documentation and reports
├── scripts/
│ ├── 01_generate_synthetic.py # Generate synthetic data
│ ├── 02_fetch_historical.py # Fetch historical data
│ ├── 03_process_merge.py # Process and compress
│ └── 04_validate.py # Validate and report
├── config/
│ ├── data_config.yaml # Configuration parameters
│ └── schema.yaml # Data schema definitions
├── requirements.txt
└── README.md
Installation
pip install -r requirements.txt
Usage
Generate all test data
Run scripts in sequence:
python scripts/01_generate_synthetic.py
python scripts/02_fetch_historical.py
python scripts/03_process_merge.py
python scripts/04_validate.py
Or run all at once:
python scripts/01_generate_synthetic.py && \
python scripts/02_fetch_historical.py && \
python scripts/03_process_merge.py && \
python scripts/04_validate.py
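On platforms where the `&&` shell chain is inconvenient, the same stop-on-first-failure behaviour can be sketched in Python. This is a minimal sketch, not part of the project: `run_pipeline` and `main` are hypothetical helpers, and the script paths are taken from the project structure above.

```python
import subprocess
import sys

# Script paths as listed in the project structure.
SCRIPTS = [
    "scripts/01_generate_synthetic.py",
    "scripts/02_fetch_historical.py",
    "scripts/03_process_merge.py",
    "scripts/04_validate.py",
]


def run_pipeline(commands):
    """Run each command in order; return the first non-zero exit code, else 0."""
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            # Stop here, like `&&` does in the shell.
            return result.returncode
    return 0


def main() -> int:
    # Use the current interpreter so the scripts run in the same environment.
    return run_pipeline([[sys.executable, script] for script in SCRIPTS])
```

Calling `sys.exit(main())` from the repository root would then run the four scripts in order and propagate the exit code of the first one that fails.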
Individual scripts
01_generate_synthetic.py: Creates synthetic data for battery systems, renewable generation, conventional generation, data centers, and transmission capacity/cost.
02_fetch_historical.py: Fetches electricity prices, mining data (with EUR pricing and power metrics), and load profiles from public APIs (or generates realistic synthetic data when APIs are unavailable).
03_process_merge.py: Merges datasets, optimizes memory usage, and saves to compressed Parquet format.
04_validate.py: Validates data quality, checks for missing values and outliers, and generates validation reports.
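The kind of checks the validation step describes (missing values and outliers) can be sketched with pandas as follows. The function name `quality_report` and the z-score rule with a threshold of 3 are assumptions chosen for illustration; the actual script may use different criteria.

```python
import pandas as pd


def quality_report(df: pd.DataFrame, z_threshold: float = 3.0) -> dict:
    """Count missing values and z-score outliers for each numeric column."""
    report = {}
    for col in df.select_dtypes(include="number").columns:
        series = df[col]
        std = series.std()
        if pd.isna(std) or std == 0:
            # Constant or near-empty column: no meaningful z-score.
            outliers = 0
        else:
            z = (series - series.mean()) / std
            outliers = int((z.abs() > z_threshold).sum())
        report[col] = {"missing": int(series.isna().sum()), "outliers": outliers}
    return report
```

A z-score rule is sensitive to the outliers it is trying to detect, so for heavy-tailed series such as electricity prices a quantile-based rule may be a better fit.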
Configuration
Edit config/data_config.yaml to customize:
- Time range: Start/end dates and granularity
- Regions: Market regions to include
- Data sources: Synthetic vs historical for each dataset
- Synthetic data parameters: Noise levels, outlier rates, missing value rates
- Battery parameters: Capacity ranges, efficiency, degradation
- Power plant parameters: Plant capacities, marginal costs
- Mining parameters: Hashrate ranges, power efficiency
- Transmission parameters: Capacity ranges, efficiency, congestion surcharges, fees
Data Specifications
| Dataset | Rows | Actual Size |
|---|---|---|
| electricity_prices | 72,005 | ~2.0 MB |
| battery_capacity | 144,010 | ~4.0 MB |
| renewable_generation | 216,015 | ~5.4 MB |
| conventional_generation | 144,010 | ~3.0 MB |
| load_profiles | 72,005 | ~1.7 MB |
| data_centers | 72,005 | ~1.0 MB |
| mining | 14,401 | ~0.5 MB |
| transmission_capacity | 20 | ~0.01 MB |
| transmission_cost | 20 | ~0.01 MB |
| Total | 734,491 | ~17.9 MB |
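The row counts in the table can double as a quick regression check after regenerating the data. The sketch below is an illustration only: `row_count_mismatches` is a hypothetical helper, and the actual counts would have to be collected separately, for example with `len(pd.read_parquet(...))` per file.

```python
# Row counts copied from the table above.
EXPECTED_ROWS = {
    "electricity_prices": 72_005,
    "battery_capacity": 144_010,
    "renewable_generation": 216_015,
    "conventional_generation": 144_010,
    "load_profiles": 72_005,
    "data_centers": 72_005,
    "mining": 14_401,
    "transmission_capacity": 20,
    "transmission_cost": 20,
}


def row_count_mismatches(actual_rows):
    """Return {dataset: (expected, actual)} for every count that differs."""
    return {
        name: (expected, actual_rows.get(name))
        for name, expected in EXPECTED_ROWS.items()
        if actual_rows.get(name) != expected
    }


print(sum(EXPECTED_ROWS.values()))  # 734491, matching the Total row
```

An empty result from `row_count_mismatches` means every dataset has the documented number of rows; a missing dataset shows up with `None` as its actual count.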
Output Format
All processed datasets are saved as Parquet files with Snappy compression in data/processed/.
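One common way to reduce memory before writing Parquet, in the spirit of the memory optimization that 03_process_merge.py is described as doing, is to downcast numeric columns. The sketch below is an assumption about how this could look, not the script's actual code; `downcast_numeric` and `save_snappy_parquet` are hypothetical names, and the example columns are made up.

```python
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric columns downcast to smaller dtypes."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            # float64 -> float32 trades some precision for half the memory.
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


def save_snappy_parquet(df: pd.DataFrame, path: str) -> None:
    """Write df as Snappy-compressed Parquet (needs pyarrow or fastparquet)."""
    df.to_parquet(path, compression="snappy", index=False)


# Tiny illustrative frame; the column names are invented for this example.
example = pd.DataFrame(
    {"region": ["FR", "BE", "DE"], "units": [1, 2, 3], "price_eur": [0.5, 1.5, 2.5]}
)
compact = downcast_numeric(example)
print(compact.dtypes)
```

Whether float32 precision is acceptable depends on the column: it is usually fine for prices and power values, but timestamps and cumulative counters are better left at their original width.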
To read a dataset:
import pandas as pd
df = pd.read_parquet('data/processed/electricity_prices.parquet')
print(df.head())
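To check the processed output against the "<200MB total" note in the project structure, the on-disk sizes can be summed with the standard library alone. This is a minimal sketch; `processed_sizes_mb` is a hypothetical helper, and treating 1 MB as 10**6 bytes is an assumption.

```python
from pathlib import Path

BUDGET_MB = 200  # from the "<200MB total" note in the project structure


def processed_sizes_mb(directory="data/processed"):
    """Map each Parquet file name to its on-disk size in MB (10**6 bytes)."""
    return {
        path.name: path.stat().st_size / 1_000_000
        for path in sorted(Path(directory).glob("*.parquet"))
    }


def within_budget(sizes_mb, budget_mb=BUDGET_MB):
    """True when the combined size stays under the budget."""
    return sum(sizes_mb.values()) < budget_mb
```

For example, `within_budget(processed_sizes_mb())` run from the repository root reports whether the current output still fits the stated limit.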
Data Sources
- Electricity Prices: Hybrid (synthetic patterns based on EPEX Spot market characteristics)
- Mining: Hybrid (mempool.space API + synthetic patterns)
- Load Profiles: Hybrid (ENTSO-E transparency platform patterns + synthetic)
Validation Reports
After processing, validation reports are generated in data/metadata/:
- validation_report.json: Data quality checks, missing values, range violations
- final_metadata.json: Dataset sizes, row counts, processing details
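Both reports are plain JSON, so they can be inspected with the standard library. The sketch below only loads the files and lists their top-level keys, since the internal structure of the reports is not documented here; `load_reports` is a hypothetical helper.

```python
import json
from pathlib import Path

REPORT_NAMES = ("validation_report.json", "final_metadata.json")


def load_reports(metadata_dir="data/metadata"):
    """Load the JSON reports that exist; skip any that are missing."""
    reports = {}
    for name in REPORT_NAMES:
        path = Path(metadata_dir) / name
        if path.is_file():
            reports[name] = json.loads(path.read_text(encoding="utf-8"))
    return reports


for name, content in load_reports().items():
    # Show only the top-level layout of each report.
    keys = sorted(content) if isinstance(content, dict) else type(content).__name__
    print(name, keys)
```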