# Energy Test Data

Prepares test data for an energy trading strategy demo.

## Overview

This project generates and processes realistic test data for energy trading strategies, including:

- **Electricity Prices**: Day-ahead and real-time market prices for European regions (FR, BE, DE, NL, UK)
- **Battery Capacity**: Storage system states with charge/discharge cycles
- **Renewable Generation**: Solar, wind, and hydro generation with forecast errors
- **Conventional Generation**: Gas, coal, and nuclear plant outputs
- **Load Profiles**: Regional electricity demand with weather correlations
- **Data Centers**: Power demand profiles, including a mining client
- **Mining**: Hashrate, price (EUR), power efficiency, demand, revenue, and profit per MWh
- **Transmission Capacity**: Region-to-region interconnector limits and efficiency
- **Transmission Cost**: Transmission costs including losses, congestion charges, and fees

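
This README does not say how the mining dataset derives revenue and profit per MWh. One plausible reading, sketched here purely as an assumption, is that revenue per MWh follows from a hashprice (EUR per TH/s per day) and a power efficiency (J/TH), and that profit is revenue minus the electricity price. The function names, parameter names, and units below are hypothetical and are not taken from the project's scripts.

```python
JOULES_PER_MWH = 3.6e9      # 1 MWh = 3.6e9 J
SECONDS_PER_DAY = 86_400


def revenue_per_mwh(hashprice_eur_per_ths_day, efficiency_j_per_th):
    """Mining revenue (EUR) earned by spending 1 MWh of electricity.

    Assumed units (not confirmed by the project):
      hashprice  - EUR earned per day by 1 TH/s of hashrate
      efficiency - joules consumed per terahash computed
    """
    terahashes_per_mwh = JOULES_PER_MWH / efficiency_j_per_th
    eur_per_terahash = hashprice_eur_per_ths_day / SECONDS_PER_DAY
    return terahashes_per_mwh * eur_per_terahash


def profit_per_mwh(hashprice_eur_per_ths_day, efficiency_j_per_th,
                   electricity_price_eur_per_mwh):
    """Revenue per MWh minus the electricity price paid for that MWh."""
    revenue = revenue_per_mwh(hashprice_eur_per_ths_day, efficiency_j_per_th)
    return revenue - electricity_price_eur_per_mwh
```

Under this reading, mining is worth running whenever `profit_per_mwh` is positive, which is the kind of signal a trading strategy could compare against the price columns.
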
## Project Structure

```
energy-test-data/
├── data/
│   ├── processed/          # Final Parquet files (<200MB total)
│   ├── raw/                # Unprocessed source data
│   └── metadata/           # Data documentation and reports
├── scripts/
│   ├── 01_generate_synthetic.py   # Generate synthetic data
│   ├── 02_fetch_historical.py     # Fetch historical data
│   ├── 03_process_merge.py        # Process and compress
│   └── 04_validate.py             # Validate and report
├── config/
│   ├── data_config.yaml    # Configuration parameters
│   └── schema.yaml         # Data schema definitions
├── requirements.txt
└── README.md
```

## Installation

```bash
pip install -r requirements.txt
```

## Usage

### Generate all test data

Run the scripts in sequence:

```bash
python scripts/01_generate_synthetic.py
python scripts/02_fetch_historical.py
python scripts/03_process_merge.py
python scripts/04_validate.py
```

Or chain them so that each script runs only if the previous one succeeded:

```bash
python scripts/01_generate_synthetic.py && \
python scripts/02_fetch_historical.py && \
python scripts/03_process_merge.py && \
python scripts/04_validate.py
```

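
The same stop-on-first-failure behaviour can also be sketched in Python, which may be convenient on shells that do not support `&&`. The helper `run_pipeline` is a hypothetical name and is not part of this repository; the script paths are the ones listed above and are assumed to be relative to the repository root.

```python
import subprocess
import sys

# Script order as listed in this README.
SCRIPTS = [
    "scripts/01_generate_synthetic.py",
    "scripts/02_fetch_historical.py",
    "scripts/03_process_merge.py",
    "scripts/04_validate.py",
]


def run_pipeline(scripts=SCRIPTS, run=subprocess.run):
    """Run each script in order and stop at the first non-zero exit code.

    Returns the list of scripts that completed successfully.
    `run` is injectable so the function can be exercised without
    actually launching the scripts.
    """
    completed = []
    for script in scripts:
        result = run([sys.executable, script])
        if result.returncode != 0:
            print(f"{script} failed with exit code {result.returncode}")
            break
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the repository root runs all four steps with the current interpreter; a returned list shorter than `SCRIPTS` means a step failed and the later ones were skipped.
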
### Individual scripts

**01_generate_synthetic.py**: Creates synthetic data for battery systems, renewable generation, conventional generation, data centers, and transmission capacity/cost.

**02_fetch_historical.py**: Fetches electricity prices, mining data (with EUR pricing and power metrics), and load profiles from public APIs (or generates realistic synthetic data when APIs are unavailable).

**03_process_merge.py**: Merges datasets, optimizes memory usage, and saves to compressed Parquet format.

**04_validate.py**: Validates data quality, checks for missing values and outliers, and generates validation reports.

## Configuration

Edit `config/data_config.yaml` to customize:

- **Time range**: Start/end dates and granularity
- **Regions**: Market regions to include
- **Data sources**: Synthetic vs historical for each dataset
- **Generation parameters (synthetic data)**: Noise levels, outlier rates, missing value rates
- **Battery parameters**: Capacity ranges, efficiency, degradation
- **Generation parameters (power plants)**: Plant capacities, marginal costs
- **Mining parameters**: Hashrate ranges, power efficiency
- **Transmission parameters**: Capacity ranges, efficiency, congestion surcharges, fees

## Data Specifications

| Dataset | Rows | Actual Size |
|---------|------|-------------|
| electricity_prices | 72,005 | ~2.0 MB |
| battery_capacity | 144,010 | ~4.0 MB |
| renewable_generation | 216,015 | ~5.4 MB |
| conventional_generation | 144,010 | ~3.0 MB |
| load_profiles | 72,005 | ~1.7 MB |
| data_centers | 72,005 | ~1.0 MB |
| mining | 14,401 | ~0.5 MB |
| transmission_capacity | 20 | ~0.01 MB |
| transmission_cost | 20 | ~0.01 MB |
| **Total** | **734,491** | **~17.9 MB** |

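
The row counts in the table can be checked with a few lines of arithmetic. The sketch below sums them and also tests whether each count is a whole multiple of the `mining` row count. Reading that multiple as "series per timestamp" (for example 5 regions, or 5 regions times several technologies) is an inference from the numbers, not something this README states.

```python
# Row counts copied from the Data Specifications table.
ROWS = {
    "electricity_prices": 72_005,
    "battery_capacity": 144_010,
    "renewable_generation": 216_015,
    "conventional_generation": 144_010,
    "load_profiles": 72_005,
    "data_centers": 72_005,
    "mining": 14_401,
    "transmission_capacity": 20,
    "transmission_cost": 20,
}

# Assumption: the mining dataset has exactly one row per timestamp.
TIMESTAMPS = ROWS["mining"]


def series_per_timestamp(rows, timestamps=TIMESTAMPS):
    """Return rows / timestamps when it is a positive whole number, else None."""
    quotient, remainder = divmod(rows, timestamps)
    return quotient if remainder == 0 and quotient > 0 else None


total = sum(ROWS.values())
multiples = {name: series_per_timestamp(n) for name, n in ROWS.items()}

print(f"total rows: {total:,}")
for name, k in multiples.items():
    print(f"{name}: {k if k is not None else 'not a time series multiple'}")
```

The two transmission datasets do not fit the pattern, which would be consistent with them being static region-to-region lookup tables rather than time series.
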
## Output Format

All processed datasets are saved as Parquet files with Snappy compression in `data/processed/`.

To read a dataset:

```python
import pandas as pd

df = pd.read_parquet('data/processed/electricity_prices.parquet')
print(df.head())
```

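
To confirm that the processed output stays within the "<200MB total" budget noted in the project structure, the file sizes can be summed with the standard library alone. This is a minimal sketch: the helper names are hypothetical, and treating 1 MB as 10**6 bytes is an assumption, since the README does not say which convention its sizes use.

```python
from pathlib import Path

SIZE_BUDGET_MB = 200  # from the "<200MB total" note in the project structure


def processed_size_mb(directory="data/processed"):
    """Total size of all .parquet files in `directory`, in MB (10**6 bytes)."""
    files = sorted(Path(directory).glob("*.parquet"))
    return sum(f.stat().st_size for f in files) / 1_000_000


def within_budget(directory="data/processed", budget_mb=SIZE_BUDGET_MB):
    """True when the Parquet files in `directory` total less than the budget."""
    return processed_size_mb(directory) < budget_mb
```

Only files ending in `.parquet` directly inside the directory are counted; subdirectories and other file types are ignored.
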
## Data Sources

- **Electricity Prices**: Hybrid (synthetic patterns based on EPEX Spot market characteristics)
- **Mining**: Hybrid (mempool.space API + synthetic patterns)
- **Load Profiles**: Hybrid (ENTSO-E transparency platform patterns + synthetic)

## Validation Reports

After processing, validation reports are generated in `data/metadata/`:

- `validation_report.json`: Data quality checks, missing values, range violations
- `final_metadata.json`: Dataset sizes, row counts, processing details
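
The structure of these JSON files is not documented here, so a first look is easiest with a schema-agnostic helper. The sketch below assumes only that the top level of the report is a JSON object; `load_report` and `summarize` are hypothetical names, not functions provided by this project.

```python
import json
from pathlib import Path


def load_report(path="data/metadata/validation_report.json"):
    """Load a report file and return the parsed JSON."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def summarize(report):
    """Map each top-level key to the type name of its value.

    Makes no assumption about the report's schema beyond the
    top level being a JSON object (a Python dict).
    """
    return {key: type(value).__name__ for key, value in report.items()}
```

Printing `summarize(load_report())` shows which sections a report contains before any specific field is relied on.
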