# Energy Test Data
Prepares test data for an energy trading strategy demo.
## Overview
This project generates and processes realistic test data for energy trading strategies, including:
- **Electricity Prices**: Day-ahead and real-time market prices for European regions (FR, BE, DE, NL, UK)
- **Battery Capacity**: Storage system states with charge/discharge cycles
- **Renewable Generation**: Solar, wind, and hydro generation with forecast errors
- **Conventional Generation**: Gas, coal, and nuclear plant outputs
- **Load Profiles**: Regional electricity demand with weather correlations
- **Data Centers**: Power demand profiles, including a Bitcoin mining client
- **Mining**: Hashrate, price, and profitability data (from mempool.space)
## Project Structure
```
energy-test-data/
├── data/
│   ├── processed/                # Final Parquet files (<200MB total)
│   ├── raw/                      # Unprocessed source data
│   └── metadata/                 # Data documentation and reports
├── scripts/
│   ├── 01_generate_synthetic.py  # Generate synthetic data
│   ├── 02_fetch_historical.py    # Fetch historical data
│   ├── 03_process_merge.py       # Process and compress
│   └── 04_validate.py            # Validate and report
├── config/
│   ├── data_config.yaml          # Configuration parameters
│   └── schema.yaml               # Data schema definitions
├── requirements.txt
└── README.md
```
## Installation
```bash
pip install -r requirements.txt
```
## Usage
### Generate all test data
Run the scripts in sequence:
```bash
python scripts/01_generate_synthetic.py
python scripts/02_fetch_historical.py
python scripts/03_process_merge.py
python scripts/04_validate.py
```
Or chain them in a single command, so that the run stops at the first script that fails:
```bash
python scripts/01_generate_synthetic.py && \
python scripts/02_fetch_historical.py && \
python scripts/03_process_merge.py && \
python scripts/04_validate.py
```
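
The same sequence can also be driven from Python, which may be convenient on shells that do not support `&&`. This is a minimal sketch, assuming it is called from the repository root; the `run_pipeline` helper is hypothetical and not part of this repository.

```python
import subprocess
import sys

# Script paths as listed in the Usage section, in execution order.
SCRIPTS = [
    "scripts/01_generate_synthetic.py",
    "scripts/02_fetch_historical.py",
    "scripts/03_process_merge.py",
    "scripts/04_validate.py",
]


def run_pipeline(scripts=SCRIPTS, runner=subprocess.run):
    """Run each script with the current interpreter; stop at the first failure.

    Hypothetical helper. Returns the list of scripts that exited with code 0.
    """
    completed = []
    for script in scripts:
        result = runner([sys.executable, script])
        if result.returncode != 0:
            print(f"{script} exited with code {result.returncode}; stopping.")
            break
        completed.append(script)
    return completed
```

Calling `run_pipeline()` returns the scripts that finished successfully, so a result shorter than `SCRIPTS` indicates where the run stopped.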
### Individual scripts
- **01_generate_synthetic.py**: Creates synthetic data for battery systems, renewable generation, conventional generation, and data centers.
- **02_fetch_historical.py**: Fetches electricity prices, mining data, and load profiles from public APIs (or generates realistic synthetic data when APIs are unavailable).
- **03_process_merge.py**: Merges datasets, optimizes memory usage, and saves to compressed Parquet format.
- **04_validate.py**: Validates data quality, checks for missing values and outliers, and generates validation reports.
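
As an illustration of the kind of check described for `04_validate.py`, the sketch below counts missing values and flags outliers in a single numeric column. The function name and the 1.5 × IQR rule are assumptions made for this example; this README does not state which outlier criterion the script actually applies.

```python
import statistics


def summarize_column(values, iqr_factor=1.5):
    """Count missing entries and IQR-based outliers in a list of numbers.

    Hypothetical helper: `None` marks a missing value, and a point counts as
    an outlier if it lies more than `iqr_factor` * IQR outside the quartiles.
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # statistics.quantiles needs at least two data points.
    if len(present) < 2:
        return {"rows": len(values), "missing": missing, "outliers": 0}

    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
    outliers = sum(1 for v in present if v < low or v > high)
    return {"rows": len(values), "missing": missing, "outliers": outliers}


# Made-up prices with one gap and one spike, for illustration only.
prices = [42.0, 41.5, None, 43.1, 40.9, 42.7, 950.0, 41.8]
print(summarize_column(prices))
```

The IQR rule is used here because a single extreme spike barely moves the quartiles, whereas it would inflate a mean and standard deviation.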
## Configuration
Edit `config/data_config.yaml` to customize:
- **Time range**: Start/end dates and granularity
- **Regions**: Market regions to include
- **Data sources**: Synthetic vs historical for each dataset
- **Data generation parameters**: Noise levels, outlier rates, missing value rates
- **Battery parameters**: Capacity ranges, efficiency, degradation
- **Power generation parameters**: Plant capacities, marginal costs
- **Mining parameters**: Hashrate ranges, mining efficiency
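
To make the noise, outlier, and missing value settings concrete, here is a rough sketch of how such rates could be applied to a clean series. The function and parameter names are hypothetical and do not reflect the actual keys in `config/data_config.yaml` or the logic in the generation scripts.

```python
import random


def degrade_series(values, noise_std=0.5, outlier_rate=0.01,
                   missing_rate=0.02, outlier_scale=10.0, seed=0):
    """Return a copy of `values` with noise, outliers, and gaps injected.

    All parameters are illustrative assumptions:
    - noise_std: standard deviation of Gaussian noise added to every point
    - outlier_rate: probability that a point is multiplied by outlier_scale
    - missing_rate: probability that a point is replaced by None
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    out = []
    for v in values:
        if rng.random() < missing_rate:
            out.append(None)
            continue
        v = v + rng.gauss(0.0, noise_std)
        if rng.random() < outlier_rate:
            v *= outlier_scale
        out.append(v)
    return out


clean = [50.0] * 1000
noisy = degrade_series(clean)
print(sum(v is None for v in noisy), "missing values out of", len(noisy))
```

Setting all three rates to zero returns the input unchanged, which gives a simple baseline when testing downstream processing.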
## Data Specifications
| Dataset | Time Range | Rows (10 days × 1-min resolution) | Est. Size |
|---------|-----------|-------------------|-----------|
| electricity_prices | 10 days | 72,000 | ~40MB |
| battery_capacity | 10 days | 144,000 | ~20MB |
| renewable_generation | 10 days | 216,000 | ~35MB |
| conventional_generation | 10 days | 144,000 | ~25MB |
| load_profiles | 10 days | 72,000 | ~30MB |
| data_centers | 10 days | 72,000 | ~15MB |
| mining | 10 days | 14,400 | ~20MB |
| **Total** | | | **~185MB** |
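
The row counts follow from simple arithmetic: 10 days at 1-minute resolution gives 14,400 timestamps, and each dataset's count is a whole multiple of that. The sketch below reproduces the table. The per-dataset series counts are inferred by dividing each row count by 14,400 (for example, 72,000 / 14,400 = 5, which matches the five regions listed in the Overview); they are not documented settings.

```python
MINUTES_PER_DAY = 24 * 60
DAYS = 10
timestamps = DAYS * MINUTES_PER_DAY  # 14,400 rows per 1-minute series

# Series per dataset, inferred from the table (row count / 14,400).
series_per_dataset = {
    "electricity_prices": 5,
    "battery_capacity": 10,
    "renewable_generation": 15,
    "conventional_generation": 10,
    "load_profiles": 5,
    "data_centers": 5,
    "mining": 1,
}

rows = {name: n * timestamps for name, n in series_per_dataset.items()}
for name, count in rows.items():
    print(f"{name:<24} {count:>8,}")
print(f"{'total':<24} {sum(rows.values()):>8,}")
```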
## Output Format
All processed datasets are saved as Parquet files with Snappy compression in `data/processed/`.
To read a dataset:
```python
import pandas as pd
df = pd.read_parquet('data/processed/electricity_prices.parquet')
print(df.head())
```
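
To compare the generated files against the 200MB budget noted in the project structure, a small standard-library check along these lines may help. The helper name is hypothetical, and treating 1 MB as 1,000,000 bytes is an assumption.

```python
from pathlib import Path


def parquet_sizes_mb(directory="data/processed"):
    """Return {file name: size in MB} for every .parquet file in `directory`.

    Hypothetical helper. Uses 1 MB = 1,000,000 bytes (an assumption) and
    returns an empty dict if the directory does not exist yet.
    """
    root = Path(directory)
    if not root.is_dir():
        return {}
    return {
        path.name: path.stat().st_size / 1_000_000
        for path in sorted(root.glob("*.parquet"))
    }


sizes = parquet_sizes_mb()
for name, size in sizes.items():
    print(f"{name:<36} {size:8.1f} MB")
print(f"{'total':<36} {sum(sizes.values()):8.1f} MB (budget: 200 MB)")
```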
## Data Sources
- **Electricity Prices**: Hybrid (synthetic patterns based on EPEX Spot market characteristics)
- **Mining**: Hybrid (mempool.space API + synthetic patterns)
- **Load Profiles**: Hybrid (ENTSO-E transparency platform patterns + synthetic)
## Validation Reports
After processing, validation reports are generated in `data/metadata/`:
- `validation_report.json`: Data quality checks, missing values, range violations
- `final_metadata.json`: Dataset sizes, row counts, processing details
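
Both reports are JSON, so they can be inspected with the standard library. The sketch below only lists top-level keys, because the exact report schema is not documented in this README; the `load_report` helper is hypothetical.

```python
import json
from pathlib import Path


def load_report(path):
    """Load a JSON report, returning None if the file does not exist yet."""
    report_path = Path(path)
    if not report_path.is_file():
        return None
    with report_path.open(encoding="utf-8") as handle:
        return json.load(handle)


for name in ("validation_report.json", "final_metadata.json"):
    report = load_report(Path("data/metadata") / name)
    if report is None:
        print(f"{name}: not found (run the pipeline first)")
    elif isinstance(report, dict):
        print(f"{name}: top-level keys = {sorted(report)}")
    else:
        print(f"{name}: top-level type = {type(report).__name__}")
```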