
Energy Test Data

Prepares test data for an energy trading strategy demo.

Overview

This project generates and processes realistic test data for energy trading strategies, including:

  • Electricity Prices: Day-ahead and real-time market prices for European regions (FR, BE, DE, NL, UK)
  • Battery Capacity: Storage system states with charge/discharge cycles
  • Renewable Generation: Solar, wind, and hydro generation with forecast errors
  • Conventional Generation: Gas, coal, and nuclear plant outputs
  • Load Profiles: Regional electricity demand with weather correlations
  • Data Centers: Power demand profiles, including a mining client
  • Mining: Hashrate, price (EUR), power efficiency, demand, revenue, and profit per MWh
  • Transmission Capacity: Region-to-region interconnector limits and efficiency
  • Transmission Cost: Transmission costs including losses, congestion charges, and fees

Project Structure

energy-test-data/
├── data/
│   ├── processed/              # Final Parquet files (<200MB total)
│   ├── raw/                    # Unprocessed source data
│   └── metadata/               # Data documentation and reports
├── scripts/
│   ├── 01_generate_synthetic.py    # Generate synthetic data
│   ├── 02_fetch_historical.py      # Fetch historical data
│   ├── 03_process_merge.py         # Process and compress
│   └── 04_validate.py              # Validate and report
├── config/
│   ├── data_config.yaml            # Configuration parameters
│   └── schema.yaml                 # Data schema definitions
├── requirements.txt
└── README.md

Installation

pip install -r requirements.txt

Usage

Generate all test data

Run scripts in sequence:

python scripts/01_generate_synthetic.py
python scripts/02_fetch_historical.py
python scripts/03_process_merge.py
python scripts/04_validate.py

Or run all at once:

python scripts/01_generate_synthetic.py && \
python scripts/02_fetch_historical.py && \
python scripts/03_process_merge.py && \
python scripts/04_validate.py
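
Where a POSIX shell is not available, the same stop-on-first-failure behaviour as the `&&` chain can be sketched in Python. This is a minimal sketch, not part of the repository: `run_pipeline` is a hypothetical helper, and only the four script paths come from this README.

```python
import subprocess
import sys

# Script paths as listed in this README; run from the repository root.
SCRIPTS = [
    "scripts/01_generate_synthetic.py",
    "scripts/02_fetch_historical.py",
    "scripts/03_process_merge.py",
    "scripts/04_validate.py",
]


def run_pipeline(commands):
    """Run each command in order and stop at the first non-zero exit code.

    Returns the list of commands that completed successfully.
    Hypothetical helper that mirrors the `&&` chain.
    """
    completed = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"Stopped: {cmd} exited with code {result.returncode}")
            break
        completed.append(cmd)
    return completed


# Usage (from the repository root):
#     run_pipeline([[sys.executable, script] for script in SCRIPTS])
```

Using `sys.executable` keeps every step on the same interpreter (and virtual environment) that launched the runner.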

Individual scripts

01_generate_synthetic.py: Creates synthetic data for battery systems, renewable generation, conventional generation, data centers, and transmission capacity/cost.

02_fetch_historical.py: Fetches electricity prices, mining data (with EUR pricing and power metrics), and load profiles from public APIs (or generates realistic synthetic data when APIs are unavailable).
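
The API-or-synthetic fallback described for 02_fetch_historical.py might look roughly like the sketch below. The names `fetch_with_fallback`, `fetch`, and `synthesize` are hypothetical and are not taken from the script, which may detect an unavailable API differently.

```python
def fetch_with_fallback(fetch, synthesize):
    """Return (data, source): API data when available, otherwise synthetic data.

    `fetch` and `synthesize` are caller-supplied zero-argument callables
    (hypothetical; the real script's structure is not shown in this README).
    """
    try:
        data = fetch()
        if data:  # treat an empty response as "unavailable"
            return data, "api"
    except Exception as exc:  # network errors, timeouts, malformed payloads
        print(f"API unavailable ({exc!r}); falling back to synthetic data")
    return synthesize(), "synthetic"
```

Returning the source label alongside the data makes it possible to record which path produced each dataset.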

03_process_merge.py: Merges datasets, optimizes memory usage, and saves to compressed Parquet format.
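
One common way to reduce DataFrame memory before writing Parquet is to downcast numeric columns and convert repetitive string columns to categoricals. The sketch below illustrates that general technique; `optimize_memory` is a hypothetical helper, and the script's actual approach may differ.

```python
import pandas as pd


def optimize_memory(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where pandas allows it."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_float_dtype(out[col]):
            # float64 -> float32 where pandas considers the downcast safe
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif pd.api.types.is_integer_dtype(out[col]):
            # int64 -> smallest integer type that holds all values
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_string_dtype(out[col]) and out[col].nunique() < 0.5 * len(out):
            # Few distinct values (e.g. region codes): store as categorical
            out[col] = out[col].astype("category")
    return out
```

The result can then be written with `DataFrame.to_parquet(path, compression="snappy")`, which requires a Parquet engine such as pyarrow to be installed.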

04_validate.py: Validates data quality, checks for missing values and outliers, and generates validation reports.
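
As an illustration of the kind of check described for 04_validate.py, the sketch below counts missing values and flags outliers with the interquartile-range (IQR) rule. The IQR rule, the function name `quality_report`, and the `price` column are assumptions for illustration; the README does not say which outlier test the script uses.

```python
import pandas as pd


def quality_report(df: pd.DataFrame, column: str, k: float = 1.5) -> dict:
    """Count missing values and IQR outliers in one numeric column."""
    values = df[column]
    q1, q3 = values.quantile(0.25), values.quantile(0.75)  # NaN is ignored
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = values[(values < lower) | (values > upper)]
    return {
        "rows": len(df),
        "missing": int(values.isna().sum()),
        "outliers": int(outliers.count()),
        "bounds": (float(lower), float(upper)),
    }


# Hypothetical sample: one missing value and one obvious spike.
sample = pd.DataFrame({"price": [10, 11, 12, 13, 14, 15, 16, 17, None, 500]})
report = quality_report(sample, "price")
```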

Configuration

Edit config/data_config.yaml to customize:

  • Time range: Start/end dates and granularity
  • Regions: Market regions to include
  • Data sources: Synthetic vs historical for each dataset
  • Data generation parameters: Noise levels, outlier rates, missing value rates
  • Battery parameters: Capacity ranges, efficiency, degradation
  • Power generation parameters: Plant capacities, marginal costs
  • Mining parameters: Hashrate ranges, power efficiency
  • Transmission parameters: Capacity ranges, efficiency, congestion surcharges, fees
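
To show what noise levels, outlier rates, and missing value rates can mean in practice, the sketch below applies all three to a clean series. The function `degrade_series`, its parameter names, and its defaults are illustrative assumptions; they are not the keys used in config/data_config.yaml.

```python
import math
import random


def degrade_series(values, noise_std=0.05, outlier_rate=0.01,
                   missing_rate=0.02, seed=42):
    """Apply multiplicative noise, occasional outliers, and gaps to a series.

    Parameter names and defaults are illustrative only.
    """
    rng = random.Random(seed)  # seeded so the test data is reproducible
    out = []
    for v in values:
        if rng.random() < missing_rate:
            out.append(None)                         # missing value
        elif rng.random() < outlier_rate:
            out.append(v * rng.choice([0.2, 5.0]))   # crude dip or spike
        else:
            out.append(v * (1.0 + rng.gauss(0.0, noise_std)))
    return out


# A clean hourly profile with a daily cycle, standing in for a price curve.
clean = [50.0 + 20.0 * math.sin(2 * math.pi * h / 24) for h in range(24 * 7)]
noisy = degrade_series(clean)
```

Fixing the seed means repeated runs produce identical output, which keeps downstream validation results comparable.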

Data Specifications

Dataset                    Rows       Actual Size
electricity_prices         72,005     ~2.0 MB
battery_capacity           144,010    ~4.0 MB
renewable_generation       216,015    ~5.4 MB
conventional_generation    144,010    ~3.0 MB
load_profiles              72,005     ~1.7 MB
data_centers               72,005     ~1.0 MB
mining                     14,401     ~0.5 MB
transmission_capacity      20         ~0.01 MB
transmission_cost          20         ~0.01 MB
Total                      734,491    ~17.9 MB
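
The Total row can be checked with a few lines of arithmetic. The same check shows that every time-series row count is a whole multiple of the mining row count (14,401). One plausible reading, which is an assumption and not something this README states, is 14,401 timestamps per series, with one series per region or per region and unit.

```python
# Row counts copied from the table above.
ROWS = {
    "electricity_prices": 72_005,
    "battery_capacity": 144_010,
    "renewable_generation": 216_015,
    "conventional_generation": 144_010,
    "load_profiles": 72_005,
    "data_centers": 72_005,
    "mining": 14_401,
    "transmission_capacity": 20,
    "transmission_cost": 20,
}

total = sum(ROWS.values())
print(total)  # 734491

# Express each time-series dataset as a multiple of the mining row count.
BASE = ROWS["mining"]
multiples = {name: n // BASE for name, n in ROWS.items() if n % BASE == 0}
print(multiples)
```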

Output Format

All processed datasets are saved as Parquet files with Snappy compression in data/processed/.

To read a dataset:

import pandas as pd

df = pd.read_parquet('data/processed/electricity_prices.parquet')
print(df.head())

Data Sources

  • Electricity Prices: Hybrid (synthetic patterns based on EPEX Spot market characteristics)
  • Mining: Hybrid (mempool.space API + synthetic patterns)
  • Load Profiles: Hybrid (ENTSO-E transparency platform patterns + synthetic)

Validation Reports

After processing, validation reports are generated in data/metadata/:

  • validation_report.json: Data quality checks, missing values, range violations
  • final_metadata.json: Dataset sizes, row counts, processing details