As part of tasks T2.2 and T2.4 of the transpAIrent.energy project, we developed an open-source data pipeline for collecting, preprocessing, and quality-checking energy system data. The pipeline continuously ingests data from public APIs, processes it into a standardized bi-temporal database, and applies automated quality validation. All components (source code, pipeline configurations, and data quality results) are published under a permissive license.
The data pipeline is split into three independently deployable components (Historian, Librarian, and Auditor) that together cover the full path from raw API responses to quality-assured, analysis-ready time series. Data is fetched on a fixed schedule, stored as immutable event logs, validated against physical and contextual bounds, and finally consolidated into a bi-temporal history, with Dagster orchestrating the pipeline. A web-based dashboard exposes quality metrics and data lineage to support transparency and reproducibility.
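To illustrate what "bi-temporal" means here, the following minimal sketch keeps two timestamps per record: the instant a value describes and the instant the pipeline learned about it. The `Fact` schema, the `as_of` helper, the series name, and the numbers are all hypothetical and chosen for illustration only; they are not taken from the project's database.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Fact:
    """One immutable observation (hypothetical schema, not the project's)."""
    series: str            # illustrative series name
    valid_time: datetime   # the instant the value describes
    recorded_at: datetime  # when the pipeline learned about the value
    value: float


def as_of(facts, series, valid_time, known_at):
    """Return the value for `valid_time` as it was known at `known_at`."""
    candidates = [
        f for f in facts
        if f.series == series
        and f.valid_time == valid_time
        and f.recorded_at <= known_at
    ]
    if not candidates:
        return None
    # The most recently recorded version that was already known wins.
    return max(candidates, key=lambda f: f.recorded_at).value


def utc(*args):
    """Build a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


# Made-up example: one value is published, then revised two days later.
log = [
    Fact("load_at", utc(2024, 1, 1, 12), utc(2024, 1, 1, 13), 6510.0),
    Fact("load_at", utc(2024, 1, 1, 12), utc(2024, 1, 3, 9), 6482.5),
]

before_revision = as_of(log, "load_at", utc(2024, 1, 1, 12), utc(2024, 1, 2, 0))
after_revision = as_of(log, "load_at", utc(2024, 1, 1, 12), utc(2024, 1, 4, 0))
print(before_revision, after_revision)
```

Because earlier records are never overwritten, an analysis can be reproduced with exactly the data that was available when it was first run.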
Historian
Manages continuous retrieval of input data from energy-related APIs (APG, ENTSO-E, electricitymaps, UBIMET). Scheduled jobs fetch data and store it in a PostgreSQL database, feeding downstream ML/AI tasks and IESopt.jl optimizations.
Check online at: github.com/transpAIrent-energy/historian
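The fetch-and-store pattern described for the Historian can be sketched roughly as follows. This is an assumption-laden illustration, not the Historian's code: it uses an in-memory SQLite database as a stand-in for PostgreSQL, a fake fetch function instead of a real API client, and a hypothetical `raw_events` table. Scheduling itself is left out; each call to `run_fetch_job` stands for one scheduled run.

```python
import json
import sqlite3
from datetime import datetime, timezone


def init_store(conn):
    """Create a hypothetical append-only table for raw API responses."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_events ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " source TEXT NOT NULL,"
        " fetched_at TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )


def run_fetch_job(conn, source, fetch):
    """Fetch once and append the response; existing rows are never updated."""
    payload = fetch()
    fetched_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO raw_events (source, fetched_at, payload) VALUES (?, ?, ?)",
        (source, fetched_at, json.dumps(payload)),
    )
    conn.commit()


def fake_fetch():
    """Stand-in for a real API call; the returned data is made up."""
    return {"series": "load_at", "values": [6510.0, 6482.5]}


conn = sqlite3.connect(":memory:")  # SQLite used here instead of PostgreSQL
init_store(conn)
run_fetch_job(conn, "example-api", fake_fetch)
run_fetch_job(conn, "example-api", fake_fetch)  # a second scheduled run

count = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
print(count)
```

Storing every response as a new row, even when the payload is unchanged, keeps the raw log immutable and leaves deduplication and consolidation to later pipeline stages.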
Auditor
Web-based frontend and backend for visualizing data quality, built on the bi-temporal Librarian database. Provides an interface to inspect and monitor the integrity of collected energy data.
Check online at: github.com/transpAIrent-energy/auditor
Librarian
Manages data quality of records collected by the Historian. Runs quality checks on fetched data and maintains a bi-temporal database pipeline (fact → history) using Dagster for orchestration.
Check online at: github.com/transpAIrent-energy/librarian
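As a rough idea of what a check against physical and contextual bounds might look like, the sketch below flags values that fall outside a fixed range (physical) or that deviate strongly from the median of the plausible values in the same batch (contextual). The function name, flag labels, thresholds, and sample data are hypothetical; the Librarian's actual checks and its Dagster orchestration are not reproduced here.

```python
from statistics import median


def check_bounds(values, lower, upper, max_deviation):
    """Label each value 'ok', 'out_of_range' (physical) or 'deviation' (contextual)."""
    # Only physically plausible values contribute to the contextual reference.
    plausible = [v for v in values if lower <= v <= upper]
    reference = median(plausible) if plausible else None

    flags = []
    for v in values:
        if not (lower <= v <= upper):
            flags.append("out_of_range")
        elif abs(v - reference) > max_deviation:
            flags.append("deviation")
        else:
            flags.append("ok")
    return flags


# Made-up batch: one negative value and one implausible spike.
series = [50.0, 52.0, -5.0, 51.0, 400.0, 53.0]
flags = check_bounds(series, lower=0.0, upper=1000.0, max_deviation=100.0)
print(flags)
```

In an orchestrated pipeline, a check like this would typically run as its own step between the raw facts and the consolidated history, so that flagged values can be inspected instead of being silently dropped.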