Data Collection Methodology

Learn how Tracenable collects, standardizes, and validates corporate energy data through a five-step human-in-the-loop methodology, with links to detailed subpages on sources, standardization, and QA.

Introduction

The value of energy data lies not just in its availability, but in its clarity, comparability, and traceability. At Tracenable, we designed a data collection methodology that combines rigorous research, comprehensive sourcing, and advanced human–AI workflows to produce corporate energy metrics that are both granular and broadly applicable.

Our approach is built around four principles: define with authority, collect comprehensively, standardize precisely, and validate rigorously.


Our Five-Step Energy Data Collection Approach

1

Defining the Schema through Research

We start by grounding our work in foundational references such as the UN International Recommendations for Energy Statistics (IRES) & Standard International Energy Classification (SIEC). From there, we study voluntary frameworks like GRI 302, ESRS E1, and SASB to understand disclosure expectations.

This theoretical research is paired with empirical research: analyzing how companies actually report energy data in practice across industries and regions. By combining both, we design a data schema that strikes the right balance: as granular as possible, but general enough to apply across thousands of companies worldwide.
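To illustrate what such a schema might look like in practice, here is a minimal sketch. The field names, category lists, and validation rules below are hypothetical simplifications, not Tracenable's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative category lists: granular enough for typical disclosures,
# general enough to apply across industries and regions.
ENERGY_TYPES = {"electricity", "heat", "steam", "fuel"}
FLOW_TYPES = {"consumed", "produced", "purchased", "sold"}

@dataclass
class EnergyDataPoint:
    company_id: str
    reporting_year: int
    value_mwh: float            # standardized to MWh
    energy_type: str            # e.g. "electricity"
    flow_type: str              # e.g. "consumed"
    renewable: Optional[bool]   # None when the disclosure does not specify
    source_document: str        # traceability back to the original disclosure

    def __post_init__(self):
        # Reject values outside the controlled vocabularies.
        if self.energy_type not in ENERGY_TYPES:
            raise ValueError(f"unknown energy type: {self.energy_type}")
        if self.flow_type not in FLOW_TYPES:
            raise ValueError(f"unknown flow type: {self.flow_type}")
```

A schema like this makes the trade-off explicit: the controlled vocabularies enforce comparability, while the optional renewability flag leaves room for disclosures that omit that detail.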

2

Comprehensive Collection of Disclosures

Corporate energy data can appear in many places: sustainability reports, regulatory filings, standalone data spreadsheets, or pages buried deep within a company's website. Our infrastructure is designed to capture all of it.

Through automated web monitoring and targeted expert retrieval, we ensure that no disclosure is overlooked. This comprehensive approach minimizes blind spots and provides the broadest possible coverage of corporate energy data globally.
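One building block of automated web monitoring is change detection: periodically re-fetching known disclosure pages and comparing a content fingerprint against the last-seen state. The sketch below assumes a `fetch` callable standing in for a real HTTP client; the URLs are illustrative.

```python
import hashlib

def content_fingerprint(body: bytes) -> str:
    """Hash a page body so changes can be detected cheaply."""
    return hashlib.sha256(body).hexdigest()

def detect_updates(pages: dict, fetch) -> list:
    """Return the URLs whose content changed since the last crawl.

    `pages` maps URL -> last-seen fingerprint and is updated in place.
    """
    changed = []
    for url, last_hash in pages.items():
        current = content_fingerprint(fetch(url))
        if current != last_hash:
            changed.append(url)
            pages[url] = current  # remember the new state
    return changed
```

In a real pipeline, a changed page would then be queued for retrieval and extraction rather than merely reported.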

3

Converting Disclosures into Structured Data

Energy data disclosures come in many formats: PDFs, Excel annexes, HTML tables, and narrative text. Our AI-driven pipelines first convert raw files into a unified structure (e.g., PDF to markdown).

From there:

  • Computer vision parses tables and figures.

  • NLP models identify energy-related passages, detect units, and extract values.

  • Classification rules map energy data across four dimensions: source, energy type, flow type, and renewability.

The result: machine-readable, standardized data points that preserve traceability to the original disclosure.
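As a simplified illustration of the unit-detection step, the snippet below finds energy quantities in narrative text and normalizes them to MWh. The regex and conversion table are deliberate simplifications; a production pipeline would rely on trained NLP models rather than a single pattern.

```python
import re

# Conversion factors to MWh (1 MWh = 3.6 GJ).
TO_MWH = {"kwh": 0.001, "mwh": 1.0, "gwh": 1000.0, "gj": 1 / 3.6, "tj": 1000 / 3.6}

PATTERN = re.compile(r"([\d,]+(?:\.\d+)?)\s*(kWh|MWh|GWh|GJ|TJ)", re.IGNORECASE)

def extract_energy_values(text: str):
    """Yield (value_in_mwh, original_match) pairs found in narrative text."""
    for match in PATTERN.finditer(text):
        number = float(match.group(1).replace(",", ""))
        factor = TO_MWH[match.group(2).lower()]
        yield number * factor, match.group(0)
```

Keeping the original matched string alongside the normalized value is what preserves traceability back to the disclosure.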

4

Human-in-the-Loop Data Validation

AI brings speed and scalability, but human expertise ensures accuracy and context. Each extracted data point is flagged with quality indicators, guiding our analysts in review. Two independent reviewers typically validate energy data, with arbitration applied where discrepancies remain.

This process allows us to:

  • Correct errors where AI misclassifies complex energy dimensions.

  • Preserve context from narrative disclosures.

  • Continuously improve our models through feedback.

The outcome is audit-grade energy data that users can trust.
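The dual-review-plus-arbitration logic described above can be sketched as follows; the function and field names are hypothetical, and the tolerance parameter is an assumption about how close two reviews must be to count as agreement.

```python
def resolve(review_a: float, review_b: float, tolerance: float = 0.0) -> dict:
    """Accept a value when two independent reviews agree;
    otherwise flag the record for arbitration by a third reviewer."""
    if abs(review_a - review_b) <= tolerance:
        return {"status": "validated", "value": review_a}
    return {"status": "arbitration", "candidates": (review_a, review_b)}
```

Records that reach the "arbitration" state are exactly the discrepancies that feed back into model improvement.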

5

Rigorous Quality Assurance

Finally, our Energy Dataset undergoes multi-layered quality checks:

  • Automated tests catch obvious anomalies (negative values, implausible spikes, inconsistent units).

  • Machine learning models use unsupervised methods to detect statistical outliers and unusual time-series patterns.

  • Manual audits ensure nothing slips through the cracks.

This combination of automation and human oversight guarantees that every energy metric delivered is reliable, comparable, and ready for use in compliance, benchmarking, and research.
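The automated layer of these checks can be sketched as simple rules over a company's time series; the spike threshold below is an illustrative assumption, not Tracenable's actual tolerance.

```python
def qa_flags(series: dict, max_yoy_ratio: float = 5.0) -> list:
    """Return QA flags for a {year: value_mwh} time series.

    Flags negative values and implausible year-over-year spikes.
    """
    flags = []
    years = sorted(series)
    for year in years:
        if series[year] < 0:
            flags.append((year, "negative_value"))
    for prev, curr in zip(years, years[1:]):
        if series[prev] > 0 and series[curr] / series[prev] > max_yoy_ratio:
            flags.append((curr, "implausible_spike"))
    return flags
```

Anything flagged here would then pass to the unsupervised outlier models and, ultimately, to manual audit.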


Learn More

To explore the methodology in detail, visit:

  • Data Sources - Where energy data comes from and how it is collected.

  • Standardization Guidelines - How disclosures are normalized into consistent energy source, energy type, flow type, and renewability categories.

  • Calculation Logic - How missing values are inferred and totals are computed using transparent accounting rules.

  • Quality Assurance - The validations and controls that safeguard data integrity.