
# Data Collection Methodology

## Introduction

The value of EU Taxonomy data lies not just in its availability, but in its clarity, comparability, and traceability. At Tracenable, we designed a data collection methodology that combines rigorous research, comprehensive sourcing, and advanced human–AI workflows to produce EU Taxonomy metrics that are both granular and broadly applicable.

Our approach is built around four principles: define with authority, collect comprehensively, standardize precisely, and validate rigorously.

***

## Our Five-Step Data Collection Approach

{% stepper %}
{% step %}

### Defining the Schema through Research

Our EU Taxonomy data is meticulously standardized to align with the Commission Delegated Regulation (EU) 2023/2486, capturing both activity-level details and aggregate metrics across turnover, CAPEX, and OPEX. This rigorous standardization ensures seamless regulatory compliance and cross-company comparability.
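As a rough illustration of what "activity-level details and aggregate metrics" could look like in practice, the sketch below models one company-year record in Python. Every name here (`Kpi`, `ActivityKpi`, `CompanyYearRecord`, the field names, and the activity code format) is a hypothetical chosen for this example, not the actual Tracenable schema, and summing activity shares into an aggregate is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kpi(Enum):
    """The three KPIs named in the text."""
    TURNOVER = "turnover"
    CAPEX = "capex"
    OPEX = "opex"


@dataclass(frozen=True)
class ActivityKpi:
    """One activity-level data point for a single KPI (hypothetical shape)."""
    activity_code: str               # e.g. "CCM 4.1"; the code format is an assumption
    kpi: Kpi
    aligned_pct: float               # share of the KPI reported as aligned
    eligible_not_aligned_pct: float  # share reported as eligible but not aligned
    source_url: str                  # kept so the value stays traceable to its disclosure
    source_page: int


@dataclass
class CompanyYearRecord:
    """All activity-level data points for one company and fiscal year."""
    company_id: str
    fiscal_year: int
    activities: list[ActivityKpi] = field(default_factory=list)

    def aligned_total(self, kpi: Kpi) -> float:
        """Aggregate aligned share for one KPI, assumed to be the sum of its activities."""
        return sum(a.aligned_pct for a in self.activities if a.kpi is kpi)
```

Keeping the source URL and page on every data point is one simple way to make each value traceable to the original disclosure.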
{% endstep %}

{% step %}

### Comprehensive Collection of Disclosures

EU Taxonomy data can appear in many places: annual reports, sustainability reports, regulatory filings (e.g., SFDR/CSRD templates), investor presentations, standalone data spreadsheets, or hidden on a webpage deep in a company’s site.

Our infrastructure is designed to capture all of it. Through automated web monitoring and targeted expert retrieval, we ensure that no disclosure is overlooked. This comprehensive approach minimizes blind spots and provides the broadest possible coverage of EU Taxonomy reporting worldwide.
{% endstep %}

{% step %}

### Converting Disclosures into Structured Data

EU Taxonomy disclosures vary widely in format, from PDFs and Excel annexes to HTML tables and text embedded in narrative sections. Our AI-driven pipelines convert these raw files into a unified, machine-readable structure (e.g., PDF to Markdown).

From there:

* Computer vision extracts and parses tables and figures.
* NLP models detect taxonomy-related passages, extract KPI values, and identify eligibility classifications.
* Classification rules map disclosures into aligned, eligible but not aligned, non-eligible, and combined categories under each KPI.

The result: machine-readable, standardized data points that preserve traceability to the original disclosure.
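The category mapping described above can be sketched as a small rule function. This is a minimal illustration, not the production rule set: the `Category` enum, the two boolean inputs, and the reading of "combined" as the total eligible share (aligned plus eligible but not aligned) are all assumptions made for this example.

```python
from enum import Enum


class Category(Enum):
    ALIGNED = "aligned"
    ELIGIBLE_NOT_ALIGNED = "eligible_not_aligned"
    NON_ELIGIBLE = "non_eligible"


def classify(is_eligible: bool, is_aligned: bool) -> Category:
    """Map two disclosed flags to one category (illustrative rule)."""
    if is_aligned and not is_eligible:
        # Alignment presupposes eligibility, so this combination is inconsistent.
        raise ValueError("an activity cannot be aligned without being eligible")
    if is_aligned:
        return Category.ALIGNED
    if is_eligible:
        return Category.ELIGIBLE_NOT_ALIGNED
    return Category.NON_ELIGIBLE


def combined_eligible(shares: dict[Category, float]) -> float:
    """Total eligible share: aligned plus eligible but not aligned (assumed meaning of 'combined')."""
    return shares.get(Category.ALIGNED, 0.0) + shares.get(Category.ELIGIBLE_NOT_ALIGNED, 0.0)
```

Rejecting the inconsistent "aligned but not eligible" combination early, rather than silently picking a category, keeps ambiguous disclosures visible for later review.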
{% endstep %}

{% step %}

### Human-in-the-Loop Data Validation

AI brings speed and scalability, but human expertise ensures accuracy and context. Each extracted data point is flagged with quality indicators, guiding our analysts in review. Two independent reviewers typically validate taxonomy data, with arbitration applied where discrepancies remain.

This process allows us to:

* Correct AI misclassifications when disclosures are complex or ambiguous.
* Preserve context from narrative disclosures.
* Continuously improve our models through feedback.

The outcome is audit-grade EU Taxonomy data that users can trust.
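The dual-review step with arbitration can be pictured with the short sketch below. It assumes each reviewer submits a numeric value for the same data point and that an arbitrator is consulted only when the two values differ by more than a tolerance. The function name, the `tolerance` parameter, and the callback signature are hypothetical.

```python
from typing import Callable


def reconcile(
    first: float,
    second: float,
    arbitrate: Callable[[float, float], float],
    tolerance: float = 0.0,
) -> tuple[float, bool]:
    """Return (final_value, was_arbitrated) for one data point reviewed twice."""
    if abs(first - second) <= tolerance:
        # The reviewers agree within tolerance: accept the first reviewer's value.
        return first, False
    # The reviewers disagree: defer to the arbitrator's decision.
    return arbitrate(first, second), True
```

Returning the `was_arbitrated` flag alongside the value makes it possible to track how often reviewers disagree, which is one way such feedback could inform model improvement.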
{% endstep %}

{% step %}

### Rigorous Quality Assurance

Finally, the EU Taxonomy dataset undergoes multi-layered quality checks:

* Automated tests flag anomalies (e.g., KPIs not summing correctly, negative percentages, implausible trends).
* Machine learning models detect outliers across time series and peer groups.
* Manual audits ensure completeness and resolve edge cases.

This combination of automation and human oversight guarantees that every metric delivered is reliable, comparable, and decision-ready for use in compliance, benchmarking, and research.
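Two of the automated tests mentioned above, KPIs not summing correctly and negative percentages, can be sketched as a simple rule check. The function below is an illustration under stated assumptions: it takes the three shares of one KPI as percentages expected to sum to 100, and the name `check_kpi_breakdown` and the default tolerance of 0.1 percentage points are hypothetical.

```python
def check_kpi_breakdown(
    aligned: float,
    eligible_not_aligned: float,
    non_eligible: float,
    tolerance: float = 0.1,
) -> list[str]:
    """Return human-readable issues for one KPI breakdown; an empty list means no flags."""
    issues: list[str] = []
    parts = {
        "aligned": aligned,
        "eligible_not_aligned": eligible_not_aligned,
        "non_eligible": non_eligible,
    }
    for name, value in parts.items():
        if value < 0:
            issues.append(f"{name} is negative ({value}%)")
        elif value > 100:
            issues.append(f"{name} exceeds 100% ({value}%)")
    total = sum(parts.values())
    if abs(total - 100.0) > tolerance:
        # Allow small rounding differences, since reports often round each share.
        issues.append(f"breakdown sums to {total:.1f}%, expected 100%")
    return issues
```

Returning a list of messages instead of raising on the first problem lets a single data point carry several flags into manual audit.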
{% endstep %}
{% endstepper %}

***

## Learn More

To explore the methodology in detail, visit:

* [**Data Sources**](/eu-taxonomy/data-collection-methodology/data-sources.md) – Where EU Taxonomy disclosures come from and how they are collected.
* [**Standardization Guidelines**](/eu-taxonomy/data-collection-methodology/standardization-guidelines.md) – How activities, KPIs, and eligibility categories are normalized for consistency.
* [**Calculation Logic**](/eu-taxonomy/data-collection-methodology/calculation-logic-up-to-2025.md) – How derived values are computed using transparent rules.
* [**Quality Assurance**](/eu-taxonomy/data-collection-methodology/quality-assurance.md) – The validations and controls that safeguard data integrity.
