Fragmented visibility and processes across the data estate
Carrefour Brazil has a sophisticated data environment processing over 133 PB of data daily across 33K+ tables, 600+ Tableau dashboards, and 500+ data pipelines. The data governance team held workshops with its supporting data teams and identified several overlapping areas for improvement:
Lack of data discovery and catalog
Data practitioners had no visibility into what data was available, or which of several overlapping sources was the “official” one. Even within a single source, it was unclear which columns could be used.
Poor data quality standardization
Data teams needed a clearer picture of the quality of the data they worked with, as well as defined processes for addressing quality issues. Stakeholders were often notified of issues late in the process, which eroded trust.
Unclear data ownership
It was difficult to identify who was responsible for different data assets, which delayed notifications and left accountability unclear.
Lack of data lineage
With no clear way to track data flow and dependencies across systems, it was difficult to see which data was shared between domains or how transformations affected downstream processes across their 40+ GCP projects.
No common language or documentation for data
The lack of a standardized glossary and inconsistent data definitions hindered communication and collaboration. There was no systematic way to share and maintain knowledge of the business rules embedded in the data, from ingestion through consumption.

A unified, open source metadata platform
To address these challenges, Carrefour Brazil implemented OpenMetadata in multiple waves to establish a single source of truth for data across the entire organization. A “Datathon” hackathon was held to catalog tables and dashboards, supplemented by workshops with data engineering and business teams.
Comprehensive metadata cataloging
Unified cataloging of tables, dashboards, and other data assets with automated ingestion.
Data quality framework
Implemented checks across multiple quality dimensions, including update frequency, uniqueness, nullity, consistency, and accuracy.
Quality certification seals
Implemented Bronze, Silver, and Gold certification levels for data assets, with standardized criteria applied at ingestion to certify quality.
Data governance structure
Established 20+ domains and assigned 30+ data owners with clearly defined responsibilities, improving how data assets are organized.
Automated notification workflows
Integrated with JIRA for issue tracking and sent email notifications to data owners.
Knowledge management and glossary
Created 300+ glossary terms defining business concepts and rules to align how data was used across teams.
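The quality dimensions and certification seals described above can be illustrated with a minimal Python sketch. It profiles one column for nullity and uniqueness, checks update frequency against an expected window, and maps the result to a seal. The function names (`profile_column`, `certification_seal`), the thresholds, and the seal criteria are all hypothetical assumptions: the case study does not publish Carrefour Brazil's actual criteria, and this is not OpenMetadata's own test API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


@dataclass
class QualityReport:
    null_ratio: float    # share of rows where the column is missing (nullity)
    unique_ratio: float  # distinct non-null values / non-null rows (uniqueness)
    is_fresh: bool       # table updated within the expected window (update frequency)


def profile_column(rows: list[dict[str, Any]], column: str,
                   last_updated: datetime, max_age: timedelta) -> QualityReport:
    """Hypothetical profiler: computes nullity, uniqueness, and freshness for one column."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    null_ratio = 1 - len(non_null) / len(values) if values else 1.0
    unique_ratio = len(set(non_null)) / len(non_null) if non_null else 0.0
    is_fresh = datetime.now(timezone.utc) - last_updated <= max_age
    return QualityReport(null_ratio, unique_ratio, is_fresh)


def certification_seal(report: QualityReport) -> Optional[str]:
    """Map a report to a seal. Thresholds are illustrative assumptions only."""
    if not report.is_fresh:
        return None  # stale data earns no seal in this sketch
    if report.null_ratio == 0 and report.unique_ratio == 1.0:
        return "Gold"
    if report.null_ratio <= 0.01:
        return "Silver"
    if report.null_ratio <= 0.05:
        return "Bronze"
    return None
```

For example, a freshly updated key column such as a hypothetical `order_id` with no nulls and no duplicates would receive "Gold", while the same column with 2% nulls would drop to "Bronze". Requiring full uniqueness for Gold only makes sense for key columns; a real scheme would more likely configure criteria per column or per asset.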

A roadmap for further investment and improvements
Carrefour Brazil's results were showcased at Carrefour's global meeting alongside subsidiaries from Poland, France, Argentina, and other countries. Next steps include further investment in cataloging additional asset types such as DAGs, ML models, and APIs; strengthening data lifecycle governance; and expanding API integrations and automations.
Deepened Data Democratization & Trust
Created a single source of truth for data assets across 650+ tables and 500+ dashboards, enabling 500+ users to confidently find the right data sources and use them correctly.
Improved Risk Management & Governance
Assigning data owners and business domains defined ownership and accountability across the organization, improved data stewardship, and ensured the right people were notified of any issues.
Standardized Knowledge Management
The data dictionary standardized business concepts and rules across the organization and reduced confusion between teams.
Enhanced Operational Productivity
Implemented automated data quality monitoring with custom tests and JIRA integration, allowing teams to proactively detect and address data issues before they impact business operations.
Data Quality Assurance
Centralized data documentation and quality seals ensure consistently high standards of data quality and understanding across data teams. Lineage helped trace quality across upstream and downstream data sources.
