Understanding the Medallion Architecture
The Medallion Architecture is one of the most widely adopted patterns for organising data in a modern data lakehouse. By structuring data into distinct layers — Bronze, Silver, and Gold — teams can manage data quality progressively while keeping raw data accessible and auditable.
What Is the Medallion Architecture?
Originally popularised by Databricks, the Medallion Architecture (also called a multi-hop architecture) defines a series of data quality layers:
- Bronze — raw, ingested data
- Silver — cleaned and conformed data
- Gold — business-ready, aggregated data
Each layer adds value by applying transformations, validations, or aggregations appropriate to that stage.
Bronze Layer: The Source of Truth
The Bronze layer stores data as it arrives from source systems. Nothing is cleaned or transformed at this stage; the only additions are ingestion metadata.
Key characteristics:
- Append-only writes where possible
- Full schema preservation, including unexpected fields
- Metadata such as ingestion timestamp and source system ID
- Long retention (often indefinite)
This makes Bronze the source of truth for all downstream layers. If something goes wrong in Silver or Gold, you can rebuild the affected tables by replaying from Bronze, provided the retention period still covers the data in question.
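The characteristics above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the in-memory list stands in for a Bronze table, and the names `to_bronze_record`, `append_to_bronze`, `orders_api`, and the column names are hypothetical.

```python
from datetime import datetime, timezone


def to_bronze_record(raw_payload: str, source_system: str) -> dict:
    """Wrap a raw payload with ingestion metadata without altering it."""
    return {
        "raw_payload": raw_payload,  # stored verbatim; never parsed or cleaned here
        "source_system": source_system,  # hypothetical source system ID
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def append_to_bronze(bronze_table: list, raw_payloads: list, source_system: str) -> None:
    """Append-only write: existing rows are never updated or deleted."""
    for payload in raw_payloads:
        bronze_table.append(to_bronze_record(payload, source_system))


# Assumption: a Python list stands in for the Bronze table.
bronze_orders = []
append_to_bronze(
    bronze_orders,
    [
        '{"order_id": "1", "amount": "19.99"}',
        # The unexpected "coupon" field and the null amount are kept as-is.
        '{"order_id": "2", "amount": null, "coupon": "X1"}',
    ],
    source_system="orders_api",
)
```

Keeping the payload as an unparsed string is one way to preserve unexpected fields: nothing is lost if the source adds a column before the pipeline knows about it.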
Silver Layer: Cleansed and Conformed
The Silver layer applies business rules and data quality checks. Data is cleaned, deduplicated, and conformed to a standard schema.
Common transformations:
- Null handling and type casting
- Deduplication and SCD (Slowly Changing Dimension) logic
- Schema evolution management
- Data quality filtering (removing records that fail validation)
Silver is typically where analytics engineers and data scientists go for their raw material — it’s reliable, but not yet optimised for any specific use case.
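As a rough sketch of the transformations listed above, the following plain-Python function casts types, handles nulls, filters out rows that fail validation, and deduplicates by keeping the most recently ingested row per key. The function name `to_silver`, the column names, and the "latest row wins" rule are assumptions made for illustration; real pipelines often apply richer rules such as SCD logic.

```python
from decimal import Decimal, InvalidOperation


def to_silver(bronze_rows: list) -> list:
    """Cast types, drop invalid rows, and keep the latest row per order_id."""
    latest = {}
    for row in bronze_rows:
        order_id = row.get("order_id")
        raw_amount = row.get("amount")
        if order_id is None or raw_amount is None:
            continue  # null handling: rows without a key or an amount fail validation
        try:
            amount = Decimal(str(raw_amount))  # type casting
        except InvalidOperation:
            continue  # data quality filtering: amount is not numeric
        clean = {
            "order_id": str(order_id),
            "amount": amount,
            "ingested_at": row["ingested_at"],
        }
        current = latest.get(clean["order_id"])
        # Deduplication (assumed rule): the most recently ingested row wins.
        # ISO-8601 strings in the same format and timezone sort chronologically.
        if current is None or clean["ingested_at"] > current["ingested_at"]:
            latest[clean["order_id"]] = clean
    return sorted(latest.values(), key=lambda r: r["order_id"])


# Hypothetical parsed Bronze rows.
bronze_rows = [
    {"order_id": "1", "amount": "19.99", "ingested_at": "2024-01-01T10:00:00+00:00"},
    {"order_id": "1", "amount": "24.99", "ingested_at": "2024-01-01T11:00:00+00:00"},
    {"order_id": "2", "amount": None, "ingested_at": "2024-01-01T10:05:00+00:00"},
    {"order_id": None, "amount": "5.00", "ingested_at": "2024-01-01T10:06:00+00:00"},
    {"order_id": "3", "amount": "not-a-number", "ingested_at": "2024-01-01T10:07:00+00:00"},
    {"order_id": "4", "amount": "7.50", "ingested_at": "2024-01-01T10:08:00+00:00"},
]
silver_orders = to_silver(bronze_rows)
```

In this sample only orders 1 and 4 survive: order 1 keeps its later amount, and the other rows are dropped for a missing key, a null amount, or a non-numeric amount.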
Gold Layer: Business-Ready Data
Gold tables are purpose-built for specific business domains or consumers. They are often denormalised for query performance, pre-aggregated, and documented with clear ownership.
Examples:
- Daily sales summary tables
- Customer 360 views
- Aggregate tables that feed KPI dashboards
Gold tables are the primary source for BI tools, reporting, and ML feature stores.
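To make the first example concrete, here is a minimal sketch of a daily sales summary built from cleaned order rows. The function name `daily_sales_summary` and the columns `order_date`, `amount`, `order_count`, and `total_amount` are hypothetical; in practice this would more likely be a SQL or DataFrame aggregation over a Silver table.

```python
from collections import defaultdict
from decimal import Decimal


def daily_sales_summary(silver_orders: list) -> list:
    """Pre-aggregate cleaned orders into one row per day."""
    totals = defaultdict(lambda: {"order_count": 0, "total_amount": Decimal("0")})
    for order in silver_orders:
        day = totals[order["order_date"]]
        day["order_count"] += 1
        day["total_amount"] += order["amount"]
    # One denormalised row per day, sorted for stable output.
    return [{"order_date": d, **totals[d]} for d in sorted(totals)]


# Hypothetical Silver rows, already typed and deduplicated.
silver_orders = [
    {"order_id": "1", "order_date": "2024-01-01", "amount": Decimal("19.99")},
    {"order_id": "2", "order_date": "2024-01-01", "amount": Decimal("5.01")},
    {"order_id": "3", "order_date": "2024-01-02", "amount": Decimal("7.50")},
]
gold_daily_sales = daily_sales_summary(silver_orders)
```

Because the aggregation is done ahead of time, a dashboard reading `gold_daily_sales` only scans one row per day rather than every order.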
Why It Works
The Medallion Architecture succeeds for several reasons:
- Separation of concerns — ingestion, cleaning, and business logic are kept separate
- Auditability — raw data is always preserved in Bronze
- Incremental processing — each layer can be refreshed on its own schedule, picking up only the data that is new since its last run
- Scalability — works at petabyte scale with Delta Lake, Iceberg, or Hudi
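The incremental processing point can be illustrated with a simple watermark: each run handles only rows ingested after the last row it has already seen. This is a simplified sketch under stated assumptions; the names `process_increment`, `normalise`, and `ingested_at` are hypothetical, and lakehouse engines typically provide their own mechanisms for tracking progress.

```python
def process_increment(source_rows: list, watermark, transform):
    """Transform only rows ingested after the watermark.

    Returns the transformed rows and the new watermark.
    """
    new_rows = [
        row for row in source_rows
        if watermark is None or row["ingested_at"] > watermark
    ]
    if not new_rows:
        return [], watermark  # nothing new; the watermark is unchanged
    new_watermark = max(row["ingested_at"] for row in new_rows)
    return [transform(row) for row in new_rows], new_watermark


def normalise(row: dict) -> dict:
    """Hypothetical Bronze-to-Silver transformation."""
    return {"order_id": row["order_id"], "amount": float(row["amount"])}


# Assumption: ISO-8601 timestamps in one format and timezone, so string
# comparison matches chronological order.
bronze = [
    {"order_id": "1", "amount": "19.99", "ingested_at": "2024-01-01T10:00:00+00:00"},
    {"order_id": "2", "amount": "7.50", "ingested_at": "2024-01-01T11:00:00+00:00"},
]
silver = []

# First run: no watermark yet, so every row is processed.
batch, watermark = process_increment(bronze, None, normalise)
silver.extend(batch)

# New data arrives in Bronze; the second run only touches the new row.
bronze.append({"order_id": "3", "amount": "3.25", "ingested_at": "2024-01-02T09:00:00+00:00"})
batch, watermark = process_increment(bronze, watermark, normalise)
silver.extend(batch)
```

The watermark would need to be stored durably between runs. A strict "greater than" comparison also assumes that no two batches share the same timestamp, which a production pipeline would have to handle explicitly.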
Common Pitfalls
- Over-engineering Gold — not every dataset needs a Gold table; Silver may be sufficient for many use cases
- Skipping Silver — jumping directly from Bronze to Gold tends to create brittle pipelines, because each Gold table then has to repeat its own cleaning logic
- No data contracts — without agreed schemas between layers, downstream consumers break frequently
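One lightweight way to express a data contract between layers is an agreed mapping of column names to types, checked before data is published downstream. The sketch below assumes this simple form; `SILVER_ORDERS_CONTRACT` and `contract_violations` are hypothetical names, and real contracts usually also cover nullability, allowed values, and ownership.

```python
# Hypothetical contract for a Silver orders table: column name -> expected type.
SILVER_ORDERS_CONTRACT = {"order_id": str, "amount": float, "order_date": str}


def contract_violations(rows: list, contract: dict) -> list:
    """Return a human-readable message for each contract violation."""
    violations = []
    for index, row in enumerate(rows):
        for column, expected_type in contract.items():
            if column not in row:
                violations.append(f"row {index}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                actual = type(row[column]).__name__
                violations.append(
                    f"row {index}: '{column}' is {actual}, expected {expected_type.__name__}"
                )
    return violations


candidate_rows = [
    {"order_id": "1", "amount": 19.99, "order_date": "2024-01-01"},
    {"order_id": "2", "amount": "7.50"},  # wrong type and a missing column
]
violations = contract_violations(candidate_rows, SILVER_ORDERS_CONTRACT)
```

A pipeline could refuse to publish a batch when `violations` is non-empty, so that a schema change is caught at the layer boundary instead of in a downstream dashboard.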
Conclusion
The Medallion Architecture provides a clear, scalable framework for managing data quality in a modern data platform. Whether you are building on Databricks, Snowflake, or an open-source stack, this pattern helps teams move from raw ingestion to trusted, business-ready data with confidence.