Databricks Governance: ABAC, Governed Tags, and Unity Catalog

Data governance used to mean a spreadsheet listing who owns what, reviewed once a quarter, accurate never. At platform scale that breaks down fast. Databricks Unity Catalog, combined with attribute-based access control (ABAC) and governed tags, offers a more durable model: policies declared once, enforced everywhere, automatically applied as data and users change.

This article unpacks how these three pieces fit together — what they are, how they interact, and how to build a governance model that actually holds up in production.

The Governance Problem at Scale

Role-based access control (RBAC) is the starting point for most platforms. Principals are assigned to roles; roles carry permissions on objects. It is simple to reason about and straightforward to implement.

The cracks appear when:

  • You have thousands of tables across dozens of catalogs and schemas
  • Sensitive columns live inside tables alongside non-sensitive columns
  • Access should vary by the sensitivity of the data, not just which object it lives in
  • New tables appear daily and need the right permissions from day one

RBAC asks you to manage permissions object by object. ABAC flips the model: you describe attributes of subjects and objects, write policies against those attributes, and let the engine enforce them automatically.

Databricks Unity Catalog is the platform where this plays out. It is the centralised governance layer for data and AI assets (tables, views, volumes, models, and functions) across the workspaces attached to a metastore. Because access to those assets goes through Unity Catalog, governance policies applied there cover every attached workspace.

Unity Catalog: The Governance Plane

Before ABAC and tags make sense, it is worth being precise about what Unity Catalog provides.

Unity Catalog sits above workspaces. A single Unity Catalog metastore can serve multiple workspaces, which means:

  • Permissions are defined once and apply everywhere
  • Lineage, audit logs, and data quality metrics are aggregated account-wide
  • There is no need to re-grant permissions when a team spins up a new workspace

The object hierarchy is:

Metastore
└── Catalog
    └── Schema
        └── Table / View / Volume / Model

Permissions cascade downward — granting USE CATALOG allows traversal; granting SELECT on a schema grants it on all tables in that schema. This hierarchy is the foundation on which tag-based and attribute-based policies are layered.

Governed Tags

Tags in Unity Catalog are key-value metadata attached to securable objects: catalogs, schemas, tables, columns. They describe what the data is — its sensitivity, domain, classification, or regulatory scope.

Governed tags are the enterprise-grade version of this. Rather than letting any user apply any tag string ad hoc, governed tags enforce:

  • A controlled vocabulary of allowed tag keys and values
  • Permissions on who can create, apply, and remove tags
  • Inheritance rules so tags propagate to child objects

Defining a Tag Taxonomy

A practical tag taxonomy typically covers three concerns:

Sensitivity classification:

sensitivity = public | internal | confidential | restricted

Regulatory scope:

regulation = gdpr | hipaa | pci | sox | none

Data domain:

domain = finance | hr | product | marketing | platform

These tags live on the objects themselves — applied to a table, a schema, or an individual column — and travel with those objects across the entire account.
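A controlled vocabulary is easiest to enforce when it exists as data that a pipeline or CI check can read. The following is a minimal sketch, assuming the three tag keys above are the whole taxonomy; `TAXONOMY` and `validate_tags` are hypothetical names for illustration, not part of any Databricks API.

```python
# Hypothetical controlled vocabulary mirroring the taxonomy above.
TAXONOMY = {
    "sensitivity": {"public", "internal", "confidential", "restricted"},
    "regulation": {"gdpr", "hipaa", "pci", "sox", "none"},
    "domain": {"finance", "hr", "product", "marketing", "platform"},
}


def validate_tags(tags: dict) -> list:
    """Return a list of problems; an empty list means the tags conform."""
    problems = []
    for key, value in tags.items():
        if key not in TAXONOMY:
            problems.append(f"unknown tag key: {key}")
        elif value not in TAXONOMY[key]:
            problems.append(f"invalid value for {key}: {value}")
    return problems


print(validate_tags({"sensitivity": "confidential", "regulation": "pci"}))  # []
```

A check like this could run before any tag is written, so that misspelled keys or values are rejected before they reach the catalog.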

Applying Tags

Tags are applied via SQL using ALTER TABLE … SET TAGS:

-- Tag an entire table
ALTER TABLE finance.gold.transactions
SET TAGS ('sensitivity' = 'confidential', 'regulation' = 'pci');

-- Tag a specific column
ALTER TABLE hr.gold.employees
ALTER COLUMN salary
SET TAGS ('sensitivity' = 'restricted', 'regulation' = 'gdpr');

Column-level tags are critical: most tables contain a mix of sensitive and non-sensitive columns. Tagging at the column level lets you enforce masking policies against exactly the right columns, not the entire table.

Tag Inheritance

Tags applied to a schema propagate to all tables within it. Tags applied to a catalog propagate further. This means you can tag a schema domain = hr and every table created inside it automatically inherits that attribute — no manual tagging required as the schema grows.

The precedence rule: a more specific tag (column) overrides a less specific one (table), which overrides a schema tag. This lets you set defaults at the schema level and override exceptions at the column level.
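The precedence rule can be modelled as a lookup that walks from the most specific level to the least specific one. This is an illustrative sketch of the rule as described here, not of how Unity Catalog resolves tags internally; `effective_tag` and the shape of `tags_by_level` are assumptions.

```python
from typing import Optional

# Most specific level first, following the precedence rule described above.
PRECEDENCE = ("column", "table", "schema", "catalog")


def effective_tag(key: str, tags_by_level: dict) -> Optional[str]:
    """Return the value of `key` from the most specific level that sets it."""
    for level in PRECEDENCE:
        value = tags_by_level.get(level, {}).get(key)
        if value is not None:
            return value
    return None


# Schema-level defaults with a column-level exception.
tags = {
    "schema": {"domain": "hr", "sensitivity": "confidential"},
    "column": {"sensitivity": "restricted"},
}
print(effective_tag("sensitivity", tags))  # restricted
print(effective_tag("domain", tags))       # hr
```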

Attribute-Based Access Control

ABAC in Databricks is built on top of the tag infrastructure. Instead of saying “grant SELECT on this table to this group”, you write a policy that says “principals with attribute team=finance may access objects tagged domain=finance and sensitivity=internal or lower.”

The mechanism that enforces this is row-level security and column masking, implemented either with dynamic views or, in Unity Catalog, with native row filters and column masks. But the conceptual model is ABAC: access is determined by comparing attributes of the requestor against attributes of the resource.
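The attribute comparison at the heart of that example policy can be sketched in a few lines. The sketch assumes sensitivity levels are ordered from least to most restrictive and that each principal carries a `team` and a `max_sensitivity` attribute; both attribute names and the `may_access` function are hypothetical.

```python
# Ordered from least to most restrictive, matching the taxonomy above.
SENSITIVITY_ORDER = ["public", "internal", "confidential", "restricted"]


def may_access(principal: dict, resource_tags: dict) -> bool:
    """Allow access when the domain matches and sensitivity is within clearance."""
    same_domain = principal["team"] == resource_tags["domain"]
    within_clearance = (
        SENSITIVITY_ORDER.index(resource_tags["sensitivity"])
        <= SENSITIVITY_ORDER.index(principal["max_sensitivity"])
    )
    return same_domain and within_clearance


analyst = {"team": "finance", "max_sensitivity": "internal"}
print(may_access(analyst, {"domain": "finance", "sensitivity": "internal"}))      # True
print(may_access(analyst, {"domain": "finance", "sensitivity": "confidential"}))  # False
```

The policy never names a table: any object carrying the right tags is covered, which is what distinguishes this from an object-by-object grant.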

Principals and Attributes

In Databricks, principal attributes come from the identity provider (IdP) — Entra ID, Okta, or another SCIM-compatible directory. Groups are the primary carrier of attributes:

Group: finance-analysts         → members can read finance data up to confidential
Group: finance-executives       → members can read finance data up to restricted
Group: hr-bp                    → members can read hr data, PII masked
Group: data-platform-engineers  → can manage catalogs, cannot bypass column masks

Groups are synchronised from the IdP via SCIM into the Databricks account console. Unity Catalog grants are made against these groups, never against individual users — individual grants are unmanageable at scale and become a liability during offboarding.

Dynamic Column Masking

Column masking policies attach a masking function to a tagged column. When a query touches that column, Databricks evaluates the function and either returns the real value or a masked substitute, depending on the current principal’s group membership.

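-- Define a masking function.
-- is_account_group_member() checks account-level groups, which is where the
-- SCIM-synced groups live; is_member() checks workspace-local groups.
CREATE OR REPLACE FUNCTION masks.mask_pii(col STRING)
RETURNS STRING
RETURN CASE
  WHEN is_account_group_member('hr-full-access') THEN col
  -- Keep the first two characters and replace the rest with asterisks
  WHEN is_account_group_member('hr-bp') THEN regexp_replace(col, '(?<=.{2}).', '*')
  ELSE '***REDACTED***'
END;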

-- Attach the mask to a column tagged sensitivity=restricted
ALTER TABLE hr.gold.employees
ALTER COLUMN national_id
SET MASK masks.mask_pii;

The masking function is transparent to consumers — they write a plain SELECT, and the platform decides what they see. No application-level filtering, no view gymnastics, no trust that the analyst’s query remembered to apply a WHERE clause.
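To see what each audience would receive, the three branches of the masking function can be mirrored outside the platform. This is a plain-Python model for reasoning and unit testing only; group membership is passed in as a set rather than resolved from the identity provider, and the sample value is made up.

```python
import re


def mask_pii(value: str, groups: set) -> str:
    """Mirror the three branches of the SQL masking function."""
    if "hr-full-access" in groups:
        return value
    if "hr-bp" in groups:
        # Keep the first two characters, replace every later one with '*'.
        return re.sub(r"(?<=.{2}).", "*", value)
    return "***REDACTED***"


print(mask_pii("AB123456", {"hr-full-access"}))  # AB123456
print(mask_pii("AB123456", {"hr-bp"}))           # AB******
print(mask_pii("AB123456", set()))               # ***REDACTED***
```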

Row-Level Security

Row filters work similarly. A filter function is attached to a table and evaluated per query, restricting which rows are visible to the current principal:

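-- Row filter: analysts see only their region's data.
-- current_user_region() is not a built-in; it is assumed here to be a
-- user-defined function that maps the current user to a region.
CREATE OR REPLACE FUNCTION filters.region_filter(region_col STRING)
RETURNS BOOLEAN
RETURN is_account_group_member('global-read') OR region_col = current_user_region();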

ALTER TABLE sales.gold.opportunities
SET ROW FILTER filters.region_filter ON (region);

Combined with column masking, you can model complex access patterns purely in the platform layer — no views, no application logic, no duplicated datasets.
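The row filter behaves like a predicate evaluated once per row, with only the rows that return true reaching the consumer. A small model of that behaviour, assuming the user's region and group membership are supplied as plain arguments (in the platform they come from the session):

```python
def region_filter(row_region: str, groups: set, user_region: str) -> bool:
    """Mirror the SQL filter: global readers see everything, others their own region."""
    return "global-read" in groups or row_region == user_region


def visible_rows(rows: list, groups: set, user_region: str) -> list:
    """Keep only the rows for which the filter predicate is true."""
    return [row for row in rows if region_filter(row["region"], groups, user_region)]


rows = [{"id": 1, "region": "emea"}, {"id": 2, "region": "apac"}]
print(visible_rows(rows, set(), "emea"))            # [{'id': 1, 'region': 'emea'}]
print(len(visible_rows(rows, {"global-read"}, "emea")))  # 2
```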

Putting It Together: A Governance Model

The practical architecture looks like this:

Identity Provider (Entra ID / Okta)
         │
    SCIM sync
         │
Databricks Account Console (Groups)
         │
Unity Catalog Grants (RBAC foundation)
         │
Governed Tags on Objects (data attributes)
         │
Column Masks + Row Filters (ABAC enforcement)

Step 1: Define the tag taxonomy

Work with data owners and the security team to agree the controlled vocabulary. Keep it small: a handful of sensitivity values, like the four above, is better than twenty. Publish it in a data contract or governance wiki so teams know what to apply when ingesting new datasets.

Step 2: Enforce tagging at ingestion

Tag application should be part of the pipeline, not an afterthought. In a dbt project, tags can be applied in post-hooks:

-- dbt post-hook for a model in the hr domain
{{ config(
  post_hook="ALTER TABLE {{ this }} SET TAGS ('domain' = 'hr', 'sensitivity' = 'confidential')"
) }}

For Databricks Asset Bundles or Terraform-managed infrastructure, apply tags as part of the resource definition so they are version-controlled and auditable.
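Where a pipeline generates its own tagging statements, rendering them from a dictionary keeps the tags in version control next to the pipeline code. The helper below is a hypothetical sketch: `set_tags_sql` is not a Databricks or dbt function, and it refuses values containing quotes or backslashes rather than attempting to escape them.

```python
def set_tags_sql(table: str, tags: dict) -> str:
    """Render an ALTER TABLE ... SET TAGS statement for a fully qualified table."""
    for text in list(tags.keys()) + list(tags.values()):
        if "'" in text or "\\" in text:
            raise ValueError(f"refusing to render unsafe tag text: {text!r}")
    # Sort keys so the generated statement is stable across runs.
    pairs = ", ".join(f"'{key}' = '{tags[key]}'" for key in sorted(tags))
    return f"ALTER TABLE {table} SET TAGS ({pairs});"


print(set_tags_sql("hr.gold.employees", {"sensitivity": "confidential", "domain": "hr"}))
# ALTER TABLE hr.gold.employees SET TAGS ('domain' = 'hr', 'sensitivity' = 'confidential');
```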

Step 3: Write masking and filter functions centrally

Store all masking functions in a dedicated masks schema and filter functions in filters. Grant EXECUTE on these functions only to the service principals that manage catalog infrastructure — never to end users. This prevents circumvention.

Step 4: Attach policies to tagged columns

Rather than attaching masks table by table, use tag-based policy association where available. In Unity Catalog, the views under system.information_schema can be queried to find all columns carrying a specific tag, and automation can ensure masks are consistently attached:

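-- Find all columns tagged sensitivity=restricted without a mask.
-- Match on the full column identity, not just the column name, so a masked
-- column in one table does not hide an unmasked namesake in another.
-- Column names follow the information_schema views; verify them in your workspace.
SELECT t.catalog_name, t.schema_name, t.table_name, t.column_name
FROM system.information_schema.column_tags AS t
WHERE t.tag_name = 'sensitivity'
  AND t.tag_value = 'restricted'
  AND NOT EXISTS (
    SELECT 1
    FROM system.information_schema.column_masks AS m
    WHERE m.table_catalog = t.catalog_name
      AND m.table_schema  = t.schema_name
      AND m.table_name    = t.table_name
      AND m.column_name   = t.column_name
  );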

Running this as a scheduled job surfaces governance drift before it becomes a problem.
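The drift check is a set difference: restricted columns minus masked columns, compared on the full (catalog, schema, table, column) identity. A sketch of that comparison, assuming both lists have already been fetched from the catalog; the function name and sample identifiers are illustrative.

```python
def unmasked_restricted(tagged: list, masked: list) -> list:
    """Return restricted columns that have no mask, compared on full identity."""
    masked_set = set(masked)
    return [column for column in tagged if column not in masked_set]


tagged = [
    ("hr", "gold", "employees", "national_id"),
    ("hr", "gold", "contractors", "national_id"),
]
masked = [("hr", "gold", "employees", "national_id")]

# Comparing on column name alone would miss the contractors table.
print(unmasked_restricted(tagged, masked))
# [('hr', 'gold', 'contractors', 'national_id')]
```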

Step 5: Audit continuously

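Unity Catalog records access events in the system.access.audit system table. Events are logged as API-level actions rather than SQL verbs: a table read shows up as actions such as getTable or generateTemporaryTableCredential, not as SELECT, and permission changes appear as their own actions. Querying the audit log against the tag taxonomy gives you a real picture of who accessed what sensitive data, when, and from which notebook or job:

SELECT
    event_time,
    user_identity.email   AS user_email,
    action_name,
    -- request_params is a map; the key holding the table name varies by action
    coalesce(request_params['full_name_arg'],
             request_params['table_full_name']) AS table_name,
    response.status_code  AS status
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name IN ('getTable', 'generateTemporaryTableCredential')
  AND event_time >= current_date() - INTERVAL 7 DAYS
ORDER BY event_time DESC;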

Cross-referencing audit events with the tag taxonomy answers the questions auditors ask: “Who accessed PCI-scoped data last month? Were they in the approved group?”
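The cross-reference itself is a join between audit events, the set of tables carrying a given tag, and the approved principals. A minimal sketch, assuming the audit rows and tag lookups have already been exported into plain Python structures; the field names `user` and `table` and the sample addresses are made up for illustration.

```python
def flag_unapproved(events: list, scoped_tables: set, approved_users: set) -> list:
    """Return events that touched an in-scope table by a user outside the approved set."""
    return [
        event
        for event in events
        if event["table"] in scoped_tables and event["user"] not in approved_users
    ]


events = [
    {"user": "a@example.com", "table": "finance.gold.transactions"},
    {"user": "b@example.com", "table": "finance.gold.transactions"},
    {"user": "b@example.com", "table": "product.gold.events"},
]
pci_tables = {"finance.gold.transactions"}  # tables tagged regulation=pci
approved = {"a@example.com"}                # members of the approved group

print(flag_unapproved(events, pci_tables, approved))
# [{'user': 'b@example.com', 'table': 'finance.gold.transactions'}]
```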

Common Pitfalls

Tagging too late. Tags applied after data is already in consumers’ hands do not retroactively restrict access. Make tagging part of the ingestion checklist, enforced by CI.

Too many groups. If every team has a bespoke group for every sensitivity level, the group matrix becomes unmanageable. Define group roles by function (analyst, engineer, data steward) and let sensitivity tags — not groups — determine what each role can see.

Masking functions that can be bypassed. A masking function on a column in a table does not mask that column if a user queries a derivative view that selects the column directly. Audit your view layer. Where possible, prefer native column masks over view-based masking.

Gaps at schema creation time. When a new schema is created, it has no tags. Tables created inside it before tagging is applied are ungoverned by default. Automate schema tagging via Terraform or Databricks Asset Bundles so the tag is set before any table exists.

Why This Model Holds Up

The RBAC-only model fails because it is object-centric. Every new table is a new surface to manage. ABAC with governed tags inverts this: policies are written once against attributes, and objects self-classify via their tags. A new table ingested into the hr schema automatically inherits domain=hr, automatically triggers any masks attached to columns bearing sensitivity=restricted, and automatically appears in audit queries filtered on regulation=gdpr.

Governance stops being a backlog of permission tickets and becomes a set of standing policies that the platform enforces without manual intervention. That is the model that scales.

Summary

Capability        What it does
----------------  ---------------------------------------------------------------------------
Unity Catalog     Centralised metastore; single governance plane across all workspaces
Governed tags     Controlled vocabulary of key-value metadata applied to objects and columns
Tag inheritance   Tags propagate from catalog → schema → table → column
Column masking    Per-column masking functions evaluated at query time based on principal group
Row filters       Per-table row predicates limiting visible rows by principal attribute
System audit log  Append-only record of all access events, queryable in SQL

Together they form a layered governance model: coarse-grained control via Unity Catalog grants, fine-grained enforcement via ABAC policies, and continuous assurance via the audit log. Each layer reinforces the others, and all three are declared in code — auditable, version-controlled, and testable.