The data governance programme had been running for eight months. A working group met monthly. The governance policy document ran to thirty-two pages and had been approved by the executive team. A data catalogue had been purchased, and a vendor had spent six weeks implementing it. The catalogue contained 847 data assets.
When the compliance audit came and asked which employees had access to which customer data assets, the answer was in neither the policy document nor the catalogue. The permissions were in a spreadsheet maintained by the database administrator.
This is what most data governance programmes actually produce: documentation of what good governance looks like, in a format that nobody consults when making the decisions that governance is supposed to govern. It is governance as performance — visible enough to satisfy a board question about data maturity, but not operational enough to change behaviour.
Why Governance Programmes Die in Committees
Traditional data governance programmes fail for a predictable set of reasons. They are designed by compliance and legal teams rather than by data engineers, so the outputs are policy documents rather than technical controls. They are operated by committees rather than platform teams, so decisions are slow and accountability is diffuse. And they attempt to govern everything at once (comprehensive standards for data quality, lineage, access control, retention, classification, and privacy, all in a single initiative) rather than starting with the one or two things that matter most.
The result is a programme that is permanently in design mode: forever defining standards, never enforcing them. A data governance programme that does not change the behaviour of data producers and consumers is not governance — it is documentation.
The Engineering-First Governance Model
Effective data governance starts from the engineering question: how do we make the governed behaviour the path of least resistance? Policy documents make the governed behaviour the expectation. Platform controls make it the requirement. The difference between a policy that says 'all data assets must have an owner' and a platform that requires ownership assignment before a data asset can be published is the difference between aspiration and governance.
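The ownership example can be sketched in a few lines of Python. This is a minimal illustration, not a real platform API: `Registry`, `DataAsset`, and `PublishError` are hypothetical names, and an in-memory dictionary stands in for whatever publishing mechanism the platform actually uses.

```python
from dataclasses import dataclass
from typing import Optional


class PublishError(Exception):
    """Raised when an asset fails a publish-time governance check."""


@dataclass(frozen=True)
class DataAsset:
    name: str
    owner: Optional[str] = None  # hypothetical field: the accountable team or person


class Registry:
    """Hypothetical in-memory stand-in for a platform's asset registry."""

    def __init__(self) -> None:
        self._published = {}

    def publish(self, asset: DataAsset) -> None:
        # The control itself: an asset without an owner cannot be published.
        if not asset.owner or not asset.owner.strip():
            raise PublishError(
                f"{asset.name}: an owner must be assigned before publishing"
            )
        self._published[asset.name] = asset

    def is_published(self, name: str) -> bool:
        return name in self._published
```

With a check like this in the publishing path, the policy stops being something a producer can overlook. The asset simply does not appear until an owner is named.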
Data Contracts: Governance at the Source
A data contract is a formal agreement between a data producer (the team that owns and publishes a data asset) and its consumers (the teams that query it) about what the data contains, how reliable it is, and how changes will be communicated. Data contracts are the foundational governance primitive: they make implicit assumptions explicit, create accountability for data quality at the source rather than the destination, and provide the basis for automated quality monitoring.
A minimal data contract specifies: the schema and data types of the asset, the expected update frequency and freshness SLA, the quality guarantees (null rates, uniqueness constraints, referential integrity), the change notification process (how consumers will be informed of schema changes and how much advance notice they will receive), and the owner responsible for meeting these commitments. Teams that implement data contracts before any other governance capability find that most other governance problems become significantly more tractable.
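As a rough sketch of what such a contract might look like once it is machine-checkable, the Python below models the elements listed above and validates a batch of rows against them. The field names (`max_staleness`, `max_null_rate`, `notice_days`) and the `check` function are assumptions made for illustration. Referential integrity is left out to keep the example short, and in practice a tool such as Soda Core, Great Expectations, or dbt tests would usually do this work.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Contract:
    asset: str
    owner: str                 # who is accountable for the commitments below
    schema: dict               # column name -> expected Python type
    max_staleness: timedelta   # freshness SLA
    max_null_rate: dict = field(default_factory=dict)  # column -> allowed null fraction
    unique: tuple = ()         # columns whose non-null values must be unique
    notice_days: int = 14      # advance notice promised for schema changes


def check(contract, rows, last_updated, now):
    """Return a list of violations; an empty list means the batch passes."""
    violations = []
    if now - last_updated > contract.max_staleness:
        violations.append("freshness SLA missed")
    for col, typ in contract.schema.items():
        values = [row.get(col) for row in rows]
        if any(v is not None and not isinstance(v, typ) for v in values):
            violations.append(f"{col}: wrong type")
        nulls = sum(v is None for v in values)
        allowed = contract.max_null_rate.get(col, 0.0)
        if rows and nulls / len(rows) > allowed:
            violations.append(f"{col}: null rate above {allowed}")
    for col in contract.unique:
        non_null = [row.get(col) for row in rows if row.get(col) is not None]
        if len(non_null) != len(set(non_null)):
            violations.append(f"{col}: duplicate values")
    return violations
```

The same structure serialises naturally to YAML or JSON, so the contract can live in version control next to the code that produces the asset.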
Access Governance Through Code
Access control managed through spreadsheets and email requests is not governed — it is administered. Access governance managed through infrastructure-as-code (role definitions in Terraform, column-level security policies in Snowflake or BigQuery, row-level security filters in the semantic layer) is auditable, reviewable, and enforceable. When a compliance audit asks who has access to which customer data, the answer is in the Git repository — not in a spreadsheet on someone's laptop.
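To make the audit scenario concrete, here is a small Python sketch in which roles, memberships, and column grants are plain data kept in version control, and the auditor's question becomes a function call. Every name in it (the roles, users, tables, and the `who_can_read` helper) is hypothetical. A real setup would express the same definitions in Terraform or warehouse policy statements rather than Python dictionaries.

```python
# Hypothetical role definitions, as they might be reviewed in a pull request.
ROLE_GRANTS = {
    "analyst": {"orders": {"order_id", "amount"}},
    "support": {"customers": {"customer_id", "email"}, "orders": {"order_id"}},
}
ROLE_MEMBERS = {
    "analyst": {"asha", "ben"},
    "support": {"ben", "chen"},
}
CUSTOMER_DATA_ASSETS = {"customers"}  # assets classified as customer data


def who_can_read(asset):
    """Map each user to the sorted list of columns they can read on `asset`."""
    access = {}
    for role, grants in ROLE_GRANTS.items():
        columns = grants.get(asset, set())
        if not columns:
            continue
        for user in ROLE_MEMBERS.get(role, set()):
            # A user's access is the union of grants across all their roles.
            access.setdefault(user, set()).update(columns)
    return {user: sorted(cols) for user, cols in sorted(access.items())}


def audit_report(assets):
    """Answer 'who has access to which of these assets?' from the definitions."""
    return {asset: who_can_read(asset) for asset in sorted(assets)}
```

Because the definitions are the source of truth, `audit_report(CUSTOMER_DATA_ASSETS)` gives the same answer as the warehouse enforces, and the Git history shows who approved each grant and when.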
Cataloguing What Actually Matters
Data catalogues fail when organisations try to catalogue everything before establishing why the catalogue exists. A catalogue containing 847 data assets with varying documentation completeness is less useful than one containing 50 critical assets — the ones feeding executive dashboards, the ones in scope for the compliance audit, the ones causing the most data quality incidents — with complete, accurate, current documentation.
The cataloguing discipline that delivers value: identify the 20% of data assets that drive 80% of analytical and compliance requirements, document those completely, and expand catalogue scope only once the documentation practice has proven it can stay current. Tools like DataHub (open source), Alation, or Collibra can support this, but the tool is not the governance. It is the container for governance decisions that need to be made before the tool is deployed.
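One way to draw up that first shortlist is to rank assets by how heavily they are used and keep the smallest set that covers most of the usage. The sketch below assumes query counts are a reasonable proxy for analytical importance, which is an assumption rather than anything the catalogue tools guarantee. Compliance-scoped assets would still need to be added by hand. The function name and the example figures are invented for illustration.

```python
def critical_assets(usage, coverage=0.8):
    """Smallest set of most-used assets whose usage reaches `coverage` of the total.

    `usage` maps asset name -> usage count (for example, queries per month).
    """
    total = sum(usage.values())
    if total == 0:
        return []
    selected, running = [], 0
    # Highest usage first; ties are broken alphabetically for stable output.
    for name, count in sorted(usage.items(), key=lambda kv: (-kv[1], kv[0])):
        if running >= coverage * total:
            break
        selected.append(name)
        running += count
    return selected


# Invented example figures: five assets with very uneven usage.
usage = {
    "revenue_daily": 500,
    "customers": 300,
    "orders": 120,
    "tmp_export": 50,
    "scratch": 30,
}
shortlist = critical_assets(usage)
```

In this invented example two of the five assets account for 80% of the queries, so those two are where complete documentation pays off first.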
Gartner's 2024 Chief Data Officer survey found that 74% of organisations reporting 'successful' data governance programmes had implemented automated governance controls — policy-as-code, automated quality gates, access control through infrastructure — rather than relying on policy documents and manual oversight.
The Lightweight Governance Stack
- Data contracts in YAML or JSON, version-controlled in Git, and validated automatically when a producer publishes a new version of a data asset. Tools like Soda Core, Great Expectations, or dbt tests can enforce the quality guarantees specified in the contract.
- Column-level access control in the warehouse: Snowflake, BigQuery, and Databricks all support column-level and row-level security. Define access policies in Terraform, version-control them, and enforce at the warehouse layer rather than relying on application-level access control.
- dbt for transformation governance: version-controlled transformation logic with automated testing means every change to how data is calculated is reviewed, tested, and documented before reaching production.
- A focused data catalogue for critical assets: DataHub (open source) scoped to the critical asset inventory rather than the full data estate.
- An on-call rotation for data contract violations: automated quality gates are only as effective as the response to the alerts they generate. A rotation that creates accountability for investigating and resolving violations closes the governance loop.
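The first item in the list, validating a contract when a producer publishes a new version, can include a check for changes that would break existing consumers. The Python below is a minimal sketch under two assumptions: a schema is a plain mapping from column name to type name, and adding a column is treated as non-breaking. The function name is hypothetical.

```python
def breaking_changes(old_schema, new_schema):
    """List changes that would break consumers relying on `old_schema`.

    Both arguments map column name -> type name. Added columns are treated
    as non-breaking (an assumption; some consumers may disagree).
    """
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"removed column: {column}")
        elif new_schema[column] != old_type:
            problems.append(
                f"type changed: {column} {old_type} -> {new_schema[column]}"
            )
    return problems


old = {"order_id": "int", "amount": "decimal", "email": "string"}
new = {"order_id": "string", "amount": "decimal", "channel": "string"}
problems = breaking_changes(old, new)
```

A CI job could run a check like this on every pull request that touches a contract file and block the merge until the promised notice period has been honoured, with anything that slips through going to the on-call rotation.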
Governance for Regulated Industries
Financial services and healthcare organisations face data governance requirements that go beyond operational best practice into legal obligation: UK GDPR and DPIA requirements for personal data processing, FCA reporting requirements for transaction data, HIPAA requirements for protected health information, and SOC 2 requirements for customer data handling. In these environments, governance-as-documentation is not just ineffective — it is a compliance liability.
The engineering-first governance model is particularly valuable in regulated industries because it produces artefacts that satisfy audit requirements: Git histories demonstrating that data access policies were reviewed and approved, automated quality gate logs demonstrating that data quality controls were operational, and infrastructure-as-code demonstrating that access controls were enforced rather than just documented.
GYSP's Data Engineering & Analytics practice designs governance frameworks that are operational rather than documentary — starting with data contracts and access governance, expanding to cataloguing and lineage as the practice matures, and implementing automated enforcement so that governance is a property of the platform rather than a burden on the people.
“A governance policy that lives in a document is a governance intention. A governance control that is enforced by the platform is governance. The distance between those two things is exactly the distance between the audit that goes well and the audit that does not.”
— Dhaval Rana, Founder & CEO — GYSP.tech
