Data Governance Without the Bureaucracy: The Lightweight Framework That Actually Gets Adopted

Dhaval Rana

Founder & CEO, GYSP.tech

1 March 2026 · 9 min read

What you'll take away

Why Governance Programmes Die in Committees
The Engineering-First Governance Model
The Lightweight Governance Stack
Validated Outcomes
Governance for Regulated Industries

The data governance programme had been running for eight months. A working group met monthly. A governance policy document was thirty-two pages long and had been approved by the executive team. A data catalogue had been purchased and a vendor had spent six weeks implementing it. The catalogue contained 847 data assets.

When the compliance audit came and asked which employees had access to which customer data assets, the answer was in neither the policy document nor the catalogue. The permissions were in a spreadsheet maintained by the database administrator.

This is what most data governance programmes actually produce: documentation of what good governance looks like, in a format that nobody consults when making the decisions that governance is supposed to govern. It is governance as performance — visible enough to satisfy a board question about data maturity, but not operational enough to change behaviour.

Why Governance Programmes Die in Committees

Traditional data governance programmes fail for a predictable set of reasons. They are designed by compliance and legal teams rather than by data engineers, so the outputs are policy documents rather than technical controls. They are operated by committees rather than platform teams, so decisions are slow and accountability is diffuse. And they attempt to govern everything at once, setting comprehensive standards for data quality, lineage, access control, retention, classification, and privacy in a single initiative, rather than first governing the one or two things that matter most.

The result is a programme that is permanently in design mode: forever defining standards, never enforcing them. A data governance programme that does not change the behaviour of data producers and consumers is not governance — it is documentation.

The Engineering-First Governance Model

Effective data governance starts from the engineering question: how do we make the governed behaviour the path of least resistance? Policy documents make the governed behaviour the expectation. Platform controls make it the requirement. The difference between a policy that says 'all data assets must have an owner' and a platform that requires ownership assignment before a data asset can be published is the difference between aspiration and governance.
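The difference between expectation and requirement can be sketched in a few lines. This is a minimal illustration, not any particular platform's API: `DataAsset`, `publish`, `PublishRejected`, and the in-memory `PUBLISHED` registry are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class PublishRejected(Exception):
    """Raised when an asset does not meet the platform's publishing rules."""


@dataclass
class DataAsset:
    name: str
    owner: Optional[str] = None  # hypothetical field, e.g. a team alias


# Hypothetical in-memory registry standing in for a real publishing platform.
PUBLISHED: Dict[str, DataAsset] = {}


def publish(asset: DataAsset) -> None:
    """Publish an asset only if an owner has been assigned."""
    if not asset.owner or not asset.owner.strip():
        raise PublishRejected(f"{asset.name}: assign an owner before publishing")
    PUBLISHED[asset.name] = asset


if __name__ == "__main__":
    publish(DataAsset(name="orders_daily", owner="payments-data"))
    try:
        publish(DataAsset(name="customers_raw"))  # no owner assigned
    except PublishRejected as err:
        print(f"rejected: {err}")
```

The policy sentence and the code say the same thing, but only the second can refuse an unowned asset.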

Data Contracts: Governance at the Source

A data contract is a formal agreement between a data producer (the team that owns and publishes a data asset) and its consumers (the teams that query it) about what the data contains, how reliable it is, and how changes will be communicated. Data contracts are the foundational governance primitive: they make implicit assumptions explicit, create accountability for data quality at the source rather than the destination, and provide the basis for automated quality monitoring.

A minimal data contract specifies: the schema and data types of the asset, the expected update frequency and freshness SLA, the quality guarantees (null rates, uniqueness constraints, referential integrity), the change notification process (how consumers will be informed of schema changes and how much advance notice they will receive), and the owner responsible for meeting these commitments. Teams that implement data contracts before any other governance capability find that most other governance problems become significantly more tractable.
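As a rough sketch, the elements listed above could be captured in a small structure with a completeness check. The field names, the `orders_daily` asset, and the `contract_problems` helper are assumptions made for illustration; they do not follow any specific data contract standard.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class DataContract:
    asset: str
    owner: str                       # team accountable for the commitments
    schema: Dict[str, str]           # column name -> data type
    update_frequency: str            # e.g. "daily"
    freshness_sla_hours: int
    max_null_rate: Dict[str, float]  # column -> highest acceptable null fraction
    unique_columns: List[str]
    references: Dict[str, str]       # column -> "asset.column" it must match
    change_notice_days: int          # advance notice for schema changes
    change_channel: str              # where consumers are notified


def contract_problems(c: DataContract) -> List[str]:
    """Return a list of reasons the contract is incomplete or inconsistent."""
    problems: List[str] = []
    if not c.owner.strip():
        problems.append("owner is missing")
    if not c.schema:
        problems.append("schema is empty")
    if c.freshness_sla_hours <= 0:
        problems.append("freshness SLA must be positive")
    if c.change_notice_days < 0:
        problems.append("change notice period cannot be negative")
    ruled = list(c.max_null_rate) + c.unique_columns + list(c.references)
    for col in sorted(set(ruled)):
        if col not in c.schema:
            problems.append(f"rule references unknown column: {col}")
    for col, rate in c.max_null_rate.items():
        if not 0.0 <= rate <= 1.0:
            problems.append(f"null rate for {col} must be between 0 and 1")
    return problems


# Hypothetical example contract.
ORDERS = DataContract(
    asset="orders_daily",
    owner="payments-data",
    schema={"order_id": "string", "customer_id": "string", "amount": "decimal"},
    update_frequency="daily",
    freshness_sla_hours=24,
    max_null_rate={"customer_id": 0.0, "amount": 0.01},
    unique_columns=["order_id"],
    references={"customer_id": "customers.customer_id"},
    change_notice_days=14,
    change_channel="#orders-data-changes",
)

# A deliberately broken variant: no owner, and a rule on a column not in the schema.
BROKEN = replace(ORDERS, owner="", unique_columns=["order_ref"])
```

Calling `contract_problems(ORDERS)` returns an empty list, while `contract_problems(BROKEN)` reports the missing owner and the unknown column. A check like this could run in CI whenever a producer changes a contract file.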

Access Governance Through Code

Access control managed through spreadsheets and email requests is not governed — it is administered. Access governance managed through infrastructure-as-code (role definitions in Terraform, column-level security policies in Snowflake or BigQuery, row-level security filters in the semantic layer) is auditable, reviewable, and enforceable. When a compliance audit asks who has access to which customer data, the answer is in the Git repository — not in a spreadsheet on someone's laptop.
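To show why access-as-code makes the audit question answerable, here is a small sketch that derives "who can read which customer data" from policy definitions. The role names, users, and columns are invented, and the dictionaries stand in for whatever version-controlled role definitions a team actually keeps; this is not Terraform or warehouse syntax.

```python
from typing import Dict, List, Set, Tuple

Column = Tuple[str, str]  # (table, column)

# Hypothetical policy data, standing in for role definitions kept in Git.
ROLE_GRANTS: Dict[str, Set[Column]] = {
    "analyst": {("orders", "order_id"), ("orders", "amount")},
    "support_agent": {("customers", "customer_id"), ("customers", "email")},
    "marketing": {("customers", "email")},
}
ROLE_MEMBERS: Dict[str, List[str]] = {
    "analyst": ["asha", "ben"],
    "support_agent": ["ben"],
    "marketing": ["chen"],
}
# Columns classified as customer data (an assumed classification).
CUSTOMER_DATA: Set[Column] = {("customers", "customer_id"), ("customers", "email")}


def who_can_read(columns: Set[Column]) -> Dict[Column, List[str]]:
    """Map each column to the sorted users whose roles grant access to it."""
    access: Dict[Column, Set[str]] = {col: set() for col in columns}
    for role, granted in ROLE_GRANTS.items():
        for col in granted & columns:
            access[col].update(ROLE_MEMBERS.get(role, []))
    return {col: sorted(users) for col, users in access.items()}


if __name__ == "__main__":
    for col, users in sorted(who_can_read(CUSTOMER_DATA).items()):
        print(col, users)
```

Because the inputs live in a repository, the same query can be run against any past commit to show what access looked like on a given date.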

Cataloguing What Actually Matters

Data catalogues fail when organisations try to catalogue everything before establishing why the catalogue exists. A catalogue containing 847 data assets with varying documentation completeness is less useful than one containing 50 critical assets — the ones feeding executive dashboards, the ones in scope for the compliance audit, the ones causing the most data quality incidents — with complete, accurate, current documentation.

The cataloguing discipline that delivers value is to identify the 20% of data assets that drive 80% of analytical and compliance requirements, document those completely, and expand catalogue scope only once the documentation practice is proven to stay current. Tools like DataHub (open source), Alation, or Collibra can support this, but the tool is not the governance; it is the container for governance decisions that need to be made before the tool is deployed.
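One way to make the selection of critical assets repeatable is a simple scoring pass over the asset inventory. The sketch below uses the three signals mentioned earlier (executive dashboards, audit scope, quality incidents); the weights, the `Asset` fields, and the 20% default are illustrative assumptions rather than a recommended formula.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Asset:
    name: str
    feeds_exec_dashboard: bool
    in_audit_scope: bool
    incidents_last_quarter: int


def criticality(a: Asset) -> int:
    """Score an asset; the weights are illustrative assumptions."""
    return 3 * a.feeds_exec_dashboard + 3 * a.in_audit_scope + a.incidents_last_quarter


def critical_assets(assets: List[Asset], share: float = 0.2) -> List[str]:
    """Return the names of the top `share` of assets by criticality."""
    if not assets:
        return []
    # round() guards against float noise before taking the ceiling.
    keep = max(1, math.ceil(round(len(assets) * share, 9)))
    ranked = sorted(assets, key=lambda a: (-criticality(a), a.name))
    return [a.name for a in ranked[:keep]]


# Hypothetical inventory.
INVENTORY = [
    Asset("exec_revenue", True, True, 2),
    Asset("customers", False, True, 4),
    Asset("orders", True, False, 0),
    Asset("clickstream", False, False, 1),
    Asset("scratch", False, False, 0),
]
```

With this inventory, `critical_assets(INVENTORY)` selects a single asset to document first, and raising `share` widens the scope as the documentation practice matures.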

Gartner's 2024 Chief Data Officer survey found that 74% of organisations reporting 'successful' data governance programmes had implemented automated governance controls — policy-as-code, automated quality gates, access control through infrastructure — rather than relying on policy documents and manual oversight.

The Lightweight Governance Stack

Data contracts in YAML or JSON, version-controlled in Git, and validated automatically when a producer publishes a new version of a data asset. Tools like Soda Core, Great Expectations, or dbt tests can enforce the quality guarantees specified in the contract.
Column-level access control in the warehouse: Snowflake, BigQuery, and Databricks all support column-level and row-level security. Define access policies in Terraform, version-control them, and enforce at the warehouse layer rather than relying on application-level access control.
dbt for transformation governance: version-controlled transformation logic with automated testing means every change to how data is calculated is reviewed, tested, and documented before reaching production.
A focused data catalogue for critical assets: DataHub (open source) scoped to the critical asset inventory rather than the full data estate.
An on-call rotation for data contract violations: automated quality gates are only as effective as the response to the alerts they generate. A rotation that creates accountability for investigating and resolving violations closes the governance loop.
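The quality gate at the centre of this stack can be sketched in plain Python. This is a stand-in to show the logic only; it does not use the Soda Core, Great Expectations, or dbt APIs, and the function name, parameters, and sample rows are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def check_batch(
    rows: List[Dict[str, Any]],
    max_null_rate: Dict[str, float],
    unique_columns: List[str],
    last_updated: datetime,
    freshness_sla_hours: int,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the contract violations found in one batch of rows."""
    now = now or datetime.now(timezone.utc)
    violations: List[str] = []
    total = len(rows)

    # Null-rate guarantees: share of rows where the column is missing or None.
    for col, limit in max_null_rate.items():
        nulls = sum(1 for r in rows if r.get(col) is None)
        rate = nulls / total if total else 0.0
        if rate > limit:
            violations.append(f"null rate for {col} is {rate:.2f}, limit {limit:.2f}")

    # Uniqueness guarantees: non-null values must not repeat.
    for col in unique_columns:
        values = [r.get(col) for r in rows if r.get(col) is not None]
        if len(values) != len(set(values)):
            violations.append(f"duplicate values in {col}")

    # Freshness SLA: the asset must have been updated recently enough.
    if now - last_updated > timedelta(hours=freshness_sla_hours):
        violations.append("freshness SLA missed")

    return violations


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    rows = [{"order_id": 1, "email": None}, {"order_id": 1, "email": "a@example.com"}]
    for v in check_batch(rows, {"email": 0.1}, ["order_id"], now - timedelta(hours=30), 24):
        print("violation:", v)  # in practice, route this to the on-call rotation
```

A non-empty result is what the on-call rotation exists to respond to: the gate detects the violation, and the rotation ensures someone owns the fix.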

Validated Outcomes

ING Bank's data governance transformation — documented in multiple data engineering conference presentations — is notable for the business-driven framing the bank used to justify the investment. Rather than presenting data governance as a compliance exercise, ING quantified the cost of governance failures: regulatory fines, data remediation costs, and the productivity loss from data quality incidents. Their documented finding: the annual cost of data governance failures was 5–7x the cost of the governance programme that would have prevented them. After implementing engineering-first data governance with automated quality gates and access controls, ING reduced data-related regulatory findings by over 60% over a three-year period.

GYSP's governance engagements for regulated industries begin with an audit-evidence gap analysis: mapping the governance controls that upcoming regulatory examinations will require against the controls that are currently in place and automated. In most cases, 40–60% of required controls are either absent or documentary (policy documents rather than enforced platform controls). GYSP's 90-day governance programme closes the highest-risk gaps first — PII access logging, data contract enforcement, and transformation lineage — producing a governance posture that satisfies the next audit cycle without the compliance sprint that typically precedes it.
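The shape of such a gap analysis can be illustrated with a short sketch. It is not GYSP's tooling: the control names beyond those mentioned above, the risk weights, and the three status labels (`enforced`, `documented`, `absent`) are invented for the example.

```python
from typing import Dict, List, Tuple


def gap_analysis(required: Dict[str, int], current: Dict[str, str]) -> Tuple[List[str], float]:
    """Return (gaps ordered by risk, share of required controls that are gaps).

    A control counts as a gap unless its status is "enforced"; controls
    missing from `current` are treated as "absent".
    """
    gaps = [c for c in required if current.get(c, "absent") != "enforced"]
    gaps.sort(key=lambda c: (-required[c], c))  # highest risk first, then by name
    share = len(gaps) / len(required) if required else 0.0
    return gaps, share


# Hypothetical inputs: control -> risk weight, and control -> current status.
REQUIRED = {
    "pii_access_logging": 3,
    "data_contract_enforcement": 3,
    "transformation_lineage": 2,
    "access_review": 2,
    "retention_policy": 1,
}
CURRENT = {
    "pii_access_logging": "absent",
    "data_contract_enforcement": "documented",  # policy exists, nothing enforces it
    "transformation_lineage": "enforced",
    "retention_policy": "enforced",
}

if __name__ == "__main__":
    gaps, share = gap_analysis(REQUIRED, CURRENT)
    print(gaps, f"{share:.0%} of required controls are gaps")
```

Treating a documented-only control as a gap is the key design choice: it encodes the article's distinction between a policy and an enforced platform control.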

Governance for Regulated Industries

Financial services and healthcare organisations face data governance requirements that go beyond operational best practice into legal obligation: UK GDPR and DPIA requirements for personal data processing, FCA reporting requirements for transaction data, HIPAA requirements for protected health information, and SOC 2 requirements for customer data handling. In these environments, governance-as-documentation is not just ineffective — it is a compliance liability.

The engineering-first governance model is particularly valuable in regulated industries because it produces artefacts that satisfy audit requirements: Git histories demonstrating that data access policies were reviewed and approved, automated quality gate logs demonstrating that data quality controls were operational, and infrastructure-as-code demonstrating that access controls were enforced rather than just documented.

GYSP's Data Engineering & Analytics practice designs governance frameworks that are operational rather than documentary: starting with data contracts and access governance, expanding to cataloguing and lineage as the practice matures, and implementing automated enforcement so that governance is a property of the platform rather than a burden on people.

“A governance policy that lives in a document is a governance intention. A governance control that is enforced by the platform is governance. The distance between those two things is exactly the distance between the audit that goes well and the audit that does not.”
— Dhaval Rana, Founder & CEO — GYSP.tech
