The meeting is fifteen minutes in when the CFO says it: 'These numbers don't match what I'm seeing in the other report.' The room goes quiet. The analyst who built the dashboard looks at the screen. The analyst who built the other report opens their laptop. Twenty minutes later, someone finds the discrepancy: two different SQL queries are calculating 'monthly recurring revenue' using slightly different logic — one includes trial accounts, one does not.
This is not an edge case. It is the most common data crisis in organisations that have grown their reporting capabilities faster than their data transformation discipline. The problem is not that the data is wrong. The problem is that nobody owns the definition of 'monthly recurring revenue' at the transformation layer — so every analyst defines it themselves, and the definitions diverge.
Analytics engineering is the discipline that fixes this problem structurally rather than one incident at a time. And dbt is the tool that made analytics engineering mainstream.
The Transformation Problem
Every organisation that runs a data warehouse has a transformation problem. Raw data from source systems — CRMs, ERPs, transactional databases, third-party APIs — arrives in the warehouse in the schema and format of the source, not the format that makes business sense for analysis. Transforming raw source data into business-ready models requires SQL logic — and that SQL logic is the source of the trust problem.
In organisations without an analytics engineering practice, transformation logic lives in multiple places simultaneously: in BI tool calculated fields, in analyst SQL queries saved in Looker or Tableau, in Excel formulas in spreadsheets sent by email, and in stored procedures in the data warehouse written years ago by someone who is no longer with the company. When the same calculation lives in six different places with six slightly different implementations, 'which number is right?' becomes unanswerable.
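To make the divergence concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the warehouse. The `subscriptions` table, its columns, and the amounts are illustrative assumptions, not taken from any real system. Two queries that both claim to compute monthly recurring revenue return different totals because only one of them filters out trial accounts.

```python
import sqlite3

# In-memory database standing in for the warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (
        account_id     INTEGER PRIMARY KEY,
        monthly_amount REAL    NOT NULL,
        is_trial       INTEGER NOT NULL  -- 1 = trial account, 0 = paying
    );
    INSERT INTO subscriptions VALUES
        (1, 100.0, 0),
        (2, 250.0, 0),
        (3,  50.0, 1);
""")

# The dashboard's query: sums every subscription, trials included.
mrr_dashboard = conn.execute(
    "SELECT SUM(monthly_amount) FROM subscriptions"
).fetchone()[0]

# The other report's query: excludes trial accounts.
mrr_report = conn.execute(
    "SELECT SUM(monthly_amount) FROM subscriptions WHERE is_trial = 0"
).fetchone()[0]

print(mrr_dashboard, mrr_report)  # 400.0 350.0
```

Neither query is wrong in isolation. The discrepancy exists only because the definition lives in two places.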
What Analytics Engineering Changes
Analytics engineering moves the transformation logic from BI tools and ad-hoc queries into a version-controlled, tested, documented transformation layer — typically built using dbt (data build tool). The transformation layer becomes the single source of truth for how raw data is translated into business metrics, and every report, dashboard, and analysis is built on the same tested models rather than on independent implementations.
One Definition, Everywhere
When 'monthly recurring revenue' is defined once, in a dbt model, with the business logic documented in a schema.yml file and the definition accessible to every analyst, there is no disagreement about what the number means. Analysts build on the definition rather than reimplementing it. When the definition needs to change (for example, trials are now excluded from MRR), it changes in one place and propagates to every downstream report the next time the models are rebuilt.
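The idea can be sketched without dbt itself. In the example below, a SQLite view plays the role of the shared model, and two hypothetical consumers (`total_mrr` for a dashboard, `mrr_by_region` for a report) read from it. All table, view, and function names are assumptions made for illustration. Changing the definition in one place changes both consumers consistently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (
        account_id     INTEGER PRIMARY KEY,
        region         TEXT    NOT NULL,
        monthly_amount REAL    NOT NULL,
        is_trial       INTEGER NOT NULL
    );
    INSERT INTO subscriptions VALUES
        (1, 'EMEA', 100.0, 0),
        (2, 'APAC', 250.0, 0),
        (3, 'EMEA',  50.0, 1);
""")

def define_mrr_model(conn, exclude_trials):
    """(Re)create the single shared MRR definition as a view."""
    where = "WHERE is_trial = 0" if exclude_trials else ""
    conn.executescript(f"""
        DROP VIEW IF EXISTS mrr_by_account;
        CREATE VIEW mrr_by_account AS
        SELECT account_id, region, monthly_amount AS mrr
        FROM subscriptions {where};
    """)

def total_mrr(conn):
    # Hypothetical dashboard consumer: reads the shared model only.
    return conn.execute("SELECT SUM(mrr) FROM mrr_by_account").fetchone()[0]

def mrr_by_region(conn):
    # Hypothetical report consumer: also reads the shared model only.
    rows = conn.execute(
        "SELECT region, SUM(mrr) FROM mrr_by_account GROUP BY region"
    ).fetchall()
    return dict(rows)

define_mrr_model(conn, exclude_trials=False)
before = (total_mrr(conn), mrr_by_region(conn))

# The business decides trials no longer count: one change, one place.
define_mrr_model(conn, exclude_trials=True)
after = (total_mrr(conn), mrr_by_region(conn))

print(before)
print(after)
```

In both states the regional figures sum to the total, because neither consumer carries its own copy of the filter logic.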
Testing That Transformations Are Correct
dbt enables automated testing of transformation logic: every model can have assertions attached — 'this column should never be null', 'these values should be unique', 'this foreign key should always have a match in the reference table'. These tests run on every deployment, catching transformation errors before they reach a business consumer. This is the transformation-layer equivalent of unit testing in software engineering — and it is almost entirely absent in organisations that have not adopted analytics engineering practices.
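The three assertions quoted above correspond to dbt's built-in `not_null`, `unique`, and `relationships` tests. A dbt test is essentially a query that selects failing rows, and it passes when no rows come back. The sketch below imitates that pattern in Python with sqlite3. The tables, the deliberately bad rows, and the `run_checks` helper are illustrative assumptions, not dbt's own implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_id INTEGER, name TEXT);
    CREATE TABLE invoices (invoice_id INTEGER, account_id INTEGER, amount REAL);
    INSERT INTO accounts VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO invoices VALUES
        (10, 1, 100.0),
        (11, 2, NULL),   -- violates the not-null check on amount
        (11, 1, 75.0),   -- duplicates invoice_id 11
        (12, 9, 20.0);   -- account 9 does not exist in accounts
""")

# Each check is a query returning the rows that FAIL the assertion.
CHECKS = {
    "amount_not_null":
        "SELECT invoice_id FROM invoices WHERE amount IS NULL",
    "invoice_id_unique":
        """SELECT invoice_id FROM invoices
           GROUP BY invoice_id HAVING COUNT(*) > 1""",
    "account_id_has_match":
        """SELECT i.invoice_id FROM invoices AS i
           LEFT JOIN accounts AS a ON a.account_id = i.account_id
           WHERE a.account_id IS NULL""",
}

def run_checks(conn, checks):
    """Return the number of failing rows per check; 0 means the check passes."""
    return {name: len(conn.execute(sql).fetchall())
            for name, sql in checks.items()}

results = run_checks(conn, CHECKS)
failed = sorted(name for name, count in results.items() if count > 0)
print(results)
print("failed:", failed)
```

Wiring a runner like this into a deployment step, and blocking the release when `failed` is non-empty, is what stops a bad transformation from reaching a business consumer.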
Documentation That Stays Current
dbt generates documentation automatically from the model code and schema definitions, producing a searchable data catalogue that shows how every model is defined, what its columns mean, and where the data comes from. Because the documentation is generated from the code rather than written separately, it stays current with the actual transformation logic — eliminating the stale documentation problem that makes manually maintained data dictionaries useless within months of creation.
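The principle of documentation derived from definitions can be illustrated with a small sketch. The `MODELS` dictionary below is a hypothetical stand-in for the metadata a project keeps alongside its models. Its shape, and the `render_catalogue` and `search_catalogue` helpers, are assumptions for illustration and do not reflect dbt's internal format.

```python
# Hypothetical model metadata kept next to the transformation code.
MODELS = {
    "mrr_by_account": {
        "description": "Monthly recurring revenue per paying account.",
        "depends_on": ["stg_subscriptions"],
        "columns": {
            "account_id": "Unique identifier of the account.",
            "mrr": "Monthly recurring revenue, trials excluded.",
        },
    },
    "stg_subscriptions": {
        "description": "Cleaned subscription records from the billing system.",
        "depends_on": [],
        "columns": {"account_id": "Unique identifier of the account."},
    },
}

def render_catalogue(models):
    """Render one plain-text catalogue entry per model, sorted by name."""
    lines = []
    for name in sorted(models):
        meta = models[name]
        sources = ", ".join(meta["depends_on"]) or "(raw source)"
        lines.append(f"{name}: {meta['description']}")
        lines.append(f"  built from: {sources}")
        for column, description in meta["columns"].items():
            lines.append(f"  - {column}: {description}")
    return "\n".join(lines)

def search_catalogue(models, term):
    """Return names of models whose description or column docs mention term."""
    term = term.lower()
    hits = []
    for name, meta in models.items():
        texts = [meta["description"], *meta["columns"].values()]
        if any(term in text.lower() for text in texts):
            hits.append(name)
    return sorted(hits)

catalogue = render_catalogue(MODELS)
print(catalogue)
print(search_catalogue(MODELS, "trials"))
```

Because the catalogue is rebuilt from the same metadata on every run, editing a description or a dependency is the only step needed to update the documentation.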
Organisations that adopt analytics engineering practices with dbt report 60–70% reductions in time spent on data discrepancy investigations — questions that previously required an analyst to trace through multiple implementations are answered by reading the documented model definition.
The Semantic Layer: One Metric, One Definition
The semantic layer takes analytics engineering a step further: instead of just version-controlling the transformation SQL, it creates a metric store where business metrics are defined once, in language that non-technical stakeholders can understand, and consumed by any BI tool that connects to the semantic layer.
Tools like dbt Semantic Layer (MetricFlow), Cube, and AtScale sit between the data warehouse and the BI layer, translating business metric definitions into warehouse-optimised SQL at query time. When 'monthly recurring revenue' is defined in the semantic layer, it is consistently calculated regardless of whether the query comes from Tableau, Power BI, Looker, or a direct API call from an internal application.
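A rough sketch of what "translating a metric definition into SQL at query time" means is shown below. The `Metric` dataclass, the `METRICS` registry, and `compile_metric` are hypothetical and far simpler than MetricFlow, Cube, or AtScale. The sketch only shows that every consumer asks for a metric by name and receives SQL generated from one definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    """A hypothetical, minimal metric definition."""
    name: str
    table: str
    expression: str          # SQL aggregate expression
    filters: tuple = ()      # SQL predicates that always apply

METRICS = {
    "monthly_recurring_revenue": Metric(
        name="monthly_recurring_revenue",
        table="subscriptions",
        expression="SUM(monthly_amount)",
        filters=("is_trial = 0",),
    ),
}

def compile_metric(metric_name, group_by=()):
    """Generate SQL for a named metric, optionally grouped by dimensions."""
    metric = METRICS[metric_name]  # KeyError if the metric is not defined
    select = list(group_by) + [f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(select)} FROM {metric.table}"
    if metric.filters:
        sql += " WHERE " + " AND ".join(metric.filters)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql

# A dashboard and an internal application request the same metric.
print(compile_metric("monthly_recurring_revenue"))
print(compile_metric("monthly_recurring_revenue", group_by=("region",)))
```

The trial filter appears in both generated queries without either caller specifying it, which is the consistency property the semantic layer exists to provide.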
The Analytics Engineering Stack
- dbt Core or dbt Cloud: the transformation framework — SQL-based model definitions, testing, documentation, and lineage. dbt Cloud adds a managed scheduler, IDE, and CI/CD integration.
- A cloud data warehouse: Snowflake, BigQuery, Databricks, or Redshift — the compute layer where transformations execute.
- A version control system: Git for the dbt project — every model change is tracked, reviewed, and reversible.
- A BI tool that connects to the transformation layer output: Looker, Tableau, Power BI, or Metabase — consuming the tested, documented models rather than raw source tables.
- An orchestration tool: Airflow, Prefect, or Dagster — scheduling dbt runs and managing dependencies between models and upstream pipeline stages.
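Managing dependencies between models comes down to running them in an order where every model is built after the models it reads from. The sketch below shows that ordering step with Python's standard-library `graphlib` (Python 3.9+). The model names and the dependency map are illustrative assumptions, and real orchestrators add scheduling, retries, and parallelism on top.

```python
from graphlib import TopologicalSorter

# Hypothetical lineage: each model maps to the set of models it depends on.
DEPENDENCIES = {
    "stg_subscriptions": set(),
    "stg_accounts": set(),
    "mrr_by_account": {"stg_subscriptions", "stg_accounts"},
    "mrr_by_region": {"mrr_by_account"},
}

# static_order() yields each model only after all of its dependencies.
run_order = list(TopologicalSorter(DEPENDENCIES).static_order())

for step, model in enumerate(run_order, start=1):
    print(step, model)
```

The relative order of the two staging models is not guaranteed, since neither depends on the other. Only the constraint that upstream models come first is.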
The Analytics Engineer Role
Analytics engineering requires a role that sits between data engineering and data analysis — technically fluent enough to write production-grade SQL and manage a dbt codebase, but oriented toward business value rather than infrastructure. The analytics engineer translates business metric definitions into dbt models, reviews analyst SQL for consistency with established transformation patterns, and owns the semantic layer that every BI tool queries.
Most organisations that adopt analytics engineering practices discover they have analyst SQL in production that should be formalised into dbt models, transformation logic in BI tools that should be moved upstream, and metric definitions that exist in spreadsheets and emails but nowhere in the data infrastructure. The analytics engineer's first six months are typically spent on this formalisation — with the payoff being a data environment where 'which number is right?' stops being a recurring meeting agenda item.
The 2026 Layer: Natural Language Querying and AI-Generated Narrative
Analytics engineering solves the trust problem. A mature semantic layer solves the consistency problem. Neither solves the access problem: most business stakeholders still cannot query the data warehouse directly, and their interaction with data remains mediated by a dashboard someone else built, answering questions someone else decided were worth answering.
Natural language querying (NLQ) addresses this directly. A semantic layer with well-defined metrics and documented business logic can serve as the foundation for an AI query interface: a stakeholder types 'what was our average revenue per account by region last quarter, excluding trial accounts?' and receives a data-backed answer in seconds — without opening Looker, without finding the right dashboard, and without emailing an analyst.
Tools like Cube's AI layer, dbt's natural language querying integration, and commercial BI tools' AI-generated narrative features (Tableau Pulse, Power BI Copilot, ThoughtSpot Sage) are now production-ready for organisations with a clean semantic layer underneath them. The quality of the AI query response is determined entirely by the quality of the semantic layer: a poorly defined metric produces a confidently incorrect AI answer that is harder to catch than a confidently incorrect dashboard number.
Natural language querying built on an undefined semantic layer is a trust problem generator, not a trust problem solver. An AI that confidently answers 'what was our MRR last month?' using its own interpretation of MRR — when your semantic layer doesn't define MRR — produces the exact same 'which number is right?' crisis, just with an AI's authority behind the wrong answer.
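One way to guard against this failure mode, sketched here under stated assumptions, is to have the query interface refuse to answer for any metric the semantic layer does not define, rather than letting a model improvise a definition. `DEFINED_METRICS`, `UndefinedMetricError`, and `resolve_metric` are hypothetical names, and the step that maps a natural language question to a metric name is out of scope.

```python
# Hypothetical registry: metric name -> governed SQL from the semantic layer.
DEFINED_METRICS = {
    "mrr": "SELECT SUM(monthly_amount) FROM subscriptions WHERE is_trial = 0",
}

class UndefinedMetricError(LookupError):
    """Raised when a question refers to a metric with no governed definition."""

def resolve_metric(requested):
    """Return the governed SQL for a metric, or refuse if none exists."""
    key = requested.strip().lower()
    if key not in DEFINED_METRICS:
        raise UndefinedMetricError(
            f"'{requested}' is not defined in the semantic layer; "
            "refusing to guess a definition."
        )
    return DEFINED_METRICS[key]

print(resolve_metric("MRR"))

try:
    resolve_metric("churn rate")
except UndefinedMetricError as error:
    print(error)
```

An explicit refusal is less impressive than a fluent answer, but it sends the stakeholder to whoever owns the definition instead of producing a number nobody can vouch for.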
AI-Generated Narrative Insights
The next layer above NLQ is proactive AI-generated narrative: instead of waiting for a stakeholder to ask a question, the system monitors metric trends and proactively surfaces notable changes with explanations. 'Weekly active users dropped 12% week-on-week. The decline is concentrated in the enterprise segment, specifically accounts provisioned before Q3 2025, and correlates with the reduction in push notification frequency that took effect Monday.' This is the beginning of data interaction that does not require a dashboard at all.
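The arithmetic behind the first part of such a narrative is simple, as the sketch below suggests. The `describe_change` function and the segment counts are invented for illustration, and the function assumes a non-zero previous total. It covers only the size of the change and the segment that contributed most. The causal explanation in the quoted example (the link to notification frequency) would need additional data and is not attempted here.

```python
def describe_change(metric_label, previous, current):
    """Summarise a week-on-week change and the segment that drove most of it.

    previous and current map segment name -> metric value for each week.
    """
    prev_total = sum(previous.values())
    curr_total = sum(current.values())
    delta = curr_total - prev_total
    if delta == 0:
        return f"{metric_label} were flat week-on-week."

    pct = abs(delta) * 100 / prev_total
    verb = "dropped" if delta < 0 else "rose"

    segments = set(previous) | set(current)
    seg_deltas = {s: current.get(s, 0) - previous.get(s, 0) for s in segments}
    # Pick the segment that moved furthest in the direction of the total.
    pick = min if delta < 0 else max
    driver = pick(seg_deltas, key=seg_deltas.get)
    share = seg_deltas[driver] * 100 / delta

    return (
        f"{metric_label} {verb} {round(pct)}% week-on-week. "
        f"The {driver} segment accounts for {round(share)}% of the change."
    )

# Invented weekly active user counts per segment.
last_week = {"enterprise": 600, "smb": 400}
this_week = {"enterprise": 490, "smb": 390}

summary = describe_change("Weekly active users", last_week, this_week)
print(summary)
```

With these inputs the sketch prints "Weekly active users dropped 12% week-on-week. The enterprise segment accounts for 92% of the change." The sentence is only as reliable as the metric and segment definitions feeding it.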
Implementing this layer effectively requires the same prerequisite: a clean, tested, documented transformation foundation. AI narrative generation fed from untested, undocumented metric definitions produces narrative that sounds authoritative and is unreliable. The analytics engineering investment is not just a prerequisite for dashboard trust — it is the prerequisite for the AI data interaction layer that is becoming the enterprise standard.
GYSP's Data Engineering & Analytics practice designs and implements analytics engineering stacks for organisations that have outgrown their ad-hoc transformation layer — building the dbt foundation, migrating existing transformation logic, establishing testing and documentation practices, and where appropriate implementing the semantic layer and NLQ interface that makes data accessible to non-technical stakeholders without sacrificing the consistency that makes it trustworthy.
“A BI dashboard is only as trustworthy as the transformation layer underneath it. An AI that answers data questions is only as trustworthy as the semantic layer it draws from. The investment in analytics engineering is not a prerequisite for better dashboards — it is a prerequisite for every data interaction model that comes after dashboards.”
— Ankush, Chief Technology Officer — GYSP.tech
