Data Engineering & AnalyticsSnowflakeData WarehouseFinOpsQuery OptimisationCloud Data

The 1,000 SQL Query: Why Your Snowflake Bill Is Spiralling

Ankush

Chief Technology Officer, GYSP.tech

15 April 20259 min read

What you'll take away

The Three Root Causes of Snowflake Cost Spirals
The Workload Patterns That Destroy Snowflake Economics
The Optimisation Levers That Matter Most
Validated Outcomes
The Governance Dimension

The Snowflake migration was a success. Query performance improved dramatically. The data team loves the separation of storage and compute. The business intelligence team can now run workloads in parallel without fighting for resources. Then the first full month's invoice arrives and someone from Finance appears at the data team's Slack channel asking what happened to the data budget.

Snowflake's credit-based pricing model is well-designed and genuinely transparent — but it exposes query patterns and configuration decisions that were harmless or invisible in an on-premise environment. A query that ran for three minutes on a dedicated cluster becomes a three-minute charge on a Snowflake warehouse. Run that query a thousand times per day and you have a billing line item that didn't exist before.

The Three Root Causes of Snowflake Cost Spirals

1. Full Table Scans on Clustered Data

Snowflake uses micro-partitioning — automatically dividing tables into compressed, immutable storage partitions — and clustering keys to determine which partitions a query needs to scan. When a query filters on a column that isn't part of the clustering key, Snowflake must scan all partitions to find matching rows. On large tables, this can be hundreds of times more expensive than a well-clustered scan. Teams that migrate queries from on-premise environments without reviewing clustering keys often discover that their most expensive queries are performing full-table scans on billion-row tables.

2. Warehouse Sizing Carried Over from On-Premise

On-premise, warehouse size was a fixed capital expense — you bought what you could afford and used it constantly. In Snowflake, warehouse size determines compute per second, and a larger warehouse runs the same query faster but costs more per second. The optimal sizing depends on the query pattern: for interactive queries where latency matters, a larger warehouse may be justified. For batch ETL jobs running overnight, a smaller warehouse that takes twice as long is often half the cost.

3. Excessive Auto-Suspension Delays

Snowflake warehouses can be configured to auto-suspend after a period of inactivity. The default in many configurations is ten minutes. If your data team runs a query, reads the results for eight minutes, then runs another query, the warehouse may never suspend — and you're billed for the full idle period between queries. Setting aggressive auto-suspension (sixty seconds for interactive workloads, even shorter for batch) and auto-resumption can eliminate a significant fraction of credits consumed by idle compute.

The most common Snowflake cost issue we diagnose: a warehouse set to XL or 2XL for a migration 'just to make sure it's fast enough' that was never resized after the migration. Running production workloads on a 2XL warehouse that a Medium would handle adequately costs four times more per credit.

The Workload Patterns That Destroy Snowflake Economics

Exploratory queries on large tables without partition pruning: Data analysts running ad-hoc queries that scan entire fact tables because the filter column isn't in the clustering key
High-cardinality GROUP BY without pre-aggregation: Queries that aggregate billions of rows to produce a small summary that could have been pre-materialised in a downstream table
Repeated identical queries without result caching: Applications that run the same query dozens of times per minute instead of caching results at the application layer
Multi-statement transactions on large tables: Transactional patterns from OLTP systems that are antipatterns in a columnar warehouse
Unconstrained third-party BI tool queries: Self-service BI tools generating inefficient SQL against large tables without query governance

The Optimisation Levers That Matter Most

Is your data stack slowing down your AI?

48-hour turnaround. No obligation.

Request Data Assessment

The highest-ROI Snowflake optimisation levers, in order of typical impact: First, review and optimise clustering keys on your largest, most-queried tables — correct clustering can reduce scan costs by 80% or more on tables with clear time or category-based access patterns. Second, right-size warehouses by workload type: create separate, correctly-sized warehouses for interactive, ETL, and data science workloads rather than sharing a single large warehouse across all patterns. Third, set aggressive auto-suspension and use scheduled scaling to pre-warm warehouses before known peak periods rather than leaving them running idle.

Fourth, implement result cache discipline — understand which of your queries are hitting Snowflake's result cache (free) versus computing results (charged) and architect your application and BI tool query patterns to maximise cache hits. Fifth, consider materialised views and pre-aggregated tables for commonly-run expensive queries that always produce the same result on the same data.

Validated Outcomes

DoorDash's data engineering team published a detailed case study in 2023 documenting a Snowflake cost optimisation initiative that reduced their monthly Snowflake spend by approximately 35%. The two largest contributors were warehouse right-sizing — identifying warehouses consistently running at low utilisation and either downsizing or consolidating them — and query optimisation, where 10% of queries accounted for over 70% of credit consumption. DoorDash's approach was entirely analytical: no data was moved, no architecture was restructured. The savings came from visibility and targeted intervention that the cost data, once organised, made obvious.

GYSP's Snowflake optimisation engagements follow a consistent methodology: credit consumption audit by warehouse and query pattern, identification of the top 20 queries by credit cost, warehouse sizing review, and Resource Monitor implementation. The 40–60% reduction in credit consumption GYSP regularly delivers in 90-day engagements is achievable because Snowflake environments accumulate inefficiency predictably — and the same optimisation categories surface in almost every environment that has not had a structured review.

The Governance Dimension

Technical optimisation alone is insufficient if the organisation doesn't have visibility into which teams and workloads are consuming credits. Snowflake's Resource Monitors allow you to set credit quotas by warehouse and trigger alerts or suspensions when thresholds are reached. Without this governance layer, a single runaway query or a newly-onboarded team with poor query habits can wipe out a month's optimisation savings in a day.

GYSP's Data Engineering & Analytics practice has optimised Snowflake environments for mid-market and enterprise clients across financial services, e-commerce, and healthcare. The typical outcome of a structured optimisation engagement: 40–60% reduction in credit consumption within ninety days, with sustainable governance controls that prevent regression.

“Snowflake is a phenomenal platform for analytics — but it doesn't optimise itself. Teams that migrated from on-premise warehouses without rethinking their query patterns are paying for their old infrastructure's limitations at cloud pricing.”
— Ankush, Chief Technology Officer — GYSP.tech

ShareLinkedIn Twitter / X

In this article

Is your data stack slowing down your AI?

Get a free data readiness assessment — we diagnose your pipeline, governance, and transformation layer and identify what needs to change.

60–70%

less time on data discrepancy investigations

after analytics engineering with dbt and a defined semantic layer — one definition, everywhere

Request Data Assessment

4.7 on Clutch · 31 reviews

Or call: +1 (929) 588-8364

About the Author

Ankush

Chief Technology Officer, GYSP.tech

Related Services

Ready to act on this?

Is your data stack slowing down your AI?

Get a free data readiness assessment — we diagnose your pipeline, governance, and transformation layer and identify what needs to change.

2×

Faster decision-making

60%

Faster feature rollouts

Zero

Data mismatches at reconciliation

Request Data Assessment

48-hour turnaround · No obligation · Senior engineers only

Get new Data Engineering & Analytics insights in your inbox

Practical, no-fluff articles for engineers and technology leaders. New pieces delivered as they're published.

No spam. Unsubscribe any time.

The 1,000 SQL Query: Why Your Snowflake Bill Is Spiralling

The Three Root Causes of Snowflake Cost Spirals

1. Full Table Scans on Clustered Data

2. Warehouse Sizing Carried Over from On-Premise

3. Excessive Auto-Suspension Delays

The Workload Patterns That Destroy Snowflake Economics

The Optimisation Levers That Matter Most

Validated Outcomes

The Governance Dimension

Is your data stack slowing down your AI?

Get new Data Engineering & Analytics insights in your inbox

More from the Blog

Your Data Warehouse Is Not Ready for AI. Your Data Team Probably Knows It.

Why Your Data Pipeline Keeps Breaking Your AI

Stop Looking at the Cloud Bill: Why Unit Economics Is the Only Metric That Matters