What you'll take away
The batch pipeline was a reasonable engineering decision when it was built. Processing yesterday's transactions overnight, loading the results by 06:00, and making them available for the morning dashboard was a workable pattern when real-time compute was prohibitive and the business did not need sub-hour data freshness to operate.
That context has changed. The compute cost of real-time streaming has dropped dramatically — Apache Kafka runs on commodity infrastructure; managed services like Confluent, AWS Kinesis, and Google Pub/Sub have made streaming operationally accessible to teams without infrastructure specialisation. And the business cases that once tolerated T+1 data — fraud detection, personalisation, dynamic pricing, operational alerting — have evolved into cases where T+1 is not just inconvenient but structurally inadequate.
The question for most organisations is not whether real-time data is strategically valuable. It clearly is. The question is which workloads to prioritise and what the realistic engineering investment looks like.
The Business Cost of Batch Latency
Batch processing latency is not simply a technical inefficiency — it creates measurable business costs that compound over time as competitive environments move faster.
Fraud and Risk Decisions
Fraud detection that runs on yesterday's data catches yesterday's fraud patterns. A fraudster who tests a card with a £1 transaction at midnight and charges £4,000 at 01:00 escapes a batch risk model entirely — the test transaction is not visible to the model until the following morning's batch, by which point the damage is done. Real-time fraud scoring, running on streaming transaction data, catches the test-and-exploit pattern within seconds. McKinsey's 2024 analysis of financial services firms found that those operating with real-time transaction data reduced fraud losses by an average of 35% compared to peers running batch risk models.
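The test-and-exploit pattern described above can be sketched as a small stateful check applied to each transaction as it arrives. This is an illustrative sketch only: the `Txn` and `TestAndExploitDetector` names, the thresholds, and the two-hour look-back window are assumptions made for the example, not values taken from any production risk model.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    card_id: str
    amount: float  # pounds
    ts: float      # event time, seconds


class TestAndExploitDetector:
    """Flags a large charge that follows a tiny 'test' charge on the same card."""

    def __init__(self, test_max=2.0, exploit_min=1000.0, window_s=2 * 3600):
        # All three thresholds are hypothetical defaults for illustration.
        self.test_max = test_max
        self.exploit_min = exploit_min
        self.window_s = window_s
        self._recent_tests = defaultdict(deque)  # card_id -> test-charge timestamps

    def score(self, txn: Txn) -> bool:
        tests = self._recent_tests[txn.card_id]
        # Forget test charges that have aged out of the look-back window.
        while tests and txn.ts - tests[0] > self.window_s:
            tests.popleft()
        if txn.amount <= self.test_max:
            tests.append(txn.ts)  # remember the small charge, do not flag it
            return False
        # Flag only a large charge preceded by a recent small one.
        return txn.amount >= self.exploit_min and bool(tests)


detector = TestAndExploitDetector()
midnight_test = detector.score(Txn("card-1", 1.0, 0))
one_am_charge = detector.score(Txn("card-1", 4000.0, 3600))
```

Because the state is updated per event, the £4,000 charge is scored against the £1 test an hour earlier rather than waiting for the next morning's batch.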
Inventory and Operational Decisions
A retailer whose inventory system updates overnight makes replenishment decisions on data that is up to 24 hours stale. A product that sold out at 14:00 on Tuesday still shows as available until the overnight batch runs. Every customer who sees that product as available and places an order creates a fulfilment failure, and each failure carries the cost of the failed order, the customer service interaction, and the logistics of the return.
Personalisation and Recommendation
A personalisation engine that updates user preference models in an overnight batch shows users recommendations based on yesterday's behaviour. A user who browses running shoes extensively on Monday morning and returns on Monday afternoon gets recommendations built on Sunday's session. Real-time behavioural streaming cuts the personalisation latency from hours to seconds, consistently producing a 10–20% uplift in recommendation relevance and click-through rates across e-commerce implementations.
Confluent's 2024 Data Streaming Report found that 84% of organisations running real-time streaming infrastructure reported measurable competitive advantages over peers still operating on batch pipelines — with the most significant advantages in fraud detection, personalisation, and operational alerting.
The Streaming Architecture — Core Components
The Event Broker
Apache Kafka is the de facto standard for high-throughput event streaming. It provides the durable, ordered, replayable event log that downstream consumers process at their own pace — decoupling producers from consumers and providing the fault tolerance that production streaming systems require. Confluent Cloud, AWS MSK, and Azure Event Hubs provide managed Kafka-compatible services that remove the operational burden of self-hosted Kafka. For lower-throughput use cases, AWS Kinesis or Google Pub/Sub provide simpler managed alternatives with less operational overhead.
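The "durable, ordered, replayable log" idea can be illustrated with a toy in-memory model. This is not Kafka's API: `EventLog`, `poll`, and `seek` are hypothetical names chosen for the sketch, and a real broker adds partitioning, persistence, and replication on top of the same basic contract.

```python
class EventLog:
    """Toy append-only log; each consumer group tracks its own read offset."""

    def __init__(self):
        self._events = []   # ordered, never mutated in place
        self._offsets = {}  # consumer group -> next offset to read

    def append(self, event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset assigned to this event

    def poll(self, group: str, max_events: int = 10) -> list:
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group: str, offset: int) -> None:
        # Rewinding an offset replays history for that group only.
        self._offsets[group] = offset


log = EventLog()
for name in ("order-created", "payment-taken", "order-shipped"):
    log.append(name)

fraud_first = log.poll("fraud", max_events=2)  # slow consumer, reads two events
dashboard_all = log.poll("dashboard")          # fast consumer, reads everything
log.seek("fraud", 0)
fraud_replay = log.poll("fraud")               # replay from the start
```

The producer never needs to know how many consumer groups exist or how far behind each one is, which is the decoupling the paragraph above describes.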
The Stream Processor
Apache Flink is the leading open-source stream processing engine for complex, stateful streaming workloads — aggregations over time windows, joins between streams, and sophisticated event pattern matching. Apache Spark Structured Streaming provides a near-real-time alternative (micro-batch, typically 1–30 second latency) that is more familiar to teams with existing Spark expertise and suitable for workloads that do not require true sub-second latency. For simple transformations and routing, Kafka Streams or ksqlDB can process events entirely within the Kafka ecosystem without a separate processing cluster.
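To make "aggregations over time windows" concrete, here is a minimal sketch of a tumbling-window sum in plain Python. It is a teaching aid under simplifying assumptions, not Flink, Spark, or Kafka Streams code: `TumblingWindowSum` is a hypothetical name, state lives in a local dictionary, and events are assumed to arrive with an event timestamp in seconds.

```python
from collections import defaultdict


class TumblingWindowSum:
    """Keeps a running sum per (key, window) for fixed, non-overlapping windows."""

    def __init__(self, window_s: int = 60):
        self.window_s = window_s
        self.totals = defaultdict(float)  # (key, window_start) -> running sum

    def add(self, key: str, ts: float, amount: float) -> float:
        # Each event belongs to exactly one window, found by flooring its timestamp.
        window_start = int(ts // self.window_s) * self.window_s
        self.totals[(key, window_start)] += amount
        return self.totals[(key, window_start)]


agg = TumblingWindowSum(window_s=60)
agg.add("card-1", 5, 10.0)
agg.add("card-1", 59, 5.0)
agg.add("card-1", 60, 1.0)  # falls into the next one-minute window
```

A production engine manages the same kind of keyed state, but distributes it across workers and checkpoints it so that it survives failures.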
The Serving Layer
Processed streaming data needs to be materialised somewhere that downstream consumers can query efficiently. For operational use cases such as fraud scoring and real-time recommendations, the serving layer is typically a low-latency key-value store (Redis, DynamoDB) or a feature store. For analytical use cases, OLAP databases such as Apache Druid, ClickHouse, and StarRocks ingest streaming data and provide sub-second query performance, enabling real-time dashboards where a batch pipeline could only refresh the view after each scheduled run.
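As a rough illustration of what "materialising" a stream for key lookups involves, the sketch below keeps the latest value per key and ignores updates that arrive out of order. `MaterialisedView` is a hypothetical in-memory stand-in, not the interface of Redis, DynamoDB, or any feature store, and the last-write-wins-by-event-time rule is one possible policy among several.

```python
class MaterialisedView:
    """Latest value per key, guarded by event time (last write wins)."""

    def __init__(self):
        self._rows = {}  # key -> (event_ts, value)

    def upsert(self, key, event_ts, value) -> bool:
        current = self._rows.get(key)
        if current is not None and current[0] > event_ts:
            return False  # an older event arrived late; keep the newer value
        self._rows[key] = (event_ts, value)
        return True

    def get(self, key, default=None):
        row = self._rows.get(key)
        return default if row is None else row[1]


view = MaterialisedView()
view.upsert("user-42:risk_score", 100, 0.10)
view.upsert("user-42:risk_score", 130, 0.85)
stale_applied = view.upsert("user-42:risk_score", 120, 0.20)  # out of order
```

A scoring service would then read `view.get("user-42:risk_score")` on the request path instead of querying the stream itself.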
The Migration Path: Batch to Streaming
The most common mistake in streaming migrations is treating it as a wholesale replacement of the batch pipeline. Most organisations have dozens of batch pipelines — not all of them warrant real-time processing, and attempting to stream everything simultaneously creates a risk surface that is difficult to manage.
1. Identify the high-value, latency-sensitive workloads: fraud detection, operational alerting, personalisation engines, and dynamic pricing are almost universally the highest-value targets, the workloads where batch latency is actively costing the business money.
2. Start with event-driven architecture on new workloads: rather than migrating existing batch pipelines, architect new capabilities as streaming-first. This builds streaming competency without the migration risk of touching production batch pipelines.
3. Layer streaming on top of existing batch for enrichment: use streaming for the latency-sensitive component of a workflow (real-time fraud scoring, for example) while keeping the batch pipeline for comprehensive T+1 processing. The streaming layer provides speed; the batch layer provides completeness.
4. Migrate batch pipelines iteratively, starting with the lowest-complexity, highest-latency-cost pipelines: the overnight ETL feeding a morning operational dashboard is a natural migration candidate, with high value from reduced latency and a relatively straightforward re-implementation as a streaming pipeline.
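The third step, layering streaming on top of batch, can be sketched as two views that are combined at read time. The function names and the dictionary-based views below are assumptions made for the illustration; a real deployment would back each view with the serving stores discussed earlier.

```python
def apply_stream_event(speed_view: dict, key: str, amount: float) -> None:
    """Streaming layer: fold each event into a running delta since the last batch."""
    speed_view[key] = speed_view.get(key, 0.0) + amount


def serve_total(batch_view: dict, speed_view: dict, key: str) -> float:
    """Read path: complete batch result plus whatever has streamed in since."""
    return batch_view.get(key, 0.0) + speed_view.get(key, 0.0)


def run_batch(batch_view: dict, speed_view: dict, recomputed: dict) -> None:
    """Overnight job: replace the batch view and reset the streaming delta."""
    batch_view.clear()
    batch_view.update(recomputed)
    speed_view.clear()


batch_view = {"acct-1": 100.0}  # totals as of last night's run
speed_view = {}                 # events seen since then

apply_stream_event(speed_view, "acct-1", 5.0)
apply_stream_event(speed_view, "acct-1", 7.0)
intraday_total = serve_total(batch_view, speed_view, "acct-1")

run_batch(batch_view, speed_view, {"acct-1": 112.0})
next_morning_total = serve_total(batch_view, speed_view, "acct-1")
```

The sketch assumes the batch recomputation covers exactly the events the streaming layer had already counted; in practice, aligning that cut-over point is where most of the care goes.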
The Streaming Tax: Where Investment Is Required
Real-time streaming is not simply a faster version of batch processing — it is a different paradigm that requires different engineering skills, different testing approaches, and different operational tooling. Late-arriving events, exactly-once processing semantics, stateful aggregation across unbounded streams, and watermarking for time-window calculations are streaming-specific concerns without direct batch analogues.
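Watermarking and late-arriving events are easier to reason about with a small example. The sketch below counts events per one-minute window and closes a window once the watermark, taken here as the highest event time seen minus an allowed lateness, passes the window's end. `WatermarkedWindows` and its parameters are hypothetical, and real engines offer richer policies such as side outputs and retractions.

```python
class WatermarkedWindows:
    """Counts events per tumbling window, closing windows as the watermark advances."""

    def __init__(self, window_s: int = 60, allowed_lateness_s: int = 10):
        self.window_s = window_s
        self.allowed_lateness_s = allowed_lateness_s
        self.max_event_ts = float("-inf")
        self.open = {}    # window_start -> running count
        self.closed = {}  # window_start -> final count
        self.late = []    # timestamps rejected because their window had closed

    @property
    def watermark(self) -> float:
        return self.max_event_ts - self.allowed_lateness_s

    def add(self, ts: float) -> bool:
        start = int(ts // self.window_s) * self.window_s
        if start in self.closed or start + self.window_s <= self.watermark:
            self.late.append(ts)  # too late: the result was already finalised
            return False
        self.open[start] = self.open.get(start, 0) + 1
        self.max_event_ts = max(self.max_event_ts, ts)
        # Finalise every open window whose end is at or behind the watermark.
        for s in sorted(self.open):
            if s + self.window_s <= self.watermark:
                self.closed[s] = self.open.pop(s)
        return True


w = WatermarkedWindows(window_s=60, allowed_lateness_s=10)
for ts in (5, 58, 65):
    w.add(ts)
accepted_within_lateness = w.add(50)  # out of order, but window 0 is still open
w.add(71)                             # watermark reaches 61, window 0 closes
rejected_after_close = w.add(59)      # arrives after the window was finalised
```

The allowed lateness is the trade-off dial: a larger value tolerates more disorder but delays every result by the same amount.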
Organisations moving from batch to streaming typically need to invest in: streaming-competent data engineers (rarer and more expensive than batch-oriented practitioners), streaming-specific testing infrastructure (event generators, chaos testing for consumer lag scenarios), and streaming-specific operational monitoring (consumer group lag, partition skew, processing latency percentiles).
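The three monitoring signals named above reduce to simple arithmetic once the raw numbers are available. The helpers below are a sketch: the function names are hypothetical, the inputs are assumed to be plain dictionaries and lists already collected from the broker and the processor, and the skew ratio (busiest partition over the mean) is one common definition rather than a standard.

```python
import math


def consumer_lag(end_offsets: dict, committed_offsets: dict) -> dict:
    """Messages not yet processed, per partition."""
    return {p: end_offsets[p] - committed_offsets.get(p, 0) for p in end_offsets}


def partition_skew(message_counts: dict) -> float:
    """Ratio of the busiest partition to the mean; 1.0 means perfectly even."""
    counts = list(message_counts.values())
    return max(counts) / (sum(counts) / len(counts))


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile, suitable for latency samples."""
    ordered = sorted(values)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


lag = consumer_lag({0: 100, 1: 250}, {0: 90, 1: 250})
skew = partition_skew({0: 100, 1: 100, 2: 400})
latencies_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
p95 = percentile(latencies_ms, 95)
```

Alerting on a percentile rather than the mean matters here because a stalled partition typically shows up in the tail long before it moves the average.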
GYSP's Data Engineering & Analytics practice designs and builds streaming data architectures for clients across fintech, e-commerce, and logistics — environments where batch latency is a measurable competitive disadvantage and the engineering investment in real-time infrastructure has a quantifiable return.
“The question we hear most often is 'is our business ready for real-time data?' The right question is: 'what is the cost of the decisions we are making on data that is already 18 hours old?' Once you answer that, the investment case answers itself.”
— Ankush, Chief Technology Officer — GYSP.tech
