Confluent has heavily invested in adding Apache Flink to Confluent Cloud. They hope to strengthen their offering (and growth prospects) in data streaming by providing a comprehensive solution that integrates both Kafka and Flink into one platform.

You may be asking yourself: Confluent already has a best-in-class data streaming solution for Kafka - what gives? This post will review how teams are using Kafka and Flink together, analyze pricing for Confluent’s and Amazon’s Flink offerings, and share some anonymized data from thousands of infrastructure accounts to pinpoint the massive opportunity ahead for Confluent.

Recap of Batch Processing and Data Streaming

A key distinction between batch processing and data streaming is whether the data is bounded or unbounded. Unbounded data, the domain of data streaming, is a continuous, potentially infinite stream with no predefined endpoint. Bounded data, the domain of batch processing, is a finite set with a defined start and end: a fixed collection of records that is static and does not change once the dataset is complete.
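
To make the distinction concrete, here is a minimal, framework-agnostic Java sketch (the order values are purely illustrative): the bounded case iterates over a fixed dataset and terminates, while the unbounded case keeps consuming from a queue with no natural end.

    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class BoundedVsUnbounded {

        // Bounded: a finite, static dataset with a defined start and end.
        static void processBounded(List<String> sales) {
            for (String sale : sales) {
                System.out.println("batch-processed: " + sale);
            }
            // The loop terminates once the dataset is exhausted.
        }

        // Unbounded: a continuous stream with no predefined endpoint.
        static void processUnbounded(BlockingQueue<String> stream) throws InterruptedException {
            while (true) {
                String sale = stream.take(); // blocks until the next event arrives
                System.out.println("stream-processed: " + sale);
            }
        }

        public static void main(String[] args) {
            processBounded(List.of("order-1", "order-2", "order-3"));

            BlockingQueue<String> stream = new LinkedBlockingQueue<>(List.of("order-4"));
            // In a real system this queue would be fed continuously by producers.
            // processUnbounded(stream); // never returns on its own
        }
    }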

Consider the following example: a retail company wants to analyze sales data in real-time to optimize inventory management, detect fraud, and provide personalized recommendations. There are many different data sources, such as online transactions, inventory management systems, app interactions, and more. Some analysis involves batch processing, and some involves data streaming. Two scenarios:

  • Nightly jobs to identify popular products and update inventory (batch processing).
  • Real-time analysis of incoming sales data for fraud detection and recommendation engines (data streaming).

Kafka, Confluent, and MSK

Apache Kafka is the most widely used event streaming platform. Kafka is known for its fault tolerance, scalability, and high throughput. With Kafka, you can publish and subscribe (pub/sub) to streams of events, store streams of events, and process streams of events. In our retail example, Kafka could be used to collect and store the data streams from various sources, connecting events to both real-time and batch processing systems.
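
As a rough illustration of the pub/sub model, here is a minimal Java producer sketch using the kafka-clients library; the broker address, "sales-events" topic name, and payload are assumptions for the retail example rather than anything prescribed.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SalesEventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Publish one online-transaction event to the assumed "sales-events" topic.
                producer.send(new ProducerRecord<>("sales-events", "order-42",
                        "{\"sku\":\"A1\",\"amount\":19.99}"));
            }
        }
    }

A consumer subscribed to the same topic would receive these events, and the topic itself retains them so that both real-time and batch systems can read them later.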

Tools like Confluent and Amazon Managed Streaming for Apache Kafka (Amazon MSK) offer managed Kafka services for a cost which we previously compared. Running Kafka on its own is widely considered to be an operational challenge so these managed services abstract to those issues.

Apache Flink is a data processing framework for stateful computations over unbounded and bounded data sets. It can handle millions to billions of real-time events concurrently and process massive amounts of data, scaling up to several terabytes of application state. It offers advanced stream processing capabilities, such as event time processing, watermarks, and windowing.
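
Here is a minimal sketch of those capabilities using Flink's Java DataStream API: events carry their own timestamps (event time), watermarks tolerate ten seconds of out-of-order arrival, and sales are summed per product over one-minute tumbling windows. The in-memory source, product IDs, and timestamps are purely illustrative; a real job would read from an external source.

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class WindowedSalesTotals {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Toy in-memory source: (productId, amount, eventTimeMillis).
            env.fromElements(
                    Tuple3.of("sku-1", 19.99, 1_700_000_000_000L),
                    Tuple3.of("sku-1", 5.00, 1_700_000_030_000L),
                    Tuple3.of("sku-2", 42.00, 1_700_000_045_000L))
                // Event time: timestamps come from the data; watermarks allow 10s of lateness.
                .assignTimestampsAndWatermarks(
                    WatermarkStrategy
                        .<Tuple3<String, Double, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(10))
                        .withTimestampAssigner((event, previous) -> event.f2))
                .keyBy(event -> event.f0)
                // Windowing: sum sales per product over 1-minute tumbling event-time windows.
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                .sum(1)
                .print();

            env.execute("windowed-sales-totals");
        }
    }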

Flink also ensures fault tolerance through mechanisms such as checkpointing, which allows it to recover from failures and maintain data consistency and processing correctness. With checkpointing, Flink is able to guarantee exactly-once results, meaning there will be no unprocessed or duplicate data.
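
A minimal sketch of how checkpointing might be configured with Flink's Java API; the interval and timeout values are arbitrary choices for illustration.

    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointingConfig {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Take a consistent snapshot of all operator state every 60 seconds.
            env.enableCheckpointing(60_000);

            // Request exactly-once state consistency (also the default checkpointing mode).
            env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

            // Give a slow checkpoint up to 2 minutes before it is aborted.
            env.getCheckpointConfig().setCheckpointTimeout(120_000);

            // ... define sources, transformations, and sinks here, then call env.execute(...)
        }
    }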

An example of a large-scale Flink deployment comes from Shopify, whose engineering team used Flink to help process data from millions of merchant stores. In our retail example, Flink could aggregate signals across multiple transactions, such as transaction rate, credit card details, and location, to determine whether a user should be blocked from making additional purchases.

Flink and Kafka are commonly deployed together as complementary tools. Since Flink does not have a storage layer of its own, it is often integrated with Kafka, leveraging it as the storage layer. Conversely, since Kafka’s built-in stream processing and analytics capabilities are comparatively limited, pairing it with Flink provides a robust computation layer on top of Kafka’s storage and message delivery layer.
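
The sketch below shows the typical pairing, assuming the flink-connector-kafka dependency and a hypothetical sales-events topic: Kafka holds and delivers the events, while Flink consumes them and applies the computation.

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class KafkaToFlink {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Kafka acts as the durable storage and delivery layer; Flink consumes and computes.
            KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")       // assumed broker address
                .setTopics("sales-events")                   // assumed topic name
                .setGroupId("fraud-detection")               // assumed consumer group
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

            env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-sales")
                .map(event -> "scored: " + event)            // placeholder for real fraud logic
                .print();

            env.execute("kafka-to-flink");
        }
    }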

Both Flink and Kafka provide stream processing, but they address largely different use cases. Kafka consists of five major APIs; the one that most closely resembles Flink is the Kafka Streams API. We compare their functionality in the table below.

Aspect | Flink | Kafka Streams
Streams | Supports both unbounded and bounded streams | Supports only unbounded streams
Performance | Generally achieves lower latency due to event-driven processing | Delivers low latency and high throughput
Deployment Model | Standalone cluster, or integrated with resource managers like Apache Mesos, Kubernetes, or Apache YARN | Deployed as part of a Kafka cluster and runs as a library within an application
Data Sources | Data can be ingested from multiple sources | Data is ingested solely from Kafka topics
Storage | No underlying storage | Uses Kafka storage

Comparison table of Flink vs Kafka Streams

Kafka Streams may be sufficient for simpler stream processing tasks within the Kafka ecosystem; however, Flink is the better choice for complex workflows that require higher performance, fault tolerance, or integration with systems beyond Kafka. And as mentioned previously, you can pair Flink with Kafka to take advantage of Kafka’s benefits, such as storage.
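
To show what the "library within an application" model from the table looks like in practice, here is a minimal Kafka Streams sketch; the application ID, topic names, and filter logic are placeholders rather than anything from a real deployment.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class HighValueOrdersApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "high-value-orders");  // assumed app id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            // The topology runs inside this application, reading from and writing to
            // Kafka topics (assumed names below); no separate processing cluster is needed.
            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> orders = builder.stream("sales-events");
            orders
                .filter((key, value) -> value.contains("\"amount\":"))  // placeholder filter logic
                .to("flagged-orders");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }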

Amazon Managed Service for Apache Flink, renamed from Amazon Kinesis Data Analytics, makes it easy to create and deploy Flink applications. With Amazon Managed Flink you do not need to manage clusters or set up any infrastructure. Another benefit is that it integrates easily with other Amazon services, such as MSK.

The following chart uses anonymous data from Vantage users to illustrate the breakdown of total customer spend across the Amazon streaming services. Each service’s portion of the pie represents the percentage of the total spend attributed to it.

Pie chart representing the percentages of Kinesis, MSK, and Amazon Managed Flink costs in Vantage

For Confluent, adding full Flink support potentially represents a double-digit percentage boost in revenue overnight, even if they only match the benchmarks from AWS in our data. In fact, Confluent’s specialization in Kafka likely means that Flink could be even more popular with Confluent users than it is with the MSK customer base.

Confluent is expanding its product to include Flink as part of a unified platform, with Flink as the streaming compute layer and Kafka as the storage layer.

Confluent’s aim is to make Flink with Confluent simple, serverless, and scalable. One way Confluent is doing this is by streamlining development and deployment, removing the need to size applications. Additionally, they are consolidating the two connectors that integrate with Kafka into a single unified Kafka connector, simplifying integration and reducing complexity for users. Lastly, Confluent Flink will automate tedious tasks, such as removing the need for CREATE TABLE statements or data type mappings.
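
For context, the sketch below shows the kind of CREATE TABLE boilerplate that open-source Flink SQL requires before a Kafka topic can be queried (written against Flink's Java Table API; the schema, connector options, and topic name are assumptions). Confluent's pitch is that this declaration step largely goes away because topics are surfaced as queryable tables.

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class FlinkSqlBoilerplate {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

            // In open-source Flink, a Kafka topic must first be declared as a table,
            // including connector options and data type mappings (values here are assumptions).
            tEnv.executeSql(
                "CREATE TABLE sales_events (" +
                "  sku STRING," +
                "  amount DOUBLE," +
                "  ts TIMESTAMP(3)" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'sales-events'," +
                "  'properties.bootstrap.servers' = 'localhost:9092'," +
                "  'scan.startup.mode' = 'earliest-offset'," +
                "  'format' = 'json'" +
                ")");

            // Only then can the actual query run; Confluent's claim is that the table
            // definition above is derived for you from the Kafka topic's schema.
            tEnv.executeSql("SELECT sku, SUM(amount) FROM sales_events GROUP BY sku").print();
        }
    }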

For this comparison, we are focusing only on Flink pricing, not additional costs that may come from integration with Kafka or other AWS or Confluent charges. Pricing shown is for the US East region.

With Amazon Managed Service for Flink, you are only charged for what you use, with no need to provision resources. You are charged at an hourly rate determined by your Kinesis Processing Unit (KPU) usage. KPUs are units representing system resources: one KPU corresponds to 1 vCPU of compute and 4 GB of memory. You are also charged for running application storage (50 GB per KPU) used for stateful processing capabilities.

There are two options for running Flink jobs: Managed Service for Flink and Managed Service for Flink Studio. With Managed Service for Flink, you use Java, Scala, or Python (and embedded SQL) with the Flink DataStream or Table APIs to build Flink applications. Here, the number of KPUs scales as memory and compute needs fluctuate, or you can provision KPUs yourself. With Managed Service for Flink Studio, you use standard SQL, Python, and Scala to build Flink applications. With Managed Service for Flink you are charged one additional KPU per application for orchestration, whereas with Managed Service for Flink Studio you are charged two additional KPUs per Studio application: one for orchestration and one for the environment.

Dimension | Cost
KPU | $0.11 per hour
Running Application Storage | $0.10 per GB-month

Amazon Managed Service for Flink pricing table
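
As a rough back-of-the-envelope sketch of how these dimensions combine, consider a hypothetical non-Studio application that averages 4 KPUs over a 730-hour month. The workload size is an assumption, and whether the orchestration KPU accrues storage is deliberately left out.

    public class ManagedFlinkCostEstimate {
        public static void main(String[] args) {
            // Assumptions for illustration: a Managed Service for Flink (non-Studio) app
            // averaging 4 KPUs, running a full 730-hour month at US East list prices.
            double kpuHourly = 0.11;        // $ per KPU-hour
            double storageGbMonth = 0.10;   // $ per GB-month of running application storage
            int appKpus = 4;                // assumed average application KPUs
            int orchestrationKpus = 1;      // one extra KPU per application for orchestration
            double hours = 730;

            double computeCost = (appKpus + orchestrationKpus) * kpuHourly * hours;
            // 50 GB of running application storage is allocated per application KPU.
            double storageCost = appKpus * 50 * storageGbMonth;

            System.out.printf("Compute: $%.2f, Storage: $%.2f, Total: $%.2f%n",
                computeCost, storageCost, computeCost + storageCost);
            // Roughly: compute = 5 * $0.11 * 730 = $401.50, storage = 4 * 50 * $0.10 = $20.00
        }
    }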

Similar to Amazon Managed Service for Flink, with Confluent Cloud for Flink you are only charged for the resources you use. However, Confluent’s billing is even more exact, since you are charged per CFU per minute that your query is running.

To understand Confluent Cloud for Flink pricing, there are two concepts to know: compute pools and CFUs. Compute pools are sets of region-based compute resources used to run your SQL statements, with resources shared across all statements.

CFUs are units of processing power used to measure the capacity of a compute pool. CFUs scale automatically based on the resources consumed, so while you cannot set the number of CFUs per individual statement, you can set the maximum size of your compute pool. Note: each Flink statement uses at least one CFU-minute.

Dimension | Cost
CFU | $0.0035 per CFU-minute

Confluent Cloud for Flink pricing table
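
A comparable back-of-the-envelope sketch for Confluent: a single hypothetical statement that averages 2 CFUs continuously over a 30-day month. Again, the workload size is purely an assumption for illustration.

    public class ConfluentFlinkCostEstimate {
        public static void main(String[] args) {
            // Assumptions for illustration: one long-running statement averaging
            // 2 CFUs for an entire 30-day month, at the $0.0035 per CFU-minute rate.
            double cfuMinuteRate = 0.0035;  // $ per CFU-minute
            int avgCfus = 2;                // assumed average CFUs consumed by the statement
            int minutes = 30 * 24 * 60;     // 43,200 minutes in a 30-day month

            double statementCost = avgCfus * minutes * cfuMinuteRate;
            System.out.printf("Statement cost for the month: $%.2f%n", statementCost);
            // 2 CFUs * 43,200 minutes * $0.0035 = $302.40. A statement that runs only a
            // few minutes a day is billed only for those CFU-minutes (minimum one per statement).
        }
    }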

The biggest difference is the billing dimension. With Amazon Managed Service for Flink, you are charged hourly per KPU. With Confluent Cloud for Flink, you are charged per statement, allowing for a more finely grained billing structure.

Conclusion

The integration of Flink and Kafka within Confluent’s robust ecosystem represents a strategic move in the quickly evolving data processing landscape. The ability to use Flink and Kafka in one unified platform is unique to Confluent and makes the product even more competitive. The unified platform offers benefits such as simplified deployment, reduced operational overhead, and improved efficiency in building and managing real-time data pipelines.