Hazelcast Jet outperforms Flink and Spark with 40ms average latency benchmark
Hazelcast, the leading open source in-memory data grid (IMDG) with hundreds of thousands of installed clusters and over 26 million server starts per month, today announced the 0.4 release of Hazelcast Jet – an application-embeddable, distributed processing engine for big data stream and batch processing. Major new functionality in Hazelcast Jet 0.4 includes event-time processing with tumbling, sliding and session windowing. Using these new capabilities, users benefit from a feature-rich stream processing architecture which provides a flexible mechanism to build and evaluate windows over continuous data streams. Easy to use, deploy and program, Hazelcast Jet is appropriate for applications such as sensor updates in IoT architectures (house thermostats, lighting systems), in-store e-commerce systems and social media platforms.
Stream processing has overtaken batch processing as a preferred method of processing big data sets for companies that require immediate insight into data. However, to get value from a data stream, it must be partitioned: a fragment (window) of the stream is taken and analysed. To assign data to windows during processing, each data element in the stream needs to be associated with a timestamp. In Hazelcast Jet 0.4 this is achieved via event-time processing, which uses a logical, data-dependent timestamp embedded in the event itself. A major drawback of event-time processing, however, is that events may arrive out of order or late, so you can never be sure that you have seen all events in a given time window.
To alleviate this issue, the latest release of Hazelcast Jet also includes windowing functionality which enables users to evaluate stream processing jobs at regular time intervals, regardless of how many incoming messages the job is processing. Hazelcast Jet offers three types of windows:
• Fixed/tumbling – time is partitioned into same-length, non-overlapping chunks. Each event belongs to exactly one window.
• Sliding – windows have a fixed length, but their start times are separated by a time interval (step) which can be smaller than the window length, so consecutive windows may overlap. Typically the window length is a multiple of the step.
• Session – windows have varying sizes and are defined based on the data, which should carry some session identifier.
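The three window types can be illustrated with a minimal, language-neutral sketch. This is not Hazelcast Jet's API; the function names are hypothetical, timestamps are plain integers in milliseconds, and the inactivity gap used to close a session is an assumption made for the example (the release only states that sessions are defined by data carrying a session identifier).

```python
from collections import defaultdict


def tumbling_window(ts, size):
    """Return the (start, end) of the single tumbling window containing ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size, step):
    """Return every (start, end) sliding window containing ts, oldest first.

    Assumes size is a multiple of step and windows start at multiples of step.
    """
    last_start = ts - ts % step
    first_start = last_start - size + step
    return [(s, s + size) for s in range(first_start, last_start + 1, step)]


def session_windows(events, gap):
    """Group (session_id, ts) events into sessions per identifier.

    Assumption for this sketch: a session ends when more than `gap`
    time units pass without an event for the same identifier.
    Sorting by the embedded timestamp makes arrival order irrelevant.
    """
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions = []
    for key, stamps in by_key.items():
        stamps.sort()
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > gap:
                sessions.append((key, start, prev))
                start = ts
            prev = ts
        sessions.append((key, start, prev))
    return sorted(sessions)
```

With a 10-second window sliding by 1 second, `sliding_windows(10500, 10000, 1000)` places a single event in ten overlapping windows, whereas `tumbling_window(1250, 1000)` places an event in exactly one.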
Additional enhancements in Hazelcast Jet 0.4 also include:
• Users are now able to use the ICache/Hazelcast integration as a source and sink of data.
• java.util.stream can be used on top of ICache to enable basic data processing.
• Streaming File Connector – improved connector allows users to watch files and directories for changes.
• Numerous Hazelcast Jet code samples are now available which can be used as building blocks for Jet applications, providing a gradual learning experience.
In a new latency benchmark study published today, Hazelcast Jet outperformed its competitors with a 40ms average latency for stream processing computations, which remained flat as message throughput increased. Flink and Spark’s execution latencies were hundreds of milliseconds, rising to seconds at the higher message throughputs.
The study compares the average latencies of Hazelcast Jet, Flink and Spark Streaming under varying conditions such as message rate and window size. The full benchmark is available here. Results can be viewed in the tables below (all results are given in milliseconds).
1 second tumbling window:
10 seconds by 1 second sliding window:
*Latencies increased as the framework was not able to keep up with input
With Hazelcast IMDG providing storage functionality, Hazelcast Jet is an Apache 2 licensed open source project that performs parallel execution to enable data-intensive applications to operate in near real-time. Built on a one-record-at-a-time architecture (sometimes known as continuous operators), Hazelcast Jet processes incoming records as soon as possible, as opposed to accumulating records into micro-batches, which lowers latency for applications.
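The latency effect of micro-batching can be illustrated with a simplified model. This is only a sketch under stated assumptions, not a description of any engine's internals: a micro-batch engine is assumed to hand records to processing at the end of the fixed interval in which they arrived, a record-at-a-time engine is assumed to add no buffering delay, and the function names are hypothetical.

```python
def micro_batch_emit_time(arrival_ms, batch_interval_ms):
    """Time at which the batch containing this record is closed.

    Assumes batches cover [k * interval, (k + 1) * interval) and are
    released at the end of the interval.
    """
    return (arrival_ms // batch_interval_ms + 1) * batch_interval_ms


def buffering_delays(arrivals_ms, batch_interval_ms):
    """Extra waiting time each record spends buffered before processing starts."""
    return [micro_batch_emit_time(t, batch_interval_ms) - t for t in arrivals_ms]


arrivals = [0, 250, 999, 1000]
delays = buffering_delays(arrivals, batch_interval_ms=1000)
average_delay = sum(delays) / len(delays)
```

In this model each record waits between 1 ms and a full batch interval before any computation begins, which is the delay a record-at-a-time design avoids.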
Greg Luck, CEO of Hazelcast, said: “The Jet project is progressing faster than we could have hoped. The new functionality in 0.4 brings stream processing for the first time. As with batch, we are achieving a new performance level, giving us a real edge over alternative market solutions. Jet’s architecture is performance and low latency driven, which is why there are no real surprises in the results of our latest benchmark. Driven by the community, Jet is an easy to deploy fast data solution for programmers built on the premise of simplicity.”
Hazelcast will be providing 24x7 enterprise support subscriptions for Hazelcast Jet.
Hazelcast is the leading provider of operational in-memory computing with hundreds of thousands of installed clusters and over 26 million server starts per month. The Hazelcast In-Memory Data Grid helps leading companies, like Capital One, Chicago Board Options Exchange, Deutsche Bank, Ellie Mae, and Mizuho Securities USA, manage their data and distribute processing using in-memory storage and parallel execution for breakthrough application speed and scale.
Hazelcast’s developer-friendly approach makes it easy to modernize existing applications while providing a platform for building new innovative solutions. Hazelcast is headquartered in Silicon Valley’s Palo Alto, with offices in Ankara, Istanbul, London, and New York City.