Stateful Stream Processing

Stateful Stream-Processing for Deep Introspection

Live systems generate streams of incoming events that need to be tracked, correlated, and analyzed to identify patterns and trends — and then generate immediate feedback and alerts to steer operations.

With today’s ever more complex real-time systems, it’s not enough to just analyze patterns within data streams using conventional techniques. Applications need deeper introspection to extract full value from the telemetry they receive. They need to build dynamic models of data sources that they can continuously update and analyze. Called stateful stream-processing and popularized as the “digital twin” by Gartner, this breakthrough approach can harness machine learning, neural networks, and other advanced techniques to enable deep introspection and provide precise, timely feedback for live systems.

To illustrate the power of stateful stream-processing with ScaleOut StreamServer’s architecture, consider a heart-rate monitoring application which receives telemetry from wearable devices. ScaleOut StreamServer can route millions of incoming events to dynamic, in-memory components called “real-time digital twins” which track each patient’s unique medical history and current condition, analyzing events and generating timely alerts to medical professionals when needed. Unlike traditional, “stateless” stream processing, which just examines aggregated incoming telemetry as it arrives, ScaleOut StreamServer separately analyzes telemetry for each data source in the context of its unique state information. In the medical monitoring example, this state information could include medical history, current medications, recent medical events, patient activity, and much more. The result is deeper introspection and more timely, effective feedback and alerting.

Illustration of Stateful Stream-Processing for a Medical Monitoring Application
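The per-patient analysis described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the ScaleOut StreamServer API: the PatientTwin class, its fields, the route function, and the alert thresholds are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PatientTwin:
    """Hypothetical per-patient state; field names are illustrative only."""
    patient_id: str
    resting_rate: int                 # baseline beats per minute from medical history
    on_beta_blockers: bool = False    # example of "current medications" context
    recent_rates: List[int] = field(default_factory=list)

    def process(self, heart_rate: int) -> Optional[str]:
        """Update state with one reading and return an alert string, or None."""
        self.recent_rates = (self.recent_rates + [heart_rate])[-3:]
        # The limit depends on this patient's own context, not a global constant.
        # The offsets 30 and 50 are invented for this sketch.
        limit = self.resting_rate + (30 if self.on_beta_blockers else 50)
        if len(self.recent_rates) == 3 and all(r > limit for r in self.recent_rates):
            return f"ALERT {self.patient_id}: sustained heart rate above {limit} bpm"
        return None


# One twin per data source, keyed by patient id.
twins: Dict[str, PatientTwin] = {}


def route(patient_id: str, heart_rate: int) -> Optional[str]:
    """Deliver an event to the twin that owns this patient's state."""
    return twins[patient_id].process(heart_rate)
```

The same reading of 95 bpm can be unremarkable for one patient and part of an alert for another, which is the difference between stateless and stateful analysis that the example is meant to show.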

Wide Range of Applications

Stateful stream-processing with digital twins can dramatically improve the quality of real-time feedback for virtually any stream processing application. Here are just a few examples:

Vehicle telematics: tracking fleets of vehicles, drivers, and cargo to quickly detect emerging issues and help ensure timely deliveries
Safety and security monitoring: tracking physical and cyber sensors, as well as badged personnel, to detect and react to intrusions and/or unsafe operations
Financial services: portfolio tracking, wire fraud detection, stock back-testing, usage-based insurance
Internet of Things (IoT): device tracking for manufacturing, vehicles, fixed and mobile devices, and smart cities
Healthcare: real-time patient monitoring, medical device tracking and alerting
Logistics: real-time inventory reconciliation, manufacturing flow optimization

Digital Twin Builder

The ScaleOut Digital Twin Builder™ software toolkit dramatically simplifies the development of Java, C#, JavaScript, and rules-based real-time digital twin models and their deployment on ScaleOut StreamServer. These application-defined models specify the properties and message-processing code required to track dynamic state information for each data source and analyze its incoming events with a richer context than previously possible. This leads to deeper real-time introspection and better feedback and alerting.

For example, using the real-time digital twin model, a rental car company can track and analyze telemetry from each car in its fleet with digital twins that have detailed knowledge about each car’s rental contract, the driver’s demographics and driving record, and maintenance issues. With this information the application could, for example, alert managers when a driver repeatedly exceeds the speed limit according to criteria specific to the driver’s age and driving history or violates other terms of the rental contract. All of this leads to new insights on telemetry received from vehicles that otherwise would not be available in real time.
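As a rough sketch of what "properties plus message-processing code" might look like for the rental car example, consider the Python below. It does not use the Digital Twin Builder APIs. The RentalCarState fields, the process_message function, and the tolerance policy are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RentalCarState:
    """Properties tracked for one car; all names here are illustrative."""
    car_id: str
    driver_age: int
    prior_violations: int      # taken from the driving record
    speeding_events: int = 0   # dynamic state, updated as telemetry arrives


def allowed_speeding_events(state: RentalCarState) -> int:
    # Invented policy: tolerate fewer incidents from young drivers or poor records.
    allowed = 3
    if state.driver_age < 25:
        allowed -= 1
    if state.prior_violations > 0:
        allowed -= 1
    return allowed


def process_message(state: RentalCarState, speed_kph: float, limit_kph: float) -> bool:
    """Update the car's state and return True if managers should be alerted."""
    if speed_kph <= limit_kph:
        return False
    state.speeding_events += 1
    return state.speeding_events > allowed_speeding_events(state)
```

Because the threshold is computed from each car's own state, two drivers sending identical telemetry can produce different outcomes.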

By tracking the state of data sources, digital twin models add value in almost every imaginable stream-processing application. They enable real-time streaming analytics that previously could only be performed in offline, batch processing. Here are a few examples:

They help IoT applications do a better job of predictive analytics when processing event messages by tracking the parameters of each device, when maintenance was last performed, known anomalies, and much more.
They assist healthcare applications in interpreting real-time telemetry, such as blood-pressure and heart-rate readings, in the context of each patient’s medical history, medications, and recent incidents, so that more effective alerts can be generated when care is needed.
They enable ecommerce applications to interpret website clickstreams with the knowledge of each shopper’s demographics, brand preferences, and recent purchases to make more targeted product recommendations.

Check out how the ScaleOut Digital Twin Builder delivers breakthrough capabilities for stateful stream-processing here.

A Unified Architecture for Stateful Stream-Processing

By integrating a fast, scalable stream-processing engine with an in-memory data grid, ScaleOut Software has created a unified software platform for the next generation of stream processing. Unlike mainstream platforms such as Apache Flink, Spark, and Storm, ScaleOut StreamServer enables applications to implement object-oriented models of data sources. It can host large populations of data objects in memory on a cluster of commodity servers and dispatch incoming streaming events to these objects for analysis. Applications now can process incoming data streams in a rich context of evolving state, enabling the use of sophisticated algorithms while delivering blazingly fast event handling.

ScaleOut StreamServer’s innovative architecture delivers both breakthrough capabilities and peak performance for stateful stream processing. It processes incoming data streams within an in-memory data grid — where the data lives — ensuring minimum latency and peak throughput. Other platforms need to pull state information from remote data stores, such as database servers and distributed caches; this creates delays and network bottlenecks.

Instead, ScaleOut StreamServer delivers streamed events directly to their associated state data, enabling immediate, fully contextual processing. Its transparently scalable platform minimizes the latency required for event tracking and analysis, ensuring timely feedback and/or alerts for the largest workloads.
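The idea of delivering each event to the server that already holds its state can be illustrated with a small Python sketch. The source does not say how ScaleOut StreamServer partitions objects, so the modulo hashing used here is an assumption, and GridNode, Grid, owner, and dispatch are hypothetical names.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


class GridNode:
    """Stand-in for one grid server holding a slice of the state objects."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: Dict[str, List[float]] = defaultdict(list)

    def handle(self, source_id: str, reading: float) -> int:
        # State is local to this node, so no remote fetch is needed.
        self.state[source_id].append(reading)
        return len(self.state[source_id])


class Grid:
    def __init__(self, node_count: int) -> None:
        self.nodes = [GridNode(f"node-{i}") for i in range(node_count)]

    def owner(self, source_id: str) -> GridNode:
        # crc32 is stable across runs, unlike Python's salted hash() for strings.
        index = zlib.crc32(source_id.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def dispatch(self, source_id: str, reading: float) -> int:
        """Send the event to the node that owns this source's state."""
        return self.owner(source_id).handle(source_id, reading)
```

Every event for a given source lands on the same node, so processing never has to pull state across the network, which is the bottleneck the text attributes to remote data stores.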

Integrated Streaming & Batch

ScaleOut StreamServer’s integration of an in-memory data grid and compute engine enables it to simultaneously perform both stream-processing and data-parallel analysis on grid-based data. This means that while digital twin objects are processing incoming events, MapReduce (and other data-parallel techniques) can analyze and report aggregate patterns and trends.
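A minimal Python sketch of the data-parallel side might look like the following. It assumes the grid's objects are split into partitions and uses a thread pool as a stand-in for the compute engine. The functions map_partition, combine, and grid_average are hypothetical and are not part of the product's API.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Dict, List, Tuple

# Each partition maps a source id to its latest tracked value (hypothetical data).
Partition = Dict[str, float]


def map_partition(partition: Partition) -> Tuple[float, int]:
    """Map step: summarize one partition as (sum, count)."""
    return (sum(partition.values()), len(partition))


def combine(a: Tuple[float, int], b: Tuple[float, int]) -> Tuple[float, int]:
    """Reduce step: merge two partial summaries."""
    return (a[0] + b[0], a[1] + b[1])


def grid_average(partitions: List[Partition]) -> float:
    """Compute the average tracked value across all partitions."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count if count else 0.0
```

For example, grid_average could report the average heart rate across all patients while individual twins continue to handle their own incoming readings.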

Fast, Scalable, and Highly Available

ScaleOut StreamServer employs a unified, fully distributed architecture that transparently scales across commodity servers or cloud instances to increase throughput as needed. This enables the platform to handle large workloads and ensure fast event processing. The grid’s integrated high availability keeps mission-critical data safe at all times.

Unique Advantages for Streaming Data

Traditional CEP and stream processing platforms, such as Apache Flink and Spark Streaming, focus on analyzing incoming data streams without regard to the context in which the data was created. The next generation of stream-processing tracks the dynamic state of data sources as “digital twins,” offering a basis for much deeper introspection and more effective alerting. ScaleOut StreamServer’s unique architecture, which executes streaming operations within an in-memory data grid, creates a breakthrough that enables stateful stream-processing with digital twin models.

Rich Feature Set

A Powerful Platform for Stateful Stream Processing

ScaleOut StreamServer unleashes the power of stateful stream-processing. Its seamless integration of a scalable stream-processing engine and in-memory data grid (IMDG) creates a powerful, unified platform for building digital twin models and performing deep introspection on streaming data, with blazing performance and built-in high availability. These capabilities are delivered as an intuitive, easy-to-use SDK that makes application development in C# and Java simple and straightforward. Automatic code shipping to the in-memory data grid simplifies application deployment and helps ensure fast startup times.

Key Features and Capabilities

Integrated IMDG and stream-processing engine to enable digital twin models while avoiding unnecessary data motion
Automatic code shipping to grid servers to simplify application deployment
Support for Reactive Extensions APIs for fast, straightforward event processing using familiar APIs
Integration with Kafka and Microsoft Azure IoT Hub to enable seamless connectivity to Kafka and Azure IoT messaging pipelines
Transparently scalable Kafka and Azure IoT connections that maximize messaging throughput as the workload grows
Comprehensive time windowing libraries that make it easy to add time windowing to digital twin models
Automatic event routing to associated grid objects that scales throughput as grid servers are added
Data-parallel APIs for MapReduce and other operators that integrate batch and stream processing on the same, grid-based data set
Object-oriented application design in both Java and C# to simplify development and ensure a clean separation between application logic and grid orchestration
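To make the time-windowing item in the list above more concrete, here is a small Python sketch of a sliding time window that a digital twin could keep as part of its state. It is not the product's windowing library: the SlidingWindow class and its methods are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Tuple


class SlidingWindow:
    """Keeps readings whose timestamps fall within the last `span` seconds."""

    def __init__(self, span: float) -> None:
        self.span = span
        self.items: Deque[Tuple[float, float]] = deque()

    def add(self, timestamp: float, value: float) -> None:
        self.items.append((timestamp, value))
        # Evict entries that have aged out, assuming timestamps arrive in order.
        while self.items and self.items[0][0] <= timestamp - self.span:
            self.items.popleft()

    def average(self) -> float:
        """Average of the values currently inside the window."""
        if not self.items:
            return 0.0
        return sum(value for _, value in self.items) / len(self.items)
```

Holding one such window per data source lets each twin reason about recent behavior, such as the average reading over the last ten seconds, without rescanning the full event history.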

The Power of Object-Oriented Storage for Live Data

A core requirement for stateful stream processing is to track the state of data sources based on incoming streaming data. ScaleOut StateServer addresses this need with its scalable, highly available, in-memory data grid. The grid’s power in enabling the construction of digital twin models derives from its object-oriented view of data, which allows millions of entities to be tracked as individual objects, each maintaining values for a set of class-defined properties. Applications track and correlate changes to these entities by updating these objects as events flow into the grid.

For example, imagine a system tracking millions of cable TV viewers as they select shows and change channels. By using ScaleOut StreamServer’s IMDG, viewers can be represented as a huge collection of in-memory objects which are individually updated as channel-switch events flow into the grid. Each stored object tracks the behavior of a single viewer, correlating a sequence of events and enriching this data with programming information and viewer preferences. ScaleOut StreamServer also uses a data-parallel compute engine to analyze the population of viewers in parallel and immediately detect important patterns and trends.

Scalable Kafka Connections and Reactive Extensions

ScaleOut StreamServer was designed to seamlessly integrate into existing Kafka streaming data pipelines as both a consumer and producer of streaming data. It takes full advantage of the in-memory data grid’s architecture to automatically scale the number of Kafka connections for large workloads and avoid bottlenecks.

Application developers can use familiar APIs from the popular Reactive Extensions (RX) library to implement extremely fast, lightweight event processing in both C# and Java. These APIs have been integrated with Kafka to allow posting of incoming Kafka messages to RX observables associated with grid objects for stateful stream processing. Likewise, RX observables can be associated with Kafka producers to quickly send outgoing messages.
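The flow described above, in which incoming messages are posted to an observable tied to a grid object and outgoing messages flow through another observable, can be sketched in Python as follows. This is a hand-rolled stand-in and uses neither the Reactive Extensions library, Kafka, nor the ScaleOut APIs. Subject, ViewerState, attach, post, and the "surfing" rule are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


class Subject:
    """Tiny stand-in for an Rx-style observable; not the Reactive Extensions library."""

    def __init__(self) -> None:
        self._observers: List[Callable[[Message], None]] = []

    def subscribe(self, on_next: Callable[[Message], None]) -> None:
        self._observers.append(on_next)

    def on_next(self, message: Message) -> None:
        for observer in self._observers:
            observer(message)


class ViewerState:
    """Hypothetical grid object tracking one cable TV viewer."""

    def __init__(self, viewer_id: str, outgoing: Subject) -> None:
        self.viewer_id = viewer_id
        self.switches = 0
        self.outgoing = outgoing

    def on_channel_switch(self, message: Message) -> None:
        self.switches += 1
        if self.switches == 3:
            # Emit a derived event once the viewer starts channel surfing.
            self.outgoing.on_next({"viewer": self.viewer_id, "event": "surfing"})


incoming: Dict[str, Subject] = {}   # one observable per grid object key
outgoing = Subject()                # stands in for a producer-side observable
sent: List[Message] = []            # stands in for messages handed to a producer
outgoing.subscribe(sent.append)


def attach(viewer_id: str) -> ViewerState:
    """Create a viewer object and subscribe it to its own observable."""
    state = ViewerState(viewer_id, outgoing)
    incoming[viewer_id] = Subject()
    incoming[viewer_id].subscribe(state.on_channel_switch)
    return state


def post(record: Tuple[str, Message]) -> None:
    """Post one consumed (key, value) record to the observable for its key."""
    key, value = record
    incoming[key].on_next(value)
```

In a real pipeline the records passed to post would come from a message consumer and the sent list would be replaced by a producer. Here both ends are kept in memory so the sketch stays self-contained.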

Designed for Operational Systems

Tracking and correlating live events requires dispatching a continuous stream of events to individual objects with very low latency. This allows the system to maintain a model of the live system, analyze it, and generate immediate results. ScaleOut StreamServer’s integrated in-memory data grid with its flexible storage models was specifically designed to provide the very low access latency required for operational intelligence.

Unlike business intelligence, which uses offline batch-processing, stream-processing for operational intelligence requires continuous availability so that analysis results are always available to provide feedback to a live system. ScaleOut StreamServer was designed from the ground up to provide high availability in both its data storage and compute engine. It makes use of patented and patent-pending techniques to ensure that in-memory data is always available and that processing continues even if a server or network outage occurs.

Industry-Leading Ease of Use

Fast Application Development: No Special Knowledge Required

By integrating stream-processing within an in-memory data grid, ScaleOut StreamServer offers capabilities not found in other platforms and delivers on the promise of stateful stream-processing while maximizing ease of use. Its familiar, object-oriented data storage and computing model in C# and Java supports advanced analysis algorithms and leverages everything you already know about object-oriented programming. Equally important, it ensures a clean separation between application-specific code and the platform’s orchestration of event processing. The net result is that stateful streaming applications are straightforward to write and run, with no need for specialized knowledge of the complex platform semantics that many stateless stream-processing platforms impose.

Let’s take a closer look. ScaleOut StreamServer organizes in-memory object storage as straightforward object collections in which objects are individually accessible using simple create/read/update/delete APIs. This means that applications can track and correlate incoming events from a live system using a straightforward object model. Likewise, collections of objects can be queried using their class properties and analyzed in parallel for batch processing just by defining and invoking class methods. All aspects of application design take full advantage of well understood object-oriented concepts while leveraging more than three decades of experience in simplifying parallel supercomputing.
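The object model described above (create/read/update/delete access, property-based queries, and method invocation across a collection) can be sketched in plain Python. This is an in-process illustration rather than the ScaleOut client library. ObjectCollection, Device, and their methods are hypothetical, and the invoke method runs sequentially where a grid would run in parallel.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, List, Optional, TypeVar

T = TypeVar("T")


class ObjectCollection(Generic[T]):
    """In-process stand-in for a named collection of grid objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, T] = {}

    def create(self, key: str, obj: T) -> None:
        if key in self._objects:
            raise KeyError(f"{key} already exists")
        self._objects[key] = obj

    def read(self, key: str) -> Optional[T]:
        return self._objects.get(key)

    def update(self, key: str, obj: T) -> None:
        if key not in self._objects:
            raise KeyError(f"{key} not found")
        self._objects[key] = obj

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)

    def query(self, predicate: Callable[[T], bool]) -> List[T]:
        """Select objects by their properties."""
        return [obj for obj in self._objects.values() if predicate(obj)]

    def invoke(self, method: Callable[[T], float]) -> List[float]:
        """Apply a class method to every object; a grid would run this in parallel."""
        return [method(obj) for obj in self._objects.values()]


@dataclass
class Device:
    device_id: str
    region: str
    temperature: float

    def overheat_margin(self) -> float:
        # 80.0 is an invented threshold for this sketch.
        return self.temperature - 80.0
```

A call such as collection.query(lambda d: d.region == "west") selects objects by property, and collection.invoke(Device.overheat_margin) analyzes the whole collection by invoking an ordinary class method.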

Automatic Deployment

ScaleOut StreamServer takes ease of use a step further by automating all deployment steps. Using a unique software concept called an “invocation grid,” it lets the application developer pre-stage the execution environment by shipping all required executable programs and libraries to grid servers. This eliminates the need to manually deploy application code and libraries on each server in the cluster, and it ensures that all servers are properly configured. As a further benefit, invocation grids accelerate startup times by avoiding the overhead of shipping code for each parallel method invocation.

The ScaleOut Digital Twin Builder fully automates the use of invocation grids for deploying and running digital twin models. Application developers need only focus on the semantics of their models, and ScaleOut StreamServer takes care of the rest.

ScaleOut Management Pack™ Included!

To further simplify application development, ScaleOut StreamServer comes bundled with ScaleOut Management Pack which includes comprehensive tools for observing, managing, and archiving grid-based data. This means that developers can directly track the state of the grid and easily verify that their applications are running as intended. From development to deployment to testing, ScaleOut StreamServer makes it easy.
