The ability to process streaming data is ubiquitous in modern information systems. The grand challenge in establishing a processing framework to power such systems is striking the right balance between expressivity and computability in a highly dynamic setting. The expressivity of a framework reflects what kinds of input data and what types of processing operations it supports. Its computability corresponds to its ability to process a given workload (e.g., a processing workflow over data of a certain size) under a given execution setting (e.g., CPU, RAM and network bandwidth).

So far, various research communities have independently addressed this challenge by imposing application-specific trade-offs and assumptions on their underlying processing models. Such trade-offs and assumptions are driven by prior knowledge of data characteristics (e.g., format, modality, schema and distribution), processing workload and computation settings. However, recent developments in the Internet of Things and AI demand entirely new levels of expressivity in processing pipelines and dynamicity in computation settings. For instance, a typical processing pipeline of a connected vehicle involves not only multimodal stream elements generated in real time by unprecedented types of sensors but also complex processing workflows combining logical reasoning and statistical inference. Furthermore, such a pipeline can be executed in a highly dynamic distributed setting, e.g. one combining in-car processing units with cloud/edge computing infrastructures. Processing pipelines and setups of this kind hence require a radical overhaul of the state of the art across several areas.

To this end, this project aims to establish the computing foundations for a unified processing framework that addresses this grand challenge. The targeted framework, called Semantic Stream Processing, will propose a semantics-based processing model with a standards-oriented graph data model and query language fragments. The project will therefore carry out a systematic study of tractable classes across a wide range of processing operators on stream data, e.g., graph pattern queries, logical reasoning and statistical inference. These newly identified tractable classes of processing operations will pave the way for designing efficient classes of incremental evaluation algorithms. To address scalability, the project will also study how to elastically and robustly scale a highly expressive stream processing pipeline in a dynamic, distributed computing environment. Moreover, the project will investigate a novel optimisation mechanism that combines logical optimisation algorithms, which exploit rewriting rules and pruning constraints, with adaptive optimisation algorithms, which continuously re-optimise execution plans based on runtime statistics. The proposed algorithms and framework will be extensively and systematically evaluated in two application domains: connected vehicles and the Web of Things.
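To make the notion of incremental evaluation concrete, the following is a minimal sketch in Python of how a graph pattern query over a stream of triples could be evaluated incrementally; the two-pattern query, the predicate names and all identifiers are illustrative assumptions, not the project's actual design. The idea is that each arriving triple joins only against indexed partial matches, so only the new results it produces are emitted, instead of recomputing the query over the whole window.

```python
# Illustrative sketch of incremental evaluation of a graph pattern
# query over a triple stream. The query and all names are assumptions
# for illustration, not the project's design.
from collections import defaultdict

# Query: (?v, hasSensor, ?s) JOIN (?s, reports, ?x).
# Partial matches are indexed by the join variable ?s, so an arriving
# triple joins only against the relevant stored bindings rather than
# re-evaluating the query over the entire window contents.
left = defaultdict(list)   # ?s -> bindings of ?v from the first pattern
right = defaultdict(list)  # ?s -> bindings of ?x from the second pattern

def on_triple(subj, pred, obj):
    """Consume one stream element and yield only the *new* results."""
    if pred == "hasSensor":        # binds ?v = subj, ?s = obj
        left[obj].append(subj)
        for x in right[obj]:       # join with stored second-pattern matches
            yield (subj, obj, x)
    elif pred == "reports":        # binds ?s = subj, ?x = obj
        right[subj].append(obj)
        for v in left[subj]:       # join with stored first-pattern matches
            yield (v, subj, obj)

stream = [
    ("car1", "hasSensor", "lidar1"),
    ("lidar1", "reports", "obstacle42"),   # completes a match with car1
    ("car2", "hasSensor", "lidar1"),       # reuses the stored partial match
]
for t in stream:
    for result in on_triple(*t):
        print("new match:", result)
```

A window-based engine would additionally evict expired bindings from these indexes; the tractability questions raised above concern which classes of operators admit such bounded-state incremental evaluation schemes at all.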
