Scheduling in Distributed Stream Processing Systems

Khorlin, 2006

Category: Distributed Systems

Overall Rating

4.0/5 (28/35 pts)

This paper offers a unique, actionable path for modern research primarily through its clear and early articulation of the distributed stream processing scheduling problem as optimizing end-to-end QoS costs over a queuing network, highlighting critical challenges related to queuing delay and non-linear costs.
While its specific proposed algorithms are outdated and impractical due to scalability limitations, this foundational problem framing and the identified challenges provide a solid theoretical and experimental starting point for applying powerful modern techniques like Deep Reinforcement Learning and Graph Neural Networks...

The core idea of treating scheduling in distributed stream processing systems as an optimization problem over a queuing network, minimizing a global cost derived from quality of service (QoS) functions, holds significant latent novelty.
Modern advancements in machine learning, particularly Graph Neural Networks (GNNs) and sophisticated data-driven modeling techniques, could potentially build richer, more scalable predictive models of complex distributed queuing networks than were feasible then.
Modern reinforcement learning (RL) or deep reinforcement learning (DRL) techniques could be applied here. DRL agents could learn highly complex, dynamic local scheduling policies by observing richer local state...
The identified "Cost of Average vs. Average Cost" issue for non-linear QoS functions is a fundamental statistical challenge when optimizing based on means rather than distributions. Modern techniques for optimization under uncertainty... could offer new ways to address this problem...

The paper's setting is stream processing systems in 2006. The assumptions about distributed systems, network topologies, and computational paradigms have shifted significantly since then.
The SMBS approach, relying on Markov chains, faces a fundamental state-space explosion problem that limits it to a very small number of queues...
The "Cost of Average vs. Average Cost" problem (Section 4.7) highlights a significant theoretical weakness: the optimization metric (cost of average delay) is a poor proxy for the true objective (average cost) for non-linear QoS functions...
Modern distributed stream processing systems and cluster schedulers employ diverse techniques for resource allocation and scheduling that have largely superseded the specific approaches here.

Act