Skip to main content

Dataflow Programming

Hydroflow+ uses a dataflow programming model, which will be familiar if you have used APIs like Rust iterators. Instead of using RPCs or async/await to describe distributed computation, Hydroflow+ instead uses asynchronous streams, which represent data arriving over time. Streams can represent a series of asynchronous events (e.g. inbound network requests) or a sequence of data items.

Programs in Hydroflow+ describe how to transform entire collections of data using operators such as map (transforming elements one by one), fold (aggregating elements into a single value), or join (combining elements from multiple streams on matching keys).

If you are familiar with Spark, Flink or Pandas, you will find Hydroflow+ syntax familiar. However, note well that the semantics for asynchronous streams in Hydroflow+ differ significantly from bulk analytics systems like those above. In particular, Hydroflow+ uses the type system to distinguish between bounded streams (originating from finite data) and unbounded streams (originated from asynchronous input). Moreover, Hydroflow+ is designed to handle asynchronous streams of small, independent events very efficiently.