Hydroflow graphs are divided into two layers: the outer scheduled layer and inner compiled layer.
In the mermaid diagrams generated for Hydroflow, the scheduler is responsible for the outermost gray boxes (the subgraphs) and handoffs, while the insides of the gray boxes are compiled via the compiled layer. As a reminder, here is the graph from the reachability chapter, which we will use as a running example:
The Hydroflow Architecture Design Doc contains a more detailed explanation of this section. Note that some aspects of the design doc are not implemented (e.g. early yielding) or may become out of date as time passes.
The scheduled layer is dynamic: it stores a list of operators (or "subgraphs") as well as buffers ("handoffs") between them. The scheduled layer chooses how to schedule the operators (naturally), and when each operator runs it pulls from its input handoffs and pushes to its output handoffs. This setup is extremely flexible: operators can have any number of input or output handoffs, so we can easily represent any graph topology. However, this flexibility comes at a performance cost due to the overhead of scheduling and buffering, and the lack of inter-operator compiler optimization.
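To make the operator-and-handoff picture concrete, here is a minimal sketch using only the standard library. The names `Handoff`, `Operator`, and `run_to_quiescence` are hypothetical and chosen for illustration; this is not Hydroflow's scheduler, only a toy with the same shape: operators are opaque closures, handoffs are shared buffers, and the scheduler keeps running operators until none of them has work left.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

/// A handoff (hypothetical type): a shared buffer between two scheduled operators.
type Handoff<T> = Rc<RefCell<VecDeque<T>>>;

/// A scheduled operator (hypothetical type): an opaque closure that pulls from its
/// input handoffs and pushes to its output handoffs. Returns `true` if it did work.
type Operator = Box<dyn FnMut() -> bool>;

/// Toy scheduler: run every operator in turn until a full pass does no work.
fn run_to_quiescence(operators: &mut [Operator]) {
    loop {
        let mut any_work = false;
        for op in operators.iter_mut() {
            any_work |= op();
        }
        if !any_work {
            break;
        }
    }
}

fn run_example() -> Vec<u32> {
    let a_to_b: Handoff<u32> = Rc::new(RefCell::new(VecDeque::new()));
    let output: Rc<RefCell<Vec<u32>>> = Rc::new(RefCell::new(Vec::new()));

    // Operator A: a source that emits 1, 2, 3 exactly once.
    let mut source: Option<Vec<u32>> = Some(vec![1, 2, 3]);
    let send = Rc::clone(&a_to_b);
    let op_a: Operator = Box::new(move || match source.take() {
        Some(items) => {
            send.borrow_mut().extend(items);
            true
        }
        None => false,
    });

    // Operator B: drains the handoff and doubles each item.
    let recv = Rc::clone(&a_to_b);
    let out = Rc::clone(&output);
    let op_b: Operator = Box::new(move || {
        let mut did_work = false;
        while let Some(x) = recv.borrow_mut().pop_front() {
            out.borrow_mut().push(x * 2);
            did_work = true;
        }
        did_work
    });

    // B is listed before A on purpose: the handoff decouples the two,
    // so the result does not depend on the order the scheduler picks.
    let mut operators = vec![op_b, op_a];
    run_to_quiescence(&mut operators);

    let result = output.borrow().clone();
    result
}

fn main() {
    println!("{:?}", run_example());
}
```

Every item crosses a heap-allocated buffer and every operator call goes through dynamic dispatch, which is the kind of overhead the paragraph above describes.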
The compiled layer helps avoid the costs of the scheduled layer. We can combine several operators into a subgraph that is compiled and optimized as a single scheduled unit. The scheduled layer then runs the entire subgraph as if it were one operator with many inputs and outputs. Note the layering here: the compiled layer does not replace the scheduled layer but instead exists within it.
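As a rough illustration of that layering, the sketch below fuses three operators into one statically typed iterator chain that sits between two buffers. The function name `run_subgraph` and the use of `VecDeque` as a handoff are assumptions made for this example, not Hydroflow's actual generated code. From the outside there is one unit to schedule; inside, the compiler can inline and optimize across the operators.

```rust
use std::collections::VecDeque;

/// A hypothetical compiled subgraph: filter, map, and format are fused into a
/// single iterator chain, with no buffering between them. Only the input and
/// output handoffs (plain `VecDeque`s here) are visible to a scheduler.
fn run_subgraph(input: &mut VecDeque<u32>, output: &mut VecDeque<String>) {
    output.extend(
        input
            .drain(..)
            .filter(|x| x % 2 == 1) // keep odd numbers
            .map(|x| x * x) // square them
            .map(|x| format!("odd square: {}", x)),
    );
}

fn subgraph_example() -> Vec<String> {
    let mut input: VecDeque<u32> = VecDeque::from(vec![1, 2, 3]);
    let mut output: VecDeque<String> = VecDeque::new();
    // The scheduler would make exactly one call per run of this subgraph.
    run_subgraph(&mut input, &mut output);
    output.into_iter().collect()
}

fn main() {
    println!("{:?}", subgraph_example());
}
```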
Rust already has a built-in API similar to dataflow: iterators.
When they work well, they work really well and tend to be as fast as for loops.
However, they are limited in the flow graphs they can represent. Each operator
in an iterator chain takes ownership of, and pulls from, the previous operator.
Taking two iterators as inputs and merging them together
is natural and performant, as we can pull from both. However, if we want an
iterator to split into multiple outputs, then things become tricky.
A tee adapter does just this by cloning each incoming item into two output buffers, but this
requires allocation and could cause unbounded buffer growth if the outputs are
not read from at relatively even rates.
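The contrast between merging and splitting can be sketched with the standard library alone. Merging below uses the standard `Iterator::chain`. The `tee` function, `TeeState`, and `TeeOutput` are hypothetical stand-ins written for this example (they are not the implementation of any particular crate); they show why a pull-based tee has to buffer whatever one output has consumed and the other has not.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Shared state for a hypothetical two-way tee over a pull-based iterator.
struct TeeState<I: Iterator> {
    source: I,
    /// One backlog per output: items the other output pulled first.
    backlogs: [VecDeque<I::Item>; 2],
}

/// One of the two outputs of the tee.
struct TeeOutput<I: Iterator> {
    state: Rc<RefCell<TeeState<I>>>,
    side: usize,
}

impl<I: Iterator> TeeOutput<I> {
    /// Number of items buffered for this output but not yet pulled.
    fn backlog_len(&self) -> usize {
        self.state.borrow().backlogs[self.side].len()
    }
}

impl<I> Iterator for TeeOutput<I>
where
    I: Iterator,
    I::Item: Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let mut state = self.state.borrow_mut();
        // Serve from our own backlog first.
        if let Some(item) = state.backlogs[self.side].pop_front() {
            return Some(item);
        }
        // Otherwise pull from the source and buffer a clone for the other output.
        let item = state.source.next()?;
        state.backlogs[1 - self.side].push_back(item.clone());
        Some(item)
    }
}

fn tee<I>(source: I) -> (TeeOutput<I>, TeeOutput<I>)
where
    I: Iterator,
    I::Item: Clone,
{
    let state = Rc::new(RefCell::new(TeeState {
        source,
        backlogs: [VecDeque::new(), VecDeque::new()],
    }));
    (
        TeeOutput { state: Rc::clone(&state), side: 0 },
        TeeOutput { state, side: 1 },
    )
}

/// Merging is easy: the downstream iterator simply pulls from both inputs.
fn merge_example() -> Vec<u32> {
    vec![1, 2].into_iter().chain(vec![3, 4]).collect()
}

/// Splitting is harder: draining one output forces the other to buffer everything.
fn tee_backlog_example() -> (usize, usize, Option<u32>) {
    let (left, mut right) = tee(1..=1000u32);
    let pulled = left.count(); // drain the left output completely
    let backlog = right.backlog_len(); // all of it is now buffered for the right
    (pulled, backlog, right.next())
}

fn main() {
    println!("{:?}", merge_example());
    println!("{:?}", tee_backlog_example());
}
```

With an unbounded source and one slow output, the backlog in this sketch would grow without limit, which is the buffer-growth problem described above.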
However, if iterators were instead push-based, where each operator owns one or more output operators, then teeing is very easy: just clone each element and push to (i.e. run) both outputs. So that's what we did: we created push-based iterators to allow fast teeing or splitting in the compiled layer.
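A push-based design can be sketched in a few lines. The `Push` trait and the `ForEach`, `Map`, and `Tee` types below are hypothetical names invented for this example and should not be read as the actual contents of `hydroflow::compiled`. The point is only that when each operator owns its downstream operators, a tee needs no buffer at all.

```rust
/// Hypothetical push-based operator: instead of being pulled from, it is handed items.
trait Push<T> {
    fn give(&mut self, item: T);
}

/// Terminal operator: runs a closure on every item it is given.
struct ForEach<F>(F);

impl<T, F: FnMut(T)> Push<T> for ForEach<F> {
    fn give(&mut self, item: T) {
        (self.0)(item)
    }
}

/// Transforms each item, then pushes the result to the operator it owns.
struct Map<F, Next> {
    func: F,
    next: Next,
}

impl<T, U, F, Next> Push<T> for Map<F, Next>
where
    F: FnMut(T) -> U,
    Next: Push<U>,
{
    fn give(&mut self, item: T) {
        self.next.give((self.func)(item))
    }
}

/// Owns two downstream operators; clones each item and pushes to both immediately.
struct Tee<A, B> {
    first: A,
    second: B,
}

impl<T: Clone, A: Push<T>, B: Push<T>> Push<T> for Tee<A, B> {
    fn give(&mut self, item: T) {
        self.first.give(item.clone());
        self.second.give(item);
    }
}

fn tee_example() -> (Vec<u32>, Vec<u32>) {
    let mut doubled = Vec::new();
    let mut raw = Vec::new();
    {
        let mut pipeline = Tee {
            first: Map {
                func: |x: u32| x * 2,
                next: ForEach(|x: u32| doubled.push(x)),
            },
            second: ForEach(|x: u32| raw.push(x)),
        };
        for item in [1u32, 2, 3] {
            pipeline.give(item);
        }
    }
    (doubled, raw)
}

fn main() {
    println!("{:?}", tee_example());
}
```

Because every type here is generic and statically known, the compiler is free to inline the whole pipeline into the loop body, much as it does for ordinary iterator chains.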
Pull-based iterators can be connected to push-based iterators at a single "pivot" point. Together, this pull-to-push setup dictates the shape compiled subgraphs can take. Informally, this is like the roots and leaves of a tree. Water flows from the roots (the pull inputs), eventually all joins together in the trunk (the pull-to-push pivot), then splits up into multiple outputs in the leaves. We refer to this structure as an in-out tree.
Figure: The Oaklandish logo.
In the reachability example above, the pivot is the
flat_map() operator: it pulls from the join that feeds it, and pushes to the tee that follows it. In essence that
flat_map() serves as the main thread of the compiled component.
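The in-out tree shape can be sketched as follows. The `pivot` function and the toy data are assumptions made for illustration, and the push side is simplified to a single closure; this is not the reachability example itself. The pull side merges two inputs, the pivot is a single loop that drives everything, and the push side fans out to two outputs.

```rust
/// Hypothetical pivot: the one place where pulling turns into pushing.
/// It drains the pull side and hands each item to the push side.
fn pivot<I, P>(pull: I, mut push: P)
where
    I: Iterator,
    P: FnMut(I::Item),
{
    for item in pull {
        push(item);
    }
}

fn in_out_tree_example() -> (Vec<u32>, Vec<u32>) {
    // Roots: two pull-based inputs, merged and transformed on the way to the trunk.
    let input_a = vec![1u32, 2];
    let input_b = vec![3u32, 4];
    let pull_side = input_a.into_iter().chain(input_b).map(|x| x * 10);

    // Leaves: the push side sends every item to two outputs, one of them filtered.
    let mut all_out = Vec::new();
    let mut filtered_out = Vec::new();
    let push_side = |x: u32| {
        all_out.push(x);
        if x % 20 == 0 {
            filtered_out.push(x);
        }
    };

    // Trunk: the pivot loop acts as the main thread of the whole subgraph.
    pivot(pull_side, push_side);
    (all_out, filtered_out)
}

fn main() {
    println!("{:?}", in_out_tree_example());
}
```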
See Subgraph In-Out Trees for more, including how to convert a graph into in-out trees.
Surface Syntax and APIs
Hydroflow's Surface Syntax hides
the distinction between these two layers. It offers a natural
Iterator-like chaining syntax for building
graphs that get parsed and compiled into a scheduled graph of one or more compiled subgraphs. Please see the Surface Syntax docs for more information.
Alternatively, the Core API allows you to interact with handoffs directly at a low
level. It doesn't provide any notion of chainable operators. You can use
arbitrary Rust code to implement the operators.
The Core API lives
mainly in methods on the
Hydroflow struct. Compiled push-based iterators live in
hydroflow::compiled. We intend users to use the Surface Syntax as it is much more friendly, but as
Hydroflow is in active development some operators might not be available in
the Surface Syntax, in which case the Core API can be used instead. If you find
yourself in this situation, be sure to submit an issue!