Streaming, Blocking and Stratification

Many DFIR operators (e.g. map, filter and join) work in a streaming fashion. Streaming operators process data as it arrives, generating outputs in the midst of processing inputs. If you restrict yourself to operators that work in this streaming fashion, then your transducer may start sending data across the network mid-tick, even while it is still consuming the data in the input batch.

But some operators are blocking, and must wait for all their input data to arrive before they can produce any output data. For example, a sort operator must wait for all its input data to arrive before it can produce a single output value. After all, the lowest value may be the last to arrive!
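The contrast can be sketched in plain Rust, without the DFIR runtime. The function names `streaming_pipeline` and `blocking_sort` and the `emit` callback are hypothetical stand-ins for operators and their downstream consumers; this is only an illustration of the two behaviors, not how DFIR implements them.

```rust
/// Streaming: a filter followed by a map. Each qualifying item is emitted
/// as soon as it is read, before any later input has been seen.
/// (Hypothetical helper for illustration; not part of DFIR.)
fn streaming_pipeline<I, F>(input: I, mut emit: F)
where
    I: IntoIterator<Item = i32>,
    F: FnMut(i32),
{
    for x in input {
        if x % 2 == 0 {
            emit(x * 10);
        }
    }
}

/// Blocking: nothing can be emitted until the whole batch is buffered,
/// because the smallest value may be the last to arrive.
/// (Hypothetical helper for illustration; not part of DFIR.)
fn blocking_sort<I, F>(input: I, mut emit: F)
where
    I: IntoIterator<Item = i32>,
    F: FnMut(i32),
{
    // Buffer the entire input first.
    let mut buffer: Vec<i32> = input.into_iter().collect();
    buffer.sort();
    // Only now can outputs be produced.
    for x in buffer {
        emit(x);
    }
}
```

In the streaming version, outputs appear in arrival order and interleave with input consumption. In the blocking version, the first output is produced only after the last input has been read.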

Examples of Blocking Operators

The discussion above should raise questions in your mind. What do we mean by "all the input data" in a long-running service? We don't want to wait until the end of time—this is one reason we break time up into discrete "ticks" at each transducer. So when we say that a blocking operator waits for "all the input data", we mean "all the input data in the current tick".

Consider the simple statement below, which receives data from a network source each tick, sorts that tick's worth of data, and prints it to stdout:

source_stream(inbound) -> sort() -> for_each(|x| println!("{:?}", x));

The runtime determines arbitrarily what batch of data is taken from the channel and fed into the source_stream operator for this tick. The sort operator needs to know when the source_stream operator has no more data to send this tick, so that it can sort the buffered data and send the sorted result to the for_each operator, which prints it to stdout. To do this, the runtime provides a mechanism for the source_stream operator to buffer its output and notify the sort operator that it has no more data to send. This mechanism is called a handoff.

The Mermaid graph for the statement above shows two outer rectangles with a handoff between them. Each rectangle is a subflow that is assigned a stratum number. ("Stratum" is Latin for "layer"; the plural of "stratum" is "strata".)

At compile time, the DFIR spec is stratified: partitioned into subflows, where each subflow is assigned a stratum number. Subsequently at runtime, each tick executes the strata one-by-one in ascending order of stratum number. In the example above, the source_stream operator is in stratum 0, and the sort and for_each operators are in stratum 1. The runtime executes the source_stream operator first, buffering output in the Handoff. The sort operator will not receive any data until the source_stream operator has finished executing. When stratum 0 is complete, the subflow in stratum 1 is scheduled and executes the sort and for_each operators to complete the tick.
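The per-tick schedule described above can be mimicked with a small stdlib-only simulation. The function `run_tick` and the `handoff` vector are hypothetical names chosen for this sketch; the real runtime's handoff and scheduler are more general, so treat this as a mental model under those assumptions rather than the actual implementation.

```rust
/// Simulates one tick of the two-stratum flow
/// `source -> [handoff] -> sort -> for_each`.
/// (Hypothetical model for illustration; not the DFIR scheduler.)
fn run_tick(inbound: Vec<i32>) -> Vec<i32> {
    // Stratum 0: the source operator runs to completion,
    // buffering everything it produces in the handoff.
    let mut handoff: Vec<i32> = Vec::new();
    for item in inbound {
        handoff.push(item);
    }

    // Stratum boundary: stratum 1 is scheduled only after stratum 0
    // has finished, so the handoff now holds the whole tick's batch.

    // Stratum 1: sort drains the handoff, then for_each prints each item.
    let mut batch = std::mem::take(&mut handoff);
    batch.sort();
    let mut printed = Vec::new();
    for x in batch {
        println!("{:?}", x);
        printed.push(x);
    }
    // Return what was printed so the order can be inspected.
    printed
}
```

Calling `run_tick` once per batch corresponds to running one tick: each call sorts only the data that arrived for that tick.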

Let's look back at the difference operator as used in the Graph Unreachability example.

The difference operator has two inputs that play different roles. It outputs all the items from its pos input that do not appear in its neg input. To achieve that, the neg input must be blocking, but the pos input can stream. Blocking on the neg input ensures that if the operator streams an output from the pos input, it will never need to retract that output.
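A minimal sketch of this asymmetry in plain Rust follows. The function name `difference_tick` is hypothetical, and the sketch assumes integer items within a single tick; it only illustrates why neg must be fully consumed before pos can be streamed.

```rust
use std::collections::HashSet;

/// Emits the items of `pos` that do not occur in `neg`.
/// (Hypothetical helper for illustration; not the DFIR operator itself.)
fn difference_tick<P, N>(pos: P, neg: N) -> impl Iterator<Item = i32>
where
    P: IntoIterator<Item = i32>,
    N: IntoIterator<Item = i32>,
{
    // Blocking input: consume all of `neg` before producing anything.
    let neg_seen: HashSet<i32> = neg.into_iter().collect();
    // Streaming input: each `pos` item is checked and emitted lazily,
    // one at a time, with no need to ever retract an output.
    pos.into_iter().filter(move |x| !neg_seen.contains(x))
}
```

Because the returned iterator is lazy, the pos items are pulled one at a time by the consumer, while the neg set is complete before the first output is produced.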

Given these examples, we can refine our diagram of the DFIR transducer loop to account for stratified execution within each tick:

Technical Details

The concept of stratification is taken directly from stratified negation in the Datalog language. DFIR identifies a stratum boundary at any blocking input to an operator, where classical Datalog only stratifies its negation operator.

The DFIR compiler performs stratification via static analysis of the DFIR spec. The analysis is based on the following rules:

  • A Handoff is interposed in front of any blocking input to an operator (as documented in the operator definitions).
  • The flow is partitioned at the Handoffs into subflows called "strata".
  • The resulting graph of strata and Handoffs is tested to ensure that it is acyclic. (Cycles through blocking operators are forbidden because they do not have well-defined behavior: the blocking operators in a cycle would deadlock waiting for each other.)

Given the acyclicity test, any legal DFIR program consists of a directed acyclic graph (DAG) of strata and handoffs. The strata are numbered in ascending order: the "leaves" of the DAG (strata with no upstream operators) are assigned stratum number 0, and every other stratum is assigned a number one larger than the largest number among its upstream strata.
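The acyclicity test and the numbering rule can be sketched together as a topological pass over the graph of strata. The function `assign_strata` and its edge-list representation are assumptions made for this example, and the sketch reads the numbering rule as "one more than the largest upstream number"; the DFIR compiler's actual data structures are not shown here.

```rust
/// Numbers the nodes of a graph of subflows, where each `(up, down)` edge
/// is a handoff from subflow `up` to subflow `down` (indices in `0..n`).
/// Returns `None` if the graph has a cycle, mirroring the acyclicity test.
/// (Hypothetical helper for illustration; not the DFIR compiler.)
fn assign_strata(n: usize, edges: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut downstream: Vec<Vec<usize>> = vec![Vec::new(); n];
    // Number of upstream neighbors not yet processed, per node.
    let mut pending = vec![0usize; n];
    for &(up, down) in edges {
        downstream[up].push(down);
        pending[down] += 1;
    }

    // Nodes with no upstream neighbors start at stratum 0.
    let mut stratum = vec![0usize; n];
    let mut ready: Vec<usize> = (0..n).filter(|&v| pending[v] == 0).collect();
    let mut done = 0;

    while let Some(v) = ready.pop() {
        done += 1;
        for &d in &downstream[v] {
            // One larger than the largest upstream stratum seen so far.
            stratum[d] = stratum[d].max(stratum[v] + 1);
            pending[d] -= 1;
            if pending[d] == 0 {
                ready.push(d);
            }
        }
    }

    // Any node left unprocessed lies on, or downstream of, a cycle.
    if done == n {
        Some(stratum)
    } else {
        None
    }
}
```

A node is processed only after all of its upstream neighbors, so its number is final by the time it is used to number its downstream neighbors.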

As a DFIR operator executes, it is running on a particular transducer, in a particular tick, in a particular stratum.

Determining whether an operator should block: Monotonicity

Why are some inputs to operators streaming, and others blocking? Intuitively, the blocking operators must hold off on emitting outputs early because they may receive another input that would change their output. For example, a difference operator on integers cannot emit the number 4 from its pos input if it may subsequently receive a 4 within the same tick on the neg input. More generally, it cannot output anything until it has received all the neg input data.

By contrast, streaming operators like filter have the property that they can always emit an output, regardless of what other data they will receive later in the tick.

Mathematically, we can think of a dataflow operator as a function f(in) → out from one batch of data to another. We call a function monotone if its output is a growing function of its input. That is, f is classified as a monotone function if f(B) ⊆ f(C) whenever B ⊆ C.

By contrast, consider the output of a blocking operator like difference. The output of difference is a function of both its inputs, but it is non-monotone with respect to its neg input. That is, it is not the case that (A \ B) ⊆ (A \ C) whenever B ⊆ C. For example, with A = {1, 2}, B = {} and C = {2}, we get A \ B = {1, 2}, which is not a subset of A \ C = {1}.
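The definition can be checked on concrete sets with a small stdlib-only sketch. The helpers `evens`, `set_difference`, and `grows` are hypothetical names introduced for this example. Note that `grows` tests the condition f(B) ⊆ f(C) for a single pair of inputs; one failing pair disproves monotonicity, but one passing pair does not prove it.

```rust
use std::collections::BTreeSet;

/// A filter-like operator: keeps the even numbers.
/// (Hypothetical helper for illustration.)
fn evens(input: &BTreeSet<i32>) -> BTreeSet<i32> {
    input.iter().copied().filter(|x| x % 2 == 0).collect()
}

/// Set difference: the items of `pos` that are not in `neg`.
/// (Hypothetical helper for illustration.)
fn set_difference(pos: &BTreeSet<i32>, neg: &BTreeSet<i32>) -> BTreeSet<i32> {
    pos.difference(neg).copied().collect()
}

/// Checks f(b) ⊆ f(c) for one pair of inputs with b ⊆ c.
/// Panics if the precondition b ⊆ c does not hold.
fn grows<F>(f: F, b: &BTreeSet<i32>, c: &BTreeSet<i32>) -> bool
where
    F: Fn(&BTreeSet<i32>) -> BTreeSet<i32>,
{
    assert!(b.is_subset(c), "precondition: b must be a subset of c");
    f(b).is_subset(&f(c))
}
```

For instance, with `a = {1, 2, 3, 4}`, `b = {2}` and `c = {2, 4}`, the check passes for `evens` and for difference viewed as a function of its pos input, but fails for `|n| set_difference(&a, n)`, that is, for difference viewed as a function of its neg input.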

DFIR is designed to use the monotonicity property to determine whether an operator input should block. If an operator is monotone with respect to an input, that input is streaming. If an operator is non-monotone with respect to an input, that input is blocking.

Monotonicity turns out to be particularly important for distributed systems. In particular, if all your transducers are fully monotone across ticks, then they can run in parallel without any coordination—they will always stream correct prefixes of the final outputs, and eventually will deliver the complete output. This is the positive direction of the CALM Theorem.

In future versions of DFIR, the type system will represent monotonicity explicitly and reason about it automatically.