Dataflow

The dataflow module (see the flow::compute module) is the core computing module of flow. It takes a SQL query and transforms it into flow's internal execution plan. This execution plan is then rendered into an actual dataflow, which is essentially a directed acyclic graph (DAG) of functions with input and output ports. The dataflow is then triggered to run whenever needed.
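The following is a minimal, hypothetical sketch of the rendering step: an execution plan becomes a DAG whose nodes are operator functions connected through input and output ports. The type and field names below (Operator, Node, Dataflow, inputs) are illustrative assumptions, not the actual flow::compute API.

```rust
/// The kind of function a node applies to incoming data (illustrative only).
#[derive(Debug)]
enum Operator {
    /// Per-row transformation such as projection or filtering.
    Map,
    /// Grouped aggregation over incoming rows.
    Reduce,
}

/// One node of the rendered dataflow graph.
#[derive(Debug)]
struct Node {
    op: Operator,
    /// Input ports: indices of the upstream nodes this node reads from.
    inputs: Vec<usize>,
}

/// The rendered dataflow: nodes stored in a flat list, with edges expressed
/// through each node's `inputs` indices.
#[derive(Debug)]
struct Dataflow {
    nodes: Vec<Node>,
}

fn main() {
    // Source (node 0) -> Map (node 1) -> Reduce (node 2).
    // Each node only references earlier indices, so the graph is acyclic.
    let dataflow = Dataflow {
        nodes: vec![
            Node { op: Operator::Map, inputs: vec![] },     // source, modeled here as a Map with no inputs
            Node { op: Operator::Map, inputs: vec![0] },    // projection / filter
            Node { op: Operator::Reduce, inputs: vec![1] }, // aggregation
        ],
    };
    println!("{dataflow:#?}");
}
```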

Currently, the dataflow supports only map and reduce operations; support for join operations will be added in the future. An example of how these two operation kinds behave is sketched below.
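To make the two supported operation kinds concrete, here is a hedged illustration using plain Rust iterators rather than the actual operators. A hypothetical query such as `SELECT host, count(*) FROM metrics WHERE cpu > 0.9 GROUP BY host` would split into a map stage (filter and projection) and a reduce stage (grouped aggregation).

```rust
use std::collections::HashMap;

fn main() {
    // (host, cpu) rows; the data is purely illustrative.
    let rows = vec![("a", 0.95), ("b", 0.50), ("a", 0.99), ("c", 0.91)];

    // Map stage: per-row filtering and projection, no state shared across rows.
    let mapped = rows
        .into_iter()
        .filter(|(_, cpu)| *cpu > 0.9)
        .map(|(host, _)| host);

    // Reduce stage: stateful grouped aggregation (count per host).
    let mut counts: HashMap<&str, i64> = HashMap::new();
    for host in mapped {
        *counts.entry(host).or_insert(0) += 1;
    }

    println!("{counts:?}"); // e.g. {"a": 2, "c": 1}
}
```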

Internally, the dataflow handles data in row format, using a tuple (row, time, diff). Here, row represents the actual data being passed, which may contain multiple Value objects. time is the system time, which tracks the progress of the dataflow, and diff indicates whether the row is inserted (+1) or deleted (-1). Together, the tuple represents an insert or delete operation on the row at a given system time.
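Below is a minimal sketch of this (row, time, diff) representation. The type names (Value, Row, Timestamp, Diff, Update) are assumptions chosen for illustration; the actual definitions live in the flow crates.

```rust
/// A single value inside a row (only two variants shown for brevity).
#[derive(Debug, Clone)]
enum Value {
    Int64(i64),
    String(String),
}

/// A row is an ordered collection of values.
type Row = Vec<Value>;
/// System time used to track the progress of the dataflow.
type Timestamp = u64;
/// +1 means the row is inserted at `time`, -1 means it is deleted.
type Diff = i64;

/// The unit of data flowing between operators: (row, time, diff).
type Update = (Row, Timestamp, Diff);

fn main() {
    // Insert a row at time 100 ...
    let insert: Update = (vec![Value::String("host-a".into()), Value::Int64(1)], 100, 1);
    // ... and retract (delete) the same row at time 200. An update to a row is
    // expressed as a deletion of the old row plus an insertion of the new one.
    let delete: Update = (insert.0.clone(), 200, -1);

    println!("{insert:?}");
    println!("{delete:?}");
}
```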