Everything operation in Dask whether it's dataframe or array or delayed, creates some tasks. Each task knows its dependencies so Dask can create a graph of all the tasks and which depend on which. Put another way: a task graph is a DAG (Directed Acyclyic Graph). Read all about task graphs and high level graphs.
import dask.array as da
arr = da.random.random(size=(1_000, 1_000), chunks=(250, 250))
arr.dask
HighLevelGraph with 1 layers.
random_sample-247f4bfed181679f520df03050ae4862
layer_type | MaterializedLayer |
---|---|
is_materialized | True |
shape | (1000, 1000) |
dtype | float64 |
chunksize | (250, 250) |
type | dask.array.core.Array |
chunk_type | numpy.ndarray |
arr
is itself a graph, and doing computations on arr
produces another graph.
result = (arr + arr.T).sum(axis=0).mean()
result.dask
HighLevelGraph with 7 layers.
random_sample-247f4bfed181679f520df03050ae4862
layer_type | MaterializedLayer |
---|---|
is_materialized | True |
shape | (1000, 1000) |
dtype | float64 |
chunksize | (250, 250) |
type | dask.array.core.Array |
chunk_type | numpy.ndarray |
transpose-528fb1b5144ae969909c5aa7a6fc18ee
layer_type | Blockwise |
---|---|
is_materialized | False |
shape | (1000, 1000) |
dtype | float64 |
chunksize | (250, 250) |
type | dask.array.core.Array |
chunk_type | numpy.ndarray |
add-9091506d774063d175eb93d5dd9699cd
layer_type | Blockwise |
---|---|
is_materialized | False |
shape | (1000, 1000) |
dtype | float64 |
chunksize | (250, 250) |
type | dask.array.core.Array |
chunk_type | numpy.ndarray |
sum-2d3c814648914f3a7d5935d8889873cf
layer_type | Blockwise |
---|---|
is_materialized | False |
shape | (1000, 1000) |
dtype | float64 |
chunksize | (250, 250) |
type | dask.array.core.Array |
chunk_type | numpy.ndarray |
sum-aggregate-dea1aa6295b1188748f9c3ab24b06348
layer_type | MaterializedLayer |
---|---|
is_materialized | True |
shape | (1000,) |
dtype | float64 |
chunksize | (250,) |
type | dask.array.core.Array |
chunk_type | numpy.ndarray |
mean_chunk-96f1f080ecc4444b1e9f23e76f7a8d73
layer_type | Blockwise |
---|---|
is_materialized | False |
shape | (1000,) |
dtype | float64 |
chunksize | (250,) |
type | dask.array.core.Array |
chunk_type | numpy.ndarray |
mean_agg-aggregate-d461f56db078ce142b2daa1b1f15e194
layer_type | MaterializedLayer |
---|---|
is_materialized | True |
shape | () |
dtype | float64 |
chunksize | () |
type | dask.array.core.Array |
chunk_type | numpy.ndarray |
For small enough graphs, we can get a visual representation of the graph. Each layer in the drawing of the task graph (below) corresponds to a layer in the high level graph (above).
result.visualize()
Once the graph is constructed it goes through optimization, scheduling, and execution.
result.visualize(optimize_graph=True, color="order", node_attr={"penwidth": "4"})