If you have experimented with Dask before, you have likely seen examples using Dask DataFrame and Dask Array. These are Dask's high-level interfaces, and they mimic pandas and NumPy respectively. Operations on these APIs build task graphs, which then get executed on the cluster. Dask takes care of that part for you, so you only have to think at the high level.
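To make the "task graphs get executed later" point concrete, here is a minimal sketch (using a small `da.ones` array rather than the random array below, so the result is predictable): building the expression produces a lazy object, and only `.compute()` actually runs the graph.

```python
import dask.array as da

# Nothing is computed here: `total` is a lazy task graph, not a number.
x = da.ones((4, 4), chunks=(2, 2))
total = x.sum()
print(total)            # still lazy: prints a dask.array<...> placeholder

# Only .compute() walks the graph and produces a concrete result.
print(total.compute())  # 16.0
```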
import dask.array as da
arr = da.random.random(size=(1_000, 1_000), chunks=(250, 500))
arr
arr.sum()
_.compute()
499807.1679973727
(arr + arr.T).sum(axis=0).mean()
_.compute()
999.6143359947454
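A quick sanity check on that number, verifiable with plain NumPy: each entry of `arr` is uniform on [0, 1) with mean 0.5, so each entry of `arr + arr.T` has expected value 1.0, each column sum over 1,000 rows is about 1,000, and the mean of the column sums stays about 1,000. A sketch with a fixed seed (the seed and array here are illustrative, not the transcript's):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((1_000, 1_000))

# Column sums of a + a.T each concentrate near 1000; their mean does too.
result = (a + a.T).sum(axis=0).mean()
print(result)  # close to 1000, matching the ~999.61 seen above
```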
import dask
ddf = dask.datasets.timeseries()
ddf
|                | id    | name   | x       | y       |
|----------------|-------|--------|---------|---------|
| npartitions=30 |       |        |         |         |
| 2000-01-01     | int64 | object | float64 | float64 |
| 2000-01-02     | ...   | ...    | ...     | ...     |
| ...            | ...   | ...    | ...     | ...     |
| 2000-01-30     | ...   | ...    | ...     | ...     |
| 2000-01-31     | ...   | ...    | ...     | ...     |
ddf.groupby("name").sum()
|               | id    | x       | y       |
|---------------|-------|---------|---------|
| npartitions=1 |       |         |         |
|               | int64 | float64 | float64 |
|               | ...   | ...     | ...     |
_.compute()
| name     | id        | x           | y           |
|----------|-----------|-------------|-------------|
| Alice    | 99988092  | 3.672683    | -603.242898 |
| Bob      | 99491666  | -30.556705  | -107.608403 |
| Charlie  | 99457254  | -58.842019  | 168.359914  |
| Dan      | 99888563  | -15.870235  | -162.695709 |
| Edith    | 99729490  | 83.402682   | -65.868199  |
| Frank    | 99516999  | 99.684014   | -267.065279 |
| George   | 99325146  | -44.165091  | -235.215796 |
| Hannah   | 99282032  | 70.649959   | -228.623198 |
| Ingrid   | 99679220  | -230.035396 | 136.336888  |
| Jerry    | 99702106  | 121.091422  | -205.846074 |
| Kevin    | 99516485  | -192.304167 | -22.732018  |
| Laura    | 100257897 | -281.199519 | 104.492034  |
| Michael  | 99622137  | -23.252458  | -33.149406  |
| Norbert  | 99976880  | -133.840093 | 82.190995   |
| Oliver   | 99923135  | -49.054660  | -69.711770  |
| Patricia | 99413582  | 167.956305  | -35.632570  |
| Quinn    | 99655323  | 81.625377   | -81.761760  |
| Ray      | 99445037  | -201.510445 | 129.305448  |
| Sarah    | 99224362  | 188.307865  | -227.253566 |
| Tim      | 99954279  | 173.205178  | -362.966166 |
| Ursula   | 99856910  | 115.580875  | -35.032542  |
| Victor   | 100077831 | 38.778105   | -234.513779 |
| Wendy    | 99980779  | 50.250366   | -204.591499 |
| Xavier   | 99468968  | 37.642749   | -153.014391 |
| Yvonne   | 99665690  | -188.781225 | -326.251246 |
| Zelda    | 99879498  | 61.840510   | 166.020667  |
ddf.x.resample("1H").mean().loc["2000-01-15"]
Dask Series Structure:
npartitions=1
2000-01-15 00:00:00.000000000    float64
2000-01-15 23:59:59.999999999        ...
Name: x, dtype: float64
Dask Name: loc, 151 tasks
_.compute()
timestamp
2000-01-15 00:00:00    0.000730
2000-01-15 01:00:00    0.002879
2000-01-15 02:00:00    0.000252
2000-01-15 03:00:00   -0.004852
2000-01-15 04:00:00   -0.004158
2000-01-15 05:00:00    0.006094
2000-01-15 06:00:00    0.004820
2000-01-15 07:00:00    0.008705
2000-01-15 08:00:00    0.014731
2000-01-15 09:00:00    0.007589
2000-01-15 10:00:00    0.005049
2000-01-15 11:00:00    0.000763
2000-01-15 12:00:00   -0.008005
2000-01-15 13:00:00    0.006928
2000-01-15 14:00:00    0.004477
2000-01-15 15:00:00   -0.005120
2000-01-15 16:00:00   -0.003076
2000-01-15 17:00:00   -0.004609
2000-01-15 18:00:00    0.010669
2000-01-15 19:00:00    0.001993
2000-01-15 20:00:00   -0.011027
2000-01-15 21:00:00    0.008612
2000-01-15 22:00:00    0.008686
2000-01-15 23:00:00   -0.009387
Freq: H, Name: x, dtype: float64