Setting up a Dask cluster on your local machine

This notebook shows how to run a Dask cluster without a batch job scheduler (such as a queueing system on an HPC machine), which is what you would do on your workstation, for example.

%matplotlib inline
import xarray as xr

First import the client, which will be our interface to see what happens on the Dask cluster:

from dask.distributed import Client

Then we will start a Dask cluster locally:

from dask.distributed import LocalCluster
cluster = LocalCluster()
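
Called without arguments, LocalCluster chooses the number of workers and threads from the cores available on the machine. As a rough sketch, the resources can also be requested explicitly. The values below are arbitrary assumptions, not recommendations. The processes=False setting (threads instead of separate worker processes) and dashboard_address=None (dashboard disabled) are only there to keep the example lightweight:

```python
from dask.distributed import Client, LocalCluster

# Illustrative values only; tune them to your own workstation.
with LocalCluster(
    n_workers=2,             # number of workers to start
    threads_per_worker=1,    # threads available to each worker
    memory_limit="1GB",      # memory limit per worker
    processes=False,         # run workers as threads in this process
    dashboard_address=None,  # do not start the dashboard for this sketch
) as cluster, Client(cluster) as client:
    client.wait_for_workers(2)  # block until both workers have registered
    n_workers = len(client.scheduler_info()["workers"])
    print(n_workers)
```

Using the cluster and the client as context managers closes both automatically when the block ends.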

Connect the client to the cluster:

client = Client(cluster)



  • Workers: 4
  • Cores: 8
  • Memory: 17.18 GB

Click on the dashboard link to open the Dask dashboard in a new browser tab. That’s it: we have a cluster up and running. Now let’s see what we can do with it:
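
If the client summary is not rendered with a clickable link (in a plain terminal session, for instance), the dashboard address can also be read from the client. A minimal sketch, assuming a small throwaway cluster created only for this example:

```python
from dask.distributed import Client, LocalCluster

# Small in-process cluster used only for this illustration.
cluster = LocalCluster(n_workers=1, threads_per_worker=1, processes=False)
client = Client(cluster)

# URL of the dashboard; it may be an empty string if the dashboard is not available.
dashboard_url = client.dashboard_link
print(dashboard_url)

client.close()
cluster.close()
```

Paste the printed address into a browser to open the dashboard.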

Sample computation:

Open a sample dataset with xarray. In general, the path to the data can be either a local path or an HTTP address; here we load one of the xarray tutorial datasets by name.

ds = xr.tutorial.open_dataset('air_temperature')
Dimensions:  (lat: 25, lon: 53, time: 2920)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model

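Compute the time average. When the data is backed by Dask arrays (for example, when the dataset is opened with a chunks argument), this is done lazily, so it is very quick: at this point no computation has been performed.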

temp_mean = ds['air'].mean(dim=['time'])
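
To check whether an operation really is lazy, one option is to look at the chunks attribute of the result, which is None when the data is not held in a Dask array. The sketch below uses a synthetic array with the same shape as the air variable rather than the tutorial file, and the chunk size of 365 time steps is an arbitrary assumption:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in with the same shape as the 'air' variable (time, lat, lon).
data = xr.DataArray(
    np.random.rand(2920, 25, 53).astype("float32"),
    dims=("time", "lat", "lon"),
    name="air",
)

# Chunking along time converts the underlying data into a Dask array.
chunked = data.chunk({"time": 365})

# Only a task graph is built here; nothing is computed yet.
lazy_mean = chunked.mean(dim="time")
is_lazy = lazy_mean.chunks is not None

# compute() runs the graph and returns a NumPy-backed result.
result = lazy_mean.compute()
print(is_lazy, result.shape)
```

When a Client is connected, compute() sends the work to the cluster; without one, Dask falls back to its default local scheduler.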

Asking for a plot or numerical values will trigger the computation. Time to check out the Dask dashboard ;)

%time temp_mean.plot(figsize=[10,8], cmap='gist_ncar')
CPU times: user 48.2 ms, sys: 7.5 ms, total: 55.7 ms
Wall time: 53.4 ms
<matplotlib.collections.QuadMesh at 0x120fee860>

Once finished, we can shut down the cluster and the client: