Knit enables data scientists to quickly launch, monitor, and destroy distributed programs on a YARN cluster.

Many YARN applications are simple distributed shell commands: the same shell command run on several nodes in a cluster. Knit lets users express and deploy such applications, together with their software environments, on YARN from Python.

Knit is part of the `Dask`_ project, which is committed to bringing cluster-based data science within easy reach of Python developers.


Knit was built to support batch-oriented non-JVM applications. For example, Knit can deploy Python-based distributed applications such as IPython Parallel, with particular support for `Dask`_. Knit was built with the following motivations in mind:

  • PyData Support: Bring the PyData stack into the Hadoop/YARN ecosystem.
  • Easy Setup: Keep installation effort minimal and cover the common cases with an easy-to-use Python interface.
  • Deployable Runtimes: Build and ship self-contained environments along with the application. Knit uses conda to resolve library dependencies and deploy user libraries without requiring IT infrastructure and management.


Knit is not a full-featured YARN solution. It focuses on the common case in scientific workloads: starting a distributed process on many workers for a relatively short period of time. Knit does handle some dynamic container management, but it is not suitable for running long-lived infrastructure applications.