knit

Knit enables data scientists to quickly launch, monitor, and destroy distributed programs on a YARN cluster.

Many YARN applications are simple distributed shell commands: the same shell command run on several nodes in a cluster. Knit lets users describe and deploy such applications, along with the software environments they need, under YARN from Python.

Knit is part of the `Dask`_ project, which aims to bring cluster-based data science within easy reach of Python developers.

Motivation

Knit was built to support batch-oriented, non-JVM applications. For example, Knit can deploy Python-based distributed applications such as IPython Parallel, with particular support for `Dask`_. Knit was built with the following motivations in mind:

  • PyData Support: Bring the PyData stack into the Hadoop/YARN ecosystem.
  • Easy Setup: Require minimal installation effort and cover the common cases with an easy-to-use Python interface.
  • Deployable Runtimes: Build and ship self-contained environments along with the application. Knit uses conda to resolve library dependencies and deploy user libraries without relying on IT infrastructure or management.

Scope

Knit enables data scientists to quickly launch, monitor, and destroy simple distributed programs.

Knit is not a full-featured YARN solution. It focuses on the common case in scientific workloads: starting a distributed process on many workers for a relatively short period of time. Knit does handle some dynamic container management, but it is not suitable for running long-lived infrastructure applications.