The runtime requirements of knit are Python, lxml, requests and py4j. Python versions 2.7, 3.5 and 3.6 are currently supported. Dask is required only if you want to launch a Dask cluster. All of these are available via conda (py4j from the conda-forge channel).

Testing depends on pytest.


Use pip or conda to install:

$ conda install knit -c conda-forge
$ pip install knit --upgrade

For dask clusters, you also need dask itself:

$ conda install dask distributed
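
After installing, one way to confirm that the runtime requirements listed above can be found by your interpreter is a short check like the one below. This is only a sketch: the helper name missing_modules is hypothetical (it is not part of knit), and it assumes Python 3, since importlib.util.find_spec is not available on Python 2.7.

```python
import importlib.util
import sys

# Module names taken from the requirements listed above.
# dask is only needed when launching a Dask cluster.
REQUIRED = ("lxml", "requests", "py4j")
OPTIONAL = ("dask",)


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("missing required:", missing_modules(REQUIRED) or "none")
    print("missing optional:", missing_modules(OPTIONAL) or "none")
```

An empty list for the required modules means the environment has everything needed to import knit; it does not check versions.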


The following steps can be used to install and run knit from source.

Update and install system dependencies (e.g., on Debian systems):

$ sudo apt-get update
$ sudo apt-get install git maven openjdk-7-jdk -y

Or install these via conda:

$ conda install -y -c conda-forge setuptools maven openjdk
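
Before building from source, it can help to confirm that the build tools are actually on your PATH. The sketch below is an illustration only: the helper name missing_tools is hypothetical, and the executable names (git, mvn, java) are assumed from the packages installed above.

```python
import shutil

# Executable names assumed from the packages installed above.
BUILD_TOOLS = ("git", "mvn", "java")


def missing_tools(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("missing build tools:", missing_tools(BUILD_TOOLS) or "none")
```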

Clone the git repository and build the Maven project:

$ git clone
$ cd knit
$ python setup.py install mvn

Testing on Docker

If you would like to test this package but don't have a YARN cluster available, you can run a small test cluster on your own machine with Docker. This is essentially how the Continuous Integration tests work.

$ export CONTAINER_ID=`docker run -d mdurant/hadoop`
$ docker exec -it $CONTAINER_ID bash
# conda install dask distributed -y
# conda install -c conda-forge lxml py4j knit
# py.test -vv knit