Machine Learning Plugin project - Coding Phase 2 blog post

    Welcome back folks!

    This blog post covers coding phase 2 of the Jenkins Machine Learning Plugin project for GSoC 2020. After passing the phase 1 evaluation and demo, our team moved on to the challenges of phase 2.

    Summary

    This phase was spent mostly on documentation and bug fixes. Because the main feature, connecting to an IPython kernel, was completed in phase 1, we could focus on fixing minor and major bugs and on writing documentation for users. To address the JENKINS-62927 issue, we built a Docker agent so that users do not have to manage the plugin's Python dependencies themselves. With Python 2 now at end of life, we ported the plugin to Python 3. We tested the plugin in Conda, venv and Windows environments, and it passed the end-to-end test. The code editor feature needs further discussion and analysis; we have built a simple editor that may be useful in other ways in the future (PR#35).

    Main features of Machine Learning plugin

    • Run Jupyter notebook, (Zeppelin) JSON and Python files

    • Run Python code directly

    • Convert Jupyter Notebooks to Python and JSON

    • Configure IPython kernel properties

    • Support for executing notebooks and Python files on an agent

    • Support for Windows and Linux
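
    To make the notebook conversion feature in the list above more concrete, here is a minimal Python sketch of what turning a Jupyter notebook into a Python script involves. It is not the plugin's own implementation; it only assumes the standard nbformat 4 JSON layout, and the function name notebook_to_python and the sample notebook are made up for illustration.

```python
import json


def notebook_to_python(notebook_json: str) -> str:
    """Concatenate the code cells of an nbformat-4 notebook into one script.

    Hypothetical helper for illustration; not part of the plugin.
    """
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat allows "source" to be one string or a list of line strings
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    # Separate cells with a blank line and end the script with a newline
    return "\n\n".join(chunks) + "\n"


# Tiny in-memory notebook, used only for demonstration
example_notebook = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["x = 1\n", "y = x + 1"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": "print(y)"},
    ],
})

script = notebook_to_python(example_notebook)
print(script)
```

    The markdown cell is dropped and the two code cells end up in one script, separated by a blank line.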

    Upcoming features

    • Extract graph/map/images from the code

    • Save artifacts according to the step name

    • Generate reports for corresponding build

    Future improvements

    • Usage of JupyterRestClient

    • Support for multiple language kernels

      • Note: There is no commitment to these future improvements during the GSoC period.

    Docker agent

    The following Dockerfile can be used to build a Docker image that serves as an agent for the Machine Learning plugin. This Docker agent can run notebooks or Python scripts.

    Dockerfile
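    # Base image: the official Jenkins agent
    FROM jenkins/agent:latest

    # MAINTAINER is deprecated; a LABEL carries the same information
    LABEL maintainer="Loghi <loghijiaha@gmail.com>"

    # Installing packages requires root
    USER root

    # Install Python 3 and pip, then clear the apt lists to keep the image small
    RUN apt-get update && apt-get install -y --no-install-recommends \
            python3 \
            python3-pip \
        && rm -rf /var/lib/apt/lists/*

    # Python dependencies (requirements.txt must be in the build context)
    COPY requirements.txt /requirements.txt

    # Install the dependencies and expose python/pip as aliases of python3/pip3
    RUN pip3 install --upgrade pip setuptools && \
        pip3 install --no-cache-dir -r /requirements.txt && \
        ln -sf /usr/bin/python3 /usr/bin/python && \
        ln -sf /usr/bin/pip3 /usr/bin/pip

    # Switch back to the unprivileged jenkins user
    USER jenkins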

    Ported to Python 3

    As discussed in a previous meeting, we concluded that the plugin should support Python 3, because Python 2.7 reached end of life at the beginning of 2020. The pull request for the Docker agent should also be ported to Python 3.

    Jupyter Rest Client API

    The Jupyter Notebook server API looked promising as another way to run notebooks and code. Three API implementations were merged into master. However, we had to focus on what was proposed in the design document and finish all must-have issues first, so the Jupyter REST client was left for future implementation. It is also a good starting point for community members who want to contribute to the plugin.
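
    For readers unfamiliar with the Jupyter server REST API, the following Python sketch builds the HTTP request that asks a server to start a kernel. It is not the plugin's Java client. The server URL and token are placeholder assumptions, the helper name build_start_kernel_request is hypothetical, and the request is only constructed here, never sent.

```python
import json
import urllib.request


def build_start_kernel_request(base_url: str, token: str,
                               kernel_name: str = "python3") -> urllib.request.Request:
    """Build (but do not send) a POST /api/kernels request.

    Hypothetical helper for illustration; base_url and token are placeholders.
    """
    body = json.dumps({"name": kernel_name}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/kernels",
        data=body,
        headers={
            # Jupyter servers accept token authentication in this header
            "Authorization": "token " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_start_kernel_request("http://localhost:8888", "my-secret-token")
print(request.get_method(), request.full_url)

# Sending is left out on purpose. With a running server it would look like:
#   with urllib.request.urlopen(request) as response:
#       kernel = json.load(response)
```

    Code would then be executed over the kernel's WebSocket channel, which is beyond this sketch.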

    Fixed bugs for running in agent

    There were a few bugs related to the file paths of notebooks when building a job. The major problem was caused by the Python dependencies needed to connect to an IPython kernel. All issues and bugs were fixed within the given timeline.

    R support as a future improvement

    With this work we tried to give a glimpse of how the plugin can be extended to support multiple languages in the future. We concluded that the kernel should be selected dynamically from the extension of the script file (like eval_model.rb or train_model.r), instead of scripting the same code for each kernel.
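
    The idea of choosing a kernel from the file extension can be sketched in a few lines of Python. This is only an illustration of the approach, not code from the plugin. The mapping and the function name select_kernel are hypothetical, and the kernel names (python3, ir, ruby) are assumptions that depend on which kernels are actually installed on the machine.

```python
from pathlib import Path

# Hypothetical mapping from file extension to Jupyter kernel name.
# The kernel names are assumptions; check `jupyter kernelspec list`
# for the names available in a given environment.
KERNEL_BY_EXTENSION = {
    ".py": "python3",
    ".r": "ir",
    ".rb": "ruby",
}


def select_kernel(script_path: str, default: str = "python3") -> str:
    """Pick a kernel name from the script's extension, case-insensitively."""
    extension = Path(script_path).suffix.lower()
    return KERNEL_BY_EXTENSION.get(extension, default)


for name in ("train_model.r", "eval_model.rb", "predict.py"):
    print(name, "->", select_kernel(name))
```

    A lookup table like this keeps the language-specific part in one place, so supporting another kernel would mean adding one entry rather than another copy of the execution code.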

    Documentation and End to End testing

    Detailed documentation was published in the repository. The docs page includes a guided tutorial for running a notebook checked out from a Git repository on an agent. Mentors helped test the plugin on both Linux and Windows.

    Code editor with rebuild feature

    The code editor was classified as a nice-to-have feature in the design document. Borrowing the idea of the Jenkinsfile replay editor, I was able to do the same for code. However, when the source code comes from Git, editing it in place is not an elegant approach. After discussion, we decided to leave the PR open, since it may have use cases in the future.

    Jenkins LTS update

    The plugin has been updated to support Jenkins LTS 2.204.1, as 2.164.3 had some problems installing the Pipeline-related API plugins.

    Installation for experimental version

    1. Enable the experimental update center

    2. Search for Machine Learning Plugin and check the box next to it.

    3. Click on Install without restart

    The plugin should now be installed on your system.

    About the Author
    Loghi Perinpanayagam

    Computer Science and Engineering student at the University of Moratuwa, Sri Lanka. He was selected to work on the Machine Learning plugin for Data Science in GSoC 2020 for the Jenkins project. Highly interested in, and contributing to, open source projects.