Git Plugin Performance Improvement: Final Phase and Release

    From the beginning of the project, one core goal drove its progress: "To enhance the user experience for running Jenkins jobs by reducing the overall execution time".

    To achieve this goal, we laid out a path:

    • Compare the two existing git implementations, i.e., CliGitAPIImpl and JGitAPIImpl, using performance benchmarking

    • Use the results to create a feature which would improve the overall performance of the git plugin

    • Also, fix existing user-reported performance issues

    Let’s take a journey to understand how we’ve built the new features. If you’d like to skip the journey part, you can go directly to the "Major performance improvements" and "Minor performance improvements" sections to see what we’ve done!

    Journey to release

    The project started by choosing a git operation and comparing the performance of that operation with command line git and with JGit.

    Stage 1: Benchmark results with git fetch

    [Figure: git fetch benchmark results]

    • The performance of git fetch (average execution time/op) is strongly correlated to the size of a repository

    • There exists an inflection point on the scale of repository size after which the nature of JGit performance changes (it starts to degrade)

    • After running multiple benchmarks, it is safe to say that for a large repository, command line git is the better choice of implementation.

    • We can use this insight to implement a feature which avoids JGit with large repositories.
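
    To make "average execution time/op" concrete, here is a minimal sketch of how such a number can be measured. The class and method names are hypothetical, the operation is passed in as a Runnable rather than being a real git fetch, and a plain System.nanoTime loop like this is far less rigorous than a dedicated benchmarking harness, so treat it as an illustration only.

```java
final class FetchTimingSketch {

    /**
     * Runs the operation a few times to warm up the JVM, then times
     * measuredRuns further executions and returns the mean time per
     * operation in milliseconds.
     */
    static double averageMillisPerOp(Runnable operation, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        // Warm-up runs are executed but not timed.
        for (int i = 0; i < warmupRuns; i++) {
            operation.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            operation.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        // Convert nanoseconds to milliseconds, then average over the timed runs.
        return elapsedNanos / 1_000_000.0 / measuredRuns;
    }
}
```

    The same method could be called once with a Runnable that performs a fetch through command line git and once with one that uses JGit, keeping the repository identical, so that the two averages are comparable.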

    Stage 2: Comparing platforms

    The project was also concerned that there might be important differences between operating systems. For example, what if command line Git for Windows performed very differently than command line Git on Linux or FreeBSD? Benchmarks were run to compare fetch performance on several platforms.

    We ran the git fetch operation for a 400 MiB repository on:

    • AMD64 Microsoft Windows

    • AMD64 FreeBSD

    • IBM PowerPC 64 LE Ubuntu 18

    • IBM System 390 Ubuntu 18

    The result of running this experiment is given below:

    [Figure: performance on multiple platforms]

    The difference in performance between git and JGit remains constant across all platforms.

    Benchmark results on one platform are applicable to all platforms.

    Stage 3: Performance of git fetch and repository structure

    [Figure: git repository diagram]

    The area of the circle enclosing each parameter signifies the strength of the positive correlation between the performance of a git fetch operation and that parameter. From the diagram:

    • Size of the aggregated objects is the dominant player in determining the execution time for a git fetch

    • Number of branches and Number of tags play a similar role but are strongly overshadowed by size of repository

    • Number of commits has a negligible effect on the performance of running git fetch
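
    The post does not say which correlation measure was used for this diagram. As one possible way to quantify "strength of correlation" between a repository parameter and fetch time, the sketch below computes a Pearson correlation coefficient over paired measurements. The class and method names are hypothetical, and the choice of Pearson is an assumption.

```java
final class CorrelationSketch {

    /**
     * Pearson correlation coefficient of two equally long samples,
     * e.g. x = repository sizes and y = measured fetch times.
     * Returns NaN if either sample has zero variance.
     */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double covariance = 0.0;
        double varianceX = 0.0;
        double varianceY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }
        // The 1/n factors cancel between numerator and denominator.
        return covariance / Math.sqrt(varianceX * varianceY);
    }
}
```

    A value close to 1 would correspond to a large circle in the diagram, and a value close to 0 to a parameter with negligible effect.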

    After running the experiments in Stage 1 to Stage 3, we developed a solution called the GitToolChooser, which is explained in the next stage.

    Stage 4: Faster checkout with Git tool chooser

    This feature takes the responsibility of choosing the optimal implementation from the user and hands it to the plugin. It recommends an implementation based on the size of the repository. Here is how it works.

    [Figure: git plugin performance improvements]

    The image above depicts the performance enhancements we have made over the course of the GSoC project. In some cases, these improvements have cut the time taken by the checkout step to half of what it used to be.

    Let’s talk about performance improvements in two parts.

    Major performance improvements

    [Figure: major performance enhancements]

    When building TensorFlow (~800 MiB) using a Jenkins pipeline, there is an over 50% reduction in the overall time spent completing a job! The result is consistent across multiple platforms.

    The reason for such a decrease is that JGit's performance degrades for large repositories. Since the GitToolChooser is aware of this, it recommends command line git instead, which saves the user some time.

    Minor performance improvements

    Note: Enable JGit before using the new performance features to let GitToolChooser work with more options. Here’s how.

    [Figure: minor performance improvements]

    When building the git plugin (~20 MiB) using a Jenkins pipeline, there is a drop of a second across all platforms when the performance enhancement is enabled. Also, eliminating a redundant fetch reduces unnecessary load on git servers.

    The reason for this change is that JGit performs better than command line git for small repositories (<50 MiB), as an already warmed-up JVM favors the native Java implementation.
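
    The size-based decision described in the two sections above can be sketched roughly as follows. This is not the plugin's actual code: the class, enum, and method names are hypothetical, the 50 MiB threshold is simply the figure quoted in this post, and the behaviour when the size is unknown or JGit is not enabled is an assumption made for the example.

```java
import java.util.OptionalLong;

final class GitToolChooserSketch {

    enum Implementation { CLI_GIT, JGIT }

    // Threshold taken from the "<50 MiB" figure in this post.
    static final long SMALL_REPOSITORY_LIMIT_MIB = 50;

    /**
     * Recommends an implementation from the repository size.
     * Assumption: with no size estimate, keep whatever the user configured.
     */
    static Implementation recommend(OptionalLong sizeMiB,
                                    boolean jgitEnabled,
                                    Implementation configured) {
        if (!sizeMiB.isPresent()) {
            return configured;
        }
        boolean small = sizeMiB.getAsLong() < SMALL_REPOSITORY_LIMIT_MIB;
        // JGit is only recommended for small repositories, and only if it is enabled.
        if (small && jgitEnabled) {
            return Implementation.JGIT;
        }
        return Implementation.CLI_GIT;
    }
}
```

    With this sketch, an 800 MiB repository would be steered to command line git, while a 20 MiB repository would be steered to JGit as long as JGit has been enabled.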

    Releases

    The road ahead

    • Support from other branch source plugins

      • Plugins like the GitHub Branch Source Plugin or GitLab Branch Source Plugin need to extend an extension point provided by the git plugin to facilitate the exchange of information related to size of a remote repository hosted by the particular git provider

    • JENKINS-63519: GitToolChooser predicts the wrong implementation

    • Addition of this feature to GitSCMSource

    • Detection of lock related delays accessing the cache directories present on the controller

      • This issue was reported by the plugin maintainer Mark Waite. It needs to be reproduced first before a possible solution can be found.
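
    To illustrate the first item in the list above, here is a rough sketch of what a size-reporting extension point could look like from the point of view of a branch source plugin. The interface, its method names, and the lookup helper are all hypothetical and are not the git plugin's actual API. The sketch only shows the shape of the idea: each provider says whether it knows a given remote and, if so, reports its size.

```java
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical contract a provider-specific plugin might implement. */
interface RepositorySizeProvider {
    /** True if this provider can answer for the given remote URL. */
    boolean isApplicableTo(String remoteUrl);

    /** Size of the remote repository in KiB, as reported by the provider. */
    long sizeInKiB(String remoteUrl);
}

final class SizeLookupSketch {

    /**
     * Asks each registered provider in turn and returns the size from the
     * first one that is applicable, or empty if none recognises the remote.
     */
    static OptionalLong estimateSizeKiB(String remoteUrl, List<RepositorySizeProvider> providers) {
        for (RepositorySizeProvider provider : providers) {
            if (provider.isApplicableTo(remoteUrl)) {
                return OptionalLong.of(provider.sizeInKiB(remoteUrl));
            }
        }
        return OptionalLong.empty();
    }
}
```

    An empty result here would mean that no size estimate is available from any provider, so the tool choice would have to fall back to some other source of information.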

    Reaching out

    Feel free to reach out to us for any questions or feedback on the project’s Gitter Channel or the Jenkins Developer Mailing list. Report an issue at Jenkins Jira.

    About the Author
    Rishabh Budhouliya

    GSoC 2020 student under the Jenkins project (Git Plugin Performance Improvements). Aspiring to be better at Software Development and participate more in the open source community.