This is a guest post by Sam Van Oort,
Software Engineer at CloudBees and contributor to
the Jenkins project.
Today I’m going to show you best practices to write scalable and robust Jenkins Pipelines. This is drawn from a
combination of work with the internals of Pipeline and observations with large-scale users.
Pipeline code works beautifully for its intended role of automating
build/test/deploy/administer tasks. As it is pressed into more complex roles
and unanticipated uses, some users hit issues. In these cases, applying the
best practices can make the difference between:
A single controller running
hundreds
of concurrent builds on low end hardware (4 CPU cores and 4 GB of
heap)
Running a couple dozen builds and bringing a controller to its knees or
crashing it…even with 16+ CPU cores and 20+ GB of heap!
This has been seen in the wild.
Fundamentals
To understand Pipeline behavior you must understand a few points about
how it executes.
Except for the steps themselves, all of the Pipeline logic, the Groovy conditionals, loops, etc execute on the controller. Whether simple or complex! Even inside a node block!
Steps may use executors to do work where appropriate, but each
step has a small on-controller overhead too.
Pipeline code is written as Groovy but the execution model is
radically transformed at compile-time to Continuation Passing Style
(CPS).
This transformation provides valuable safety and durability
guarantees for Pipelines, but it comes with trade-offs:
Steps can invoke Java and execute fast and efficiently, but Groovy
is much slower to run than normal.
Groovy logic requires far more memory, because an object-based
syntax/block tree is kept in memory.
Pipelines persist the program and its state frequently to be able to
survive failure of the controller.
From these we arrive at a set of best practices to make pipelines more
effective.
Best Practices For Pipeline Code
Think of Pipeline code as glue: just enough Groovy code to connect
together the Pipeline steps and integrate tools, and no more.
This makes code easier to maintain, more robust against bugs, and
reduces load on controllers.
Keep it simple: limit the amount of complex logic embedded in the
Pipeline itself (similarly to a shell script) and avoid treating it as a
general-purpose programming language.
Pipeline restricts all variables to Serializable types, so keeping
Pipeline logic simple helps avoid a NotSerializableException - see
appendix at the bottom.
Use @NonCPS -annotated functions for slightly more complex work.
This means more involved processing, logic, and transformations. This
lets you leverage additional Groovy & functional features for more
powerful, concise, and performant code.
This still runs on controllers so be mindful of complexity, but is much
faster than native Pipeline code because it doesn’t provide durability
and uses a faster execution model. Still, be mindful of the CPU cost and
offload to executors for complex work (see below).
@NonCPS functions can use a much broader subset of the Groovy
language, such as iterators and functional features, which makes them
more terse and fast to write.
@NonCPS functions should not use Pipeline steps internally, however
you can store the result of a Pipeline step to a variable and use it
that as the input to a @NonCPS function.
Gotcha: It’s not guaranteed that use of a step will generate an
error (there is an open RFE to implement that), but you should not rely
on that behavior. You may see improper handling of exceptions, in
particular.
While normal Pipeline is restricted to serializable local variables
(see appendix at bottom), @NonCPS functions can use more complex,
nonserializable types internally (for example regex matchers, etc). Parameters
and return types should still be Serializable, however.
Gotcha: improper usages are not guaranteed to raise an error with
normal Pipeline (optimizations may mask the issue), but it is unsafe to
rely on this behavior.
Prefer external scripts/tools for complex or CPU-expensive
processing rather than Groovy language features. This offloads work
from the controller to external executors, allowing for easy scale-out of
hardware resources. It is also generally easier to test because these
components can be tested in isolation without the full on-controller
execution environment.
Many software vendors will provide easy command-line clients for
their tools in various programming languages. These are often robust,
performant, and easy to use. Plugins offer another option (see below).
Shell or batch steps are often the easiest way to integrate these
tools, which can be written in any language. For example: sh “java -jar
client.jar $endPointUrl $inputData” for a Java client, or sh “python
jiraClient.py $issueId $someParam” for a Python client.
Gotcha: especially avoid Pipeline XML or JSON parsing using Groovy’s XmlSlurper and JsonSlurper! Strongly prefer command-line tools or scripts.
The Groovy implementations are complex and as a result more brittle in Pipeline use.
XmlSlurper and JsonSlurper can carry a high memory and CPU cost in pipelines
xmllint and xmlstartlet are command-line tools offering XML extraction via xpath
jq offers the same functionality for JSON
These extraction tools may be coupled to curl or wget for fetching information from an HTTP API
Examples of other places to use command-line tools:
Templating large files
Nontrivial integration with external APIs (for bigger vendors,
consider a Jenkins plugin if a quality offering exists)
Simulations/complex calculations
Business logic
Consider existing plugins for external integrations. Jenkins has a
wealth of plugins, especially for source control, artifact management,
deployment systems, and systems automation. These can greatly reduce the
amount of Pipeline code to maintain. Well-written plugins may be
faster and more robust than Pipeline equivalents.
Consider both plugins and command-line clients (above) — one may be
easier than the other.
Plugins may be of widely varying quality. Look at the number of installations and how frequently and recently updates appear in the changelog. Poorly-maintained plugins
with limited installations may actually be worse than writing a little
custom Pipeline code.
As a last resort, if there is a good-quality plugin that is not
Pipeline-enabled, it is fairly easy to write a Pipeline wrapper to
integrate it or write a custom step that will invoke it.
Assume things will go wrong: don’t rely on workspaces being clean
of the remnants from previous executions, clean explicitly where needed.
Make use of timeouts and retry steps (that’s what they’re there for).
Within a git repository, git clean -fdx is a good way to
accomplish this and reduces the amount of SCM cloning
DO use parameterized Pipelines and variables to make your Pipeline
scripts more reusable. Passing in parameters is especially helpful for
handling different environments and should be preferred to applying
conditional lookup logic; however, try to limit parameterized pipelines invoking each other.
Try to limit business logic embedded in Pipelines. To some extent
this is inevitable, but try to focus on tasks to complete instead,
because this yields more maintainable, reusable, and often more
performant Pipeline code.
One code smell that points to a problem is many hard-coded
constants. Consider taking advantage of the options above to refactor
code for better composability.
For complex cases, consider using Jenkins integration options
(plugins, Jenkins API calls, invoking input steps externally) to offload
implementation of more complex business rules to an external system if
they fit more naturally there.
Please, think of these as guidelines, not strict rules – Jenkins
Pipeline provides a great deal of power and flexibility, and it’s there
to be used.
Breaking enough of these rules at scale can cause controllers to fail by
placing an unsustainable load on them.
For additional guidance, I also recommend
this
Jenkins World talk
on how to engineer Pipelines for speed and performance:
Appendix: Serializable vs. Non-Serializable Types:
To assist with Pipeline development, here are common serializable and
non-serializable types, to assist with deciding if your logic can be CPS
or should be in a @NonCPS function to avoid issues.
Common Serializable Types (safe everywhere):
All primitive types and their object wrappers: byte, boolean, int,
double, short, char
Strings
enums
Arrays of serializable types
ArrayLists and normal Groovy Lists
Sets: HashSet
Maps: normal Groovy Map, HashMap, TreeMap
Exceptions
URLs
Dates
Regex Patterns (compiled patterns)
Common non-Serializable Types (only safe in @NonCPS functions):
Iterators: this is a common problem. You need to use C-style loop, i.e.
for(int i=0; i
Regex Matchers (you can use the
built-in functions in String, etc, just not the Matcher itself)
Important: JsonObject, JsonSlurper, etc in Groovy 2+ (used in some 2.x+
versions of Jenkins).
This is due to an internal implementation change — earlier versions may serialize.