This is a guest post by R. Tyler Croy, who is a
long-time contributor to Jenkins and the primary contact for Jenkins project
infrastructure. He is also a Jenkins Evangelist at
CloudBees, Inc.
For ages I have used the "Build After" feature in Jenkins to cobble together
what one might refer to as a "pipeline" of sorts. The Jenkins project itself, a
major consumer of Jenkins, has used these daisy-chained Freestyle jobs to drive
a myriad of delivery pipelines in our infrastructure.
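For illustration, each link in such a chain is nothing more than an upstream trigger on the downstream Freestyle job. Here is a rough sketch in Job DSL syntax (my own, hypothetical reconstruction; the job names match the ones described below, and the shell step is a placeholder):
job('infra_census_push') {
    triggers {
        /* the "Build after other projects are built" option in the UI */
        upstream('infra_generate_monthly_json', 'SUCCESS')
    }
    steps {
        /* placeholder for the job's actual publishing logic */
        shell('./publish-census-data.sh')
    }
}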
One such "pipeline" helped drive the complex process of generating the pretty
blue charts on
stats.jenkins.io.
This statistics generation process primarily performs two major tasks on
rather large sets of data:
Generate aggregate monthly "census data"
Process the census data and create trend charts
The chained jobs allowed us to resume the independent stages of the pipeline
and to run different stages on different hardware (with different capabilities)
as needed. Below is a diagram of what this looked like:
The infra_generate_monthly_json job would run periodically, creating the
aggregated census data, which would then be picked up by infra_census_push,
whose sole responsibility was to take census data and publish it to the
necessary hosts inside the project’s infrastructure.
The second, semi-independent, "pipeline" would also run periodically. The
infra_statistics job’s responsibility was to use the census data, pushed
earlier by infra_census_push, to generate the myriad of pretty blue charts
before triggering the
infra_checkout_stats job which would make sure stats.jenkins.io was
properly updated.
Suffice it to say, this "pipeline" had grown organically over a period of time
when more advanced tools weren’t quite available.
When we migrated to newer infrastructure for
ci.jenkins.io earlier this year I took the
opportunity to do some cleaning up. Instead of migrating jobs verbatim, I pruned
stale jobs and refactored a number of others into proper
Pipelines, statistics generation being an obvious
target!
Our requirements for statistics generation, in their most basic form, are:
Enable a sequence of dependent tasks to be executed as a logical group (a
pipeline)
Enable executing those dependent tasks on various pieces of infrastructure
which support different requirements
Actually generate those pretty blue charts
If you wish to skip ahead, you can jump straight to the
Jenkinsfile
which implements our new Pipeline.
The first iteration of the Jenkinsfile simply defined the conceptual stages we
would need:
node {
    stage 'Sync raw data and census files'
    stage 'Process raw logs'
    stage 'Generate census data'
    stage 'Generate stats'
    stage 'Publish census'
    stage 'Publish stats'
}
How exciting! Although not terrifically useful yet. When I began actually
implementing the first couple of stages, I noticed that the Pipeline might sync
dozens of gigabytes of data every time it ran on a new agent in the cluster.
This problem will eventually be solved by the
External
Workspace Manager plugin, which is currently being developed, but until it’s
ready I chose to mitigate the issue by pinning the execution to a consistent
agent.
/* `census` is a node label for a single machine, ideally, which will be
 * consistently used for processing usage statistics and generating census data
 */
node('census && docker') {
    /* .. */
}
Restricting a workload which previously used multiple agents to a single one
introduced the next challenge. As an infrastructure administrator I could,
technically speaking, just install all the system dependencies I want on this
one special Jenkins agent. But what kind of example would that be setting?
The statistics generation process requires:
JDK8
Groovy
A running MongoDB instance
Fortunately, with Pipeline we have a couple of useful features at our disposal:
tool auto-installers and the
CloudBees
Docker Pipeline plugin.
Tool Auto-Installers
Tool Auto-Installers are exposed in Pipeline through the tool step, and on
ci.jenkins.io we already had JDK8 and Groovy
available. This meant that the Jenkinsfile could invoke tool, and Pipeline
would automatically install the desired tool on the agent executing the current
Pipeline steps.
The tool step does not modify the PATH environment variable, so it’s usually
used in conjunction with the withEnv step, for example:
node('census && docker') {
    /* .. */
    def javaHome = tool(name: 'jdk8')
    def groovyHome = tool(name: 'groovy')

    /* Set up environment variables for re-using our auto-installed tools */
    def customEnv = [
        "PATH+JDK=${javaHome}/bin",
        "PATH+GROOVY=${groovyHome}/bin",
        "JAVA_HOME=${javaHome}",
    ]

    /* use our auto-installed tools */
    withEnv(customEnv) {
        sh 'java -version'
    }
    /* .. */
}
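Worth noting: the PATH+IDENTIFIER=value entries above use withEnv’s convention
for prepending a directory to PATH rather than overwriting it. As a quick
sanity check (my own addition, assuming a Unix-ish agent), you can confirm that
both tools resolve from the auto-installed locations:
withEnv(customEnv) {
    /* each command should report the auto-installed tool's version */
    sh 'which java && java -version'
    sh 'which groovy && groovy --version'
}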
CloudBees Docker Pipeline plugin
Satisfying the MongoDB dependency would still be tricky. If I caved in and installed
MongoDB on a single unicorn agent in the cluster, what could I say the next time
somebody asked for a special, one-off piece of software installed on our
Jenkins build agents?
After doing my usual complaining and whining, I discovered that the CloudBees
Docker Pipeline plugin provides the ability to run containers inside of a
Jenkinsfile. To make things even better, there are
official MongoDB Docker images readily
available on Docker Hub!
This feature requires that the machine has a running Docker daemon which is
accessible to the user running the Jenkins agent. After that, running a
container in the background is easy, for example:
node('census && docker') {
    /* .. */

    /* Run MongoDB in the background, mapping its port 27017 to our host's port
     * 27017 so our script can talk to it, then execute our Groovy script with
     * tools from our `customEnv`
     */
    docker.image('mongo:2').withRun('-p 27017:27017') { container ->
        withEnv(customEnv) {
            sh "groovy parseUsage.groovy --logs ${usagestats_dir} --output ${census_dir} --incremental"
        }
    }
    /* .. */
}
The beauty, to me, of this example is that you can pass a
closure to withRun which will
execute while the container is running. When the closure finishes executing,
just the sh step in this case, the container is destroyed.
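One thing the example doesn’t show is that the handle passed into the closure
also exposes the container’s ID. A variation I’d suggest (my own sketch, not
part of the original Pipeline) is to capture the container’s logs before it is
torn down, which makes failed runs much easier to debug:
docker.image('mongo:2').withRun('-p 27017:27017') { container ->
    try {
        withEnv(customEnv) {
            sh "groovy parseUsage.groovy --logs ${usagestats_dir} --output ${census_dir} --incremental"
        }
    }
    finally {
        /* `container.id` is provided by withRun; dump MongoDB's logs
         * before the container is destroyed
         */
        sh "docker logs ${container.id}"
    }
}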
With that system requirement satisfied, the rest of the stages of the Pipeline
fell into place. We now have a single source of truth, the
Jenkinsfile,
for the sequence of dependent tasks which need to be executed, accounting for
variations in system requirements, and it actually generates
those pretty
blue charts!
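Condensed down, the pieces above hang together roughly like this (a sketch
rather than the complete Jenkinsfile; the sync and publish stages are elided,
and the directory variables are placeholders):
node('census && docker') {
    /* auto-install our tools and prepend them to PATH */
    def javaHome = tool(name: 'jdk8')
    def groovyHome = tool(name: 'groovy')
    def customEnv = [
        "PATH+JDK=${javaHome}/bin",
        "PATH+GROOVY=${groovyHome}/bin",
        "JAVA_HOME=${javaHome}",
    ]

    stage 'Sync raw data and census files'
    /* rsync raw usage logs and census data to this pinned agent (elided) */

    stage 'Generate census data'
    /* MongoDB only runs for as long as census generation needs it */
    docker.image('mongo:2').withRun('-p 27017:27017') { container ->
        withEnv(customEnv) {
            sh "groovy parseUsage.groovy --logs ${usagestats_dir} --output ${census_dir} --incremental"
        }
    }

    stage 'Generate stats'
    /* chart generation and the publish stages follow the same patterns (elided) */
}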
Of course, a nice added bonus is the beautiful visualization of our
new Pipeline!
Links
Pipeline documentation
CloudBees Docker Pipeline plugin documentation
Live statistics Pipeline