Plain and simple: multi-stage Docker builds simplify your life. Small tweaks to your Dockerfile can not only make your CI/CD process easier, but also make it easier for other developers to use your codebase.
A misunderstanding?
In an era of constantly evolving software environments, it is difficult to keep up with the next "thing". You need to stay sharp and up to date on your language of choice, supporting frameworks, Docker, Kubernetes, Jenkins, AWS/GCP/Azure, and who knows what else. Before I understood the concept and benefit of multi-stage builds, our architecture team was heavily promoting the idea and I could not understand why. I had barely figured out how to write a Dockerfile, and now I basically needed to write two Dockerfiles? I felt like I was being strung along.
Full credit goes to my friends on the architecture team promoting this methodology and showing us the way.
How It Was ...
Before we switched to this recommendation, learning how to use a new codebase was like taking a small training course. "First, clone my repo, make sure you have random file 123 in random directory ABC with credentials to this random repo, then place these six configuration files in the etc/ directory." Handing off code or training new developers on your code was always a chore. What is worse, everything is language and framework specific. Java alone has Maven, Ant, and Gradle. It was the responsibility of the code maintainers to keep a very explicit README in the GitHub repo that described exactly how to test, compile, and run the application.
Okay, so now your new developer friend can build and run locally, and then comes our favorite frenemy: CI/CD. Each application builds slightly differently, may require custom steps in the compilation/build, and outputs a slightly different binary. Furthermore, you need the codebase-specific compilation and assembly tools available in the CI/CD environment for every application. How is your DevOps team going to keep up with that? Every time you build a new service/application/whatever, you need to budget time in your sprint just to get that build pipeline up. Then, what if you introduce a breaking change to your build process? "Can you upgrade Gradle/SBT/your-build-tool, but not break legacy builds using legacy versions of these tools?" Call back the DevOps team. This entire tax is both unsustainable and unnecessary.
Thankfully, this era may be coming to a close soon.
What does it bring to the table?
The multi-stage Docker build addresses this entire problem. With just a small learning curve, you can take your dockerization to the next level. With multiple stages, you can easily script the build stage of your service/application inside the Dockerfile itself. In doing so, you eliminate the need for pre-existing configuration and knowledge of the codebase to build the application. As a result, your CI/CD platform no longer requires language-specific libraries or configuration to package and deploy to your environments; it only needs Docker. Every README and pipeline will look simply like this:
docker build . -t my-service:{tag}
How Does It Work?
Let's take a look at an example multi-stage build:
It really is not scary. There are multiple stages, each introduced by its own FROM statement. In the first stage (named builder), use a base Docker image that can perform your build. SBT, Gradle, lein, Maven, golang, gcc, npm, and most other language and build tools you might need have public images posted on Docker Hub. Because I am compiling Scala, I use the SBT Docker image for my first stage. The first-stage instructions are code-specific, but simple for my use case. As the codebase owner and author, you maintain whatever is necessary to package your code. Be sure to include the tests that need to run. For this case, a simple "sbt assembly" will do. In the second stage, use your base image of choice. The important line is the COPY statement, which pulls the compilation output from the first stage and places it in the second. Once the second stage has produced the image, the first stage is lost in the metaphorical bit bucket: its build tools and intermediate files never reach the final image.
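The two stages described above can be sketched roughly as follows. This is a minimal illustration rather than the exact file from my repo: the SBT image name and tag, the Scala version in the jar path, and the my-service jar name are all assumptions, and it presumes the sbt-assembly plugin is already configured in the project.

```dockerfile
# Stage 1 ("builder"): an image that already ships a JDK and sbt.
# Assumption: this image name and tag are illustrative; check Docker Hub
# for the sbt image and tag that match your project.
FROM sbtscala/scala-sbt:eclipse-temurin-17.0.4_1.7.1_3.2.0 AS builder
WORKDIR /app

# Copy the build definition first so dependency resolution is cached
# and only re-runs when the build files change.
COPY build.sbt ./
COPY project ./project
RUN sbt update

# Copy the sources, run the tests, and build the fat jar.
COPY src ./src
RUN sbt test assembly

# Stage 2: the runtime image of your choice (here, a JRE-only base).
FROM eclipse-temurin:17-jre
WORKDIR /app

# The important line: pull the build output out of the first stage.
# Assumption: the jar path and name are hypothetical and depend on
# your build.sbt (project name, version, Scala version).
COPY --from=builder /app/target/scala-2.13/my-service-assembly-0.1.0.jar ./my-service.jar

ENTRYPOINT ["java", "-jar", "/app/my-service.jar"]
```

Building this with the same "docker build . -t my-service:{tag}" command runs both stages, but only the second stage ends up in the tagged image.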
So there you have it: a consistent pattern, applicable to every codebase, that simplifies knowledge transfers and continuous integration/deployment. These tweaks to your services can reduce overhead across your team and expedite your path to sustainable CI/CD. Check out my GitHub repo HERE to see a working example of this.
UPDATE: One last note. Everyone loves Docker because "it runs the same way in the remote environment as it does on your dev/local environment". With a build stage, your codebase now also compiles and builds the same way in all environments. We have all had to deal with "it does not compile on my computer" problems at some time. Help make that a problem of the past!