Monday, August 3, 2020

How the multi-stage Docker Build Simplifies It All

Plain and simple: Multi-stage Docker builds simplify your life. Small tweaks to your Dockerfile can not only make your CI/CD process easier, but also make it easier for other developers to use your codebase.

A misunderstanding?

In an era of constantly evolving software environments, it is difficult to keep up with the next "thing". You need to stay sharp and up-to-date on your language of choice, supporting frameworks, Docker, Kubernetes, Jenkins, AWS/GCP/Azure, and who knows what else. Before I understood the concept and benefit of multi-stage builds, our architecture team was heavily promoting the idea and I could not understand why. I had barely figured out how to write a Dockerfile, and now I needed to write what was basically two Dockerfiles? I felt like I was being strung along.

Full credit goes to my friends on the architecture team for promoting this methodology and showing us the way.

How It Was ...

Prior to switching to this recommendation, learning to use a new codebase was like a minor education class. "First, clone my repo, make sure you have random file 123 in random directory ABC with credentials to this random repo, then place these six configuration files in the etc/ directory." Handing off code or training new developers on your code was always a chore. What is worse is that everything is language- and framework-specific. Java alone has Maven, Ant, and Gradle. It was the responsibility of the code maintainers to keep a very explicit README in the GitHub repo that described exactly how to test, compile, and run the application.

Okay, so now your new developer friend can build and run locally, and then comes our favorite frenemy: CI/CD. Each application builds slightly differently, may require custom steps in the compilation/build, and outputs a slightly different binary. Furthermore, you need the codebase-specific compilation and assembly tools available on the CI/CD environment for every application. How is your DevOps team going to keep up with that? Every time you build a new service/application/whatever, you need to budget time out of your sprint specifically to get that build pipeline up. Then, what if you introduce a breaking change to your build process? "Can you upgrade Gradle/SBT/your-build-tool, but not break legacy builds using legacy versions of these tools?" Call back the DevOps team. This entire tax is both unsustainable and unnecessary.

Thankfully, this era may be coming to a close soon.

What does it bring to the table?

The multi-stage Docker build solves this entire problem. With just a small learning curve, you can take your dockerization to the next level. With multiple stages, you can easily script the build stage of your service/application. In doing so, you eliminate the need for pre-existing configuration and knowledge of the codebase to build the application. Implicitly, your CI/CD platform no longer requires specific libraries or configuration to package and deploy to your environments. Every README and pipeline will simply look like this:

docker build . -t my-service:{tag}

How Does It Work?

Let's take a look at an example multi-stage build.
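
Here is a minimal sketch of the pattern for an SBT project (the base image tags and the jar path are illustrative; adjust them to your own build):

# Stage 1: compile and test using a public SBT image from Docker Hub
FROM hseeberger/scala-sbt:8u252_1.3.13_2.13.3 AS builder
WORKDIR /build
COPY . .
# Runs the test suite and produces a single assembly jar
RUN sbt assembly

# Stage 2: start fresh from a slim runtime image
FROM openjdk:8-jre-slim
WORKDIR /app
# Pull only the compiled artifact out of the builder stage
COPY --from=builder /build/target/scala-2.13/my-service-assembly.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]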

It really is not scary. There are multiple stages, identified by separate FROM statements. In the first stage (named builder), use a base Docker image that can perform your build. SBT, Gradle, Leiningen, Maven, Go, GCC, npm, and every other language/build tool you need has public images posted on Docker Hub. Because I am compiling Scala, I will use the SBT Docker image for my first stage. The first stage instructions are code-specific but simple for my use case; as the codebase owner and author, you maintain whatever is necessary to package your code. Be sure to run your tests here as well. For this case, a simple "sbt assembly" will do. In the second stage, use your base image of choice. The important line is the COPY statement: it pulls the compilation output from the first stage and places it in the second. After the second stage generates the image, the first stage is lost in the metaphorical bit-bucket.

So there you have it: a consistent pattern, applicable to every codebase, that simplifies knowledge transfers and continuous integration/deployment. These tweaks to your services can reduce overhead across your team and expedite your path to sustainable CI/CD. Check out my GitHub repo HERE to see a working example of this.

UPDATE: One last note. Everyone loves Docker because "it runs the same way in the remote environment as it does on your dev/local environment". With a build stage, your codebase now compiles/builds the same way in all environments too. We have all had to deal with "it does not compile on my computer" problems at some time. Help make that a problem of the past!

Thursday, May 21, 2020

Integration Testing and TestContainers

I must be behind the times. TestContainers is a library, ported to many popular languages and frameworks, that assists in starting up Docker containers during your test runtime. It was first pushed to GitHub in 2015 and I am only NOW discovering it. Oh, the troubles and headaches this would have saved me.

For years and years, I have struggled to find the right balance between writing code and writing tests. It was always one extreme or the other. I would write enormous amounts of code without any unit tests. I mean, come on. The project is due very soon. Who has time to write tests? Then you get the other extreme: test-driven development. It is like being in school all over again, showing your work in Math class. And how can we forget my favorite situation: 50% of your unit tests do not consistently pass. These are just mounting headaches and problems.

How do we act like perfect programming angels and check off ALL the boxes:
  • Continuous integration where all tests run
  • Tests that pass every time
  • Integration tests that ensure connectivity to other components
  • Tests that do not depend on external databases being in the perfect state
  • 367% code coverage (you know they would ask for it if they could)
  • Shift Left test concepts
    • Involving QA earlier
    • Getting developers to assist in the test phase
  • Delivering the code at a fast pace with adequate tests to prove your work
  • Make the bosses happy

TestContainers is the metaphorical programming angel sent from binary heaven. When used properly, TestContainers allows you to build integration tests using Docker in your test runtime. Let's get set up!

Setup your environment

Make sure you have Docker installed on your development machine: Docker Desktop 

Import TestContainers in your language of choice. Check HERE for support of your language; Java, Python, Golang, Node, and others are supported. Here is the link to the Scala repo: TestContainers-Scala. For my specific use case, I'll be using ScalaTest, so I'll import the module that runs TestContainers from ScalaTest.
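
For an SBT project, that import is a couple of lines in build.sbt (the version numbers below are illustrative; check the repos for current releases):

// build.sbt -- versions are illustrative
libraryDependencies ++= Seq(
  "com.dimafeng"  %% "testcontainers-scala-scalatest" % "0.37.0" % Test,
  "org.scalatest" %% "scalatest"                      % "3.1.2"  % Test
)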

Build your First Test

Now, let's create our first test! I am going to use Redis because Redis solves all of the world's problems. FACT!
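
A sketch of that test, using the ScalaTest module and the Jedis client (Jedis is my choice of Redis client here, not a requirement of the library):

import com.dimafeng.testcontainers.{ForAllTestContainer, GenericContainer}
import org.scalatest.flatspec.AnyFlatSpec
import redis.clients.jedis.Jedis

class RedisSpec extends AnyFlatSpec with ForAllTestContainer {

  // Start the official Redis image and expose its default port to the test
  override val container: GenericContainer =
    GenericContainer("redis:6.0.5", exposedPorts = Seq(6379))

  "Redis" should "reply PONG to PING" in {
    // Connect to the host/port that TestContainers mapped for us
    val client = new Jedis(container.containerIpAddress, container.mappedPort(6379))
    try assert(client.ping() == "PONG")
    finally client.close()
  }
}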

As you can see from our test, it simply starts the Redis Docker container, calls PING, and asserts that PONG is replied back from the Redis process running in the Docker container.

Build a Test Harness

Now, setting up your test fixtures from here forward will be a major contributor to the success of this pattern. Remember our goals: shift left, deliver on time, pass tests, coverage, etc. Instead of compiling your solution, deploying to your Dev/QA environment, and telling QA to find the issues, bring the testing effort back to your local machine. Think through the applicable use cases and states of your service and put them into test fixtures using TestContainers. I like to call it "creating a test harness": create an interface that is easily implemented and serves as your way to write strong integration tests for your service, as sketched below.
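
As a sketch, the harness can be as small as a trait like this (the trait name and signatures are my own inventions; shape them to your service):

// Hypothetical harness interface for a simple key/value-style service
trait ServiceTestHarness {
  // Empty the backing store so every test starts from a known state
  def purgeAllData(): Unit

  // Seed the store with flat files or individual entries
  def importEntries(entries: Map[String, String]): Unit

  // Fetch a record by id so the test can assert on the response
  def getDataId(id: String): Option[String]
}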

Now we have created a concise interface for performing integration tests on this service. There is a clear pattern: call purgeAllData to empty the database if necessary, import flat files or entries, call getDataId to retrieve data, and assert on the response.

Make a Test Harness Implementation
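
Continuing the Redis example, a harness implementation can wrap the running container like this (again a sketch; Jedis and the key/value data shape are my assumptions):

import com.dimafeng.testcontainers.GenericContainer
import redis.clients.jedis.Jedis

// Hypothetical implementation backed by the Redis container TestContainers starts
class RedisTestHarness(container: GenericContainer) extends ServiceTestHarness {

  // Open a short-lived connection to the container for each operation
  private def withClient[A](f: Jedis => A): A = {
    val client = new Jedis(container.containerIpAddress, container.mappedPort(6379))
    try f(client) finally client.close()
  }

  override def purgeAllData(): Unit =
    withClient(_.flushAll())

  override def importEntries(entries: Map[String, String]): Unit =
    withClient { client => entries.foreach { case (k, v) => client.set(k, v) } }

  override def getDataId(id: String): Option[String] =
    withClient(client => Option(client.get(id)))
}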

In the example posted above, we have used the test harness to assert every possible behavior of the API/service abstraction. At this point, I have 99% confidence that everything I built will work perfectly when I move it to the QA and PRODUCTION environments. At worst, I will have some configuration issues. These tests will carry on with the service for the life of the codebase. Furthermore, the usage of the test harness is SO basic that developers of ALL levels of experience should be able to expand your test cases with little guidance.
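
To make that concrete, a test written against the harness sketched above reads like a specification (the spec name and data are invented for illustration):

import com.dimafeng.testcontainers.{ForAllTestContainer, GenericContainer}
import org.scalatest.flatspec.AnyFlatSpec

class UserDataSpec extends AnyFlatSpec with ForAllTestContainer {

  override val container: GenericContainer =
    GenericContainer("redis:6.0.5", exposedPorts = Seq(6379))

  // Lazy so the container is already running when the harness is first used
  private lazy val harness = new RedisTestHarness(container)

  "The service" should "return imported data by id" in {
    harness.purgeAllData()
    harness.importEntries(Map("user-1" -> """{"name":"Dan"}"""))
    assert(harness.getDataId("user-1").contains("""{"name":"Dan"}"""))
  }

  it should "return nothing for an unknown id" in {
    harness.purgeAllData()
    assert(harness.getDataId("no-such-id").isEmpty)
  }
}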

You have now finished a POC of using TestContainers. In doing so, you have shifted left, built a solid and reliable test base that passes every time, and gained incredible confidence that your service will operate appropriately as it moves through the QA and Production environments. I recently used this paradigm on a mission-critical project at work. Not only were we ahead of schedule every single time, but we have YET to receive a single solitary bug report. Furthermore, we have received very minimal feedback from QA for any major changes. The bosses are happy because we pushed to PROD early and have an enormous test base that runs against multiple integrated components. We have absolute confidence in every build we push out.

Pull my GitHub repo hosting this blog post HERE and see for yourself. Adopt TestContainers with no regrets. Watch the reliability of your service soar.

Wednesday, May 13, 2020

Me

My name is Daniel Natic.

I have been writing software for nearly twenty years. I started hobby programming in VB6 back in the day. That evolved into writing WinForms applications in C# and custom web sites with ASP.NET, SQL Server, and Microsoft blah-blah-blah.

The most dramatic change in my programming life came when I changed employers and switched to Scala. I found my true passion and dove headfirst into functional programming and building distributed, highly scalable APIs. I was introduced to AWS, caching strategies, multi-data center systems, NoSQL databases, Docker, Kubernetes, and so much more.

I started this blog to share my experiences with fellow developers. I hope my readers learn from my posts and look forward to learning from them in discussions.

This blog will be primarily tailored to Scala (JVM) concepts, but much of it will apply to any language of choice.