The Inside Story of How We Optimized our Own Build System

The story of the DriveNets build system started like that of most startups. We took various open source projects as a base for the product, and almost every one of them had a different way of being built, from plain old Makefiles to automake, CMake and others. We started by building a VM template (using Vagrant) as a starting point for developers. Each developer got their own Ubuntu VM that was capable of building the product.

The DriveNets Build System Environment

While using a VM template seemed like a good idea at the start, it didn’t take long before we started to hit its limitations. The main issue for us was adding or modifying tools and libraries for the build. If a developer wanted to add a new library, they had to make sure that all the other developers had it installed. We couldn’t just add it to the template, since developers were reluctant to replace the VM they were already using just to get a new library. For a while, these issues were handled mainly by email, e.g. “I added a new gcc flag, make sure to upgrade your gcc version to X”. Needless to say, this was a nightmare. It really made developers think twice before adding a new tool.


At that point, it was obvious that we needed a better way to manage the requirements for our build. We decided to use Docker containers for our builds and to create a builder image. This image would contain all the dependencies required to build our artifacts. With this approach, if a developer wanted to add a new dependency to the build, all they needed to do was add it to the Dockerfile and update the image used to build the product.
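
As a minimal sketch of how this can work, the target below builds and publishes a new builder image and records its reference in the builder_version_file that the build later reads; the registry path, tag scheme and Dockerfile name are placeholders, not our actual setup.

```
# Sketch only: publish a new builder image and record its reference.
# Registry path, tag and Dockerfile name are made up for illustration.
BUILDER_IMAGE := registry.example.com/builder:$(shell git rev-parse --short HEAD)

builder-image:
	docker build -f Dockerfile.builder -t $(BUILDER_IMAGE) .
	docker push $(BUILDER_IMAGE)
	echo $(BUILDER_IMAGE) > builder_version_file
```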

No more emails to the whole company to upgrade, install or remove a package. Really, a game changer. Adding dependencies became much easier.

This was a great improvement, but it involved an extra step for building. Now, in order to build, a developer was required to spin up a container and run the relevant build command, e.g. make. We wanted to make it even simpler: the developer should just run make <target> and it should work, without having to think about containers. To solve this, we added proxy targets. These proxy targets spin up the container and run the build command inside it. The following example should make this clear.

Without a proxy target:

```
awesome:
      gcc -o drivenets drivenets.c
```

With the proxy target:

```
_awesome_:
	gcc -o drivenets drivenets.c

# Run the real target inside the builder container: mount the current
# directory and use it as the working directory so make finds the Makefile.
awesome:
	docker run --rm -v ${PWD}:${PWD} -w ${PWD} $(shell cat builder_version_file) make _awesome_
```

Now the developers didn’t need to do anything special to make the build run properly. They would just get all the relevant packages automatically whenever someone changed the builder version. No manual work required. Nice.

The need for speed

As the product started to mature, we added more features. This meant more artifacts and more work to build our product. Building our product started to take a long time, on the order of tens of minutes. Something had to be done.

On our quest for success we found sccache from Mozilla. This marvelous tool caches the compilation process: it computes a hash of the source file, its dependencies and the exact flags used to compile it. Since sccache supports a remote cache, in our case a Redis server on the office network, if someone in the office has already compiled a file, everyone else gets it for free. The sccache tool took us from tens of minutes to 7-8 minutes, with very little effort. This was a great success.
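
As an illustration, wiring sccache into a makefile can be as simple as wrapping the compiler and pointing sccache at the cache server; the Redis address below is a placeholder, not our actual server.

```
# Sketch only: use sccache as a compiler wrapper backed by a Redis cache.
# The Redis host is a placeholder.
export SCCACHE_REDIS := redis://build-cache.example.local:6379
CC := sccache gcc

drivenets: drivenets.c
	$(CC) -o $@ $<
```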

We also realized that the builder image could be used for one more thing: there was no need to run configure (in autotools) or any other equivalent step, because the versions of all the libraries were fixed when the image was created. Removing this step saved a few tens of seconds. Not a game changer, but a good improvement nonetheless.

This solution lasted for a while, though we still wanted it to be faster. We wanted to get to a point where a fully cached build would take seconds, not minutes. At the time it seemed like a far-fetched aspiration, but we didn’t give up.

The need for simplicity

As the company grew, we had multiple tools of varying complexity for building, and each team had its own preference. Thus we ended up with multiple build “frameworks”. For example, one team used a bunch of CMake modules that performed compilation tasks. A different team had Perl scripts that were triggered from automake files. Yet another team had Python scripts triggered from a makefile. As you can imagine, this was hard to maintain.

The fact that each one of these tools required different expertise to use correctly, e.g. crafting the dependencies in the makefile properly to avoid redundant builds, made it hard for developers to modify what other teams were working on. If they managed to make it work, and that’s a big IF, they usually ended up with a suboptimal solution. This meant that the build became more and more complex to maintain, and it also became slower again.

Something had to be changed.

The quest for ‘build’ gold

We set out to find a build tool that would be simple enough for everyone to use. It also needed to be fast, very fast. We reviewed various build systems, from Google’s Bazel to Facebook’s Buck and Ninja. Some of them felt like an improved CMake, meaning they weren’t that simple to use. Others were very limited in terms of what you could do with them. Nothing really felt like the golden tool we were looking for.

Then came tup. This build system looked very promising, from the simple syntax of tupfiles and the great power they brought, to the incredible build times it delivered. One thing it lacked was support for strong caching. We wanted something in the spirit of Bazel’s remote caching for tup; we wanted to cache everything that we could. This was the key to getting our build time down to seconds.

We spent some time investigating tup and tried to bend it to fit our needs. Eventually, we decided to use some of the main concepts of tup:

  • Detecting which files are accessed during a build to compute dependencies automatically.
  • Simple and clean build files.
  • Strong diagnostic commands to understand the build flow with ease.

The ‘Build’ Endgame

We now had a pretty good understanding of our requirements for the build tool.

  1. Cache as much as we could
  2. Compute dependencies automatically
  3. Simple and powerful build files

Cache as Much as We Could

While it might seem obvious, caching is very important and should be an integral part of the build system. The tool should apply caching wherever possible, as long as the resulting artifact is correct. The sccache tool, for example, does a great job of caching at the single-file level: to create a single .o file, sccache works great. The thing is that if our makefile contained a lot of single .o targets, we would get a lot of sccache invocations, each one without context about the next element to be built. So even though the build system knew that it wanted file1.o only in order to build the resulting binary, sccache didn’t have that knowledge and couldn’t use it. By making caching a core part of the build system, we could make sure that every step is cached, even the resulting artifact after the linkage stage.
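
To illustrate the idea (this is not how umake implements its cache), even the link step can be keyed on a hash of its inputs, so the final binary itself is fetched from a cache when nothing changed; the cache directory and the object list below are made up.

```
# Illustration only: cache the linked binary under a content hash of its inputs.
# /build-cache and the object list are placeholders.
OBJS := file1.o file2.o

drivenets: $(OBJS)
	key=$$(cat $(OBJS) | sha256sum | cut -d' ' -f1); \
	if [ -f /build-cache/$$key ]; then \
		cp /build-cache/$$key $@; \
	else \
		gcc -o $@ $(OBJS) && cp $@ /build-cache/$$key; \
	fi
```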

Compute dependencies automatically

By computing dependencies automatically, we could make sure that the resulting build is correct and optimized. This meant that there wouldn’t be unnecessary calls to clean when something didn’t work. It also meant that we would have a minimal and exact set of dependencies for each file, so we could rebuild it only if we really needed to.

While there are some approaches that do this in a language-specific manner (like GCC’s dependency-generation flags for C/C++), the approach from tup allowed this to be language agnostic. We simply checked which files were accessed while building. Each file that was accessed to build a specific artifact was a dependency of that artifact.
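
A rough way to picture this (not umake’s actual implementation) is tracing which files the compiler opens and treating each of them as a dependency, for example with strace:

```
# Sketch: record every .c/.h file the compiler opens and treat it as a dependency.
deps-demo:
	strace -f -e trace=open,openat -o compile.trace gcc -c drivenets.c -o drivenets.o
	grep -o '"[^"]*\.[ch]"' compile.trace | tr -d '"' | sort -u > drivenets.deps
```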

Simple and powerful build files

Makefiles look simple at first sight: you just create a target and define how to build it. The problems start when you want to assure correctness. We ended up with various cryptic make features (eval, anyone?) and very large, complex makefiles. The tup syntax was simple and effective; we wanted to stay close to that syntax and enhance it where needed.
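
For a taste of what “cryptic” means here, the following is an invented example of the kind of $(eval) metaprogramming that tends to accumulate in large makefiles; the template and program names are made up.

```
# Invented example: generating rules with define/call/eval is valid GNU make,
# but it quickly becomes hard to read and maintain.
define PROGRAM_template
$(1): $(1).c
	gcc -o $$@ $$<
endef

PROGRAMS := app1 app2
$(foreach prog,$(PROGRAMS),$(eval $(call PROGRAM_template,$(prog))))
```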

Armed with this understanding of our requirements, it was time to start working on umake. umake is still under active and heavy development. It is a relatively young project, but the current results are very promising. I mean, just look at the numbers for building our project now.

With remote cache in the office network:

```
umake
[0.273]  done imports
[0.000]  done loading graph
[0.000]  done filesystem scan
[0.048]  done parsing UMakefile
[0.012]  done saving graph
[0.005]  done cache gc
Workers   0/4   Cache   146/1500[MB]  Cache Hits  94%  Local  0%   Remote  100%    Time   11[sec]
```

With cache on local disk:

```
umake
[0.278]  done imports
[0.019]  done loading graph
[0.051]  done filesystem scan
[0.116]  done parsing UMakefile
[0.022]  done saving graph
[0.005]  done cache gc
Workers   0/4   Cache   146/1500[MB]  Cache Hits  100%  Local  100%   Remote  0%    Time   0[sec]
```

Repeated build (umake & umake):

```
umake
[0.274]  done imports
[0.019]  done loading graph
[0.016]  done filesystem scan
[0.036]  done parsing UMakefile
[0.020]  done saving graph
[0.004]  done cache gc
Workers   0/4   Cache   146/1500[MB]  Time   0[sec]
```

Yep, that’s 11 seconds when using remote cache, 0 seconds with local disk and 0 seconds for repeated build.

If You ‘Build’ It… We Optimized the Build

While a lot of people in our company worked on these areas, the main contributor to umake was Gregory Freilikhman. He knew that our builds could be brought down to seconds. He tried different tools and different approaches to optimize the build, and eventually he was the one who decided to create our own build system. The true father of umake. We are lucky to have such talented developers in our ranks.

About the author

Kfir is a software development team lead at DriveNets. Kfir has more than 12 years of experience in software development, primarily in communication applications and appliances in C/C++, as well as in Python development, build systems, etc. He holds a B.A. in Computer Science from the College of Management Academic Studies.
