Accelerating build processes with multithreading

Building (compiling) large projects can take a long time.
Build systems like Make and SCons are good at skipping files that haven’t changed since the last build, so incremental rebuilds are fast, but compiling a large project from scratch can still take a while.

One solution to this issue, especially now that multicore and multithreaded processors are common even in low-end machines, is to tell our build system to run multiple jobs at once. This lets it use the full power of modern CPUs by compiling many files simultaneously. In theory, this could speed up the build process on my Ryzen 7 1700 machine (which sports a whopping 8 cores and 16 threads) by a factor of up to 16!

Proof of concept: building Godot Engine with SCons

To get a feel for how much of a difference this could really make, I figured I’d test it in a real-world scenario.

Godot Engine is a fully open-source (MIT-licensed) 2D and 3D video game engine. It is still in the early stages, but it’s in rapid development and has a growing community. It is built with SCons, a Python-based build system, and it is a large project that takes a while to compile from scratch.

So I downloaded the source code from the official GitHub repository and compiled it with the following commands (these are for Linux; see the official docs for other platforms):

$ cd ~/path/to/godot/
$ time scons platform=x11

The time command reports how long scons took to run. To force SCons to recompile a project you’ve already built, clean it first and then time the build again:

$ scons platform=x11 --clean --no-cache
$ time scons platform=x11
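If you want to repeat these measurements from a script, the clean-then-time sequence can be sketched in Python. This is a minimal sketch, not part of SCons. `timed_rebuild` and `format_duration` are hypothetical helper names. Unlike the shell’s time, this measures only wall-clock ("real") time, not user or system CPU time.

```python
import subprocess
import time

def timed_rebuild(clean_cmd, build_cmd):
    """Run clean_cmd untimed, then run build_cmd and return its wall-clock time in seconds."""
    # check=True raises CalledProcessError if either step fails, so a broken
    # build is not mistaken for a fast one.
    subprocess.run(clean_cmd, check=True)
    start = time.perf_counter()
    subprocess.run(build_cmd, check=True)
    return time.perf_counter() - start

def format_duration(seconds):
    """Format seconds like the "real" line printed by time, e.g. 686.011 -> "11m26.011s"."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)}m{secs:.3f}s"

# Example (assumes scons is on PATH and the current directory is a Godot checkout):
# elapsed = timed_rebuild(["scons", "platform=x11", "--clean", "--no-cache"],
#                         ["scons", "platform=x11"])
# print(format_duration(elapsed))
```

Keeping the clean step outside the timed region mirrors running the two shell commands separately and timing only the second.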

Compiling the project with the default single job took a whole 11m26.011s. As the following screenshot of Ubuntu’s System Monitor shows, the build makes poor use of the 16 threads the Ryzen 7 has at its disposal:

Ubuntu System Monitor screenshot during compilation

Individual threads occasionally peak near 100% usage, but only one thread is doing real work at any given time, leaving the rest of the CPU idle.
So let’s run the same build with 16 jobs (parallel tasks that the operating system can schedule on different hardware threads) instead of the default single job.

$ scons platform=x11 --clean --no-cache
$ time scons --jobs=16 platform=x11

An alternative but equivalent command would be scons -j 16 platform=x11.
The build finished in just 1m 53.684s, about 6 times faster! This isn’t the full 16x speedup one might have hoped for, but that is to be expected. Not every part of the build can run in parallel, running multiple jobs carries some overhead, and the 16 hardware threads share only 8 physical cores. It’s still a big improvement over the single-job build. As the System Monitor shows, all the threads are now fully used, and RAM usage is noticeably above the system’s idle level (around 3GB at the time of testing, with a few Firefox tabs open and so on):

Ubuntu System Monitor screenshot during parallelized compilation
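To put the 6x figure in perspective, we can use Amdahl’s law. This is a standard model, not something the measurements here were based on. It relates speedup S to the fraction f of the work that can run in parallel on N workers: S = 1 / ((1 - f) + f / N). The sketch below plugs in the timings from this post. As a simplification, it assumes that all 16 hardware threads are equally fast workers, which SMT threads sharing 8 cores are not. The helper names are hypothetical.

```python
def speedup(serial_seconds, parallel_seconds):
    """Observed speedup: how many times faster the parallel build finished."""
    return serial_seconds / parallel_seconds

def amdahl_parallel_fraction(observed_speedup, workers):
    """Solve Amdahl's law S = 1 / ((1 - f) + f / N) for the parallel fraction f."""
    return (1 - 1 / observed_speedup) / (1 - 1 / workers)

def amdahl_speedup(parallel_fraction, workers):
    """Speedup that Amdahl's law predicts for a parallel fraction and worker count."""
    return 1 / ((1 - parallel_fraction) + parallel_fraction / workers)

t1 = 11 * 60 + 26.011   # single-job build: 11m26.011s
t16 = 1 * 60 + 53.684   # --jobs=16 build: 1m53.684s
s = speedup(t1, t16)
f = amdahl_parallel_fraction(s, 16)  # treats all 16 hardware threads as equal workers
print(f"speedup {s:.2f}x, implied parallel fraction {f:.1%}")
print(f"predicted speedup with 12 jobs: {amdahl_speedup(f, 12):.2f}x")
```

With these numbers, the model implies that roughly 89% of the build ran in parallel and predicts about 5.4x for 12 jobs. The measured 12-job run (2m 2.5s, about 5.6x) did slightly better. That is consistent with the threads beyond the 8 physical cores contributing less than full cores would.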

Running more jobs than your CPU has hardware threads will likely make little difference, as the excess jobs just have to wait for a free thread.
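A script that launches the build can match the job count to the machine instead of hard-coding 16. This sketch assumes Python 3 and uses os.cpu_count(), which reports logical CPUs (hardware threads). `default_jobs` and `scons_command` are hypothetical helper names, not part of SCons.

```python
import os

def default_jobs():
    """Number of logical CPUs (hardware threads) the OS reports, or 1 if unknown."""
    # os.cpu_count() counts logical CPUs, so an 8-core/16-thread Ryzen 7 1700 reports 16.
    # It may return None, hence the fallback.
    return os.cpu_count() or 1

def scons_command(jobs, platform="x11"):
    """Argument list for a parallel build, equivalent to `scons --jobs=N platform=...`."""
    return ["scons", f"--jobs={jobs}", f"platform={platform}"]

print(" ".join(scons_command(default_jobs())))
```

On Linux, len(os.sched_getaffinity(0)) counts only the CPUs the current process is allowed to run on, which can be lower than os.cpu_count(). SCons can also pick up a job count from inside an SConstruct file via SetOption('num_jobs', n).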
An unpleasant side effect of this technique is that your computer will likely become sluggish, even nearly unusable, for as long as the compilation runs. Every hardware thread is fully busy, leaving little room for other applications. You can either wait it out or run the build with fewer jobs than available threads, trading a little speed for a usable machine. With 12 jobs instead of 16, other tasks I was doing were barely affected, and the build still finished in about 2m 2.5s.
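Leaving some threads free can be expressed as a small rule of thumb. This sketch assumes Python 3 and gives the build a fraction of the hardware threads. `jobs_with_headroom` and its default fraction of 0.75 are illustrative choices that reproduce the 12-out-of-16 split used above. They are not an SCons or Make feature.

```python
import os

def jobs_with_headroom(threads, fraction=0.75):
    """Return how many build jobs to run, leaving roughly (1 - fraction) of the threads free."""
    # int() rounds down; max() guarantees at least one job on single-thread machines.
    return max(1, int(threads * fraction))

threads = os.cpu_count() or 1  # logical CPUs; os.cpu_count() can return None
print(f"scons --jobs={jobs_with_headroom(threads)} platform=x11")
```

A fixed reserve (for example, threads minus 4) would be an equally reasonable rule. A fraction simply scales more evenly across machines with very different thread counts.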

Parallelizing in Make

This example used SCons, but similar options exist in other build systems too. With Make, for instance, you would use make --jobs=16 or make -j 16 to run 16 jobs in parallel, and the -B (--always-make) option to unconditionally make all targets, even if they were already built.