15 January 2026

Towards reproducible builds via Docker

When maintaining software, it is important to be able to build new versions quickly and with minimal effort. For free software, it is also important that each step be reproducible. Since I maintain several free software projects, these questions and their answers are unavoidable for me.

Recently I came across Docker, which has been famous in the open source world for many years, but I had not been very interested in it because I used my own toolset. While working on the evaluation of a nice mathematical project, I tried to install their software on my workstation without success. The solution was to use Docker, exactly as the project authors officially suggested. The reason for my failure was that Python had to be a certain version (3.11) to ensure support for some underlying modules. (My Python, coming with Ubuntu 24.04, was 3.12. Bad luck!)

You may say it is crazy that a minor version change can make such a difference. But this is reality. Thirty-five years ago, near the birth of the open source movement, you usually had no such problems: software changes were much easier to follow than today because of the small number of existing software packages. Now, however, to ensure that people run the same underlying software, especially when a new version of your tool is packaged, one may need to pin every single prerequisite. Fifteen years ago, when I became responsible for packaging software at GeoGebra, I was happy to use VirtualBox and VMware to ensure such details in a fully virtualized environment. Today, however, there are much better approaches, and this is where Docker comes into the picture.

The big problem with Docker is that several people use it in a different way than it was designed for. This was the case with the above-mentioned mathematical project, too. In fact, Docker should be considered a set of command line tools supporting Linux-hosted virtualization, rather than a single simple command. This was my first misunderstanding. Now I would rather compare it to Git, because both tools support certain workflows. For each workflow you may need a different scenario, selecting the appropriate tool for each step.

Docker has a kind of philosophy which may differ from the natural approach of a typical system administrator or package maintainer. In my case, I had to understand the difference between images and containers. First you build an image with docker build, and then you can run that image to create a container with docker run. Interaction between the container and your host system is not always trivial: sometimes your container exits before you can get data out of it, or you need to forward information from the host to the container in the particular way designed by the Docker authors. Luckily, I got quite helpful guidance from AI tools, and I grasped the main concepts quickly enough without reading the whole documentation in advance. For Docker, it could be an important hint to read the docs first, to clarify the main concepts before running into crazy issues. Nowadays, though, it is very rare that you read the docs before giving a tool a try. The well-known wisdom “If everything else fails, read the documentation” is certainly a modification of a quote by Ralph Waldo Emerson. As of today, I would use another modification: “Before everything else fails, ask the AI, but always check the result.”
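To illustrate the difference, here is a minimal sketch (the image name, container name and paths are made up for this example, not taken from my actual Dockerfiles):

  # Build an image from the Dockerfile in the current folder
  docker build -t bibref-cli .

  # Run the image: this creates a container (named here for later reference)
  docker run --name bibref-build bibref-cli

  # Even after the container has exited, its file system is preserved:
  # copy a build artifact from the container to the host
  docker cp bibref-build:/build/bibref/bibref .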

I am now three weeks into my experiments. The first steps included compiling Krita via Docker. This was inspired by my son Benedek, who came up with the idea of developing a plugin for Krita that shows a hexagonal grid to support game masters in Dungeons & Dragons. I was surprised how perfectly the compilation process worked in a Docker virtualization. Clearly, its maintainer Dmitry Kazakov did a wonderful job of putting the tools together to create an exceptionally well working development environment.

In my learning process I chose my current project, bibref, with the idea of simplifying its build process and making it even more deterministic. Formerly, I had difficulties on Windows because of using the rolling distribution MSYS2: versions of some libraries changed quite spontaneously, and the build stopped working at some point. I experienced the same problem with GitHub Actions builds, also when building for macOS. In fact, Docker has nothing to do with Mac builds, but it has good support for cross-compiling code for Windows. Also, it turned out that building for the WebAssembly platform is a good use case for Docker. Surprisingly (although this was actually clear from the Krita example), it is even possible to run a Linux-based graphical application via Docker. I was unsure how far this is supported, but luckily I managed to solve all problems I faced during my experiments.

I created five Dockerfiles. I learned it was important in my case to put them in separate folders (otherwise it may take a long time to unnecessarily send the whole content of the folder to Docker as the build context). Each Dockerfile is responsible for one specific build of bibref. The Dockerfiles have common parts: most notably, the native version must be built first to ensure that the database cache is pre-generated. I guess advanced Docker users would separate the common parts into a different image and compose them together, but I was happy with my solution at this stage.
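Building from inside the folder of the given Dockerfile keeps the build context small. A rough sketch (the folder and tag names are illustrative):

  # The build context (the final argument, here the current folder) is sent
  # to the Docker daemon in full, so each Dockerfile lives in its own folder
  cd wasm-gui
  docker build -t bibref-wasm-gui .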

Five containers

I started with building the command line (cli) version. It was the simplest scenario, but I already faced the problem of running a container interactively. Then I continued with building the Qt version for Linux, which provides a graphical user interface (gui). The first working version of this Dockerfile was not much longer than the one for the cli version, but the final version is quite long, because of patching the code to fully support icons, and because of running the application with a number of environment variables to ensure a proper startup of Chromium inside Qt via QtWebEngine. (Here I found AI support extremely helpful when debugging the issues.) Then I went on to compile the web version, first with its command line interface. Here I needed to get Emscripten and compile the SWORD library for WebAssembly as well. It was surprisingly easy to run a web server inside the container and connect to it from the host (or from another workstation in the same network). The next project was the same with the Qt version; here I had to recompile the whole Qt stack from scratch, both for the container host and for WebAssembly. This was quite challenging but still manageable, since I had lots of experience from last December when I successfully did the same natively (without Docker). For this fourth Dockerfile, however, I did not manage to connect to the WebAssembly-based Qt application from an arbitrary workstation, only from the Linux host, but that was acceptable for me since my final scenario was to put the deployment on a public server.
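A rough sketch of the docker run invocations these scenarios need (the image names and the port are made up; the X11 part assumes the host allows local connections to its X server, e.g. via xhost):

  # Run the cli container interactively
  docker run -it bibref-cli

  # Publish a port so that the web server inside the container is
  # reachable from the host (and from the local network)
  docker run -p 8080:8080 bibref-web-cli

  # Let the Qt gui inside the container use the host's X server
  docker run -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix bibref-gui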

The fifth project was to cross-compile the gui version for Windows. Here I made several attempts until I succeeded. The first dead end was to use Wine and install MSYS2 on Wine, everything inside Docker. I chose this approach because of a positive result from an earlier experiment. I had to disable the signature checking, since it would involve unimplemented features in Wine, but finally I had to give up this approach because of its terrible slowness. Unfortunately, the installation of libxml2 got stuck for an indefinite time; I have no idea why.

The second dead end was to use MINGW64 and compile everything from scratch. I managed to build Graphviz, Boost, Qt and zlib from source with moderate pain. Unfortunately, it turned out that bibref requires some non-trivial features of Graphviz to display statement graphs, including rsvg and pango support. These would finally have resulted in recompiling almost the complete GTK stack, which seemed an overkill for the project. Here a workaround was to copy the binaries of the required Graphviz dependencies from the MSYS2 repository, and surprisingly, this option worked for the full project.

But finally I chose to take all possible dependencies from MSYS2 (by using its MINGW64 packages) instead of compiling Graphviz, Boost and zlib from source. This resulted in selecting 36 packages, all available in zstd format. (This format has been chosen by the pacman package manager of Arch Linux, too.) The packages, however, do not include the Qt binaries. Indeed, it was a sad decision to leave out the Qt packages, but I did not manage to resolve a linker problem: in the final step I got an error related to some mismatches with the Qt library taken from the MSYS2/MINGW64 build. So, at the end of the day, Qt had to be built from scratch for the Linux/MINGW64 platform, and this took a huge amount of time.
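Fetching a pinned package from the MSYS2 repository can be done roughly like this (the package name, version and target folder are examples, and the mirror URL is written from memory, so double-check it):

  # Download a pinned MINGW64 package in zstd format
  wget https://mirror.msys2.org/mingw/mingw64/mingw-w64-x86_64-zlib-1.3.1-1-any.pkg.tar.zst

  # Unpack it into the folder that is later laid out next to the .EXE
  tar --zstd -xf mingw-w64-x86_64-zlib-1.3.1-1-any.pkg.tar.zst -C /opt/bibref-win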

The final challenge was to collect all dependencies of each selected MSYS2/MINGW64 package (this is how the 36 packages were finally selected) and lay out the required folders for the final .EXE to find. This included the folders platforms, styles and imageformats from the Qt distribution (which I had built formerly). As the last step, I managed to run the Inno Setup utility (after installing it under Wine, inside xvfb) and create the Windows installer .EXE as well, completely automatically.
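That very last step looks roughly like this (the install path and the script name are illustrative): Inno Setup's command line compiler ISCC.exe runs under Wine, inside a virtual framebuffer provided by xvfb-run:

  # Compile the installer script headlessly: no real display is needed
  xvfb-run wine "C:\Program Files (x86)\Inno Setup 6\ISCC.exe" bibref.iss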

Reproducible?

Reproducibility is only partial at the moment. First, cloning a GitHub repository without a fixed tag or commit hash is always a question. This should be fixed in the future. For one Dockerfile I used Subversion to get the latest version: this should also be changed to use a fixed version. Luckily, Docker has a feature to override preconfigured settings via the ARG command, so this might be used in the future to fine-tune the exact version to build.
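A sketch of such a pinning (the argument name and its default are made up, and the repository URL is written from memory):

  # In the Dockerfile: the ref to build defaults to a branch, but a
  # fixed tag or commit hash can be passed at build time via --build-arg
  ARG BIBREF_REF=master
  RUN git clone https://github.com/kovzol/bibref.git && \
      cd bibref && git checkout "$BIBREF_REF"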

The packages used from MSYS2 are already set via ARG commands. MSYS2 being a rolling release, it is quite natural that it will refuse to serve outdated versions of the dependent packages after a while, hopefully not earlier than in a couple of years. Fortunately, the package versions can be overridden with the above-mentioned idea, so this part is already in good shape: no change is required in the Dockerfile. (Each change in a Dockerfile usually invalidates Docker's build cache from the modified step onward, so that step and all steps after it are re-run. Hence one should avoid changing a Dockerfile when it is not necessary.)
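So a rebuild against different package versions needs no edit to the Dockerfile at all, only something like this (the argument name and version are illustrative):

  # Override a pinned MSYS2 package version at build time
  docker build --build-arg ZLIB_VERSION=1.3.1-1 -t bibref-windows .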

For the moment, I am happy that all the projects are buildable and the created artifacts can be copied easily from the container to the host system. The Linux versions continue to be built automatically by the Snapcraft machinery and the Flatpak ecosystem, and for the Mac version I still need my own Mac Mini (borrowed from the GeoGebra guys), but I have already changed the release workflow for the WebAssembly ports and the Windows version.

Comparison

Here is a short comparison of the five containers in tabular form:

No.  Target OS    Application variant  Compiler          Guest OS       Dockerfile length (chars)  Build time (mins)  Image size (GB)
1    Linux        cli                  gcc               Ubuntu 24.04   1259                       4                  1.6
2    Linux        gui                  gcc               Debian Trixie  4149                       7                  3.25
3    WebAssembly  cli                  emscripten/clang  Ubuntu 24.04   2873                       13                 3.57
4    WebAssembly  gui                  emscripten/clang  Ubuntu 24.04   4926                       96                 69.2
5    Windows      gui                  mingw64/gcc       Debian Trixie  14225                      100+               79.6

You can try all of these variants (and also the Mac version) on the web site of the project. If you are interested, the Dockerfiles include several comments to help their users. Most importantly, each Dockerfile comes with step-by-step instructions at the top of the file.




Zoltán Kovács
Linz School of Education
Johannes Kepler University
Altenberger Strasse 69
A-4040 Linz