
Funtoo Newsletter, March 2023


drobbins


Better Late Than Never!

Our third newsletter is a bit late -- but has an in-depth article on some as-yet-unexplained aspects of Funtoo related to metatools and our CDN. Definitely worth a read!

Each newsletter, we are going to try to feature an in-depth article combined with key Funtoo news for the month. 

Linux Kernel 6.1.20_p1 is Now Available

The sys-kernel/debian-sources-6.1.20_p1 package has been unmasked in the meta-repo tree and is available as a regular upgrade for everybody. This is basically a “bug fix” upgrade, as no new features or modules were announced. However, 1185 files have been touched by bug fix commits since 6.1.12_p1, so you might want to consider upgrading your kernel to benefit from them.

The Funtoo CDN and Metatools

Most people are aware that Funtoo has its own CDN (Content Distribution Network), but few understand the role it has been playing in Funtoo and the potential it has for the future. Over the last month there have been interesting developments regarding the use of the CDN associated with Funtoo Metatools. Metatools lets Funtoo auto-generate up-to-date ebuilds from sites like GitHub. This month we saw the realization of a year-long effort which made Go and Rust packages a lot more efficient.

In this article, you will learn about the evolution of our content distribution, from the early days of Gentoo to the latest stage at which Funtoo finds itself right now, as well as some of the inner workings of Metatools and how to use the new extensions in your Go and Rust autogens.

Background

Historically, Linux distributions used a network of mirrors to distribute their packages, installation media images and so on. This would provide faster downloads for users all over the world who could select a close-by mirror — often inside the very institution they were working from — while also alleviating some of the load on the primary servers.

Gentoo made use of a traditional mirror network to distribute its installation media and distfiles but introduced an innovation: for the small files that make up the Portage tree, Gentoo started using the relatively new rsync protocol instead of the traditional FTP and HTTP, guaranteeing that only the files that needed updating were downloaded, which made updates a lot more efficient.

Funtoo inherited that system from Gentoo, but soon innovated again, adopting the then-new “git protocol” instead of rsync. Now, instead of downloading the files that changed, it downloads only the changes themselves (the git “deltas”), making updates even faster and more efficient. This also meant that the master Portage tree could now be hosted on GitHub, doing away with the burden of maintaining a network of rsync mirror servers for the Portage tree. The move also made it feasible to split the previously monolithic Portage tree into logical “kits”, each in its own independent repository within the “meta-repo”.

However, git by its very nature is not suitable for hosting binary files and therefore cannot be used to distribute installation media and pre-built packages, for example. Thus, just moving from rsync to git didn’t mean that Funtoo wouldn’t need to manage a mirror network anymore; installation media, stage tarballs, distfiles and occasional pre-built binaries still needed to be made available for download somehow. At this point, Daniel decided to reach out to CDN77, which generously offered to provide CDN resources to the Funtoo project.

The Funtoo CDN

A CDN (Content Distribution Network) serves the same purpose as a “mirror network”, but the two are hard to compare. Even the best-maintained mirror network will look amateurish next to a modern CDN service and cannot match its performance. The CDN77 service uses caching and high-speed links to make files quickly and easily downloadable worldwide, using several geographic endpoints that cache files and communicate rapidly with each other.

For practical purposes, let’s just say that the user doesn’t need to select a “best mirror” and there are no outdated or down mirrors. From the user’s standpoint, all the available ISO images and stage tarballs appear under one single URL, https://build.funtoo.org/, and the distfiles referred to in the ebuilds appear under https://direct.funtoo.org. The CDN transparently selects the fastest route between the user and a server that can deliver the content at the moment of the request. Not only does the CDN manage mirrors at strategic points around the world, but the content can also be cached by ISPs, making it available at the fastest possible speed to the users who connect through them. Whether you are in Los Angeles, Bucharest, Jakarta, Buenos Aires or Cape Town, you will have the same fast and reliable experience downloading content from a CDN.

This move completely freed Funtoo from the need for any kind of mirror network for any purpose. At first, only the Funtoo stage tarballs and distfiles were uploaded to the CDN. But some ebuilds from the Gentoo snapshots still referenced distfiles hosted on Gentoo mirrors, which risked being altered or removed without notice. Realizing that, Daniel wrote a script to populate our CDN with a full collection of source code, so that this would not become an issue. Then the CDN was made the default “mirror” for all distfile sources and the “fastpull service” was added to the Funtoo Portage, providing users with a fast, reliable and universally accessible download point for all their source code.

But Funtoo is not about ebuilds from Gentoo snapshots. The real deal is the “autogens”, which in theory can generate anything, not just ebuilds, using the logic contained in a “generator” and a Jinja template. Metatools is essentially an advanced Python-based API for creating ebuilds. It contains useful tools for automatically checking for the latest (or all available) versions of packages, downloading the sources, and generating ebuilds based on information found online or contained inside the downloaded tarballs. The generator can also take parameters from a YAML file, thus allowing a single Python generator to generate ebuilds for hundreds of different packages.
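To make the generator-plus-template idea concrete, here is a minimal sketch. It is not the Metatools API: Python's standard-library `string.Template` stands in for Jinja so the example has no dependencies, and the template text, the `render_ebuild` helper and the package data are all hypothetical.

```python
from string import Template

# Hypothetical ebuild template. Real autogens use Jinja templates;
# string.Template is only a dependency-free stand-in for this sketch.
EBUILD_TEMPLATE = Template(
    "# Distributed under the terms of the GNU General Public License v2\n"
    "\n"
    "EAPI=7\n"
    "\n"
    'DESCRIPTION="$description"\n'
    'HOMEPAGE="$homepage"\n'
    'SRC_URI="$src_uri -> $name-$version.tar.gz"\n'
    'KEYWORDS="*"\n'
)


def render_ebuild(pkg: dict) -> str:
    """Fill the template with the parameters of one package."""
    return EBUILD_TEMPLATE.substitute(pkg)


# In a real autogen, parameters like these would come from a YAML file
# and from information the generator looks up online.
packages = [
    {
        "name": "foo",
        "version": "1.2.3",
        "description": "Example tool",
        "homepage": "https://example.org/foo",
        "src_uri": "https://example.org/foo/archive/v1.2.3.tar.gz",
    },
]

for pkg in packages:
    print(f"--- {pkg['name']}-{pkg['version']}.ebuild ---")
    print(render_ebuild(pkg))
```

The point of the split is that the same generator logic and template can be reused for every entry in `packages`; only the parameters change.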

It Starts With the Spider

Metatools itself has a highly efficient Web spider which is used to download the sources for all autogenned ebuilds. When the autogen is run in developer mode, fastpull downloads the source tarball only to the local computer, but when it’s run on our official regen infrastructure, all distfiles grabbed by the spider are immediately made available on the CDN, so that the source code remains available even if the original repository goes offline.

In 2022, the second generation of our fastpull technology was released, which stores files indexed by their sha512 hash rather than their file names. Thus, Portage can request distfiles from our CDN by their sha512 hash, and then save them locally under their original file names. This completely eliminates the possibility of the infamous Portage “digest mismatch”.
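The idea of indexing files by hash can be sketched as follows. This is a toy in-memory model, not the fastpull implementation: the `ContentStore` class and the `fetch` helper are hypothetical names, and the real CDN layout is not described here. It shows why a mismatch cannot slip through: the key used to request the file is the very digest the content must match.

```python
import hashlib
import tempfile
from pathlib import Path


class ContentStore:
    """Toy in-memory stand-in for a hash-indexed file store."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store data under its sha512 hex digest and return that key."""
        key = hashlib.sha512(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, sha512_hex: str) -> bytes:
        """Look content up by hash; the file name plays no role here."""
        return self._blobs[sha512_hex]


def fetch(store: ContentStore, sha512_hex: str, filename: str, distdir: Path) -> Path:
    """Request by hash, verify, then save under the original file name."""
    data = store.get(sha512_hex)
    if hashlib.sha512(data).hexdigest() != sha512_hex:
        raise ValueError(f"digest mismatch for {filename}")
    dest = distdir / filename
    dest.write_bytes(data)
    return dest


store = ContentStore()
key = store.put(b"pretend this is a source tarball")
with tempfile.TemporaryDirectory() as tmp:
    saved = fetch(store, key, "foo-1.2.3.tar.gz", Path(tmp))
    print("saved as", saved.name)
```

Two upstream files that happen to share a name but differ in content simply end up under different keys, so neither can shadow the other.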

Also in 2022, the ebuild-generation component of metatools gained a new and very powerful feature called “Dynamic Archives”, which allows autogens to create their own tarballs. These can be modified or repackaged versions of the original source tarballs, perhaps with the addition of icons or documentation downloaded from different places. They can also be a “prepared” version of the sources, so that the generated ebuild can drop the dependencies that would otherwise be needed to prepare the sources or build the documentation. Finally, they can be used to build tarballs from git clones that include git submodule sources, which are usually missing from the GitHub tarballs.
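As a rough illustration of what repackaging involves, the sketch below builds a new tarball from an unpacked source tree plus a few extra files, using only Python's standard `tarfile` module. It does not use the Dynamic Archives API itself; `build_archive` and all file names are hypothetical.

```python
import hashlib
import io
import tarfile
import tempfile
from pathlib import Path
from typing import Mapping


def build_archive(source_dir: Path, extra_files: Mapping[str, bytes], out_path: Path) -> str:
    """Pack source_dir plus extra in-memory files into a .tar.gz.

    Returns the sha512 hex digest of the resulting archive.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        # Add the original source tree under its own directory name.
        tar.add(source_dir, arcname=source_dir.name)
        # Add extras (icons, docs, ...) that were fetched from elsewhere.
        for rel_name, data in extra_files.items():
            info = tarfile.TarInfo(name=f"{source_dir.name}/{rel_name}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return hashlib.sha512(out_path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "mypkg-1.0"
    src.mkdir()
    (src / "main.c").write_text("int main(void) { return 0; }\n")
    out = Path(tmp) / "mypkg-1.0-repacked.tar.gz"
    digest = build_archive(src, {"icons/app.svg": b"<svg/>"}, out)
    with tarfile.open(out) as tar:
        print(tar.getnames())
```

The same shape applies to a git clone with submodules checked out: the directory passed as `source_dir` would simply already contain the submodule sources.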

This leads us to the biggest story of March 2023: the new golang and rust extensions to the github-1 generator, which showcase the power of Dynamic Archives within metatools.

The golang Extension

Metatools has had for some time the ability to peek into source tarballs for software written in Go to extract the go.sum hashes and download URLs for its dependencies. These could then be used to create an ebuild with a long SRC_URI holding the URL for the main package as well as those of all its dependencies. This allowed us to have up-to-date ebuilds for packages written in Go.
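A minimal sketch of that extraction step might look like the following. It is not the Metatools code: `parse_go_sum`, `escape_path` and `proxy_zip_url` are hypothetical helpers, the use of the public Go module proxy at proxy.golang.org is an assumption, and the hashes in the sample are placeholders rather than real checksums.

```python
from typing import List, Tuple


def parse_go_sum(text: str) -> List[Tuple[str, str]]:
    """Return sorted, unique (module, version) pairs listed in a go.sum file.

    Each go.sum line has three fields: module path, version, hash.
    Lines whose version ends in "/go.mod" carry the hash of the go.mod
    file only, so this sketch skips them; a fuller tool would track
    those as well.
    """
    modules = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        module, version, _hash = parts
        if version.endswith("/go.mod"):
            continue
        modules.add((module, version))
    return sorted(modules)


def escape_path(value: str) -> str:
    """Apply the module proxy's case escaping: 'B' becomes '!b'."""
    return "".join("!" + c.lower() if c.isupper() else c for c in value)


def proxy_zip_url(module: str, version: str, proxy: str = "https://proxy.golang.org") -> str:
    """Build the download URL of a module zip on a Go module proxy."""
    return f"{proxy}/{escape_path(module)}/@v/{escape_path(version)}.zip"


# Placeholder hashes: these are NOT real checksums.
SAMPLE_GO_SUM = """\
github.com/BurntSushi/toml v1.2.1 h1:PLACEHOLDER=
github.com/BurntSushi/toml v1.2.1/go.mod h1:PLACEHOLDER=
"""

for module, version in parse_go_sum(SAMPLE_GO_SUM):
    print(proxy_zip_url(module, version))
```

Every URL produced this way would become one entry in SRC_URI, which is how the list grows to hundreds of entries for packages with many dependencies.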

In order to use this generator, however, it was necessary to use a custom Python autogen rather than a generic YAML one. Also, the resulting ebuild was sub-optimal, since it was based on the existing Gentoo go-modules.eclass, which required listing every single required Go module individually in SRC_URI. For many golang-based ebuilds, this resulted in hundreds of entries in SRC_URI. Even though our CDN is very efficient for downloads, Portage downloads files one at a time, so each entry in SRC_URI takes at least a few seconds to download. With a hundred (or two!) entries in SRC_URI, downloading the sources could take 10 minutes or more! Quite annoying.

Fortunately, Funtoo doesn’t have to settle for that sub-optimal experience. Thanks to dynamic archives, invakid404 and drobbins were able to develop a solution. Rather than individually listing each required Go module in an ebuild, the autogen itself could create a single tarball containing all necessary Go modules, and this tarball would be automatically populated on our CDN. The ebuild could now reference one additional file rather than hundreds, which works around the Portage fetch performance issue. Emerging ebuilds for golang-based packages could be made fast again.

As of April 5, 2023, these dynamic golang autogens are now active in the main tree – net-misc/rclone is one example of such packages.

To make these improvements easier to use, the go-modules.eclass was optimized to transparently use Funtoo’s go-module bundles, and a new extension to the metatools github-1 generator was introduced. To make your golang-based autogen automatically create a “golang bundle” (tarball), just two additional lines are needed (the last two in the YAML below):

mypackage_rule:
    generator: github-1
    packages:
        - mypackage:
            extensions:
                - golang

The rust Extension

Also included in the harvester/2023-03 branch is the "rust" extension to the github-1 generator, which works analogously to the golang extension, through a combination of new code added to the existing rust metatools sub and to the github-1 generator. harvester/2023-03 has been merged into "production Funtoo", meaning that this functionality is now fully active and in use.

Next Time

In the April issue, we’ll begin a series of tutorials on Pull Requests and the Funtoo Git Repository and how it works. You’ll learn about the Funtoo tools & metatools, the kit-fixups repository, and end with the git pull request workflow.
 

3 Comments



"In the April issue, we’ll begin a series of tutorials on Pull Requests and the Funtoo Git Repository and how it works. You’ll learn about the Funtoo tools & metatools, the kit-fixups repository, and end with the git pull request workflow."

Yes, please.  I'm just getting started with this stuff and have lots of questions.

  • Funtoo Linux BDFL

@court-jester some changes -- the next newsletter is going to be more focused on updates on plans for funtoo and infrastructure, but I hope to be able to have the tutorial stuff available soon.
