
kdevops logo

Three years after announcing the first release of kdevops, I'd like to announce the release of kdevops v6.2-rc1!

kdevops is designed to help with automation of Linux kernel development workflows. At first it was not clear how, or even if, kdevops could easily be used outside of filesystem testing. In fact, my last post about it 3 years ago explained how one could only use kdevops for other things in an odd way: you had to fork it for each different workflow. That's old nonsense now. kdevops has adopted kconfig, and so different workflows are now possible in one single tree. Embracing other things, such as using jinja2 for file templating with ansible, and having to figure out a way to add PCI-E passthrough support through kconfig, made me realize that the project's growth is no longer a concern; it is actually a feature now. It is clear that new technologies and very complex workflows can easily be added to kdevops.

But that is easy to say unless you have proof, and fortunately I have it. There are two new technologies now well supported in kdevops that curious folks can start mucking around with, which might otherwise take a bit of time to ramp up on: zoned storage and CXL. Supporting new technologies also means ensuring you get whatever tooling you might need to test or work with them.

So, for instance, getting a full Linux kernel development workflow going for CXL with the meson unit tests, even with PCI-E passthrough enabled, on the latest linux-next kernel is now reduced to just a few basic commands, in a Linux distribution / cloud provider agnostic manner:

make dynconfig
make
make bringup
make linux
make cxl
make cxl-test-meson

Just ask a typical CXL Linux kernel developer how long it took them to get a CXL Linux kernel development & test environment up and running that they were happy with. And ask if it was reproducible. This is all now reduced to just 6 commands.

As for the details, it has been 8 months since the last release, and over that time the project has received 680 commits. I'd like to thank the developers who contributed:

Adam Manzanares
Amir Goldstein
Chandan Babu R
Jeff Layton
Joel Granados
Josef Bacik
Luis Chamberlain
Pankaj Raghav

I'd also like to thank my employer for trusting in this work, and allowing me to share a big iron server to help the community with Linux kernel stable work and general kernel technology enablement.

As for the exact details of the changes merged, there are so many! So I've tried to provide a nice terse summary of the highlights in the git tag for v6.2-rc1. 8 months was certainly a long time to wait for a new release, so my hope is that we'll now bake releases in tandem with the Linux kernel, in cadence with the same Linux kernel versioning and release timeline.
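
If you have a clone of kdevops handy, you can read that summary straight off the annotated tag with plain git; a quick sketch, assuming the v6.2-rc1 tag is already in your clone:

git fetch --tags
git show --no-patch v6.2-rc1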

Based on feedback at LSFMM this year, the project is now under the github linux-kdevops organization. This enables other developers to push into the tree. This lets us scale, especially as different workflows are supported.

If you see value in enabling rapid ramp up with Linux kernel development through kdevops for your subsystem / technology, feel free to join the party and either send a pull request to the group or just send patches.
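
If you'd like to poke around first, cloning the tree from the linux-kdevops organization is all it takes:

git clone https://github.com/linux-kdevops/kdevops.git
cd kdevops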

I'm announcing the release of kdevops, which aims at making setting up and testing the Linux kernel for any project as easy as possible. Note that setting up testing for a subsystem and testing a subsystem are two separate operations; however, we strive for both. This is not a new test framework: it allows you to use existing frameworks, and to set those frameworks up as easily as humanly possible. It relies on a series of modern hip devops frameworks: ansible, vagrant and terraform, ansible roles through the Ansible Galaxy, and terraform modules.

Three example demo projects are released which demo its use:

  • kdevops – skeleton generic example using linux-stable
  • fw-kdevops – used for testing firmware loading using linux-next. This example demo was written in about one hour tops by forking kdevops, trimming it, and adding a new ansible galaxy role for selftests. You are expected to be able to fork it and add your respective kernel selftest in a minute
  • oscheck – actively being used to test and advance the XFS filesystem for stable kernel releases. If you fork this to try to add support for testing a new filesystem under a new project, please let me know how long it took you to do that.

Fancy pictures in a nutshell

Of course you just want pictures and the ability to go home after seeing them. Should these be on instagram as well? Gosh.

A first run of kdevops

On a first run:

Running the bootlinux role on just one host

Example run of just running the ansible bootlinux role on just one host:

End of running the bootlinux ansible role on just one host

This shows what it looks like at the end of running the ansible bootlinux role after the host has booted into the new shiny kernel:

Logging into test systems

Well, since we set up your ~/.ssh/config for you, all you gotta do now is just ssh in to the target host you want to test; it will already have the shiny new kernel installed and booted into it:
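
For the curious, the generated entries are plain ssh config stanzas, roughly of this shape (the host alias, address, user, and key path here are all hypothetical; the real values depend on your bringup):

Host kdevops-xfs
    HostName 192.168.124.10
    User vagrant
    IdentityFile ~/.ssh/id_rsa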

Motivations for kdevops

Below I'll document just a bit of the motivation behind this project. The documentation and demo projects should hopefully suffice for how to use all this.

Testing ain't easy, brah!

Getting contributors to your subsystem / driver in Linux is wonderful; however, ensuring it doesn't break anything is a completely separate matter. It is my belief that testing a patch to ensure no testable regressions exist should be painless and simple; however, that has never been the case.

Testing frameworks ain't easy to setup, brah!

Linux kernel testing frameworks should also be really easy to set up, but that is typically never the case either. One example of the complexity of setting up a test framework is fstests, used to test Linux kernel filesystems and to ensure, to the best of our ability, that a new patch doesn't regress the kernel against a baseline. But wait, what is the baseline?

Setting up test systems ain't easy to ramp up, brah!

Another difficulty with testing the Linux kernel comes from the fact that you don't want to test a kernel on the same system your laptop is running on; otherwise you'd crash it, and if you're testing filesystems you may even end up corrupting your filesystem. So typically folks end up using virtualization technologies to set up virtual machines, boot into them, and then use the virtualized hosts as test vehicles. Another alternative is to use cloud service providers such as OpenStack, Azure, Amazon Web Services, or Google Cloud Compute to create hosts on the cloud and use these instead. But I've heard complaints about how even setting up KVM can be complex, even from kernel developers! Some kernel developers simply don't want to know how to set up a virtual environment to test things.

I hear ya, brah!

My litmus test for full setup complexity is all the work required to set up fstests to test a Linux filesystem. If a solution for all the woes above were ever to be provided, I figured it'd have to allow you to easily set up fstests to test XFS without doing much work.

I started looking into this effort first by trying to provide my own set of wrappers around KVM to let you easily set it up. Then I extended this effort to easily set up fstests. Both efforts were all shell hacks... They worked for me, but I was still not really happy with it all. It seemed hacky.

Ted Ts'o's xfstests-bld.git provided a cloud solution for setting up fstests on Google Cloud Compute for the ext filesystems (ext2, ext3, ext4); however, I was not satisfied with this given I wanted to make it easy to test any filesystem, and to be cloud provider agnostic.

Ansible provides a proper replacement for shell hacks, in a distribution agnostic, and even OS agnostic, manner. Vagrant lets me replace all those terrible original bash hacks for setting up KVM with simple, elegant descriptions of what I want a target set of hosts to look like. It also lets me support not only KVM but also Virtualbox, and even Mac OS X. Terraform accomplishes the same but for cloud environments, and supports different providers.
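
To give a flavor of what I mean by a simple, elegant description: this is not kdevops' actual Vagrantfile, just a minimal sketch of the kind of host description Vagrant accepts, with an arbitrary box, hostname, and libvirt resources:

# A minimal sketch; box, hostname, and resources are arbitrary.
Vagrant.configure("2") do |config|
  config.vm.box = "debian/buster64"
  config.vm.define "xfs-test-dev" do |host|
    host.vm.hostname = "xfs-test-dev"
    host.vm.provider "libvirt" do |lv|
      lv.memory = 4096
      lv.cpus = 4
    end
  end
end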

Feedback and rants welcomed

So, give the repositories a shot; I welcome feedback and rants.

kdevops is intended to be used as the de facto example for all of the ansible roles and terraform modules.

fw-kdevops is intended to be forked by folks wanting a simple two host test setup where all you need is linux-next and to run selftests.

oscheck is already actively used to help advance XFS on the stable kernel releases, and is intended to be forked by folks who want to use fstests to test any filesystem on any kernel release.

The offlineimap woes

A long term goal of mine has been to find a reasonable replacement for offlineimap, which I use to fetch all my email for development purposes. I knew offlineimap kept dying on me with out of memory (OOM) errors; however, it was not clear how bad the issue was. It was also not clear what I'd replace it with, until now. At least for now... I've replaced offlineimap with mbsync. Below are some details comparing both, with shiny graphs of system utilization for each. I'll provide my recipes for fetching gmail nested labels over IMAP, glance over my systemd user unit files and explain why I use them, and hint at what I'm asking Santa for in the future.

System setup when home is $HOME

I used to host my mail fetching system at home; however, $HOME can get complicated if you travel often, so for more flexibility I now rely on a digital ocean droplet with a small dedicated volume pool for mail storage. This lets me do away with the stupid host whenever I'm tired of it, and lets me collect nice system utilization graphs without much effort.

Graphing use on offlineimap

Every now and then I'd check my logs and see how offlineimap tended to run out of memory and barf. A temporary solution I figured would work was to disable autorefresh and instead run offlineimap once in a controlled, timely loop using systemd unit timers. That solution didn't help in the end. I finally had a bit of time to check my logs carefully, and also check system utilization graphs on the system over time, and to my surprise offlineimap was running out of memory every single damn time. Here's what I saw from running offlineimap for a full month:

Full month graph of offlineimap

Those spikes are a bit concerning; it's likely the system running out of memory. But let's zoom in to see how often, with an hourly graph:

Hourly graph of offlineimap

Pretty much, I was OOM'ing every single damn time! The lull you see towards the end was me just getting fed up and killing offlineimap until I found a replacement.

The OOM risks

Running out of memory every now and then is one thing, but every single time is just insanity. A system always running low on memory while doing writes is an effective way to stress test a kernel, and if the stars align against you, you might even end up with a corrupted filesystem. Fortunately this puny single-threaded application is simple enough that I didn't run into that issue. But it was a risk.

mbsync

mbsync is written in C, actively maintained, and has mutt code pedigree. Need I say more? Hell, I'm only sad it took me so long to find out about it. mbsync works with the idea of channels; each channel pairs a far store with a near store. The far store is where we fetch data from, and the near store is where we stash things locally.

But in reading its documentation it was not exactly clear how I'd use it for my development purposes: fetching email off of my gmail account, where I use nested labels for different public mailing lists.

The documentation was also not clear on what to do when migrating and keeping old files.

mbsync migration

Yes, in theory you could keep the old IMAP folder, but in practice I ran into a lot of issues. So much so that my solution to the problem was:

$ rm -rf Mail/

And just start fresh... Afraid to make the jump due to the amount of time it may take to sync one of your precious labels? Well, evaluate my per-channel timer solution below.

mbsync for nested gmail labels

Here's what I ended up with. It demos getting mail for, say, my linux-kernel/linux-xfs and linux-kernel/linux-fsdevel mailing lists, and includes some empirical throttling to ensure you don't get punted by gmail for going over some sort of usage quota they've concocted for an IMAP connection.

# A gmail example
#
# First generic defaults
# This example was updated on 2021-05-27 to account
# for the isync rename of Master/Slave to Far/Near
Create Near
SyncState *

IMAPAccount gmail
SSLType IMAPS
Host imap.gmail.com
User user@gmail.com
# Must be an application specific password, otherwise google will deny access.
Pass example
# Throttle mbsync so we don't go over gmail's quota: OVERQUOTA error would
# eventually be returned otherwise. For more details see:
# https://sourceforge.net/p/isync/mailman/message/35458365/
PipelineDepth 50

MaildirStore gmail-local
# The trailing "/" is important
Path ~/Mail/
Inbox ~/Mail/Inbox
Subfolders Verbatim

IMAPStore gmail-remote
Account gmail

# emails sent directly to my kernel.org address
# are stored in my gmail label "korg"
Channel korg
Far :gmail-remote:"korg"
Near :gmail-local:korg

# An example of nested labels on gmail, useful for large projects with
# many mailing lists. We have to flatten out the structure locally.
Channel linux-xfs
Far :gmail-remote:"linux-kernel/linux-xfs"
Near :gmail-local:linux-kernel.linux-xfs

Channel linux-fsdevel
Far :gmail-remote:"linux-kernel/linux-fsdevel"
Near :gmail-local:linux-kernel.linux-fsdevel

# Get all the gmail channels together into a group.
Group googlemail
Channel korg
Channel linux-xfs
Channel linux-fsdevel

mbsync systemd unit files

Now, some of these mailing lists (channels in mbsync lingo) have heavy traffic, and I don't need to be fetching email off of them that often. I also have a channel dedicated solely to emails sent directly to me; those I want right away. But also... since I'm starting fresh, if I ran mbsync to fetch all my email at once, it would stall on any large label, and I'd have to wait for the big labels before getting new email for the smaller ones. For this reason, ideally I'd want to call mbsync at different intervals depending on the mailing list / mbsync channel. Fortunately mbsync locks per target local directory, so the only missing piece was a way to configure timers / calls for mbsync in such a way that I could still journal calls / issues.

I ended up writing a systemd timer and a service unit file per mailing list. The nice thing about this, over using good ol' cron, is that OnUnitInactiveSec=4m, for instance, will call mbsync 4 minutes after it last finished. I also end up with a central place to collect logs:

journalctl --user

Or if I want to monitor:

journalctl --user -f

For my korg label, patches / rants sent directly to me, I want to fetch mail every minute:

$ cat .config/systemd/user/mbsync-korg.timer
[Unit]
Description=mbsync query timer [korg]
ConditionPathExists=%h/.mbsyncrc

[Timer]
OnBootSec=1m
OnUnitInactiveSec=1m

[Install]
WantedBy=default.target

$ cat .config/systemd/user/mbsync-korg.service
[Unit]
Description=mbsync service [korg]
Documentation=man:mbsync(1)
ConditionPathExists=%h/.mbsyncrc

[Service]
Type=oneshot
ExecStart=/usr/local/bin/mbsync korg

[Install]
WantedBy=mail.target

However for my linux-fsdevel... I could wait at least 30 minutes for a refresh:

$ cat .config/systemd/user/mbsync-linux-fsdevel.timer
[Unit]
Description=mbsync query timer [linux-fsdevel]
ConditionPathExists=%h/.mbsyncrc

[Timer]
OnBootSec=5m
OnUnitInactiveSec=30m

[Install]
WantedBy=default.target

And the service unit:

$ cat .config/systemd/user/mbsync-linux-fsdevel.service
[Unit]
Description=mbsync service [linux-fsdevel]
Documentation=man:mbsync(1)
ConditionPathExists=%h/.mbsyncrc

[Service]
Type=oneshot
ExecStart=/usr/local/bin/mbsync linux-fsdevel

[Install]
WantedBy=mail.target

Enabling and starting systemd user unit files

To enable these unit files I just run the following for each; for instance, for linux-fsdevel:

# The first command is now required on more recent versions
# of systemd; on older versions of systemd you only
# need to enable the timer and start it.
systemctl --user enable mbsync-linux-fsdevel.service
systemctl --user enable mbsync-linux-fsdevel.timer
systemctl --user start  mbsync-linux-fsdevel.timer
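
To sanity check that the timers are armed, and to see when each will next fire:

systemctl --user list-timers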

Graphing mbsync

So... how did it do?

I currently have 5 mbsync channels enabled, all fetching my email in the background for me. And not a single one ends up puking with OOM. Here's what life looks like now:

mbsync hourly

Peachy.

Long term ideals

IMAP does the job for email; it just seems utterly stupid for public mailing lists, and I figure we can do much better. This is especially true in light of how much simpler it is for me to follow public code vs public email threads these days. Keep in mind how much more complicated code management is, compared to the goal of just wanting to get a simple stupid email Message ID onto my local Maildir directory. I really had my hopes on public-inbox, but after looking into it, it seems clear that its main objectives are archiving, not local storage / MUA use. For details refer to this linux-kernel discussion on public-inbox with a MUA focus.

If the issue with using public-inbox for local MUA usage is that the archive is too big... it seems sensible to me to evaluate an even smaller epoch size, and to default clients to fetch only one epoch, the latest. That alone wouldn't solve the issue though. The way Maildir stores data files makes it nearly incompatible with git. A proper evaluation of using mbox would be in order.
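
For a sense of what the epoch mechanics look like today: public-inbox archives, such as those on lore.kernel.org, are already split into git epochs, and you can clone just one. A hedged example; the mailing list and the epoch number are illustrative:

git clone --mirror https://lore.kernel.org/linux-xfs/0 linux-xfs/0.git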

The social lubricant is out on the idea though, and I'm hopeful a proper, simple git mail solution is bound to find us soon for public emails.