Konstantin Ryabitsev

kernel.org administrator

Message-ID's are used to identify and retrieve messages from the public-inbox archive on lore.kernel.org, so it's only natural to want to use memorable ones. Or maybe it's just me.

Regardless, here's what I do with neomutt and coolname:

  1. If coolname isn't yet packaged for your distro, you can install it with pip:

    pip install --user coolname
    
  2. Create this file as ~/bin/my-msgid.py:

    #!/usr/bin/python3
    import sys
    import random
    import string
    import datetime
    import platform
    
    from coolname import generate_slug
    
    parts = []
    parts.append(datetime.datetime.now().strftime('%Y%m%d'))
    parts.append(generate_slug(3))
    parts.append(''.join(random.choices(string.hexdigits, k=6)).lower())
    
    sys.stdout.write('-'.join(parts) + '@' + platform.node().split('.')[0])
    
  3. Create this file as ~/.mutt-fix-msgid:

    my_hdr Message-ID: <`/path/to/my/bin/my-msgid.py`>
    
  4. Add this to your .muttrc (works with mutt and neomutt):

    send-hook . "source ~/.mutt-fix-msgid"
    
  5. Enjoy funky message-id's like 20240227-flawless-capybara-of-drama-e09653@lemur. :)
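If you'd rather skip the coolname dependency, a stdlib-only variant produces IDs of the same shape. This is just a sketch; the tiny word list is made up, while coolname ships much larger curated ones:

```python
import datetime
import random
import string

# Tiny stand-in word list; coolname ships much larger curated ones.
WORDS = ['flawless', 'daring', 'gentle', 'capybara', 'lemur', 'drama', 'wonder']

def fun_msgid(host='localhost'):
    # date + three-word slug + short random hex, joined with dashes
    parts = [
        datetime.datetime.now().strftime('%Y%m%d'),
        '-'.join(random.sample(WORDS, 3)),
        ''.join(random.choices(string.hexdigits, k=6)).lower(),
    ]
    return '-'.join(parts) + '@' + host

print(fun_msgid('lemur'))
```

Note that random.sample picks three distinct words, so the slug never repeats a word within one ID.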

At some point in the recent past, mutt changed the way it generates Message-ID header values. Instead of the perfectly good old way of doing it, the developers switched to using base64-encoded random bytes. The base64 dictionary contains the / character, which causes unnecessary difficulties when linking to these messages on lore.kernel.org, since the / character needs to be escaped as %2F for everything to work properly.

Mutt developers seem completely uninterested in changing this, so please save everyone a lot of trouble and do the following if you're using mutt for your kernel development needs (should work for all mutt versions):

  1. Create a ~/.mutt-hook-fix-msgid file with the following contents (change “mylaptop.local” to whatever you like):

    my_hdr Message-ID: <`uuidgen -r`@mylaptop.local>
    
  2. Add the following to your ~/.muttrc:

    send-hook . "source ~/.mutt-hook-fix-msgid"
    

UPDATE: if you have mutt 2.1 or later you can alternatively set the $message_id_format variable to restore the pre-mutt-2.0 behaviour:

# mutt-2.1+ only
set message_id_format = "<%Y%02m%02d%02H%02M%02S.G%c%p@%f>"

Thanks to Thomas Weißschuh for the suggestion!

While b4 started out as a way for maintainers to retrieve patches from mailing lists, it also has contributor-oriented features. Starting with version 0.10 b4 can:

  • create and manage patch series and cover letters
  • track and auto-reroll series revisions
  • display range-diffs between revisions
  • apply trailers received from reviewers and maintainers
  • submit patches without needing a valid SMTP gateway

These features are still considered experimental, but they should be stable for most work and I'd be happy to receive further feedback from occasional contributors.

In this article, we'll go through the process of submitting an actual typo fix patch to the upstream kernel. This bug was identified a few years ago and submitted via bugzilla, but never fixed:

Accompanying video

This article has an accompanying video where I go through all the steps and submit the actual patch at the end:

Installing the latest b4 version

Start by installing b4. The easiest way is via pip, which will grab the latest stable version:

$ pip install --user b4
[...]
$ b4 --version
0.11.1

If you get an error or an older version of b4, please check that your $PATH contains $HOME/.local/bin where pip installs the binaries.

Preparing the tree

  • b4 prep -n [name-of-branch] -f [nearest-tag]

Next, prepare a topical branch where you will be doing your work. We'll be fixing a typo in arch/arm/boot/dts/aspeed-bmc-opp-lanyang.dts, and we'll base this work on tag v6.1:

$ b4 prep -n lanyang-dts-typo -f v6.1
Created new branch b4/lanyang-dts-typo
Created the default cover letter, you can edit with --edit-cover.

This is just a regular branch whose name is prefixed with “b4/”:

$ git branch
* b4/lanyang-dts-typo
  master

You can do all the normal operations with it, and the only special thing about it is that it has an “empty commit” at the start of the series containing the template of our cover letter.

Editing the cover letter

  • b4 prep --edit-cover

If you plan to submit a single patch, the cover letter is not all that necessary; it will only be used to track the destination addresses and changelog entries. You can delete most of the template content and leave just the title and sign-off. The tracking JSON will always be appended to the end automatically, so you don't need to worry about it.

Here's what the commit looks like after I edited it:

$ git cat-file -p HEAD
tree c7c1b7db9ced3eba518cfc1f711e9d89f73f8667
parent 830b3c68c1fb1e9176028d02ef86f3cf76aa2476
author Konstantin Ryabitsev <icon@mricon.com> 1671656701 -0500
committer Konstantin Ryabitsev <icon@mricon.com> 1671656701 -0500

Simple typo fix for the lanyang dts

Signed-off-by: Konstantin Ryabitsev <icon@mricon.com>

--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 1,
    "change-id": "20221221-lanyang-dts-typo-8509e8ffccd4",
    "base-branch": "master",
    "prefixes": []
  }
}
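The tracking section is plain JSON after a fixed marker, so it's easy to inspect programmatically. A minimal parsing sketch (the helper function and sample cover text here are my own illustration, not how b4 itself does it):

```python
import json

def read_tracking(cover_body):
    """Extract the JSON that b4 appends after its tracking marker."""
    _, _, tail = cover_body.partition('--- b4-submit-tracking ---')
    # Drop the human-oriented comment lines starting with '#'
    payload = '\n'.join(line for line in tail.splitlines()
                        if not line.lstrip().startswith('#'))
    return json.loads(payload)

cover = """Simple typo fix for the lanyang dts

Signed-off-by: Dev <dev@example.org>

--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {"revision": 1, "change-id": "20221221-lanyang-dts-typo-8509e8ffccd4"}
}
"""
info = read_tracking(cover)
print(info['series']['revision'])
```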

Committing your work

You can add commits to this branch as you normally would with any other git work. I am going to fix two obvious typos in a single file and make a single commit:

$ git show HEAD
commit 820ce2d9bc7c88e1515642cf3fc4005a52e4c490 (HEAD -> b4/lanyang-dts-typo)
Author: Konstantin Ryabitsev <icon@mricon.com>
Date:   Wed Dec 21 16:17:21 2022 -0500

    arm: lanyang: fix lable->label typo for lanyang dts

    Fix an obvious spelling error in the dts file for Lanyang BMC.
    This was reported via bugzilla a few years ago but never fixed.

    Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org>
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=205891
    Signed-off-by: Konstantin Ryabitsev <icon@mricon.com>

diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-lanyang.dts b/arch/arm/boot/dts/aspeed-bmc-opp-lanyang.dts
index c0847636f20b..e72e8ef5bff2 100644
--- a/arch/arm/boot/dts/aspeed-bmc-opp-lanyang.dts
+++ b/arch/arm/boot/dts/aspeed-bmc-opp-lanyang.dts
@@ -52,12 +52,12 @@ hdd_fault {
                        gpios = <&gpio ASPEED_GPIO(B, 3) GPIO_ACTIVE_HIGH>;
                };
                bmc_err {
-                       lable = "BMC_fault";
+                       label = "BMC_fault";
                        gpios = <&gpio ASPEED_GPIO(H, 6) GPIO_ACTIVE_HIGH>;
                };

                sys_err {
-                       lable = "Sys_fault";
+                       label = "Sys_fault";
                        gpios = <&gpio ASPEED_GPIO(H, 7) GPIO_ACTIVE_HIGH>;
                };
        };

Collecting To: and Cc: addresses

  • b4 prep --auto-to-cc

After you've committed your work, you will want to collect the addresses of people who should be the ones reviewing it. Running b4 prep --auto-to-cc will invoke scripts/get_maintainer.pl with the default recommended flags to find out who should go into the To: and Cc: headers:

$ b4 prep --auto-to-cc
Will collect To: addresses using get_maintainer.pl
Will collect Cc: addresses using get_maintainer.pl
Collecting To/Cc addresses
    + To: Rob Herring <...>
    + To: Krzysztof Kozlowski <...>
    + To: Joel Stanley <...>
    + To: Andrew Jeffery <...>
    + Cc: devicetree@vger.kernel.org
    + Cc: linux-arm-kernel@lists.infradead.org
    + Cc: linux-aspeed@lists.ozlabs.org
    + Cc: linux-kernel@vger.kernel.org
    + Cc: Jens Schleusener <...>
---
You can trim/expand this list with: b4 prep --edit-cover
Invoking git-filter-repo to update the cover letter.
New history written in 0.06 seconds...
Completely finished after 0.33 seconds.

These addresses will be added to the cover letter and you can edit them to add/remove destinations using the usual b4 prep --edit-cover command.

Creating your patatt keypair for web endpoint submission

(This needs to be done only once.)

  • patatt genkey

Note: if you already have a PGP key and it's set as user.signingKey, then you can skip this section entirely.

Before we submit the patch, let's set up the keypair to sign our contributions. This is not strictly necessary if you are going to be using your own SMTP server to submit the patches, but it's a required step if you will use the kernel.org patch submission endpoint (which is what b4 will use in the absence of any [sendemail] sections in your git config).

The process is very simple. Run patatt genkey and add the resulting [patatt] section to your ~/.gitconfig as instructed by the output.

NOTE: You will want to back up the contents of your ~/.local/share/patatt directory so you don't lose access to your private key.

Dry-run and checkpatch

  • b4 send -o /tmp/tosend
  • ./scripts/checkpatch.pl /tmp/tosend/*

Next, generate the patches and look at their contents to make sure that everything is looking sane. Good things to check are:

  • the From: address
  • the To: and Cc: addresses
  • general patch formatting
  • cover letter formatting (if more than 1 patch in the series)
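Those header checks can also be scripted with the stdlib email parser. A minimal sketch (the sample message is invented; in practice you would read the .eml files from /tmp/tosend):

```python
from email import policy
from email.parser import BytesParser

def check_headers(raw_eml):
    """Return the headers worth eyeballing before sending.

    A simple illustration only; b4 and checkpatch perform far more
    thorough checks."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_eml)
    return {h: msg[h] for h in ('From', 'To', 'Cc', 'Subject')}

sample = (b"From: Dev <dev@example.org>\n"
          b"To: maint@example.org\n"
          b"Cc: list@example.org\n"
          b"Subject: [PATCH] fix typo\n\n"
          b"patch body\n")

hdrs = check_headers(sample)
for name, value in hdrs.items():
    print(f'{name}: {value}')
```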

If everything looks sane, one more recommended step is to run checkpatch.pl from the top of the kernel tree:

$ ./scripts/checkpatch.pl /tmp/tosend/*
total: 0 errors, 0 warnings, 14 lines checked

/tmp/tosend/0001-arm-lanyang-fix-lable-label-typo-for-lanyang-dts.eml has no obvious style problems and is ready for submission.

Register your key with the web submission endpoint

(This needs to be done only once, unless you change your keys.)

  • b4 send --web-auth-new
  • b4 send --web-auth-verify [challenge]

If you're not going to use your own SMTP server to send the patch, you should register your new keypair with the endpoint:

$ b4 send --web-auth-new
Will submit a new email authorization request to:
  Endpoint: https://lkml.kernel.org/_b4_submit
      Name: Konstantin Ryabitsev
  Identity: icon@mricon.com
  Selector: 20221221
    Pubkey: ed25519:24L8+ejW6PwbTbrJ/uT8HmSM8XkvGGtjTZ6NftSSI6I=
---
Press Enter to confirm or Ctrl-C to abort
Submitting new auth request to https://lkml.kernel.org/_b4_submit
---
Challenge generated and sent to icon@mricon.com
Once you receive it, run b4 send --web-auth-verify [challenge-string]

The challenge is a UUID4 string and this step is a simple verification that you are able to receive email at the address you want associated with this key. Once you receive the challenge, complete the process as described:

$ b4 send --web-auth-verify 897851db-9b84-4117-9d82-1d970f9df5f8
Signing challenge
Submitting verification to https://lkml.kernel.org/_b4_submit
---
Challenge successfully verified for icon@mricon.com
You may now use this endpoint for submitting patches.

OR, set up your [sendemail] section

You don't have to use the web endpoint — it exists primarily for people who are not able or not willing to set up their SMTP information with git. Setting up an SMTP gateway is not a straightforward process for many people:

  • platforms using OAuth require setting up “application-specific passwords”
  • some companies only provide Exchange or browser-based access to email and don't offer any other way to send mail
  • some company SMTP gateways rewrite messages to add lengthy disclaimers or rewrite links to quarantine them

However, if you have access to a functional SMTP gateway, then you are encouraged to use it instead of submitting via the web endpoint, as this ensures that the development process remains distributed and not dependent on any central services. Just follow instructions in man git-send-email and add a valid [sendemail] section to your git config. If b4 finds it, it will use it instead of relying on the web endpoint.

[sendemail]
    smtpServer = smtp.gmail.com
    smtpServerPort = 465
    smtpEncryption = ssl
    smtpUser = yourname@gmail.com
    smtpPass = your-gmail-app-password

Reflect the email to yourself

  • b4 send --reflect

This is the last step to use before sending off your contribution. Note that it will fill out the To: and Cc: headers of all messages with actual recipients, but it will NOT actually send mail to them — only to yourself. Mail servers don't actually pay any attention to those headers; the only thing that matters to them is what was specified in the RCPT TO commands of the SMTP envelope.

This step is particularly useful if you're going to send your patches via the web endpoint. Unless your email address is from one of the following domains, the From: header will be rewritten in order to not violate DMARC policies:

  • @kernel.org
  • @linuxfoundation.org
  • @linux.dev

If your email domain doesn't match the above, the From: header will be rewritten to be a kernel.org dummy address. Your actual From: will be added to the body of the message where git expects to find it, and the Reply-To: header will be set so anyone replying to your message will be sending it to the right place.
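As a rough illustration of that rewrite (the dummy domain and the exact header mangling here are invented for this sketch; the endpoint's real behaviour may differ):

```python
from email.message import EmailMessage

def dmarc_safe_rewrite(real_from, subject, body,
                       dummy_domain='devnull.example.org'):
    """Sketch of the kind of rewrite described above."""
    msg = EmailMessage()
    # The visible From: becomes a dummy address so DMARC checks pass
    msg['From'] = f'rewritten-sender@{dummy_domain}'
    # Replies still reach the real author via Reply-To:
    msg['Reply-To'] = real_from
    msg['Subject'] = subject
    # git expects to find the real author in an in-body "From:" line
    msg.set_content(f'From: {real_from}\n\n{body}')
    return msg

m = dmarc_safe_rewrite('Dev <dev@gmail.com>', '[PATCH] fix typo', 'patch body')
print(m['From'], '/', m['Reply-To'])
```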

Send it off!

  • b4 send

If all your tests are looking good, then you are ready to send your work. Fire off “b4 send”, review the “Ready to:” section for one final check and either Ctrl-C to get out of it, or hit Enter to submit your work upstream.

Coming up next

In the next post, I will go over:

  • making changes to your patches using: git rebase -i
  • retrieving and applying follow-up trailers using: b4 trailers -u
  • comparing v2 and v1 to see what changes you made using: b4 prep --compare-to v1
  • adding changelog entries using: b4 prep --edit-cover

Documentation

All contributor-oriented features of b4 are documented on the following site:

Once every couple of years someone unfailingly takes advantage of the following two facts:

  1. most large git hosting providers set up object sharing between forks of the same repository, in order to both save storage space and improve user experience
  2. git's loose internal structure allows any shared object to be accessed from any other repository

Thus, hilarity ensues on a fairly regular basis:

Every time this happens, many wonder why this isn't treated like a nasty security bug, and the answer, inevitably, is “it's complicated.”

Blobs, trees, commits, oh my

Under the hood, git repositories are a bunch of objects — blobs, trees, and commits. Blobs are file contents, trees are directory listings that establish the relationship between file names and the blobs, and commits are like still frames in a movie reel that show where all the trees and blobs were at a specific point in time. Each next commit refers to the hash of the previous commit, which is how we know in what order these still frames should be put together to make a movie.
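The “everything is a hashed object” model is easy to see in code; this sketch computes the same ID that git hash-object reports for a blob:

```python
import hashlib

def blob_sha(data: bytes) -> str:
    # A blob is stored as "blob <size>\0<contents>", and its ID is
    # the SHA-1 of that byte string.
    header = b'blob %d\x00' % len(data)
    return hashlib.sha1(header + data).hexdigest()

# Matches `echo hello | git hash-object --stdin`
print(blob_sha(b'hello\n'))
```

Trees and commits are hashed the same way, just with different headers and contents, which is why any change anywhere ripples up into a new commit ID.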

Each of these objects has a hash value, which is how they are stored inside the git directory itself (look in .git/objects). When git was originally designed, over a decade ago, it didn't really have a concept of “branches” — there was just a symlink HEAD pointing to the latest commit. If you wanted to work on several things at once, you simply cloned the repository and did it in a separate directory with its own HEAD. Cloning was a very efficient operation, as through the magic of hardlinking, hundreds of clones would take up about as much room on your disk as a single one.

Fast-forward to today

Git is a lot more complicated these days, but the basic concepts are the same. You still have blobs, trees, commits, and they are all still stored internally as hashes. Under the hood, git has developed quite a bit over the past decade to make it more efficient to store and retrieve millions and tens of millions of repository objects. Most of them are now stored inside special pack files, which are organized rather similarly to compressed video clips — formats like webm don't really store each frame as a separate image, as there is usually very little difference between any two adjacent frames. It makes much more sense to store just the difference (“delta”) between two still images until you come to a designated “key frame”.

Similarly, when generating pack files, git will try to calculate the deltas between objects and only store their incremental differences — at least until it decides that it's time to start from a new “key frame” just so checking out a tag from a year ago doesn't require replaying a year's worth of diffs. At the same time, there has been a lot of work to make the act of pushing/pulling objects more efficient. When someone sends you a pull request and you want to review their changes, you don't want to download their entire tree. Your git client and the remote git server compare what objects they already have on each end, with the goal of sending you just the objects that you are lacking.
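Conceptually, that negotiation boils down to a set difference. A toy sketch with invented object IDs:

```python
# Conceptual sketch of the fetch negotiation: the server only sends
# objects the client doesn't already have. All IDs are invented.
server_objects = {'c1', 'c2', 'c3', 't1', 't2', 'b1', 'b2', 'b3'}
client_objects = {'c1', 'c2', 't1', 'b1', 'b2'}

# Only the missing objects travel over the wire
to_send = server_objects - client_objects
print(sorted(to_send))
```

The real protocol never exchanges full object lists, of course; the two sides advertise refs and walk the commit graph to find a common ancestor, but the end result is the same set difference.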

Optimizing public forks

If you look at the GitHub links above, check out how many forks torvalds/linux has on that hosting service. Right now, that number says “41.1k”. With the best kinds of optimizations in place, a bare linux.git repository takes up roughly 3 GB on disk. Doing quick math, if each one of these 41.1k forks were completely standalone, that would require about 125 TB of disk storage. Throw in a few hundred terabytes for all the forks of Chromium, Android, and Gecko, and soon you're talking Real Large Numbers. Which is why nobody actually does it this way.

Remember how I said that git forks were designed to be extremely efficient and reuse the objects between clones? This is how forks are actually organized on GitHub (and git.kernel.org, for that matter), except it's a bit more complicated these days than simply hardlinking the contents of .git/objects around.

On git.kernel.org side of things we store the objects from all forks of linux.git in a single “object storage” repository (see https://pypi.org/project/grokmirror/ for the gory details). This has many positive side-effects:

  • all of git.kernel.org, with its hundreds of linux.git forks takes up just 30G of disk space
  • when Linus merges his usual set of pull requests and performs “git push”, he only has to send a very small subset of those objects, because we probably already have most of them
  • similarly, when maintainers pull, rebase, and push their own forks, they don't have to send any of the objects back to us, as we already have them

Object sharing greatly improves not only the backend infrastructure on our end, but also the experience of git's end-users, who directly benefit from not having to push around nearly as many bits.

The dark side of object sharing

With all the benefits of object sharing comes one important downside — namely, you can access any shared object through any of the forks. So, if you fork linux.git and push your own commit into it, any of the 41.1k forks will have access to the objects referenced by your commit. If you know the hash of that object, and if the web UI allows access to arbitrary repository objects by their hash, you can even view and link to it from any of the forks, making it look as if that object is actually part of that particular repository (which is how we get the links at the start of this article).

So, why can't GitHub (or git.kernel.org) prevent this from happening? Remember when I said that a git repository is like a movie full of adjacent still frames? When you look at a scene in a movie, it is very easy for you to identify all objects in any given still frame — there is a street, a car, and a person. However, if I show you a picture of a car and ask you “does this car show up in this movie,” the only way you can answer this question is by watching the entire thing from the beginning to the end, carefully scrutinizing every shot.

In just the same way, to check if a blob from the shared repository actually belongs in a fork, git has to look at all that repository's tips and work its way backwards, commit by commit, to see if any of the tree objects reference that particular blob. Needless to say, this is an extremely expensive operation, which, if enabled, would allow anyone to easily DoS a git server with only a handful of requests.
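In code, that exhaustive check looks something like this walk over a toy history (all objects and IDs are invented for illustration):

```python
# Toy object store: each commit points to a parent and a tree; each
# tree lists the blobs it references. All IDs are invented.
commits = {
    'c2': {'parent': 'c1', 'tree': 't2'},
    'c1': {'parent': None, 'tree': 't1'},
}
trees = {'t1': {'b1'}, 't2': {'b1', 'b2'}}

def blob_reachable(tip, blob):
    """Walk history backwards from a tip, checking every tree for the
    blob -- the expensive operation described above."""
    c = tip
    while c is not None:
        if blob in trees[commits[c]['tree']]:
            return True
        c = commits[c]['parent']
    return False

print(blob_reachable('c2', 'b2'), blob_reachable('c2', 'b3'))
```

A real repository has merge commits and deeply nested trees, so the actual walk touches vastly more objects than this linear sketch, which is exactly why servers refuse to do it per request.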

This may change in the future, though. For example, if you access a commit that is not part of a repository, GitHub will now show you a warning message:

Looking up “does this commit belong in this repository” used to be a very expensive operation, too, until git learned to generate commit graphs (see man git-commit-graph). It is possible that at some point in the future a similar feature will land that will make it easy to perform a similar check for the blob, which will allow GitHub to show a similar warning when someone accesses shared blobs by their hash from the wrong repo.

Why this isn't a security bug

The mere fact that an object is part of the shared storage has no real impact on the forks. When you perform a git-aware operation like “git clone” or “git pull,” git-daemon will only send the objects actually belonging to that repository. Furthermore, your git client deliberately doesn't trust the remote to send the right stuff, so it will perform its own connectivity checks before accepting anything from the server.

If you're extra paranoid, you're encouraged to set receive.fsckObjects for some additional protection against in-flight object corruption, and if you're really serious about securing your repositories, then you should set up and use git object signing:

This is, incidentally, also how you would be able to verify whether commits were made by the actual Linus Torvalds or merely by someone pretending to be him.

Parting words

This neither proves nor disproves the identity of “Satoshi.” However, given Linus's widely known negative opinions of C++, it's probably not very likely that it's the language he'd pick to write some proof of concept code.

This is the second installment in the series where we're looking at using the public-inbox lei tool for interacting with remote mailing list archives such as lore.kernel.org. In the previous article we looked at delivering your search results locally, and today let's look at doing the same, but with remote IMAP folders. For feedback, send a follow-up to this message on the workflows list:

For our example query today, we'll do some stargazing. The following will show us all mail sent by Linus Torvalds:

f:torvalds AND rt:1.month.ago..

I'm mostly using it because it's short, but you may want to use something similar if you have co-maintainer duties and want to automatically receive a copy of all mail sent by your fellow subsystem maintainers.

Note on saving credentials

When accessing IMAP folders, lei will require a username and password. Unless you really like typing them in manually every time you run lei up, you will probably want to have them cached on your local system. Lei will defer to git-credential-helper for this purpose, so if you haven't already set this up, you will want to do that now.

The two commonly used credential storage backends on Linux are “libsecret” and “store”:

  • libsecret is the preferred mechanism, as it will work with your Desktop Environment's keyring manager to store the credentials in a relatively safe fashion (encrypted at rest).

  • store should only be used if you don't have any other option, as it will record the credentials without any kind of encryption in the ~/.git-credentials file. However, if nothing else is working for you and you are fairly confident in the security of your system, it's something you can use.

Simply run the following command to configure the credential helper globally for your environment:

git config --global credential.helper libsecret

For more in-depth information about this topic, see man git-credential.

Getting your IMAP server ready

Before you start, you should get some information about your IMAP server, such as your login information. For my examples, I'm going to use Gmail, Migadu, and a generic Dovecot IMAP server installation, which should hopefully cover enough ground to be useful for the vast majority of cases.

What you will need beforehand:

  • the IMAP server hostname and port (if it's not 993)
  • the IMAP username
  • the IMAP password

It will also help to know the folder hierarchy. Some IMAP servers create all subfolders below INBOX, while others don't really care.

Generic Dovecot

We happen to be running Dovecot on mail.codeaurora.org, so I'm going to use it as my “generic Dovecot” system and run the following command:

lei q -I https://lore.kernel.org/all/ -d mid \
  -o imaps://mail.codeaurora.org/INBOX/torvalds \
  <<< 'f:torvalds AND rt:1.month.ago..'

The <<< bit above is a Bash-ism, so if you're using a different shell, you can use the POSIX-compliant heredoc format instead:

lei q -I https://lore.kernel.org/all/ -d mid \
  -o imaps://mail.codeaurora.org/INBOX/torvalds <<EOF
f:torvalds AND rt:1.month.ago..
EOF

The first time you run it, you should get a username: and password: prompt, but after that the credentials should be cached and no longer required on each repeated access to the same imaps server.

NOTE: some IMAP servers use the dot . instead of the slash / for indicating folder hierarchy, so if INBOX/torvalds is not working for you, try INBOX.torvalds instead.

Refreshing and subscribing to IMAP folders

If the above command succeeded, then you should be able to view the IMAP folder in your mail client. If you cannot see torvalds in your list of available folders, then you may need to refresh and/or subscribe to the newly created folder. The process will be different for every mail client, but it shouldn't be too hard to find.

The same with Migadu

If you have a linux.dev account (see https://korg.docs.kernel.org/linuxdev.html), then you probably already know that we ask you not to use your account for subscribing to busy mailing lists. This is due to Migadu imposing soft limits on how much incoming email is allowed for each hosted domain — so using lei + IMAP is an excellent alternative.

To set this up with your linux.dev account (or any other account hosted on Migadu), use the following command:

lei q -I https://lore.kernel.org/all/ -d mid \
  -o imaps://imap.migadu.com/lei/torvalds \
  <<< 'f:torvalds AND rt:1.month.ago..'

Again, you will need to subscribe to the new lei/torvalds folder to see it in your mail client.

The same with Gmail

If you are a Gmail user and aren't already using IMAP, then you will need to jump through a few additional hoops before you are able to get going. Google is attempting to enhance the security of your account by restricting how much can be done with just your Google username and password, so services like IMAP are not available without setting up a special “app password” that can only be used for mail access.

Enabling app passwords requires that you first enable 2-factor authentication, and then generate a random app password to use with IMAP. Please follow the process described in the following Google document: https://support.google.com/mail/answer/185833

Once you have the app password for use with IMAP, you can use lei and imaps just like with any other IMAP server:

lei q -I https://lore.kernel.org/all/ -d mid \
  -o imaps://imap.gmail.com/lei/torvalds \
  <<< 'f:torvalds AND rt:1.month.ago..'

You will need to reload the browser page for the folder to show up in your Gmail web UI.

Automating lei up runs

If you're setting up IMAP access, then you probably want IMAP updates to happen behind the scenes without your direct involvement. All you need to do is periodically run lei up --all (plus -q if you don't want non-critical output).

If you're just getting started, then you can set up a simple screen session with a watch command at a 10-minute interval, like so:

watch -n 600 lei up --all

You can then detach from the screen terminal and let that command continue behind the scenes. The main problem with this approach is that it won't survive a system reboot, so if everything is working well and you want to make the command a bit more permanent, you can set up a systemd user timer.

Here's the service file to put in ~/.config/systemd/user/lei-up-all.service:

[Unit]
Description=lei up --all service
ConditionPathExists=%h/.local/share/lei

[Service]
Type=oneshot
ExecStart=/usr/bin/lei up --all -q

[Install]
WantedBy=mail.target

And the timer file to put in ~/.config/systemd/user/lei-up-all.timer:

[Unit]
Description=lei up --all timer
ConditionPathExists=%h/.local/share/lei

[Timer]
OnUnitInactiveSec=10m

[Install]
WantedBy=default.target

Enable the timer:

systemctl --user enable --now lei-up-all.timer

You can use journalctl -xn to view the latest journal messages and make sure that the timer is running well.

CAUTION: user timers only run when the user is logged in. This is not actually that bad, as your keyring is not going to be unlocked unless you are logged into the desktop session. If you want to run lei up as a background process on some server, you should set up a system-level timer and use a different git-credential mechanism (e.g. store) — and you probably shouldn't do this on a shared system where you have to worry about your account credentials being stolen.

Coming up next

In the next installment we'll look at some other part of lei and public-inbox... I haven't yet decided which. :)

I am going to post a series of articles about public-inbox's new lei tool (it stands for “local email interface”, but is clearly a “lorelei” joke :)). In addition to being posted on the blog, it is also available on the workflows mailing list, so if you want to reply with a follow-up, see this link:

What's the problem?

One of kernel developers' perennial complaints is that they just get Too Much Damn Email. Nobody in their right mind subscribes to “the LKML” (linux-kernel@vger.kernel.org) because it acts as a dumping ground for all email and the resulting firehose of patches and rants is completely impossible for a sane human being to follow.

For this reason, actual Linux development tends to happen on separate mailing lists dedicated to each particular subsystem. In turn, this has several negative side-effects:

  1. Developers working across multiple subsystems end up needing to subscribe to many different mailing lists in order to stay aware of what is happening in each area of the kernel.

  2. Contributors submitting patches find it increasingly difficult to know where to send their work, especially if their patches touch many different subsystems.

The get_maintainer.pl script is an attempt to solve problem #2: it looks at the diff contents in order to suggest the list of recipients for each submitted patch. However, the submitter needs to be both aware of this script and know how to properly configure it in order to use it correctly with git-send-email.

Further complicating the matter is the fact that get_maintainer.pl relies on the entries in the MAINTAINERS file. Any edits to that file must go through the regular patch submission and review process and it may take days or weeks before the updates find their way to individual contributors.

Wouldn't it be nice if contributors could just send their patches to one place, and developers could just filter out the stuff that is relevant to their subsystem and ignore the rest?

lore meets lei

Public-inbox started out as a distributed mailing list archival framework with powerful search capabilities. We were happy to adopt it for our needs when we needed a proper home for kernel mailing list archives — thus, lore.kernel.org came online.

Even though it started out as merely a list archival service, it quickly became obvious that lore could be used for a lot more. Many developers ended up using its search features to quickly locate emails of interest, which in turn raised a simple question — what if there was a way to “save a search” and have it deliver all new incoming mail matching certain parameters straight to the developers' inbox?

You can now do this with lei.

lore's search syntax

Public-inbox uses Xapian behind the scenes, which makes it possible to tailor the keyword database to very specific needs.

For example, did you know that you can search lore.kernel.org for patches that touch specific files? Here's every patch that touched the MAINTAINERS file:
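That search uses the dfn: (changed filename) prefix; the query itself is simply:

```
dfn:MAINTAINERS
```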

How about every patch that modifies a function that starts with floppy_:
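Here the dfhh: (diff hunk header) prefix does the job, matching the function context git puts on the @@ lines of a diff:

```
dfhh:floppy_*
```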

Say you're the floppy driver maintainer and want to find all mail that touches drivers/block/floppy.c, modifies any function that starts with floppy_, or has “floppy” in the subject, plus any other mail that mentions “floppy” together with the words “bug” or “regression”. And maybe limit the results to just the past month.

Here's the query:

    (dfn:drivers/block/floppy.c OR dfhh:floppy_* OR s:floppy
     OR ((nq:bug OR nq:regression) AND nq:floppy))
    AND rt:1.month.ago..

And here are the results:

Now, how about getting that straight into your mailbox, so you don't have to subscribe to the (very busy) linux-block list, if you are the floppy maintainer?

Installing lei

Lei is very new and probably isn't yet available as part of your distribution, but I hope that this will change quickly once everyone realizes how awesome it is.

I'm working on packaging lei for Fedora, so depending on when you're reading this, try dnf install lei — maybe it's already there. If it's not in Fedora proper yet, you can get it from my copr:

    dnf copr enable icon/b4
    dnf install lei

If you're not a Fedora user, just consult the INSTALL file:

Maildir or IMAP?

Lei can deliver search results either into a local maildir, or to a remote IMAP folder (or both). We'll do local maildir first and look at IMAP in a future follow-up, as it requires some preparatory work.

Getting going with lei-q

Let's take the exact query we used for the floppy drive above, and get lei to deliver entire matching threads into a local maildir folder that we can read with mutt:

    lei q -I https://lore.kernel.org/all/ -o ~/Mail/floppy \
      --threads --dedupe=mid \
      '(dfn:drivers/block/floppy.c OR dfhh:floppy_* OR s:floppy \
      OR ((nq:bug OR nq:regression) AND nq:floppy)) \
      AND rt:1.month.ago..'

Before you run it, let's understand what it's going to do:

  • -I https://lore.kernel.org/all/ will query the aggregated index that contains information about all mailing lists archived on lore.kernel.org. It doesn't matter to which list the patch was sent — if it's on lore, the query will find it.

  • -o ~/Mail/floppy will create a new Maildir folder and put the search results there. Make sure that this folder doesn't already exist, or lei will clobber anything already present there (unless you use --augment, but I haven't tested this very extensively yet, so best to start with a clean slate).

  • --threads will deliver entire threads even if the match is somewhere in the middle of the discussion. This is handy if, for example, someone says “this sounds like a bug in the floppy subsystem” in the middle of a conversation: --threads will automatically get you the entire conversation for context.

  • --dedupe=mid will deduplicate results based on the message-id header. The default behaviour is to dedupe based on the body contents, but with so many lists still adding junky “sent to the foo list” footers, this tends to result in too many duplicated results. Passing --dedupe=mid is less safe (someone could sneak in a bogus message with an identical message-id and have it delivered to you instead), but more convenient. YMMV, BYOB.

  • Make sure you don't omit the final “..” in the rt: query parameter, or you will only get mail that was sent on that date, not since that date.

As always, backslashes and newlines are there just for readability — you don't need to use them.

After the command completes, you should get something similar to what is below:

    # /usr/bin/curl -Sf -s -d '' https://lore.kernel.org/all/?x=m&t=1&q=(omitted)
    # /home/user/.local/share/lei/store 0/0
    # https://lore.kernel.org/all/ 122/?
    # https://lore.kernel.org/all/ 227/227
    # 150 written to /home/user/Mail/floppy/ (227 matches)

A few things to notice here:

  1. The command actually executes a curl call and retrieves the results as an mbox file.
  2. Lei automatically converts 1.month.ago into a precise timestamp.
  3. The command wrote 150 messages into the maildir we specified.
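The relative-date conversion can be illustrated with a quick sketch (this is only an approximation of what lei does internally; the 30-day month here is an assumption for illustration):

```python
from datetime import datetime, timedelta, timezone

def rel_to_epoch(months: int = 1) -> int:
    """Convert an 'N.month.ago' style relative date into an absolute
    Unix timestamp, approximating a month as 30 days."""
    then = datetime.now(timezone.utc) - timedelta(days=30 * months)
    return int(then.timestamp())

# The resulting absolute number is what ends up in the query actually
# sent to the server.
print(rel_to_epoch(1))
```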

We can now view these results with mutt (or neomutt):

    neomutt -f ~/Mail/floppy

It is safe to delete mail from this folder — it will not get re-added during lei up runs, as lei keeps track of seen messages on its own.

Updating with lei-up

By default, lei q will save your search and start keeping track of it. To see your saved searches, run:

    $ lei ls-search
    /home/user/Mail/floppy

To fetch the newest messages:

    lei up ~/Mail/floppy

You will notice that the first line of output will say that lei automatically limited the results to only those that arrived since the last time lei was invoked for this particular saved search, so you will most likely get no new messages.

As you add more queries in the future, you can update them all at once using:

    lei up --all

Editing and discarding saved searches

To edit your saved search, just run lei edit-search. This will bring up your $EDITOR with the configuration file lei uses internally:

    ; to refresh with new results, run: lei up /home/user/Mail/floppy
    ; `maxuid' and `lastresult' lines are maintained by "lei up" for optimization
    [lei]
        q = (dfn:drivers/block/floppy.c OR dfhh:floppy_* OR s:floppy OR \
            ((nq:bug OR nq:regression) AND nq:floppy)) AND rt:1.month.ago..
    [lei "q"]
        include = https://lore.kernel.org/all/
        external = 1
        local = 1
        remote = 1
        threads = 1
        dedupe = mid
        output = maildir:/home/user/Mail/floppy
    [external "/home/user/.local/share/lei/store"]
        maxuid = 4821
    [external "https://lore.kernel.org/all/"]
        lastresult = 1636129583

This lets you edit the query parameters if you want to add/remove specific keywords. I suggest you test them on lore.kernel.org first before putting them into the configuration file, just to make sure you don't end up retrieving tens of thousands of messages by mistake.
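The maxuid and lastresult values are lei's own bookkeeping. In particular, lastresult is just a Unix timestamp of the last successful fetch, which you can decode yourself:

```python
from datetime import datetime, timezone

# Decode the lastresult value from the saved-search config above
last = datetime.fromtimestamp(1636129583, tz=timezone.utc)
print(last.isoformat())  # the last time this search hit the remote
```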

To delete a saved search, run:

    lei forget-search ~/Mail/floppy

This doesn't delete anything from ~/Mail/floppy, it just makes it impossible to run lei up to update it.

Subscribing to entire mailing lists

To subscribe to entire mailing lists, you can query based on the list-id header. For example, if you wanted to replace your individual subscriptions to linux-block and linux-scsi with a single lei command, do:

    lei q -I https://lore.kernel.org/all/ -o ~/Mail/lists --dedupe=mid \
      '(l:linux-block.vger.kernel.org OR l:linux-scsi.vger.kernel.org) AND rt:1.week.ago..'

You can always edit this to add more lists at any time.

Coming next

In the next series installment, I'll talk about how to deliver these results straight to a remote IMAP folder and how to set up a systemd timer to get newest mail automatically (if that's your thing — I prefer to run lei up manually and only when I'm ready for it).

Linux development depends on the ability to send and receive emails. Unfortunately, it is common for corporate gateways to post-process both outgoing and incoming messages, adding lengthy legal disclaimers or quarantining links for anti-phishing purposes, both of which interfere with the regular patch flow.

While it is possible to configure free options like Gmail to work well for sending and receiving patches, Google services may not be available in all geographic locations, or there may be other reasons why someone would prefer not to have a gmail.com address.

For this reason, we have partnered with Migadu to provide a mail hosting service under the linux.dev domain. If you're a Linux subsystem maintainer or reviewer and you need a mailbox to do your work, we should be able to help you out.

We hope to expand the service to include other kernel developers in the near future.

Please see our service documentation page for full details.

asciicast

One of the side-effects of the recent UMN Affair has been renewed scrutiny of the kernel development process that continues to rely on patches sent via email. This prompted me to revisit my end-to-end patch attestation work and get it to the point where I consider it to be both stable for day-to-day work and satisfactory from the point of view of underlying security and usability.

Goals of the project

These were the goals at the outset:

  • make it opt-in and don't interfere with existing tooling and workflows
  • be as behind-the-scenes and non-intrusive as possible
  • be simple and easy to understand, explain, and audit

I believe the proposed solution hits all of these points:

  • the implementation is very similar to DKIM and uses email headers for cryptographic attestation of all relevant content (“From:” and “Subject:” headers, plus the message body). Any existing tooling will simply ignore the unrecognized header.
  • cryptographic signing is done via a git hook invoked automatically by git-send-email (sendemail-validate), so it only needs to be set up once and doesn't require remembering to do any extra steps
  • the library doing the signing is only a few hundred lines of Python code and reuses the DKIM standard for most of its logic
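To give a flavor of the DKIM-derived logic, here is a simplified sketch of RFC 6376 “relaxed” body canonicalization followed by the body hash. Patatt's real implementation handles more edge cases, so treat this purely as an illustration of the principle:

```python
import base64
import hashlib
import re

def relaxed_body_hash(body: bytes) -> str:
    """Simplified DKIM 'relaxed' body canonicalization + SHA-256 body hash."""
    lines = body.split(b"\r\n")
    # Collapse runs of whitespace and strip trailing whitespace on each line
    lines = [re.sub(rb"[ \t]+", b" ", ln).rstrip() for ln in lines]
    # Drop trailing empty lines, then terminate the body with a single CRLF
    while lines and lines[-1] == b"":
        lines.pop()
    canonical = b"\r\n".join(lines) + b"\r\n"
    return base64.b64encode(hashlib.sha256(canonical).digest()).decode()

# Whitespace-only differences no longer change the hash:
print(relaxed_body_hash(b"hello   world\r\n\r\n") == relaxed_body_hash(b"hello world"))  # -> True
```

This is why gateways that merely reflow whitespace don't break such signatures, while ones that append disclaimers to the body do.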

Introducing patatt

The library is called “patatt” (for Patch Attestation, obviously), and can be installed from PyPI:

    pip install --user patatt

It only requires PyNaCl (Python libsodium bindings), git, and gnupg (if signing with a PGP key). The detailed principles of operation are described on the PyPI project page, so I will not duplicate them here.

The screencast linked above shows patatt in action from the point of view of a regular patch contributor.

If you have an hour or so, you can also watch my presentation to the Digital Identity Attestation WG:

YouTube video

Support in b4

Patatt is fully supported starting with version 0.7.0 of b4 — here it is in action verifying a patch from Greg Kroah-Hartman:

    $ b4 am 20210527101426.3283214-1-gregkh@linuxfoundation.org
    [...]
    ---
      ✓ [PATCH] USB: gr_udc: remove dentry storage for debugfs file
      ---
      ✓ Signed: openpgp/gregkh@linuxfoundation.org
      ✓ Signed: DKIM/linuxfoundation.org
    ---
    Total patches: 1
    ---
    [...]

As you see above, b4 verified that the DKIM header was valid and that the PGP signature from Greg Kroah-Hartman passed as well, giving double assurance that the message was not modified between leaving Greg's computer and being checked on the end-system of the person retrieving the patch.

Keyring management

Patatt (and b4) also introduce the idea of tracking contributor public keys in the git repository itself. It may sound silly — how can the repository itself be a source of trusted keys? However, it actually makes a lot of sense and isn't any worse than any other currently used public key distribution mechanism:

  • git is already decentralized and can be mirrored to multiple locations, avoiding any single points of failure
  • all contents are already versioned and key additions/removals can be audited and “git blame’d”
  • git commits themselves can be cryptographically signed, which allows a small subset of developers to act as “trusted introducers” to many other contributors (mimicking the “keysigning” process)

Contributor public keys can be added either to the main branch itself, along with the project codebase (perhaps in a .keys toplevel subdirectory), or they can be managed in a dedicated ref, such as refs/meta/keyring. The latter can be especially useful for large projects where patches are collected by subsystem maintainers and then submitted as pull requests for inclusion into the mainline repository. Keeping the keyring in its own ref ensures that it stays out of the way of regular development work but is still centrally managed and tracked.
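A dedicated ref is not fetched by clients by default; one way to sketch retrieving it (the ref name here simply mirrors the refs/meta/keyring convention mentioned above) is an extra fetch refspec on the remote:

```
[remote "origin"]
    fetch = +refs/heads/*:refs/remotes/origin/*
    fetch = +refs/meta/keyring:refs/meta/keyring
```

After a regular git fetch, the keyring contents are then available locally under refs/meta/keyring.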

Further work

I am hoping that people will now start using cryptographic attestation for the patches they send; however, I certainly can't force anyone's hand. If you are a kernel subsystem maintainer or a core developer of some other project that relies on mailed-in patches for submission and code review, I hope that you will give this a try.

If you have any comments, concerns, or improvement suggestions, please reach out to the tools list.

We have recently announced the availability of our new mailing list platform that will eventually take on the duties currently performed by vger. Off the bat, there were a few questions about how it works under the hood — especially regarding DMARC-friendly configuration.

Under the hood

There is nothing fancy about the setup — mailing lists are managed by mlmmj, while all delivery operations are handled by Postfix. All outgoing mail is delivered via kernel.org mirror edge nodes (see the output of dig -t txt +short _spf.kernel.org if you are curious what they are), which is mostly done to speed up delivery by spreading out the queue across several systems.

When mlmmj writes to the archive directory of the mailing list, the message is immediately picked up by public-inbox-watch and appended to the public-inbox archive for that list. The archive is then replicated to lore.kernel.org (using grokmirror integration), which usually happens within 60 seconds. This replication happens in parallel with Postfix delivering mail to list subscribers, so it is not dependent on the size of the mail queue. In theory, it shouldn't take longer than a few minutes for a message sent to a lists.linux.dev address to show up on lore.kernel.org. Similarly, messages should never go missing from the public-inbox archive if they were accepted by mlmmj for delivery (I know, famous last words).

Appeasing DMARC

It's a common misconception that mailing lists are somehow incompatible with DMARC. There are two key principles to follow:

  1. The Envelope-From should be that of the mailing list domain. For example, if I send an email to linux-staging, the envelope-from of the outgoing message will be changed to linux-staging+bounces-x@lists.linux.dev, with some subscriber bounce tracking information in place of x. This way, when MTAs are looking at DMARC for “konstantin@linuxfoundation.org”, the SPF check will be performed against “lists.linux.dev” instead of the domain of the original sender (“linuxfoundation.org”).

  2. There should be no changes to any of the existing message headers and no modifications to the message body. This is actually the part that generally trips up mailing list operators, as it is a long-standing practice to do two things when it comes to mailing lists: modify the subject to insert a terse list identifier (e.g. Subject: [linux-staging] [PATCH] ...) and append a footer to the message body with mailing list administrative info. Doing either of these things will almost certainly invalidate DKIM signatures and therefore cause the message to fail the DMARC check. It is correct to add proper List-Id/List-Subscribe/etc headers, though — and hopefully the domain of the original sender isn't misconfigured to include these headers into the DKIM signature (true story, saw this happen).
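The distinction between the two principles is easiest to see at the SMTP level. In this illustrative exchange (the addresses and the bounce-tracking value are made up), only the envelope sender is rewritten, while the message headers and body pass through untouched:

```
MAIL FROM:<linux-staging+bounces-42@lists.linux.dev>    <- rewritten by the list
RCPT TO:<subscriber@example.org>
DATA
From: Konstantin Ryabitsev <konstantin@linuxfoundation.org>   <- left intact
Subject: [PATCH] ...                                          <- left intact
List-Id: <linux-staging.lists.linux.dev>                      <- added by the list
...
```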

Following the above advice will work for nearly all cases except where a domain sets a DMARC policy, but the message is sent without a DKIM signature. If this happens, DMARC validators are supposed to use a kludgy “alignment” check where the envelope-from must match the From: header. In that particular case the messages we send out will fail DMARC checks, unfortunately. As far as I'm concerned, this is the fault of domain owners and is properly fixed by setting up proper DKIM signing and giving users a way to send outgoing mail via proper SMTP gateways.

(There is a way to work around this by rewriting the “From:” header so that it matches the list domain as well, but let's just not go there, as rewriting the From: header is not an acceptable solution for lists working with code reviews.)

Here's a write-up I randomly found while writing this post that goes into some more detail regarding DMARC and mailing lists.

Why no ARC headers?

We don't currently add ARC headers — as far as I can tell, they aren't required for operating a mailing list that properly sets the envelope-from. In theory, using ARC signing may help with the “DMARC with no DKIM” corner-case above, but I'm not convinced this is worth the crazy header bloat. Who knows, I may change my mind about this in the future.

Parting words

In short, the best way to ensure that a message sent via subspace.kernel.org is delivered to all subscribers is to send it from a domain that properly DKIM-signs all mail. If you run your own server, you can either set up OpenDKIM on your own (it's not complicated, honest), or you can pay a company like Mailgun to do it for you.

Many people know that you can PGP-sign git objects — such as tags or commits themselves — but very few know of another attestation feature that git provides, which is signed git pushes.

Why sign git pushes? And how are they different from signed tags/commits?

Signed commits are great, but one thing they do not indicate is intent. For example, you could write some dangerous proof-of-concept code and push it into refs/heads/dangerous-do-not-use. You could even push it into some other fork hosted on a totally different server, just to make it clear that this is not production-ready code.

However, if your commits are PGP-signed, someone could take them and replay over any other branch in any other fork of your repository. To anyone checking the commit signatures, everything will look totally legitimate, as the actual commits are signed by you — never mind that they contain dangerous vulnerable code and were never intended to be pushed into something like refs/heads/next. At the very least, you will look reckless for pushing bad code, even though you were just messing around in a totally separate environment set up specifically for experimentation.

To help hedge against this problem, git provides developers a way to sign their actual pushes, as a means to attest “yes, I actually did intend to push these commits into this ref in this repository on this server, and here's my PGP signature to prove it.” When a push is signed, git will both check the signature it received against a trusted keyring and generate a “push certificate” that can be logged in something like a transparency log:

https://git.kernel.org/pub/scm/infra/transparency-logs/gitolite/git/1.git/plain/m?id=c06eebe4875d6103d580efcf8cd78cc9cc4b5192

Now, before you rush to enable signed pushes, please keep in mind that this functionality needs to first be enabled on the server side, and the vast majority of public git hosting forges do NOT have this turned on. Thankfully, git provides an if-asked setting, which will first check if the remote server supports signed pushes, and only generate the push certificate if the remote server accepts them. To enable this feature for yourself, simply add the following to your ~/.gitconfig:

    [push]
        gpgSign = if-asked

Enabling on the server side

If you are running your own git server, then it is easy to enable this on the server side. Add the following either to each repository config file, or to /etc/gitconfig to enable it globally:

    [receive]
        advertisePushOptions = true
        certNonceSeed = "<uniquerandomstring>"

You should set certNonceSeed to a long, randomly generated string that is kept secret. It is combined with the timestamp to generate a one-time value (“nonce”) that the git client is required to sign, which both prevents replay attacks and provides proof that the certificate was generated for that specific server (though for others to verify this, they would need to know the value of the nonce seed).
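Conceptually (this is a sketch of the idea, not git's exact on-the-wire format), the nonce binds a timestamp to the server's secret seed via an HMAC, so the server can re-derive and verify it later without storing any per-push state:

```python
import hashlib
import hmac
import time

SEED = b"<uniquerandomstring>"  # the certNonceSeed value; keep it secret

def make_nonce(stamp: int) -> str:
    """Derive a nonce from a timestamp and the secret seed."""
    mac = hmac.new(SEED, str(stamp).encode(), hashlib.sha256).hexdigest()
    return f"{stamp}-{mac}"

def verify_nonce(nonce: str) -> bool:
    """Re-derive the HMAC from the embedded timestamp and compare."""
    stamp, _, _ = nonce.partition("-")
    return hmac.compare_digest(make_nonce(int(stamp)), nonce)

nonce = make_nonce(int(time.time()))
print(verify_nonce(nonce))  # -> True
```

A client signing this nonce proves the certificate was produced for this server and around this time; anyone who tampers with either part fails verification.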

Once you have this feature enabled, it is up to you what you do with the generated certificates. You can simply opt to record them, just like we do with our transparency log, or you can actually reject pushes that do not come with valid push certificates. I refer you to the git documentation and to our post-receive-activity-feed hook, which we use to generate the transparency log: