Konstantin Ryabitsev

kernel.org administrator

This week we made public all of our git commit logs, going back to 2013, in the hope of increasing the transparency of high-importance kernel.org operations. All writes performed on public git repositories are now recorded in a public-inbox feed, which is immediately replicated to multiple worldwide servers. The goal is to make it difficult for someone to change any git repository hosted on kernel.org without generating a verifiable, tamper-evident record.

The transparency logs are available at the following address:


You can read more detailed documentation here:


You can have lore.kernel.org mailing lists delivered right into your inbox straight from the git archive (in fact, this will work with any public-inbox server, not just lore.kernel.org). It's efficient and (optionally) lets you preserve a full copy of the entire list archives on your system, should you wish to keep them.

Note: this requires grokmirror-2.0.2+, as earlier versions do not come with the grok-pi-piper utility.

Installing grokmirror-2.0

The easiest way is to install it from pip:

pip install --user grokmirror~=2.0.2

You may have grokmirror available from your distro packages, too, but make sure it's version 2.0.2 or above.

Installing procmail

Procmail should be available with your distribution, so install it like any other package.

Configuring procmail

Procmail configuration can easily be a topic for a whole book in itself, but if you just want to have messages delivered into your inbox, all you have to do is create a ~/.procmailrc with the following contents:

DEFAULT=$HOME/Mail/

# Don't deliver duplicates sent to multiple lists
:0 Wh: .msgid.lock
| formail -D 8192 .msgid.cache

If your mailbox is not in ~/Mail, then you should adjust the above accordingly.
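Beyond deduplication, procmail can also sort each list into its own folder instead of a single inbox. A minimal sketch, assuming maildir-style folders under ~/Mail and that the list sets a List-Id header (the list name here is just an illustration — adjust to the lists you subscribe to):

```procmail
# File linux-doc list mail into its own maildir folder
:0:
* ^List-Id:.*linux-doc\.vger\.kernel\.org
Mail/linux-doc/
```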

Configuring grokmirror

Create a ~/.config/lore.conf with the following contents. We'll use three lists as examples: git, linux-hardening, and linux-doc, but you'll obviously want to use the lists you care about. You can see which lists are available from https://lore.kernel.org/lists, or the exact git repositories on https://erol.kernel.org/.

[core]
toplevel = ~/.local/share/grokmirror/lore
log = ${toplevel}/grokmirror.log

[remote]
site = https://lore.kernel.org
manifest = https://lore.kernel.org/manifest.js.gz

[pull]
post_update_hook = ~/.local/bin/grok-pi-piper -c ~/.config/pi-piper.conf
refresh = 300
include = /git/*

The above assumes that you installed grokmirror with pip install --user. Now make the toplevel directory for the git repos:

$ mkdir -p ~/.local/share/grokmirror/lore

Configuring pi-piper

The last step is to create ~/.config/pi-piper.conf:

[DEFAULT]
pipe = /usr/bin/procmail
shallow = yes

The important bit here is shallow = yes. Public-inbox stores every mail message as a separate commit, so once a message is piped to procmail and delivered, we usually don't care about keeping a copy of that commit any more. If you set shallow = yes, pi-piper will prune all but the last successfully processed commit out of your local git copy by turning those repos into shallow git repositories. This greatly reduces disk usage, especially for large archives.
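You can check whether pi-piper has already shallowed a given archive repository; a quick sketch (the repository path here is illustrative — use whatever lives under your toplevel directory):

```shell
# Prints "true" once the repo has been turned into a shallow clone,
# "false" while it still carries full history
git -C ~/.local/share/grokmirror/lore/git/git.git rev-parse --is-shallow-repository
```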

If you do want to keep full archives, then don't set shallow. You can change your mind at any time by running git fetch _grokmirror master --unshallow in each underlying git repository (you can find them in ~/.local/share/grokmirror/lore/).

You can also specify the shallow option per list:

[DEFAULT]
pipe = /usr/bin/procmail

[git]
shallow = yes

Running grok-pull

You can now run grok-pull -c ~/.config/lore.conf to get the initial repo copies. Note that during the first run grokmirror will perform full clones even if you specified shallow = yes in the pi-piper config, so it may take some time for large archives like those of the git list. However, once the pi-piper hook runs, they will be repacked down to almost nothing. Future versions of grokmirror may become smarter about this and perform shallow clones from the beginning.

During the initial pi-piper run, no mail will be delivered; it will just perform the initial setup and make a note of where HEAD is pointing. If you run grok-pull again, two things may happen:

  1. There will be no changes and grok-pull will exit right away
  2. If there are changes, they will be fetched and the hook will deliver them to procmail (and to your inbox)

Running in the background

You can run grok-pull in the background, where it will check for updates as frequently as the refresh setting says (300 seconds in the example above).

You can either background it “the old way”:

grok-pull -o -c ~/.config/lore.conf &

Or the new way, using a systemd user service:

$ cat .config/systemd/user/grok-pull@.service
[Unit]
Description=Grok-pull service for %I

[Service]
ExecStart=%h/.local/bin/grok-pull -o -c %h/.config/%i.conf

[Install]
WantedBy=default.target

$ systemctl --user enable grok-pull@lore
$ systemctl --user start grok-pull@lore

If you make changes to ~/.config/lore.conf, for example to add new lists, you will need to restart the service:

$ systemctl --user restart grok-pull@lore

Combining with mbsync

You can totally combine this with mbsync and deliver into the same local inbox. As a perk, any messages injected from grokmirror will be uploaded to your remote imap mailbox. See this post from mcgrof about configuring mbsync:


Email tools@linux.kernel.org if you have any trouble getting the above to work. The grok-pi-piper utility is fairly new, so it's entirely possible that it's full of bugs.

With git, a cryptographic signature on a commit provides strong integrity guarantees for the entire history of that branch going backwards, including all metadata and all contents of the repository, all the way back to the initial commit. This is possible because git records the hash of the previous commit in each subsequent commit's metadata, creating an unbreakable cryptographic chain of records. If you can verify the cryptographic signature at the tip of the branch, you effectively verify that branch's entire history.

For example, let's take a look at linux.git, where the latest tag at the time of writing, v5.8-rc7, is signed by Linus Torvalds. (Tag signatures are slightly different from commit signatures — but in every practical sense they offer the same guarantees.)

If you have a git checkout of torvalds/linux.git, you are welcome to follow along.

$ git cat-file -p v5.8-rc7
object 92ed301919932f777713b9172e525674157e983d
type commit
tag v5.8-rc7
tagger Linus Torvalds <torvalds@linux-foundation.org> 1595798046 -0700

Linux 5.8-rc7


If we have Linus's key in our GnuPG keyring, we can easily verify that this tag is valid:

$ git verify-tag v5.8-rc7
gpg: Signature made Sun 26 Jul 2020 05:14:06 PM EDT
gpg:                using RSA key ABAF11C65A2970B130ABE3C479BE3E4300411886
gpg:                issuer "torvalds@linux-foundation.org"
gpg: Good signature from "Linus Torvalds <torvalds@kernel.org>" [unknown]
gpg:                 aka "Linus Torvalds <torvalds@linux-foundation.org>" [full]

The entire contents of this tag are signed, so this tells us that when Linus signed the tag, the “object hash” on his system was 92ed301919932f777713b9172e525674157e983d. But what exactly is that “object hash?” What are the contents that are hashed here? We can find out by asking git to tell us more about that object:

$ git cat-file -p 92ed301919932f777713b9172e525674157e983d
tree f16e3e4bcea2d875a17d2278ff67364b3277b10a
parent 1c8594b8427290c178c5d39885eacd9e41f68743
author Linus Torvalds <torvalds@linux-foundation.org> 1595798046 -0700
committer Linus Torvalds <torvalds@linux-foundation.org> 1595798046 -0700

Linux 5.8-rc7

The above contents in their entirety (slightly differently formatted) are what gives us the sha1 hash 92ed301919932f777713b9172e525674157e983d. So, thus far, we have unbroken cryptographic attestation from Linus's PGP signature to four important bits about his git repository:

  • information about the state of his source code (tree)
  • information about the previous commit in the history (parent)
  • information about the author of the commit and the committer, which are one and the same in this particular case
  • information about the date and time when the commit was made
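You can reproduce this hashing yourself: a git object hash is the sha1 of the object type, its size, and its raw contents. A quick sketch, runnable in any git checkout (HEAD is used here so it works anywhere; in linux.git at v5.8-rc7 the output would be the object hash from the example above):

```shell
# Dump the raw commit object, then re-hash it; the printed hash
# matches the output of `git rev-parse HEAD`
git cat-file commit HEAD > /tmp/commit.raw
git hash-object -t commit /tmp/commit.raw
```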

Let's take a look at the tree line — what contents were hashed to arrive at that checksum? Let's ask git:

$ git cat-file -p f16e3e4bcea2d875a17d2278ff67364b3277b10a
100644 blob a0a96088c74f49a961a80bc0851a84214b0a9f83    .clang-format
100644 blob 43967c6b20151ee126db08e24758e3c789bcb844    .cocciconfig
100644 blob a64d219137455f407a7b1f2c6b156c5575852e9e    .get_maintainer.ignore
100644 blob 4b32eaa9571e64e47b51c43537063f56b204d8b3    .gitattributes
100644 blob d5f4804ed07cd36336a5e80f2a24e45104f902cf    .gitignore
100644 blob db4f2295bd9d792b47eb77aab179a9db0d968454    .mailmap
100644 blob a635a38ef9405fdfcfe97f3a435393c1e9cae971    COPYING
100644 blob 0787b5872906c8a92a63cde3961ed630e2ec93b6    CREDITS
040000 tree 37e1b4166d912d69738beca645d3d539da4bbf30    Documentation
040000 tree ba6955ee6228666d9ef117fdd45df2e53ba0e221    virt

This is the entirety of the top-level Linux kernel directory contents. The blob entries are sha1sums of the actual file contents in that directory, so these are straightforward. Subdirectories are represented as other tree entries, which also consist of blob and tree records going all the way down to the last sublevel, which will only contain blobs.

So, tree f16e3e4bcea2d875a17d2278ff67364b3277b10a in the commit record is a checksum of other checksums, and it allows us to verify that each and every file in linux.git is exactly the same as it was on Linus Torvalds' system when he created the commit. If any file changed, the tree checksum would be different and the whole repository would be considered invalid, because the object hash would no longer match the one recorded in the commit.
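This property is exactly what lets git detect corruption or tampering in the object store; you can ask it to re-verify every object hash at any time:

```shell
# Walk all objects in the repository and verify their checksums
# and connectivity; exits non-zero if anything doesn't match
git fsck --full
```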

Finally, if we look at the object mentioned in parent 1c8594b8427290c178c5d39885eacd9e41f68743, we will see that it is a hash of another commit, containing its own tree and parent records:

$ git cat-file -p 1c8594b8427290c178c5d39885eacd9e41f68743
tree d56de40028d9ecdbebfc2121fd1ce1213fa09fa2
parent 40c60ac32174f0c0c090cd31d0d1712f2478e689
parent ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51
author Linus Torvalds <torvalds@linux-foundation.org> 1595796417 -0700
committer Linus Torvalds <torvalds@linux-foundation.org> 1595796417 -0700
mergetag object ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51

If we cared to, we could walk each commit all the way back to the beginning of Linux git history, but we don't need to do that — verifying the checksum of the latest commit is sufficient to provide us all the necessary assurances about the entire history of that tree.

So, if we verify the signature on the tag and confirm that it matches the key belonging to Linus Torvalds, we will have strong cryptographic assurances that the repository on our disk is byte-for-byte the same as the repository on the computer belonging to Linus Torvalds — with all its contents and its entire history going back to the initial commit.

The difference between signed tags and signed commits is minimal — in the case of commit signing, the signature goes into the commit object itself. It is generally a good practice to PGP-sign commits, particularly in environments where multiple people can push to the same repository branch. Signed commits provide easy forensic proof of code origins (e.g. without commit signing Alice can fake a commit to pretend that it was actually authored by Bob). It also allows for easy verification in cases where someone wants to cherry-pick specific commits into their own tree without performing a git merge.

If you are looking to get started with git and PGP signatures, I can recommend my own Protecting Code Integrity guide, or its kernel-specific adaptation that ships with the kernel docs: Kernel Maintainer PGP Guide.

Obligatory note: sha1 is not considered sufficiently strong for hashing purposes these days, and this is widely acknowledged by the git development community. Significant efforts are under way to migrate git to stronger cryptographic hashes, but they require careful planning and implementation in order to minimize disruption to various projects using git. To my knowledge, there are no effective attacks against sha1 as used by git, and git developers have added further precautions against sha1 collision attacks in git itself, which helps buy some time until stronger hashing implementations are considered ready for real-world use.

For the past few weeks I've been working on a tool to fetch patches from lore.kernel.org and perform the kind of post-processing that is common for most maintainers:

  • rearrange the patches in proper order
  • tally up various follow-up trailers like Reviewed-by, Acked-by, etc
  • check if a newer series revision exists and automatically grab it

The tool started out as get-lore-mbox, but has now graduated into its own project called b4 — you can find it on git.kernel.org and pypi.

To use it, all you need to know is the message-id of one of the patches in the thread you want to grab. Once you have that, you can use the lore.kernel.org archive to grab the whole thread and prepare an mbox file that is ready to be fed to git-am:

$ b4 am 20200312131531.3615556-1-christian.brauner@ubuntu.com
Looking up https://lore.kernel.org/r/20200312131531.3615556-1-christian.brauner@ubuntu.com
Grabbing thread from lore.kernel.org
Analyzing 26 messages in the thread
Found new series v2
Will use the latest revision: v2
You can pick other revisions using the -vN flag
Writing ./v2_20200313_christian_brauner_ubuntu_com.mbx
  [PATCH v2 1/3] binderfs: port tests to test harness infrastructure
    Added: Reviewed-by: Kees Cook <keescook@chromium.org>
  [PATCH v2 2/3] binderfs_test: switch from /dev to a unique per-test mountpoint
    Added: Reviewed-by: Kees Cook <keescook@chromium.org>
  [PATCH v2 3/3] binderfs: add stress test for binderfs binder devices
    Added: Reviewed-by: Kees Cook <keescook@chromium.org>
Total patches: 3
 Link: https://lore.kernel.org/r/20200313152420.138777-1-christian.brauner@ubuntu.com
 Base: 2c523b344dfa65a3738e7039832044aa133c75fb
       git checkout -b v2_20200313_christian_brauner_ubuntu_com 2c523b344dfa65a3738e7039832044aa133c75fb
       git am ./v2_20200313_christian_brauner_ubuntu_com.mbx

As you can see, it was able to:

  • grab the whole thread
  • find the latest revision of the series (v2)
  • tally up the Reviewed-by trailers from Kees Cook and insert them into proper places
  • save all patches into an mbox file
  • show the commit-base (since it was specified)
  • show example git checkout and git am commands

Pretty neat, eh? You don't even need to know on which list the thread was posted — lore.kernel.org, through the magic of public-inbox, will try to find it automatically.

If you want to try it out, you can install b4 using:

pip install b4

(If you are wondering about the name, then you should click the following links: V'ger, Lore, B-4.)

The same, but now with patch attestation

On top of that, b4 also introduces support for cryptographic patch attestation, which makes it possible to verify that patches (and their metadata) weren't modified in transit between developers. This is still an experimental feature, but initial tests have been pretty encouraging.

I tried to design this mechanism so it fulfills the following requirements:

  • it must be unobtrusive and not pollute the mailing lists with attestation data
  • it must be possible to submit attestation after the patches were already sent off to the list (for example, from a different system, or after being asked to do so by the maintainer/reviewer)
  • it must not invent any new crypto or key distribution routines; this means sticking with PGP/GnuPG — at least for the time being

If you are curious about the technical details, I refer you to my original RFC where I describe the implementation.

If you simply want to start using it, then read on.

Submitting patch attestation

If you would like to submit attestation for a patch or a series of patches, the best time to do that is right after you use git send-email to submit your patches to the list. Simply run the following:

b4 attest *.patch

This will do the following:

  • create a set of 3 hashes for each patch (one each for the metadata, the commit message, and the patch itself)
  • add these hashes to a YAML-style document
  • PGP-sign the attestation document using the PGP key you set up with git
  • connect to mail.kernel.org:587 and send the attestation document to the signatures@kernel.org mailing list.

If you don't want to send that attestation right away, use the -n flag to simply generate the message and save it locally for review.

Verifying patch attestation

When running b4 am, the tool will automatically check if attestation is available by querying the signatures archive on lore.kernel.org. If it finds the attestation document, it will run gpg --verify on it. All of the following checks must pass before attestation is accepted:

  1. The signature must be “good” (signed contents weren't modified)
  2. The signature must be “valid” (not done with a revoked/expired key)
  3. The signature must be “trusted” (more on this below)

If all these checks pass, b4 am will show validation checkmarks next to the patches as it processes them:

$ b4 am 202003131609.228C4BBEDE@keescook
Looking up https://lore.kernel.org/r/202003131609.228C4BBEDE@keescook
Grabbing thread from lore.kernel.org
Writing ./v2_20200313_keescook_chromium_org.mbx
  [✓] [PATCH v2 1/2] selftests/harness: Move test child waiting logic
  [✓] [PATCH v2 2/2] selftests/harness: Handle timeouts cleanly
  [✓] Attestation-by: Kees Cook <keescook@chromium.org> (pgp: 8972F4DFDC6DC026)
Total patches: 2

These checkmarks give you assurance that all patches are exactly the same as when they were generated by the developer on their system.

Trusting on First Use (TOFU)

The most bothersome part of PGP is key management. In fact, it's the most bothersome part of any cryptographic attestation scheme — you either have to delegate your trust management to some shadowy Certification Authority, or you have to do a lot of decision making of your own when evaluating which keys to trust.

GnuPG tries to make it a bit easier by introducing the “Trust on First Use” (TOFU) model. The first time you come across a key, it is considered automatically trusted. If you suddenly come across a different key with the same identity on it, GnuPG will mark both keys as untrusted and let you decide on your own which one is “the right one.”

If you want to use the TOFU trust policy for patch attestation, you can add the following configuration parameter to the [b4] section of your $HOME/.gitconfig:

[b4]
  attestation-trust-model = tofu

Alternatively, you can use the traditional GnuPG trust model, where you rely on cross-certification (“key signing”) to make a decision on which keys you trust.

Where to get help

If either b4 or patch attestation are breaking for you — or with any questions or comments — please reach out for help on the kernel.org tools mailing list:

  • tools@linux.kernel.org

If Greg KH ever writes a book about his work as the stable kernel maintainer, it should be titled “Everyone must upgrade” (or it could be a Doctor Who fanfic about Cybermen, I guess). Today, I'm going to take a leaf out of that non-existent book to loudly proclaim that all patch submissions must include base-commit info.

What is a base-commit?

When you submit a single patch or a series of patches to a kernel maintainer, there is one important piece of information they need in order to apply it properly: the state of your git tree at the time you wrote that code. Kernel development moves very quickly, and there is no guarantee that a patch written in mid-January will still apply at the beginning of February, unless no significant changes were made to any of the files your patch touches.

To solve this problem, you can include a small one-liner in your patch:

base-commit: abcde12345

This tells the person reviewing your patch that, at the time when you wrote your code, the latest commit in the git repository was abcde12345. It is now easy for the maintainer to do the following:

git checkout -b incoming_patch abcde12345
git am incoming_patch.mbx

This will tell git to create a new branch using abcde12345 as the parent commit and apply your patches at that point in history, ensuring that there will be no failed or rejected hunks due to recent code changes.

After reviewing your submission the maintainer can then merge that branch back into master, resolving any conflicts during the merge stage (they are really good at that), instead of having to modify patches during the git am stage. This saves maintainers a lot of work, because if your patches require revisions before they can be accepted, they don't have to manually edit anything at all.
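If you didn't keep a clean topical branch and aren't sure what your base was, git can usually work it out for you; a sketch with hypothetical branch names:

```shell
# Find the last commit your feature branch has in common with the
# upstream branch; that commit is a good candidate for base-commit
git merge-base my-feature origin/master
```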

Automated CI systems

Adding base-commit info really makes a difference when automated CI systems are involved. With more and more CI tests written for the Linux kernel, maintainers are hoping to be able to receive test reports for submitted patches even before they look at them, as a way to save time and effort.

Unfortunately, if the CI system does not have the base-commit information to work with, it will most likely try to apply your patches to the latest master. If that fails, there will be no CI report, which means the maintainers will be that much less likely to look at your patches.

How to add base-commit to your submission

If you are using git-format-patch (and you really should be), then you can already automatically include the base commit information. The easiest way to do so is by using topical branches and git format-patch --base=auto, for example:

$ git checkout -t -b my-topical-branch master
Branch 'my-topical-branch' set up to track local branch 'master'.
Switched to a new branch 'my-topical-branch'

[perform your edits and commits]

$ git format-patch --base=auto --cover-letter -o outgoing/ master

When you open outgoing/0000-cover-letter.patch for editing, you will notice that it will have the base-commit: trailer at the very bottom.

Once you have the set of patches to send, you should run them through checkpatch.pl to make sure that there are no glaring errors, and then submit them to the proper developers and mailing lists.

You can learn more by reading the submitting patches document, which now includes a section about base-commit info as well.

WriteFreely recently added support for creating and editing posts via the command-line wf tool and this functionality is available to all users at people.kernel.org.

On the surface, this is easy to use — you just need to write out a markdown-formatted file and then use wf publish myfile.md to push it into your blog (as draft). However, there are some formatting-related caveats to be aware of.


Firstly, WriteFreely's MD flavour differs from GitHub's in how it treats hard linebreaks: specifically, they will be preserved in the final output. On GitHub, if you write the following markdown:

Hello world!
Dis next line.
And dis next line.

And dis next para. Pretty neat, huh?

GitHub will collapse the single linebreaks and only preserve the double linebreak to separate the text into two paragraphs. WriteFreely, in contrast, will preserve all newlines as-is. I was at first annoyed by this difference from other markdown flavours, but then I realized that this is actually closer to how email is rendered, and found zen and peace in this. :)

Therefore, publishing via wf post will apply stylistic markdown formatting and properly linkify all links, but will preserve all newlines as if you were reading an email message on lore.kernel.org.

There's some discussion about making markdown flavouring user-selectable, so if you want to add your voice to the discussion, please do it there.

Making it behave more like GitHub's markdown

If you do want to make it behave more like GitHub's markdown, you need to make sure that:

  1. You aren't using hard linebreaks to wrap your long lines
  2. You are publishing using --font serif


  $ gedit mypost.md
  $ cat mypost.md | wf post --font serif

This will render things more like how you get them by publishing from the WriteFreely's web interface.

Using “post” and “publish” actually puts things into drafts

I found this slightly confusing, but this is not a bad feature in itself, as it allows previewing your post before putting it out into the world. The way it works is:

  $ vim myfile.md
  $ cat myfile.md | wf post

You can then access that URL to make sure everything got rendered correctly. If something isn't quite right, you can update it using its abcrandomstr preview URL:

  $ vim myfile.md
  $ cat myfile.md | wf update abcrandomstr

After you're satisfied, you can publish the post using the “move to Yourblog” link in the Drafts view.

Read the friendly manual

Please read the user guide and the markdown reference to try things out.

After my trusty Pebble 2 died about 6 months ago, I needed some kind of replacement that would do the following:

  1. buzz my wrist and show me alerts from any app (not just calls/texts)
  2. have a long-lasting battery without being huge
  3. count my daily steps and prod me when I haven't moved for a while
  4. not spy on me continuously and feed my data to a shady entity

The solution I settled on was an Amazfit Bip. It does almost all of the above:

  1. it offers Bluetooth LE with full notifications integration
  2. the battery lasts about a month (!) — my biggest problem is actually finding where the heck I put the charger, since I use it so rarely
  3. it has a step/heartbeat/sleep tracker

It also costs about US$80.

Now, the default smartphone app that comes with it doesn't particularly inspire confidence regarding that point #4 in my requirements list. I'm not trying to accuse anyone of anything, but I am not entirely brimming with confidence that the abundant personal data it collects about me is never going to be used for nefarious purposes.

The good news is that the Amazfit Bip is fully supported by Gadgetbridge, a free software application installable via F-Droid. The version of the Amazfit Bip I got 6 months ago required a firmware update to work with Gadgetbridge, which meant installing the Amazfit manufacturer app to perform the upgrade (I did that from one of the old junker phones I have lying around). After that, I was able to pair the watch with Gadgetbridge on multiple phones. Strictly speaking, using the official app for that initial step is not necessary, but the alternative looked more complicated than just using a junker phone to shortcut the process.

In the end, I spent $80 and a couple of hours to get a wrist gadget that does all I need, fits well, and doesn't spy on me. Freeyourgadget has a lot more info if you're interested.

If you need to run a CentOS/RHEL 7 system, where GnuPG is stuck at version 2.0, you can use the gnupg22-static package that I maintain for our own needs on Fedora COPR.

It installs into /opt/gnupg22 so it doesn't clash with the version of GnuPG installed and used by the system.

To start using the gnupg22-static binaries, you will need to first enable the COPR repository:

# yum install yum-plugin-copr
# yum copr enable icon/lfit
# yum install gnupg22-static

The static compilation process is not perfect, because it hardcodes some defaults to point to the buildroot locations (which don't exist in an installed RPM), so you will need to tell your gpg binary where its auxiliary programs live by creating a file called ~/.gnupg/gpg.conf-2.2 with the following contents:

agent-program   /opt/gnupg22/bin/gpg-agent
dirmngr-program /opt/gnupg22/bin/dirmngr

Now you just need to add a couple of aliases to your ~/.bash_profile:

  alias gpg="/opt/gnupg22/bin/gpg"
  alias gpg2="/opt/gnupg22/bin/gpg"

Alternatively, you can list /opt/gnupg22/bin earlier in your PATH:

export PATH=/opt/gnupg22/bin:$PATH

You should now be able to enjoy GnuPG-2.2 features such as support for ECC keys and Web Key Directories.

Imagine you are at a conference somewhere and a person you run across tells you about a project that you find interesting.

“Here,” they say, “I can share it with you if you've got Decent. I think it's a few days stale, sorry — roaming costs here are crazy and I don't trust the hotel wifi.”

You do have the Decent app on your phone, so you scan the QR code and wait until your phone shows you the usual “replica complete” checkmark. You don't even know whether your phones used NFC, Bluetooth, or WiFi for this little chat (probably not WiFi, because it kinda sucks at all conferences), but the data went straight from their phone to yours without needing to touch any other systems on the net.

When you get to your hotel room, you decide to check out the project details. You open the Decent app on your phone and it shows a short ID code (xPqat3z). You can use it to replicate the project straight from your phone. You open up your travel laptop and run:

$ decent replicate xPqat3z
Looking for xPqat3z...found project "fizzbuzz" on "My Phone".
Cloning git...done.
Replicating ssb chains...done.

Since both your laptop and your phone are Bluetooth-paired, you are able to grab the project straight from your phone without having to hit the net again. You poke around the git tree, try to get things to work, but something is not quite right. You do have a fairly old version of libsnafu installed, so maybe that's the cause?

You run “decent tui” and search for “libsnafu”, which shows that someone has hit that very same problem just two days ago and opened a new issue, but there is no follow up yet.

Or is there?

You exit the tui and run:

[~/fizzbuzz.git]$ decent pull
Found .decent with 5 pub servers.
Establishing the fastest pub server to use...done.
Joining pub.kernel.org:80 using the invite code...done.
Updating git...1c8b2e7..620b5e2...done.
Updating ssb chains...done.
- 1 new participant
- 15 new conversations
- 12 new commits in 2 branches
- 15 patch updates (2 new)
- 9 issue updates (1 new)

Ding, when you view the libsnafu issue again you see that there have been new updates since it was created 2 days ago (the person you replicated from did say their replica was a bit stale). There is even a proposed patch that is supposed to fix the library compatibility problem.

You hit “enter” on the patch to review it. It seems like a straightforward fix, and you're happy to see that there are already a couple of Tested-by trailers from the usual CI bots, and a Reviewed-by from Taylor Thompson, the person you spoke with just earlier today — in fact, this Reviewed-by has a timestamp of only a few minutes ago. You guess Taylor is catching up on some work before dinner as well.

You type in “:apply snafutest” and decent automatically creates a snafutest branch, checks it out, and applies the proposed patch on top of it. Presto, fizzbuzz finally builds and works for you.

Being a good citizen, you decide to comment on the issue and add your own Tested-by. Since it's your first time participating in this project, you need to join first:

[~/fizzbuzz.git]$ decent join
Creating new SSB keypair...done.
Starting SSB replication agent...done.
Your name [Alex Anderson]: Alex Anderson
Device identifier: Travel laptop
Self-identifying as Alex Anderson (Travel Laptop)
(Required) Agree with terms stated in COPYING? [y/n/view]: y
Adding COPYING agreement record...done.
(Required) Agree with terms stated in COVENANT? [y/n/view]: y
Adding COVENANT agreement record...done.
Cross-certify? [SSB/PGP/Keybase/None]: PGP
Adding PGP public key record...done.
Adding signed feed-id record...
Enter PGP key passphrase: *********

Now that you've initialized your own developer chain, you can comment on the issue. You give it a thumbs-up, add your own Tested-by to the proposed patch, and join the #fizzbuzz-users and #fizzbuzz-dev channels. All of these actions are simply added records to your local SSB feed, which gets replicated to the pub server you'd joined earlier.

Other members of the project will automatically start getting your SSB feed updates either from the pub server they joined, or from other developers they are following. If a pub server becomes unavailable, anyone who's ever run “decent pull” will have replicas of all participating developer and bot feeds (which means full copies of all issues, patches, developer discussions, and CI reports — for the entirety of the project's existence). They can switch to a different pub server, set up their own, or just replicate between developers using the SSB gossip protocol that powers it all behind the scenes.

What is this magic?

The “decent” tool is fiction, but the SSB framework I'm describing is not. SSB stands for “Secure Scuttlebutt” (it's nautical slang for “gossip,” so please stop guffawing). SSB is a distributed gossip protocol built on the concept of replicating individual “sigchains,” which are very similar in concept to git: each record references the hash of the previous record, and every new entry is cryptographically signed with an ECC key, so the entire chain is fully verifiable and attestable. Unless someone has access to the ECC secret key created at the beginning of the SSB chain, they cannot add new entries; and unless the chain has never been replicated anywhere, all entries are immutable (replication simply breaks if any of the existing records is modified).
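To make the sigchain idea concrete, here is a minimal sketch of an append-only signed chain. Real SSB signs entries with an Ed25519 key; in this sketch HMAC-SHA256 stands in for the signature so the example needs only the standard library, and the record layout is illustrative, not the actual SSB schema:

```python
# Sketch of an SSB-style sigchain: each entry hashes the previous one and
# carries a signature. HMAC is a stand-in for a real ECC signature here.
import hashlib
import hmac
import json

SECRET = b"stand-in for the author's private key"

def append_entry(chain, content):
    """Add a new entry that references the hash of the previous one."""
    prev = chain[-1]["hash"] if chain else None
    body = {"prev": prev, "seq": len(chain), "content": content}
    raw = json.dumps(body, sort_keys=True).encode()
    entry = {
        **body,
        "hash": hashlib.sha256(raw).hexdigest(),
        "sig": hmac.new(SECRET, raw, hashlib.sha256).hexdigest(),
    }
    chain.append(entry)
    return entry

def verify_chain(chain):
    """Replication breaks if any existing record was modified."""
    prev = None
    for i, entry in enumerate(chain):
        body = {"prev": entry["prev"], "seq": entry["seq"],
                "content": entry["content"]}
        raw = json.dumps(body, sort_keys=True).encode()
        if entry["prev"] != prev or entry["seq"] != i:
            return False
        if hashlib.sha256(raw).hexdigest() != entry["hash"]:
            return False
        if not hmac.compare_digest(
                entry["sig"],
                hmac.new(SECRET, raw, hashlib.sha256).hexdigest()):
            return False
        prev = entry["hash"]
    return True

feed = []
append_entry(feed, {"type": "post", "text": "Tested-by: Alex Anderson"})
append_entry(feed, {"type": "vote", "value": 1})
print(verify_chain(feed))            # True
feed[0]["content"]["text"] = "forged"
print(verify_chain(feed))            # False: tampering is detected
```

Note how modifying any earlier record invalidates everything after it, which is exactly why a chain that has been replicated elsewhere is effectively immutable.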

The sigchains are only part of the story — SSB also offers a decentralized replication protocol that works hard to make sure that there is no single point of trust and no single point of failure. It is able to replicate using “pub” servers that merely work as convenient mediators, but are unnecessary for the overall health of the SSB fabric. SSB replication can be done peer-to-peer via local network, over the Internet, via Tor, sneakernet, or anything at all that is able to send and receive bits.

The end-tool on the client uses these individual feeds to assemble a narrative, using message-id cross-references to construct threads of conversations. SSB is envisioned as a fully private and fully decentralized social network where each participating individual shares an immutable activity record, choosing how much to share publicly, how much to share with specific individuals, and how much to keep fully private.
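The thread-assembly step is straightforward to sketch. This example groups flat feed records into conversation trees by following reply references; the field names (“id”, “reply_to”) are my own placeholders, not the actual SSB message schema:

```python
# Sketch: building conversation threads out of flat, per-feed records by
# following message-id cross-references.
from collections import defaultdict

def build_threads(records):
    """Return an indented outline of threads rooted at non-reply records."""
    children = defaultdict(list)
    roots = []
    for rec in records:
        if rec.get("reply_to"):
            children[rec["reply_to"]].append(rec["id"])
        else:
            roots.append(rec["id"])
    by_id = {rec["id"]: rec for rec in records}

    def render(msg_id, depth=0):
        lines = ["  " * depth + by_id[msg_id]["text"]]
        for child in children[msg_id]:
            lines.extend(render(child, depth + 1))
        return lines

    return [line for root in roots for line in render(root)]

records = [
    {"id": "%a", "reply_to": None, "text": "fizzbuzz fails to build"},
    {"id": "%b", "reply_to": "%a", "text": "proposed patch attached"},
    {"id": "%c", "reply_to": "%b", "text": "Tested-by: CI bot"},
]
print("\n".join(build_threads(records)))
```

The same cross-referencing works across feeds: replies from different developers interleave into one thread even though each record lives only in its author's own sigchain.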

I suggest we co-opt SSB for free software development to make it truly decentralized, self-archiving, and fully attestable in all developer interactions.

What problem are you solving?

If you've read my previous entries, you know that we've been working hard to archive and preserve mailing list discussions on lore.kernel.org. Mailing lists have served us well, but their downsides are very obvious:

  • email started out as decentralized, but the vast majority of it now flows through the same handful of providers (Google, Amazon, Microsoft). It's becoming more and more difficult to set up your own mail server and expect that mail you send out will be properly accepted and delivered by the “big guns.” Did you set up SPF? DKIM? DMARC? ARC-Seal? Is your IP blacklisted on RBL? Are you sure? Have you checked today?
  • Receiving email is also frustrating, regardless of whether you are using your own mail server or relying on one of the “big-gun” providers. If you're not running a spam checker, you are probably wasting some part of your day dealing with spam. If you do filter your mail, then I hope you check your spam folder regularly (in which case you are still wasting the same amount of time on it, just less frequently and in longer chunks).
  • Mailing list servers are single points of failure that have to send out amazing amounts of redundant data to all subscribers. If a mailing list provider becomes unavailable, this basically kills all project discussions until a new mailing list is set up and everyone re-subscribes. Usually, this also results in the loss of previous archives, because everyone assumes someone else has a full copy.
  • Mailing lists are lossy. If your mail starts bouncing for some reason (e.g. due to a full inbox), you usually end up unsubscribed and miss out on potentially important conversations. Unless you go back and check the archives, you may never become aware of what you missed.
  • Mail clients routinely mangle structured data. Anyone who's ever had to send out a patch is aware of the long-ish “how to configure your mail client so it doesn't mangle patches” section in the git docs.
  • Even if you do manage to successfully send patches, sending any other kind of structured data is the wild west. Bot reports, automated issue notifications, etc, attempt to present data as both human- and machine-readable and largely fail at both.
  • Everyone has pretty much given up on email encryption and attestation. PGP signatures in email are mostly treated like noise, because all clients kinda suck at PGP, and, more importantly, meaningful trust delegation is hard.

Duh, that's why nobody uses email

Extremely few projects still use email for software development. The kernel is obviously an important exception, among a few others, and it's usually the kind of thing people like to mention to point out how behind-the-times kernel developers are. They should stop acting like such dinosaurs, get with the program, and just start using Git..b already!

However, using Git..b obviously introduces both a single point of failure and a single point of trust. Git repositories may be decentralized, but commits are merely the final product of a lot of developer back-and-forth that ends up walled-in inside the beautiful Git..b garden. You can export your project from Git..b, but very few people bother to do so, and almost nobody does it on a regular basis.

If a maintainer steps away and all development moves to a different fork, the project loses part of its history that is not committed to git, because all its issues, CI test results, pull requests and conversations are now split between the old fork and the new fork. If the original developer has a personal crisis and purges their original repository, that part of the project history is now forever gone, even if the code remains.

Furthermore, if you've been around for a while, you've seen beautiful gardens come and go. Before Github there was Sourceforge, which at some point poisoned its beautiful wells by bundling adware with binary downloads. Google Code has come and gone, like most Google things do. Github has seen a significant exodus of projects to Gitlab after it got acquired by Microsoft, and there's certainly no guarantee that Gitlab won't be acquired by some other $TechGiant looking to spruce up its open-source community image.

Git is decentralized and self-archiving. Mailing lists... sort-of are — at least we are trying to keep them that way, but it's becoming more and more difficult. Even those projects that use mailing lists for patches may not use them for issue tracking or CI reports (for example, not all Bugzilla activity goes to mailing lists and Patchwork allows attaching CI reports directly to patches using its REST API).

I think it's way past time for us to come up with a solution that offers a decentralized, self-archiving, fully attestable, “cradle-to-grave” development platform covering all aspects of project development, not just the code. It must move us away from mailing lists while avoiding single points of trust, authority, and failure.

And you think SSB is it?

I believe SSB offers us a usable framework that we can build on to achieve this goal. The concept of sigchains is very easy to convey due to their close resemblance to git, and the protocol's decentralized, mesh-like P2P replication is an important feature that will help us avoid introducing single points of failure. Every participant receives the full history of the project, just as every participant currently receives the full history of the project's code when they clone the git repository.

In SSB, every sigchain (“feed”) is tied to a single identity: usually a device belonging to a real person, though it can also be a project-specific feed used by a bot. Developers will naturally have multiple identities (“work laptop”, “phone”, “travel laptop”) and can add new ones and abandon old ones as they gain work environments or lose access to old devices. Each developer can authenticate their feeds by cross-signing them with another identity framework (Keybase, PGP, etc.), or the feeds can remain fully pseudonymous.

The important part here is that once an identity is established, all records created by that identity are attestable to the same person or entity that was in possession of the private ECC key at the time the feed was created. When a maintainer applies someone's patches to their git tree, simply referencing the SSB record-id of the patch in the commit message is enough to provide a full, immutable attestation chain for that code. It's like Signed-off-by on very powerful drugs.
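As a sketch, such a commit message might carry the reference as a trailer. The trailer name and record-id are entirely hypothetical; nothing like this exists today:

```
fizzbuzz: fix libsnafu compatibility breakage

Patch-source: ssb:%hypothetical-record-id
Signed-off-by: Alex Anderson <alex@example.org>
```

Anyone replicating the author's feed could then resolve the record-id back to the signed patch submission, its review discussion, and its Tested-by records.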

Spammy feeds can be excluded using a blocklist file in the project repository, or a project can choose to have an allowlist explicitly listing authorized feeds (as long as they provide instructions on how to request addition to that list for the purposes of participation). Developers violating established community guidelines can be terminated from the project by adding a record indicating that their feeds should be replicated up to a specific entry and no further.
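The moderation rules above could be expressed as plain data in the project repository. This sketch shows one possible shape; the blocklist/termination convention is my own invention, not an existing SSB feature:

```python
# Sketch of replica-side feed moderation: drop blocklisted feeds entirely,
# and replicate terminated feeds only up to their final allowed entry.
def replicate_filter(feed_id, entries, blocklist=(), terminated=None):
    """Return the entries of a feed that a replica should keep.

    blocklist  -- feed ids excluded entirely (e.g. spammy feeds)
    terminated -- {feed_id: last_seq} for developers removed from the
                  project: replicate up to that entry and no further
    """
    terminated = terminated or {}
    if feed_id in blocklist:
        return []
    if feed_id in terminated:
        return [e for e in entries if e["seq"] <= terminated[feed_id]]
    return entries

entries = [{"seq": n, "text": f"record {n}"} for n in range(5)]
print(len(replicate_filter("@spam", entries, blocklist={"@spam"})))    # 0
print(len(replicate_filter("@dev", entries, terminated={"@dev": 2})))  # 3
```

Because the lists live in the repository itself, moderation decisions travel with the project and survive any individual pub server.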

Since SSB relies on cryptographic keypairs by design, it is easy to set up fully private discussion groups that are end-to-end encrypted to all group participants. This makes it easy to discuss sensitive subjects like security vulnerabilities without needing to rely on any other means of communication or any other privacy tools outside of what is already provided by SSB.

(We'll ignore for the moment the fact that all implementations of SSB are written in Javascript/NPM — wait, don't go, hear me out! — since it uses standard crypto and the records themselves are json, everything about SSB is easily portable.)

Won't it raise the entry barrier?

I am acutely aware that such a system would significantly raise the barrier to participation. It's one thing to open an issue or send a pull request on Git..b, attach a patch to a Bugzilla entry, or send an email to a mailing list. We cannot expect a “drive-by” contributor to install a client tool, replicate potentially tens of gigabytes of individual developer feeds, and create their own SSB identity simply to open an issue or submit a single patch. We would need full-featured web clients that allow someone to browse projects much as they would browse them on Git..b, including viewing issues, submitting bug reports, and sending patches and pull requests.

The main distinction from Git..b is that these web clients (let's call them “community bridges”) would merely be rich API translation endpoints contributing to fully distributed projects, without locking developers into any walled gardens. They would enable collaboration without introducing central dependencies or points of failure, and anyone choosing to participate on their own terms with their own free software stack (e.g. with our fictional decent tool) would be fully empowered to do so. In fact, Git..b and others could themselves become community bridges and allow their clients to participate in distributed projects, for a truly “cross-garden” development experience.

(The web bridge would necessarily need to manage the contributor's identity to create their sigchain feed, but for “drive-by” contributions this would be a reasonable trade-off. Anyone can decide to switch to a local client and start a new identity at any time if they feel they can no longer trust the bridge they are using.)

I'm intrigued. What happens next?

I've been mulling this over for a while now, and this is really just the first brain dump of my thoughts. At this point, I need everyone to point out all the possible ways why this wouldn't work (much more appreciated if it's followed by a “but it might work if...”). I realize that there is no way to leave a comment here, but you can reach out to me via either floss.social or by emailing me at mricon@kernel.org.

The Linux development community has already given us a powerful distributed development tool in the form of git, and I firmly believe it is able to deliver a satellite tool for git that encompasses all aspects of project development beyond just the code. I hope that by outlining my thoughts here I'll be able to jumpstart the discussion that will eventually get us there.

PS: The title of this post references a talk by Greg KH titled Patches carved into stone tablets that goes into details on why kernel developers still use mailing lists for everything.

The mail archiving system at lore.kernel.org uses public-inbox, which relies on git as the mechanism to store messages. This makes the entire archive collection very easy to replicate using grokmirror — the same tool we use to mirror git.kernel.org repositories across multiple worldwide frontends.

Setting up

It doesn't take a lot to get started. First, install grokmirror either from pip:

pip install grokmirror

or from your distro repositories:

dnf install python3-grokmirror

Next, you will need a config file and a location where you'll store your copy (keep in mind, at the time of writing all of the archives take up upwards of 20GB):

[lore.kernel.org]
# Use the erol mirror instead of lore directly
site = https://erol.kernel.org
manifest = https://erol.kernel.org/manifest.js.gz
toplevel = /path/to/your/local/archive
mymanifest = %(toplevel)s/manifest.js.gz
log = %(toplevel)s/pull.log
pull_threads = 2

Save this file into lore.conf and just run:

grok-pull -v -c lore.conf

The initial clone is going to take a long time, but once it completes, subsequent runs of grok-pull will only update those repositories that have changed. If new repositories are added, they will be automatically cloned and included in your mirror of the archive.
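The change detection works by comparing manifests. A manifest.js.gz is just gzipped JSON mapping repository paths to metadata, including a fingerprint of the repo's refs; this sketch shows the comparison logic in simplified form (the exact fields grokmirror records go beyond what is shown here):

```python
# Sketch of grok-pull-style change detection: diff the remote manifest
# against the local copy and act only on new or changed repositories.
import gzip
import json

def plan_updates(remote_manifest, local_manifest):
    """Return which repositories to clone and which to fetch."""
    to_clone, to_fetch = [], []
    for path, meta in remote_manifest.items():
        if path not in local_manifest:
            to_clone.append(path)
        elif meta.get("fingerprint") != local_manifest[path].get("fingerprint"):
            to_fetch.append(path)
    return to_clone, to_fetch

# A manifest.js.gz is simply gzipped JSON:
remote_blob = gzip.compress(json.dumps({
    "/git/0.git": {"fingerprint": "aaa", "modified": 1600000000},
    "/linux-doc/0.git": {"fingerprint": "bbb", "modified": 1600000100},
}).encode())
remote = json.loads(gzip.decompress(remote_blob))
local = {"/git/0.git": {"fingerprint": "aaa", "modified": 1600000000}}

print(plan_updates(remote, local))  # (['/linux-doc/0.git'], [])
```

Since only fingerprints are compared, an unchanged archive costs almost nothing to check, which is what makes frequent pulls cheap.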

Note: this by itself is not enough to run public-inbox on your local system, because there's a lot more to public-inbox than just git archives of all messages. For starters, the archives would need to be indexed into a series of sqlite3 and xapian databases, and the end-result would take up a LOT more than 20GB.

Future work

We are hoping to fund the development of a set of tools around public-inbox archives that would let you do cool stuff with submitted patches without needing to subscribe to LKML or any other list archived by lore.kernel.org. We expect this would be a nice feature that various CI bots can use to automatically discover and test patches without needing to worry about SMTP and incoming mail processing. If you would like to participate, please feel free to join the public-inbox development list.