Patches carved into developer sigchains
Imagine you are at a conference somewhere and a person you run across tells you about a project that you find interesting.
“Here,” they say, “I can share it with you if you've got Decent. I think it's a few days stale, sorry — roaming costs here are crazy and I don't trust the hotel wifi.”
You do have the Decent app on your phone, so you scan the QR code and wait until your phone shows you the usual “replica complete” checkmark. You don't even know if your phones used NFC, Bluetooth, or WiFi for this little chat (probably not Wifi, because it kinda sucks at all conferences), but the data went straight from their phone to yours without needing to hit any other systems on the net.
When you get to your hotel room, you decide to check out the project details. You open the Decent app on your phone and it shows a short ID code (
xPqat3z). You can use it to replicate the project straight from your phone. You open up your travel laptop and run:
$ decent replicate xPqat3z Looking for xPqat3z...found project "fizzbuzz" on "My Phone". Cloning git...done. Replicating ssb chains...done.
Since both your laptop and your phone are Bluetooth-paired, you are able to grab the project straight from your phone without having to hit the net again. You poke around the git tree, try to get things to work, but something is not quite right. You do have a fairly old version of
libsnafu installed, so maybe that's the cause?
You run “
decent tui” and search for “
libsnafu”, which shows that someone has hit that very same problem just two days ago and opened a new issue, but there is no follow up yet.
Or is there?
You exit the tui and run:
[~/fizzbuzz.git]$ decent pull Found .decent with 5 pub servers. Establishing the fastest pub server to use...done. Joining pub.kernel.org:80 using the invite code...done. Updating git...1c8b2e7..620b5e2...done. Updating ssb chains...done. - 1 new participant - 15 new conversations - 12 new commits in 2 branches - 15 patch updates (2 new) - 9 issue updates (1 new)
Ding, when you view the
libsnafu issue again you see that there have been new updates since it was created 2 days ago (the person you replicated from did say their replica was a bit stale). There is even a proposed patch that is supposed to fix the library compatibility problem.
You hit “enter” on the patch to review it. Seems like a straightforward fix, and you're happy to see that there is already a couple of
Tested-by from the usual CI bots, and a
Reviewed-by from Taylor Thompson, the person you spoke with just earlier today — in fact, this
Reviewed-by has a timestamp of only a few minutes ago. You guess Taylor is catching up on some work before dinner as well.
You type in “
:apply snafutest” and
decent automatically creates a
snafutest branch, checks it out, and applies the proposed patch on top of it. Presto,
fizzbuzz finally builds and works for you.
Being a good citizen, you decide to comment on the issue and add your own
Tested-by. Since it's your first time participating in this project, you need to join first:
[~/fizzbuzz.git]$ decent join Creating new SSB keypair...done. Starting SSB replication agent...done. Your name [Alex Anderson]: Alex Anderson Device identifier: Travel laptop Self-identifying as Alex Anderson (Travel Laptop) (Required) Agree with terms stated in COPYING? [y/n/view]: y Adding COPYING agreement record...done. (Required) Agree with terms stated in COVENANT? [y/n/view]: y Adding COVENANT agreement record...done. Cross-certify? [SSB/PGP/Keybase/None]: PGP Adding PGP public key record...done. Adding signed feed-id record... Enter PGP key passphrase: ********* Done.
Now that you've initialized your own developer chain, you can comment on the issue. You give it a thumbs-up, add your own
Tested-by to the proposed patch, and join the
#fizzbuzz-dev channels. All of these actions are simply added records to your local SSB feed, which gets replicated to the pub server you'd joined earlier.
Other members of the project will automatically start getting your SSB feed updates either from the pub server they joined, or from other developers they are following. If a pub server becomes unavailable, anyone who's ever run “
decent pull” will have replicas of all participating developer and bot feeds (which means full copies of all issues, patches, developer discussions, and CI reports — for the entirety of the project's existence). They can switch to a different pub server, set up their own, or just replicate between developers using the SSB gossip protocol that powers it all behind the scenes.
What is this magic?
decent” tool is fiction, but the SSB framework I'm describing is not. SSB stands for “Secure Scuttlebutt” (it's nautical slang for “gossip,” so please stop guffawing). SSB is a distributed gossip protocol that is built on the concept of replicating individual “sigchains,” which are very similar in concept to git. Each record references the hash of the previous record, plus SSB uses an ECC key to cryptographically sign every new entry, such that the entire chain is fully verifiable and attestable. Unless someone has access to the ECC secret key created at the beginning of the SSB chain, they would not be able to add new entries — and unless the chain has never been replicated anywhere, all entries are immutable (or the replication simply breaks if any of the existing records in it are modified).
The sigchains are only part of the story — SSB also offers a decentralized replication protocol that works hard to make sure that there is no single point of trust and no single point of failure. It is able to replicate using “pub” servers that merely work as convenient mediators, but are unnecessary for the overall health of the SSB fabric. SSB replication can be done peer-to-peer via local network, over the Internet, via Tor, sneakernet, or anything at all that is able to send and receive bits.
The end-tool on the client uses these individual feeds to assemble a narrative, using message-id cross-references to construct threads of conversations. SSB is envisioned as a fully-private and fully-decentralized social network where each participating individual shares an immutable activity record choosing how much to share publicly, how much to share with specific individuals, and how much to keep fully private.
I suggest we co-opt SSB for free software development to make it truly decentralized, self-archiving, and fully attestable in all developer interactions.
What problem are you solving?
If you've read my previous entries, you know that we've been working hard to archive and preserve mailing list discussions on lore.kernel.org. Mailing lists have served us well, but their downsides are very obvious:
- email started out as decentralized, but the vast majority of it now flows through the same handful of providers (Google, Amazon, Microsoft). It's becoming more and more difficult to set up your own mail server and expect that mail you send out will be properly accepted and delivered by the “big guns.” Did you set up SPF? DKIM? DMARC? ARC-Seal? Is your IP blacklisted on RBL? Are you sure? Have you checked today?
- Receiving email is also frustrating regardless whether you are using your own mail server or relying on one of the “big-gun” providers. If you're not running a spamchecker, you are probably wasting some part of your day dealing with spam. If you do filter your mail, then I hope you check your spam folder regularly (in which case you are still wasting the same amount of time on it, just more infrequently and in longer chunks at once).
- Mailing list servers are single points of failure that have to send out amazing amounts of redundant data to all subscribers. If a mailing list provider becomes unavailable, this basically kills all project discussions until a new mailing list is set up and everyone re-subscribes. Usually, this also results in the loss of previous archives, because everyone assumes someone else has a full copy.
- Mailing lists are lossy. If your mail starts bouncing for some reason (e.g. due to a full inbox), you usually end up unsubscribed and miss out on potentially important conversations. Unless you go back and check the archives, you may never become aware of what you missed.
- Mail clients routinely mangle structured data. Anyone who's ever had to send out a patch is aware of the long-ish “how to configure your mail client so it doesn't mangle patches” section in the git docs.
- Even if you do manage to successfully send patches, sending any other kind of structured data is the wild west. Bot reports, automated issue notifications, etc, attempt to present data as both human- and machine-readable and largely fail at both.
- Everyone has pretty much given up on email encryption and attestation. PGP signatures in email are mostly treated like noise, because all clients kinda suck at PGP, and, more importantly, meaningful trust delegation is hard.
Duh, that's why nobody uses email
Extremely few projects still use email for software development. The Kernel is obviously an important exception to this, among a few others, and it's usually the kind of thing people like to mention to point out how behind-the-times kernel developers are. They should stop acting like such dinosaurs, get with the program and just start using Git..b already!
However, using Git..b obviously introduces both a single point of failure and a single point of trust. Git repositories may be decentralized, but commits are merely the final product of a lot of developer back-and-forth that ends up walled-in inside the beautiful Git..b garden. You can export your project from Git..b, but very few people bother to do so, and almost nobody does it on a regular basis.
If a maintainer steps away and all development moves to a different fork, the project loses part of its history that is not committed to git, because all its issues, CI test results, pull requests and conversations are now split between the old fork and the new fork. If the original developer has a personal crisis and purges their original repository, that part of the project history is now forever gone, even if the code remains.
Furthermore, if you've been around for a while, you've seen beautiful gardens come and go. Before Github there was Sourceforge, which at some point poisoned its beautiful wells by bundling adware with binary downloads. Google Code has come and gone, like most Google things do. Github has seen a significant exodus of projects to Gitlab after it got acquired by Microsoft, and there's certainly no guarantee that Gitlab won't be acquired by some other $TechGiant looking to spruce up its open-source community image.
Git is decentralized and self-archiving. Mailing lists... sort-of are — at least we are trying to keep them that way, but it's becoming more and more difficult. Even those projects that use mailing lists for patches may not use them for issue tracking or CI reports (for example, not all Bugzilla activity goes to mailing lists and Patchwork allows attaching CI reports directly to patches using its REST API).
I think it's way past due time for us to come up with a solution that would offer decentralized, self-archiving, fully attestable, “cradle-to-grave” development platform that covers all aspects of project development and not just the code. It must move us away from mailing lists, but avoid introducing single points of trust, authority, and failure.
And you think SSB is it?
I believe SSB offers us a usable framework that we can build on to achieve this goal. The concept of sigchains is very easy to convey due to their close resemblance to git, and the protocol's decentralized, mesh-like P2P replication nature is an important feature that will help us avoid introducing single points of failure. Every participant receives the full history of the project, same as currently every participant receives the full history of the project's code when they clone the git repository.
In SSB, every sigchain (“feed”) is tied to a single identity, which is usually a device belonging to a real person or it can be a project-specific feed used by a bot. Developers would naturally have multiple identities (“work laptop”, “phone”, “travel laptop”) and they can add new ones and abandon old ones as they add work environments or lose access to old devices. The feeds can be authenticated by each individual developer by cross-signing them with another identity framework (Keybase, PGP, etc), or they can remain fully pseudonymous.
The important part here is that once an identity is established, all records created by that identity are attestable to the same person or entity that were in the possession of the private ECC key at the time when that feed was created. When a maintainer applies someone's patches to their git tree, simply referencing the SBB record-id of the patch as part of the commit message is enough to provide a full immutable attestation chain for that code. It's like
Signed-off-by on very powerful drugs.
Spammy feeds can be excluded using a blocklist file in the project repository, or a project can choose to have an allowlist explicitly listing authorized feeds (as long as they provide instructions on how to request addition to that list for the purposes of participation). Developers violating established community guidelines can be terminated from the project by adding a record indicating that their feeds should be replicated up to a specific entry and no further.
Since SSB relies on cryptographic keypairs by design, it is easy to set up fully private discussion groups that are end-to-end encrypted to all group participants. This makes it easy to discuss sensitive subjects like security vulnerabilities without needing to rely on any other means of communication or any other privacy tools outside of what is already provided by SSB.
Won't it raise the entry barrier?
I am acutely aware that such system would significantly raise the participation barrier. It's one thing to open an issue or send a pull request on Git..b, attach a patch to a Bugzilla entry, or send an email to a mailing list. We cannot expect that a “drive-by” contributor would install a client tool, replicate potentially tens of gigabytes of individual developer feeds, and create their own SSB identity simply to open an issue or submit a single patch. We would need full-featured web clients that would allow someone to browse projects in a similar fashion as they would browse them on Git..b, including viewing issues, submitting bug reports, and sending patches and pull requests.
The main distinction from Git..b here is that these web clients — let's call them “community bridges” — would merely be rich API translation endpoints contributing to fully distributed projects without locking developers into any walled gardens. They would be enabling collaboration without introducing central dependencies and points of failure and anyone choosing to participate on their own terms using their own free software stack (e.g. with our fictional
decent tool) would be fully empowered to do so. In fact, Git..b and others can, too, become community bridges and allow their clients to participate in distributed projects for a truly “cross-garden” development experience.
(The web bridge would necessarily need to manage the contributor's identity to create their sigchain feed, but for “drive-by” contributions this would be a reasonable trade-off. Anyone can decide to switch to a local client and start a new identity at any time if they feel they can no longer trust the bridge they are using.)
I'm intrigued. What happens next?
I've been mulling this over for a while now, and this is really just the first brain dump of my thoughts. At this point, I need everyone to point out all the possible ways why this wouldn't work (much more appreciated if it's followed by a “but it might work if...”). I realize that there is no way to leave a comment here, but you can reach out to me via either floss.social or by emailing me at email@example.com.
The Linux development community has already given us a powerful distributed development tool in the form of git, and I firmly believe that it is able to deliver a git satellite tool that would encompass all aspects of project development beyond just code. I hope that by outlining my thoughts here I'll be able to jumpstart the necessary discussion that would eventually get us there.
PS: The title of this post references a talk by Greg KH titled Patches carved into stone tablets that goes into details on why kernel developers still use mailing lists for everything.