What does a PGP signature on a git commit prove?
With git, a cryptographic signature on a commit provides strong integrity guarantees of the entire history of that branch going backwards, including all metadata and all contents of the repository, all the way back to the initial commit. This is possible because git records the hash of the previous commit in each next commit's metadata, creating an unbreakable cryptographic chain of records. If you can verify the cryptographic signature at the tip of the branch, you effectively verify that branch's entire history.
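To make the chaining idea concrete, here is a minimal Python sketch of a toy hash chain. It is not git's actual object format; the `record_hash` and `tip_hash` helpers and the record layout are made up for illustration. Each record embeds the hash of the record before it, so the hash at the tip depends on everything that came earlier.

```python
import hashlib


def record_hash(parent_hash: str, content: str) -> str:
    """Hash a record together with the hash of the record before it."""
    payload = f"parent {parent_hash}\n\n{content}".encode()
    return hashlib.sha1(payload).hexdigest()


def tip_hash(contents: list[str]) -> str:
    """Chain the records in order and return the hash of the last one."""
    current = ""  # the first record has no parent
    for content in contents:
        current = record_hash(current, content)
    return current


history = ["initial commit", "add feature", "fix bug"]
tampered = ["initial commit (edited)", "add feature", "fix bug"]

# Changing the oldest record changes the hash at the tip of the chain.
print(tip_hash(history) == tip_hash(tampered))  # False
```

Because only the tip hash needs to be trusted, a single signature over the tip is enough to cover the whole chain.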
For example, let's take a look at linux.git, where the latest tag at the time of writing, v5.8-rc7, is signed by Linus Torvalds. (Tag signatures are slightly different from commit signatures, but in every practical sense they offer the same guarantees.)
If you have a git checkout of torvalds/linux.git, you are welcome to follow along.
$ git cat-file -p v5.8-rc7
object 92ed301919932f777713b9172e525674157e983d
type commit
tag v5.8-rc7
tagger Linus Torvalds <firstname.lastname@example.org> 1595798046 -0700

Linux 5.8-rc7
-----BEGIN PGP SIGNATURE-----

iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl8d8h4eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGd0sH/2iktYhMwPxzzpnb
eI3OuTX/mRn4vUFOfpx9dmGVleMfKkpbvnn3IY7wA62Qfv7J7lkFRa1Bd1DlqXfW
yyGTGDSKG5chiRCOU3s9ni92M4xIzFlrojyt/dIK2lUGMzUPI9FGlZRGQLKqqwLh
2syOXRWbcQ7e52IHtDSy3YBNveKRsP4NyqV+GxGiex18SMB/M3Pw9EMH614eDPsE
QAGQi5uGv4hPJtFHgXgUyBPLFHIyFAiVxhFRIj7u2DSEKY79+wO1CGWFiFvdTY4B
CbqKXLffY3iQdFsLJkj9Dl8cnOQnoY44V0EBzhhORxeOp71StUVaRwQMFa5tp48G
171s5Hs=
=BQIl
-----END PGP SIGNATURE-----
If we have Linus's key in our GnuPG keyring, we can easily verify that this tag is valid:
$ git verify-tag v5.8-rc7
gpg: Signature made Sun 26 Jul 2020 05:14:06 PM EDT
gpg:                using RSA key ABAF11C65A2970B130ABE3C479BE3E4300411886
gpg:                issuer "email@example.com"
gpg: Good signature from "Linus Torvalds <firstname.lastname@example.org>" [unknown]
gpg:                 aka "Linus Torvalds <email@example.com>" [full]
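As a rough illustration of what is being verified, the Python sketch below splits the text of a tag object into the part covered by the signature and the signature block itself. It assumes, as I understand git's behavior, that the signed data is everything that precedes the `-----BEGIN PGP SIGNATURE-----` line. The `split_signed_tag` helper and the example tag are made up, and the sketch does no cryptographic verification.

```python
PGP_BEGIN = "-----BEGIN PGP SIGNATURE-----"


def split_signed_tag(tag_text: str) -> tuple[str, str]:
    """Return (signed_payload, signature) for the text of a tag object.

    If the tag carries no PGP signature, the signature part is empty.
    """
    index = tag_text.find(PGP_BEGIN)
    if index == -1:
        return tag_text, ""
    return tag_text[:index], tag_text[index:]


# A made-up tag object: the hash, tagger and signature body are placeholders.
example_tag = (
    "object 0123456789abcdef0123456789abcdef01234567\n"
    "type commit\n"
    "tag v0.1\n"
    "tagger Example Tagger <tagger@example.org> 1595798046 -0700\n"
    "\n"
    "Example release\n"
    "-----BEGIN PGP SIGNATURE-----\n"
    "\n"
    "(base64 signature data would go here)\n"
    "-----END PGP SIGNATURE-----\n"
)

payload, signature = split_signed_tag(example_tag)
# The "object" line, and therefore the commit hash, is part of the payload.
print(payload.splitlines()[0])
print(signature.startswith(PGP_BEGIN))
```

The point of the split is that the `object` line sits inside the signed payload, which is why a good signature vouches for the commit hash recorded in the tag.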
The entire contents of this tag are signed, so this tells us that when Linus signed the tag, the “object hash” on his system was 92ed301919932f777713b9172e525674157e983d. But what exactly is that “object hash”? What are the contents that are hashed here? We can find out by asking git to tell us more about that object:
$ git cat-file -p 92ed301919932f777713b9172e525674157e983d
tree f16e3e4bcea2d875a17d2278ff67364b3277b10a
parent 1c8594b8427290c178c5d39885eacd9e41f68743
author Linus Torvalds <firstname.lastname@example.org> 1595798046 -0700
committer Linus Torvalds <email@example.com> 1595798046 -0700

Linux 5.8-rc7
The above contents in their entirety (slightly differently formatted: git prepends a short header with the object type and size before hashing) are what give us the sha1 hash 92ed301919932f777713b9172e525674157e983d. So, thus far, we have unbroken cryptographic attestation from Linus's PGP signature to several other important bits about his git repository:
- information about the state of his source code (the tree line)
- information about the previous commit in the history (the parent line)
- information about the author of the commit and the committer, which are one and the same in this particular case
- information about the date and time when the commit was made
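The way a commit's contents turn into an object hash can be sketched in Python. Git hashes a header of the form `<type> <size>` plus a NUL byte, followed by the object body. The `git_object_hash` and `parse_commit_headers` helpers are illustrative names, not part of git, and the example commit uses placeholder author details, so its hash will not match any real commit.

```python
import hashlib


def git_object_hash(obj_type: str, body: bytes) -> str:
    """SHA-1 of a git object: '<type> <size>', a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def parse_commit_headers(body: bytes) -> dict[str, list[str]]:
    """Collect the header lines (tree, parent, author, ...) of a commit body."""
    headers: dict[str, list[str]] = {}
    header_text = body.decode().split("\n\n", 1)[0]
    for line in header_text.splitlines():
        if line.startswith(" "):
            continue  # continuation of a multi-line header; ignored here
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)
    return headers


# A made-up commit body: the author and message are placeholders.
example_commit = (
    b"tree f16e3e4bcea2d875a17d2278ff67364b3277b10a\n"
    b"parent 1c8594b8427290c178c5d39885eacd9e41f68743\n"
    b"author Example Author <author@example.org> 1595798046 -0700\n"
    b"committer Example Author <author@example.org> 1595798046 -0700\n"
    b"\n"
    b"Example commit message\n"
)

print(parse_commit_headers(example_commit)["parent"])
print(git_object_hash("commit", example_commit))
```

Since the tree hash, parent hash, identities and timestamps are all part of the hashed body, changing any one of them yields a different commit hash.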
Let's take a look at the tree line: what contents were hashed to arrive at that checksum? Let's ask git:
$ git cat-file -p f16e3e4bcea2d875a17d2278ff67364b3277b10a
100644 blob a0a96088c74f49a961a80bc0851a84214b0a9f83    .clang-format
100644 blob 43967c6b20151ee126db08e24758e3c789bcb844    .cocciconfig
100644 blob a64d219137455f407a7b1f2c6b156c5575852e9e    .get_maintainer.ignore
100644 blob 4b32eaa9571e64e47b51c43537063f56b204d8b3    .gitattributes
100644 blob d5f4804ed07cd36336a5e80f2a24e45104f902cf    .gitignore
100644 blob db4f2295bd9d792b47eb77aab179a9db0d968454    .mailmap
100644 blob a635a38ef9405fdfcfe97f3a435393c1e9cae971    COPYING
100644 blob 0787b5872906c8a92a63cde3961ed630e2ec93b6    CREDITS
040000 tree 37e1b4166d912d69738beca645d3d539da4bbf30    Documentation
[...]
040000 tree ba6955ee6228666d9ef117fdd45df2e53ba0e221    virt
This is the entirety of the top-level Linux kernel directory contents. The blob entries are sha1 checksums of the actual file contents in that directory (computed over the contents plus a short git object header, so they will not match plain sha1sum output), so these are straightforward. Subdirectories are represented as other tree entries, which in turn consist of blob and tree records going all the way down to the last sublevel, which will only contain blobs.

So, tree f16e3e4bcea2d875a17d2278ff67364b3277b10a in the commit record is a checksum of other checksums, and it allows us to verify that each and every file in linux.git is exactly the same as it was on Linus Torvalds' system when he created the commit. If any file were changed, the tree checksum would be different and the whole repository would be considered invalid, because the object hash would no longer match the one recorded in the commit.
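The "checksum of checksums" idea can be sketched in Python as well. The sketch assumes git's tree object layout as I understand it: each entry is `<mode> <name>`, a NUL byte, then the 20 raw bytes of the entry's hash, with entries sorted by name and directories stored with mode `40000`. The file names and contents are made up, and `tree_body` is an illustrative helper, not a git API.

```python
import hashlib


def git_object_hash(obj_type: str, body: bytes) -> str:
    """SHA-1 of a git object: '<type> <size>', a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_hash) entries as a tree object body."""
    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Directories sort as if their name ended with a slash.
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_hash in sorted(entries, key=sort_key):
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_hash)
    return body


def top_level_tree(readme_contents: bytes) -> str:
    """Hash a tiny made-up tree: a README plus a docs/ directory with one file."""
    index = git_object_hash("blob", b"documentation index\n")
    docs = git_object_hash("tree", tree_body([("100644", "index.txt", index)]))
    readme = git_object_hash("blob", readme_contents)
    entries = [("100644", "README", readme), ("40000", "docs", docs)]
    return git_object_hash("tree", tree_body(entries))


tree_v1 = top_level_tree(b"hello world\n")
tree_v2 = top_level_tree(b"hello world!\n")

# A one-character change in one file changes the top-level tree hash.
print(tree_v1 != tree_v2)  # True
```

A change in a nested file would propagate the same way: the blob hash changes, then the hash of the subdirectory tree that lists it, then every tree above it.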
Finally, if we look at the object mentioned in parent 1c8594b8427290c178c5d39885eacd9e41f68743, we will see that it is a hash of another commit, containing its own tree and parent entries:
$ git cat-file -p 1c8594b8427290c178c5d39885eacd9e41f68743
tree d56de40028d9ecdbebfc2121fd1ce1213fa09fa2
parent 40c60ac32174f0c0c090cd31d0d1712f2478e689
parent ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51
author Linus Torvalds <firstname.lastname@example.org> 1595796417 -0700
committer Linus Torvalds <email@example.com> 1595796417 -0700
mergetag object ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51
[...]
If we cared to, we could walk each commit all the way back to the beginning of Linux git history, but we don't need to do that: verifying the checksum of the latest commit is sufficient to provide us with all the necessary assurances about the entire history of that tree.
So, if we verify the signature on the tag and confirm that it matches the key belonging to Linus Torvalds, we will have strong cryptographic assurances that the repository on our disk is byte-for-byte the same as the repository on the computer belonging to Linus Torvalds — with all its contents and its entire history going back to the initial commit.
The difference between signed tags and signed commits is minimal: in the case of commit signing, the signature goes into the commit object itself. It is generally a good practice to PGP-sign commits, particularly in environments where multiple people can push to the same repository branch. Signed commits provide easy forensic proof of code origins (e.g. without commit signing Alice can fake a commit to pretend that it was actually authored by Bob). Commit signing also allows for easy verification in cases where someone wants to cherry-pick specific commits into their own tree without performing a git merge.
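To illustrate where a commit signature lives, the Python sketch below pulls the signature out of a commit object's text. It assumes the layout I understand git to use: a `gpgsig` header whose continuation lines each start with a single space, with the signed data being the commit text with that header removed. The `split_signed_commit` helper and the example commit are made up, and nothing here checks the signature cryptographically.

```python
def split_signed_commit(commit_text: str) -> tuple[str, str]:
    """Return (payload_without_signature, signature) for a commit's text."""
    payload_lines: list[str] = []
    signature_lines: list[str] = []
    in_headers = True
    in_signature = False
    for line in commit_text.split("\n"):
        if in_headers and line.startswith("gpgsig "):
            in_signature = True
            signature_lines.append(line[len("gpgsig "):])
        elif in_headers and in_signature and line.startswith(" "):
            signature_lines.append(line[1:])  # drop the continuation space
        else:
            in_signature = False
            if line == "":
                in_headers = False  # the first blank line ends the headers
            payload_lines.append(line)
    return "\n".join(payload_lines), "\n".join(signature_lines)


# A made-up signed commit: hashes, identity and signature are placeholders.
example_commit = (
    "tree 0123456789abcdef0123456789abcdef01234567\n"
    "author Example Author <author@example.org> 1595798046 -0700\n"
    "committer Example Author <author@example.org> 1595798046 -0700\n"
    "gpgsig -----BEGIN PGP SIGNATURE-----\n"
    " \n"
    " (base64 signature data would go here)\n"
    " -----END PGP SIGNATURE-----\n"
    "\n"
    "Example commit message\n"
)

payload, signature = split_signed_commit(example_commit)
print("gpgsig" in payload)  # False
print(signature.splitlines()[0])
```

Because the tree and parent lines remain in the payload, a signed commit covers the same chain of hashes that a signed tag does.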
If you are looking to get started with git and PGP signatures, I can recommend my own Protecting Code Integrity guide, or its kernel-specific adaptation that ships with the kernel docs: Kernel Maintainer PGP Guide.
Obligatory note: sha1 is not considered sufficiently strong for hashing purposes these days, and this is widely acknowledged by the git development community. Significant efforts are under way to migrate git to stronger cryptographic hashes, but they require careful planning and implementation in order to minimize disruption to various projects using git. To my knowledge, there are no effective attacks against sha1 as used by git, and git developers have added further precautions against sha1 collision attacks in git itself, which helps buy some time until stronger hashing implementations are considered ready for real-world use.