What does a PGP signature on a git commit prove?

With git, a cryptographic signature on a commit provides strong integrity guarantees of the entire history of that branch going backwards, including all metadata and all contents of the repository, all the way back to the initial commit. This is possible because git records the hash of the previous commit in each next commit's metadata, creating an unbreakable cryptographic chain of records. If you can verify the cryptographic signature at the tip of the branch, you effectively verify that branch's entire history.

For example, let's take a look at linux.git, where the latest tag at the time of writing, v5.8-rc7, is signed by Linus Torvalds. (Tag signatures are slightly different from commit signatures — but in every practical sense they offer the same guarantees.)

If you have a git checkout of torvalds/linux.git, you are welcome to follow along.

$ git cat-file -p v5.8-rc7
object 92ed301919932f777713b9172e525674157e983d
type commit
tag v5.8-rc7
tagger Linus Torvalds <torvalds@linux-foundation.org> 1595798046 -0700

Linux 5.8-rc7
-----BEGIN PGP SIGNATURE-----

iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl8d8h4eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGd0sH/2iktYhMwPxzzpnb
eI3OuTX/mRn4vUFOfpx9dmGVleMfKkpbvnn3IY7wA62Qfv7J7lkFRa1Bd1DlqXfW
yyGTGDSKG5chiRCOU3s9ni92M4xIzFlrojyt/dIK2lUGMzUPI9FGlZRGQLKqqwLh
2syOXRWbcQ7e52IHtDSy3YBNveKRsP4NyqV+GxGiex18SMB/M3Pw9EMH614eDPsE
QAGQi5uGv4hPJtFHgXgUyBPLFHIyFAiVxhFRIj7u2DSEKY79+wO1CGWFiFvdTY4B
CbqKXLffY3iQdFsLJkj9Dl8cnOQnoY44V0EBzhhORxeOp71StUVaRwQMFa5tp48G
171s5Hs=
=BQIl
-----END PGP SIGNATURE-----

If we have Linus's key in our GnuPG keyring, we can easily verify that this tag is valid:

$ git verify-tag v5.8-rc7
gpg: Signature made Sun 26 Jul 2020 05:14:06 PM EDT
gpg:                using RSA key ABAF11C65A2970B130ABE3C479BE3E4300411886
gpg:                issuer "torvalds@linux-foundation.org"
gpg: Good signature from "Linus Torvalds <torvalds@kernel.org>" [unknown]
gpg:                 aka "Linus Torvalds <torvalds@linux-foundation.org>" [full]

The entire contents of this tag are signed, so this tells us that when Linus signed the tag, the “object hash” on his system was 92ed301919932f777713b9172e525674157e983d. But what exactly is that “object hash?” What are the contents that are hashed here? We can find out by asking git to tell us more about that object:

$ git cat-file -p 92ed301919932f777713b9172e525674157e983d
tree f16e3e4bcea2d875a17d2278ff67364b3277b10a
parent 1c8594b8427290c178c5d39885eacd9e41f68743
author Linus Torvalds <torvalds@linux-foundation.org> 1595798046 -0700
committer Linus Torvalds <torvalds@linux-foundation.org> 1595798046 -0700

Linux 5.8-rc7

The above contents in their entirety (slightly differently formatted) is what gives us the sha1 hash 92ed301919932f777713b9172e525674157e983d. So, thus far, we have unbroken cryptographic attestation from Linus's PGP signature to two other important bits about his git repository:

Let's take a look a the tree line — what contents were hashed to arrive at that checksum? Let's ask git:

$ git cat-file -p f16e3e4bcea2d875a17d2278ff67364b3277b10a
100644 blob a0a96088c74f49a961a80bc0851a84214b0a9f83    .clang-format
100644 blob 43967c6b20151ee126db08e24758e3c789bcb844    .cocciconfig
100644 blob a64d219137455f407a7b1f2c6b156c5575852e9e    .get_maintainer.ignore
100644 blob 4b32eaa9571e64e47b51c43537063f56b204d8b3    .gitattributes
100644 blob d5f4804ed07cd36336a5e80f2a24e45104f902cf    .gitignore
100644 blob db4f2295bd9d792b47eb77aab179a9db0d968454    .mailmap
100644 blob a635a38ef9405fdfcfe97f3a435393c1e9cae971    COPYING
100644 blob 0787b5872906c8a92a63cde3961ed630e2ec93b6    CREDITS
040000 tree 37e1b4166d912d69738beca645d3d539da4bbf30    Documentation
[...]
040000 tree ba6955ee6228666d9ef117fdd45df2e53ba0e221    virt

This is the entirety of the top-level Linux kernel directory contents. The blob entries are sha1sum's of the actual file contents in that directory, so these are straightforward. Subdirectories are represented as other tree entries, which also consist of blob and tree records going all the way down to the last sublevel, which will only contain blobs.

So, tree f16e3e4bcea2d875a17d2278ff67364b3277b10a in the commit record is a checksum of other checksums and it allows us to verify that each and every file in linux.git is exactly the same as it was on Linus Torvalds' system when he created the commit. If any file is changed, the tree checksum would be different and the whole repository would be considered invalid, because the object hash would be different than in the commit.

Finally, if we look at the object mentioned in parent 1c8594b8427290c178c5d39885eacd9e41f68743, we will see that it is a hash of another commit, containing its own tree and parent records:

$ git cat-file -p 1c8594b8427290c178c5d39885eacd9e41f68743
tree d56de40028d9ecdbebfc2121fd1ce1213fa09fa2
parent 40c60ac32174f0c0c090cd31d0d1712f2478e689
parent ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51
author Linus Torvalds <torvalds@linux-foundation.org> 1595796417 -0700
committer Linus Torvalds <torvalds@linux-foundation.org> 1595796417 -0700
mergetag object ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51
[...]

If we cared to, we could walk each commit all the way back to the beginning of Linux git history, but we don't need to do that — verifying the checksum of the latest commit is sufficient to provide us all the necessary assurances about the entire history of that tree.

So, if we verify the signature on the tag and confirm that it matches the key belonging to Linus Torvalds, we will have strong cryptographic assurances that the repository on our disk is byte-for-byte the same as the repository on the computer belonging to Linus Torvalds — with all its contents and its entire history going back to the initial commit.

The difference between signed tags and signed commits is minimal — in the case of commit signing, the signature goes into the commit object itself. It is generally a good practice to PGP-sign commits, particularly in environments where multiple people can push to the same repository branch. Signed commits provide easy forensic proof of code origins (e.g. without commit signing Alice can fake a commit to pretend that it was actually authored by Bob). It also allows for easy verification in cases where someone wants to cherry-pick specific commits into their own tree without performing a git merge.

If you are looking to get started with git and PGP signatures, I can recommend my own Protecting Code Integrity guide, or its kernel-specific adaptation that ships with the kernel docs: Kernel Maintainer PGP Guide.

Obligatory note: sha1 is not considered sufficiently strong for hashing purposes these days, and this is widely acknowledged by the git development community. Significant efforts are under way to migrate git to stronger cryptographic hashes, but they require careful planning and implementation in order to minimize disruption to various projects using git. To my knowledge, there are no effective attacks against sha1 as used by git, and git developers have added further precautions against sha1 collision attacks in git itself, which helps buy some time until stronger hashing implementations are considered ready for real-world use.