Internet-Draft Supply Chain Security for VCSs June 2023
N. H. Walfield
Sequoia PGP
J. Winter
Sequoia PGP

Supply Chain Security for Version Control Systems


In a software supply chain attack, an attacker injects malicious code into some software, which they then leverage to compromise systems that depend on that software. A simple example of a supply chain attack is when SourceForge, a once popular open source software forge, injected advertising into the binaries that it delivered on behalf of the projects that it hosted. Software supply chain attacks differ from normal bugs in the perpetrator's intent: in an attack, malicious code is added deliberately, whereas bugs are introduced inadvertently, or due to negligence.

Software supply chain security starts on a developer's machine. By signing a commit or a tag, a developer can assert that they wrote or approved the change. This allows users of a code base to determine whether a version has been approved, and by whom, and then make a policy decision based on that information. For instance, a packager may require that software releases be signed with a particular certificate.

Version control systems such as git have long included support for signed commits and tags. Nevertheless, most developers don't sign their commits, and in the cases where they do, it is usually unclear what the semantics are.

This document describes a set of semantics for signed commits and tags, and a framework to work with them in a version control system, in particular, in a git repository. The framework is designed to be self-contained. That is, given a repository, it is possible to add changes, or to authenticate a version, without consulting any third parties; all of the relevant information is stored in the repository itself.

By publishing this draft, we hope to clarify and enrich the semantics of signing in version control system repositories, thereby enabling a new tooling ecosystem that can strengthen software supply chain security.

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at Status information for this document may be found at

Discussion of this document takes place on the OpenPGP Working Group mailing list (, which is archived at Subscribe at

Source for this draft and an issue tracker can be found at

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 22 December 2023.

1. Introduction

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1.2. Terminology

  • "Maintainer" is a software developer who is responsible for a software project in the sense that they act as a gatekeeper, deciding with the other maintainers what changes are acceptable and should be added to the software.
  • "Contributor" is someone who contributes changes to a software project. Unlike a maintainer, a contributor cannot add their changes to a project of their own accord.
  • "Software supply chain" is the collection of software that something depends on. For instance, a software package depends on libraries, it is built by a compiler, it is distributed by a package registry, etc.
  • "Software supply chain attack" is an attack in which an attacker compromises a software supply chain. For instance, a maintainer or a contributor may stealthily insert malicious code into a software project in order to compromise the security of a system that depends on that software.
  • "Version control system" is a database containing the versions of a software project. Each version includes links to preceding versions.
  • "git" is a popular version control system. Although git is distributed and does not rely on a central authority, it is often used with one to simplify collaboration. Examples of centralized authorities include Gitea, GitHub, and GitLab.
  • "Commit" is a version that is added to the "version control system". In git, commits are identified by their message digest.
  • "Branch" is a typically human readable name given to a particular commit. When a commit is superseded, the branch is updated to point to the new commit. Repositories normally have at least one branch called "main" or "master" where most work is done.
  • "Tag" is a name given to a particular commit. Tags are usually only added for significant versions like releases and are normally not changed once published.
  • "Change" is a commit or a tag.
  • "Forge" is a service that hosts software repositories, and often provides additional services like a bug tracker. Examples of forges are Codeberg, GitHub, and GitLab.
  • "Registry" or "Package Registry" is a service that provides an index of software packages. Maintainers register their software there under a well-known name. Build tools like cargo fetch dependencies by looking up the software by its name.
  • "Authentication" is the process of determining whether something should be considered authentic.
  • "Trust model" is a process for determining what evidence to consider, and how to weigh it when doing authentication.
  • "OpenPGP certificate" or just "certificate" is the data structure that section 11.2 of [RFC4880] defines as a "Transferable Public Key". A certificate is sometimes called a key, but this is confusing, because a certificate contains components that are also called keys.
  • "Liveness" is a property of a certificate, a signature, etc. An object is considered live with respect to some reference time if, as of the reference time, its creation time is in the past, and it has not expired.

2. Problem Statement

Consider the following scenario. Alice and Bob are developers. They are the primary maintainers of the Xyzzy project, which is a free and open source project. Although they do most of the work on the project, they also have occasional collaborators like Carol, and drive-by contributions from people like Dave. Paul packages their software for an operating system distribution. Ted from Ty Coon Corporation integrates it into his company's software. And, Mallory is an adversary who is trying to subvert the project.

When someone updates their local copy of Xyzzy's source code repository, they want to authenticate any changes before they use them. That is, they want to know that each change was made or approved by someone whom they consider authorized to make that change.

In the Xyzzy project, Alice is willing to rely on Bob to check in changes he makes, and to approve contributions from third parties, without auditing the code herself. But she doesn't want to rely on anyone else without checking their proposed changes manually. Bob feels the same way about Alice.

In version control systems like git, the metadata for a commit or tag includes author and committer fields. By themselves, these fields cannot be used to reliably determine who a change's author and committer are, because they are set by the committer and are unauthenticated. That is, Mallory could author a commit, set both of these fields to "Bob," and push the malicious commit. No one would be able to tell that the commit came from Mallory and not Bob.

There are two main ways to authenticate changes. First, changes to a repository or branch can be mediated by a trusted third party, which enforces a policy at the time a change is added to the repository. Second, individual changes can be signed, and a policy can be evaluated at any time. These two approaches can be mixed.

2.1. Repositories Protected by a Trusted Third Party

When using a trusted third party, only certain users are allowed to change the repository. This is often realized using access control lists: the trusted third party has a list of users who are allowed to do certain types of modifications. Before the trusted third party allows a user to modify the repository, the user has to authenticate themselves. When they attempt to make a change, the trusted third party checks that they are authorized. If they are, the third party allows the modification. If not, it is rejected. A user of this repository can now conclude that if they can authenticate the trusted third party, then the changes were approved.

A drawback of using a trusted third party is that it relies on centralized infrastructure. This means the only way for a user to determine if a version of Xyzzy is authentic is to fetch it from the trusted third party; the repository is not self-authenticating. If the third party ever disappears, users will no longer be able to authenticate the project's source code.

Another disadvantage is that this approach doesn't expose the project's policy to its users. This means that both first parties like Alice and third parties like Paul are not able to audit the trusted third party. This is the case even if the set of users that are currently authorized to make changes is exposed via a separate API endpoint: because the set of authorized users changes over time, all updates to the ACLs would need to be exposed, along with information about which user authorized each change.

2.2. Self-Authenticating Repositories

An alternative approach is to have authors and committers sign their changes. Users then check that the changes are signed correctly, and authenticate the signers. For instance, for the Xyzzy project, Paul might decide that Alice or Bob are allowed to make changes. So when Paul fetches changes, he checks whether Alice or Bob signed the new changes, and flags changes made by anyone else. If Alice and Bob later decide that Carol should also be allowed to directly commit her changes, Paul needs to update his policy. If Bob leaves the team, Paul needs to pay enough attention to notice, and then disallow changes made by Bob after a certain date.

For projects that sign their commits today, this is more or less the status quo. Most users, however, do not want to maintain their own policy, and aren't even in a good position to do so. Since users are willing to rely on the maintainers to make changes to the project, they can just as well delegate the policy to them. Now, a user like Paul just needs to designate an initial policy. If he knows when the policy changes, and can authenticate changes to the policy based on the existing policy, then he is able to authenticate any subsequent changes to the repository.

An easy way to manage the policy is to include it in the repository itself. Then changes to the policy can be authenticated in the same way as normal changes. This also makes the repository self-authenticating, because it is self-contained.

One issue is how users should handle forks of a project. A fork may occur due to a social or technical conflict, or because the project dies and is later revived by a different party. In either case, it may not be possible for there to be a clean handoff to the new maintainer. That is, Alice or Bob may not be willing or able to change the policy file to allow Dave to seamlessly continue the development of Xyzzy.

Forks are straightforward to handle, but require user intervention: from the system's perspective, Dave is not authorized, so his changes are rejected. And that's good, as Dave may be an attacker; the system can't tell. Users opt in to a fork by changing their trust root to designate a version in which Dave is authorized to make changes.

3. Threat Model

Consider an attacker, Mallory, who is trying to compromise a user, Ursula, by injecting a vulnerability into the software supply chain of a piece of software, Super Frob, that she uses. There are several different ways that Mallory could accomplish this.

The setting is as follows. To protect herself from Mallory, Ursula has to make sure that versions of the software she obtains do not contain malicious code. Ursula cannot afford to audit every version of the software, but she is willing to rely on the maintainers of the project to not add malicious code, and to review contributions from third parties.

The framework presented in this specification allows Ursula to audit a dependency and its developers once, and then to delegate decisions of what code and dependencies to include to the developers. Assuming the developers are reliable, this can protect Ursula from attacks where Mallory is not explicitly authorized to make a change. For instance, if the developers of an abandoned software package do not authorize a new maintainer, Ursula will be warned when the package has a new maintainer, as she can no longer authenticate it. She can then reaudit it. Similarly, when the software is modified in transit by a machine in the middle, Ursula will not be able to authenticate it. This can also stop dependency confusion attacks, because the substituted software cannot be authenticated. It won't, however, stop a downgrade attack, as older versions can still be authenticated.

This framework cannot protect Ursula from mistakes that she, or a developer of the software that she depends on, makes. For instance, if Mallory is able to convince a developer to authorize a malicious change to their software, this framework considers the change to be legitimate. This framework can, however, facilitate forensic analysis in these cases by making it easier to identify changes approved by the same person (potentially across different projects), and thereby conduct a targeted audit.

4. Authentication

This framework helps users authenticate three types of artifacts: commits, tags, and tarballs or other archives.

4.1. Policy

Every commit has an associated policy. If a commit contains the file openpgp-policy.toml in the root directory, then that file describes the commit's policy. If the commit does not contain that file, the void policy is used. The void policy rejects everything.

openpgp-policy.toml is a TOML v1.0.0 file [toml]. Version 0 defines the following three top-level keys: version, authorization, and commit_goodlist.

If a parser recognizes the version, but encounters keys that it does not know, then it must ignore the unknown keys. This allows a degree of forward compatibility.
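Putting the three top-level keys together, a minimal openpgp-policy.toml might look like the following sketch. The identifier and the granted capabilities are hypothetical, and the ASCII-armored certificate data is elided for brevity:

```toml
# Hypothetical minimal policy file (openpgp-policy.toml).
version = 0

[authorization."Alice <alice@example.org>"]
sign_commit = true
sign_tag = true
sign_archive = false
audit = false
add_user = false
retire_user = false
keyring = """
-----BEGIN PGP PUBLIC KEY BLOCK-----

...base64-encoded certificate data elided...
-----END PGP PUBLIC KEY BLOCK-----
"""

# Commits retroactively marked as valid (full hashes only).
commit_goodlist = []
```

The individual keys are described in the following subsections.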

4.1.1. version

The value of the version key is an integer and must be 0:

version = 0

If the value of version is not recognized, the implementation SHOULD error out. It MAY instead treat the policy as the void policy.

4.1.2. authorization

authorization is a table of authorization entries.

Each key in the authorization table is a free-form identifier, which is chosen by the user of the system. The identifier SHOULD be a UTF-8 encoded, human-readable string that identifies an entity. Examples of identifiers are alice, Bob <>, Boty McBotface <>.

The value of each authorization entry is another table. The table has the following entries:

  • keyring
  • sign_commit
  • sign_tag
  • sign_archive
  • audit
  • add_user
  • retire_user

keyring

The value of keyring is a string. It contains one or more OpenPGP certificates. The OpenPGP certificates MUST be ASCII-armored. An ASCII-armored block MAY contain more than one OpenPGP certificate. The string MAY contain multiple ASCII-armored blocks.

An implementation SHOULD ignore valid OpenPGP certificates that it does not support, and MAY emit a warning that a certificate or component is not supported. An implementation SHOULD return an error if it encounters something other than an OpenPGP certificate encoded with ASCII armor.

When adding a certificate, an implementation SHOULD only add components that are needed to validate the signatures. That is, an implementation SHOULD strip subkeys that are not signing capable, and third-party signatures. For components that are kept, an implementation SHOULD include all known self signatures, and not just the newest self signature.

sign_commit

The value of sign_commit is a boolean. If true, then the entity is authorized to sign commits.

sign_tag

The value of sign_tag is a boolean. If true, then the entity is authorized to sign tags.

sign_archive

The value of sign_archive is a boolean. If true, then the entity is authorized to sign tarballs or other archives.

audit

The value of audit is a boolean. If true, then the entity is authorized to add commits to the top-level commit_goodlist array.

add_user

The value of add_user is a boolean. If true, then the entity is authorized to add new entities to the authorization table, and to grant them any of the capabilities that they themselves have.

retire_user

The value of retire_user is a boolean. If true, then the entity is authorized to retire capabilities from any entity. This includes capabilities that they do not have.

Example

The following is an example of an authorization entry. The user has been granted all of the capabilities. The user is identified by two different OpenPGP certificates, which are contained in two concatenated ASCII-armored blocks.

[authorization."Neal H. Walfield <>"]
sign_commit = true
sign_tag = true
sign_archive = true
add_user = true
retire_user = true
audit = true
keyring = """
Comment: F717 3B3C 7C68 5CD9 ECC4  191B 74E4 45BA 0E15 C957
Comment: Neal H. Walfield (Code Signing Key) <neal@pep.foundatio

Comment: 8F17 7771 18A3 3DDA 9BA4  8E62 AACB 3243 6300 52D9
Comment: Neal H. Walfield <>
Comment: Neal H. Walfield <>
Comment: Neal H. Walfield <>
Comment: Neal H. Walfield <>
Comment: Neal H. Walfield <>


4.1.3. commit_goodlist

The value of commit_goodlist is an array of strings where each string contains a commit identifier. The commit identifier MUST be a full hash. The commit identifier MUST NOT be a branch name, a tag name, or a truncated hash.

Commits listed in the commit_goodlist are commits that have retroactively been marked as valid. This may be useful when a certificate's private key material has been compromised.
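For instance, a commit signed by an entity with the audit capability could retroactively mark two commits as valid with an entry like the following; the hashes are hypothetical placeholders for full commit hashes:

```toml
# Hypothetical goodlist entries: full (untruncated) commit hashes only.
commit_goodlist = [
    "3f786850e387550fdab836ed7e6dc881de23001b",
    "89e6c98d92887913cadf06b2adb97f26cde4849b",
]
```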

4.2. Authenticating Commits

Each commit in a git repository is part of a directed acyclic graph (DAG) where a node is a commit, and a directed edge shows how two commits are related. Specifically, the head of a directed edge is a commit that is derived from the tail. Except for the root commits, each commit has one or more parents. A commit that has multiple parents is derived from multiple commits. Conceptually, it merges multiple paths, and as such is called a merge commit.

A commit is considered authenticated if at least one of its parent commits considers the commit to be authenticated. This rule is different from Guix's authorization invariant as described in [guix], which states that all parent commits must consider the commit to be authenticated. The semantics described here allow a developer to add commits from unauthorized third parties as-is using a merge commit. Using Guix's authorization invariant, the third party's commit would have to be re-signed, which loses the third party's signature, and consequently complicates forensic analysis.

A commit's parent authenticates it as follows.

First, the implementation looks up the signer's certificate in the parent commit's policy file. The implementation SHOULD then canonicalize the certificate so that the active self signatures are those that were active when the signature was made. A self signature is valid if it is not revoked and not expired. A self signature is active if it is the most recent valid self signature prior to a reference time. That is, if a new commit was made on June 9, 2023, then each component's most recent signature as of June 9, 2023, which is also not revoked and not expired, is considered that component's active self signature.

If the canonicalized certificate is valid as of the signature's time, not expired as of that time, not soft revoked as of that time, not hard revoked at any time, and the signature is correct, then the signature is considered verified. The implementation MAY consider certificate updates from other sources. If it does, it SHOULD only consider hard revocations.

The implementation MUST then check that the type of change is authorized by the policy.

The following capabilities allow the specified types of changes:

  • sign_commit: Needed for any change.
  • add_user: Needed to delegate a capability to another user. Updating keyring does not require this capability if a certificate is only updated, and not added.
  • retire_user: Needed to rescind a capability from another user.
  • audit: Needed to modify the version field, and the commit_goodlist list.
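As a sketch of how these capabilities compose, a signer holding add_user could extend the authorization table with a new contributor, granting a subset of the signer's own capabilities. The names below are hypothetical, and the certificate data is elided:

```toml
# Hypothetical entry added by a signer with the add_user capability.
# Carol may sign commits, but may not modify the authorization table
# or the commit_goodlist.
[authorization."Carol <carol@example.org>"]
sign_commit = true
sign_tag = false
sign_archive = false
audit = false
add_user = false
retire_user = false
keyring = """
-----BEGIN PGP PUBLIC KEY BLOCK-----

...base64-encoded certificate data elided...
-----END PGP PUBLIC KEY BLOCK-----
"""
```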

If the signature is considered verified, and the signer is authorized to make the type of change that was made, then the commit is considered authenticated.

If the commit is not considered authenticated because the signer's certificate has been hard revoked, but the commit is included in a later commit's commit_goodlist, then the commit is considered to be authenticated.

A commit is considered to occur later if, when authenticating a range of commits, it is a direct descendant of the commit in question and is in the commit range. Consider the three commits a, b, and c, where a is b's parent, b is c's parent, the certificate used to sign b has been hard revoked, and c includes b in its commit_goodlist. In this case, the hard revocation of the certificate used to sign b is ignored. All other criteria, including the fact that the signature on b must be valid, are still checked.

4.3. Authenticating Tags

A tag is a special type of commit in git, which has no content, but assigns a name to a specific commit. A tag is usually used to mark release points.

A tag is authenticated in the same way as a commit, as described in the previous section, with the following exceptions.

First, the tagged commit is considered a parent commit, and the tag is considered its child commit.

The entity that signed the tag needs the sign_tag capability, and only the sign_tag capability.

4.4. Authenticating Archives

Archives like tarballs are often generated as part of a software's release process. These may be signed. To authenticate an archive with respect to a signature, and a trust root, the trust root's policy is used to authenticate the tarball's signature. The entity that signed the tarball must have the sign_archive capability.
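For example, a project might grant a release-automation entity only the sign_archive capability, so that a compromise of that key cannot be used to inject commits or tags. The entry below is a hypothetical illustration, with the certificate data elided:

```toml
# Hypothetical release-signing entity: may sign archives only.
[authorization."Release Bot <releases@example.org>"]
sign_commit = false
sign_tag = false
sign_archive = true
audit = false
add_user = false
retire_user = false
keyring = """
-----BEGIN PGP PUBLIC KEY BLOCK-----

...base64-encoded certificate data elided...
-----END PGP PUBLIC KEY BLOCK-----
"""
```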

Unlike a commit, an archive does not have a pointer to the commit that it was derived from. Thus, if an archive is derived from commit c, it may be possible to authenticate commit c, as well as tags referring to commit c, using a given trust root, but not to authenticate an archive derived from commit c using the same trust root, because the policy changed in the meantime.

If the signature includes the notation, then the value of the notation is interpreted as the commit that the archive is derived from. The value of the notation is a hexadecimal string corresponding to the commit's full hash. The commit identifier MUST NOT be a branch name, a tag name, or a truncated hash; truncated hashes MUST be considered erroneous.

Since archives are often verified outside of a repository, one or more repositories may be specified using the notation. In that case, each notation indicates a git repository. For example, the main repository of the reference implementation, sq-git, is So, archives SHOULD include the notation with as the value.

When is present in the signature, the implementation MUST use that commit's policy to authenticate the archive, and then authenticate that commit by chaining back to the trust root, as described above; in this case, it MUST NOT use the trust root's policy directly unless the specified commit is also the trust root.

5. Reference Implementation

A Rust implementation of this specification is part of Sequoia. See for the source code.

6. Security Concerns

6.1. Malicious vs. Buggy Changes

The scheme presented here can help mitigate malicious attacks on a code base, but it does nothing to prevent design flaws or code errors. That is, this scheme does not and cannot provide any protections from normal bugs.

6.2. Trusted Developers

The protections outlined in this document are mainly designed to stop third parties from adding malicious code to a project. This system provides no protection from a developer who is authorized to make changes and turns out to be malicious. That said, because commits are signed, when malicious code is discovered, an audit is required to restore trust in the code base. Using this system, it is easier to identify other code added by the same person, and to focus an audit on that code.

6.3. Judging Code vs. Judging Humans

The approach described in this document relies on transitive trust. The basic idea is that if a user is willing to run a developer's code, then they can reasonably rely on that developer to modify the code, and to delegate that capability to a third party.

Yet, writing and reviewing code is fundamentally different from evaluating another person's intents. This is demonstrated quite well by the events surrounding the popular event-stream npm package [event-stream]. In 2018, a new developer gained the trust of the package's maintainer by contributing a number of high-quality changes. The original developer eventually made the new developer the maintainer, and the new maintainer introduced malicious code to steal users' credentials.

6.4. Operational Security

Signing commits relies on each developer having a long-term identity key, which they keep safe. If the key is compromised, the attacker is able to impersonate the developer. It is possible to limit the damage by revoking the compromised key, or having another authorized user retire the developer's access.

In this regard, sigstore appears to be better, as it relies on ephemeral signing keys, which are issued by a central authority. However, in order to obtain a signing key, the user needs to log in. If they use a password and an attacker gets access to it, the attacker can impersonate the developer. If the developer uses a second factor like a hardware token, then they are again relying on private key cryptography, and might as well put their private keys on a hardware token and forego the centralized infrastructure.

6.5. Dependencies

This specification has concentrated on enabling a user of a software project to authenticate new versions. But most software has its own dependencies, and those also need to be authenticated. A user could identify all software that they are willing to rely on, but this is more work than most users are willing and able to do. But, just as developers are usually in a better position to evaluate who should be allowed to contribute to their project, they are also in a better position to designate a trust root for their dependencies.

Enabling this functionality requires ecosystem-specific tooling. The developer needs to be able to specify a trust root for each dependency, and the build infrastructure needs to authenticate the dependencies. For instance, the Rust ecosystem uses Cargo for building and dependency management. Currently, to add sequoia-openpgp as a dependency to a project, a developer would modify their Cargo.toml file as follows:

sequoia-openpgp = { version = "1" }

Instead, they would also specify a trust root, which they've presumably audited:

sequoia-openpgp = { version = "1", trust-root = "HASH" }

When downloading the dependency, cargo would make sure that the dependency can be authenticated from the specified trust root, and, if not, report an error.
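In the context of a complete Cargo.toml, the proposed (and currently unsupported) trust-root key might appear as follows; the package name and the hash are hypothetical placeholders:

```toml
# Hypothetical Cargo.toml using the proposed trust-root key.
[package]
name = "frobnicator"
version = "0.1.0"
edition = "2021"

[dependencies]
# The hash is a placeholder for the full commit hash of an audited commit.
sequoia-openpgp = { version = "1", trust-root = "3f786850e387550fdab836ed7e6dc881de23001b" }
```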

6.6. Document History

This is a first draft that has not been published.

7. Acknowledgments

Our thanks go---in particular, but not only---to the Sequoia PGP team for many fruitful discussions. Funding for this project was provided by the Sovereign Tech Fund.

8. References

8.1. Normative References

Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119.
Callas, J., Donnerhacke, L., Finney, H., Shaw, D., and R. Thayer, "OpenPGP Message Format", RFC 4880, DOI 10.17487/RFC4880.
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174.
Preston-Werner, T. and P. Gedam, "TOML v1.0.0".

8.2. Informative References

Birsan, A., "Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies".
Hunter, T., "Compromised npm Package: event-stream".
Courtès, L., "Building a Secure Software Supply Chain with GNU Guix".
Thompson, K., "Reflections on trusting trust", Communications of the ACM, vol. 27, no. 8, pp. 761-763, DOI 10.1145/358198.358210.

Authors' Addresses

Neal H. Walfield
Sequoia PGP
Justus Winter
Sequoia PGP