Internet-Draft | Supply Chain Security for VCSs | June 2023 |
Walfield & Winter | Expires 22 December 2023 | [Page] |
In a software supply chain attack, an attacker injects malicious code into some software, which they then leverage to compromise systems that depend on that software. A simple example of a supply chain attack is when SourceForge, a once popular open source software forge, injected advertising into the binaries that it delivered on behalf of the projects that it hosted. Software supply chain attacks differ from normal bugs in the intent of the perpetrator: in an attack, malicious code is added deliberately, whereas normal bugs are introduced inadvertently, or due to negligence.¶
Software supply chain security starts on a developer's machine. By signing a commit or a tag, a developer can assert that they wrote or approved the change. This allows users of a code base to determine whether a version has been approved, and by whom, and then make a policy decision based on that information. For instance, a packager may require that software releases be signed with a particular certificate.¶
Version control systems such as git have long included support for signed commits and tags. In practice, however, most developers don't sign their commits, and when they do, it is usually unclear what the semantics are.¶
This document describes a set of semantics for signed commits and tags, and a framework to work with them in a version control system, in particular, in a git repository. The framework is designed to be self contained. That is, given a repository, it is possible to add changes, or authenticate a version without consulting any third parties; all of the relevant information is stored in the repository itself.¶
By publishing this draft we hope to clarify and enrich the semantics of signing in version control system repositories thereby enabling a new tooling ecosystem, which can strengthen software supply chain security.¶
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://sequoia-pgp.gitlab.io/sequoia-git/. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-nhw-openpgp-supply-chain-security-vcs/.¶
Discussion of this document takes place on the OpenPGP Working Group mailing list (mailto:openpgp@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/openpgp/. Subscribe at https://www.ietf.org/mailman/listinfo/openpgp/.¶
Source for this draft and an issue tracker can be found at https://gitlab.com/sequoia-pgp/sequoia-git.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 22 December 2023.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
"Maintainer" is a software developer who is responsible for a software project in the sense that they act as a gatekeeper, deciding with the other maintainers which changes are acceptable and should be added to the software.¶
"Contributor" is someone who contributes changes to a software project. Unlike a maintainer, a contributor cannot add their changes to a project of their own accord.¶
"Software supply chain" is the collection of software that something depends on. For instance, a software package depends on libraries, it is built by a compiler, it is distributed by a package registry, etc.¶
"Software supply chain attack" is an attack in which an attacker compromises a software supply chain. For instance, a maintainer or a contributor may stealthily insert malicious code into a software project in order to compromise the security of a system that depends on that software.¶
"Version control system" is a database that contains versions of a software project. Each version includes links to preceding versions.¶
"git" is a popular version control system. Although git is distributed and does not rely on a central authority, it is often used with one to simplify collaboration. Examples of centralized authorities include Gitea, GitHub, and GitLab.¶
"Commit" is a version that is added to the "version control system". In git, commits are identified by their message digest.¶
"Branch" is a typically human readable name given to a particular commit. When a commit is superseded, the branch is updated to point to the new commit. Repositories normally have at least one branch called "main" or "master" where most work is done.¶
"Tag" is a name given to a particular commit. Tags are usually only added for significant versions like releases and are normally not changed once published.¶
"Change" is a commit or a tag.¶
"Forge" is a service that hosts software repositories, and often provides additional services like a bug tracker. Examples of forges are Codeberg, GitHub, and GitLab.¶
"Registry" or "Package Registry" is a service that provides an index of software packages. Maintainers register their software there under a well-known name. Build tools like cargo fetch dependencies by looking up the software by its name.¶
"Authentication" is the process of determining whether something should be considered authentic.¶
"Trust model" is a process for determining what evidence to consider, and how to weigh it when doing authentication.¶
"OpenPGP certificate" or just "certificate" is the data structure that section 11.2 of [RFC4880] defines as a "Transferable Public Key". A certificate is sometimes called a key, but this is confusing, because a certificate contains components that are also called keys.¶
"Liveness" is a property of a certificate, a signature, etc. An object is considered live with respect to some reference time if, as of the reference time, its creation time is in the past, and it has not expired.¶
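The liveness rule can be sketched as a small predicate. This is an illustrative sketch, not part of the specification; plain integers stand in for timestamps, and the function name is invented:

```python
def is_live(created, expires, reference_time):
    """Liveness sketch: an object is live at reference_time if its
    creation time is in the past and it has not expired.
    `expires` may be None to indicate no expiration."""
    return created <= reference_time and (
        expires is None or reference_time < expires
    )

# A certificate created at t=10 that expires at t=100:
assert is_live(10, 100, 50)       # live in between
assert not is_live(10, 100, 5)    # not yet created
assert not is_live(10, 100, 100)  # expired
assert is_live(10, None, 10**9)   # never expires
```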
Consider the following scenario. Alice and Bob are developers. They are the primary maintainers of the Xyzzy project, which is a free and open source project. Although they do most of the work on the project, they also have occasional collaborators like Carol, and drive-by contributions from people like Dave. Paul packages their software for an operating system distribution. Ted from Ty Coon Corporation integrates it into his company's software. And, Mallory is an adversary who is trying to subvert the project.¶
When someone updates their local copy of Xyzzy's source code repository, they want to authenticate any changes before they use them. That is, they want to know that each change was made or approved by someone whom they consider authorized to make that change.¶
In the Xyzzy project, Alice is willing to rely on Bob to check in changes he makes, and to approve contributions from third parties without auditing the code herself. But, she doesn't want to rely on anyone else without checking their proposed changes manually. Bob feels the same way about Alice.¶
In version control systems like git, the meta-data for a commit or tag includes author and committer fields. By themselves, these fields cannot be used to reliably determine who a change's author and committer are, because these fields are set by the committer and are unauthenticated. That is, Mallory could author a commit, set both of these fields to "Bob," and push the malicious commit. No one would be able to tell that it came from Mallory and not Bob.¶
There are two main ways to authenticate changes. First, changes to a repository or branch can be mediated by a trusted third party, which enforces a policy at the time a change is added to the repository. Second, individual changes can be signed, and a policy can be evaluated at any time. These two approaches can be mixed.¶
When using a trusted third party, only certain users are allowed to change the repository. This is often realized using access control lists: the trusted third party has a list of users who are allowed to do certain types of modifications. Before the trusted third party allows a user to modify the repository, the user has to authenticate themselves. When they attempt to make a change, the trusted third party checks that they are authorized. If they are, the third party allows the modification. If not, it is rejected. A user of this repository can now conclude that if they can authenticate the trusted third party, then the changes were approved.¶
A drawback of using a trusted third party is that it relies on centralized infrastructure. This means the only way for a user to determine if a version of Xyzzy is authentic is to fetch it from the trusted third party; the repository is not self authenticating. If the third party ever disappears, users will no longer be able to authenticate the project's source code.¶
Another disadvantage is that this approach doesn't expose the project's policy to its users. This means that both first parties like Alice and third parties like Paul are not able to audit the trusted third party. This is the case even if the set of users that are currently authorized to make changes is exposed via a separate API endpoint: because the set of authorized users changes with time, all updates to the ACLs would need to be exposed along with information about which user authorized each change.¶
An alternative approach is to have authors and committers sign their changes. Users then check that the changes are signed correctly, and authenticate the signers. For instance, for the Xyzzy project, Paul might decide that Alice or Bob are allowed to make changes. So when Paul fetches changes, he checks whether Alice or Bob signed the new changes, and flags changes made by anyone else. If Alice and Bob later decide that Carol should also be allowed to directly commit her changes, Paul needs to update his policy. If Bob leaves the team, Paul needs to pay enough attention to notice, and then disallow changes made by Bob after a certain date.¶
For projects that sign their commits today, this is more or less the status quo. Most users, however, do not want to maintain their own policy, and aren't even in a good position to do so. Since users are willing to rely on the maintainers to make changes to the project, they can just as well delegate the policy to them. Now, a user like Paul just needs to designate an initial policy. If he knows when the policy changes, and can authenticate changes to the policy based on the existing policy, then he is able to authenticate any subsequent changes to the repository.¶
An easy way to manage the policy is to include it in the repository itself. Then changes to the policy can be authenticated in the same way as normal changes. This also makes the repository self authenticating, because it is self contained.¶
One issue is how users should handle forks of a project. A fork may occur due to a social or technical conflict, or because the project dies, and is later revived by a different party. In both cases, it may not be possible for there to be a clean handoff to the new maintainer. That is, Alice or Bob may not be willing or able to change the policy file to allow Dave to seamlessly continue the development of Xyzzy.¶
Forks are straightforward to handle, but require user intervention: from the system's perspective, Dave is not authorized, so his changes are rejected. And that's good, as Dave may be an attacker; the system can't tell. Users opt in to a fork by changing their trust root to designate a version in which Dave is authorized to make changes.¶
Consider an attacker, Mallory, who is trying to compromise a user, Ursula, by injecting a vulnerability into the software supply chain of a piece of software, Super Frob, that she uses. There are several different ways that Mallory could accomplish this. These include:¶
Mallory could pose as a contributor, and convince a developer to authorize a malicious change to one of Super Frob's dependencies, such as a library.¶
Mallory could take over an abandoned package that Super Frob depends on, and publish a new version with malicious code.¶
Mallory could use typosquatting, either opportunistically or through social engineering, to inject malicious software into Super Frob's supply chain.¶
For instance, Mallory could publish a library called libevent, which is a copy of libevents, but includes a malicious change, and Super Frob accidentally includes libevent as a dependency instead of libevents.¶
Mallory could publish a malicious package that has the same name as a package on another registry in order to confuse Super Frob's build tools.¶
This type of attack is called a dependency confusion attack, [dependency-confusion]. It can be launched when an organization uses an internal registry and a public registry to find dependencies. As dependencies are often referenced by name, and that name does not include the registry, an attacker may trick the organization into using their malicious version of the package.¶
Mallory could sneak a change into one of Super Frob's build dependencies, like the compiler.¶
Whereas software maintainers have a large degree of control over their direct dependencies, they have more limited control over the tools downstream users use to build their software. In the extreme, a software project may include a copy of a dependency in their version control system, or depend on a specific version of a dependency by cryptographic hash, but only specify a standard that the compiler needs, like C99.¶
This attack is most well known from Ken Thompson's Turing Award lecture, Reflections on Trusting Trust, [reflections-on-trusting-trust].¶
Mallory could compromise the tools that a developer uses, e.g., by publishing a useful, but malicious plug-in for an editor, which detects certain code patterns, and quietly modifies them to insert malicious code.¶
Mallory could compromise the systems that the developers use, and modify their source code repositories.¶
For instance, if Mallory gets access to a developer's machine, he could stealthily modify code before it is signed and committed. Or, he could exfiltrate the developer's signing key or login credentials, and imitate her. Similarly, if a software project uses a forge and Mallory is able to compromise the forge, he could modify the source code.¶
Mallory could compromise Super Frob or one of its dependencies as it is being downloaded.¶
For instance, if a package registry like crates.io depends on a content delivery network (CDN) to distribute packages, a compromised node in the CDN may return a modified version of the software to the user.¶
The setting is as follows. To protect herself from Mallory, Ursula has to make sure that versions of the software she obtains do not contain malicious code. Ursula cannot afford to audit every version of the software, but she is willing to rely on the maintainers of the project to not add malicious code, and to review contributions from third parties.¶
The framework presented in this specification allows Ursula to audit a dependency and its developers once, and then to delegate decisions of what code and dependencies to include to the developers. Assuming the developers are reliable, this can protect Ursula from attacks where Mallory is not explicitly authorized to make a change. For instance, if the developers of an abandoned software package do not authorize a new maintainer, Ursula will be warned when the package has a new maintainer, as she can no longer authenticate it. She can then reaudit it. Similarly, when the software is modified in transit by a machine in the middle, Ursula will not be able to authenticate it. This can also stop dependency confusion attacks, because the software cannot be authenticated. It won't, however, stop a downgrade attack, as older versions can still be authenticated.¶
This framework cannot protect Ursula from mistakes that she or a developer of the software that she depends on makes. For instance, if Mallory is able to convince a developer to authorize a malicious change to their software, this framework considers the change to be legitimate. This framework can facilitate forensic analysis in these cases by making it easier to identify changes approved by the same person (potentially across different projects) and thereby conduct a targeted audit.¶
This framework helps users authenticate three types of artifacts: commits, tags, and tarballs or other archives.¶
Every commit has an associated policy. If a commit contains the file openpgp-policy.toml in the root directory, then that file describes the commit's policy. If the commit does not contain that file, the void policy is used. The void policy rejects everything.¶
openpgp-policy.toml is a TOML v1.0.0 file [toml]. Version 0 defines the following three top-level keys: version, authorization, and commit_goodlist.¶
If a parser recognizes the version, but encounters keys that it does not know, then it must ignore the unknown keys. This allows a degree of forward compatibility.¶
The value of the version key is an integer and must be 0:¶
version = 0¶
If the value of version is not recognized, the implementation SHOULD error out. It MAY instead treat the policy as the void policy.¶
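A hedged sketch of this version check, assuming the TOML file has already been parsed into a dictionary. The function and sentinel names are illustrative, not part of the specification:

```python
VOID_POLICY = None  # sentinel: the void policy rejects everything

def check_version(doc):
    """Given the parsed contents of openpgp-policy.toml, accept only
    version 0.  An implementation SHOULD error out on an unrecognized
    version; the MAY-permitted fallback to the void policy is what is
    sketched here."""
    if doc.get("version") != 0:
        return VOID_POLICY
    return doc

assert check_version({"version": 0}) == {"version": 0}
assert check_version({"version": 5}) is VOID_POLICY
assert check_version({}) is VOID_POLICY  # missing version: unrecognized
```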
The value of commit_goodlist is an array of strings where each string contains a commit identifier. The commit identifier MUST be a full hash. The commit identifier MUST NOT be a branch name, a tag name, or a truncated hash.¶
Commits listed in the commit_goodlist are commits that have retroactively been marked as valid. This may be useful when a certificate's private key material has been compromised.¶
Each commit in a git repository is part of a directed acyclic graph (DAG) where a node is a commit, and a directed edge shows how two commits are related. Specifically, the head of a directed edge is a commit that is derived from the tail. Except for the root commits, each commit has one or more parents. A commit that has multiple parents is derived from multiple commits. Conceptually, it merges multiple paths, and as such is called a merge commit.¶
A commit is considered authenticated if at least one of its parent commits considers it to be authenticated. This rule is different from Guix's authorization invariant, as described in [guix], which states that all parent commits must consider the commit to be authenticated. The semantics described here allow a developer to add commits from unauthorized third parties as-is using a merge commit. Under Guix's authorization invariant, the third party's commit would have to be re-signed, which loses the third party's signature, and consequently complicates forensic analysis.¶
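This rule can be sketched as a recursive check over the commit DAG. The data structures and the authorization predicate are illustrative stand-ins, not the reference implementation:

```python
def is_authenticated(commit, parents, parent_authorizes, trust_root):
    """A commit is authenticated if it is the trust root itself, or if
    at least one of its parents both authorizes it (per the parent's
    policy) and is itself authenticated.  Guix's invariant would
    instead require *all* parents to authorize the commit."""
    if commit == trust_root:
        return True
    return any(
        parent_authorizes(p, commit)
        and is_authenticated(p, parents, parent_authorizes, trust_root)
        for p in parents.get(commit, ())
    )

# A merge commit "d" with one authorized parent "b" and one
# unauthorized parent "c" is still authenticated:
parents = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
authorized = {("a", "b"), ("b", "d")}
ok = lambda p, c: (p, c) in authorized
assert is_authenticated("d", parents, ok, trust_root="a")
assert not is_authenticated("c", parents, ok, trust_root="a")
```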
A commit's parent authenticates it as follows.¶
First, the implementation looks up the signer's certificate in the parent commit's policy file. If the implementation finds a certificate, it scans the commit's policy file for any updates to that certificate (and only that certificate) except for revocations. That is, the implementation iterates over all of the certificates in the commit's policy file, and looks for certificates with the same fingerprint. If it finds any, it merges them into the original certificate with the exception of any revocation signatures. In this way, it is straightforward for a user to recover if the certificate in the parent commit's policy file is no longer usable, e.g., because it has expired, or the signing subkey has been replaced. Consider a parent commit whose policy file contains a certificate that expires at time t. After t, the certificate is unusable; it can't be used to authenticate any commits made at or after t. This mechanism allows the user to easily add new commits by extending their certificate's expiration, and adding the update to a new commit. Revocation certificates are skipped so that it is possible for a user to add a commit that revokes their own certificate, or a component thereof.¶
The implementation SHOULD then canonicalize the certificate so that the active self signatures are those that were active when the signature was made. A self signature is valid if it is not revoked and not expired. A self signature is active if it is the most recent valid self signature prior to a reference time. That is, if a new commit was made on June 9, 2023, then each component's most recent signature as of June 9, 2023 that is not revoked and not expired is considered that component's active self signature.¶
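A sketch of the active-self-signature selection, using plain dictionaries with numeric timestamps in place of real OpenPGP packets; the field names are invented for illustration:

```python
def active_self_signature(self_sigs, reference_time):
    """Return the component's active self signature: the most recent
    valid (not revoked, not expired) self signature made at or before
    reference_time, or None if there is none."""
    valid = [
        s for s in self_sigs
        if s["created"] <= reference_time
        and not s["revoked"]
        and (s["expires"] is None or reference_time < s["expires"])
    ]
    return max(valid, key=lambda s: s["created"], default=None)

sigs = [
    {"created": 10, "revoked": False, "expires": 50},
    {"created": 40, "revoked": True,  "expires": None},
    {"created": 45, "revoked": False, "expires": None},
]
# As of t=47, the signature from t=45 is active; the revoked one
# from t=40 is skipped.
assert active_self_signature(sigs, 47)["created"] == 45
# As of t=20, only the t=10 signature exists and is valid.
assert active_self_signature(sigs, 20)["created"] == 10
assert active_self_signature(sigs, 5) is None
```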
If the canonicalized certificate is valid as of the signature's time, not expired as of that time, not soft revoked as of that time, not hard revoked at any time, and the signature is correct, then the signature is considered verified. The implementation MAY consider certificate updates from other sources. If it does, it SHOULD only consider hard revocations.¶
The implementation MUST then check that the type of change is authorized by the policy.¶
The following capabilities allow the specified types of changes:¶
sign_commit: Needed for any change.¶
add_user: Needed to delegate a capability to another user. Updating keyring does not require this capability if a certificate is only updated, and not added.¶
retire_user: Needed to rescind a capability from another user.¶
audit: Needed to modify the version field, and the commit_goodlist list.¶
If the signature is considered verified, and the signer is authorized to make the type of change that was made, then the commit is considered authenticated.¶
If the commit is not considered authenticated because the signer's certificate has been hard revoked, but the commit is included in a later commit's commit_goodlist, then the commit is considered to be authenticated.¶
A commit is considered to occur later if, when authenticating a range of commits, it is a direct descendant of the commit in question and is in the commit range. Consider the three commits a, b, and c, where a is b's parent, b is c's parent, the certificate used to sign b has been hard revoked, and c includes b in its commit_goodlist. In this case, the hard revocation of the certificate used to sign b is ignored. All other criteria, including the fact that the signature on b must be valid, are still checked.¶
Archives like tarballs are often generated as part of a software's release process. These may be signed. To authenticate an archive with respect to a signature and a trust root, the trust root's policy is used to authenticate the tarball's signature. The entity that signed the tarball must have the sign_archive capability.¶
Unlike a commit, an archive does not have a pointer to the commit that it was derived from. Thus, if an archive is derived from commit c, it may be possible to authenticate commit c, as well as tags referring to commit c, using a given trust root, but not to authenticate an archive derived from commit c using the same trust root, because the policy changed in the meantime.¶
If the signature includes the notation commit@notations.sequoia-pgp.org, then the value of the notation is interpreted as the commit that the archive is derived from. The value of the notation is a hexadecimal value corresponding to the commit's full hash. Truncated hashes MUST be considered erroneous. The commit identifier MUST NOT be a branch name, a tag name, or a truncated hash.¶
Since archives are often verified outside of a repository, one or more repositories may be specified using the repository@notations.sequoia-pgp.org notation. In that case, each notation indicates a git repository. For example, the main repository of the reference implementation, sq-git, is https://gitlab.com/sequoia-pgp/sequoia-git.git. So, archives SHOULD include the repository@notations.sequoia-pgp.org notation with https://gitlab.com/sequoia-pgp/sequoia-git.git as the value.¶
When commit@notations.sequoia-pgp.org is present in the signature, the implementation MUST use that commit's policy to authenticate the archive, and then authenticate that commit by chaining back to the trust root, as described above; in this case, it MUST NOT use the trust root's policy directly unless the specified commit is also the trust root.¶
A Rust implementation of this specification is part of Sequoia. See https://gitlab.com/sequoia-pgp/sequoia-git for the source code.¶
The scheme presented here can help mitigate malicious attacks on a code base, but it does nothing to prevent design flaws or code errors. That is, this scheme does not and cannot provide any protections from normal bugs.¶
The protections outlined in this document are mainly designed to stop third-parties from adding malicious code to a project. This system provides no protection from a developer who is authorized to make changes and turns out to be malicious. That said, because commits are signed, when malicious code is discovered, an audit is required to restore trust in the code base. Using this system, it is easier to identify other code added by the same person, and focus an audit on that code.¶
The approach described in this document relies on transitive trust. The basic idea is that if a user is willing to run a developer's code, then they can reasonably rely on that developer to modify the code, and to delegate that capability to a third party.¶
Yet, writing and reviewing code is fundamentally different from evaluating another person's intents. This is demonstrated quite well by the events surrounding the popular event-stream npm package, [event-stream]. In 2018, a new developer gained the trust of the package's maintainer by contributing a number of high-quality changes. The original developer eventually made the new developer the maintainer, and the new maintainer introduced malicious code to steal users' credentials.¶
Signing commits relies on each developer having a long-term identity key, which they keep safe. If the key is compromised, the attacker is able to impersonate the developer. It is possible to limit the damage by revoking the compromised key, or having another authorized user retire the developer's access.¶
In this regard, sigstore appears to be better, as it relies on ephemeral signing keys, which are issued by a central authority. However, in order to obtain a signing key, the user needs to log in. If they use a password, then an attacker who gets access to the password can impersonate the developer. If the developer uses a second factor like a hardware token, then they are again using private key cryptography, and may as well put their private keys on a hardware token, and forgo the centralized infrastructure.¶
This specification has concentrated on enabling a user of a software project to authenticate new versions. But most software has its own dependencies, and those also need to be authenticated. A user could identify all software that they are willing to rely on, but this is more work than most users are willing and able to do. But, just as developers are usually in a better position to evaluate who should be allowed to contribute to their project, they are also in a better position to designate a trust root for their dependencies.¶
Enabling this functionality requires ecosystem-specific tooling. The developer needs to be able to specify a trust root for each dependency, and the build infrastructure needs to authenticate the dependencies. For instance, the Rust ecosystem uses Cargo for building and dependency management. Currently, to add sequoia-openpgp as a dependency to a project, a developer would modify their Cargo.toml file as follows:¶
[dependencies]
sequoia-openpgp = { version = "1" }¶
Instead, they would also specify a trust root, which they've presumably audited:¶
[dependencies]
sequoia-openpgp = { version = "1", trust-root = "HASH" }¶
When downloading the dependency, cargo would make sure that the dependency can be authenticated from the specified trust root, and, if not, throw an error.¶
This is a first draft that has not been published.¶
Our thanks go---in particular, but not only---to the Sequoia PGP team for many fruitful discussions. Funding for this project was provided by the Sovereign Tech Fund.¶