Highlights from Git 2.25


The open source Git project just released Git 2.25 with features and bug fixes from over 84 contributors, 32 of them new. Here’s our look at some of the most exciting features and changes introduced since Git 2.24.

Sparse checkout management made easy

In the past few releases, you might have read mentions of topics like “partial clone support” and “sparse checkouts” in blog posts such as these. In 2.25, Git takes another step closer to bringing mature and configurable partial clone support to all users.

What are “partial clones”?

Before we dive into the new changes, it’s worth taking some time to discuss what partial clones are, and where they’re at today.

A clone of a Git repository copies all of its data: every version of every file in the history. For very large repositories, the cost of network transfer and local storage can make this awkward or even impossible, even if you’re only interested in a subset of the files. In the past several versions, Git learned the ability to execute a “partial” clone, which means that it can now clone and work with repositories without having all of their contents.

Partial clones are still considered an experimental feature from Git’s point of view. For instance, many providers (such as GitHub) don’t support this feature yet, and it’s continually changing and evolving within Git from release to release.

For now, let’s focus on better understanding partial cloning by reviewing the perspective from both the server and client. The client must do two things. First, it must be able to tell the server that it wants only some objects from a repository. Second, it must be able to tolerate local repositories which lack a complete set of objects. The server, on the other hand, must be able to interpret the client’s request to serve only some objects, and be able to generate an adequate response.

How does this get done today? Let’s say that your repository has a manageable amount of history, but too many files to fit comfortably on your hard drive. In this case, you might want to clone only part of a repository’s contents, by executing something like:

$ git clone --filter=blob:none --no-checkout /your/repository/here

Let’s break down what that means. First, specifying --filter= lets you tell the server you’re cloning from which objects you want. (In our example, we asked the server to avoid sending us blobs, but you can use a number of other filters.) Next, we have to tell Git that it can skip checking out the repository after it receives a response from the server. Why? Because if Git tries to check out the contents, it will realize that it has missing objects, and try to request them from the server. We can prevent this from happening with --no-checkout, which tells Git to avoid checking out the repository entirely.
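To make that concrete, here’s a minimal sketch that you can run entirely on your own machine. It assumes a throwaway “server” repository created in a temporary directory (the names server, client, one.txt, and two.txt are made up for illustration), and it assumes the server side has opted in to filtering via the uploadpack.allowFilter and uploadpack.allowAnySHA1InWant settings.

```shell
# Hypothetical scratch locations; nothing here touches a real repository.
work=$(mktemp -d)
server="$work/server"
client="$work/client"

# Build a tiny "server" repository holding two files (two distinct blobs).
git init -q "$server"
git -C "$server" config user.name "Example Author"
git -C "$server" config user.email "author@example.com"
echo "first file" > "$server/one.txt"
echo "second file" > "$server/two.txt"
git -C "$server" add one.txt two.txt
git -C "$server" commit -q -m "Add two files"

# The serving side has to opt in before it honors --filter requests
# and before it hands out individual objects on demand later.
git -C "$server" config uploadpack.allowFilter true
git -C "$server" config uploadpack.allowAnySHA1InWant true

# Partial clone: fetch commits and trees, but no blobs, and skip the checkout.
git clone -q --filter=blob:none --no-checkout "file://$server" "$client"

# Count the objects reachable from HEAD that are absent from the client.
missing=$(git -C "$client" rev-list --objects --missing=print HEAD | grep -c '^?')
echo "objects missing locally: $missing"
```

The two blobs never left the server, which is exactly why a plain checkout at this point would have to go back and ask for them.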

Now we have a repository on disk that has some of the objects from the server, but none of them are checked out to be read/written to/etc. What do we do now? Somehow, we have to tell Git which objects are okay to skip when checking out the repository to be able to actually check out. Thankfully, we can use a sparse checkout in order to make this happen.

Sparse checkouts

A sparse checkout is nothing more than a list of file path patterns that Git should attempt to populate in your working copy when checking out the contents of your repository. Effectively, it works like a .gitignore, except it acts on the contents of your working copy, rather than on your index. The downside is that sparse checkouts can be rather difficult to specify. For instance, here’s the incantation to avoid checking out files having depth two or greater:

$ git clone --filter=blob:none --no-checkout /your/repository/here repo
$ cd repo
$ cat >.git/info/sparse-checkout <<EOF
/*
!/*/
EOF
$ git config core.sparseCheckout 1
$ git checkout .
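If you’d like to see those two patterns at work without a partial clone, here’s a rough, self-contained sketch against a scratch repository. The file names (README, src/main.c) are invented for the example, and it uses git read-tree -mu HEAD, the long-standing way to re-apply sparse patterns to an already populated working copy, rather than a fresh checkout.

```shell
# Hypothetical scratch repository with one root file and one nested file.
work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
git -C "$repo" config user.name "Example Author"
git -C "$repo" config user.email "author@example.com"
mkdir -p "$repo/src"
echo "docs" > "$repo/README"
echo "int main(void) { return 0; }" > "$repo/src/main.c"
git -C "$repo" add README src/main.c
git -C "$repo" commit -q -m "Initial commit"

# Keep everything at the root, but exclude every directory below it.
mkdir -p "$repo/.git/info"
cat > "$repo/.git/info/sparse-checkout" <<EOF
/*
!/*/
EOF
git -C "$repo" config core.sparseCheckout true

# Re-read HEAD into the index and working copy so the patterns take effect.
git -C "$repo" read-tree -mu HEAD
ls "$repo"
```

After this runs, README is still on disk, while src/main.c stays in the index but is no longer populated in the working copy.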

Partial clones and git sparse-checkout

In Git 2.25, the previous example gets a lot easier with the introduction of a new git sparse-checkout command. We’ll review the new features shortly, but to give you a sense of how your workflow might change, here’s the same example using git sparse-checkout:

$ git clone --filter=blob:none --sparse /your/repository/here repo

The idea behind the git sparse-checkout command is simple: allow users to play with partial clones and sparse-checkouts as easily as possible. It can do four things: set the list of paths to check out, print the current list, and enable or disable sparse checkouts entirely. (Note: That’s the set, list, init, and disable subcommands, respectively).

Instead of writing complicated .gitignore-style patterns into .git/info/sparse-checkout, git sparse-checkout handles the work for you. To check out a new path, simply execute the following:

$ git sparse-checkout set /path/to/check/out

However, with great power comes great responsibility. If you have both an exceptionally large repository and an exceptionally long list of sparse-checkout patterns, it may take Git a substantial time to compute whether a given path does or doesn’t need to be checked out.

Thankfully, git sparse-checkout again comes to our rescue with “cone mode”. When working in cone mode (opted into by running git config core.sparseCheckoutCone true), the set of allowed patterns becomes more restrictive. Instead of arbitrary .gitignore patterns, you specify directories: Git checks out either everything beneath a directory (recursively), or only the files directly inside it.

For example, if you have a directory A/B/C within a large repository, and C is where you do most of your work, you’ll probably want to have C fully checked out. You’ll also want to have A and B checked out enough so that you can get to C, but not much more. When in cone mode, git sparse-checkout set A/B/C will do exactly that. To learn more about cone mode, check out the documentation.
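Here’s a small sketch of that A/B/C scenario on a scratch repository. The extra paths (top.txt, A/B/sibling, D, and so on) are hypothetical and only exist to show what does and doesn’t end up on disk. It enables cone mode with git sparse-checkout init --cone, which is one way to opt in.

```shell
# Hypothetical scratch repository laid out around the A/B/C example.
work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
git -C "$repo" config user.name "Example Author"
git -C "$repo" config user.email "author@example.com"
mkdir -p "$repo/A/B/C" "$repo/A/B/sibling" "$repo/D"
for f in top.txt A/a.txt A/B/b.txt A/B/C/work.txt A/B/sibling/s.txt D/d.txt; do
    echo "$f" > "$repo/$f"
done
git -C "$repo" add .
git -C "$repo" commit -q -m "Initial commit"

# Turn on sparse checkouts in cone mode, then ask for just A/B/C.
git -C "$repo" sparse-checkout init --cone
git -C "$repo" sparse-checkout set A/B/C

# Show what Git recorded for the sparse checkout.
git -C "$repo" sparse-checkout list
```

With this in place, A/B/C is fully populated, the files sitting directly in the root, A, and A/B are present so you can get to C, and both A/B/sibling and D are left out of the working copy.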

For now the git sparse-checkout command is experimental, and its behavior is subject to change. Likewise, many providers (including GitHub) are still experimenting with partial clone support and it’s not yet generally available. We’ll make sure to keep you updated with the progress of both.

In the meantime, we’re publishing an in-depth overview of the sparse checkout feature by the author, Derrick Stolee. We’ll have lots more information about different workflows including cone mode—check back within the next couple of days, and we’ll share the link once it’s published.

Tidbits

Several blog posts ago, we talked about the --rebase-merges option, which is used to preserve the branch structure of your repository when rebasing. In v2.22.0 the option that --rebase-merges replaced, --preserve-merges, was deprecated. This release takes that deprecation even further by removing all mentions of --preserve-merges from the help text for git rebase.

If you’re still relying on scripts that use git rebase --preserve-merges, this release is a good time to update them.
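If you’re wondering what the replacement looks like in a script, here’s a rough sketch on a scratch repository. The branch names (main, feature, side) and the commit_file helper are made up for the example; the only piece that matters is the git rebase --rebase-merges call near the end.

```shell
# Hypothetical scratch repository; branch and file names are illustrative only.
work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" config user.name "Example Author"
git -C "$repo" config user.email "author@example.com"

# Helper (not part of Git): commit a one-line file named after its argument.
commit_file() {
    echo "$1" > "$repo/$1.txt"
    git -C "$repo" add "$1.txt"
    git -C "$repo" commit -q -m "$1"
}

commit_file base
git -C "$repo" checkout -q -b feature
commit_file f1
git -C "$repo" checkout -q -b side
commit_file s1
git -C "$repo" checkout -q feature
commit_file f2
git -C "$repo" merge -q --no-ff -m "Merge side into feature" side

# Meanwhile, main moves ahead.
git -C "$repo" checkout -q main
commit_file upstream

# Rebase feature onto main while recreating the merge commit.
git -C "$repo" checkout -q feature
git -C "$repo" rebase -q --rebase-merges main
merges=$(git -C "$repo" rev-list --count --merges main..feature)
echo "merge commits kept on feature: $merges"
```

A plain git rebase main would have flattened feature into a straight line; with --rebase-merges the merge of side is recreated on top of the new base.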

[source]

Even though pull requests and issues may feel familiar to us on GitHub, the Git project itself works differently and uses a mailing list to email patches back and forth.

A feature Git has that makes this workflow easier is called branch descriptions. A branch’s description is used to fill the cover letter when sending a series of patches, and can be useful if you wish to send multiple versions of the same patches.

Git can now be instructed to use a branch description’s first paragraph to fill in the placeholder value for the Subject: header in the cover-letter email. To tell Git to do this, use git format-patch --cover-from-description subject.
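As a quick illustration, here’s a sketch that sets a branch description directly through the branch.<name>.description configuration key and then generates a cover letter from it. The branch name topic, the description text, and the output directory are all invented for the example.

```shell
# Hypothetical scratch repository with a one-commit topic branch.
work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" config user.name "Example Author"
git -C "$repo" config user.email "author@example.com"
echo "base" > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m "Initial commit"
git -C "$repo" checkout -q -b topic
echo "change" >> "$repo/file.txt"
git -C "$repo" commit -q -a -m "Teach frobnicator a new trick"

# The first paragraph becomes the subject; the rest becomes the letter body.
git -C "$repo" config branch.topic.description "Add frobnicator support

This series is a made-up example for the cover letter."

# Generate the patches plus a cover letter for everything not yet in main.
git -C "$repo" format-patch -q --cover-letter \
    --cover-from-description subject -o "$work/patches" main
grep '^Subject:' "$work/patches/0000-cover-letter.patch"
```

The Subject: line of the cover letter now carries “Add frobnicator support” instead of the usual placeholder text.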

[source]

Here’s a pair of Git features that you might not have known about: git apply --3way and the merge.conflictStyle setting.  You may have used git apply to apply a *.patch file to your repository, and perhaps even the --3way option to leave yourself in a conflict resolution state when the patch didn’t apply cleanly. Likewise, the latter configuration value is used to control how Git formats merge conflicts for you to resolve.

Now in Git 2.25, the two can work together so that git apply honors the conflict style you’ve set when it encounters patches that require merge conflict resolution before applying.
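Here’s a small sketch of the two working together, assuming Git 2.25 or newer. It manufactures a conflict in a scratch repository (greeting.txt and change.patch are hypothetical names) and applies the patch with merge.conflictStyle set to diff3 for that one command.

```shell
# Hypothetical scratch repository with a single one-line file.
work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
git -C "$repo" config user.name "Example Author"
git -C "$repo" config user.email "author@example.com"
echo "hello" > "$repo/greeting.txt"
git -C "$repo" add greeting.txt
git -C "$repo" commit -q -m "Initial commit"

# Capture one edit as a patch, then throw the edit away.
echo "hello world" > "$repo/greeting.txt"
git -C "$repo" diff --full-index > "$work/change.patch"
git -C "$repo" checkout -q -- greeting.txt

# Commit a different edit to the same line so the patch no longer applies cleanly.
echo "hello there" > "$repo/greeting.txt"
git -C "$repo" commit -q -a -m "Local edit"

# Apply with a three-way fallback; a conflict makes git apply exit non-zero.
git -C "$repo" -c merge.conflictStyle=diff3 apply --3way "$work/change.patch" || true
cat "$repo/greeting.txt"
```

With the diff3 style in effect, the conflicted file includes the extra ||||||| section showing the common base, in addition to the usual two sides.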

[source]

You may recall from our Git 2.23 blog post that Git supports function signature detection for features like git <diff|grep> --show-function and --function-context. (To use these, you’ll need to mark the file type using one of your repository’s .gitattributes files).

In Git 2.25, support has been improved to also detect function boundaries for programs written in the Elixir language.
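If you want to try this out, here’s a minimal sketch of marking Elixir files in .gitattributes. The module and file names are invented; the only part that matters is the diff=elixir attribute, which you can confirm with git check-attr.

```shell
# Hypothetical scratch repository containing one Elixir source file.
work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
mkdir -p "$repo/lib"
cat > "$repo/lib/greeter.ex" <<'EOF'
defmodule Greeter do
  def hello(name) do
    IO.puts("Hello, #{name}")
  end
end
EOF

# Tell Git to treat .ex and .exs files as Elixir when looking for functions.
printf '*.ex diff=elixir\n*.exs diff=elixir\n' > "$repo/.gitattributes"
git -C "$repo" add .gitattributes lib/greeter.ex

# Confirm which diff driver Git will use for the file.
attr=$(git -C "$repo" check-attr diff -- lib/greeter.ex)
echo "$attr"

# Ask git grep to print the enclosing function alongside each match.
git -C "$repo" grep --show-function -n 'IO.puts' -- lib/greeter.ex
```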

[source]

Many commands that take a pathspec (git add, git commit, git reset, and so on) now understand a new option, --pathspec-from-file. If you have many pathspecs to pass to one of these commands, you might write git add $(cat your-pathspecs). If your-pathspecs is too long, you might instead use xargs, which works fine in this example since xargs will simply run git add more than once. But what if your command is git commit, where running it more than once would create more than one commit?

Now, you can write git commit --pathspec-from-file=your-pathspecs and list as many pathspecs in that file as you like, which can be handy if you’re scripting around Git in especially large repositories.
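Here’s a quick sketch of that in practice, assuming Git 2.25 or newer. Three hypothetical files are modified, but only the two listed in paths.txt (also a made-up name) go into the commit.

```shell
# Hypothetical scratch repository with three tracked files.
work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
git -C "$repo" config user.name "Example Author"
git -C "$repo" config user.email "author@example.com"
for f in a b c; do echo "one" > "$repo/$f.txt"; done
git -C "$repo" add a.txt b.txt c.txt
git -C "$repo" commit -q -m "Initial commit"

# Modify all three files, but list only two of them in the pathspec file.
for f in a b c; do echo "two" >> "$repo/$f.txt"; done
printf 'a.txt\nb.txt\n' > "$work/paths.txt"

# A single commit, with its pathspecs read from the file (one per line).
git -C "$repo" commit -q --pathspec-from-file="$work/paths.txt" -m "Update a and b only"

committed=$(git -C "$repo" diff-tree --no-commit-id --name-only -r HEAD)
pending=$(git -C "$repo" diff --name-only)
echo "committed: $committed"
echo "still modified: $pending"
```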

[source, source, source]

In an older blog post, we talked about Git’s ability to detect renames at the directory level when writing your commits. Until now there’s been a subtle bug that caused this detection mechanism to fail when the contents of a subdirectory moved to the root of your repository. In Git 2.25, this bug has been squashed.

[source]

One of the very first Git commands you likely learned was git add. One particularly neat way to use this command that you may not have learned is with the -p option.

When invoked with -p, git add splits the changes you’re trying to stage into pieces (colloquially, “hunks”) and asks you whether or not you want to stage each one. This is really useful if you want to split the changes from your working copy into multiple commits.

Some Git trivia: this command has, since its inception, been backed by Perl. In 2018, an effort began to rewrite the engine powering git add -i (the interactive mode that -p is part of) in C, like the majority of the rest of Git. Cooler still is that this project came out of an Outreachy internship. This work is still waiting on a few remaining changes to make git add -p feature-complete, but expect those features soon.

[source, source, source, source]

You may have used git log --graph to look at an ASCII rendering of the graph of history in your repository. If you’ve ever used this on a particularly large repository with a lot of long-running branches, the output may have filled the width of your terminal.

In Git 2.25, this command got a lot of love: a careful refactoring made it possible to significantly improve and simplify the output of git log --graph while still being faithful to the structure of history.

The before-and-after shots don’t quite fit here, but they’re too cool to ignore. So, if you’re into gratuitously awesome ASCII art, it’s the place to be.

[source]

Why stop at just one git log tidbit? If you wanted more, here’s another one. Back when Git 2.22 was released, we talked about ways to change the output of your log with git log --format=.... In Git 2.25, --format learned the placeholders %al and %aL (and %cl and %cL for the committer), which print only the part of an email address preceding the @[1].

If you’re working on a repository where everybody at your company shares the same email domain, this can be useful for seeing usernames without wasting space printing the same domain over and over. If you’re curious, try git log --format='%h %C(cyan)%al %C(yellow)%s' on one of your company’s repositories.
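Here’s a tiny sketch you can run without a company repository handy, assuming Git 2.25 or newer. The author identity (Mona Lisa, mona@example.com) is made up for the example.

```shell
# Hypothetical scratch repository with a single empty commit.
work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
git -C "$repo" config user.name "Mona Lisa"
git -C "$repo" config user.email "mona@example.com"
git -C "$repo" commit -q --allow-empty -m "Initial commit"

# %ae prints the whole author email; %al prints only the part before the @.
full=$(git -C "$repo" log -1 --format='%ae')
local_part=$(git -C "$repo" log -1 --format='%al')
echo "$full -> $local_part"

# The one-liner from above, with the quoting completed.
git -C "$repo" log --format='%h %C(cyan)%al %C(yellow)%s'
```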

[source]

Learn more

That’s just a sample of changes from the latest version. Check out the release notes for 2.25 or any previous version in the Git repository.

[1]: The casing on this verb indicates whether to show the email address with or without applying the .mailmap translation(s).