Tuesday, August 18, 2020

Cleaning up stale git branches

Broken branches

If you have pull-requests, you likely have stale branches. What’s the best way to find which branches can be safely deleted? This post will explore some approaches to find and delete stale branches.

As there are many different ways we can approach this, I’m going to start with the most generic concepts and build up to a more programmatic solution. I want to use the Azure DevOps CLI and REST APIs for this instead of git-centric commands because I want to be able to run the scripts from any computer against the latest version of the repository. This also opens up the possibility of running these activities in a PowerShell script in an Azure Pipeline, as outlined in my previous post.

Table of Contents

Who can delete Branches?

One of the reasons branches don’t get deleted might be a permissions problem. In Azure DevOps, the default permission settings set up the creator of the branch with the permission to delete it. If you don’t have permission, the option isn’t available to you in the user-interface, and this creates a missed opportunity to remediate the issue when someone other than the author completes the PR.

pr-complete-merge-disabled-delete

pr-delete-source-branch

You can change the default setting by adding the appropriate user or group to the repository’s permissions and the existing branches will inherit. You’ll need the “Force push (rewrite history, delete branches and tags)” permission to delete the branch.  See my last post on ways to apply this policy programmatically.

branch-permission-forcepush

If we want to run this from a pipeline, we would have to grant the Build Service the same permissions.

Finding the Author of a Branch

One approach to cleaning-up stale branches is the old fashion way: nagging. Simply track down the branch authors and ask them to determine if they’re done with them.

We can find the authors for the branches with the following PowerShell + Az DevOps CLI:

$project    = "<project-name>"
$repository = "<repository-name>"

$refs = az repos ref list `
--query "[?starts_with(name, 'refs/heads')].{name:name, uniqueName:creator.uniqueName}" ` --project $project --repository $repository | ConvertFrom-Json $refs | Sort-Object -Property uniqueName

az-repos-ref-list-example1

Finding the Last Contributor to a Branch

Sometimes, the author of the branch isn’t the person doing the work. If this is the case, you need to track down the last person to commit against the branch. This information is available in the Azure DevOps user interface (Repos –> Branches):

branch-authors

If you want to obtain this information programmatically, az repos list ref provides us with the objectId SHA-1 of the most recent commit. Although az repos doesn’t expose a command to retrieve the commit details, we can use the az devops invoke command to call the Get Commit REST endpoint.

When we fetch the detailed information on the branch, we want to get the author that created the commit and the date that they pushed it to the server. We want the push details because a developer may have made the commit a long time ago but only recently updated the branch.

$project    = "<project-name>"
$repository = "<repository-name>"

$refs = az repos ref list –p $project –r $repository --filter heads | ConvertFrom-Json

$results = @()

foreach($ref in $refs) {

$objectId = $ref.objectId
# fetch individual commit details $commit = az devops invoke `
--area git `
--resource commits ` --route-parameters ` project=$project ` repositoryId=$repository ` commitId=$objectId |
ConvertFrom-Json $result = [PSCustomObject]@{ name = $ref.name creator = $ref.creator.uniqueName lastAuthor = $commit.committer.email
lastModified = $commit.push.date } $results += ,$result } $results | Sort-Object -Property lastAuthor

az-repos-ref-list-example

This gives us a lot of details for the last commit in each branch, but if you’ve got a lot of branches, fetching each commit individually could be really slow. So, instead we can use the same Get Commit endpoint to fetch the commit information in batches by providing a collection of objectIds in a comma-delimited format.

Note that there’s a limit to how many commits we can ask for at a time, so I’ll have to batch my batches. I could also use the Get Commits Batch endpoint that accepts the list of ids in the body of a POST message.

The following shows me batching 50 commits at a time. Your batch size may vary if the name of your server or organization name is a longer length:

$batchSize = 50
$batches = [Math]::Ceiling($refs.Length / $batchSize)

for( $x=0; $x -lt $batches; $x++ )
{
# take a batch $batch = $refs | Select-Object -First $batchSize -Skip ($x * $batchSize)

# grab the ids for the batch $ids = ($batch | ForEach-Object {$_.objectId}) -join ','

# ask for the commit details for these items $commits = az devops invoke --area git --resource commits ` --route-parameters ` project=$project `
repositoryId=$repository ` --query-parameters ` searchCriteria.ids=$ids ` searchCriteria.includePushData=true ` --query "value[]" | ConvertFrom-Json
# loop through this batch of commits for($i=0; $i -lt $commits.Length; $i++) {
$ref = $refs[($x*$batchSize)+$i] $commit = $commits[$i]

# add commit information to the batch $ref | Add-Member -Name "author" -Value $commit.author.email -MemberType NoteProperty $ref | Add-Member -Name "lastModified" -Value $commit.push.date -MemberType NoteProperty

# add the creator’s email on here for easier access in the select-object statement...
$ref | Add-Member –Name "uniqueName” –Value $ref.creator.uniqueName –MemberType NoteProperty } } $refs | Select-Object -Property name,creator,author,lastModified

Caveat about this approach: If you've updated the source branch by merging from the target branch, the last author will be from that target branch – which isn’t what we want. Even worse, there's no way to infer this scenario from the git commit details alone. One way we can solve this problem is to fetch the commits in the branch and walk up the parents of the commit until we find a commit that has more than one parent – this would be our merge commit from the target branch, which should have been done by the last author on the branch. Note that the parent information is only available if you query these items one-by-one, so this approach could be painfully slow. (If you know a better approach, let me know)

Check the Expiry

Now that we have date information associated to our branches, we can start to filter out the branches that should be considered stale. In my opinion anything that’s older than 3 weeks is a good starting point.

$date = [DateTime]::Today.AddDays( -21 )
$refs = $refs | Where-Object { $_.lastModified -lt $date }

Your kilometrage will obviously vary based on the volume of work in your repository, but on a recent project 10-20% of the branches were created recently.

Finding Branches that have Completed Pull-Requests

If you’re squashing your commits when your merge, you’ll find that the ahead / behind feature in the Azure DevOps UI is completely unreliable. This is because a squash merge re-writes history, so your commits in the branch will never appear in the target-branch at all. Microsoft recommends deleting the source branch when using this strategy as there is little value in keeping these branches around after the PR is completed. Teams may argue that they want to cherry-pick individual commits from the source-branch, but the practicality of that requires pristine

Our best bet to find stale branches is to look at the Pull-Request history and consider all branches that are associated to completed Pull-Requests as candidates for deletion. This is super easy, barely an inconvenience.

$prs = az repos pr list `
          --project $project `
          --repository $repository `
          --target-branch develop `
          --status completed `
--query "[].sourceRefName" | ConvertFrom-Json | $refs | Where-Object { $prs.Contains( $_.name ) } | ForEach-Object { $result = az repos ref delete ` --name $_.name ` --object-id $_.id ` --project $project ` --repository $repository | ConvertFrom-Json
Write-Host ("Success Message: {0}" –f $result.updateStatus) }

At first glance, this would remove about 50% of the remaining branches in our repository, leaving us with 10-20% recent branches and an additional 30-40% of branches without PRs. This is roughly a 40% reduction, and I’ll take that for now. It’s important to recognize this only includes the completed PRs, not the active or abandoned.

Wrapping Up

Using a combination of these techniques we could easily reduce the amount of stale branches, and then provide the remaining list to the team to have them clean-up the dredges. The majority of old branches are likely abandoned work, but there's sometimes scenarios where partially completed work is waiting on some external dependency. In that scenario, encourage the team to keep these important branches up-to-date to retain the value of the invested effort.

The best overall strategy is to adopt a strategy that does not let this situation occur: give the individuals reviewing and completing PRs the permission to delete branches and encourage teams to squash and delete branches as they complete PRs. Good habits create good hygiene.

Happy coding.

0 comments: