SSH Agent with Powershell

Another "note to my future self", recorded here for posterity.

Today I finally got around to making SSH agent work in Powershell with Git.  For the last year or so, I haven't had to deal with it because my work mostly involved writing PHP code inside of WSL.  In that scenario, you're essentially using Linux, so ssh-agent works just fine.

But on native Windows with Powershell...not so much.

I mean, sure, I could just use Git BASH.  But why would I want to do that?  Powershell is awesome, and even if I probably know BASH better, I prefer Powershell.

But it turns out it wasn't all that difficult to get things working in Powershell.  There were two pieces to it:

  1. Use the right SSH agent (i.e. the one that comes with Git).
  2. Write a Powershell adapter for it.

It turns out that there's a Windows service for "OpenSSH Authentication Agent".  I'm not entirely sure what installs that, but apparently it's different from the ssh-agent that's installed with the Windows Git package, and the ssh-add command that comes with Git doesn't seem to talk to it properly.

My solution was to just disable that service and use the plain-old ssh-agent that comes with Git.  The only problem with that is that the traditional invocation of eval `ssh-agent` doesn't work in Powershell because it outputs a BASH-formatted script.  But that's easily fixed with a couple of regular expressions.  So I added this to my Powershell $profile:

Set-Alias ssh-agent "$env:ProgramFiles\git\usr\bin\ssh-agent.exe"
Set-Alias ssh-add "$env:ProgramFiles\git\usr\bin\ssh-add.exe"

# Need to turn off the OpenSSH Authentication Agent system service,
# then run the ssh-agent from Git.
function Start-SshAgent {
    # ssh-agent prints a BASH script that sets SSH_AUTH_SOCK and SSH_AGENT_PID.
    $output = (ssh-agent)
    # Pull the socket path and the agent PID out of that script.
    $sock_match = ($output | Select-String -Pattern 'SSH_AUTH_SOCK=(\S+);')
    $sock_path = $sock_match[0].Matches.Groups[1].Value
    $pid_match = ($output | Select-String -Pattern 'SSH_AGENT_PID=(\d+);')
    $agent_pid = $pid_match[0].Matches.Groups[1].Value
    # Set both values for this session so ssh-add and Git can find the agent.
    $env:SSH_AUTH_SOCK = $sock_path
    $env:SSH_AGENT_PID = $agent_pid
    Write-Output "Agent pid $agent_pid"
}

Start-SshAgent

And there we go!  Now I can just run ssh-add to add my key and Git picks it up as expected.
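For reference, here's roughly what those two regular expressions are picking apart.  This is just a sketch in plain shell, not part of my profile: the sample text follows the format ssh-agent prints, but the socket path and PID are made-up values, and sed stands in for Select-String.

```shell
#!/bin/sh
# Sample of the Bourne-style script that ssh-agent prints.
# The socket path and PID below are made-up values for illustration.
sample='SSH_AUTH_SOCK=/tmp/ssh-EXAMPLE/agent.1234; export SSH_AUTH_SOCK;
SSH_AGENT_PID=1235; export SSH_AGENT_PID;
echo Agent pid 1235;'

# The same two captures as in the Powershell function, expressed with sed.
sock_path=$(printf '%s\n' "$sample" | sed -n 's/^SSH_AUTH_SOCK=\([^;]*\);.*/\1/p')
agent_pid=$(printf '%s\n' "$sample" | sed -n 's/^SSH_AGENT_PID=\([0-9]*\);.*/\1/p')

echo "socket: $sock_path"
echo "pid:    $agent_pid"
```

In BASH you never see any of this because eval just runs the script; in Powershell the two values have to be pulled out by hand.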

On GitHub pipelines and diverging branches

Just before Christmas I started a new job.  I won't get into the details, but my new team has a different workflow than I'm used to, and the other day I noticed a problem with it.  My colleague suggested that the experience might make for good blog-fodder, so here's the break-down.

First, let me start by describing the workflow I was used to at my last job.  We had a private GitLab instance and used a forking workflow, so getting a change into production went like this:

  1. Developer forks the main repo to their GitLab account.
  2. Developer does their thing and makes a bunch of commits.
  3. Developer opens a merge request from their fork's branch to the main repo's main branch.  Code review ensues.
  4. When review is complete, the QA team pulls down the code from the developer's fork and tests in a QA environment.  Obviously testing needs differ, but a "QA environment" is generally exactly the same thing as a developer environment (in this case, a set of disposable OpenStack VMs).
  5. When testing is complete, the merge request gets merged and the code will go out in the next release (whenever that is - we didn't do continuous deployment).
  6. Every night, a set of system-level tests runs against a production-like setup that uses the main branches of all the relevant repos.  Any failures get investigated by developers and QA the next morning.

I'm sure many people would quibble with various parts of this process, and I'm not going to claim there weren't problems, but it worked well enough.  But the key feature to note here is the simplicity of the branching setup.  It's basically a two-step process: you fork from X and then merge back to X.  You might have to pull in new changes along the way, but everything gets reconciled sooner or later.

The new team's process is not like that.  We use GitHub, and instead of one main branch, there are three branches to deal with: dev, test, and master, with deployment jobs linked to dev and test.  And in this case, the merging only goes in one direction.  So a typical workflow would go like this:

  1. Developer creates a branch off of master, call it "feature-X".
  2. Developer does their thing and makes a bunch of commits.
  3. Developer opens a pull request from feature-X to dev and code review ensues.
  4. When the pull request is approved, the developer merges it and the dev branch code is automatically deployed to a shared development environment where the developer can test it.  (This might not be necessary in all cases, e.g. if local testing is sufficient.)
  5. When the developer is ready to hand the code off to QA, they open a pull request from feature-X to test.  Again, review ensues.
  6. When review is done, the pull request gets merged and the test branch code is automatically deployed to test, where QA pokes at it.
  7. When QA is done, the developer opens a pull request from feature-X to master and (drum roll) review ensues.
  8. When the master pull request is approved, the code is merged and is ready to be deployed in the next release, which is a manual (but pretty frequent) process.

You might notice something odd here - we're only ever merging to dev and test, never from them.  There are occasionally merges from master to those branches, but never the other way around.  Now, in theory this should be fine, right?  As long as everything gets merged to all three branches in the same order, they'll end up with the same code.  Granted, it's three times as many pull requests to review as you really need, but other than that it should work.

Unfortunately, theory rarely matches practice.  In fact, the three branches end up diverging - sometimes wildly.  On a large team, this is easy to do by accident - Bob and Joe are both working on features, Bob gets his code merged to test first, but testing takes a long time, so Joe's code gets out of QA and into master first.  So if there are any conflicts, you have the potential for things like inconsistent resolutions.  But in our case, I found a bunch of code that was committed to the dev branch and just never made it out to test or master.  In some cases, it even looks like this was intentional.

So this creates an obvious procedural issue: the code you test in QA is not necessarily the same as what ends up in production.  This may be fine, or it may not - it depends on how the code diverges.  But it still creates an obvious risk because you don't know if the code you're releasing is actually the same as what you validated.
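If you want to see how far your own branches have drifted, Git can tell you.  Here's a minimal sketch that builds a throwaway repo and measures the divergence.  Only the branch names master and dev come from our setup; the file names and commit messages are invented for the example.

```shell
#!/bin/sh
set -e
# Throwaway repository that mimics the master/dev layout (names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Example Dev"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# A change that lands on dev and never makes it to master.
git checkout -q -b dev
echo "dev only" > dev-only.txt
git add dev-only.txt
git commit -q -m "Change merged only to dev"

# Meanwhile, a different change is released through master.
git checkout -q master
echo "released" > released.txt
git add released.txt
git commit -q -m "Change released via master"

# Count the commits unique to each side: "<only on master> <only on dev>".
counts=$(git rev-list --left-right --count master...dev)
echo "only on master / only on dev: $counts"

# List the commits that are on dev but never reached master.
stranded=$(git log --format=%s master..dev)
echo "$stranded"
```

In a real repo you'd only need the last two commands.  Anything listed by the second one is code that was deployed to the dev environment but is not part of what gets released.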

But it gets worse.  This also creates issues with the GitHub pipeline, which is where we get to the next part of the story.

Our GitHub pipelines are set up to run on both "push" and "pull_request" events.  We ended up having to do both in order to avoid spurious error reporting from CodeQL, but that's a different story.  The key thing to notice here is that, by default, GitHub "pull_request" jobs don't run against the source branch of your pull request; they run against a merge of the source and target branches.  Which, when you think about it, is probably what you want.  That way you can be confident that the merged code will pass your checks.

If you're following closely, the problem may be evident at this point - the original code is based on master, but it needs to be merged to dev and test, which diverge from master.  That means that you can get into a situation where a change introduces breakage in code from the target branch that isn't even present in the source branch.  This makes it very hard to fix the pipeline.  Your only real choice at that point is to make another branch off the target branch, merge your code into that, and then re-create the pull request with the new merged branch.  This is annoying and awkward at best.
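For the record, that workaround looks something like the sketch below, again in a throwaway repo.  The branch name feature-X-dev and the file names are hypothetical; use whatever naming your team prefers.

```shell
#!/bin/sh
set -e
# Throwaway repository; file names and commit messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Example Dev"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# dev carries code that master does not have.
git checkout -q -b dev
echo "dev only" > dev-only.txt
git add dev-only.txt
git commit -q -m "Change merged only to dev"

# The feature branch is cut from master, as in the workflow above.
git checkout -q -b feature-X master
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature X"

# Workaround: branch off the target (dev), then merge the feature into it.
git checkout -q -b feature-X-dev dev
git merge -q --no-edit feature-X

# feature-X-dev now contains both sides, so breakage that only shows up in
# the merged code can be fixed here before opening the pull request to dev.
merged_files=$(git ls-files | sort | tr '\n' ' ')
echo "$merged_files"
```

The catch is that feature-X-dev now contains everything that's on dev, so it can only ever be merged to dev.  You need a separate branch like this for each target.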

But it gets worse than that, because it turns out that your pipeline might report success, even if the merge result would be broken!  This appears to be a GitHub issue and it can be triggered simply by creating pull requests.  

The easiest way to explain is probably by describing the situation I actually ran into.  I had a change in my feature-X branch and wanted to go through our normal process, which involves creating three pull requests.  But in my case, this was just a pipeline change (specifically, adding PHPStan analysis), so it didn't require any testing in dev or test.  Once it was approved, it could be merged immediately.  So here's what I did:

  1. First, I created a pull request against dev.  The "pull_request" pipeline here actually failed, because there was a bunch of code in the dev branch that violated the PHPStan rules and wasn't in master, so I couldn't even fix it.  Crud.
  2. After messing around with dev for a while, I decided to come back to that and just create the pull requests for test and master.
  3. So I created the pull request for test.  That failed due to drift from master as well.  Double crud.
  4. Then I created the pull request for master.  That succeeded, as expected, since it was branched from master.  So at least one of them was reviewable.
  5. Then I went back and looked at the dev pull request and discovered that the "pull_request" pipeline job now reported as passing!

Let me say that more explicitly: the "pull_request" job on my pipeline went from "fail" to "pass" because I created a different pull request for the same branch.  There was no code change or additional commits involved.

Needless to say, this is very bad.  The point of running the pipeline on a pull request is to verify that it's safe to merge.  But if just doing things in the wrong order can change a "fail" to a "pass", that means that I can't trust the results of my GitHub pipeline - which defeats the entire purpose of having it!

As for why this happens, I'm not really certain.  But from my testing, it looks like GitHub ties the results of the "pull_request" job to the last commit on the source branch.  So when I created the pull request to dev, GitHub checked out a merge of my code and dev, ran the pipeline, and it failed.  It then stored that as part of the results for the last commit on the branch.  Then I created the master pull request.  This time GitHub ran the pipeline jobs against a merge of my code with master and the jobs passed.  But it still associated that result with the last commit on the branch.  Since the commit and branch are the same for both pull requests, this success clobbers the failure on the dev pull request and they both report a "pass".  (And in case you're wondering, re-running the failed job doesn't help - it just re-tests whatever branch it tested last, so the result doesn't change.)

The good news is that this only seems to affect pull requests with the same source branch.  If you create a new branch with the same commits and use that for one pull request and the original for the other, they don't seem to step on each other.  In my case, I actually had to do that anyway to resolve the pipeline failures.

So what's the bottom line?  Don't manage your Git branches like this!  There are any number of valid approaches to branch management, but this one just doesn't work well.  It introduces extra work in the form of extra pull requests and merge issues; it actually creates risk by allowing divergence between what's tested and what's released; and it just really doesn't work properly with GitHub.  So find a different approach that works for you - the simpler, the better.  And remember that your workflow tools are supposed to make things easier.  If you find yourself fighting with them, then you're probably doing something wrong.

Stupid Git tricks: Undoing a stash pop

Here's a handy little Git nugget I learned the other day: if you get conflicts when popping a stash, the stash doesn't get removed from the stack.  I didn't actually know this, but it's a very handy piece of information.

In this case, I was trying to stash my changes so I could switch to a different branch.  Normally this is as easy as git stash ; git co develop ; git stash pop (co being my alias for checkout), but I failed to account for conflicts.  And this time it wasn't the "nice" kind of conflict, where you just fix it and move on with your life.  Noooooo.  This time, I realized that I actually needed to base my changes on the branch I was originally on.  (That branch was due to be merged into develop, but it just hadn't happened yet.)  So fixing the conflicts would have been a waste of time because I'd just have the same conflicts in reverse when I switched back.

But it turns out you can just do a git reset --hard whatever and because of the conflicts, the stashed changes will still be on the top of the stack.  Very handy!  This little nugget saved me a bunch of time.
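If you want to see this for yourself without risking real work, here's a small sketch that reproduces the situation in a throwaway repo.  The branch name topic and the file notes.txt are made up for the example; develop plays the same role as in my case.

```shell
#!/bin/sh
set -e
# Throwaway repository; the topic branch and notes.txt are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.email "dev@example.com"
git config user.name "Example Dev"
echo "original" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# A branch whose changes have not been merged into develop yet.
git checkout -q -b topic
echo "topic version" > notes.txt
git commit -q -am "Change notes on topic"

# Uncommitted work on top of topic: stash it and switch branches.
echo "work in progress" > notes.txt
git stash -q
git checkout -q develop

# The pop conflicts because develop lacks the topic commit, so git exits non-zero.
git stash pop -q || echo "stash pop reported conflicts"

# Throw away the conflicted working tree...
git reset -q --hard HEAD
# ...and check that the stash entry survived.
stash_count=$(git stash list | wc -l | tr -d ' ')
echo "stashes remaining: $stash_count"

# Back on the original branch, the same stash pops cleanly.
git checkout -q topic
git stash pop -q
restored=$(cat notes.txt)
echo "notes.txt after popping on topic: $restored"
```

The stash is only dropped once a pop applies cleanly, which is why the second pop on the original branch empties the stack.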

I give up - switching to GitHub

Well, I officially give up.  I'm switching to GitHub.

If you read back through this blog, you might get the idea that I'm a bit of a contrarian.  I'm generally not the type to jump on the latest popular thing.  I'd rather go my own way and do what I think is best than go along with the crowd.  But at the same time, I know a lost cause when I see it and I can recognize when it's time to cut my losses.

For many years, I ran my own Mercurial repository on my web host, including the web viewer interface, as well as my own issue tracker (originally MantisBT, more recently The Bug Genie).  However, I've reached the point where I can't justify doing that anymore.  So I'm giving up and switching over to GitHub like everybody else.

I take no real pleasure in this.  I've been using Git professionally for many years, but I've never been a big fan of it.  I mean, I can't say it's bad - it's not.  But I think it's hard to use and more complicated than it needs to be.  As a comment I once saw put it, Git "isn't a revision control system, it's more of a workflow tool that you can use to do version control."  And I still think the only reason Git got popular is because it was created by programming celebrity Linus Torvalds.  If it had been created by Joe Nobody I suspect it would probably be in the same boat as Bazaar today.

That said, at this point it's clear that Git has won the distributed VCS war, and done so decisively.  Everything supports Git, and nothing supports Mercurial.  Heck, even Bitbucket, the original cloud Mercurial host, is now dropping Mercurial support.  For me, that was kind of the final nail in the coffin.

That's not the only reason for my switch, though.  There are a bunch of smaller things that have been adding up over time:

  • There's just more tool support for Git.  These days, if a development tool has any VCS integration, it's for Git.  Mercurial is left out in the cold.
  • While running my own Mercurial and bug tracker installations isn't a huge maintenance burden, it is a burden.  Every now and then they break because of my host changing some configuration, or they need to be upgraded.  These days my time is scarce and it's no longer fun or interesting to do that work.
  • There are some niggling bugs in my existing environment.  The one that really annoys me is that my last Mercurial upgrade broke the script that integrates it with The Bug Genie.  I could probably fix it if I really wanted to, but the script is larger than you'd expect and it's not enough of an annoyance to dedicate the time it would take to become familiar with it.
  • My web host actually now provides support for Git hosting.  So I can actually still have my own repo on my own hosting (in addition to GitHub) without having to do any extra work.
  • Honestly, at this point I've got more experience with Git than Mercurial, to the point that I find myself trying to run Git commands in my Mercurial repos.  So by using Mercurial at home I'm kind of fighting my own instincts, which is counterproductive.

So there you have it.  I'm currently in the process of converting all my Mercurial repos to Git.  After that, I'll look at moving my issue tracking into GitHub.  In the long run, it's gonna be less work to just go with the flow.

Making Windows Git Do SSH

Another quick note to my future self: setting up Git under Windows to use SSH key authentication is pretty easy...once you know what to do.

At work, we have some Composer and Bower packages that we fetch from our internal GitHub Enterprise server.  And, of course, all the source lines in the composer.json and bower.json files use SSH references.  I just use HTTP for my code checkouts, so I finally had to figure out how to make Git for Windows authenticate with my SSH key.

Turns out it's pretty easy.  I have a key pair I created with PuTTYgen.  All I had to do was export my private key to OpenSSH format and copy that file to C:\Users\<my_username>\.ssh\id_rsa.  Git picks it up with no further configuration.  Then I just added my key to GitHub and I was good to go.