Installing PHPStorm under WSL2

The other week I tried to install PHPStorm under WSL2, because that's a thing you can do now that Linux GUI apps work in recent Windows 10 updates.  The installation process itself was pretty simple.

  • Download PHPStorm for Linux from the JetBrains website.
  • Extract the tarball and run the phpstorm.sh script in the bin/ directory.
  • PHPStorm should start up.
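Concretely, the steps above boil down to something like this (the download URL, version number, and extraction directory are examples - grab the current link from the JetBrains website):

```shell
# Download the Linux tarball (version here is just an example)
wget https://download.jetbrains.com/webide/PhpStorm-2023.1.tar.gz

# Extract it somewhere convenient
mkdir -p ~/opt
tar -xzf PhpStorm-2023.1.tar.gz -C ~/opt

# Launch the IDE via the startup script in bin/
# (the extracted directory name varies by build)
~/opt/PhpStorm-*/bin/phpstorm.sh &
```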

The next step is to configure your license.  In my case, I was using a corporate license server.  The issue with this is that you need to log into JetBrains' website using a special link to activate the license.  Unfortunately:

  • By default, WSL doesn't have a browser installed.
  • Firefox can't be installed the normal way because Ubuntu's default build ships as a snap, and WSL apparently doesn't support snap.
  • PHPStorm doesn't appear to be able to properly deal with activating via a Windows browser (I tried pointing it to the Windows Chrome executable and got an error page that points to a port on localhost).

So how do we get around this?  Well, we need to install a browser in WSL and configure PHPStorm to use it.  So here's what we do:

  • Skip the registration for now by starting a trial license.
  • Download the Vivaldi for Linux DEB package from Vivaldi's website.  You could use a different browser, but I like Vivaldi and it offers a convenient DEB package, so I used that.
  • Install the Vivaldi DEB.  WSL will be missing some packages, so you have to run apt install --fix-broken after installing it.
  • Go into the PHPStorm settings, add Vivaldi to the list of web browsers, and set it as the default browser.
  • Go back to the registration dialog and try again.  This time, PHPStorm should start up Vivaldi and direct you to the appropriate link.
  • Log into your JetBrains account and follow the instructions.  The web-based portion should succeed and registration should complete when you click "activate" in PHPStorm again.
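The browser-install portion of the steps above looks roughly like this (the DEB filename and version are examples - get the current link from Vivaldi's website):

```shell
# Download the Vivaldi DEB (version here is just an example)
wget https://downloads.vivaldi.com/stable/vivaldi-stable_6.2.3105.58-1_amd64.deb

# Install it; under WSL this will likely complain about missing dependencies
sudo dpkg -i vivaldi-stable_6.2.3105.58-1_amd64.deb

# Pull in the missing dependencies and finish the install
sudo apt install --fix-broken
```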

There we go - PHPStorm is registered and works.  Mildly annoying setup, but not actually that bad.

Duolingo is actually pretty fun

Last summer, I started playing with the Duolingo app.  My wife originally started looking at it as a resource for teaching our son a foreign language (which is going to be part of his curriculum this year), and I thought I'd check it out.  After all, I figured we had an upcoming vacation in Mexico and it couldn't hurt to brush up on my Spanish, right?  (Turned out it wasn't really necessary - we spent the whole time on the resort and nearly everybody spoke enough English to communicate.  But that's not the point.)

Turns out it's kind of a fun little app.  Sure, the presentation is very cartoonish and oriented toward children, but not distractingly so.  More importantly, it offers enough gamification to keep it interesting and allows you to do lessons in very small bites.

I subscribed to the premium package (since I figured it would be a family thing), so I haven't really messed with the free version much.  I'm not sure how much of a difference that makes to the user experience, so caveat emptor.

The exercises cover a good range of capabilities.  They include basic written translation exercises, where you read a sentence in one language and translate it to the other, either through free-form typing or a pick-a-word interface; fill-in-the-blank exercises where you have to complete a sentence; listening exercises where you type back what you hear; speaking exercises where you read/repeat a sentence; and stories that you listen to and then answer comprehension questions.  For most exercises, the interface allows you to tap a word to get the definition, which is handy.  There are also tips that you can access and which get displayed if you get a question wrong too many times.

The gamification aspect is what I find interesting and enjoyable.  There are a number of aspects to it, so you can go as deep as you want.  These include daily challenges, like completing 12 listening exercises; long-term challenges, like learning a certain number of new words; levels to progress through; various streaks to establish and maintain; leagues to compete in; and even "friend quests" to work with another user to collectively reach a goal, like a certain number of lessons completed in a week.  You can earn "points" for leagues and challenges by completing lessons and "gems" that can be used to buy power-ups by completing challenges.  You can also buy gems with cash, if you're so inclined, but they're really only used to buy "streak freezes" or "time boosts", which you don't really need.

The thing that really helps me stay with the app, though, is the lesson sizes.  They're very short.  It varies, of course, depending on what type of lesson you're doing, but it's not a big time commitment at all.  The fastest can be as short as one minute, up to maybe 6 or 7 minutes.  Sure, you're not going to learn all that much in 5 minutes, but reinforcement helps.  Smaller lessons give you lots of opportunity for review, so you can pick up new stuff slowly and get comfortable using it.  And most importantly, if you're busy, you're more likely to actually do small lessons on a regular basis.  Is that the path to rapid fluency?  Clearly not.  But it's still a way to improve your skill with a language.

On GitHub pipelines and diverging branches

Just before Christmas I started a new job.  I won't get into the details, but my new team has a different workflow than I'm used to, and the other day I noticed a problem with it.  My colleague suggested that the experience might make for good blog-fodder, so here's the break-down.

First, let me start by describing the workflow I was used to at my last job.  We had a private GitLab instance and used a forking workflow, so getting a change into production went like this:

  1. Developer forks the main repo to their GitLab account.
  2. Developer does their thing and makes a bunch of commits.
  3. Developer opens a merge request from their fork's branch to the main repo's main branch.  Code review ensues.
  4. When review is complete, the QA team pulls down the code from the developer's fork and tests in a QA environment.  Obviously testing needs differ, but a "QA environment" is generally exactly the same thing as a developer environment (in this case, a set of disposable OpenStack VMs).
  5. When testing is complete, the merge request gets merged and the code will go out in the next release (whenever that is - we didn't do continuous deployment).
  6. Every night, a set of system-level tests runs against a production-like setup that uses the main branches of all the relevant repos.  Any failures get investigated by developers and QA the next morning.

I'm sure many people would quibble with various parts of this process, and I'm not going to claim there weren't problems, but it worked well enough.  But the key feature to note here is the simplicity of the branching setup.  It's basically a two-step process: you fork from X and then merge back to X.  You might have to pull in new changes along the way, but everything gets reconciled sooner or later.

The new team's process is not like that.  We use GitHub, and instead of one main branch, there are three branches to deal with: dev, test, and master, with deployment jobs linked to dev and test.  And in this case, the merging only goes in one direction.  So a typical workflow would go like this:

  1. Developer creates a branch off of master, call it "feature-X".
  2. Developer does their thing and makes a bunch of commits.
  3. Developer opens a pull request from feature-X to dev and code review ensues.
  4. When the pull request is approved, the developer merges it and the dev branch code is automatically deployed to a shared development environment where the developer can test it.  (This might not be necessary in all cases, e.g. if local testing is sufficient.)
  5. When the developer is ready to hand the code off to QA, they open a pull request from feature-X to test.  Again, review ensues.
  6. When review is done, the pull request gets merged and the test branch code is automatically deployed to test, where QA pokes at it.
  7. When QA is done, the developer opens a pull request from feature-X to master and (drum roll) review ensues.
  8. When the master pull request is approved, the code is merged and is ready to be deployed in the next release, which is a manual (but pretty frequent) process.
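In git terms, the dance above looks roughly like this (branch names are the ones from the process described; the pull requests themselves happen in the GitHub UI):

```shell
# Start a feature branch from master
git checkout master
git pull
git checkout -b feature-X

# ...hack, commit, repeat...
git commit -am "Implement feature X"
git push -u origin feature-X

# Then, via the GitHub UI, open three successive pull requests
# from the *same* feature-X branch:
#   1. feature-X -> dev     (merge deploys to the shared dev environment)
#   2. feature-X -> test    (merge deploys to the QA environment)
#   3. feature-X -> master  (released manually later)
```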

You might notice something odd here - we're only ever merging to dev and test, never from them.  There are occasionally merges from master to those branches, but never the other way around.  Now, in theory this should be fine, right?  As long as everything gets merged to all three branches in the same order, they'll end up with the same code.  Granted, it's three times as many pull requests to review as you really need, but other than that it should work.

Unfortunately, theory rarely matches practice.  In fact, the three branches end up diverging - sometimes wildly.  On a large team, this is easy to do by accident - Bob and Joe are both working on features, Bob gets his code merged to test first, but testing takes a long time, so Joe's code gets out of QA and into master first.  So if there are any conflicts, you have the potential for things like inconsistent resolutions.  But in our case, I found a bunch of code that was committed to the dev branch and just never made it out to test or master.  In some cases, it even looks like this was intentional.

So this creates an obvious procedural issue: the code you test in QA is not necessarily the same as what ends up in production.  This may be fine, or it may not - it depends on how the code diverges.  But it still creates a real risk, because you don't know if the code you're releasing is actually the same as what you validated.

But it gets worse.  This also creates issues with the GitHub pipeline, which is where we get to the next part of the story.

Our GitHub pipelines are set up to run on both "push" and "pull_request" actions.  We ended up having to do both in order to avoid spurious error reporting from CodeQL, but that's a different story.  The key thing to notice here is that, by default, GitHub "pull_request" actions don't run against the source branch of your pull request, they run against a merge of the source and target branches.  Which, when you think about it, is probably what you want.  That way you can be confident that the merged code will pass your checks.

If you're following closely, the problem may be evident at this point - the original code is based on master, but it needs to be merged to dev and test, which diverge from master.  That means you can get into a situation where a change introduces breakage in code from the target branch that isn't even present in the source branch.  This makes it very hard to fix the pipeline.  Your only real choice at that point is to create a new branch off the target branch, merge your code into that, and re-create the pull request using the merged branch.  This is annoying and awkward at best.
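Spelled out, that workaround looks something like this (branch names are from the scenario above):

```shell
# Create a throwaway branch off the diverged target (dev, in this example)
git checkout dev
git pull
git checkout -b feature-X-dev

# Merge the feature branch into it and fix whatever breakage the merge exposes
git merge feature-X
# ...resolve conflicts / pipeline failures, commit...

git push -u origin feature-X-dev
# Then close the old PR and open a new one: feature-X-dev -> dev
```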

But it gets worse than that, because it turns out that your pipeline might report success, even if the merge result would be broken!  This appears to be a GitHub issue and it can be triggered simply by creating pull requests.  

The easiest way to explain is probably by describing the situation I actually ran into.  I had a change in my feature-X branch and wanted to go through our normal process, which involves creating three pull requests.  But in my case, this was just a pipeline change (specifically, adding PHPStan analysis), so it didn't require any testing in dev or test.  Once it was approved, it could be merged immediately.  So here's what I did:

  1. First, I created a pull request against dev.  The "pull_request" pipeline here actually failed, because there was a bunch of code in the dev branch that violated the PHPStan rules and wasn't in master, so I couldn't even fix it.  Crud.
  2. After messing around with dev for a while, I decided to come back to that and just create the pull requests for test and master.
  3. So I created the pull request for test.  That failed due to drift from master as well.  Double crud.
  4. Then I created the pull request for master.  That succeeded, as expected, since it was branched from master.  So at least one of them was reviewable.
  5. Then I went back and looked at the dev pull request and discovered that the "pull_request" pipeline job now reported as passing!

Let me say that more explicitly: the "pull_request" job on my pipeline went from "fail" to "pass" because I created a different pull request for the same branch.  There was no code change or additional commits involved.

Needless to say, this is very bad.  The point of running the pipeline on a pull request is to verify that it's safe to merge.  But if just doing things in the wrong order can change a "fail" to a "pass", that means that I can't trust the results of my GitHub pipeline - which defeats the entire purpose of having it!

As for why this happens, I'm not really certain.  But from my testing, it looks like GitHub ties the results of the "pull_request" job to the last commit on the source branch.  So when I created the pull request to dev, GitHub checked out a merge of my code and dev, ran the pipeline, and it failed.  It then stored that result against the last commit on the branch.  Then I created the master pull request.  This time GitHub ran the pipeline jobs against a merge of my code with master, and the jobs passed.  But it still associated that result with the last commit on the branch.  Since the commit and branch are the same for both pull requests, this success clobbers the failure on the dev pull request and they both report a "pass".  (And in case you're wondering, re-running the failed job doesn't help - it just re-runs against whatever merge it last tested, so the result doesn't change.)

The good news is that this only seems to affect pull requests with the same source branch.  If you create a new branch with the same commits and use that for one pull request and the original for the other, they don't seem to step on each other.  In my case, I actually had to do that anyway to resolve the pipeline failures.

So what's the bottom line?  Don't manage your Git branches like this!  There are any number of valid approaches to branch management, but this one just doesn't work well.  It introduces extra work in the form of extra pull requests and merge issues; it actually creates risk by allowing divergence between what's tested and what's released; and it just really doesn't work properly with GitHub.  So find a different approach that works for you - the simpler, the better.  And remember that your workflow tools are supposed to make things easier.  If you find yourself fighting with them, then you're probably doing something wrong.

OneDrive for Linux

As I mentioned a while ago, I replaced my desktop/home server this past summer.  In the process, I switched from my old setup of Ubuntu running Trinity Desktop to plain-old Ubuntu MATE, so I've been getting used to some new software anyway.  As part of this process, I figured it was time to take another look for OneDrive clients for Linux.

See, I actually kind of like OneDrive.  I have an Office 365 subscription, which means I get 1TB of OneDrive storage included, so I might as well use it.  I also happen to like the web interface and photo-syncing aspects of it pretty well.

However, I'm slightly paranoid and generally distrustful of cloud service providers, so I like to have local copies and offline backups of my files.  This is a problem for me, because my primary Windows machine is a laptop, and I don't want to pay the premium to put a multi-terabyte drive in my laptop just so I can sync my entire OneDrive, and scheduled backups to a USB disk are awkward for a laptop that's not plugged in most of the time.  Now, I do have a multi-terabyte drive connected to my Linux desktop, but for a long time there were no good OneDrive sync clients for Linux.  In the past, I had worked around this by using one-off sync tools like Unison (which...mostly worked most of the time) or by setting up an ownCloud sync on top of the OneDrive sync (which worked but was kind of janky).  However, those depended on syncing from my Windows laptop, which was OK when I had 20 or 30 gigabytes of data in OneDrive, but at this point I'm well over 100GB.  Most of that is archival data like family photos and just eats up too much space on a 500GB SSD.

Enter InSync.  InSync is a third-party file sync tool that runs on Windows, Mac, and Linux and supports OneDrive, Google Drive, and Dropbox.  It has all the bells and whistles you'd expect, including file manager integrations, exclusions, directory selection, and other cool stuff.  But what I care about is the basics - two-way syncing.  And it does that really well.  In fact, it totally solves my problem right out of the box.  No more janky hacks - I can just connect it to my OneDrive account and it syncs things to my Linux box.

The only down-sides to InSync are that it's proprietary (which I don't mind) and that the licensing is confusing.  The up side is that it's not actually that expensive - currently, the pricing page lists licenses at $30 USD per cloud account.  So if you only want to sync OneDrive, it's $30 and you're done.  However, there's also an optional support contract and there's some difference between "legacy" licenses (which I think is what I have) and their new subscription model.  Frankly, I don't fully understand the difference, but as long as it syncs my OneDrive and doesn't cost too much, I don't really care.

So if you're a OneDrive user and a Linux user, InSync is definitely worth a try.  I don't know about the other platforms or services (I assume they're all similar), but OneDrive on Linux works great.

Nextcloud session annoyances

This is a note to my future self about an annoyance with Nextcloud.  If you're not aware of it, Nextcloud is basically a fork of ownCloud, which is a "self-hosted cloud" platform.  Both provide a bunch of cloud-based services, like file sync and share, calendar, contacts, and various other things.  I switched to Nextcloud last year because ownCloud was lagging way behind in its support for newer PHP versions.

Anyway, I noticed a rather annoying issue where Nextcloud was leaving hundreds of stale auth tokens in the database.  Apparently, I'm not the only person this has happened to.

While Nextcloud has a menu item to revoke and remove stale sessions on its settings page, it works on a per-item basis.  So if you have hundreds of stale sessions, the only way to remove them is to go through, one by one, and click the menu and select the "revoke" option.  Needless to say, this is terrible.

The less annoying solution is to just go straight into the database and delete them there.  You can just run something like:

    DELETE FROM oc_authtoken WHERE last_activity < <whatever_timestamp>;

That might be ugly, but at least it doesn't take forever.
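For example, here's a sketch of clearing out tokens idle for more than 30 days from the mysql client.  The database name, user, and the "oc_" table prefix are assumptions (the prefix is the Nextcloud default), and last_activity is a Unix timestamp, so the cutoff is computed with date:

```shell
# Compute a cutoff 30 days in the past as a Unix timestamp
CUTOFF=$(date -d '30 days ago' +%s)

# Count the stale tokens first, then delete them.
# Assumes database/user "nextcloud" and the default "oc_" table prefix.
mysql -u nextcloud -p nextcloud <<SQL
SELECT COUNT(*) FROM oc_authtoken WHERE last_activity < ${CUTOFF};
DELETE FROM oc_authtoken WHERE last_activity < ${CUTOFF};
SQL
```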

It's important to note that, in addition to being annoying, this is evidently also a performance problem.  From what I've read, it's the reason that authenticating to my Nextcloud instance had gotten absurdly slow.  The app responded fine once I was logged in, but the login process itself took forever.  It also seems to be the reason why my hosting provider's control panel has been showing I'm way over my allotted MySQL execution time.  After deleting all those stale sessions, not only is login nice and snappy again, but my MySQL usage dropped off a ledge.  Just look at this graph:

[Graph: monthly MySQL execution time - well over the limit through January, dropping far below it after the cleanup]
As you can see, January is a sea of red, and then it drops off to be comfortably under the limit after I deleted the old sessions.  The Nextcloud team really needs to fix this issue.