<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title><![CDATA[LinLog]]></title>
    <link>https://linlog.skepticats.com/</link>
    <description><![CDATA[Linux, Programming, and Computing in General]]></description>
    <lastBuildDate>Sun, 14 May 2023 22:31:36 +0000</lastBuildDate>
    <managingEditor>pageer@skepticats.com (Peter Geer)</managingEditor>
    <language>en-US</language>
    <generator>https://lnblog.skepticats.com/?v=2.3.1</generator>
    <item>
      <title><![CDATA[Installing PHPStorm under WSL2]]></title>
      <link>https://linlog.skepticats.com/entries/2023/05/installing-phpstorm-under-wsl2.php</link>
      <description><![CDATA[<p>The other week I tried to install PHPStorm under WSL2.&nbsp; Because that's a thing you can do now (especially since Linux GUI apps now work in recent Windows 10 updates).&nbsp; The installation process itself was pretty simple.</p>
<ul>
<li>Download PHPStorm for Linux from the JetBrains website.</li>
<li>Extract the tarball and run the&nbsp;<code>bin/phpstorm.sh</code> script.</li>
<li>PHPStorm should start up.</li>
</ul>
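<p>In other words, the whole install is just a few commands.&nbsp; Something like the following - note that the version number and URL are just examples, so grab the current ones from the JetBrains download page:</p>

```shell
# Download and unpack the Linux build (version/URL are examples -
# get the current link from the JetBrains download page)
wget https://download.jetbrains.com/webide/PhpStorm-2023.1.tar.gz
sudo tar -xzf PhpStorm-2023.1.tar.gz -C /opt

# Start the IDE
/opt/PhpStorm-2023.1/bin/phpstorm.sh
```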
<p>The next step is to configure your license.&nbsp; In my case, I was using a corporate license server.&nbsp; The issue with this is that you need to log into JetBrains' website using a special link to activate the license.&nbsp; Unfortunately:</p>
<ul>
<li>By default, WSL doesn't have a browser installed.</li>
<li>Firefox can't be installed because the default build uses a snap image, and WSL apparently doesn't support snap.</li>
<li>PHPStorm doesn't appear to be able to properly deal with activating via a Windows browser (I tried pointing it to the Windows Chrome executable and got an error page that points to a port on localhost).</li>
</ul>
<p>So how do we get around this?&nbsp; Well, we need to install a browser in WSL and configure PHPStorm to use it.&nbsp; So here's what we do:</p>
<ul>
<li>Skip the registration for now by starting a trial license.</li>
<li>Download the Vivaldi for Linux DEB package from Vivaldi's website.&nbsp; You could use a different browser, but I like Vivaldi and it offers a convenient DEB package, so I used that.</li>
<li>Install the Vivaldi DEB.&nbsp; WSL will be missing some packages, so you have to run&nbsp;<code>apt install --fix-broken</code> after installing it.</li>
<li>Go into the PHPStorm settings and configure your web browsers to include Vivaldi and set it as the default browser.</li>
<li>Go back to the registration dialog and try again.&nbsp; This time, PHPStorm should start up Vivaldi and direct you to the appropriate link.</li>
<li>Log into your JetBrains account and follow the instructions.&nbsp; The web-based portion should succeed and registration should complete when you click "activate" in PHPStorm again.</li>
</ul>
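<p>The browser install boils down to something like this (the DEB filename is just an example - use whatever version you actually downloaded):</p>

```shell
# Download the DEB from vivaldi.com (filename is an example)
wget https://downloads.vivaldi.com/stable/vivaldi-stable_6.0.2979.18-1_amd64.deb

# This will complain about unmet dependencies on a stock WSL image...
sudo dpkg -i vivaldi-stable_6.0.2979.18-1_amd64.deb

# ...so pull them in and finish the install
sudo apt install --fix-broken
```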
<p>There we go - PHPStorm is registered and works.&nbsp; Mildly annoying setup, but not actually that bad.</p>]]></description>
      <author><![CDATA[pageer@skepticats.com (Peter Geer)]]></author>
      <pubDate>Sat, 06 May 2023 22:20:57 +0000</pubDate>
      <category><![CDATA[Windows]]></category>
      <category><![CDATA[Ubuntu]]></category>
      <category><![CDATA[PHP]]></category>
      <category><![CDATA[Programming]]></category>
      <category><![CDATA[Linux]]></category>
      <category><![CDATA[WSL]]></category>
      <guid isPermaLink="true">https://linlog.skepticats.com/entries/2023/05/installing-phpstorm-under-wsl2.php</guid>
      <comments>https://linlog.skepticats.com/entries/2023/05/06_1820/comments/</comments>
    </item>
    <item>
      <title><![CDATA[Nextcloud session annoyances]]></title>
      <link>https://linlog.skepticats.com/entries/2023/02/nextcloud-session-annoyances.php</link>
      <description><![CDATA[<p>This is a note to my future self about an annoyance with <a href="https://nextcloud.com">Nextcloud</a>.&nbsp; If you're not aware of it, Nextcloud is basically a fork of <a href="https://owncloud.com">ownCloud</a>, which is a "self-hosted cloud" platform.&nbsp; They both provide a bunch of cloud-based services, like file sync and share, calendar, contacts, and various other things.&nbsp; I <a href="https://linlog.skepticats.com/entries/2022/07/finally-switching-to-nextcloud.php">switched to Nextcloud</a> last year because ownCloud was lagging way behind in its support for newer PHP versions.</p>
<p>Anyway, I noticed a rather annoying issue where Nextcloud was leaving hundreds of stale auth tokens in the database.&nbsp; Apparently, I'm not the <a href="https://github.com/nextcloud/server/issues/8720">only</a> <a href="https://forum.cloudron.io/topic/7657/cleaning-up-old-sessions">person</a> this has happened to.</p>
<p>While Nextcloud has a menu item to revoke and remove stale sessions on their settings page, it's on a per-item basis.&nbsp; So if you have&nbsp;<em>hundreds</em> of stale sessions, the only way to remove them is to go through, one by one, and click the menu and select the "revoke" option.&nbsp; Needless to say, this is terrible.</p>
<p>The less annoying solution is to just go straight into the database and delete them there.&nbsp; You can just run something like:<br /><code>DELETE FROM oc_authtoken WHERE last_activity &lt; &lt;whatever_timestamp&gt;;</code><br />That might be ugly, but at least it doesn't take forever.</p>
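<p>For instance, assuming&nbsp;<code>last_activity</code> is a Unix timestamp (which it is in a stock Nextcloud schema) and you're on MySQL/MariaDB, this would clear out every token that's been idle for more than 60 days:</p>

```sql
-- Drop auth tokens with no activity in the last 60 days
-- ("oc_" is the default table prefix; adjust the cutoff to taste)
DELETE FROM oc_authtoken
WHERE last_activity < UNIX_TIMESTAMP(NOW() - INTERVAL 60 DAY);
```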
<p>It's important to note that, in addition to being annoying, this is evidently also a performance problem.&nbsp; From what I've read, it's the reason that authenticating to my Nextcloud instance had gotten absurdly slow.&nbsp; The app responded fine once I was logged in, but the login process itself took <em>forever</em>.&nbsp; It also seems to be the reason why my hosting provider's control panel has been showing I'm&nbsp;<em>way</em> over my allotted MySQL execution time.&nbsp; After deleting all those stale sessions, not only is login nice and snappy again, but my MySQL usage dropped off a ledge.&nbsp; Just look at this graph:</p>
<p><a href="https://linlog.skepticats.com/entries/2023/02/25_1821/2023-02-21T17-16-56-020Z.png"><img src="https://linlog.skepticats.com/entries/2023/02/25_1821/2023-02-21T17-16-56-020Z-med.png" alt="2023-02-21T17-16-56-020Z-med.png" /></a></p>
<p>As you can see, January is a sea of red, and then it drops off to be comfortably under the limit after I deleted the old sessions.&nbsp; The Nextcloud team really needs to fix this issue.</p>]]></description>
      <author><![CDATA[pageer@skepticats.com (Peter Geer)]]></author>
      <pubDate>Sat, 25 Feb 2023 23:21:23 +0000</pubDate>
      <category><![CDATA[Software]]></category>
      <category><![CDATA[PHP]]></category>
      <category><![CDATA[Note to Self]]></category>
      <guid isPermaLink="true">https://linlog.skepticats.com/entries/2023/02/nextcloud-session-annoyances.php</guid>
      <comments>https://linlog.skepticats.com/entries/2023/02/25_1821/comments/</comments>
    </item>
    <item>
      <title><![CDATA[Finally switching to NextCloud]]></title>
      <link>https://linlog.skepticats.com/entries/2022/07/finally-switching-to-nextcloud.php</link>
      <description><![CDATA[<p>It's the end of an era. (Cue overly dramatic music.)&nbsp; I've been using ownCloud as my personal file/caldav/carddav server for years.&nbsp; This week, I finally decided to switch to NextCloud.&nbsp; This is my story.</p>
<p>The thing is, I actually remember when NextCloud split from ownCloud.&nbsp; At the time, I was working on a (now-defunct) product that involved ownCloud.&nbsp; Basically, my company's core business at the time was data backup, so we had a lot of servers with big disks and were looking for a way to monetize that extra space.&nbsp; The idea at the time was to do that by integrating a "file sync and share" product into our offerings, and that product was a rebranded ownCloud Enterprise.&nbsp; Of course, the "file sync and share" space was already pretty crowded, so that product never gained much traction, but it did help me get more into ownCloud and the company even paid to send me to their user conference in Berlin, where I got to meet their team (who, at the time, seemed not-very-impressed with the whole "NextCloud" thing) and see some sites.&nbsp; So it was actually a great experience, even if the product didn't pan out.</p>
<p>Anyway, despite my affection for ownCloud, my motivation for this change was actually pretty simple and prosaic - I was upgrading my home server (that'll be another post), and I didn't want to downgrade shit.&nbsp; See, I actually run two ownCloud instances - one on my local network for accessing various media files, and another in my web hosting, for caldav/carddav and files that I want to be highly available.&nbsp; For my home instance, I was doing a fresh install of the latest Ubuntu MATE on a brand-new box.&nbsp; This shouldn't be an issue, except that MATE comes with PHP 8.1, but for some reason, ownCloud only supports PHP 7.4.</p>
<p>Yes, you heard that right - 7.4.&nbsp; That's the <em>newest</em> version that's officially supported.&nbsp; The last 7.x release.&nbsp; The one that's <a href="https://www.php.net/supported-versions.php">no longer actively supported</a> and has less than six months of security updates left.&nbsp; That one.&nbsp; That's what they <em>still</em> expect me to use.</p>
<p>For my previous home box, I believe I'd actually hacked up the source a bit to make it work (since I don't think I depended on anything that didn't work in 8.x), but that week I was sick and I just didn't feel like it.&nbsp; Depending on a version that's about to lose security fixes is crazy anyway.&nbsp; So I figured I'd "upgrade" to NextCloud, since they actually recommend PHP 8.1.</p>
<p>For my home server, I just did a fresh install, which is fairly straight-forward.&nbsp; The only annoying part was the Apache configuration, and that was only annoying because I was running NextCloud on a non-standard port and forgot to add a "Listen" directive. 🤦&zwj;♂️ For this instance, there was no real need to do any migration, because the only data I had in there was the (very small) list of users - the rest was just files, which can be trivially re-indexed.</p>
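<p>For anyone else who makes the same mistake: if NextCloud is on a non-standard port, Apache needs the port in both a&nbsp;<code>Listen</code> directive and the&nbsp;<code>VirtualHost</code> declaration.&nbsp; Something like this, where the port and paths are just examples:</p>

```apacheconf
# Accept connections on the non-standard port
Listen 8080

<VirtualHost *:8080>
    DocumentRoot /var/www/nextcloud
    <Directory /var/www/nextcloud>
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>
```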
<p>Upgrading the instance on my web hosting was another story.&nbsp; Since that had my carddav and caldav data, I really did need to migrate that.&nbsp; I was also already several versions behind on my updates - it was running ownCloud 10.3, whereas 10.8 was current.&nbsp; However, this turned out to be a blessing in disguise.</p>
<p>You see, NextCloud includes support for migrating from an ownCloud instance.&nbsp; The thing is, they only support <em>specific</em> migrations.&nbsp; In my case, the relevant entry was that you can migrate from <em>exactly</em> ownCloud 10.5 to NextCloud 20.&nbsp; Sadly, it took me a couple of tries to realize that the versions in the <a href="https://docs.nextcloud.com/server/latest/admin_manual/maintenance/migrating_owncloud.html">migration matrix</a> are exact, so there was no path to directly migrate from ownCloud 10.3 to NextCloud.&nbsp; So I had to use the auto-updater to update ownCloud 10.3 to 10.4, and then manually update ownCloud 10.4 to 10.5 (because the auto-updater wanted to go all the way to 10.8).&nbsp; <em>Then</em> I could follow the migration process and manually update to NextCloud 20.&nbsp; From there, I was able to use the NextCloud auto-updater <em>four times</em> to upgrade to the current version.</p>
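<p>For reference, the manual update step is just the standard ownCloud tarball upgrade.&nbsp; Roughly the following - the paths and version are illustrative, I'm assuming the data directory lives outside the web root, and you should follow the official upgrade docs rather than my recollection:</p>

```shell
# Put the instance into maintenance mode first
sudo -u www-data php /var/www/owncloud/occ maintenance:mode --on

# Move the old code aside, unpack the 10.5 release in its place,
# then carry over the existing config
cd /var/www
sudo mv owncloud owncloud-old
sudo tar -xjf owncloud-10.5.0.tar.bz2
sudo cp owncloud-old/config/config.php owncloud/config/

# Run the upgrade routine against the existing database, then re-enable access
sudo -u www-data php /var/www/owncloud/occ upgrade
sudo -u www-data php /var/www/owncloud/occ maintenance:mode --off
```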
<p>So the upgrade process was...tedious.&nbsp; Not really "hard", but definitely tedious.&nbsp; The directions are pretty clear and simple, it's just a lot of steps to get to a current version of NextCloud.&nbsp; But at least none of the steps were particularly complicated or prone to error.&nbsp; As data migrations go, it could be much worse.&nbsp; And the best part is that it maintained URLs and credentials, so I didn't even have to reconfigure my caldav/carddav clients.</p>
<p>As far as NextCloud itself goes, it seems...pretty much like ownCloud, but nicer.&nbsp; They've made the UI prettier (both for the web interface and the client app), added a nice dashboard landing page, and made some other cosmetic improvements.&nbsp; They also seem to have a wider range of installable apps, which is nice.&nbsp; I haven't had all that long to play with it yet, but so far it seems like a distinct upgrade.</p>]]></description>
      <author><![CDATA[pageer@skepticats.com (Peter Geer)]]></author>
      <pubDate>Sat, 23 Jul 2022 15:16:04 +0000</pubDate>
      <category><![CDATA[Software]]></category>
      <category><![CDATA[PHP]]></category>
      <category><![CDATA[Free Software]]></category>
      <guid isPermaLink="true">https://linlog.skepticats.com/entries/2022/07/finally-switching-to-nextcloud.php</guid>
      <comments>https://linlog.skepticats.com/entries/2022/07/23_1116/comments/</comments>
    </item>
    <item>
      <title><![CDATA[Docblocks in Vim]]></title>
      <link>https://linlog.skepticats.com/entries/2021/07/docblocks-in-vim.php</link>
      <description><![CDATA[<p>As you may or may not know, I've become an avid Vim user.&nbsp; I use it for work and home, having given up on PHPStorm a couple of years ago.</p>
<p>But one of the things that PHPStorm did automatically, which was quite handy, was to add PHPDoc comments to functions automatically.&nbsp; This is kinda nice because, let's face it, unless you're writing a long description, most of a docblock is just typing.&nbsp; You duplicate the parameters and return signature and, if the names and types are pretty obvious (which they should be), then there's not really much to say.&nbsp; But having them is part of the coding standard, so you can't just skip them, even though they don't add much.</p>
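<p>To be concrete, the boilerplate in question looks something like this (a made-up example, obviously) - given the signature, every line of the docblock is mechanically derivable:</p>

```php
/**
 * Add a user to the group.  (Made-up example.)
 *
 * @param string $name
 * @param int    $age
 *
 * @return User
 */
public function addUser(string $name, int $age = 0): User
{
    // ...
}
```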
<p>Fortunately, Vim has a plugin for that, known as PDV.&nbsp; It will read the declaration of a function (or class, or a few other things) and auto-generate a docblock for you.&nbsp; This is nice, but the extension was a little out of date - it hadn't been updated to support return type annotations.&nbsp; There was a pending pull request to add that, but it hadn't been merged.&nbsp; I'm not sure why - apparently that repo is dead.</p>
<p>So I decided to just <a href="https://github.com/pageer/pdv">create my own fork</a> and merge the outstanding pull requests.&nbsp; Now I have a version that supports modern type annotations, which is nice.&nbsp; While I was at it, I also added an alternative set of templates for <a href="https://www.naturaldocs.org">NaturalDocs</a> doc comments.&nbsp; I use NaturalDocs for <a href="https://lnblog.skepticats.com/">LnBlog</a>, so I figured it would be nice to be able to auto-generate my docblocks there too.&nbsp; All I needed to do was add a line to my <a href="https://github.com/joonty/vim-sauce">Sauce</a> config to change the PDV template path.</p>]]></description>
      <author><![CDATA[pageer@skepticats.com (Peter Geer)]]></author>
      <pubDate>Mon, 12 Jul 2021 01:38:40 +0000</pubDate>
      <category><![CDATA[PHP]]></category>
      <category><![CDATA[Programming]]></category>
      <category><![CDATA[Vim]]></category>
      <guid isPermaLink="true">https://linlog.skepticats.com/entries/2021/07/docblocks-in-vim.php</guid>
      <comments>https://linlog.skepticats.com/entries/2021/07/11_2138/comments/</comments>
    </item>
    <item>
      <title><![CDATA[PHP documentation and sockets]]></title>
      <link>https://linlog.skepticats.com/entries/2021/06/php-documentation-and-sockets.php</link>
      <description><![CDATA[<p>PHP's documentation gets way too much credit.&nbsp; I often hear people rave about how great it is.&nbsp; Many of them are newbies, but I hear the same thing from experienced developers who've been writing PHP code for years.</p>
<p>Well, they're wrong.&nbsp; PHP's documentation sucks.&nbsp; And if you disagree, you're just plain <em>wrong</em>.</p>
<p>Actually, let me add some nuance to that.&nbsp; It's not that the documentation sucks <em>per se</em>, it's that it sucks <em>as documentation</em>.&nbsp;</p>
<p>You see, a lot of PHP's documentation is written with an eye to beginners.&nbsp; It has lots of examples and it actually does a very good job of showing you what's available and giving you a general idea of how to use it.&nbsp; So in terms of a tutorial on how to use the language, the documentation is actually quite <em>good</em>.</p>
<p>The problem is that, sometimes, you don't need a tutorial.&nbsp; You need <em>actual documentation</em>.&nbsp; By that, I mean that sometimes you care less about the generalities and more about the particulars.&nbsp; For instance, you might want to know <em>exactly</em> what a function returns in specific circumstances, or <em>exactly</em> what the behavior is when you pass a particular argument.&nbsp; Software is about details, and these details <em>matter</em>.&nbsp; However, PHP frequently elides these details in favor of a more tutorial-like format.&nbsp; And while that might pass muster for a rookie developer, it's decidedly <em>not</em> OK from the perspective of a seasoned professional.</p>
<p>Case in point: <a href="https://www.php.net/manual/en/function.socket-read.php">the socket_read() function</a>.&nbsp; I had to deal with this function the other day.&nbsp; The documentation page is rather short and I was less than pleased with what I found on it.&nbsp;</p>
<p>By way of context, I was trying to talk to the OpenVPN management console, which runs on a UNIX domain socket.&nbsp; We had a small class (lifted from another project) that basically provided a nice facade over the socket communication functions.&nbsp; I'd noticed that, for some reason, the socket communication was slow.&nbsp; And I mean <em>really</em> slow.&nbsp; Like, a couple of seconds <em>per call</em> slow.&nbsp; Remember, this is not a network call - this is to a domain socket on the same box.&nbsp; It might not be the <em>fastest</em> way to do <abbr title="Inter-Process Communication">IPC</abbr>, but it should still be reasonably quick.</p>
<p>So I did some experimentation.&nbsp; Nothing fancy - just injecting some <code>microtime()</code> and <code>var_dump()</code> calls to get a general idea of how long things were taking.&nbsp; Turns out that's all I needed.&nbsp; It quickly became obvious that each call to the method that reads from the socket was taking about 1 second, which is completely absurd.</p>
<p>For context, the code in that method was doing something like this (simplified for illustration):</p>
<pre><code>$timeoutTime = time() + 30;
$message = '';
while (time() &lt; $timeoutTime) {
    $character = socket_read($this-&gt;socket, 1);
    if ($character === '' || $character === false) {
        break;  // We're done reading
    }
    $message .= $character;
}</code></pre>
<p>Looks reasonable, right?&nbsp; After all, the documentation says that <code>socket_read()</code> will return the number of characters requested (in this case one), or false on error, or the empty string if there's no more data.&nbsp; So this seems like it should work just fine.&nbsp;</p>
<p>Well...not so much.</p>
<p>The problem is with the last read.&nbsp; It turns out that the documentation is wrong - <code>socket_read()</code> <em>doesn't</em> return the empty string when there's no more data.&nbsp; In fact, I couldn't get it to return an empty string <em>ever</em>.&nbsp; What actually happens is that it goes along happily until it exhausts the available data, and then it waits for more data.&nbsp; So the last call just hangs until it reaches a timeout that's set on the connection (in our case, it was configured to 1 second) and then returns false.</p>
<p>So because we were relying on that "empty string on empty buffer" behavior to detect the end of input, calling that method <em>always</em> resulted in a one-second hang.&nbsp; This was fairly easily fixed by just reading the data in much larger chunks and checking how much was actually returned to determine if we needed another read call.&nbsp; But that's not the point.&nbsp; The point is that we relied on what was in the documentation, and it was just totally wrong!</p>
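<p>The fix looked roughly like this - I'm simplifying again, and the buffer size is arbitrary:</p>

```php
$chunkSize = 2048;
$message = '';
do {
    $chunk = socket_read($this->socket, $chunkSize);
    if ($chunk === false || $chunk === '') {
        break;  // Error or timeout - nothing more is coming
    }
    $message .= $chunk;
    // A short read means the buffer is drained, so stop rather than
    // issuing another call that would block waiting for more data.
    // (If the message is an exact multiple of the chunk size, you
    // still eat one timeout, but that's a corner case.)
} while (strlen($chunk) === $chunkSize);
```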
<p>And it's not like this is the first time I've been bitten by the PHP docs.&nbsp; Historically, PHP has been very bad about documenting edge cases.&nbsp; For example, what happens if a particular parameter is null?&nbsp; What's the exact behavior if the parameters do not match the expected preconditions?&nbsp; Or what about that "flags" parameter that a bunch of functions take?&nbsp; Sometimes the available flags are well documented, but sometimes it's just an opaque one-line description that doesn't really tell you what the flag <em>actually does</em>.&nbsp; It's a crap shoot.</p>
<p>To be fair, the PHP documentation is not the worst I've ever seen.&nbsp; Not even close.&nbsp; And it really is very good about providing helpful examples.&nbsp; It's just that it errs on the side of being light on details, and <a title="GOTO 2020 talk by Kevlin Henney" href="https://youtu.be/kX0prJklhUE"><em>software is details</em></a>.</p>]]></description>
      <author><![CDATA[pageer@skepticats.com (Peter Geer)]]></author>
      <pubDate>Sat, 05 Jun 2021 22:20:24 +0000</pubDate>
      <category><![CDATA[PHP]]></category>
      <category><![CDATA[Software Engineering]]></category>
      <guid isPermaLink="true">https://linlog.skepticats.com/entries/2021/06/php-documentation-and-sockets.php</guid>
      <comments>https://linlog.skepticats.com/entries/2021/06/05_1820/comments/</comments>
    </item>
    <item>
      <title><![CDATA[Composer autoload annoyances]]></title>
      <link>https://linlog.skepticats.com/entries/2021/02/composer-autoload-annoyances.php</link>
      <description><![CDATA[<p>Note to self: Sometimes you need to run <code>composer du -o</code> to make things work.</p>
<p>I'm not entirely sure why.&nbsp; But it's a pain in the butt.</p>
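<p>For future me:&nbsp;<code>du</code> is just Composer's shorthand for&nbsp;<code>dump-autoload</code>, so spelled out, that command is:</p>

```shell
# Rebuild the autoloader with an optimized class map;
# equivalent to the abbreviated "composer du -o"
composer dump-autoload -o
```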
<p>This has come up a couple of times in working on one of our projects for work.&nbsp; This particular one is an internal command-line application written in PHP and using the command-line components from the Symfony framework.&nbsp; I won't get into the details, but the point is that it glues together the various steps required to configure and run certain things on Linux-based servers.&nbsp; So to test it, I have to put the code on my test server and make sure it works there.</p>
<p>The problem that I ran into today was that I tried to add a new command class to the application and it barfed all over itself.&nbsp; The Symfony <abbr title="Dependency Injection">DI</abbr> container complained that it couldn't find a certain class name in a certain file. The PSR-4 autoloader requires that the class name and namespace match the filesystem path, so usually this indicates a typo in one of those.&nbsp; But in this case, everything was fine.&nbsp; The app worked fine on my laptop, and if I deleted the new command, it worked again.</p>
<p>Well, it turns out that running <code>composer du -o</code> fixed it.&nbsp; I suspect, based on the <a href="https://getcomposer.org/doc/articles/autoloader-optimization.md">composer documentation</a>, that the issue was that the class map was being cached by the opcache.&nbsp; The Symfony cache was empty, so that's about the only one left.&nbsp; Unfortunately, this is pretty opaque when it comes to trouble-shooting.&nbsp; I would have expected it to fall back to reading the filesystem, but apparently there's more to it than that.&nbsp; Perhaps it's related to how Symfony collects the commands - I haven't put in the time to investigate it.</p>
<p>But in any case, that's something to look out for.&nbsp; Gotta love those weird errors that give you absolutely no indication of the solution.</p>]]></description>
      <author><![CDATA[pageer@skepticats.com (Peter Geer)]]></author>
      <pubDate>Wed, 10 Feb 2021 00:21:12 +0000</pubDate>
      <category><![CDATA[PHP]]></category>
      <category><![CDATA[Note to Self]]></category>
      <guid isPermaLink="true">https://linlog.skepticats.com/entries/2021/02/composer-autoload-annoyances.php</guid>
      <comments>https://linlog.skepticats.com/entries/2021/02/09_1921/comments/</comments>
    </item>
    <item>
      <title><![CDATA[Stupid PHP serialization]]></title>
      <link>https://linlog.skepticats.com/entries/2020/10/Stupid_PHP_serialization.php</link>
      <description><![CDATA[<p><em><strong>Author's note:</strong> Here's another old article that I mostly finished, but never published - I'm not sure why.&nbsp; This one is from way back on August 23, 2013.</em></p>
<p><em>I was working for deviantART.com at the time.&nbsp; That was a very interesting experience.&nbsp; For one, it was my first time working 100% remote, and with a 100% distributed team, no less.&nbsp; We had people in eastern and western Europe, and from the east coast to the west coast of the US.&nbsp; It was also a big, high-traffic site.&nbsp; I mean, not Google or Facebook, but I believe it was in the top 100 sites on the web at the time, according to Alexa or Quantcast (depending on how you count "top 100").</em></p>
<p><em>It also had a lot of custom tech.&nbsp; My experience up to that point had mostly been with fairly vanilla stuff, stitching together a bunch of off-the-shelf components.&nbsp; But deviantART was old enough and big enough that a lot of the off-the-shelf tools we would use today weren't wide-spread yet, so they had to roll their own.&nbsp; For instance, they had a system to do <a href="https://www.php.net/manual/en/language.oop5.traits.php">traits in PHP</a> before that was actually a feature of PHP (it involved code generation, in case you were wondering).</em></p>
<p><em>Today's post from the archives is about my first dealings with one such custom tool.&nbsp; It nicely illustrates one of the pitfalls of custom tooling - it's usually not well documented, so spotting and resolving issues with it isn't always straight-forward.&nbsp; This was a case of finding that out the hard way.&nbsp; Enjoy!</em></p>
<hr />
<p>Lesson learned this week: object serialization in PHP uses more space than you think.</p>
<p>I had a fun problem recently. &nbsp;And by "fun", I mean "WTF?!?" &nbsp;</p>
<p>I got assigned a data migration task at work last week. &nbsp;It wasn't a particularly big deal - we had two locations where users' names were being stored and my task was to consolidate them. &nbsp;I'd already updated the UI code to read and save the values, so it was just a matter of running a data migration job.</p>
<p>Now, at <a href="http://www.deviantart.com/">dA</a> we have this handy tool that we call Distributor. &nbsp;We use it mainly for data migration and database cleanup tasks. &nbsp;Basically, it just crawls all the rows of a database table in chunks and passes the rows through a PHP function. &nbsp;We have many tables that contain tens or hundreds of millions of rows, so it's important that data migrations can be done gradually - trying to do it all at once would hammer the database too hard and cause problems. &nbsp;Distributor allows us to set how big each chunk is and configure the frequency and concurrency level of chunk processing.</p>
<p>There's three parts to Distributor that come into play here: the distributor runner (i.e. the tool itself), the "recipe" which determines what table to crawl, and the "job" function which performs the actual migration/cleanup/whatever. &nbsp;We seldom have to think about the runner, since it's pretty stable, so I was concentrating on the recipe and the job.</p>
<p>Well, things were going well. &nbsp;I wrote my recipe, which scanned the old_name table and returned rows containing the userid and the name itself. &nbsp;And then I wrote my migration job, which updated the new_name table. &nbsp;(I'm fabricating those table names, in case you didn't already figure that out.) &nbsp;Distributor includes a "counter" feature that allows us to trigger logging messages in jobs and totals up the number of times they're triggered. &nbsp;We typically make liberal use of these, logging all possible code paths, as it makes debugging easier. &nbsp;It seemed pretty straight-forward and it passed through code review without any complaints.</p>
<p>So I ran my distributor job. &nbsp;The old_name table had about 22 million rows in it, so at a moderate chunk size of 200 rows, I figured it would take a couple of days. &nbsp;When I checked back a day or two later, the distributor runner was reporting that the job was only 4% complete. &nbsp;But when I looked at my logging counters, they reported that the job had processed 28 million rows. &nbsp;WTF?!? &nbsp;</p>
<p>Needless to say, head-scratching, testing, and debugging followed. &nbsp;The short version is that the runner was restarting the job at random. &nbsp;Of course, that doesn't reset the counters, so I'd actually processed 28 million rows, but most of them were repeats. &nbsp;</p>
<p>So why was the runner resetting itself? &nbsp;Well, I traced that back to a database column that's used by the runner. &nbsp;It turns out that the current status of a job, including the last chunk of data returned by the crawler recipe, is stored in the database as a serialized string. &nbsp;The reason the crawler was restarting was because PHP's unserialize() function was erroring out when trying to deserialize that string. &nbsp;And it seems that the reason it was failing was that the string was being truncated - it was overflowing the database column!</p>
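<p>If you've never looked at PHP's serialization format, the overhead is easy to underestimate.&nbsp; A quick illustration (with made-up data, obviously):</p>

```php
$row = ['userid' => 12345678, 'name' => 'somebody'];

// 16 characters of actual data becomes a 56-character string:
echo serialize($row);
// a:2:{s:6:"userid";i:12345678;s:4:"name";s:8:"somebody";}
```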
<p>The source of the problem appeared to be the fact that my crawler recipe was returning both a userid and the actual name. &nbsp;You see, we typically write crawlers to just return a list of numeric IDs. &nbsp;We can look up the other stuff at run-time. &nbsp;Well, that extra data was just enough to overflow the column on certain records. &nbsp;That's what I get for trying to save a database lookup!</p>]]></description>
      <author><![CDATA[pageer@skepticats.com (Peter Geer)]]></author>
      <pubDate>Sat, 10 Oct 2020 22:46:17 +0000</pubDate>
      <category><![CDATA[PHP]]></category>
      <category><![CDATA[Software]]></category>
      <category><![CDATA[From the Archives]]></category>
      <guid isPermaLink="true">https://linlog.skepticats.com/entries/2020/10/Stupid_PHP_serialization.php</guid>
      <comments>https://linlog.skepticats.com/entries/2020/10/10_1846/comments/</comments>
    </item>
    <item>
      <title><![CDATA[LnBlog Refactoring Step 3: Uploads and drafts]]></title>
      <link>https://linlog.skepticats.com/entries/2019/08/LnBlog_Refactoring_Step_3_Uploads_and_drafts.php</link>
      <description><![CDATA[<p>It's time for the third, slightly shorter, installment of my ongoing series on refactoring my blogging software.&nbsp; In the <a href="../../entries/2017/10/LnBlog_Refactoring_Step_1_Publishing.php">first part</a>, I discussed reworking how post publication was done and in the second part I talked about reworking things to <a href="../../entries/2019/06/LnBlog_Refactoring_Step_2_Adding_Webmention_Support.php">add Webmention support</a>.&nbsp; This time, we're going to talk about two mini-projects to improve the UI for editing posts.</p>
<p>This improvement is, I'm slightly sad to say, pretty boring.&nbsp; It basically involves fixing a "bug" that's really an artifact of some very old design choices.&nbsp; These choices led to the existing implementation behaving in unexpected ways when the workflow changed.</p>
<h3>The Problem</h3>
<p>Originally LnBlog was pretty basic and written almost entirely in HTML and PHP, i.e. there was no JavaScript to speak of.&nbsp; You wrote posts either in raw HTML in a text area box, using "auto-markup", which just automatically linkified things, or using "LBCode", which is my own bastardized version of the BBCode markup that used to be popular on web forums.&nbsp; I had implemented some plugins to support WYSIWYG post editors, but I didn't really use them and they didn't get much love.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="The old LnBlog post editor" src="https://linlog.skepticats.com/entries/2019/08/04_1430/before-small.png" alt="The old LnBlog post editor" width="640" height="353" /></p>
<p>Well, I eventually got tired of writing in LBCode and switched to composing all my posts using the TinyMCE plugin.&nbsp; That is now the standard way to compose your posts in LnBlog.&nbsp; The problem is that the existing workflow wasn't really designed for WYSIWYG composition.</p>
<p>In the old model, the idea was that you could compose your entire post on the entry editing page, hit "publish", and it would all be submitted to the server in one go.&nbsp; There's also a "review" button which renders your post as it would appear when published and a "save draft" button to save your work for later.&nbsp; These also assume that submitting the post is an all-or-nothing operation.&nbsp; So if you got part way done with your post and decided you didn't like it, you could just leave the page and nothing would be saved to the server.</p>
<p>At this point it is also worth noting how LnBlog stores its data.&nbsp; Everything is file-based and entries are self-contained.&nbsp; That means that each entry has a directory and that directory contains all the post data, comments, and uploaded files that belong to that entry.</p>
<p>What's the problem with this?&nbsp; Well, to have meaningful WYSIWYG editing, you need to be able to do things like upload a file and then be able to see it in the post editor.&nbsp; In the old workflow, you'd have to write your post, insert an image tag with the file name of your picture (which would not render), add your picture as an upload, save the entry (either by saving the draft or using the "preview", which would trigger a save if you had uploads), and then go back to editing your post.&nbsp; This was an unacceptably clunky workflow.</p>
<p>On top of this, there was a further problem.&nbsp; Even after you previewed your post, it&nbsp;<em>still</em> wouldn't render correctly in the WYSIWYG editor.&nbsp; That's because the relative URLs were inconsistent.&nbsp; The uploaded files got stored in a special, segregated draft directory, but the post editor page itself was not relative to that directory, so TinyMCE didn't have the right path to render it.&nbsp; And you can't use an absolute URL because the URL will change after the post is published.</p>
<p>So there were two semi-related tasks to fix this.&nbsp; The first was to introduce a better upload mechanism.&nbsp; The old one was just a regular&nbsp;<code>&lt;input type="file"&gt;</code> box, which worked but wasn't especially user-friendly.&nbsp; The second one was to fix things such that TinyMCE could consistently render the correct URL for any files we uploaded.</p>
<h3>The solution - Design</h3>
<p>The actual solution to this problem was not so much in the code as it was in changing the design.&nbsp;&nbsp;The first part was simple: fix the clunky old upload process by introducing a more modern JavaScript widget to do the uploads.&nbsp; So after looking at some alternatives, I decided to implement <a href="http://www.dropzonejs.com/">Dropzone.js</a> as the standard upload mechanism.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" title="The new, more modern LnBlog post editor." src="https://linlog.skepticats.com/entries/2019/08/04_1430/after-small.png" alt="The new, more modern LnBlog post editor." width="523" height="480" /></p>
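<p>For the curious, the server side of a Dropzone integration is roughly the following shape.&nbsp; Dropzone's default behavior is to POST each file under a <code>file</code> field; the names below are illustrative and this is not LnBlog's actual handler (the "mover" is a parameter purely so the logic can be exercised without a real upload):</p>

```php
<?php

// Handle one uploaded file, Dropzone-style. $file has the shape of a
// $_FILES['file'] entry; $move is move_uploaded_file() in production.
function handleUpload(array $file, string $draftUploadDir, callable $move): array
{
    if (($file['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        return ['ok' => false, 'error' => 'Upload failed'];
    }
    $name = basename($file['name']);  // strip any path components
    if (!$move($file['tmp_name'], $draftUploadDir . '/' . $name)) {
        return ['ok' => false, 'error' => 'Could not save file'];
    }
    // Hand back just the file name: since the editor page is served
    // relative to the draft directory, a relative URL to it works.
    return ['ok' => true, 'file' => $name];
}

// In the real endpoint, something like:
// echo json_encode(handleUpload($_FILES['file'], $draftDir, 'move_uploaded_file'));
```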
<p>The second part involved changing the workflow for writing and publishing posts.&nbsp; The result was a somewhat simpler and more consistent workflow that reduces the number of branches in the code.&nbsp; In the old workflow, you had the following possible cases when submitting a post to the server:</p>
<ol>
<li>New post being published (nothing saved yet).</li>
<li>New post being saved as a draft (nothing saved yet).</li>
<li>Existing draft post being published.</li>
<li>Existing draft post being saved.</li>
<li>New (not yet saved) post being&nbsp;<em>previewed</em> with attached files.</li>
<li>Existing draft post being&nbsp;<em>previewed</em> with attached files.</li>
</ol>
<p>This is kind of a lot of cases.&nbsp; Too many, in fact.&nbsp; Publishing and saving were slightly different depending on whether or not the entry already existed, and then there were the preview cases.&nbsp; These were necessary because an entry previewed with new attachments required extra processing: if you attached an image, you'd want to see it.&nbsp; So this complexity was a minor problem in and of itself.</p>
<p>So the solution was to change the workflow such that all of these are no longer special cases.&nbsp; I did this by simply issuing the decree that&nbsp;<em>all draft entries shall always already exist.</em>&nbsp; In other words, just create a new draft when we first open the new post editor.&nbsp; This does two things for us:</p>
<ol>
<li>It allows us to solve the "relative URL" problem because now we can make the draft editing URL always relative to the draft storage directory.</li>
<li>It eliminates some of those special cases.&nbsp; If the draft always exists, then "publish new post" and "publish existing draft" are effectively the same operation.&nbsp; When combined with the modern upload widget, this also eliminates the need for the special "preview" cases.</li>
</ol>
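<p>In code, the decree amounts to something like the sketch below.&nbsp; The names (<code>Draft</code>, <code>openNewPostEditor()</code>) are hypothetical rather than LnBlog's actual API; the point is just that the draft directory exists before the user types anything:</p>

```php
<?php

class Draft
{
    public string $id;
    public string $directory;

    public function __construct(string $id, string $baseDir)
    {
        $this->id = $id;
        $this->directory = $baseDir . '/drafts/' . $this->id;
    }
}

// Instead of waiting for the first save, create the draft (and its
// storage directory) as soon as the new-post editor is opened.
function openNewPostEditor(string $baseDir): Draft
{
    $draft = new Draft(uniqid('draft_'), $baseDir);
    if (!is_dir($draft->directory)) {
        mkdir($draft->directory, 0775, true);
    }
    // The editor page is then served relative to $draft->directory,
    // so uploaded files resolve with plain relative URLs.
    return $draft;
}
```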
<h3>The implementation - Results</h3>
<p>I won't get into the actual implementation details of these tasks because, frankly, they're not very interesting.&nbsp; There aren't any good lessons or generalizations to take from the code - it's mostly just adapting the idiosyncratic stuff that was already there.</p>
<p>The implementation was also small and went fairly smoothly.&nbsp; The upload widget was actually the hard part - there were a bunch of minor issues in the process of integrating that.&nbsp; There were some issues with the other part as well, but less serious.&nbsp; Much of it was just integration issues that weren't necessarily expected and would have been hard to foresee.&nbsp; You know, the kind of thing you expect from legacy code.&nbsp; Here's some stats from Process Dashboard:</p>
<table style="border-collapse: collapse; width: 73.287%; height: 86px; margin: 0px auto;" border="1">
<tbody>
<tr style="height: 18px;">
<td style="width: 49.1717%; height: 18px;">Project</td>
<td style="width: 19.8147%; height: 18px;">File Upload</td>
<td style="width: 70.163%; height: 18px;">Draft always exists</td>
</tr>
<tr>
<td style="width: 49.1717%;">Hours to complete (planned):</td>
<td style="width: 19.8147%; text-align: right;">4:13</td>
<td style="width: 70.163%; text-align: right;">3:00</td>
</tr>
<tr style="height: 18px;">
<td style="width: 49.1717%; height: 18px;">Hours to complete (actual):</td>
<td style="width: 19.8147%; height: 18px; text-align: right;">7:49</td>
<td style="width: 70.163%; height: 18px; text-align: right;">5:23</td>
</tr>
<tr>
<td style="width: 49.1717%;">LOC changed/added (planned):</td>
<td style="width: 19.8147%; text-align: right;">210</td>
<td style="width: 70.163%; text-align: right;">135</td>
</tr>
<tr style="height: 18px;">
<td style="width: 49.1717%; height: 18px;">LOC changed/added (actual):</td>
<td style="width: 19.8147%; height: 18px; text-align: right;">141</td>
<td style="width: 70.163%; height: 18px; text-align: right;">182</td>
</tr>
<tr style="height: 18px;">
<td style="width: 49.1717%; height: 18px;">Defects/KLOC (found in test):</td>
<td style="width: 19.8147%; height: 18px; text-align: right;">42.6</td>
<td style="width: 70.163%; height: 18px; text-align: right;">27.5</td>
</tr>
<tr style="height: 14px;">
<td style="width: 49.1717%; height: 14px;">Defects/KLOC (total):</td>
<td style="width: 19.8147%; height: 14px; text-align: right;">81.5</td>
<td style="width: 70.163%; height: 14px; text-align: right;">44.0</td>
</tr>
</tbody>
</table>
<p>As you can see, my estimates here were not great.&nbsp; The upload part involved more trial and error with Dropzone.js than I had expected and ended up with more bugs.&nbsp; The draft workflow change went better, but I ended up spending more time on the design than I initially anticipated.&nbsp; However, these tasks both had a lot of unknowns, so I didn't really expect the estimates to be that accurate.</p>
<h3>Take Away</h3>
<p>The interesting thing about this project was not so much&nbsp;<em>what</em> needed to be done but&nbsp;<em>why</em> it needed to be done.&nbsp;</p>
<p>Editing posts is obviously a fundamental function of a blog, and it's one that I originally wrote way back in 2005.&nbsp; It's worth remembering that the web was a very different place back then.&nbsp; Internet Explorer was still the leading web browser; PHP 5 was still brand new; it wasn't yet considered "safe" to just use JavaScript for everything (because, hey, people might not have JavaScript enabled); internet speeds were still pretty slow; and browsing on mobile devices was just starting to become feasible.&nbsp; In that world, a lot of the design decisions I made at the time seemed pretty reasonable.</p>
<p>But, of course, the web evolved.&nbsp; The modern web makes it much easier for the file upload workflow to be asynchronous, which offers a much nicer user experience.&nbsp; By ditching some of the biases and assumptions of the old post editor, I was more easily able to update the interface.</p>
<p>One of the interesting things to note here is that changing the post editing workflow was&nbsp;<em>easier</em> than the alternatives.&nbsp; Keeping the old workflow was by no means impossible.&nbsp; I kicked around several ideas that <em>didn't</em> involve changing it.&nbsp; However, most of those had other limitations or complications and I eventually decided that they would ultimately be more work.&nbsp;&nbsp;</p>
<p>This is something that comes up with some regularity when working with an older code-base.&nbsp; It often happens that the assumptions baked into the architecture don't age well as the world around the application progresses.&nbsp; Thus, when you need to finally "fix" that aspect of the app, you end up having to do a bit of cost-benefit analysis.&nbsp; Is it better to re-vamp this part of the application?&nbsp; Or should you shim in the new features in a kinda-hacky-but-it-works sort of way?</p>
<p>While our first instinct as developers is usually to do the "real" fix and replace the old thing, the "correct" answer is seldom so straight-forward.&nbsp; In this case, the "real" fix was relatively small and straight-forward.&nbsp; But in other cases, the old assumptions are smeared through the entire application and trying to remove them becomes a nightmare.&nbsp; It might take weeks or months to make a relatively simple change, and then weeks or months&nbsp;<em>after</em> that to deal with all the unforeseen fallout of that change.&nbsp; Is that worth the effort?&nbsp; It probably depends on what the "real" fix buys you.</p>
<p>I had a project at work once that was a great example of that.&nbsp; On the surface, the request was a simple "I want to be able to update this field", where the field in question was data that was generally&nbsp;<em>but not necessarily</em> static. In most systems, this would be as simple as adding a UI to edit that field and having it update the datastore.&nbsp; But in this case, that field was used internally as the unique identifier <em>and</em> was used that way across a number of different systems.&nbsp; So this assumption was <em>everywhere</em>.&nbsp; Everybody knew this was a terrible design, but it had been that way for a decade and was such a huge pain to fix that we had been putting it off for years.&nbsp; When we finally bit the bullet and did it right, unraveling the baked-in assumptions about this piece of data took an entire team over a month.&nbsp; At an <em>extremely conservative estimate</em>, that's well over $25,000 to fix "make this field updatable".&nbsp; That's a pretty hefty price tag for something that seems so trivial.</p>
<p>The point is, old applications tend to have lots of weird, esoteric design decisions and implementation-specific issues that constrain them.&nbsp; Sometimes removing these constraints is simple and straight-forward.&nbsp; Sometimes it's not.&nbsp; And without full context, it's often hard to tell which it will be.&nbsp; So whenever possible, try to have pity on the future maintenance programmer who will be working on your system and anticipate those kinds of issues.&nbsp; After all, that programmer might be you.</p>
      <author><![CDATA[pageer@skepticats.com (Peter Geer)]]></author>
      <pubDate>Sun, 04 Aug 2019 18:30:33 +0000</pubDate>
      <category><![CDATA[Software]]></category>
      <category><![CDATA[Software Engineering]]></category>
      <category><![CDATA[PHP]]></category>
      <category><![CDATA[Web]]></category>
      <category><![CDATA[Blogging]]></category>
      <category><![CDATA[JavaScript]]></category>
      <guid isPermaLink="true">https://linlog.skepticats.com/entries/2019/08/LnBlog_Refactoring_Step_3_Uploads_and_drafts.php</guid>
      <comments>https://linlog.skepticats.com/entries/2019/08/04_1430/comments/</comments>
    </item>
    <item>
      <title><![CDATA[LnBlog Refactoring Step 2: Adding Webmention Support]]></title>
      <link>https://linlog.skepticats.com/entries/2019/06/LnBlog_Refactoring_Step_2_Adding_Webmention_Support.php</link>
      <description><![CDATA[<p>About a year and a half ago, I wrote an entry about the <a href="https://linlog.skepticats.com/entries/2017/10/LnBlog_Refactoring_Step_1_Publishing.php">first step in my refactoring of LnBlog</a>.&nbsp; Well, that's still a thing that I work on from time to time, so I thought I might as well write a post on the latest round of changes.&nbsp; As you've probably figured out, progress on this particular project is, of necessity, slow and extremely irregular, but that's an interesting challenge in and of itself.</p>
<h3>Feature Addition: Webmention</h3>
<p>For this second step, I didn't so much refactor as add a feature.&nbsp; This particular feature has been on my list for a while and I figured it was finally time to implement it.&nbsp; That feature is webmention support.&nbsp; This is the newer generation of blog notification, similar to Trackback (which I don't think anyone uses anymore) and Pingback.&nbsp; So, basically, it's just a way of notifying another blog that you linked to them and vice versa.&nbsp; LnBlog already supported the two older versions, so I thought it made sense to add the new one.</p>
<p>One of the nice things about Webmention is that it actually has a <a href="https://www.w3.org/TR/2017/REC-webmention-20170112/">formal specification that's published as a W3C recommendation</a>.&nbsp; So unlike some of the older "standards" that were around when I first implemented LnBlog, this one is actually official, well structured, and well thought out.&nbsp; So that makes things slightly easier.</p>
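<p>As an example of how concrete the spec is, even endpoint discovery is pinned down: a receiver can advertise its endpoint in an HTTP <code>Link</code> header with <code>rel="webmention"</code>.&nbsp; Here's a simplified parser for just the header case (the spec also allows <code>&lt;link&gt;</code> and <code>&lt;a&gt;</code> tags in the HTML body, which this sketch ignores):</p>

```php
<?php

// Extract the webmention endpoint from a Link header value, e.g.
// '<https://example.com/wm>; rel="webmention"'. Returns null if the
// header doesn't advertise one.
function findWebmentionEndpoint(string $linkHeader): ?string
{
    foreach (explode(',', $linkHeader) as $part) {
        if (preg_match('/<([^>]+)>\s*;\s*rel="?([^">]*)"?/', trim($part), $m)
            && in_array('webmention', preg_split('/\s+/', $m[2]), true)) {
            return $m[1];
        }
    }
    return null;
}

echo findWebmentionEndpoint('<https://example.com/wm>; rel="webmention"');
// prints https://example.com/wm
```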
<p>Unlike the last post, I didn't follow any formal process or do any time tracking for this addition.&nbsp; In retrospect I kind of wish I had, but this work was very in and out in terms of consistency and I didn't think about tracking until it was too late to matter.&nbsp; Nevertheless, I'll try to break down some of my process and results.</p>
<h3>Step 1: Analysis</h3>
<p>The first step, naturally, was analyzing the work to be done, i.e. reading the spec.&nbsp; The webmention protocol isn't particularly complicated, but like all specification documents it looks much more so when you put all the edge cases and optional portions together.&nbsp;&nbsp;</p>
<p>I actually looked at the spec several times before deciding to actually implement it.&nbsp; Since my time for this project is limited and only available sporadically, I was a little intimidated by the unexpected length of the spec.&nbsp; When you have <em>maybe</em> an hour a day to work on a piece of code, it's difficult to get into any kind of flow state, so large changes that require extended concentration are pretty much off the table.</p>
<p>So how do we address this?&nbsp; How do you build something when you don't have enough time to get the whole thing in your head at once?</p>
<h3>Step 2: Design</h3>
<p>Answer: you document it.&nbsp; You figure out a piece and write down what you figured out.&nbsp; Then the next time you're able to work on it, you can read that and pick up where you left off.&nbsp; Some people call this "design".</p>
<p>I ended up reading through the spec over several days and eventually putting together UML diagrams to help me understand the flow.&nbsp; There were two flows, sending and receiving, so I made one diagram for each, which spelled out the various validations and error conditions that were described in the spec.</p>
<p><a href="https://linlog.skepticats.com/entries/2019/06/15_1801/webmention.png"><img title="Workflow for sending webmentions" src="https://linlog.skepticats.com/entries/2019/06/15_1801/webmention-small.png" alt="Workflow for sending webmentions" /></a> <a href="https://linlog.skepticats.com/entries/2019/06/15_1801/webmention_001.png"><img title="Workflow for receiving webmentions" src="https://linlog.skepticats.com/entries/2019/06/15_1801/webmention_001-small.png" alt="Workflow for receiving webmentions" /></a></p>
<p>That was really all I needed as far as design for implementing the webmention protocol.&nbsp; It's pretty straight-forward and I made the diagrams detailed enough that I could work directly from them.&nbsp; The only real consideration left was where to fit the webmention implementation into the code.</p>
<p>My initial thought was to model a webmention as a new class, i.e. to have a Webmention class to complement the currently existing TrackBack and Pingback classes.&nbsp; In fact, this seemed like the obvious implementation given the code I was working with.&nbsp; However, when I started to look at it, it became clear that the only <em>real</em> difference between Pingbacks and Webmentions is the communication protocol.&nbsp; It's the same data and roughly the same workflow and use-case.&nbsp; It's just that Pingback goes over XML-RPC and Webmention uses plain-old HTTP form posting.&nbsp; It didn't really make sense to have a different object class for what is essentially the same thing, so I ended up just re-using the existing Pingback class and just adding a "webmention" flag for reference.</p>
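<p>So the net effect on the data model was tiny, something like the sketch below (the field names here are illustrative rather than copied from the actual class):</p>

```php
<?php

// One class for both protocols: same data, same workflow, different
// transport. The flag just records how the mention arrived.
class Pingback
{
    public string $source_uri = '';
    public string $target_uri = '';
    public string $excerpt = '';
    public bool $is_webmention = false;
}

$mention = new Pingback();
$mention->source_uri = 'https://other.example/their-post';
$mention->target_uri = 'https://linlog.skepticats.com/entries/some-post.php';
$mention->is_webmention = true;  // arrived via form POST, not XML-RPC
```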
<h3>Step 3: Implementation</h3>
<p>One of the nice things about having a clear spec is that it makes it really easy to do test-driven development because the spec practically writes half your test cases for you.&nbsp; Of course, there are always additional things to consider and test for, but it still makes things simpler.</p>
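<p>To illustrate, the receiving rules translate almost directly into a validation function, and each rule into a test case.&nbsp; This paraphrases the spec's requirements and is not LnBlog's actual code:</p>

```php
<?php

// Validate an incoming webmention request, following the spec's
// receiving rules: source and target must be valid URLs, must not be
// the same URL, and target must be a resource this server accepts
// mentions for.
function validateWebmentionRequest(string $source, string $target, array $acceptedHosts): bool
{
    if (!filter_var($source, FILTER_VALIDATE_URL)
        || !filter_var($target, FILTER_VALIDATE_URL)) {
        return false;
    }
    if ($source === $target) {
        return false;
    }
    return in_array(parse_url($target, PHP_URL_HOST), $acceptedHosts, true);
}
```

<p>Each of those branches comes straight out of a "MUST" or "SHOULD" in the spec, which is what makes the test list practically write itself.</p>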
<p>The big challenge was really how to fit webmentions into the existing application structure.&nbsp; As I mentioned above, I'd already reached the conclusion that creating a new domain object for webmentions was a waste of time.&nbsp; But what about the rest of it?&nbsp; Where should the logic for sending them go?&nbsp; Or receiving?&nbsp; And how should sending webmentions play with sending pingbacks?</p>
<p>The first point of reference was the pingback implementation.&nbsp; The old pingback implementation for sending pingbacks lived directly in the domain classes.&nbsp; So a blog entry would scan itself for links, create a pingback object for each, and then ask the <em>pingback</em> if its URI supported pingbacks, and then the <em>entry</em> would send the pingback request.&nbsp; (Yes, this is confusing.&nbsp; No, I don't remember why I wrote it that way.)&nbsp; As for receiving pingbacks, that lived entirely in the XML-RPC endpoint.&nbsp; Obviously none of this was a good example to imitate.</p>
<p>The most obvious solution here was to encapsulate this stuff in its own class, so I created a SocialWebClient class to do that.&nbsp; Since pingback and webmention are so similar, it made sense to just have one class to handle both of them.&nbsp; After all, the only real difference in sending them was the message protocol.&nbsp; The SocialWebClient has a single method, <code>sendReplies()</code>, which takes an entry, scans its links and for each detects if the URI supports pingback or webmention and sends the appropriate one (or a webmention if it supports both).&nbsp; Similarly, I created a SocialWebServer class for receiving webmentions with an&nbsp;<code>addWebmention()</code> method that is called by an endpoint to save incoming mentions.&nbsp; I had originally hoped to roll the pingback implementation into that as well, but it was slightly inconvenient with the use of XML-RPC, so I ended up pushing that off until later.</p>
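<p>The dispatch in <code>sendReplies()</code> boils down to a preference order.&nbsp; In this sketch the discovery steps are injected as callables purely to keep the example self-contained; the real class presumably does HTTP discovery, and the method signature here is invented:</p>

```php
<?php

class SocialWebClient
{
    public function __construct(
        private $findWebmention,  // fn(string $uri): ?string endpoint
        private $findPingback     // fn(string $uri): ?string endpoint
    ) {
    }

    /** Returns which protocol was chosen for each target URI. */
    public function sendReplies(string $permalink, array $targets): array
    {
        $sent = [];
        foreach ($targets as $target) {
            if (($this->findWebmention)($target) !== null) {
                $sent[$target] = 'webmention';  // plain HTTP form POST
            } elseif (($this->findPingback)($target) !== null) {
                $sent[$target] = 'pingback';    // XML-RPC call
            }
        }
        return $sent;
    }
}

$client = new SocialWebClient(
    fn($uri) => $uri === 'https://a.example/post' ? 'https://a.example/wm' : null,
    fn($uri) => 'https://b.example/xmlrpc'
);
// 'https://a.example/post' supports both, so webmention wins;
// 'https://b.example/other' only has pingback.
print_r($client->sendReplies('https://me.example/entry', [
    'https://a.example/post',
    'https://b.example/other',
]));
```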
<h3>Results</h3>
<p>As I mentioned, I didn't track the amount of time I spent on this task.&nbsp; However, I <em>can</em> retroactively calculate how much code was involved.&nbsp; Here's the lines-of-code summary as reported by <a href="https://www.processdash.com/">Process Dashboard</a>:</p>
<table style="margin: 0 auto;" border="">
<tbody>
<tr>
<td>Base:&nbsp;</td>
<td>8057</td>
</tr>
<tr>
<td>Deleted:&nbsp;</td>
<td>216</td>
</tr>
<tr>
<td>Modified:&nbsp;</td>
<td>60</td>
</tr>
<tr>
<td>Added:&nbsp;</td>
<td>890</td>
</tr>
<tr>
<td>Added &amp; Modified:&nbsp;</td>
<td>950</td>
</tr>
<tr>
<td>Total:&nbsp;</td>
<td>8731</td>
</tr>
</tbody>
</table>
<p>For those who aren't familiar, the "base" value is the lines of code in the affected files before the changes, while the "total" is the total number of lines in affected files after the changes.&nbsp; The magic number here is "Added &amp; Modified", which is essentially the "new" code.&nbsp; So all in all, I wrote about a thousand lines for a net increase of about 700 lines (8731 total minus the 8057 base).</p>
<p>Most of this was in the new files, as reported by Process Dashboard below.&nbsp; I'll spare you the 31 files that contained assorted lesser changes (many related to fixing unrelated issues) since none of them had even 100 lines changed.&nbsp;&nbsp;</p>
<table style="width: 0px; height: 144px; margin: 0 auto;" border="">
<tbody>
<tr style="height: 18px;">
<th style="width: 373px; height: 18px;">Files Added:</th>
<th style="width: 44px; height: 18px;">Total</th>
</tr>
<tr style="height: 18px;">
<td style="width: 373px; height: 18px;">lib\EntryMapper.class.php</td>
<td style="width: 44px; height: 18px;">27</td>
</tr>
<tr style="height: 18px;">
<td style="width: 373px; height: 18px;">lib\HttpResponse.class.php</td>
<td style="width: 44px; height: 18px;">60</td>
</tr>
<tr style="height: 18px;">
<td style="width: 373px; height: 18px;">lib\SocialWebClient.class.php</td>
<td style="width: 44px; height: 18px;">237</td>
</tr>
<tr style="height: 18px;">
<td style="width: 373px; height: 18px;">lib\SocialWebServer.class.php</td>
<td style="width: 44px; height: 18px;">75</td>
</tr>
<tr style="height: 18px;">
<td style="width: 373px; height: 18px;">tests\unit\publisher\SocialWebNotificationTest.php</td>
<td style="width: 44px; height: 18px;">184</td>
</tr>
<tr style="height: 18px;">
<td style="width: 373px; height: 18px;">tests\unit\SocialWebServerTest.php</td>
<td style="width: 44px; height: 18px;">131</td>
</tr>
</tbody>
</table>
<p>It's helpful to note that of the 717 lines added here, slightly less than half (315 lines) is unit test code.&nbsp; Since I was trying to do test-driven development, this is to be expected - the rule of thumb is "write at least as much test code as production code".&nbsp; That leaves the meat of the implementation at around 400 lines.&nbsp; And of those 400 lines, most of it is actually refactoring.</p>
<p>As I noted above, the Pingback and Webmention protocols are quite similar, differing mostly in the transport protocol.&nbsp; The algorithms for sending and receiving them are practically identical.&nbsp; So most of that work was in generalizing the existing implementation to work for both Pingback and Webmention.&nbsp; This meant pulling things out into new classes and adjusting them to be easily testable.&nbsp; Not exciting stuff, but more work than you might think.</p>
<p>So the main take-away from this project was: don't underestimate how hard it can be to work with legacy code.&nbsp; Once I figured out that the implementation of Webmention would closely mirror what I already had for Pingback, this task&nbsp;<em>should</em> have been really short and simple.&nbsp; But 700 lines isn't really that short or simple.&nbsp; Bringing old code up to snuff can take a surprising amount of effort.&nbsp; But if you've worked on a large, brown-field code-base, you probably already know that.</p>]]></description>
      <author><![CDATA[pageer@skepticats.com (Peter Geer)]]></author>
      <pubDate>Sat, 15 Jun 2019 22:01:50 +0000</pubDate>
      <category><![CDATA[Software]]></category>
      <category><![CDATA[Software Engineering]]></category>
      <category><![CDATA[PHP]]></category>
      <category><![CDATA[Web]]></category>
      <category><![CDATA[Blogging]]></category>
      <guid isPermaLink="true">https://linlog.skepticats.com/entries/2019/06/LnBlog_Refactoring_Step_2_Adding_Webmention_Support.php</guid>
      <comments>https://linlog.skepticats.com/entries/2019/06/15_1801/comments/</comments>
    </item>
    <item>
      <title><![CDATA[Global composer]]></title>
      <link>https://linlog.skepticats.com/entries/2019/06/Global_composer.php</link>
      <description><![CDATA[<p>Nice little trick I didn't realize existed: you can <a href="https://akrabat.com/global-installation-of-php-tools-with-composer/">install Composer packages globally</a>.</p>
<p>Apparently you can just do <code>composer global init</code> followed by <code>composer global require phpunit/phpunit</code> and get PHPUnit installed in your home directory rather than in a project directory, where you can add it to your path and use it anywhere.&nbsp; It works just like with installing to a project - the init creates a composer.json and the require adds packages to it.&nbsp; On Linux, I believe this stuff gets stored under <code>~/.composer/</code>, whereas on Windows, it ends up under <code>~\AppData\Roaming\Composer\</code>.</p>
<p>That's it.&nbsp; Nothing earth-shattering here.&nbsp; Just a handy little trick for things like code analyzers or other generic tools that you might not care about adding to your project's composer setup (maybe you only use them occasionally and have no need to integrate them into your <abbr title="Continuous Integration">CI</abbr> build).&nbsp; I didn't know about it, so I figured I'd pass it on.</p>]]></description>
      <author><![CDATA[pageer@skepticats.com (Peter Geer)]]></author>
      <pubDate>Wed, 12 Jun 2019 03:17:28 +0000</pubDate>
      <category><![CDATA[PHP]]></category>
      <category><![CDATA[Programming]]></category>
      <category><![CDATA[Tools]]></category>
      <guid isPermaLink="true">https://linlog.skepticats.com/entries/2019/06/Global_composer.php</guid>
      <comments>https://linlog.skepticats.com/entries/2019/06/11_2317/comments/</comments>
    </item>
  </channel>
</rss>
