Random Python and WSGI tip

In light of the fact that Python 2.x loses support...now, I've been upgrading some of my personal projects that I haven't done much with recently to Python 3.  One of them is a little web-based comic book reader I call Linga (I don't remember why) that I run on my home network.

Since my home server runs Ubuntu 18.04 LTS, where Python 2.7 is the default, this has been a bit of a pain.  Before doing the conversion to Python 3.x, I had Linga set up and working using MySQL and Apache with mod_wsgi.  But once I pushed the code for the 3.x version, everything broke.  After digging in, there were several problems I needed to correct:

  1. The default mod_wsgi install uses Python 2.7, not 3.6 like I want.
  2. My WSGI script uses execfile(), which no longer exists in Python 3.x.
  3. SQLAlchemy can't load the MySQLdb module.

Luckily, the solutions to these problems weren't particularly hard to find or implement.  With a little Googling (well, DuckDuckGo-ing really, but that doesn't have the same ring to it) and a little experimentation, I found the following fixes:

  1. The version of Python to use is hard-coded into the Apache module.  So to use Python 3.6 I just needed to install the Python 3 WSGI module with apt install libapache2-mod-wsgi-py3.
  2. Apparently the recommended replacement for execfile("filename") is exec(open("filename").read()).
  3. It seems that Zen of Python's saying that "There should be one-- and preferably only one --obvious way to do it" breaks down when it comes to connecting to MySQL, as there are apparently several client libraries available.  The solution that worked for me was to pip install pymysql and run pymysql.install_as_MySQLdb() in my WSGI script.  Having to call a method to turn on MySQLdb compatibility is annoying and seems a little hacky, but it works and it's not really a big deal.

With that, Linga now works just fine.  Maybe this will save someone else having to Google the answers to these questions as well.

Up 79 million in 20 seconds

We had one of those weird experiences at work today. The kind where something strange happens, and nobody knows how or why, and when you look into it, it doesn't even make sense.

Basically, the auto-numbered ID field on the table we use for media items (videos, pictures, etc.) jump up noticably this morning. And by "noticably" I mean it went from about 1 million to about 80 million. And when we looked at the timestamps, the jump happened in about 20 seconds. The IDs created before 8:38:12 AM were in teh 1 million range, and the ones createdafter 8:38:32 AM were in the 80 million range.

So the obvious question is: how did this happen? It doesn't look like it was caused by actually adding 79 million records to the table. There were only about 800,000 records total, and no indication in our moderation logs of any mass deletions. We didn't get any indication of increased server load either. In fact, the only reason we even noticed it is because the media IDs are in our URLs. I kind of doubt a bot could have created 79 million new media items in 20 seconds without at least generating a Nagios warning. In fact, I doubt our master database server could handle 79 million writes in 20 seconds.

So what does that leave? User error? Nobody with access to the database server was even working at 8:30 in the morning. Random MySQL screw up? Maybe, though that's a really wierd random error. Something else? Who knows...?

After sniffing around the server and tossing out ideas for 30 or 40 minutes, we ultimately gave up. It's a little disquieting that we don't know what happened, but we really can't justify spending all that much time on this. It's a very weird problem, but nothing is broken and we all have more important things to worry about at the moment.

Edit: Found the cause. Apparently we were hacked.

Weird index breakage

Here's an odd one. So I was trying to run a simple query for a data import I'm working on. I've got about 11,000 rows in the media table with a site_id of 2 and I wanted to see which media types were represented. So I ran this:

SELECT DISTINCT media_type FROM media WHERE site_id = 2;

To my surprise, this returned...nothing! No rows at all. So I took off the DISTINCT, just to check, and got back all 11,000 rows. What the heck?

Something was obviously wrong. This should never happen - if there are rows with the correct site_id, that query should always return something. So what now? Might as well look at the query plan:

mysql> explain extended select distinct media_type from media where site_id = 2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: media
type: range
possible_keys: idx_media_site_media_type,idx_frontpage
key: idx_typesiteid
key_len: 2
ref: NULL
rows: 15
Extra: Using where; Using index for group-by
1 row in set, 1 warning (0.00 sec)

mysql> show warnings \G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select distinct `cms`.`media`.`media_type` AS `media_type` from `cms`.`media` where (`cms`.`media`.`site_id` = 2)
1 row in set (0.00 sec)

Nothing out of the ordinary there. It's using the idx_typesiteid index, which is correct. Maybe that index is broken. Let's try another one:

select distinct media_type from media use index (idx_media_site_media_type) where site_id = 2;

Yup, it worked that time - I got 5 rows back, which is about what I was expecting. I get the same results if I use
IGNORE INDEX (idx_typesiteid) as the index hint. So idx_typesiteid must just be broken. I should probably fix that, but it was getting into peak hours for our site and I didn't want to run an OPTIMIZE TABLE or anything that would lock the media table.

But wait, there's one other weird thing. Just for the heck of it, I tried running the query with a few different index hints. And guess what: the query still works if I use FORCE INDEX (idx_typesiteid). Does that make sense to anybody? So if I ignore that index it works, and if I force the same index, it works, but if I give no hint, it claims to use that index and the query returns no data. I got nothin' on that one.

MySQL comment weirdness

Here's one I discovered for the first time yesterday - MySQL doesn't actually use the ANSI SQL comment sequence. Who knew?

For those who are a little rusty on such things, ANSI SQL-92 specifies the comment character as a doule dash (--). Anything from a double dash to the end of a line is a comment.

Well, in MySQL, that doesn't actually work. You have to have a space (or control character) after the double dash for MySQL to treat is as a comment. This is good style anyway, but MySQL makes it a hard requirement. Apparently the reason for this is that it can cause conflicts with software that auto-generates really crappy queries.

Is it just me, or does that seem like a really gratuitous change? I mean, I can see their reasons - it would be nice if an auto-generated query like UPDATE account SET credit=credit--1 actually worked instead of reducing to a no-op. But, on the other hand, is it really a good idea to change the database server because some vendor is too lazy to make their query-generator produce the more correct and logically equivalent UPDATE account SET credit=credit+1 instead? I don't know. I'm sure they had their reasons, but it just feels like a half-baked hack to me.

Updating from the same table

Last time, I was bitching about doing an update based on a select from the same table in MySQL. Well, after reading the chapter on "Views, Derived Tables, Materialized Tables, and Temporary Tables" in SQL for Smarties, followed by a search of the MySQL documentation, I figured out a way around that annoying limitation.

If you recall from last time, the query that wasn't working looked something like this:

mysql> UPDATE test_tbl SET val2 = (SELECT val2 FROM test_tbl WHERE val1 = 1) WHERE val1 = 3;
ERROR 1093 (HY000): You can't specify target table 'test_tbl' for update in FROM clause

Basically, this is doing an update and setting a column to the value of a scalar subquery. The catch is that, in MySQL, this doesn't work when the select and update are on the same table.

Well, the trick is that you have to make make MySQL materialize the subquery. I actually decided to try that after reading Joe Celko's story about views in the early days of DB2. At the time, it was using inline-expansion for views, which would sometimes generate invalid SQL. Thus it was sometimes necessary to make the database materialize the view.

Well, I figured the same thing might apply here. And so, after a little inspiration from the MySQL manual, here's a working version of the same query:

UPDATE test_tbl SET val2 = (SELECT val2 FROM (SELECT val2 FROM test_tbl AS tmp1 WHERE val1=1) AS tmp2) WHERE val1 = 3;

I stole the sub-subquery idea from the MySQL manual. It is, of course, completely gratuitous from a logical standpoint, but it forces MySQL to materialize the subquery. That, in turn, allows us to bypass the "no updating the same table" restriction, since we're technically selecting from a temporary table.

So there you have it! I was wrong about not being able to do that. It just takes a little finagling. Of course, I still think MySQL should be smart enough to make a temporary table on its own, but at least the end result is still possible.