Weird index breakage

Here's an odd one. So I was trying to run a simple query for a data import I'm working on. I've got about 11,000 rows in the media table with a site_id of 2 and I wanted to see which media types were represented. So I ran this:

SELECT DISTINCT media_type FROM media WHERE site_id = 2;

To my surprise, this returned...nothing! No rows at all. So I took off the DISTINCT, just to check, and got back all 11,000 rows. What the heck?

Something was obviously wrong. This should never happen - if there are rows with the correct site_id, that query should always return something. So what now? Might as well look at the query plan:

mysql> explain extended select distinct media_type from media where site_id = 2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: media
type: range
possible_keys: idx_media_site_media_type,idx_frontpage
key: idx_typesiteid
key_len: 2
ref: NULL
rows: 15
Extra: Using where; Using index for group-by
1 row in set, 1 warning (0.00 sec)

mysql> show warnings \G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select distinct `cms`.`media`.`media_type` AS `media_type` from `cms`.`media` where (`cms`.`media`.`site_id` = 2)
1 row in set (0.00 sec)

Nothing out of the ordinary there. It's using the idx_typesiteid index, which is correct. Maybe that index is broken. Let's try another one:

select distinct media_type from media use index (idx_media_site_media_type) where site_id = 2;

Yup, it worked that time - I got 5 rows back, which is about what I was expecting. I get the same results if I use
IGNORE INDEX (idx_typesiteid) as the index hint. So idx_typesiteid must just be broken. I should probably fix that, but it was getting into peak hours for our site and I didn't want to run an OPTIMIZE TABLE or anything that would lock the media table.

But wait, there's one other weird thing. Just for the heck of it, I tried running the query with a few different index hints. And guess what: the query still works if I use FORCE INDEX (idx_typesiteid). Does that make sense to anybody? So if I ignore that index it works, and if I force the same index, it works, but if I give no hint, it claims to use that index and the query returns no data. I got nothin' on that one.

PHP is developed by morons

Well, it's official: the people who develop PHP are morons. Or, rather, the people responsible for adding namespaces to PHP 5.3 are.

Why do I say this? Because I just read an annoucnement on Slashdot that they've decided on the operator to use for separating namespace in PHP 5.3: the backslash (\).

Seriously? The friggin' backslash? What kind of choice is that? Last I knew they'd pretty much decided to go with the double colon (::), like C++, which at least makes sense. But the backslash?

What's worse, just look at the RFC listing the operators they were considering. In addition to the backslash, they had the double star (**), double caret (^^), double percent (%%), a shell prompt (:>), a smiley face (:)), and a triple colon (:::). For God's sake, it looks like they picked this list out of a hat. They might as well have just used the string NAMESPACESEPARATOR. It's no less absurd than any of those.

Now, let's be realistic for a minute. In terms of syntax, PHP is a highly derivative language. It's an amalgamation of Perl, C++, and Java, with a dash of a few other things thrown in.

Given that heritage, there's really only a handful of choices for namespace separators that even make sense. The first, and most natural, is the double colon (::). This is what C++ uses and it's already used for static methods and class members in PHP. So the semantics of this can naturally be extended to the generic "scope resolution operator." Keeps things clean and simple.

The second choice is the dot (.), which is what's used in Java, C#, Python, and many others. This is a bit unnatural in PHP, as dot is the string concatenation operator, but it at least offers consistency with other related languages.

Third is...actually, that's it. There are only 2 valid choices of namespace separator. And the PHP namespace team didn't pick either one. Nice work guys.

The Slashdot article also linked to an interesting consequence of the choice of backslash: it has the potential to mess up referencing classes in strings. So if your class starts with, say, the letter "t" or "n", you're going to have to be very careful about using namespaces in conjunction with functions that accept a class name as a string. Just what we needed. As if PHP isn't messed up enough, now the behaviour of a function is going to depend on the names of your classes and the type of quotes you use.

I guess I'm going to have to bone up on my C#, because PHP seems to be going even farther off the deep end that before. It was always a thrown-together language, but this is just silly. The backslash is just a stupid choice for this operator and there's just no excuse for it.

A VZW WTF

I've been looking into new cell phones recently. My Verizon Wireless contract is up next month, so I've been doing my homework, trying to figure out what model phone I should get this cycle.

Like any good computer geek, I've been doing my research online. So while I was browsing the Verizon Wireless site to see what was available, I came across this lovely little popup:
Due to inactivity, your session will end in approximately 5 minutes.  Extend your session by clicking "OK" below

Seriously Verizon? Asking the user to manually "extend" his session? What the heck?!? Is this some kind of throw back? Is this the Web 2.0 version of a pay phone - you just click a button instead of putting a quarter in the slot?I mean, REALLY?!?

Aside from the mind-bendingly horrible design, there are a few other things that bother me about this dialog.

First, what does it even mean? It's not like I was logged in or anything, so what possible data could this "session" have that I even care if it expires?

Second, why is this an alert box and not a confirmation box? The only option is to click "OK". Well, what if I don't want to extend my session? What do I do then? And if not extending the session isn't an option, why are they even bothering to ask?

Third, why is this even here at all? Speaking as a professional web developer, I can't think of a single technological reason why they would have to do this. It just isn't necessary. There are other, less obtrusive ways to persist data that don't involve issuing prompts. Is this really nothing more than a lame attempt to keep the user engaged, whether he wants to be or not? "Look at me! Look at me! You have to click me! Hey, pay attention!"

Sigh.... Just another reason to hate Verizon. If only it wasn't for their coverage area and free in-network calling, I'd jump ship.

On the up side, at least I got some useful phone information. I was originally leaning toward those new touch-screen iPhone knock offs, like the Samsung Glyde or LG Voyager. However, we just got back from a vacation in Hawaii, so I'm trying to keep the expense down. And even after the contract pricing and rebates, the touch screen phones are pretty much all $150 or more, and I really don't want to drop $300+ on a pair of new phones right now. Plus, appart from the inherent coolness of the touch screen, I'm not really sure what those phones would do for me that anything else with a QWERTY keyboard couldn't.

So, I'm thinking maybe a basic smartphone. After contract pricing and rebate, Verizon has the Motorola MOTO Q 9c and Palm Centro for $100. Both of them seem nice and have document viewing and other features that would actually be quite handy for me. There are still things to debate, though - the Centro has a touch screen, but the MOTO Q has Windows Mobile. I'm going to have to do a little more reading.

MySQL comment weirdness

Here's one I discovered for the first time yesterday - MySQL doesn't actually use the ANSI SQL comment sequence. Who knew?

For those who are a little rusty on such things, ANSI SQL-92 specifies the comment character as a doule dash (--). Anything from a double dash to the end of a line is a comment.

Well, in MySQL, that doesn't actually work. You have to have a space (or control character) after the double dash for MySQL to treat is as a comment. This is good style anyway, but MySQL makes it a hard requirement. Apparently the reason for this is that it can cause conflicts with software that auto-generates really crappy queries.

Is it just me, or does that seem like a really gratuitous change? I mean, I can see their reasons - it would be nice if an auto-generated query like UPDATE account SET credit=credit--1 actually worked instead of reducing to a no-op. But, on the other hand, is it really a good idea to change the database server because some vendor is too lazy to make their query-generator produce the more correct and logically equivalent UPDATE account SET credit=credit+1 instead? I don't know. I'm sure they had their reasons, but it just feels like a half-baked hack to me.

Keys and NULLs

Well, the cat is currently laying on the book I was trying to read, so I guess I might as well blog.

Note to self: the rule in SQL is that you can have NULLs in foreign keys, but not in primary keys. Edwin asked me a question related to this the other day and I had to stop and think about it for a couple of minutes, so I figured I'd write it out to reinforce the concept in my own mind.

When you think about it, these rules make perfect sense. Since NULLs in SQL denote an unknown or inapplicable value, having one in a primary key would be nonsense, because a primary key must uniquely identify a row. Having a NULL in the key would mean part of the unique value was unknown. (Note that you can't say that a tuple with a NULL in it is unique because one NULL is not equal to another due to the 3-valued logic used by SQL.)

As for foreign keys, SQL gives the "benefit of the doubt." If you think of a NULL as an unknown value which may later become known, this makes sense. Sometimes you might have to add the values of foriegn keys in multiple operations, which would not be allowed if NULLs were blocked by the constraint.

From a logical point of view, this is kind of an interesting case as well. You can view a foreign key constraint check as a conjunction of equivalence checks. For example if T1(c1, c2) is foreign keyed to T2(c1, c2), then the constraint check can be viewed as the expression T1.c1 = T2.c1 AND T1.c2 = T2.c2. In other words, if all the key fields match on both records, the constraint is satisfied. If not, it fails. However, with NULLs, the value of one of the conjuncts would be NULL, which would render the entire expression NULL.

But what would that mean?

The DMBS can't say that it doesn't know if a relational constraint holds or not. So, given the interpretation of NULL, it makes sense to use the "benefit of the doubt" principle and defer final judgement on the constraint until all the NULLs are filled in.

The world's most accurate Twitter account

Thanks to Carl and Richard from DNR for pointing out the best Twitter stream even. They mentioned it on show 379. You can find it at: http://twitter.com/thisispointless.

There are 3 things I like about this stream. First, it's actually kind of funny. Second, I think the username pretty well sums up everything about Twitter. And third, I just love the open mockery when there are people out there who are actually trying to communicate over Twitter. Because apparently nothing worth saying could possibly take more than 140 characters.

I'm sorry, that was a cheap shot. I was misrepresenting the true purpose of Twitter - to send pointless messages that people really shouldn't care about.