Text-based UML

Recently I discovered a new tool that I never knew I needed - PlantUML.

If you're like me, you probably want to do more UML.  I mean, I'm interested in software design and architecture.  I read books and articles about it.  I even wrote my thesis on formal modeling.  So I'd love to do more UML modeling.

The thing is...I don't like UML modelers.  I mean, it's not that the tools are bad - in fact, some of them are pretty good.  It's just that creating a UML model feels so heavy.  And while the actual modeling features that many tools have are really cool and useful in some circumstances, I find that 90% of the time all I really need is a simple diagram.  And while any UML tool can help you make a diagram, I feel like I usually end up getting bogged down in the mechanics of putting it together.  You know, you've got to select the right type of box, select the right type of relationship, then the tool renders the connections in a weird way by default so you have to fix it, etc.  Before you know it, you've spent 20 minutes on a diagram that would have taken two minutes if you'd done it on paper.

Enter PlantUML.  It bills itself as a "drawing tool" for UML, but the upshot is that it's a way to define your models in plain text.  You just write your models in your favorite text editor (and yes, there's a Vim syntax file available), run the tool, and it will spit out a rendered UML diagram.  Here's an example:

[Rendered class diagram of Link, Tag, Folder, User, and Comment omitted.]

And here's the text that generated that:

@startuml
class Link {
  name : string
  description : string
  url : string
}
class Tag {
  name : string
}
class Folder {
  name : string
}
class User {
  username : string
  password : string
  setPassword(password : string)
}
class Comment {
  body : string
}
Link "1" -- "*" Tag : has >
Link "1" -- "*" Comment : < belongs to
Folder "1" -- "*" Link : contains
Folder "1" -- "*" Folder : contains
User "1" -- "*" Link : owns >
@enduml

As you can see, the syntax is fairly straightforward and pretty compact.  All of the standard UML diagram types are supported and the syntax allows you to provide minimal detail and still produce something meaningful.  In addition to the GUI shown above, it can also run from the command line and just create PNG images (or whatever format you like) of your diagrams, so you could easily work it into your build pipeline.  And the installation is simple - just download and run the JAR file.

The thing I really like, though, is that this text-based format makes it easy to store and source-control UML alongside your code.  Yes, you technically can do that with other formats, but it's awkward.  XMI files are huge and ugly and I don't even want to think about the project files for Eclipse-based tools.  But with PlantUML you can just have a directory with some "modelname.pu" files in it that are small, simple, and produce diffs that are easy to read when you change them.

I haven't tried it out yet, but I'm also interested in how feasible it would be to put the models right in the code, e.g. put the text in comments.  Seems like it might help with the whole "keeping code and models in sync" thing.  But maybe that's a bit much.

I recommend checking it out.  If you want a quick and easy way to try it, there's an online version you can test with.

Keys and NULLs

Well, the cat is currently lying on the book I was trying to read, so I guess I might as well blog.

Note to self: the rule in SQL is that you can have NULLs in foreign keys, but not in primary keys. Edwin asked me a question related to this the other day and I had to stop and think about it for a couple of minutes, so I figured I'd write it out to reinforce the concept in my own mind.

When you think about it, these rules make perfect sense. Since NULLs in SQL denote an unknown or inapplicable value, having one in a primary key would be nonsense, because a primary key must uniquely identify a row. Having a NULL in the key would mean part of the unique value was unknown. (Note that you can't say that a tuple with a NULL in it is unique because one NULL is not equal to another due to the 3-valued logic used by SQL.)
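A quick way to see that three-valued logic in action is through Python's sqlite3 module (any SQL engine behaves the same way on this query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# NULL = NULL does not evaluate to true; the comparison itself is NULL,
# which sqlite3 surfaces to Python as None.
result = cur.execute("SELECT NULL = NULL").fetchone()[0]
print(result)  # None, not 1 (true) or 0 (false)

conn.close()
```

So two rows whose keys contain NULL can't be called duplicates, but they can't be called distinct either - the comparison is simply unknown, which is exactly why a NULL has no business being in a primary key.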

As for foreign keys, SQL gives the "benefit of the doubt." If you think of a NULL as an unknown value which may later become known, this makes sense. Sometimes you might have to fill in the values of foreign keys over multiple operations, which would not be possible if NULLs were blocked by the constraint.

From a logical point of view, this is kind of an interesting case as well. You can view a foreign key constraint check as a conjunction of equivalence checks. For example if T1(c1, c2) is foreign keyed to T2(c1, c2), then the constraint check can be viewed as the expression T1.c1 = T2.c1 AND T1.c2 = T2.c2. In other words, if all the key fields match on both records, the constraint is satisfied. If not, it fails. However, with NULLs, the value of one of the conjuncts would be NULL, which would render the entire expression NULL.

But what would that mean?

The DBMS can't say that it doesn't know if a relational constraint holds or not. So, given the interpretation of NULL, it makes sense to use the "benefit of the doubt" principle and defer final judgement on the constraint until all the NULLs are filled in.
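As a sketch of that "benefit of the doubt" behavior, here's the two-column T1/T2 example run through Python's sqlite3 module (note that SQLite only enforces foreign keys when the pragma is turned on, and it uses the standard's default MATCH SIMPLE semantics, which is the behavior described above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

cur.execute("CREATE TABLE t2 (c1 INTEGER, c2 INTEGER, PRIMARY KEY (c1, c2))")
cur.execute("""CREATE TABLE t1 (c1 INTEGER, c2 INTEGER,
               FOREIGN KEY (c1, c2) REFERENCES t2 (c1, c2))""")
cur.execute("INSERT INTO t2 VALUES (1, 1)")

# A fully matching key satisfies the constraint.
cur.execute("INSERT INTO t1 VALUES (1, 1)")

# A NULL anywhere in the foreign key makes the conjunction unknown,
# so the row gets the benefit of the doubt and is accepted.
cur.execute("INSERT INTO t1 VALUES (1, NULL)")
cur.execute("INSERT INTO t1 VALUES (NULL, NULL)")

# A fully known, non-matching key is rejected outright.
try:
    cur.execute("INSERT INTO t1 VALUES (2, 2)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = cur.execute("SELECT COUNT(*) FROM t1").fetchone()[0]  # 3 rows made it in
conn.close()
```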

F# on Mono

I learned something cool yesterday - F# runs on Mono! That means I can mess with it without having to use VMware!

I got interested in F# from hearing about it on Hanselminutes and .NET Rocks. For those who haven't heard of it, F# is a functional programming language, similar to OCaml, built on .NET. It's actually not an "official Microsoft product," but rather a project out of Microsoft Research, which is pretty cool.

Incidentally, working at Microsoft Research is on my list of dream jobs. I mean, how many organizations can boast of having had two Turing Award winners on staff? How could you not want to work someplace like that?

Anyway, the thing that excites me about F# is the combination of functional programming and its status as a first-class .NET language. I've been meaning to get up to speed on functional programming for a few years now, but I've just never gotten around to it. Learning LISP or ML always seemed on par with refreshing my Prolog and Ada skills - an interesting exercise, but not profitable in terms of marketability. I mean, how many Standard ML listings have you ever seen on Monster?

However, it looks like functional programming may start pushing more into the mainstream. The current trend in hardware is that CPU speeds are flattening out and performance gains are being made by adding more processors or cores. However, most code today is not written to use more than one core/proc at a time. So we're going to have to start parallelizing our code to fully take advantage of the hardware. That's where functional programming comes in. Pure functions, by definition, have no side effects. So if you're writing pure functional programs, parallelism becomes much easier, as you have no worries about thread safety and whatnot.
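To make that concrete, here's a toy Python sketch of mapping a pure function across a worker pool. The threads are just for illustration (CPython's GIL means real CPU parallelism would need processes), but the point stands: because the function is pure, no locks or shared-state bookkeeping are needed, and the parallel result is guaranteed to match the sequential one.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    # Pure: the result depends only on the argument, and nothing
    # outside the function is read or modified.
    return n * n

data = list(range(10))

# Fan the work out across four workers; no synchronization required.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square, data))

sequential = [square(n) for n in data]
# Purity guarantees these agree no matter how the work was scheduled.
```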

So with F# I can now learn functional programming while using .NET. This means that I can leverage some of my existing knowledge while learning the new language, which always makes things go faster and smoother. It also means that this learning experience has some vague marketability, i.e. I can at least count it as .NET experience. In other words, it's not one of those "off in left-field" learning ventures like if I took up Intercal or APL. I'm not going to feel (as much) like I could be making better use of my time.

Anyway, it turns out that installing F# on Ubuntu 8.04 wasn't quite as painless as I had hoped. On the up side, the F# site does supply a ZIP archive with generic Mono-compatible binaries and full source (under the MS shared-source license). However, it seems the binaries don't quite work right with Mono 1.2.6. That's fixable, though, thanks to Laurent Le Brun's article on using F# 1.9.4.17 on Mono. Basically, the important thing is to remember to pass mono the --runtime=v2.0.50727 option when running the F# compiler or F# binaries.

I haven't been blogging much lately, but hopefully I'll be posting back in the coming months with tidbits on F#. It's been a while since I tried to learn a new language, especially a non-procedural one, so I'm looking forward to it.

On e-mail encryption

You know what pisses me off? Bad arguments. As a philosophy major, I read lots of bad arguments. As a "philosophy hobbyist," I still do. And nothing makes me madder than those arguments that are so hideously wrong you don't even know where to begin explaining the problem.

I ran into one of those at work today.  Because of HIPAA and other such laws, we're looking into encryption products (laptop drives, database, e-mail, etc.), and I was stupid enough to volunteer for the assignment. Of course, it probably doesn't matter, because the whole effort is doomed from the start. There is absolutely zero buy-in from our IT staff on the idea of deploying encryption products. The only person who is even remotely pleased with the idea is the boss, and I'm the only one of the staff who isn't dead-set against it. The network administrator is particularly against the idea, and without his support, it just isn't going to happen.

Anyway, the particular comment that pissed me off was concerning anti-virus filtering in our mail system. Basically, one of our network people was concerned that we might get encrypted viruses, and because they're encrypted, the anti-virus filters wouldn't be able to catch them. As support, this person cited a virus report from earlier this week about a "new" virus that works by enclosing the executable in a password-protected ZIP archive with the password in the message body. This "encryption" stops the virus filter from catching it, but if we just blocked all encrypted files, we wouldn't have to worry about it.

Now, to me, this entire argument sounds like complete bull. Of course, I'm not an expert on e-mail, network security, or encryption, so I could be wrong. If that's the case, somebody please correct me.  But the more I think about it, the more I feel like this is one of those arguments that's just barely true enough that you can make it with a straight face, and yet still have it be completely misleading.

First, I certainly never suggested that we allow any old encrypted file through the mail filters. Just because a file is encrypted doesn't automatically make it trustworthy. An attacker could certainly find a freeware symmetric encryption utility from FooBaz Questionable Software, LLC, use it to encrypt a virus, and send it to everyone in the world along with the decryption key and instructions on how to get the naked pictures of Pamela Anderson out of the encrypted file. That would just be a variation on the password-protected ZIP file trick.

My choice would be to use a standard public-key system, like PGP or GnuPG. If you stick to allowing just encrypted messages in that format, then the "virus problem" goes away.  After all, the whole point of public-key cryptosystems is that the recipient of the encrypted file already has the decryption key before the file is even encrypted. Hell, the recipient is the one who generates both of the keys. To send public-key encrypted mass-mail, you'd have to encrypt the malicious attachment separately for each recipient. And since many recipients won't have a key pair, or the attacker won't have the recipient's public key, the target audience is dramatically cut at the outset.
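To make the key-ownership point concrete, here's a toy textbook-RSA sketch in Python with deliberately tiny numbers - nothing like a real PGP implementation, just the flow: the recipient generates both keys, publishes only the public half, and the sender must encrypt separately for each recipient's public key.

```python
# Toy RSA key pair (classic textbook values; real keys are thousands of
# bits and wrapped in padding schemes -- this only illustrates the flow).
p, q = 61, 53                 # the recipient picks two primes...
n = p * q                     # ...and publishes (n, e) as the public key
e = 17
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)           # the private key never leaves the recipient

# A sender who holds the public key encrypts for this one recipient.
message = 42
ciphertext = pow(message, e, n)

# Only the holder of d can undo the encryption.
recovered = pow(ciphertext, d, n)
print(recovered)  # 42
```

Mass-mailing a virus this way means repeating the encryption step once per victim, with a public key the attacker may well not have - which is the asymmetry the argument at work ignored.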

Plus you can have accountability in public-key cryptosystems.  After all, that's what digital signatures are for - so you can know who sent a message. If you're really paranoid, you could only accept encrypted attachments from messages signed by someone you trust.

Of course, nothing about public-key cryptography can prevent someone with your public key from intentionally sending you a virus. And that's where the "just true enough" part comes in. Yes, an encrypted virus sent by a malicious attacker trusted by the user won't be detected by the mail filters. Is this a problem? Well, if you have to do business with people you can't trust, then I guess so. But if you don't publish your public keys and don't do business with 13 year old script-kiddies, I don't see it as a big concern.  Besides, this is negated by the other anti-encryption argument this person has been pushing: that the data we're dealing with isn't really important enough to bother with.

So let me get this straight: we can't do e-mail encryption because we're swapping unimportant data with untrustworthy people. My question is: then why are we even bothering? We ought to just lock the doors and start browsing the want ads!

Sigh.... I'm done blowing off steam now. Time to start working on my CV.

A new interest?

This week I've been playing around a little with the Community Z Tools. They are a set of free tools for working with the Z specification language. They're still not anywhere near complete, but I guess they're better than nothing.

If you're an average programmer, you're probably not familiar with Z. It's a formal specification language based on set theory and first-order predicate calculus. Basically, that means it's a mathematical language to write program specifications. You build schemas, the Z unit of encapsulation and reuse, using the notation of predicate logic and set theory, and then compose these schemas to describe the various states and operations of the system. This has several advantages over the traditional natural-language specification, including the ability to deductively prove various properties of the specified system, such as consistency, and the simple fact that formal languages are, by their very nature, much more precise than natural ones. If you're interested in learning more, take a look at the book Using Z. I've been reading it and I think it's pretty good. It's also available for free in PDF format, which is really nice.
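For a taste of what a schema looks like, here's Spivey's classic BirthdayBook example written in the LaTeX markup that tools like CZT and fuzz accept (assuming the zed style file is loaded; NAME and DATE are given sets):

```latex
\begin{schema}{BirthdayBook}
  known : \power NAME \\
  birthday : NAME \pfun DATE
\where
  known = \dom birthday
\end{schema}
```

The declarations above the line introduce the state (a set of known names and a partial function from names to dates), and the predicate below the line is an invariant: the known names are exactly those with a recorded birthday.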

So anyway, I was playing with the CZT. The current incarnation is pretty much just a bunch of Java classes that provide a jEdit plug-in, an XML schema, and a typechecker. The plug-in lets you write Z specifications in Unicode or LaTeX (note that Z, like most mathematical notations, uses lots of funny symbols, so you can't just write it in a word processor). The main problems, aside from being incomplete, are that saving and reloading Unicode specifications doesn't seem to work correctly and that installing CZT is hideously complex. OK, that's probably an overstatement, but it's certainly not easy, and it's made more difficult by the fact that the main download site only offers source, not binaries (which makes no sense, since it's written in Java).

As a brief aside, jEdit seems pretty nice. It's not likely to replace Vim or Kate for my purposes, but I like it so far. My only real complaint is that Swing's GTK+ look doesn't seem to work with the GTK/Qt theme engine.

Getting back to the point, since I had problems with editing specifications in Unicode, I decided to give LaTeX a try. I knew nothing about LaTeX other than that it's related to TeX, which is used for typesetting, so I went to the web site and ended up downloading The (Not So) Short Introduction to LaTeX2e.

I haven't gotten very far into it yet, but I have to say that LaTeX looks extremely cool. It's a "document preparation system" that, frankly, looks kind of like a programming language. It's generally written by editing text files, so it's not a WYSIWYG system. With LaTeX, you focus on the logical structure of the material rather than the details of how it looks on the screen. That immediately struck me as the right thing to do. I've seen too many Word documents that were an incoherent soup of seemingly random formatting that looked decent on paper, but were a huge pain to try to edit later. I'm actually kind of anxious to try LaTeX out.
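For the flavor of it, here's about the smallest complete LaTeX document, showing that "logical structure" idea in action:

```latex
\documentclass{article}
\begin{document}
\section{Introduction}  % logical markup: the class picks the font,
                        % numbering, and spacing for a heading
You mark up \emph{meaning}, not appearance; the document class
decides how an emphasized word or a section heading actually looks.
\end{document}
```

Change the document class and every heading and emphasized word is restyled consistently - exactly the property that soup-of-formatting Word documents lack.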