The recalcitrant engineer
Wednesday, November 05, 2003
 

301: This blog has moved permanently

Please see http://www.burnthacker.com/
Tuesday, November 04, 2003
 

The "too complex" rule: if you can't figure it out in a day or two, it shouldn't be done

There are a few things that have been bouncing around in my head the last few days with regard to software. The first is an old adage, I think from Brian Kernighan, that says (roughly) "you must be twice as smart to debug code as you need to be to write it". If you use all your cleverness to write the code, then by definition you aren't smart enough to debug it. The other thing is a quote from Albert Einstein that (paraphrased) is "If you really understand something, you should be able to explain it to your mother."

I've been thinking about this a lot, both because of the inherent conceptual complexity of a web framework I've been working with and because of the difficulty of trying to explain it. The other thing that's been bouncing around is how you're supposed to figure out a really complex web UI if you're a backend Java person.

Real world problems are hard. Really, really hard. A commercial webapp that needs to actually scale to thousands of users and cluster, that has hundreds of pages and probably 50-100 tables, is a beast. Super complex. Not necessarily in any one part, but as a whole, it's a freakin' monster. When you try and build a framework that can be a platform for such a thing, you're up against a very big, very tall cliff. There's almost this tradeoff: by the time you get your framework complete enough to actually work for the app, it's going to be so complex in its details and behavior that it won't be understandable without a huge amount of ramp-up time. There's a lot to be said for good theoretical backing, orthogonality, good general design, etc, but that won't necessarily make things crystal clear. I know that ultimately, if it's a good design, it will have that clarity, but I am wondering if anyone is smart enough to do it, or if it's just going to take a lot more time and a lot more baby steps before we can get there.

The other half of this is complex user interfaces. I'm currently working with a UI where developers have implemented extremely clever JavaScript, which, when it explodes, blows the minds of everyone except perhaps 1 or 2 people. Such a thing is really not workable, as the long term maintenance is not going to be either cheap or possible, depending on your perspective.

I'm starting to wonder if the only thing that actually works is really, really dumb code. Not necessarily bad, but forgoing a lot of things like complex user interfaces, skinnability (micrositing), massive performance throughput, etc, in order to just get something that a normal human can understand and maintain. It's depressing that Microsoft Access is looking like a good answer for business applications.

Monday, October 27, 2003
 

Naming Conventions: Language versus Technology Holy Wars

One thing I've been thinking about a lot is poor design decisions, and how they cripple software. I'm probably going to write a good bit about this over the coming week or so as I serialize out my musings and try to extract something worthwhile to read. There are a lot of angles on this, and I'll try to write down all the things that have been floating around in my head on them.

Today, we see a lot of API and framework designs on top of existing languages, rather than a lot of new language designs. The main driver here is install base on the language side, but as Ralph Johnson told me several years back, APIs and frameworks themselves constitute a meta language, and are every bit as hard to design as the language itself. Sometimes it seems like this point is lost on the people doing the actual design.

Anyway, what I was thinking about is people that are doing APIs for a specific technology that exists in multiple languages. The classic (old) example of this is CORBA: When you build your IDL (interface description), it defines your object API in every language (with a single naming convention). The notable exception to this is in Smalltalk, where the language binding for CORBA actually fixes the naming conventions from "foo_bar" to "fooBar" at IDL compile time so the code you write still LOOKS like Smalltalk. In my opinion, this is another case where the Smalltalkers made the absolute correct decision.

A modern example of this is the APIs for accessing all the various X* technologies (DOM, XSLT, XPath, ...) and their language bindings. If you use the W3C standard APIs for Java (TrAX, JAXB, etc), you end up using these weird NodeList objects and other things that make your Java code awkward, ugly and difficult to maintain. However, if you use something designed for Java, like dom4j, you end up with code that looks, reads and feels like Java. And, as such, is amazingly cleaner and easier to maintain.
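To make the contrast concrete, here's a rough sketch building and walking the same trivial document both ways (the element names are invented for illustration):

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class W3cVersusDom4j {
    public static void main(String[] args) throws Exception {
        // The W3C way: factory gymnastics, casts, and a NodeList
        // that isn't a java.util.List.
        org.w3c.dom.Document w3cDoc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element order = w3cDoc.createElement("order");
        w3cDoc.appendChild(order);
        Element item = w3cDoc.createElement("item");
        item.appendChild(w3cDoc.createTextNode("widget"));
        order.appendChild(item);

        NodeList items = order.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element e = (Element) items.item(i); // cast required
            System.out.println(e.getFirstChild().getNodeValue());
        }

        // The dom4j way: reads like ordinary Java.
        org.dom4j.Document doc = org.dom4j.DocumentHelper.createDocument();
        org.dom4j.Element root = doc.addElement("order");
        root.addElement("item").addText("widget");

        for (java.util.Iterator it = root.elementIterator("item"); it.hasNext();) {
            org.dom4j.Element e = (org.dom4j.Element) it.next();
            System.out.println(e.getText());
        }
    }
}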

The disconnect here, as I see it, is the hubris of the people driving the technology, and their focus on a single language (usually C++). The assumption is that the technology will be far bigger than any single language, so the goal is to have code in the various languages look basically similar. This makes it so developers can move between languages and immediately be able to work on code that is driven by technology X.

I think this is highly misguided, and misses the point of how real programmers work. In the real world, Java programmers write Java code. It has a set of conventions that are well defined, and the more Java looks like Java, the easier it is for people to come in and work on it. I will probably never work on DOM trees in C++, and if I did, I would have to go back and read the documentation anyway to find out all the language specific nuances of the APIs, so this convoluted API isn't buying me anything.

Having missed this point, many standards bodies (such as the W3C) end up burdening other languages with these funky APIs, forcing bifurcation and duplication of effort as people work to reinvent, after the standard implementation is complete, what the API should have been in the first place.

Friday, October 24, 2003
 

How is it possible that BEA's JMS is so slow?

A while back, I wrote a large amount of code that used the JMS implementation in Weblogic 6.1 for internal messaging. The system was user interactive, where timely delivery of messages was critical for user display. Ideally, you wanted the user input to trigger a message, have the system act upon it, and have the user interface update with the result in milliseconds. What I expected when I began was that things would run at or near method invocation speeds, as long as I was in the same JVM.

The result I found was that things would stay magically "in the queue" for 3-5 seconds (that's seconds, not milliseconds) before the response was heard, each time a write was done. If you had to do a JMS write, then a JMS read to service a single page request, the result was an additional 6-10 seconds PER PAGE, on top of your normal computation and page rendering time.

You can say that a lot of things might have helped, such as removing the filters (I was using filters of moderate complexity). You can say "Gee, everybody knows Weblogic JMS is slow." I, honestly, don't really care. The Weblogic JMS server is basically unusable for user interactive applications. I had to turn around and write my own internal messaging system, developing my own filter mechanism, etc. But once I was done, it ran at the speeds that I expected, and is in heavy production use today.

The honest truth behind it is that JMS makes a lot of guarantees about message delivery. In fact, it makes TCP seem like a total weakling in comparison. And yes, doing a lot of these things probably requires a lot of effort on the server's part. What I think BEA is honestly missing is the ability to relax what the server guarantees you in terms of reliability in order to recover the speed you need. Most people and most applications honestly don't require all the fancy pants stuff that JMS offers, at least most of the time. But for Weblogic to really make good on some fraction of JMS's potential, it's going to have to become much more flexible and lightweight. It would be a great mechanism for scaling computations, if it were fast enough to deploy.
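To be fair, the JMS API itself does expose a couple of per-session and per-producer knobs for trading reliability for speed; whether the server actually turns those hints into speed is another matter. A minimal sketch of what they look like (the JNDI names here are hypothetical):

import javax.jms.DeliveryMode;
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class RelaxedSend {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();
        // JNDI names are hypothetical; use whatever your server binds
        QueueConnectionFactory qcf =
            (QueueConnectionFactory) ctx.lookup("ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("exampleQueue");

        QueueConnection conn = qcf.createQueueConnection();
        // DUPS_OK_ACKNOWLEDGE: lazy acknowledgement, fewer round trips,
        // at the cost of possible duplicate deliveries
        QueueSession session =
            conn.createQueueSession(false, Session.DUPS_OK_ACKNOWLEDGE);
        QueueSender sender = session.createSender(queue);
        // NON_PERSISTENT: the server doesn't have to get the message
        // to disk before considering the send complete
        sender.setDeliveryMode(DeliveryMode.NON_PERSISTENT);

        TextMessage msg = session.createTextMessage("ping");
        sender.send(msg);

        sender.close();
        session.close();
        conn.close();
    }
}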

 

TimeZones: It's not just a Java problem

I was sent this link this morning about one man's pain with daylight savings handling under Win32. If you read it, it's basically a dump on how the internals of the time zone computation mechanisms in Windows are as weird and flaky as Java's when it comes to DST handling. 
Wednesday, October 22, 2003
 

Who thought of the J2EE configuration mechanisms?

I think I'm something of a rare bird, in a way: I've worked as a system administrator for large, enterprise systems, and I've worked as the architect developing those systems. Sometimes it seems like the distance between the two is a chasm. The disconnect between the people that write software and the people that install and maintain it is amazing.

Case in point: configuration files. What a system administrator wants is a flat file that lives somewhere (preferably in /etc or /var) that clearly and concisely lets them tune all the knobs of the system. This thing should be reloadable either via a SIGHUP or, at worst, via a JVM restart. Either way, the file is readily accessible and easily changed.

J2EE actually puts the config files (deployment descriptors) INSIDE the application BINARY. I don't care that war and ear files are "just zip files". I, as a system administrator, do not want to disassemble the application to find the configuration file. Honestly, as a developer, I won't even manually attempt to take a war apart and then put it back together, because I almost always break it.

When I get in the config file, I want it to make logical sense to someone that doesn't have the source code in front of them. Don't mix the config options in with the nuts & bolts of what makes the application run (such as the servlet mappings).

This frustration has actually led me to abandon the J2EE config mechanisms entirely and implement my own system that is actually accessible to system administrators, doesn't require a GUI, and automatically detects changes and reloads itself. The response from the people using the applications I've developed has been amazingly more positive now that they can see config options without trying to disassemble and decipher the deployment descriptor.
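My implementation isn't public, but the skeleton of such a thing is small. Here's a minimal sketch (not the actual code), assuming a flat java.util.Properties file and a polling timer:

import java.io.File;
import java.io.FileInputStream;
import java.util.Properties;
import java.util.Timer;
import java.util.TimerTask;

public class ReloadingConfig {
    private final File file;
    private volatile Properties props = new Properties();
    private long lastLoaded = -1;

    public ReloadingConfig(String path, long pollMillis) {
        this.file = new File(path);
        reloadIfChanged();
        Timer timer = new Timer(true); // daemon thread
        timer.schedule(new TimerTask() {
            public void run() { reloadIfChanged(); }
        }, pollMillis, pollMillis);
    }

    private synchronized void reloadIfChanged() {
        long stamp = file.lastModified();
        if (stamp == lastLoaded) return; // nothing new on disk
        try {
            Properties fresh = new Properties();
            FileInputStream in = new FileInputStream(file);
            try { fresh.load(in); } finally { in.close(); }
            props = fresh;
            lastLoaded = stamp;
        } catch (Exception e) {
            // keep serving the old config if the new file is unreadable
            System.err.println("config reload failed: " + e);
        }
    }

    public String get(String key, String dflt) {
        return props.getProperty(key, dflt);
    }
}

Point it at something like /etc/myapp/app.properties, and an administrator can edit the file with vi and see the change picked up on the next poll, no GUI or redeploy required.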

Thursday, October 16, 2003
 

Taglibs and internal state

So I just discovered the hard way that you can't safely store internal state in a taglib.

My current application sports a fairly large taglib set that lets you do many powerful things, such as iteration, value display, automatic form element rendering and lots more. The thing I found is that if you have tags that do things like looping, when they nest, many appservers reuse instances of the Tag class for the nested calls to the tag. The problem here is when the tag saves instance variables: each nested invocation clobbers the state of the parent. If you're doing something like looping tags, when the innermost loop completes, the whole nested iteration completes, because the state about being "at the end" of the loop just got clobbered by the innermost tag.

Instead, you have to push the state that you stored in the taglib member variables into attributes in the pageContext, scoped by the "depth" of the nesting of the tags. The depth of the nesting has to be computed by incrementing on doStartTag and decrementing right before you return SKIP_BODY. This "stack depth" gives each nesting level a namespace in which it can declare its context variables and save its state.
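Here's a stripped-down sketch of that pattern against the JSP 1.2 tag API (the tag and key names are invented; real tags obviously do more):

import javax.servlet.jsp.JspException;
import javax.servlet.jsp.tagext.TagSupport;

// Looping tag whose per-nesting state lives in the pageContext under
// depth-scoped keys instead of instance variables, so nested calls
// that share one Tag instance can't clobber each other.
public class LoopTag extends TagSupport {
    private static final String DEPTH_KEY = "loopTag.depth";

    public int doStartTag() throws JspException {
        int depth = currentDepth() + 1; // increment on doStartTag
        pageContext.setAttribute(DEPTH_KEY, new Integer(depth));
        pageContext.setAttribute(countKey(depth), new Integer(0));
        return EVAL_BODY_INCLUDE;
    }

    public int doAfterBody() throws JspException {
        int depth = currentDepth();
        int count = ((Integer) pageContext.getAttribute(countKey(depth))).intValue();
        if (count < 2) { // render the body three times, for illustration
            pageContext.setAttribute(countKey(depth), new Integer(count + 1));
            return EVAL_BODY_AGAIN;
        }
        // decrement right before returning SKIP_BODY, so the parent
        // nesting level sees its own state again
        pageContext.removeAttribute(countKey(depth));
        pageContext.setAttribute(DEPTH_KEY, new Integer(depth - 1));
        return SKIP_BODY;
    }

    private int currentDepth() {
        Integer d = (Integer) pageContext.getAttribute(DEPTH_KEY);
        return (d == null) ? 0 : d.intValue();
    }

    private String countKey(int depth) {
        return "loopTag.count." + depth;
    }
}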

Tuesday, October 14, 2003
 

Time Zone Representations & Software

I've built a few systems in java that required doing lots of TimeZone conversions for end user date display. It's a pretty standard problem for any software that has to work outside of a single site, and generally falls into the i18n/l10n bucket of problems that have to be solved when you're building the app. Which, in my experience, means it rarely gets solved well.

I'm not going to start down the standard path of complaining about the implementation of java.util.Date; it's safe to say that horse is dead. What I did find fascinating, though, is just how complex time zones actually are. The first thing to realize is the timezone list that the JVM ships with is totally inadequate. It changes depending on the platform, and is fairly incomplete. You can't rely on it to build a real application. This generally means you have to make your own subclass of java.util.TimeZone which you then populate from a database (or something) that you ship with the product.

When you go to build your own time zones (in practice, instances of SimpleTimeZone, the concrete subclass the JDK gives you), you'll find you need to provide a lot of values to the constructor, such as the day of the week that daylight savings time starts, the day of the week in the month it ends, etc. I still find these confusing to some extent, but thankfully I was provided with a list of the data, so I just had to implement it.
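For example, here's roughly what hand-constructing US Pacific time looks like under the current US rules (DST from the first Sunday in April to the last Sunday in October); it's essentially the example from the SimpleTimeZone JavaDoc:

import java.util.Calendar;
import java.util.SimpleTimeZone;

public class BuildZone {
    public static void main(String[] args) {
        SimpleTimeZone pacific = new SimpleTimeZone(
                -8 * 60 * 60 * 1000,        // raw offset: GMT-8
                "America/Los_Angeles",
                // DST starts: first Sunday in April, 2:00 AM local
                Calendar.APRIL, 1, -Calendar.SUNDAY, 2 * 60 * 60 * 1000,
                // DST ends: last Sunday in October, 2:00 AM local
                Calendar.OCTOBER, -1, Calendar.SUNDAY, 2 * 60 * 60 * 1000);
        System.out.println(pacific.getID() + " observes DST: "
                + pacific.useDaylightTime());
    }
}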

The tricky part is: how do you actually refer to a time zone? The typical thing I've seen is "GMT-6" or something like that. What you find out when you delve into the details is, "GMT-6" is highly imprecise. All it says is "-6 hours from GMT", which maps to an actual SET of time zones. For -6 hours offset, you could have a set of different timezones with different DST rules.

The next natural question is: What is this set, and where can I get the canonical list? The short answer is: it doesn't really exist. The crux of the problem is that time zones are a political creation. If you look, there is no ISO standard for timezone names (at least none that I could find). These things are highly fluid; they change at the whim of legislatures and are not well defined to start with. Quite honestly, it's a complete mess.

If you look in /usr/share/zoneinfo on a UNIX machine, you'll see the "state of the art." One of the things that poking around at this data showed me is that timezones have a "historical" dimension, which is 100 times worse. This adds problems like "from 1971-1979, daylight savings time started on the last Thursday of the month." If you're curious about what doing it "right" entails, read the manpage for zic, the timezone info compiler.

The best answer I found to all of this is to flatten it into a namespace you define and pick the timezones you need to support from your userbase. Also, discard the historical dimension; not only is it confusing, Java doesn't support it.

The naming convention that I found most useful is "Region/City", like "America/Los_Angeles", because it actually maps to the political entity that created the timezone. Then show that string to the user and let them choose their time zone. 
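Once the user has picked one of those strings, display is straightforward. A small sketch (note that TimeZone.getTimeZone quietly falls back to GMT for IDs it doesn't recognize, which is one more reason to control the list yourself):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class UserLocalTime {
    public static void main(String[] args) {
        // the string the user picked from your flattened list
        TimeZone tz = TimeZone.getTimeZone("America/Los_Angeles");
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm zzz");
        fmt.setTimeZone(tz);
        System.out.println(fmt.format(new Date()));
    }
}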

Monday, October 13, 2003
 

Key Conversion Step Through: The Mechanics

I had to actually finish the job of getting the PVK file (mentioned below) into a JKS keystore today, and it was non-trivial. I did not mean to mislead my loyal readers by portraying the PVK to PEM conversion as the last step in the process, as it most certainly is not.

The whole story is: The PVK format is known as the "Microsoft Authenticode" format for a key, and is commonly used by COM (read: VB) developers for signing controls. To actually do the signing, there are two parts you need: the PVK file, containing the private key, and the SPC file, which is the actual certificate. Both of these must be used together and imported into Windows so they can be exported to a neutral format and used elsewhere. The SPC file is downloaded from Thawte after they complete the signing process.

The tool you need for the import is here, and it's called pvkimprt. Once you get that, you follow the directions, do the import and then (interestingly enough) the key shows up in Internet Explorer. In IE, choose Tools -> Internet Options -> Content -> Certificates... and you'll see your key under the Personal tab.

The thing to know about this next step in the process is you need to export the certificate with the private key, and the only way you can do that here is in the PKCS12 format (a well supported standard), which the JVM (>= 1.3.1) can read correctly. Select Export..., make sure you export the private key, accept the defaults and save the cert to a PFX file somewhere.

Once you have a PFX file, you simply need to convert it to JKS format for maximum ease of use with jarsigner. As usual, Sun displayed plenty of foresight and doesn't provide a tool to do this. Thankfully, someone has done the legwork for us. They go into much more detail about how to use the key in other environments, but for our purposes, all you need is KeystoreMove (which I have mirrored, but is available from the original page above as well). Compile this, and then run:
java -cp . KeystoreMove pkcs12 yourkey.pfx yourpass jks yourkey.jks yourjkspass
This leaves you with yourkey.jks, which you can use with jarsigner. 

Thursday, October 09, 2003
 

PVK, PEM and JKS: Code signing key conversions

Ok: So if you go to Thawte and buy a universal code signing key, you've just spent a very small amount of money to be able to run trusted code on pretty much everything everywhere. It's a great deal, but I won't comment on the security implications. It scares me silly. Suffice it to say, you can sign Microsoft code (COM controls), Java jars and anything else you need.

Anyway, the key you get is in "PVK" format, which is Microsoft proprietary. Don't despair, this CAN be used for Java. What you are trying to do is get it into JKS format for ease of use with the JVM. What you need to do is first convert the "PVK" format to "PEM" (which is a standard) and then convert the "PEM" file to "JKS" using keytool.

I'm not going to explain how to use keytool here, as that's a lot of work. What I will say is the missing link is how to get the damn "PVK" file into "PEM". Thankfully this kind person reverse engineered the key store format and provides a downloadable utility to get the PVK file into PEM. From there, you can use BouncyCastle JCE (see below) to convert the PEM file into JKS with keytool. 

 

Keystores, keytool, JCE and the state of Java encryption

Not everyone has had to do this, but when you deal with signing jars and lots of crypto in java, things can very quickly get very complex. And you also quickly discover that most of the crypto stuff that's out there is actually broken and flaky. Most important is that the default JCE that Sun ships with the VM is anemic and buggy. In particular, before 1.4, the keytool that you use to manipulate keystores that comes with the JVM could only manipulate a custom proprietary keystore format (jks) unless you swapped the JCE provider. The problem is, none of the key signing agencies actually deal with anything but standards like PEM encoded keystores.

But what is one to do? RSA charges a truckload for their BSAFE libraries, and having them in your app doesn't do much for you except to be able to say "Yep, there's BSAFE in the classpath." OpenSSL (the de facto C standard) can be used for some things like doing command line operations and key generation, but it doesn't do you any good from Java.

The best thing I've managed to find is The Legion of the Bouncy Castle. They make a great JCE provider that actually reads and writes standard formats, along with a lot of other great crypto tools, like OpenPGP compliant encryption, etc. And it's all BSD license, it works and it's a full cleanroom implementation. 
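Wiring it in is a couple of lines, after which the standard JCE factories can find it. A minimal sketch:

import java.security.KeyStore;
import java.security.Security;
import org.bouncycastle.jce.provider.BouncyCastleProvider;

public class RegisterBouncyCastle {
    public static void main(String[] args) throws Exception {
        // register the provider so JCE factory lookups can find it
        Security.addProvider(new BouncyCastleProvider());
        // e.g. ask for a PKCS12 keystore from the "BC" provider
        KeyStore ks = KeyStore.getInstance("PKCS12", "BC");
        System.out.println("got keystore of type: " + ks.getType());
    }
}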

 

Self Documenting Builds & Ant

As I've been thinking more and more about ant, I've remembered a lot more tricks that I've found useful. I didn't intend to trickle all this out over a week or so, but I suppose having several articles about ant is more interesting than one gigantic article.

There are really two ways to approach telling your users about how to run the build: send them a big document that describes the build, or just have the build itself guide them. In my experience, nobody reads documentation anyway, and it gets rapidly out of date. So it's better to just use what ant provides. In this case, your biggest boost is setting the description on each target. This is done like:
<task name="webapp" description="Build the main deployment ready web application. This makes the war file.">
.... </task>

Once you do this, everyone can see what the build targets do with a simple call of:
ant -projecthelp
which shows you all the targets defined, along with their descriptions.

Also, it's common for a build to need things like Weblogic on the host to be able to actually build the project. Usually, this is handled by requiring a user to have a local properties file that defines paths for their machine so that the build can get running. The powerful thing here is ant's unless and if attributes on targets.

The easiest way to do this is to make a target like:
<target name="weblogic.config.check" unless="weblogic.home">
  <fail message="Please define weblogic.home in your local.properties file to point to your weblogic server home"/>
</target>

<task name="compile" depends="weblogic.config.check">
....
</task>

This way, you know that unless the user defines their weblogic.home, the build will fail, and it will tell them exactly what they need to do to make the build run. This tends to reduce ramp-up time a lot for new hires. 

Wednesday, October 08, 2003
 

One more ant tip: paths and multiple build scripts

The reason I originally wrote the chunk of ant advice below was that I was trying to remember, so I could pass it on to someone, all the stuff I had learned about advanced usage of ant that was seriously difficult to figure out. The problem was, I forgot the thing I found the weirdest, the thing that actually inspired me to start writing the blog entry.

Anyway, when you're splitting a large build up into multiple files, you really want all those files to share a single global classpath for the various parts of the build. The problem is, normal path blocks are actually local to the xml file that declares them. They are not inherited across <ant> invocations, even if you set inheritAll=true. What you instead have to do is flatten the path into a property, separated with the platform's path separator (colon on UNIX), and use that as a classpath rather than a classpathRef.

If you flatten the path in the global build script into a property, then all the invoked scripts can see it correctly and all is bueno. The magic to do this is:

<path id="base.path">
....
</path>

<property name="base.path" refid="base.path"/>
Note that the property and the path exist in separate namespaces, so there's no collision in having them both named base.path. For clarity, it might be better to name the property version base.path.property.
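To round the example out, here's roughly how a child script picks up the flattened property (the file and target names are invented):

<!-- master build.xml: flatten the path, then call the sub-build -->
<target name="build-all">
    <ant antfile="subproject/build.xml" target="compile" inheritAll="true"/>
</target>

<!-- subproject/build.xml: the flattened base.path property came across -->
<target name="compile">
    <javac srcdir="src" destdir="build/classes" classpath="${base.path}"/>
</target>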
Main Entry: re·cal·ci·trant
Pronunciation: -trənt
Function: adjective
Etymology: Late Latin recalcitrant-, recalcitrans, present participle of recalcitrare to be stubbornly disobedient, from Latin, to kick back, from re- + calcitrare to kick, from calc-, calx heel
Date: 1843
1 : obstinately defiant of authority or restraint
