The recalcitrant engineer
Saturday, July 19, 2003

File encodings suck, or dealing with localizing a real world app

If you're like me, you probably have to deal with bringing translated text (resource bundles) into your application. This means you send your English property files off to another company, and they send them back to you in French, Russian, Spanish, Korean and a truckload of other languages. The problem is that the translated files arrive from the (terrible) translators in whatever file encoding the translator felt like using.

That is, you get the files in some weird Latin-3 encoding and you need them in UTF-8. Except you don't know that they're in Latin-3; they just look like a bunch of garbage because you don't have fonts to render Greek on your box. What is one to do?

The two things you need, if you're dealing with Java, are iconv and native2ascii. native2ascii is the tool that converts a file from a native encoding (whichever one you name with -encoding) to an ASCII encoding with escapes, so you get \uXXXX wherever there should be a non-ASCII character. native2ascii comes with the JDK, so you've already got that. iconv and piconv are available under Cygwin or any Linux distro, and both work equally well. Remember, in the end you're going to run the file through native2ascii before you try it in the app anyway, because Java reads property files as ISO-8859-1 and anything outside that range has to be a \uXXXX escape.

Remember, when you're working on a modern Linux machine, vi is UTF-8 clean, and on Windows, Notepad is UTF-8 clean. So what you need to do is use iconv or piconv to convert the file from whatever encoding you guess it is to UTF-8, and then open it with vi or Notepad. If it does not look corrupt, you're in business. At that point, run native2ascii -encoding <encoding> <file> on the original file, where <encoding> is the source encoding you used with iconv to produce the file that looked OK in Notepad or vi.
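For anyone who would rather see the two steps in code, here is a minimal Java sketch of roughly what the iconv-then-native2ascii round trip amounts to: decode the raw bytes with the encoding you guessed, then escape everything outside 7-bit ASCII. The class and method names (EscapeSketch, toAsciiEscapes) are made up for illustration, and this is not a drop-in replacement for the real tools.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK: a rough stand-in for
// "decode with the guessed encoding, then write backslash-u escapes".
class EscapeSketch {

    // Decode raw bytes with the guessed charset, then replace every
    // char outside 7-bit ASCII with a backslash-u escape (four hex digits).
    static String toAsciiEscapes(byte[] raw, Charset guessed) {
        String text = new String(raw, guessed);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, as a translator might deliver it in ISO-8859-1.
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(toAsciiEscapes(raw, StandardCharsets.ISO_8859_1));
    }
}
```

To try it on a real file you would read the bytes with something like Files.readAllBytes and pass in Charset.forName with whatever encoding name you are guessing at.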

My homebrew heuristic for detecting files that were converted with the wrong -encoding switch to native2ascii, based only on experience, is as follows: Pick a file that has a sequence like this: [ascii character][non-ascii character][ascii character]. Run the file through native2ascii, with the source set to the encoding that you think is right. If the output looks like [ascii character]\uXXXX[ascii character], then you've found the encoding. However, if there is more than one \uXXXX sequence between the ASCII characters, your encoding is wrong and you need to try again.
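The same check can be sketched in Java, assuming you have cut out a small sample that you know contains a single non-ASCII character sitting between ASCII ones. The names here (EncodingGuessSketch, longestNonAsciiRun, looksPlausible) are invented for this example. Rejecting the U+FFFD replacement character is my own addition to the heuristic: Java silently substitutes it for bytes it cannot decode, and that would otherwise pass as a single escape.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the "count the escapes" heuristic.
class EncodingGuessSketch {

    // Length of the longest run of consecutive non-ASCII chars
    // after decoding the sample with the candidate charset.
    static int longestNonAsciiRun(byte[] raw, Charset candidate) {
        String text = new String(raw, candidate);
        int longest = 0;
        int current = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) >= 0x80) {
                current++;
                longest = Math.max(longest, current);
            } else {
                current = 0;
            }
        }
        return longest;
    }

    // A candidate looks plausible if the lone non-ASCII character in the
    // sample decodes to exactly one char and nothing was undecodable.
    static boolean looksPlausible(byte[] raw, Charset candidate) {
        String text = new String(raw, candidate);
        if (text.indexOf(0xFFFD) >= 0) {
            return false; // decoder had to substitute the replacement character
        }
        return longestNonAsciiRun(raw, candidate) == 1;
    }

    public static void main(String[] args) {
        // "senor" with an n-tilde, delivered as UTF-8 (two bytes for the n-tilde).
        byte[] sample = {'s', 'e', (byte) 0xC3, (byte) 0xB1, 'o', 'r'};
        Charset[] candidates = {StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8};
        for (Charset candidate : candidates) {
            System.out.println(candidate + " plausible: " + looksPlausible(sample, candidate));
        }
    }
}
```

Like the manual version, this only narrows things down. Several single-byte encodings will all turn one byte into one character, so you still need to eyeball the result.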

Of course, if you had a hex editor and a copy of the code pages on hand, you could determine this much less heuristically, but I can never find either of those when I need them. 

Friday, July 18, 2003

Relearn how to type, it's worth your effort; gesture keyboards rule

I've recently switched to using a Fingerworks gesture keyboard. It takes a lot of effort to relearn how to type (how well can you type on a flat table?). Once you put in the 3 weeks of retraining, however, the boost that gesture support gives your development is amazing. Imagine making the "hang loose" sign with your left hand and your IDE does autocompletion. Then you slide your right hand around and, without leaving the keyboard, you're jetting around with the mouse, working the arrow keys at lightning speed with your left hand, and using simple gestures that trigger extremely complex application-level macros.

The biggest challenge is not learning how to gesture or use the mouse, though; it's learning how to type. I have yet to switch over to the Fingerworks keyboard as my main device, just because my programming on it is about 1/3 the speed of my normal development, which personally makes me want to throw the computer out the window. But I am getting faster, and I can see the day when my Kinesis will go up on eBay and I'll be that much faster and more efficient when I work.


"COTS" Open Source: Practically Delivering on the COTS vision

I've recently been putting in a lot of work building a custom web application framework for internal product work at my organization. Largely, this effort has focused on using off the shelf components, leveraging their strengths and implementing the custom framework as a layer of glue between those components to integrate them into a platform. Some would say "duh" and claim that is the whole point of using off the shelf software. And I do not entirely disagree.

The exciting bit here is that this is actually real. The integration can happen in such a fashion that things are nicely coherent and integrated as a tools platform, without having to spend a fortune on customization of the individual tools from vendors or put up with tools that don't meet your requirements exactly. Instead, just by having the ability to look inside the tools and work around issues, or around features that are getting in your way, you can very quickly assemble something that will rocket you ahead in your development effort.

The tools that I have glued together are jStateMachine (no longer publicly available), Hibernate and a collection of open source libs (dom4j, etc.), using Eclipse UML as a charting tool. This has given us a system that lets you draw UML statecharts that describe your app, run them immediately in the appserver and cleanly drop in DHTML renderers for your data that are decoupled from the backend logic.

Tuesday, July 08, 2003

Scrollable Results & Hibernate

I was looking for a method to do nice, database-independent paged results without sucking the whole result set out of the database. Just found out from Crazy Bob that scrollable result sets don't work the way they "should", mostly because the drivers suck. So much for that idea.

However, it turns out that Hibernate (which I am looking to use) has an Oracle9 dialect that understands the pattern of using a nested select and the Oracle ROWNUM pseudo-column. Here's to the Hibernate developers for a clean solution. 
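For the curious, the nested-select pattern looks roughly like the SQL built below. This is a hand-written sketch of the general Oracle ROWNUM idiom, not the exact statement Hibernate generates, and the class, method and alias names (RownumPagingSketch, page, row_, rownum_) are placeholders of mine. Real code should use bind parameters rather than string concatenation.

```java
// Hypothetical helper showing the general shape of ROWNUM-based paging.
class RownumPagingSketch {

    // Wrap an ordered select so that only rows firstRow+1 .. firstRow+maxRows
    // come back. firstRow is a zero-based offset.
    static String page(String innerSelect, int firstRow, int maxRows) {
        int lastRow = firstRow + maxRows;
        return "select * from ( select row_.*, rownum rownum_ from ( "
                + innerSelect
                + " ) row_ where rownum <= " + lastRow
                + " ) where rownum_ > " + firstRow;
    }

    public static void main(String[] args) {
        // Third page of ten rows from an assumed "item" table.
        System.out.println(page("select id, name from item order by id", 20, 10));
    }
}
```

The inner select does the ordering, the middle one numbers the rows and cuts off everything past the end of the page, and the outer one drops the rows before the page starts. With Hibernate you should not have to write this yourself: as far as I can tell, calling setFirstResult and setMaxResults on the query is what triggers it.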


Plugging some amazing effort: Reality Interactive

I'd like to take this opportunity to plug Reality Interactive and my buddy Rob. He's done some amazing things with the 1.4 JDK, and, despite the fact that it looks like he rewrote a game that's several years old, he's done something infinitely cooler. The promise here (which he has shown proof that he can realize) is that simulations (i.e., games or things like flight simulators) can be written much faster and much more reliably in Java than they can in C++. Over and above the boost of using a better language that's more suited to the task, he's built a toolkit that brings a lot of the hard-won experience of the enterprise Java world to simulations. This lets people do things like focus on actually writing their physics engine rather than inventing another architecture that will be discarded in 8 months.

3 cheers for some real innovation. 


Why did you make us wait for this?

To echo the sentiment of a thundering crowd: Is this really what we were waiting for? 
Main Entry: re·cal·ci·trant
Pronunciation: -tr&nt
Function: adjective
Etymology: Late Latin recalcitrant-, recalcitrans, present participle of recalcitrare to be stubbornly disobedient, from Latin, to kick back, from re- + calcitrare to kick, from calc-, calx heel
Date: 1843
1 : obstinately defiant of authority or restraint
