Posted by wiverson
on July 14, 2003 at 1:29 PM PDT
One of the things that I've been somewhat annoyed by is the sheer variety of data types floating around. I'm reminded of the old days, when connecting to a BBS meant specifying the parity, data bits, and stop bits by hand - nowadays, you can pretty much assume that a PPP connection will get you a TCP/IP "dialtone."
I've been somewhat surprised at how long the data type problem has been floating around (no pun intended) in the computing world. I can talk about a bit pretty conclusively. I can hope that a byte is ordered the right way. I have a dim hope about a char in the lower 128 values (7-bit ASCII), but the high-bit stuff is less convincing. And a string? A Java String is Unicode in memory, but once I write it to disk (or read it back), the result depends on which character encoding gets used, and that's a bit more dicey.
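To make the string and byte-order worries a little more concrete, here's a small sketch using only standard JDK classes. It writes a String out with one charset and reads it back with another, then lays out the same int in big-endian and little-endian order. The class and method names are just for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingRoundTrip {

    // Encode a String with one charset and decode the bytes with another,
    // the way a file written by one program might be read back by another.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    // Serialize an int using the given byte order.
    static byte[] intBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"

        // Same charset on both sides: the text survives.
        String same = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // UTF-8 bytes read back as ISO-8859-1: the two-byte "é" turns into "Ã©".
        String mangled = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(same.equals(original));              // true
        System.out.println(mangled.equals("caf\u00c3\u00a9"));  // true

        // The same int is laid out differently depending on byte order.
        System.out.println(Arrays.toString(intBytes(0x0A0B0C0D, ByteOrder.BIG_ENDIAN)));    // [10, 11, 12, 13]
        System.out.println(Arrays.toString(intBytes(0x0A0B0C0D, ByteOrder.LITTLE_ENDIAN))); // [13, 12, 11, 10]
    }
}
```

Nothing in the bytes themselves says which charset or byte order was used; that knowledge has to travel alongside the data somehow.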
One of my more interesting consulting gigs was explaining to people why the text data in their database was in a lot of trouble. They had been accepting plain ASCII, ISO-8859-1 (Western European) text, some southern European text, and even a bit of Japanese, and were running into all sorts of problems.
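When a database has ended up with a mix like that, one rough first step is checking which stored values even decode cleanly as UTF-8. Here's a sketch using the JDK's CharsetDecoder, assuming you can get at the raw bytes of each value. The `isValidUtf8` helper is hypothetical. Note that passing the check is evidence, not proof: pure ASCII passes, and some Latin-1 byte sequences happen to be valid UTF-8 too.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Returns true only if the bytes are well-formed UTF-8.
    // new String(bytes, UTF_8) silently substitutes U+FFFD for bad input;
    // a decoder that REPORTs errors throws instead, so we can detect them.
    // (newDecoder() already reports by default; the explicit calls just make it visible.)
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);        // 63 61 66 C3 A9
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1); // 63 61 66 E9
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false: E9 starts a sequence that never finishes

        // Caveat: Latin-1 text that really contains "Ã©" is also valid UTF-8.
        byte[] ambiguous = "caf\u00c3\u00a9".getBytes(StandardCharsets.ISO_8859_1); // 63 61 66 C3 A9
        System.out.println(isValidUtf8(ambiguous)); // true
    }
}
```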
Then there's the database impedance mismatch. What's a character? What's an INT, a BIGINT, or a TEXT? Or, heaven forbid, the various date and time incarnations? How do the differences there map onto Java types? And what if somebody else is using PHP for part of the system, and talking to the same database...?
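On the Java side, JDBC at least writes down a default answer. The sketch below encodes the basic SQL-to-Java mappings from the JDBC specification's conversion tables, keyed on `java.sql.Types` codes, plus the classic BIGINT trap. It's only a sketch: drivers vary, and TEXT isn't a standard JDBC type at all (many drivers report it as LONGVARCHAR or VARCHAR, which is an assumption you'd want to check against your driver). The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.sql.Types;

public class SqlTypeMapping {

    // Default SQL-to-Java mappings as listed in the JDBC specification.
    // Individual drivers may report or convert some types differently.
    static Class<?> javaTypeFor(int sqlType) {
        switch (sqlType) {
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.LONGVARCHAR: // where many drivers put TEXT columns
                return String.class;
            case Types.INTEGER:
                return int.class;
            case Types.BIGINT:
                return long.class;
            case Types.DECIMAL:
            case Types.NUMERIC:
                return BigDecimal.class;
            case Types.DATE:
                return java.sql.Date.class;
            case Types.TIME:
                return java.sql.Time.class;
            case Types.TIMESTAMP:
                return java.sql.Timestamp.class;
            default:
                return Object.class; // not covered by this sketch
        }
    }

    // A BIGINT read into a Java int: a plain cast silently wraps around,
    // while Math.toIntExact throws ArithmeticException instead.
    static int bigintToInt(long value) {
        return Math.toIntExact(value);
    }

    public static void main(String[] args) {
        System.out.println(javaTypeFor(Types.BIGINT));    // long
        System.out.println(javaTypeFor(Types.TIMESTAMP)); // class java.sql.Timestamp

        long big = 3_000_000_000L;     // fits in a BIGINT, not in an INT
        System.out.println((int) big); // -1294967296
        try {
            bigintToInt(big);
        } catch (ArithmeticException e) {
            System.out.println("refused to narrow " + big);
        }
    }
}
```

A PHP client reading the same columns follows its own rules, which is exactly where the trouble starts.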
This is cropping up in the web services field as well (again, no pun intended). Consider the various chunks of text that might be flying back and forth, for example descriptions in a catalog. Are little bits of HTML allowed? If so, what are the rules?
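One possible rule, and only one of several, is "descriptions are plain text; any markup gets escaped." Here's a minimal sketch of that policy, assuming the text ends up embedded in HTML or XML. The `escapeMarkup` name is hypothetical, and a service that does allow a whitelist of tags would need an actual sanitizer instead.

```java
public class CatalogText {

    // Hypothetical policy: catalog text is plain text, so markup characters
    // are escaped rather than passed through. A single pass over the input
    // means the entities emitted here are never escaped a second time.
    static String escapeMarkup(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeMarkup("<b>Tea & Co</b>")); // &lt;b&gt;Tea &amp; Co&lt;/b&gt;
    }
}
```

The point isn't the escaping itself so much as having the rule written down, so both ends of the wire agree on it.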
As a side note, I've been working on a small framework that provides a common XML format and a set of Java classes for installing an application's schema into a relational database. The idea is that your application can install its schema automatically when it is first installed - you just specify the schema in XML and then tell the application to install it. One of the nicer aspects is that it uses a lot of ALTER TABLE commands to generate the schema, so you don't have to lose your existing data set if you add a field. I think this might be a useful LGPL open source project - thoughts...?
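For anyone curious about the ALTER TABLE idea, here's a rough sketch of the core step. This is not the framework's actual code or XML format: the `<table>`/`<field>` layout and the `alterStatements` method are made up for illustration. It parses a schema description with the JDK's DOM parser and emits one ALTER TABLE per field the installed table doesn't have yet. In a real install, the existing column names could come from JDBC's `DatabaseMetaData.getColumns`; creating a table from scratch, changing column types, and dialect differences (Oracle, for one, writes `ADD` without `COLUMN`) are all left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class SchemaUpgrade {

    // Hypothetical format: <table name="..."><field name="..." type="..."/></table>
    // existingColumns holds the installed column names, in lower case.
    // Returns one ALTER TABLE statement per field that is not installed yet.
    static List<String> alterStatements(String schemaXml, Set<String> existingColumns) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse schema XML", e);
        }
        Element table = doc.getDocumentElement();
        String tableName = table.getAttribute("name");
        NodeList fields = table.getElementsByTagName("field");

        List<String> statements = new ArrayList<>();
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            String column = field.getAttribute("name");
            // Only add what's missing, so existing rows are left alone.
            if (!existingColumns.contains(column.toLowerCase(Locale.ROOT))) {
                statements.add("ALTER TABLE " + tableName + " ADD COLUMN "
                        + column + " " + field.getAttribute("type"));
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        String xml = "<table name=\"product\">"
                + "<field name=\"id\" type=\"BIGINT\"/>"
                + "<field name=\"title\" type=\"VARCHAR(200)\"/>"
                + "<field name=\"price\" type=\"DECIMAL(10,2)\"/>"
                + "</table>";
        // Pretend the installed table already has id and title.
        for (String sql : alterStatements(xml, Set.of("id", "title"))) {
            System.out.println(sql); // ALTER TABLE product ADD COLUMN price DECIMAL(10,2)
        }
    }
}
```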