Posted by kohsuke
on August 11, 2006 at 2:14 PM PDT
Edit distance computation, or how the CS undergraduate education is sometimes actually useful.
One of the things that really differentiates a good tool from a mediocre one is error handling. So in the JAXB RI, I spend a lot of effort making sure that the tools and the runtime detect errors, print them in a way that makes sense, and try to diagnose the problem better.
One of the typical human mistakes is a typo. After all, humans are intelligent but not supposed to be diligent, so there's no way we can remember long namespace URIs like http://java.sun.com/xml/ns/jaxb (was there a slash between xml and ns? Was there a trailing slash?)
This happens to JAXB RI users, too. When you type namespace URIs, you often make a typo. So when the JAXB RI finds an unknown type name inside @xsi:type, it first checks whether it's a typo of a known type.
This is where the notion of edit distance becomes useful. This pretty simple algorithm computes how "close" one string is to another, and it works well for suggesting a fix for a typo. That is, if you have a list of "correct" values, you find the one with the smallest edit distance to the user-given "wrong" value, and that one might be what the user meant.
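To make this concrete, here is a minimal sketch of the idea in Java. It is not the code from the JAXB RI; it assumes the plain Levenshtein definition of edit distance (single-character insertions, deletions and substitutions each cost 1), and the names EditDistance, editDistance and findNearest are made up for illustration.

```java
import java.util.List;

// Illustrative sketch only; class and method names are hypothetical.
final class EditDistance {
    private EditDistance() {}

    /**
     * Levenshtein distance: the minimum number of single-character
     * insertions, deletions and substitutions needed to turn a into b.
     */
    static int editDistance(String a, String b) {
        // prev[j] = distance between the first i-1 chars of a and the first j chars of b
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building b's prefix from an empty string takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i; // reducing a's prefix to an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitute = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int delete = prev[j] + 1;
                int insert = cur[j - 1] + 1;
                cur[j] = Math.min(substitute, Math.min(delete, insert));
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full table
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the candidate closest to the input, or null if there are no
     * candidates. Ties go to the candidate that appears first in the list.
     */
    static String findNearest(String input, List<String> candidates) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int d = editDistance(input, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }
}
```

A real tool would probably also want a cut-off, so that it only makes a suggestion when the nearest candidate is within a few edits of what the user typed. Otherwise it will happily "correct" input that has nothing to do with any known value.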
This algorithm is useful in many other contexts where a program deals with human input. Just this morning, we were talking about using it in the GlassFish asadmin command, which takes a sub-command name as an option (and there are more sub-commands than we can remember!)
This would be useful in the JAXP validator, too. Wouldn't it be nice if a validator could say "invalid element <naem>. Did you mean <name>?" (I did exactly this in my MSV validator, but the last time I checked the JAXP RI didn't do this.) Or how about your IDE telling you that not only is the class name MassageContext wrong, but it's actually likely a typo of MessageContext? And the list goes on.
If you are interested in using this in your project, here is the working code, under CDDL, that I use in the JAXB RI.
P.S. The edit distance algorithm says "diligent" and "intelligent" are not related (distance 5). Yes, it's that smart.