Using regular expression meta characters as columnDelimiters

Anonymous

If the columnDelimiter attribute of the table element is a regular
expression meta character, a line of data may not be broken into
columns the way the developer/user expects. The default column delimiter is
a tab character '\t', but some data files, such as J2EE server logs, use
characters that happen to be regular expression meta characters as delimiters.

For example, given a statement like:

Where a line looks like: "One|Two|Three|Four". The "|" is treated as the
regex alternation ("or") operator with nothing on either side, so it
matches the empty string between every pair of characters, and the
resulting columns look like:

O
n
e
|
T
w
o
|
T
h
r
e
e
|
F
o
u
r

Rather than the expected:

One
Two
Three
Four

The problem manifests itself in
swingx....TabularDataTextLoader.loadMetaData() and the following test
case illustrates the issue:

    public class RegexTest {

        public static void main(String[] args) {
            String[] data = { "One", "Two", "Three", "Four" };

            String delim = "|"; // try other meta character delimiters
            String test = makeString(data, delim);

            // Bug! delim is a regex meta character.
            String[] columns = test.split(delim);
            // Solution: must escape regex chars
            // String[] columns = test.split("\\" + delim);

            for (int i = 0; i < columns.length; i++) {
                if (!data[i].equals(columns[i])) {
                    throw new RuntimeException(data[i] + " is not equal to " + columns[i]);
                }
            }
        }

        public static String makeString(String[] data, String delim) {
            String test = "";
            for (int i = 0; i < data.length; i++) {
                test += data[i];
                test += delim;
            }
            return test;
        }
    }
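
The commented-out fix in the test case, prefixing the delimiter with a backslash, only works when the delimiter is a single punctuation character; a delimiter such as "d" would turn into the digit class \d, and a multi-character delimiter would be only partly escaped. A more general way to get a literal split is java.util.regex.Pattern.quote. The sketch below is illustrative only: LiteralSplitDemo and splitLiteral are hypothetical names and are not part of JDNC or SwingX.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of JDNC or SwingX.
class LiteralSplitDemo {

    // Splits the line on delim treated as literal text, never as a regex.
    static String[] splitLiteral(String line, String delim) {
        return line.split(Pattern.quote(delim));
    }

    public static void main(String[] args) {
        String line = "One|Two|Three|Four";

        // Unescaped: "|" is an empty alternation, so the line is cut
        // between every character instead of at the pipes.
        System.out.println(Arrays.toString(line.split("|")));

        // Quoted: the pipe is matched literally.
        System.out.println(Arrays.toString(splitLiteral(line, "|")));

        // Quoting also covers multi-character delimiters.
        System.out.println(Arrays.toString(splitLiteral("One||Two||Three", "||")));
    }
}
```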

There are a few solutions to this issue:

1. Maintain support for columnDelimiter as a regex. This would mean that
it should be documented that columnDelimiter is a regex, and the xml
developer would have to know the regex meta characters and escape them in
the xml. This would fix the above problem:

2. Remove support for regular expression columnDelimiters. In that case
the implementation of TabularDataTextLoader.loadMetaData should change.

3. Add a new attribute to tabularData to indicate that the
columnDelimiter is a regular expression. In that case, columnDelimiter
may have a different meaning depending on the state of that attribute,
and the implementation would still have to change:

I'm in favor of option 2. If the xml layer is targeted towards
novice/markup developers then I feel it's too much to expect them to
understand the nuances of regular expressions. I could be persuaded to
accept option 3, but only if regex is turned off by default. In other words,
regex="true" would be required to make columnDelimiter represent a
regular expression.
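
To make option 3 concrete, here is a rough sketch of what a loader-side splitter could look like with regex handling off by default. ColumnSplitter and its constructors are hypothetical names chosen for illustration; they are not the actual TabularDataTextLoader API. Pattern.LITERAL is the standard flag that makes a pattern match its text verbatim.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch of option 3; names are illustrative only.
class ColumnSplitter {
    private final Pattern pattern;

    // regex == false treats the delimiter as literal text;
    // regex == true compiles it as a regular expression.
    ColumnSplitter(String columnDelimiter, boolean regex) {
        this.pattern = regex
                ? Pattern.compile(columnDelimiter)
                : Pattern.compile(columnDelimiter, Pattern.LITERAL);
    }

    // Proposed default: the delimiter is NOT a regular expression.
    ColumnSplitter(String columnDelimiter) {
        this(columnDelimiter, false);
    }

    String[] split(String line) {
        return pattern.split(line);
    }

    public static void main(String[] args) {
        // Literal by default, so "|" needs no escaping.
        System.out.println(Arrays.toString(
                new ColumnSplitter("|").split("One|Two|Three|Four")));

        // Opt-in regex: a comma with optional surrounding whitespace.
        System.out.println(Arrays.toString(
                new ColumnSplitter("\\s*,\\s*", true).split("One , Two,Three")));
    }
}
```

Dropping the boolean constructor and keeping only the literal one would give option 2.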

Comments?

I've filed this as issue 27 if anyone wants to track this.

--Mark


rameshgupta
Joined: 2004-06-04

Option 2 is what we had in the first implementation about a year ago. But we changed the spec and implementation based on feedback we received from early reviewers, and from practical use-cases which required regex support.

With regex support now integrated at the lowest levels in J2SE (e.g., String.split), I expect more and more use of this facility in future. In the case of JDNC and Swing Extensions, regex pattern matching in filters and highlighters has consistently been regarded as one of the coolest and most useful features. So, I see no reason to shy away from this feature :-)

In any case, this is not an XML-specific issue, as all of the XML constructs are ultimately based on our Java APIs. In this case, the setColumnDelimiter method in TabularDataTextLoader is clearly documented to accept a regular expression string. And I agree that the XML documentation needs to make this clear as well.

From a language/API design standpoint, I still prefer option 1 to option 3. Just compare String.split(String delimiter) to a hypothetical String.split(String delimiter, boolean isRegEx). Which would you prefer? My vote is to keep the argument of TabularDataTextLoader.setColumnDelimiter() consistent with that of String.split(), and reflect that in our markup language as well.
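
For comparison, this is roughly what option 1 looks like from the caller's side: a single regex-valued delimiter argument, with literal delimiters escaped by the caller. splitColumns below is a hypothetical stand-in that simply delegates to String.split; it is not the real setColumnDelimiter implementation.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical stand-in for a loader whose delimiter is always a regex.
class RegexDelimiterDemo {

    // Same contract as String.split: the delimiter is a regular expression.
    static String[] splitColumns(String line, String delimiterRegex) {
        return line.split(delimiterRegex);
    }

    public static void main(String[] args) {
        // A genuine regex delimiter: comma with optional whitespace around it.
        System.out.println(Arrays.toString(
                splitColumns("One , Two,Three ,Four", "\\s*,\\s*")));

        // A literal pipe, escaped by hand...
        System.out.println(Arrays.toString(
                splitColumns("One|Two|Three|Four", "\\|")));

        // ...or escaped with Pattern.quote, without a second parameter.
        System.out.println(Arrays.toString(
                splitColumns("One|Two|Three|Four", Pattern.quote("|"))));
    }
}
```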