
Handling large documents

mthornton
Joined: 2003-06-10

I am starting this topic to see if anyone else is interested in the issues involved with handling large (or enormous) documents using Swing.

I have been experimenting with viewing large documents in a JTextArea. The default PlainDocument (and GapContent) are OK for documents up to about 1MB (possibly larger if the GapContent is created with an appropriate initial size). There is a bug in the Bug Parade top 25 list which relates to the problems involved with larger documents:

http://developer.java.sun.com/developer/bugParade/bugs/4203912.html

Some of the issues in that bug do appear to have been addressed. However, for large (and enormous) files, problems remain, especially with the way lines are managed.
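On the initial-size point: a minimal sketch that pre-sizes the GapContent before handing it to PlainDocument, so the buffer should not need to be regrown repeatedly while the text is inserted. It assumes the whole file fits in memory as a String and is UTF-8; the class and method names are hypothetical. It does not address the line-management problem, since PlainDocument still builds one Element per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.swing.JTextArea;
import javax.swing.text.BadLocationException;
import javax.swing.text.GapContent;
import javax.swing.text.PlainDocument;

/** Hypothetical helper: a PlainDocument whose GapContent starts large enough for the text. */
final class PresizedDocument {
    static PlainDocument create(String text) throws BadLocationException {
        // +1 for the implied trailing newline that the content always holds,
        // so the initial gap should already fit the whole text.
        GapContent content = new GapContent(text.length() + 1);
        PlainDocument doc = new PlainDocument(content);
        doc.insertString(0, text, null);
        return doc;
    }

    /** Reads a UTF-8 file fully into memory and shows it in a standard JTextArea. */
    static JTextArea open(Path file) throws IOException, BadLocationException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return new JTextArea(create(text));
    }
}
```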

I have written a Document class which allows me to view files of 15MB (over 300000 lines) using the standard JTextArea without any memory problems. With a little more work (loading the document in the background) I hope to be able to handle the 100MB+ trace files I sometimes create.

My experimental classes have the following properties:
1) The document is read-only.
2) The file is memory mapped using the nio classes.
3) I parse the file quickly to locate lines, but only store their positions in an int[] (no object is created per line). Line elements are only created on request.
4) The mapped byte buffer is decoded into characters on demand (and the decoded blocks are cached). This only works with some character encodings.
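Items 2 and 3 can be sketched roughly as follows. This is not the author's class: it is a hypothetical LineIndex that maps a file read-only with FileChannel.map and records only the starting byte offset of each line in a growing int[], so no object exists per line; a line Element could then be built from lineStart/lineEnd only when one is requested. It assumes an ASCII-compatible encoding (so '\n' is a single byte, and for single-byte charsets byte offsets equal character offsets) and files under 2GB, since both the mapping and the int[] use int offsets.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical sketch: line starts of a mapped file kept in a single int[]. */
final class LineIndex {
    private final int limit;              // total length in bytes
    private int[] starts = new int[1024]; // byte offset at which each line begins
    private int count;

    /** Maps the whole file read-only; the mapping stays valid after the channel closes. */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** One pass over the bytes; assumes '\n' is encoded as a single byte. */
    LineIndex(ByteBuffer buf) {
        limit = buf.limit();
        starts[count++] = 0;
        for (int i = 0; i < limit; i++) {
            if (buf.get(i) == '\n') {
                if (count == starts.length) starts = Arrays.copyOf(starts, count * 2);
                starts[count++] = i + 1; // the next line begins after the newline
            }
        }
    }

    int lineCount() { return count; }

    int lineStart(int line) { return starts[line]; }

    /** End offset (exclusive) of the line, including its '\n' if it has one. */
    int lineEnd(int line) { return line + 1 < count ? starts[line + 1] : limit; }

    /** Binary search for the line containing the given byte offset. */
    int lineOfOffset(int offset) {
        int i = Arrays.binarySearch(starts, 0, count, offset);
        return i >= 0 ? i : -i - 2; // not found: insertion point minus one
    }
}
```

The offset-to-line lookup is the kind of query the root element's getElementIndex has to answer, and with this layout it costs one binary search instead of a walk over per-line objects.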

If anyone is interested I can add more details.

zander
Joined: 2003-06-13

Hi,

Any new 'widget' that makes Swing better is always appreciated. Please add a reference to the widget on the wiki.
I don't have any reason to use this widget myself, but as with all tools, if you have them, you'll eventually find a use for them. :)

I'm trying to group good widgets into one 'add-on' jar in the UICompiler project, which is open source (Apache licence), and I would be more than honored if you could add the widget to the project. You would gain good redistribution, and we all get access to a good widget in a standard place.

Thank you.

http://uic.sf.net/api/

mthornton
Joined: 2003-06-10

It is just experimental code at the moment. I'm particularly unhappy about the character encoding fudge: it makes assumptions which are only true for some encodings (and it isn't easy to tell whether a given encoding satisfies them). The only solution I can see is to use the encoding directly if it is one of a small set of 'known' cases, and otherwise to copy the entire input to a temporary file in one of the known encodings. The property I require is the ability to decode the data without starting right back at the beginning of the file. All the single-byte character sets satisfy this, as do UTF-8, UTF-16LE and UTF-16BE, but not UTF-16 (whose byte order is only known from the byte order mark at the start of the file).
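A minimal sketch of the 'known encodings' approach, assuming the whitelist below (US-ASCII, ISO-8859-1, UTF-8, UTF-16LE, UTF-16BE) and hypothetical names. To decode a block that starts mid-file, each block boundary is moved forward to a character boundary: in UTF-8, continuation bytes have the bit pattern 10xxxxxx and are skipped; in UTF-16LE/BE, code units start at even offsets, and a low surrogate is skipped so a surrogate pair is not split. Because both ends of every block are aligned by the same rule, adjacent blocks decode to text that concatenates back to the original. Charsets outside the whitelist are rejected rather than guessed at.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Set;

/** Hypothetical sketch: decode arbitrary blocks of a byte buffer without reading from the start. */
final class BlockDecoder {
    // Assumed whitelist of encodings that can be decoded starting mid-file.
    private static final Set<Charset> KNOWN = Set.of(
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16BE);

    static boolean isKnown(Charset cs) { return KNOWN.contains(cs); }

    /** Moves pos forward (never past the limit) to the next character boundary. */
    static int align(ByteBuffer buf, int pos, Charset cs) {
        int limit = buf.limit();
        if (cs.equals(StandardCharsets.UTF_8)) {
            // Skip UTF-8 continuation bytes (10xxxxxx).
            while (pos < limit && (buf.get(pos) & 0xC0) == 0x80) pos++;
        } else if (cs.equals(StandardCharsets.UTF_16LE) || cs.equals(StandardCharsets.UTF_16BE)) {
            if ((pos & 1) != 0) pos++; // code units start at even offsets
            if (pos + 1 < limit) {
                boolean le = cs.equals(StandardCharsets.UTF_16LE);
                int high = buf.get(le ? pos + 1 : pos) & 0xFF;
                if (high >= 0xDC && high <= 0xDF) pos += 2; // don't start on a low surrogate
            }
        }
        // Single-byte charsets: every offset is already a boundary.
        return Math.min(pos, limit);
    }

    /** Decodes the characters whose bytes lie between the aligned start and end offsets. */
    static String decodeBlock(ByteBuffer buf, int start, int end, Charset cs)
            throws CharacterCodingException {
        if (!isKnown(cs)) throw new IllegalArgumentException("not a known encoding: " + cs);
        int from = align(buf, start, cs);
        int to = Math.max(from, align(buf, end, cs));
        ByteBuffer slice = buf.duplicate(); // independent position/limit
        slice.position(from);
        slice.limit(to);
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(slice)
                .toString();
    }
}
```

Each decoded block could then be cached by its aligned start offset, in the spirit of item 4 of the original post.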

The part I have yet to implement is background loading of the file. The catch here is that notifying changes seems to imply creating events which explicitly include objects for all the newly added elements (I have yet to check whether JTextArea actually asks for these).
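One possible shape for the background loading, sketched under assumptions: a hypothetical BackgroundIndexer scans the buffer in fixed-size chunks on a daemon thread and hands each chunk's newly found line starts (line 0, starting at offset 0, is implicit) to a listener as a fresh int[]. In Swing the listener would presumably forward each array to the event dispatch thread with SwingUtilities.invokeLater, append it to the document's line index there, and fire the change notification. This sketch does not solve the problem of building Element objects for the event; it only shows the incremental scan.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical callback: line starts found in one chunk, plus how far the scan has reached. */
interface LinesAdded {
    void linesAdded(int[] newStarts, int scannedUpTo);
}

/** Hypothetical sketch: index line starts in chunks on a background thread. */
final class BackgroundIndexer {
    static Thread start(ByteBuffer buf, int chunkSize, LinesAdded listener) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        // A duplicate has its own position/limit; only absolute reads are used below.
        ByteBuffer view = buf.duplicate();
        Thread t = new Thread(() -> {
            int limit = view.limit();
            int[] found = new int[64];
            int from = 0;
            while (from < limit) {
                int to = from + Math.min(chunkSize, limit - from); // avoids int overflow
                int n = 0;
                for (int i = from; i < to; i++) {
                    if (view.get(i) == '\n') {
                        if (n == found.length) found = Arrays.copyOf(found, n * 2);
                        found[n++] = i + 1; // the next line starts after the newline
                    }
                }
                // A fresh array per chunk, so the receiver never sees it being modified.
                listener.linesAdded(Arrays.copyOf(found, n), to);
                from = to;
            }
        }, "line-indexer");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```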

I may do something about this at the weekend; in the meantime, any suggestions or comments are welcome.