RE: [JAI-IMAGEIO] Getting filename to codec?
I'm debating whether this is off-topic, but I think not. We
all develop software that reads images using ImageIO, and this
is relevant. If you don't enjoy long-winded discussions of
software design, just skip this now. This is what makes me
love my job so I'm enthusiastic about it.
> All you need to do is implement a custom object
> (MultipleImageInput ?) that encapsulates both files. It can
> encapsulate it using a stream for each, byte[] for each, file
> for each, url for each, etc. If only one of the "files" is
> provided, in most cases you can find the other object, in
> other cases (e.g. the first is a byte[]) you will NEVER be
> able to find the second.
this is roughly how our original approach worked. however, it
has some shortcomings. the general problem is one of dependencies.
first, we want to keep our application independent of the details
of any particular plugin or image reader. second, we want to keep
the ImageReaders independent of the details of where the data
comes from. All we really need in the ImageReader is a stream to
read from.
this raises the following two problems with your MultiImageInput:
1. some object has to know where all of the files/urls for a
particular format are located, in order to put them into the
MultiImageInput. this object necessarily has to know the details
of your format, which means that this object depends upon
format-specific details and thus can't be part of a general
software package. if you push this knowledge down into the
ImageReader, which is a plugin and thus is not necessarily part
of a general software package, then this dependency is broken.
Take for example the RPF format, which you can read about at
This format contains hundreds of files, which are all listed in
a format-specific "table of contents" file. Regardless of where
you choose to put the code that creates the list of files, it is
format-specific. Furthermore, just listing all of the files isn't
sufficient to read the image -- the ImageReader has to know the
significance of each file, which it can't determine from a filename
alone.
2. in the case where you are not reading from Files, let's say
you're using HTTP URLs instead. you might be able to take an
initial URL and manipulate it to give you the names of the other
resources. That gives you a list of URLs instead of a list of
Files.
Now the ImageReader has to know how to open URLs for reading.
If you want to be able to read Files too, then the ImageReader
has to be able to differentiate between URLs and Files, and
do the right thing in either case.
Then what happens if you write a new ImageReader that also wants
to handle the same decision (URLs vs. Files)? And what happens
if you add a new type of resource (database, for example) that
you need to be able to support in all of your existing readers?
So that's how the StreamFactory came to be. It abstracts the
calling program from the details of how many or what files the
ImageReader needs to do its job, and it abstracts the readers
from the details of what kind of resource it is dealing with.
It uses an SPI architecture so that new sources can be supported
without impacting the core libraries or the image readers.
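
To make the idea concrete, here is a minimal sketch of the shape such
a StreamFactory could take. It is not our actual code: every name in
it (StreamFactory, StreamFactorySpi, openRelated, canHandle, the two
providers) is invented for illustration, and the Map-backed provider
only stands in for a non-file source such as a database.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical interface: gives a reader streams for related resources by
// name, without revealing what kind of resource backs them.
interface StreamFactory {
    InputStream openRelated(String name) throws IOException;
}

// Hypothetical provider interface: one implementation per kind of source.
interface StreamFactorySpi {
    boolean canHandle(Object source);
    StreamFactory create(Object source);
}

// Resolves related names against the directory of the primary file.
class FileStreamFactorySpi implements StreamFactorySpi {
    public boolean canHandle(Object source) {
        return source instanceof File;
    }

    public StreamFactory create(Object source) {
        File dir = ((File) source).getAbsoluteFile().getParentFile();
        return name -> new FileInputStream(new File(dir, name));
    }
}

// Stand-in for a non-file source such as a database: names map to bytes.
class MapStreamFactorySpi implements StreamFactorySpi {
    public boolean canHandle(Object source) {
        return source instanceof Map;
    }

    public StreamFactory create(Object source) {
        Map<?, ?> store = (Map<?, ?>) source;
        return name -> {
            Object data = store.get(name);
            if (!(data instanceof byte[])) {
                throw new FileNotFoundException(name);
            }
            return new ByteArrayInputStream((byte[]) data);
        };
    }
}

// Tiny registry; a real one could discover providers at runtime instead.
class StreamFactories {
    private static final List<StreamFactorySpi> PROVIDERS = new ArrayList<>();

    static {
        register(new FileStreamFactorySpi());
        register(new MapStreamFactorySpi());
    }

    static void register(StreamFactorySpi spi) {
        PROVIDERS.add(spi);
    }

    // Returns a factory from the first provider that accepts the source.
    static StreamFactory forSource(Object source) throws IOException {
        for (StreamFactorySpi spi : PROVIDERS) {
            if (spi.canHandle(source)) {
                return spi.create(source);
            }
        }
        throw new IOException("no provider for " + source.getClass().getName());
    }
}
```

An ImageReader written against this sketch would only ever call
openRelated() with the names it discovers while parsing, so the
File-versus-other-source decision lives in one place, and supporting
a new kind of source means registering one more provider.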
> It is impossible to work with a standard ImageInputStream
> that was created by the default ImageInputStreamSpis since
> the ImageInputStream does not have any meta information on
> where the stream is located.
agreed. see below for more on this.
> You could however also implement the ImageInputStreamSpi and
> for certain input types (e.g. file and URL) probably
> construct an ImageInputStream where the first object is at
> offset 0 in the stream, and the second object is at offset
> (length of first object). This has problems in that you would
> need to make sure your Spi was always asked first since you
> are using standard input objects.
are you saying that the custom ImageInputStream created by your
hypothetical SPI would contain the bytes of each file concatenated
into a single byte[]? such a solution is not feasible simply
because of the potential size of images. Reading every one of
these related files into memory is simply unacceptable. Plus,
in the RPF format for example, you don't know all of the files
without reading data in a format-specific way.
> You will also have problems with some of the static read()
> methods in ImageIO though, since those create the stream from
> the source (e.g.
> url.getInputStream()) before asking any SPIs if they can read
> a URL. But it will work for others (e.g. ImageIO.read(File)
> ). You may be able to instanceof on the Stream and some
> casting, but this is going to fail in the arbitrary case.
Yes, we don't support ImageIO.read() for the formats that require
this extra functionality of finding multiple files. Since our
applications don't use this method, and we can live with not
supporting this for our API users, we are fine without it.
I don't see how you could support it anyway in this case --
to reiterate, some piece of code SOMEWHERE has to know the
format specifics to know what files are needed to open the
image. And if you're using the ImageInputStreamSpi, you don't
know anything format specific. You can't make one
ImageInputStreamSpi for each of your supported formats, if
only because the API chooses one SPI over another based only
on the Java class of the input object. So you can't have two
different format SPIs both accept String or File, since
ImageIO.createImageInputStream() will always choose the same one.
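
You can see the class-based matching for yourself with a few lines
against the standard registry. The sketch below uses only existing
javax.imageio.spi calls; the InputSpiLookup class and its
candidatesFor() method are names I made up for the example. It
collects every registered ImageInputStreamSpi whose input class
matches the given object, which is the only test an SPI gets to apply.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.spi.IIORegistry;
import javax.imageio.spi.ImageInputStreamSpi;

// Hypothetical helper: lists the registered stream SPIs that would be
// considered for a given input object.
class InputSpiLookup {
    static List<ImageInputStreamSpi> candidatesFor(Object input) {
        List<ImageInputStreamSpi> matches = new ArrayList<>();
        Iterator<ImageInputStreamSpi> it = IIORegistry.getDefaultInstance()
                .getServiceProviders(ImageInputStreamSpi.class, true);
        while (it.hasNext()) {
            ImageInputStreamSpi spi = it.next();
            // The match is on the Java class alone; the SPI never gets to
            // look at the file name or its contents.
            if (spi.getInputClass().isInstance(input)) {
                matches.add(spi);
            }
        }
        return matches;
    }
}
```

Calling InputSpiLookup.candidatesFor(new File("a.toc")) returns the
same providers no matter what the file contains, so two
format-specific SPIs that both declare File as their input class
cannot be told apart at this level.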
(This is actually an argument for a chain of responsibility type
pattern in ImageIO -- where each Spi has a method like canDecodeInput(),
perhaps acceptInput(Object) that lets it accept or decline to provide
an ImageInputStream for a given input. This gets you at least part
of the way there, but it just isn't in the API now and couldn't
possibly get in before Java 6, which is impossible to rely on.)
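
For what it's worth, here is a rough sketch of what I mean by that
chain. None of this exists in ImageIO: AcceptingInputSpi,
acceptInput(Object), SuffixInputSpi and InputSpiChain are all
hypothetical, and matching on a file name suffix is just the simplest
stand-in for whatever format-specific test a real provider would run.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical SPI shape: each provider inspects the input itself and
// accepts or declines it.
interface AcceptingInputSpi {
    String formatName();
    boolean acceptInput(Object input);
}

// Illustrative provider: accepts Files whose name ends with a given suffix.
class SuffixInputSpi implements AcceptingInputSpi {
    private final String formatName;
    private final String suffix;

    SuffixInputSpi(String formatName, String suffix) {
        this.formatName = formatName;
        this.suffix = suffix.toLowerCase(Locale.ROOT);
    }

    public String formatName() {
        return formatName;
    }

    public boolean acceptInput(Object input) {
        return input instanceof File
                && ((File) input).getName().toLowerCase(Locale.ROOT).endsWith(suffix);
    }
}

// Walks the providers in order and stops at the first one that accepts.
class InputSpiChain {
    private final List<AcceptingInputSpi> providers;

    InputSpiChain(AcceptingInputSpi... providers) {
        this.providers = Arrays.asList(providers);
    }

    Optional<AcceptingInputSpi> firstAccepting(Object input) {
        for (AcceptingInputSpi spi : providers) {
            if (spi.acceptInput(input)) {
                return Optional.of(spi);
            }
        }
        return Optional.empty();
    }
}
```

With this shape, two providers can both take a File and still be told
apart, because the decision is made by the provider rather than by
the Java class of the input.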
My basic point is that yes, the price of flexibility is software
that is more complex. That is a basic tradeoff of software
design, and in my case I tend to err on the side of flexibility.
That's why the command-line programs I've created in the past
tend to have two dozen extra options, even though no one ever
uses them. ;-)
> In general, this seems like a bad image format design. All
> related files should be appended into a single file with an
> index structure at the start. Far easier to work with and
> less chance for corruption.
to some degree i agree. i don't make the image formats, i just
read them. ;-)
but on the other hand, a single huge file can be problematic
where a handful of smaller files might not be. take, for example,
the Landsat format (another case of the "associated files" problem).
this multispectral format places each band in a separate file.
some info here: http://earth.esa.int/services/esa_doc/doc_tpm.html
in this format, you might want to independently read only, say,
bands 1 3 and 4 (or some other arbitrary band combination). placing
them in separate files could save on data transmission/storage.
the same is even more true when dealing with hyperspectral imagery.
those are my thoughts.