Why doesn't java.io.ObjectOutputStream use weak references?

cowwoc

One of the common complaints against Java Serialization is that you get OutOfMemoryError when serializing a large number of objects. The Serialization FAQ indicates this is caused by the [object, handle] table: http://java.sun.com/javase/technologies/core/basic/serializationFAQ.jsp#...

I was wondering: why can't the class use a [WeakReference, handle] table instead? That is, a table whose keys are removed once the referenced objects have been garbage-collected.

arae

I've had the situation with a long-lived stream where I've had to do a reset() after every so many writes to prevent these memory problems. If weak references are too expensive then it would be nice to have a new property on ObjectOutputStream to tell it not to bother storing references - a sort of auto-reset.
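For reference, the manual version of that workaround looks roughly like this (the helper name and batch size are just illustrative):

    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.OutputStream;
    import java.io.Serializable;
    import java.util.List;

    public class PeriodicResetSketch {
        // Write all records to one long-lived stream, calling reset() every
        // batchSize writes so the handle table is cleared and already-written
        // records can be garbage-collected.
        static void writeAll(OutputStream sink, List<? extends Serializable> records,
                             int batchSize) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
                int written = 0;
                for (Serializable record : records) {
                    out.writeObject(record);
                    if (++written % batchSize == 0) {
                        out.reset(); // writes a reset marker; ObjectInputStream handles it transparently
                    }
                }
            }
        }
    }

An auto-reset property would simply move this bookkeeping into the stream itself.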

tricky10

Hi,
I had the same problem. I don't know if it will resolve your problem, but it resolved mine. Why don't you use writeUnshared and readUnshared?

tackline

writeUnshared is not deep - it only applies to the immediate object passed in as the argument. So using it is invasive, and you can't serialise non-hierarchical graphs.

(But if you do want to use it, serialPersistentFields allows you to set unshared on fields used in defaultWriteObject/putFields/etc.)
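To illustrate the shallowness, a minimal sketch (class and field names made up) - writeUnshared affects only the top-level object, so nested references still go through the handle table:

    import java.io.*;

    public class UnsharedIsShallow {
        static class Holder implements Serializable {
            Object left, right;
        }

        public static void main(String[] args) throws Exception {
            Holder h = new Holder();
            h.left = h.right = new int[] {42}; // one nested object referenced twice

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeUnshared(h); // only h itself is written unshared
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                Holder copy = (Holder) in.readObject();
                // The nested array was still written once and back-referenced,
                // so everything below h used the handle table as usual.
                System.out.println(copy.left == copy.right); // prints true
            }
        }
    }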

baffyofdaffy

So why is the handle table there to begin with?

After the object is written, couldn't it be discarded?

Why does it have to stay referenced?

linuxhippy

> So why is the handle table there to begin with?
> After the object is written, couldn't it be
> discarded?
> Why does it have to stay referenced?

Because of the guarantees ObjectOutputStream has.
If you have one object referenced a hundred times in the whole object graph, you most likely want to send it only a single time and let ObjectInputStream restore the references - so ObjectOutputStream needs some way to determine which objects have already been written.
Imagine sending an object graph and finding that, on the client side, a statement like a.obj1 == a.obj2 breaks even though it held on the server side.
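A minimal sketch of that guarantee (the class and field names are made up):

    import java.io.*;
    import java.util.Date;

    public class SharedReferenceGuarantee {
        static class A implements Serializable {
            Object obj1, obj2;
        }

        public static void main(String[] args) throws Exception {
            A a = new A();
            a.obj1 = a.obj2 = new Date(); // the same instance referenced twice

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(a); // the Date is encoded once; the second field is a back-reference
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                A copy = (A) in.readObject();
                System.out.println(copy.obj1 == copy.obj2); // true - identity preserved
            }
        }
    }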

- Clemens

tackline

Two problems spring to mind.

Firstly, WeakReference is relatively expensive. Serialisation is already slow enough.

Secondly, the serialisation format and ObjectInputStream would have to be altered to achieve useful results. When deserialising, the full object graph is in memory. To allow ObjectInputStream to remove handles from its table, extra information would need to be added to the stream. All this hurts performance.

So, the way to use Object(In|Out)putStream is to use them briefly and then reset. Don't let them hang around.

forax

I don't see the point.

If you are able to split the object graph into multiple parts, you can write them separately. If you are not, the whole graph must be saved, and because you have cycles in it, you have to remember all the instances you have already written.

This has a cost in memory, but I don't see how to work around that.
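For example, even a tiny cyclic graph only serialises because the stream remembers what it has already written (a hypothetical two-node graph):

    import java.io.*;

    public class CycleNeedsHandles {
        static class Node implements Serializable {
            Node next;
        }

        public static void main(String[] args) throws Exception {
            Node a = new Node();
            Node b = new Node();
            a.next = b;
            b.next = a; // cycle: a -> b -> a

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                // Without the [object, handle] table this would recurse forever;
                // with it, the second visit to 'a' becomes a back-reference.
                out.writeObject(a);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                Node copy = (Node) in.readObject();
                System.out.println(copy.next.next == copy); // true - the cycle is restored
            }
        }
    }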

Rémi

cowwoc

Remi,

You bring up a good point, but I can't help but wonder whether we're missing something. Why does http://xstream.codehaus.org/objectstream.html say: "XStream provides alternative implementations of java.io.ObjectInputStream and java.io.ObjectOutputStream, allowing streams of objects to be serialized or deserialized from XML. This is useful when processing large sets of objects, as only one needs to be in memory at a time."

What you say is probably true for reading or writing a single object with many recursive references, but it isn't necessarily true for reading or writing multiple objects when there is no need to keep them alive for long once they have been read or written. This would apply mostly to intermediate values. Maybe the XStream author is also thinking of another use-case we are missing.

I'm not sure I buy tackline's argument that weak references would actually degrade (serialization) performance in a noticeable way. Is there an existing serialization unit test we can profile?
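If not, a throwaway harness along these lines (the counts and payload type are arbitrary) could be run under a profiler, with and without the reset() call:

    import java.io.*;
    import java.util.Date;

    public class SerializationTimer {
        public static void main(String[] args) throws Exception {
            int count = 100000;
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            long start = System.nanoTime();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                for (int i = 0; i < count; i++) {
                    out.writeObject(new Date(i)); // distinct instances, one handle each
                    if (i % 10000 == 0) {
                        out.reset(); // comment this out to watch the handle table grow
                    }
                }
            }
            long elapsedMs = (System.nanoTime() - start) / 1000000;
            System.out.println("wrote " + count + " objects in " + elapsedMs + " ms");
        }
    }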

joehni

> Remi,
>
> You bring up a good point but I can't help but wonder
> whether we're missing something. Why does
> http://xstream.codehaus.org/objectstream.html read
> "XStream provides alternative implementations of
> java.io.ObjectInputStream and
> java.io.ObjectOutputStream, allowing streams of
> objects to be serialized or deserialized from XML.
> This is useful when processing large sets of objects,
> as only one needs to be in memory at a time."

This is a completely different case. XStream implements a custom OOS, and the internal caches of the OOS are not set up. Additionally, XStream's use case only allows writing complete (sub-)graphs; references between the different writeObject calls are not tracked.
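For comparison, the usage that page describes looks roughly like this (the writer and file name are illustrative); each writeObject call serialises a complete sub-graph on its own:

    import com.thoughtworks.xstream.XStream;
    import java.io.FileWriter;
    import java.io.ObjectOutputStream;

    public class XStreamObjectStream {
        public static void main(String[] args) throws Exception {
            XStream xstream = new XStream();
            // createObjectOutputStream returns a java.io.ObjectOutputStream facade
            // that writes each object out as XML as it is passed in.
            try (ObjectOutputStream out =
                     xstream.createObjectOutputStream(new FileWriter("objects.xml"))) {
                out.writeObject("some value");
                out.writeObject(Integer.valueOf(42));
                // no handle table spans these calls: references between the
                // objects written above are not tracked
            }
        }
    }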

> What you say is probably true for reading or writing
> a single object with many recursive references but it
> isn't necessarily true for reading or writing
> multiple objects when there is no need to keep them
> alive for long once they have been read or written.
> This would apply mostly for intermediate values.
> Maybe the XStream author is also thinking of another
> use-case we are missing.
>
> I'm not sure I buy tackline's argument that weak
> references would actually degrade (serialization)
> performance in a noticeable way. Is there an existing
> serialization unit test we can profile?

In XStream's standard mode we now use weak references for the object cache. If nothing refers to an object anymore, we will not see it again either.

However, I cannot verify that weak pointer are so expensive. I did several benchmarks and I had quite no difference (see the xstream-benchmark package). Maybe the XML parsing (resp. the object to string conversion) is much more expensive compared to read/write the byte data of an object, so the weak reference management does not make a difference, but personally I would verify such a statement with a real profiler.