Skip to main content

FetchType.EAGER behaviour

11 replies [Last post]
danielcroth
Offline
Joined: 2007-07-11

Hi,

We have a fairly complicated tree structure where we would like toplink essentials to fetch the whole tree as fast as possible. The structure is something like this:
Level1 has a Collection
Level2 has a Collection
Level3 has a Collection
etc.

The levels above are entities and the FetchType for the children is EAGER.

When we look at the actual sql toplink generates db(Oracle), we see that the eager doesn't seem to affect the queries as expected. It makes two identical selects and not a join. Hence, the children aren't fetched and are (later) retrieved lazily with 1 select per child (slooow).

We use static weaving, which works fine. (and doesn't make any difference when disabling it)

What are we doing wrong?

Reply viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
danielcroth
Offline
Joined: 2007-07-11

This is more exactly what I have (using cnnordins terminology):

entity Level1
Collection level2s (oneToMany)
String name;

entity Level2
Collection level3s (oneToMany)
Level1 parent (manyToOne)
String name;

entity Level3
Level2 parent (manyToOne)
String name;

Lets say I have the following in the database tables:
Level1: id,name
1, "myLevel1"

Level2: id,parent_id references Level1(id),name
1, 1, "myFirstLevel2"
2, 1, "mySecondLevel2"

Level3: id,parent_id references Level2(id),name
1, 1, "myFirstLevel3"
2, 1, "mySecondLevel3"
3, 2 "myThirdLevel3"
4, 2, "myFourthLevel3"

All children collections are set to fetchtype.eager. If everything worked as I thought it would, toplink would produce an sql looking something like this, when getting the Level1 (id=1) object:
select "*" from level1 t1 join level2 t2 on t1.id=t2.parent_id join level3 t3 on t3.id=t2.parent_id where t1.id=1
resulting in:
1, "myLevel1",1, 1, "myFirstLevel2",1, 1, "myFirstLevel3"
1, "myLevel1",1, 1, "myFirstLevel2",2, 1, "mySecondLevel3"
1, "myLevel1",2, 1, "mySecondLevel2",3, 2 "myThirdLevel3"
1, "myLevel1",2, 1, "mySecondLevel2",4, 2, "myFourthLevel3"
and from that build the level1 object with all its children.

However, it doesn't. It simply makes one select for level1 and then one for each level2 object and then one for each level3 object. The same as if I use lazy where the childobjects will be fetched when accessing them.

With the fetch join I can get Level1 with its child objects level2, but not deeper. Toplink will fetch the level3 children of level2 with one select each.

If this is working as intended, how is one supposed to create tree structures with toplink without having to wait minutes for all the selects to be ready?

grammar...

Message was edited by: danielcroth

Marina Vatkina

Toplink is optimized for caching, i.e. for the assumption that they won't need
to go to the database at all. Their assumption is that there is a very good
chance that your data has been previously fetched or created by the application
and when you navigate from Level1 to Level2 or down further, they will find all
related instances in the global (1 per PU) cache, clone them, and you'd be ready
to go much faster than actually doing a join on 3 tables and parsing the result.

HTH,
-marina

glassfish@javadesktop.org wrote:
> This is more exactly what I have (using cnnordins terminology):
>
> entity Level1
> Collection level2s (oneToMany)
> String name;
>
> entity Level2
> Collection level3s (oneToMany)
> Level1 parent (manyToOne)
> String name;
>
> entity Level3
> Level2 parent (manyToOne)
> String name;
>
> Lets say I have the following in the database tables:
> Level1: id,name
> 1, "myLevel1"
>
> Level2: id,parent_id references Level1(id),name
> 1, 1, "myFirstLevel2"
> 2, 1, "mySecondLevel2"
>
> Level3: id,parent_id references Level2(id),name
> 1, 1, "myFirstLevel3"
> 2, 1, "mySecondLevel3"
> 3, 2 "myThirdLevel3"
> 4, 2, "myFourthLevel3"
>
> All children collections are set to fetchtype.eager. If everything worked as I thought it would, toplink would produce an sql looking something like this, when getting the Level1 (id=1) object:
> select "*" from level1 t1 join level2 t2 on t1.id=t2.parent_id join level3 t3 on t3.id=t2.parent_id where t1.id=1
> resulting in:
> 1, "myLevel1",1, 1, "myFirstLevel2",1, 1, "myFirstLevel3"
> 1, "myLevel1",1, 1, "myFirstLevel2",2, 1, "mySecondLevel3"
> 1, "myLevel1",2, 1, "mySecondLevel2",3, 2 "myThirdLevel3"
> 1, "myLevel1",2, 1, "mySecondLevel2",4, 2, "myFourthLevel3"
> and from that build the level1 object with all its children.
>
> However, it doesn't. It simply makes one select for level1 and then one for each level2 object and then one for each level3 object. The same as if I use lazy where the childobjects will be fetched when accessing them.
>
> With the fetch join I can get Level1 with its child objects level2, but not deeper. Toplink will fetch the level3 children of level2 with one select each.
>
> If this is working as intended, how are one supposed to create tree structures with toplink without having to wait minutes for all the selects to be ready?
> [Message sent by forum member 'danielcroth' (danielcroth)]
>
> http://forums.java.net/jive/thread.jspa?messageID=226294
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net
> For additional commands, e-mail: users-help@glassfish.dev.java.net
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net
For additional commands, e-mail: users-help@glassfish.dev.java.net

danielcroth
Offline
Joined: 2007-07-11

The BIG problem is that the cache must get the objects from somewhere too. And getting objects with single selects is somewhat pointless. Remember, getting many objects at once will NOT remove the usage of cache, insteed it will improve the performance retrieving the objects to it.

Just to clarify with a small example:

em.createQuery("SELECT level1 FROM Level1 AS level1").getResultList();
will result in the following toplink output
[TopLink Fine]: 2007.07.13 01:14:04.782--ServerSession(24147539)--Connection(29931163)--Thread(Thread[http-8080-Processor24,5,main])--SELECT ID, NAME, LASTMODIFIEDDATE, LASTMODIFIEDBY_ID, CCREVISION_ID, REVISION, REVISIONDATE FROM LEVEL1
which is fine. Now we have every single Level1 object in the cache

em.createQuery("SELECT level2 FROM Level2 AS level2").getResultList();
will result in the following toplink output
[TopLink Fine]: 2007.07.13 01:14:07.586--ServerSession(24147539)--Connection(1692531)--Thread(Thread[http-8080-Processor24,5,main])--SELECT ID, NAME, LASTMODIFIEDDATE, LASTMODIFIEDBY_ID, LEVEL1_ID, REVISION, REVISIONDATE FROM LEVEL2
which is fine. Now we have every single Level2 object in the cache.

Level2 level2 = em.find(Level2.class, new Long(123));
will result in toplink retrieving the Level2 object from the cache.

Level1 level1 = level2.getParent();
will result in toplink retrieving the Level2 object from the cache.

Collection level2s = level1.getChildren();
level2s.size();
will result in
[TopLink Fine]: 2007.07.13 01:14:31.468--ServerSession(24147539)--Connection(561973)--Thread(Thread[http-8080-Processor24,5,main])--SELECT ID, NAME, LASTMODIFIEDDATE, LASTMODIFIEDBY_ID, LEVEL1_ID, REVISION, REVISIONDATE FROM LEVEL2 WHERE (LEVEL1_ID = ?) bind => [123]
and this is the big trouble. For each level2 children list, another select will be executed and then further down on level4 another n selects will be executed.

This will result in 1+n selects per exra level which on a three level tree will be 1+n(1+m).
On a binary tree (2 nodes per level) we will have 1+2(1+2(1+2)) selects for three level tree and we can see that for an k level tree we will
have n^k selects, which is far beyond acceptable.

In this specific case with 3 levels with ~ 20nodes per level we get 30^3 = 27000 selects which takes approx 5minutes to execute versus 3 seconds for a single select with joins..

In other words, unless anyone can show how to do this in a different (correct?) way, any application having a tree structure should avoid toplink essentials and use any other JPA implementation which supports this e.g Hibernate.

chris_delahunt
Offline
Joined: 2005-07-06

This is a rather harsh statement for something that could be an object model design issue. Are you certain that you want to do a join 3 or more levels deep each and ever time you execute a simple find on an Entity? Part of the reason 1:M and M:M mapping use lazy by default is to avoid bringing in your entire object tree unneccessarily even on simple find queries.

As Marina mentioned, caching allows TopLink just to return that object if it already exists. So you will get your N+1 queries for eagerly fetched relations on your first read of the entity, but afterward you will get the object from the cache - the joins would not be neccessary. You can avoid some of the load process using fetch join on your initial query, but as mentioned earlier, there is a limit on how deep it can go. If you really need to load each and every one of the 3rd level relations into memory (ie eager), fetch join drops your 30^3 to 30^2.

TopLink Essentials does allow setting mappings to always use joining, but this is configurable through customizers rather than being on by default.

Best Regards,
Chris

danielcroth
Offline
Joined: 2007-07-11

So how is all this achieved with Toplink Essentials?

We have a tree that is supposed to be modified in a web application. Getting the nodes as the tree is traversed by the user is out of the question, it is to slow according to the users. They want to have the whole tree, traverse it a la javascript, make changes to the tree and then save (post back).

The actual tree we have has five levels and there are ~10-20 nodes per level. However, the first level defines which tree, so the displayed tree has 4 levels. The different levels in the tree are totally different so they cannot be of same entity (except for some usual tree properties like parent, children etc).

Given the maths above, even with fetch join it would take users > 3-4 seconds, which is how long one usually wants to wait for a page. (actually it takes > 3-4 minutes...). A join by id between the levels takes Oracle less than a second to perform and send to the web server. Parsing that super table into objects also takes less than a second. (Tested with hibernate).

$100 question is: How is this solved with Toplink Essentials in less than 3-4 minutes? I'd be glad to hear of any approach less than ~100% slower than hibernate, (i.e. 4 seconds) but no matter how we fetch join level by level / force objects in the cache etc. it takes minutes.

If Toplink Essentials could fetch join multiple times, this issue would not even be an issue...

danielcroth
Offline
Joined: 2007-07-11

We resolved this issue by creating the query with toplinks api:

ReadAllQuery raq = new ReadAllQuery(TopLevel.class);
ExpressionBuilder eb = raq.getExpressionBuilder();

Expression level1 = eb.anyOf("level1List");
raq.addJoinedAttribute(level1);

Expression level2 = level1.anyOf("level2List");
raq.addJoinedAttribute(level2);

Expression level3 = level2.anyOf("level3List");
raq.addJoinedAttribute(level3);

Expression level4 = level3.anyOf("level4List");
raq.addJoinedAttribute(level4);

Query jpaQuery = new EJBQueryImpl(raq, (EntityManagerImpl)em.getDelegate());
TopLevel tl = (TopLevel) jpaQuery.getSingleResult();
return tl;

chesterxgchen123
Offline
Joined: 2007-07-05

If you are using Oracle Database, then I think simply use the native query with "select * from myTable start with ... connect by prior ..." is much simpler and faster to query the tree.

gyorke
Offline
Joined: 2005-06-21

Except SQLResultSetMappings do not support FETCH JOINED attributes.
--Gordon

gyorke
Offline
Joined: 2005-06-21

In the case of FetchType.EAGER TopLink is optimised for cache usage. If the related objects are in the cache (which is usually the case if the data is actually being used by the end user) then an in memory query is much faster than processing a FETCH JOIN especially if the join in on a collection as a FETCH JOIN can be quite burdensome on a database and include massive amounts of data especially on multilevel collection joins. If the data is not in the cache then more queries must be performed which can be taxing on a system but in a general application not joining is usually better. Be aware that if the multi-level relationships are joined at the mapping level [i]every[/i] query for the root class will require the multi-level join to be processed.
If there is a particular use case in your application where the multi-level join is required then a JPQL query can be performed that will JOIN the required attributes.
What is your use case for mapping this attributes as always JOINED.
--Gordon

cmnordin
Offline
Joined: 2007-07-11

Sorry for hijacking this thread, but since our problems are so similar I hope you don't mind...

The cache thing is interesting, but not working. And even if it did, wouldn't toplink still have to do a select on the join table between the two entities in more or less all cases where not every row of the join table is cached? And this would be very impractical if you have, say, 10 million rows in that table. And that's assuming you even have a join table, which in itself might be impossible if you are working with a schema that you don't own.

Example code:
(With toplink.cache.type.default set to Full, or anything else)

level2List = em.createQuery("SELECT x FROM Level2 x").getResultList();
level1List = em.createQuery("SELECT x FROM Level1 x").getResultList();

for (Level2 level2 : level1.getLevel2List()) {
....
}

If fetch type is set to LAZY one DB select on Level2 will be issued for each pass through the loop, if fetch type is EAGER all those selects will be issued when level1List is queried above. Setting the cache refresh hint on the queries doesn't make a difference.

As for use cases I would say any use case where you have a tree structure is a valid candidate, and if there is a JPQL query that solves this problem I would love to see it, everything I've tried yields similar results = huge amounts of selects. (Except JOIN FETCH but that only works for trees with 2 levels)

Also, to me this looks as if eager fetching is pointless in most general applications (those not doing very strange things with detached entities), as there is no performance gain in using it over lazy which behaves in exactly the same way when it actually loads the data, but at least has a chance of not selecting data you don't need.

cmnordin
Offline
Joined: 2007-07-11

Hi.

I have almost exactly the same problem. FetchType.EAGER just issues another SELECT instead of using a JOIN.

This behaviour seems very strange and I can't imagine that this is working as intended? If eager fetching refuses to use joins there is _no_ way of loading object trees of more than 2 levels using Toplink essentials, the number of queries to the database is simply unacceptable.