Mark Lam's blog

CVM Object Allocation

Posted by mlam on June 16, 2008 at 10:29 PM PDT
In a previous comment, Jamsheed asked, ...
Jamsheed, I presume that you are referring to the piece of allocation code that requests that all threads reach a GC safe state. You're probably thinking that this is a rather slow operation, and that there are cheaper alternatives. So, why do this? Here's why ...

Some background ...

In a lot of common GC algorithms (as is the case with the current CVM GC), the GC needs to scan all object pointers in the entire VM in order to determine which objects are still alive and cannot be collected. In order to do this, it needs to ensure that the threads which are doing work won't be doing any work that can move these pointers around in a way that the GC won't know about. These threads are called mutator threads because they mutate (i.e. change) the state of pointers in threads. This is an over-simplified description of what the mutation is, but it's enough to illustrate the point.

A GC safe state is a state in which the thread agrees NOT to mutate any object pointers. Hence, when we need to GC, we must first ask all threads to reach a GC safe point. When the threads reach their respective GC safe points, they are said to have entered a GC safe state. And by definition, they won't be mutating object pointers (at least, not in any way that gets in the way of the GC). So, effectively, the GC has "stopped the world" ... at least, stopped it from doing any more mutation until further notice. The threads can still run and do work ... just not any work that mutates object pointers. If a thread needs to do any mutation, it will block until the GC gives it permission to proceed.

So, what has this got to do with object allocation?

Fast Allocation

However, that only works if there's only one thread that does all the allocation. If you can have more than one thread, then we need to make sure that those threads don't try to bump the pointer at the same time. In order to do this, CVM uses a spinlock microlock (in most target platforms).
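As a rough illustration, a microlock-guarded pointer bump might look like the sketch below. This is not CVM's actual code: the names `heap_space`, `heap_top`, `microlock`, and `alloc_fast` are hypothetical, the heap is a small static array, and C11 `atomic_exchange` stands in for whatever atomic swap instruction a given target platform provides.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap state; these names are illustrative, not CVM's. */
static uint8_t heap_space[1024];
static uint8_t *heap_top = heap_space;
static uint8_t *const heap_end = heap_space + sizeof heap_space;

/* Microlock flag: 0 = unlocked, 1 = locked. */
static atomic_int microlock = 0;

/* One atomic swap both reads the old flag value and marks the flag locked.
   If the old value was already 1, someone else holds the lock and writing
   1 again changes nothing. */
static int microlock_try_acquire(void) {
    return atomic_exchange_explicit(&microlock, 1, memory_order_acquire) == 0;
}

static void microlock_release(void) {
    atomic_store_explicit(&microlock, 0, memory_order_release);
}

/* Fast path: returns NULL when the microlock is contended or the heap is
   full, so that the caller can fall back to a slower path. */
static void *alloc_fast(size_t size) {
    void *result = NULL;

    size = (size + 7u) & ~(size_t)7u;   /* round up to a multiple of 8 */

    if (!microlock_try_acquire()) {
        return NULL;                    /* contention: do not spin here */
    }
    if (size <= (size_t)(heap_end - heap_top)) {
        result = heap_top;
        heap_top += size;               /* bump the top-of-heap pointer */
    }
    microlock_release();
    return result;
}
```

In this sketch, a caller that gets `NULL` back from `alloc_fast` would take a slower route; the sketch does not try to model that route or the GC itself.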
The spinlock is implemented using a single atomic swap instruction. The atomic swap checks a flag and, at the same time, marks the flag as locked. Most of the time, different threads aren't trying to allocate at the same time. Hence, the thread that wants the microlock flag will almost always succeed in acquiring it. That thread then quickly bumps the top of heap pointer to do its allocation, and thereafter releases the microlock flag.

OK, that sounds nice ... but what happens in the case when the threads do try to allocate at the same time (infrequent as it may be)?

The Slow Path

One way to do this is to have T2 request a "stop the world" on all threads. After initiating this request, T2 will be blocked waiting for all other threads to reach their GC safe points. Meanwhile, T1 is doing its memory allocation in a GC unsafe state. After its allocation is done, T1 sees the "stop the world" request, and responds by entering a GC safe state. Eventually, all threads will have entered their respective GC safe states, and this will wake T2 up. At this point, T2 can safely proceed with its memory allocation without having to worry about contention from other threads because ... they are all "stopped".

To recap: the fast case for memory allocation can only occur while a thread is in a GC unsafe state. If T2 requested a "stop the world" that puts all threads into a GC safe state, then it is guaranteed that no one else will be trying to do a fast allocation at the same time. And since T2 is the one that successfully requested a "stop the world", no other thread can request a "stop the world" at the same time. This guarantees that T2 will be the only one that can do the slow path of the memory allocation. Hence, the "stop the world" request is used in this case as a mechanism to synchronize threads and handle contention for heap resources during memory allocation.

So, back to Jamsheed's question ...

Why not just use a mutex?
Hence, we use the spinlock flag instead of a real mutex in order to get better allocation performance for JIT compiled code. And as I've pointed out above, most of the time there will be no contention and we can continue to use the fast path. In the rarer case when contention occurs, the JIT code will transition out to C code to run the slow path, which uses a "stop the world" request to synchronize all threads.

In Summary ...

I hope that helps. =)

Regards,
Related Topics >> Virtual Machine

Comments
Comments are listed in date ascending order (oldest first)
Submitted by jamsheed_mohammed on Thu, 2008-06-19 00:43.
Great design!!!

We were using a mutex lock in place of the spin lock, so this design aspect was hidden from me :). But we have an issue with the object allocation design when a mutex lock is used. Our stack was getting severely slowed down on long runs due to frequent GCs. All of these frequent GCs were happening because of heap lock contention: the GC was holding the heap lock at the time of contention, and almost all object allocation was happening through GC at that point. So I think it's better to go with one of the two approaches below when a mutex lock comes into play:

1. Take the heap lock in a GC safe window when contention occurs and do the object allocation there, or
2. Use a separate lock, other than the heap lock (the lock used by GC), for simultaneous object allocation, since the object allocation code is already protected from GC by the unsafe window.

One more doubt: why don't we use a blocking microlock for the contention case?

Waiting for your valuable inputs. Thanks in advance.