Posted by mlam
on February 2, 2007 at 1:41 AM PST
Continuing with the volatile field access bug fix case study.
The problem with having a real job is that I don't always have time to blog. =p And I am also looking forward to wrapping up this thread of discussion so that I can move on to some other topics as well. Unfortunately, because of the sheer amount of information, it will take a few more entries. While I'm still very busy, this discussion will never end if I keep putting it off. So, here's a bit more for today. Let's dive in ...
1. Part I: A Field Get Experience
2. Part II: In a bit of a Volatile Fix!
3. the JIT Architecture Map (1024x768 JPG format)
4. the JIT Architecture Map (PDF format)
By the way, I call this entry Part III (as opposed to Part IV) because I didn't count my last entry on the JIT Architecture Map as being directly about to this bug fix (though its content is relevant).
Status of the fix
As of 2 weeks ago (relative to this writing), I have actually completed this fix and committed the changes to the phoneME Advanced repository already. You should be able to find the details if you check the repository 's commit log for revision 1270. I'll discuss that revision log here.
Summary of changes
Here is the summary of changes exactly as it appears in the revision log:
1. Fix for CR#5080490:
Added support for compiling with volatile 64bit field accesses (includes
potential volatile 64bit field accesses due to unresolved CP entries).
The IR is changed to mark the fieldref nodes with a VOLATILE flag when
appropriate. The JIT backend is changed to emit calls to helpers for the
cases of potential or known 64-bit volatile field accesses. The current
implementation uses CCM helper functions to achieve field atomicity in the
same way that the interpreter does it i.e. using a microlock.
The above essentially describes the strategy that I used to fix this bug. I will discuss this in greater detail in my next entry. There were also a few other related items that I took care of while working on this fix:
2. Added -Xjit:trace=error support for reporting JIT compilation errors to
help with tuning JIT parameters. Some error messages previously reported
under -Xjit:trace=status is now moved under error. New traces are also
added for the other types of JIT errors that can occur.
While working with this code, I discovered that the pre-existing JIT didn't already provide a convenient way to know about why the JIT would fail compilation attempts. One example of causes such failures is the 64-bit volatile field access condition that we're trying to fix here. So, I took the opportunity to add a new tracing option to help with this and future JIT bug-fixing and tuning work. I will talk about the JIT tracing facility and this change in an upcoming entry.
3. Added more test cases to the DoResolveAndClinit JIT unit testsuite.
DoResolveAndClinit is a unit testsuite that I wrote back in the early days when the CVM team was bringing up the JIT for the first time. The suite tests the JIT generated version of VM bytecode instructions that access the constant pool where this access may require constant pool resolution and/or class initialization. These bytecode instructions include all field accesses. However, back then, I had not considered the volatility of the fields, and therefore did not add test cases for volatile fields. The above change is basically to add the volatile field cases. This also enables us to use this testsuite to test the changes to be made for this bug fix.
4. Resolved JCS ambiguity warnings for x86 JIT compilation.
This last change came as a by-product of adding this fix to CVM's x86 JIT as well as its RISC versions (I started the work with the RISC version as my focus). The majority of the changes were in shared code. Hence, the x86 JIT was also affected. Its backend had to be updated in order for it to work with the newly fixed JIT front-end. While doing this, I took the opportunity to resolve some JCS warnings that show up when building the x86 JIT. JCS (Java Code Select) is the compiler compiler that we use for compiling the JIT grammar rules of the CVM JIT.
Here are the changes in greater detail:
- Added tests for volatile putstatic, getstatic, putfield, and getfield.
- Added tests for NULL pointers on putfield, and getfield.
The above are the DoResolveAndClinit testsuite changes.
- Change put/get JNI functions to handle volatile 64-bit fields atomically.
The above jni_impl.c changes were previously discussed in Part II .
- Introduced CVMtraceJITError to trace all the errors that occur during
compilation. This tracing will be enabled with -Xjit:trace=error.
- Moved printing of COMPILATION FAILED status to the place after the
various JIT error cases have been handled. Also, retry later cases
would now cause COMPILATION FAILED to be printed for trace=status.
Previously, it would silently fail.
These are the JIT tracing changes.
- Added flags for marking volatile field accesses.
- Remove old code which throws JIT errors when volatile field accesses
or unresolved fields are encountered. Instead, these field accesses
are marked as potentially volatile using the flag above. There is
no harm (in terms of correctness) in using a
The above are the changes to the JIT front-end. We will discuss these changes in the my next blog entry.
- Added CCM runtime helpers for accessing volatile 64-bit fields.
The ee is needed by the CCM helpers to synchronize for atomicity.
However, the ee is not passed in because we should keep the number of
arguments words within 4 to avoid the complicated argument passing.
Since accesses to volatile 64-bit fields aren't critical to performance,
the CCM helpers can just get the ee themselves.
These are CCM runtime helper changes. The JIT generated code for volatile 64bit field accesses will delegate to the CCM helpers, and these helpers will do the actual work of accessing those fields in an atomic fashion.
- Added support for defining backend specific node typetags.
- Changed massageOpcodes() to map volatile 64bit field accesses to the node
typetag of CVMJITCG_TYPEID_64BITS_VOLATILE so that the grammar rules can
emit synchronized accesses for these cases.
These are changes made in the first part of the JIT back-end: the Opcode Mapper. This will be discussed in a later entry.
- Added grammar rule for handling 64-bit field accesses.
getfield and putfield for 64bit volatiles need to do explicit NULL
checks because the CCM helpers can't throw exceptions without the ccee.
This explicit NULL check is emitted by using a sub-routine refactored from
the NULLCHECK rule's action.
The NULL check is done in the JIT backend because the backend is the
one who determines which getter/setter need special handling in order to
These are the JIT production rule changes needed for this bug fix. These rules will emit code that calls on the CCM helpers. We will also discuss these in a later entry.
- Added case in lexical parser to report illegal characters that were
scan.l is a Lex source file that makes up part of JCS. While cleaning up the JCS warnings for the x86 JIT port, I discovered that some warnings were being generated due to illegal symbols. However, JCS was not forthcoming in telling me where in the grammar source those illegal symbols were located. So, I decided to fix JCS too to make my life easier (and hopefully yours too eventually). If time permits, I will discuss JCS internals briefly in a later entry.
- Added the x86 equivalent of the above portlib changes.
- Also resolved all ambiguity in the x86 grammar rules, and resolved
broken comments that causes jcs to output strange messages during
the build process.
Lastly, these are the x86 JIT changes. I probably won't go into these because they are similar to the RISC versions. However, I reserve the right to change my mind if I think of something interesting to talk about regarding the x86 JIT.
Note: the reason there are different grammar rules for the x86 JIT (in contrast with the RISC grammar rules) is that the x86 instruction architecture is adequately different that the grammar had significant differences. Rather than try to strong-arm the x86 JIT into a RISC framework, it was allowed to exist on its own, thereby allowing for greater deviation as needed. This is not to say that we won't find a way to make it share more with the RISC back-end in the future. But for now, the 2 JIT back-ends are distinct.
Regardless, while there are a lot of differences, there are also a lot of conceptual simularities. Hence, the changes for this fix are essentially the same for the x86 port.
And, of course, any bug fixes or code modifications would not be complete without testing to verify the correctness of the changes, and undergoing code (and/or other types of) reviews. Here are some that were done for this fix:
Tested with DoResolveAndClinit, ... and
the CDC TCK. Also inspected IR and codegen output from running
DoResolveAndClinit. Reviewed by Chris.
Amongst other tests, I ran the newly enhanced DoResolveAndClinit testsuite on the fixed JIT, as well as the CDC TCK to make sure that CVM is still spec compliant.
If you recall, IR (Intermediate Representation) is the output of the JIT front-end, and codegen (code generator) is a common term we used to refer to the JIT back-end. In the above uses of these terms, I meant that I inspected the trace output from the JIT front-end and back-end.
And Chris is my esteemed colleague who did me the favor of the code review.
In my next entry (hopefully soon), we will dig into the IR (JIT front-end) changes. And in subsequent entries, we'll work our way all the way to the back. If you're feeling adventurous, you're welcomed to go inspect the code diffs yourself from the code repository (before I write about them), and see if you can figure it out on your own.
Have a nice day. :-)