This part contains the technical documentation of the JNode virtual machine.
Arrays are allocated just like normal java objects. The number of elements of the array is stored as the first (int) instance variable of the object. The actual data is located just after this length field.
Bytecodes that work on arrays are do the index checking. E.g. on the X86 this is implemented using the bound instruction.
Each class is represented by an internal structure of class information, method information and field information. All this information is stored in normal java objects.
Every object is located somewhere in the heap. It consists of an object header and space for instance variables.
At the start of each method invocation a frame for that method is created on the stack. This frame contains references to the calling frame and contains a magic number that is used to differentiate between compiled code invocations and interpreted invocations.
When an exception is thrown, the exception table of the current method is inspected. When an exception handler is found, the calculation stack is cleaned and code executing continues at the handler address.
When no suitable exception handler is found in the current method, the stackframe of the current method is destroyed and the process continues at the calling method.
When stacktrace of an exception is created from the frames of each method invocation. A class called VmStackFrame has exactly the same layout as a frame on the stack and is used to enumerate all method-invcocations.
JNode uses a simple mark&sweep collector. You can read on the differences, used terms and some general implementation details at wikipedia. In these terms JNode uses a non-moving stop-the-world conservative garbage collector.
About the JNode memory manager you should know the following: There is org.jnode.vm.memmgr.def.VmBootHeap. This class manages all objects that got allocated during the bootimage creation. VmDefaultHeap contains objects allocated during runtime. Each Object on the heap has a header that contains some extra information about the heap. There's information about the object's type, a reference to a monitor (if present) and the object's color (see wikipedia). JNode objects can have one of 4 different colors and one extra finalization bit. The values are defined in org.jnode.vm.classmgr.ObjectFlags.
At the beginning of a gc cycle all objects are either WHITE (i.e. not visited/newly allocated) or YELLOW (this object is awaiting finalization).
The main entry point for the gc is org.jnode.vm.memmgr.def.GCManager#gc() which triggers the gc run. As you can see one of the first things in gc() is a call to "helper.stopThreadsAtSafePoint();" which stops all threads except the garbace collector. The collection then is divided into 3 phases: markHeap, sweep and cleanup. The two optional verify calls at the beginning and end are used for debugging, to watch if the heap is consistent.
The mark phase now has to mark all reachable objects. For that JNode uses org.jnode.vm.memmgr.def.GCStack (the so called mark stack) using a breadth-first search (BFS) on the reference graph. At the beginning all roots get marked where roots are all references in static variables or any reference on any Thread's stack. Using the visitor pattern the Method org.jnode.vm.memmgr.def.GCMarkVisitor#visit get called for each object. If the object is BLACK (object was visited before and all its children got visited. Mind: This does not mean that the children's children got visited!) we simply return and continue with the next reference in the 'root set'. If the object is GREY (object got visited before, but not all children) or in the 'root set' the object gets pushed on the mark stack and mark() gets called.
Let's make another step down and examine the mark() method. It first pops an Object of the mark stack and trys to get the object type. For all children (either references in Object arrays or fields of Objects) the processChild gets called and each WHITE (not visited yet) object gets modified to be GREY. After that the object gets pushed on the mark stack. It is important to understand at that point that the mark stack might overflow! If that happens the mark stack simply discards the object to push and remembers the overflow. Back at the mark method we know one thing for sure: All children of the current object are marked GREY (or even BLACK from a previous mark()) and this is even true if the mark stack had an overflow. After examining the object's Monitor and TIB it can be turned BLACK.
Back at GCManager#markHeap() we're either finished with marking the object or the mark stack had an overflow. In the case it had an overflow we have to repeat the mark phase. Since many objects now are allready BLACK it is less likly the stack will overflow again but there's one important point to consider: All roots got marked BLACK but as said above not all children's children need to be BLACK and might be GREY or even WHITE. That's why we have to walk all heaps too in the second iteration.
At the end of the mark phase all objects are either BLACK (reachable) or WHITE (not reachable) so the WHITE ones can be removed.
The sweep again walks the heap (this time without the 'root set' as they do not contain garbage by definition) and again visits each object via org.jnode.vm.memmgr.def.GCSweepVisitor#visit. As WHITE objects are not reachable anymore it first tests if the object got previously finalized. If it was it will be freed, if not and the object has a Finalizer it will be marked YELLOW (Awaiting finalization) else it will be freed too. If the object is neither WHITE nor YELLOW it will be marked WHITE for the next gc cycle.
The cleanup phase at the end sets all objects in the bootHeap to WHITE (as they will not be swept above) as they might be BLACK and afterwards calls defragment() for every heap.
Some other thoughts regarding the JNode implementation include:
It should be also noted that JNode does not know about the stack's details. I.e. if the mark phase visits all objects of a Thread's stack it never knows for a value if it is a reference or a simple int,float,.. value. This is why the JNode garbage collector can be called conservative. Every value on the stack might be a reference pointing to a valid object. So even if it is a float on the stack, as we don't know for sure we have to visit the object and run a mark() cycle. This means on the one hand that we might mark memory as reachable that in reality is garbage on the other hand it means that we might point to YELLOW objects from the stack. As YELLOW objects are awaiting finalization (and except the case the finalizer will reactivate the object) they are garbage and so they can not be in the 'root set' (except the case where we have a random value on the stack that we incorrectly consider to be a reference). This is also the reason for the current "workaround" in GCMarkVisitor#visit() where YELLOW objects in the 'root set' trigger error messages instead of killing JNode.
There is some primilary code for WriteBarrier support in JNode. This is a start to make the gc concurrent. If the WriteBarrier is enabled during build time, the JNode JIT will include some special code into the compiled native code. For each bytecode that sets a reference to any field or local the writebarrier gets called and the object gets marked GREY. So the gc will know that the heap changed during mark. It is very tricky to do all that with proper synchronization and the current code still has bugs, which is the reason why it's not activated yet.
This chapter covers the Java security implemented in JNode. This involves the security manager, access controller and privileged actions.
It does not involve user management.
The Java security in JNode is an implementation of the standard Java security API. This means that permissions are checked against an AccessControlContext which contains ProtectionDomain's. See the Security Architecture for more information.
In JNode the security manager is always on. This ensures that permissions are always checked.
The security manager (or better the AccessController) executes the security policy implemented by JNodePolicy. This policy is an implementation of the standard java.security.Policy class.
This policy contains some static permissions (mainly for access to certain system properties) and handles dynamic (plugin) permissions.
The dynamic permissions are plugin based. Every plugin may request certain permissions. The Policy implementation decides if these permissions are granted to the plugin.
To request permissions for a plugin, add an extension to the plugin-descriptor on connected to the "org.jnode.security.permission" extension-point.
This extension has the following structure:
class | The full classname of the permission. e.g. "java.util.PropertyPermission" |
name | The name of the permission. This attribute is permission class dependent. e.g. "os.name" |
actions | The actions of the permission. This attribute is permission class dependent. e.g. "read" |
Multiple permission's can be added to a single extension.
If you need specific permissions, make sure to run that code in a PrivilegedAction. Besides you're own actions, the following standard PrivilegedAction's are available:
gnu.java.security.actions.GetPropertyAction | Wraps System.getProperty |
gnu.java.security.actions.GetIntegerAction | Wraps Integer.getInteger |
gnu.java.security.actions.GetBooleanAction | Wraps Boolean.getBoolean |
gnu.java.security.actions.GetPolicyAction | Wraps Policy.getPolicy |
gnu.java.security.actions.InvokeAction | Wraps Method.invoke |
Multithreading in JNode involves the scheduling of multiple java.lang.Thread instances between 1 or more physical processors. (In reality, multiprocessor support is not yet stable). The current implementation uses the yieldpoint scheduling model as described below.
Yieldpoint scheduling
Yieldpoint scheduling means that every thread checks at certain points (called "yieldpoints") in the native code to see if it should let other threads run. The native code compiler adds yieldpoints into the native code stream at the beginning and end of a method, at backward jumps, and at method invocations. The yieldpoint code checks to see if the "yield" flag has been set for the current thread, and if is has, it issues a yield (software-)interrupt. The kernel takes over and schedules a new thread.
The "yield" flag can be set by a timer interrupt, or by the (kernel) software itself, e.g. to perform an explicit yield or in case of locking synchronization methods.
The scheduler invoked by the (native code) kernel is implemented in the VmProcessor class. This class (one instance for every processor) contains a list of threads ready to run, a list of sleeping threads and a current thread. On a reschedule, the current thread is appended to the end of the ready to run thread-list. Then the sleep list is inspected first for threads that should wake-up. These threads are added to the ready to run thread-list. After that the first thread in the ready to run thread-list is removed and used as current thread. The reschedule method returns and the (native code) kernel does the actual thread switching.
The scheduler itself runs in the context of the kernel and should not be interrupted. A special flag is set to prevent yieldpoints in the scheduler methods themselves from triggering reentrant yieldpoint interrupts. The flag is only cleared when the reschedule is complete
Why use yieldpoint scheduling?
JNode uses yield point scheduling to simplify the implementation of the garbage collector and to reduce the space needed to hold GC descriptors.
When the JNode garbage collector runs, it needs to find all "live" object references so that it can work out which objects are not garbage. A bit later, it needs to update any references for objects that have been moved in memory. Most object references live either in other objects in the heap, or in local variables and parameters held on one of the thread stacks. However, when a thread is interrupted, the contents of the hardware registers are saved in a "register save" area, an this may include object references.
The garbage collector is able to find these reference because the native compiler creates descriptors giving the offsets of references. For each class, there is a descriptor giving the offsets of its reference attributes and statics in their respective frames. For each method or constructor, another descriptor gives the corresponding stack frame layout. But we still have to deal with the saved registers.
If we allowed a JNode thread to be interrupted at any point, the native compiler would need to create descriptors all possible saved register sets. In theory, we might need a different descriptor corresponding to every bytecode. By using yield points, we can guarantee that "yields" only occur at a fixed places, thereby reducing the number of descriptors that that need to be kept.
However, the obvious downside of yieldpoints is the performance penalty of repeatedly testing the "yield" flag, especially when executing a tight loop.
Thread priorities
Thread can have different priorities, ranging from Thread.LOW_PRIORITY to Thread.HIGH_PRIORITY. In JNode these priorities are implemented via the ready to run thread-list. This list is (almost) always sorted on priority, which means that the threads with the highest priority comes first.
There is one exception on this rule, which is in the case of busy-waiting in the synchronization system. Claiming access to a monitor (internals) involves a busy-waiting loop with an explicit yield. This yield ignores the thread priority to avoid starvation of lower-priority threads, which will lead to an endless waiting time for the high priority thread.
Classes involved
The following classes are involved in the scheduling system. All of these classes are in the org.jnode.vm package.
All methods are compiled before being executed. At first, the method is "compiled" to a stub that calls the most basic compiler and then invokes the compiled code.
Better compilers are invoked when the VM detects that a method is invoked often. These compilers perform more optimizations.
JNode has now two different native code compilers for the Intel X86 platform and 1 stub compiler.
STUB is a stub compiler that generates a stub for each method that invokes the L1 compiler for a method and then invokes the generated code itself. This compiler ensures that method are compiled before being executed, but avoids compilation time when the method is not invoked at all.
L1A is a basic compiler that translated java bytecode directly to decent X86 instructions. This compiler uses register allocation and a virtual stack to eliminate much of the stack operations. The focus of this compiler is on fast compilation and reasonably fast generated code.
L2 is an optimizing compiler that focuses on generating very fast code, not on compilation speed. This compiler is currently under construction.
All X86 compilers can be found below the org.jnode.vm.x86.compiler package.
Optimizing compilers use an intermediate representation instead of java bytecodes. The intermediate representation (IR) is an abstract representation of machine operations which are eventually mapped to machine instructions for a particular processor. Many optimizations can be performed without concern for machine details, so the IR is a good start. Additional machine dependent optimizations can be performed at a later stage. In general, the most important optimizations are machine independent, whereas machine dependent optimizations will typically yield lesser gains in performance.
The IR is typically represented as set of multiple operand operations, usually called triples or quads in the literature. The L2 compiler defines an abstract class org.jnode.vm.compiler.ir.quad.Quad to describe an abstract operation. Many concrete implementations are defined, such as BinaryQuad, which represents binary operations, such as a = x + y. Note that the left hand side (lhs) of the operation is also part of the quad.
A set of Quads representing bytecodes for a given method are preprared by org.jnode.vm.compiler.ir.IRGenerator.
The L2 compiler operates in four phases:
1. Generate intermediate representation (IR)
2. Perform second pass optimizations (pass2)
3. Register allocation
4. Generate native code
The first phase parses bytecodes and generates a set of Quads. This phase also performs simple optimizations, such as copy propagation and constant folding.
Pass2 simplifies operands and tries to eliminate dead code.
Register allocation is an attempt to assign live variable ranges to available machine registers. As register access is significantly faster than memory access, register allocation is an important optimization technique. In general, it is not always possible to assign all live variable ranges to machine registers. Variables that cannot be allocated to registers are said to be 'spilled' and must reside in memory.
Code is generated by iterating over the set of IR quads and producing machine instructions.
All new statements used to allocate new objects are forwarded to a HeapManager. This class allocates & initializes the object. The objects are allocated from one of several heaps. Each heap contains objects of various sizes. Allocation is currently as simple as finding the next free space that is large enough to fit all instance variables of the new object and claiming it.
An object is blanked on allocation, so all instance variables are initialized to their default (null) values. Finally the object header is initialized, and the object is returned.
To directly manipulate memory at a given address, a class called Unsafe is used. This class contains native methods to get/set the various java types.
Synchronization involves the implementation of synchronized methods and blocks and the wait, notify, notifyAll method of java.lang.Object.
Both items are implemented using the classes Monitor and MonitorManager.
Lightweight locks
JNode implement a lightweight locking mechanism for synchronized methods and blocks. For this purpose a lockword is added to the header of each object. Depending on the state of the object on which a thread wants to synchronize a different route it taken.
This is in principle how the various states are handled.
All manipulation of the lockword is performed using atomic instructions prefixed with multiprocessor LOCK flags.
When the lockcount part of the lockword is full, an inflated lock is also installed.
Once an object has an inflated lock installed, this inflated lock will always be used.
Wait, notify
Wait and notify(all) requires that the current thread is owner of the object on which wait/notify are invoked. The wait/notify implementation will install an inflated lock on the object if the object does not already have an inflated lock installed.