In this part of the documentation, new feature, design & architecture proposals are worked out.
Each proposal should contain at least the following topics:
Anyone is invited to comment on the proposals.
A while ago I suggested a garbage collection scheme that relied on the MMU of modern processors. EPR didn't like this, preferring a simpler/more generic solution. So this is a second attempt.
Deep breath, here goes....
Introduction
There are two goals of an OS Java GC:
I hope people agree with me so far. So what is the state of play with current algorithms? They mostly seem to be based on the generational GC principle: only collect the memory allocated since the last GC, unless the system is out of memory, in which case collect the generation before that, and so on.
The problem with this approach is that it delays the inevitable. Eventually a full system GC is required, which halts the system for a long time. Also, a simple write barrier is required on pointer writes, so that the roots into the latest generation can be detected.
Having said this, generational GC is a good system that works in most environments, it's efficient, and modern VMs use it. In my opinion generational GC would work very well with an OS VM if the pause time for the occasional full-memory GC could be mitigated. This is what my proposal concerns.
Overview
So let's interleave a slow-running full system GC with normal program operation. The problem is that the directed connectivity graph of the program is constantly changing. Requiring too much co-operation between the mutator (the running threads) and the GC slows down the mutator. You might end up with a fine-grained GC algorithm with no pause times, but the whole system would run slowly.
I see the answer as a compromise. Break the memory into chunks. The GC can halt the entire system while it collects a chunk. The bigger the chunk, the bigger the pause time, but the more efficient the overall system. The advantage of this approach is that the integration of the mutator with the GC is very small; in fact no larger than would be required with a traditional generational GC algorithm.
Trapping inter-block pointer writes
So, elaborating on the chunk idea, what is required is that we trap all pointer writes between chunks. By doing this we have a set of all possible roots into a chunk. For efficiency's sake let's assume all chunks are the same size, a power of 2, that there are no gaps between the chunks, and that each chunk starts at byte n * chunksize. Location 0 in memory is a bad page, to trap null pointer exceptions. What we're essentially talking about is memory blocks.
It's possible to trap inter-block pointers with three instructions on every pointer write. A smart compiler can cut down the number of times even this check is done, using the fact that pointer copying local to an object can never trigger the case. This assumes that objects never cross block boundaries. There are exceptions to this, for instance large objects bigger than the block size, but these can be handled separately.
The code to trap inter-block pointer writes looks like the following:
xor sourcePointer, destPointer              ; sourcePointer now holds the bits that differ
and sourcePointer, blockMask                ; blockMask = ~(blockSize - 1), keeps only the block-number bits
jnz codeToHandleInterBlockPointers          ; non-zero means the two pointers lie in different blocks
As people can see, including the jump it's only three instructions on the x86 (I think!!!)
This only has to be triggered when a member pointer of an object is set, not for intermediate local variables.
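As a rough illustration, here is a minimal Java sketch of the same check, assuming 32-bit addresses exposed as plain ints and a power-of-two block size; the names and the 1 MiB block size are assumptions, not part of the proposal.

final class WriteBarrier {
    static final int BLOCK_SIZE = 1 << 20;            // assumed block size (1 MiB), power of two
    static final int BLOCK_MASK = ~(BLOCK_SIZE - 1);  // keeps only the block-number bits

    static void onPointerStore(int sourceFieldAddress, int destAddress) {
        // Differing block-number bits mean the write crosses a block boundary.
        if (((sourceFieldAddress ^ destAddress) & BLOCK_MASK) != 0) {
            handleInterBlockPointer(sourceFieldAddress, destAddress);
        }
    }

    static void handleInterBlockPointer(int sourceFieldAddress, int destAddress) {
        // Slow path: record the write in the per-block lists (see the next section).
    }
}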
Storing inter-block pointer writes
Pointers that are detected as being inter-block need to be stored for later analysis. What this document proposes is to have a system-wide array of pointers with as many elements in it as there are blocks. The size of this array would be determined by the following equation:
size of array = amount of system memory / size of block
Each element in the array corresponds to a block in memory. Each array element contains a list of pointers pointing to elements held in the corresponding block.
The address of the source pointer is added to the linked list pointed to by the array element that corresponds to the block containing the destination pointer. The effect of this is that each block now has a set of addresses of pointers pointing into it. Of course there can be duplicates, but the critical thing is that this list is time ordered. That's very important.
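A hedged sketch of this structure, again treating addresses as 32-bit ints and assuming a 1 MiB block size; the class and method names are illustrative only.

final class RememberedSets {
    private static final int BLOCK_SHIFT = 20;   // assumed 1 MiB blocks

    private static final class Node {
        final int sourcePointerAddress;          // address of the pointer field that was written
        final Node next;
        Node(int sourcePointerAddress, Node next) {
            this.sourcePointerAddress = sourcePointerAddress;
            this.next = next;
        }
    }

    private final Node[] heads;                  // one list head per block; index = block number

    RememberedSets(long systemMemory) {
        // size of array = amount of system memory / size of block
        heads = new Node[(int) (systemMemory >>> BLOCK_SHIFT)];
    }

    // Called from the slow path of the write barrier. Prepending keeps the
    // list time ordered (newest entries first).
    void recordInterBlockWrite(int sourcePointerAddress, int destAddress) {
        int destBlock = destAddress >>> BLOCK_SHIFT;
        heads[destBlock] = new Node(sourcePointerAddress, heads[destBlock]);
    }
}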
Now the elements in these lists do not get modified by the mutator after they are created. This means that a thread running in parallel can scan these lists and do the following:
This will trim the list down to the bare root set of a given block. The time taken to do the above is proportional to the size of the list, which is directly proportional to the number of inter-block pointer writes. Essentially what we're doing is delaying the processing of the lists so we can process a longer list in one go. This increases the chance of duplicates in the list and therefore can make the process a lot more efficient. We can also run the list scanning algorithms on different threads, and therefore processors, and possibly schedule more at system idle time.
But how are duplicates and obsolete references in the list detected?
Firstly, detecting duplicates. On modern architectures objects are generally aligned on machine-word boundaries. This means, for instance, that on a 32-bit architecture, where the word size is 4 bytes, the bottom 2 bits of all pointers will be zero. These bits can be used as an information store: when scanning the list the pointers can be marked, and if a pointer is already marked it isn't added to the new list that's being built.
Secondly, what about obsolete references? This is simple: if the pointer now points into some other block, it's obsolete.
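A minimal sketch of the low-bit marking trick from the paragraph above, assuming word-aligned addresses held in ints; the mask value is an assumption.

final class PointerMarks {
    static final int MARK_BIT = 0x1;   // alignment guarantees bit 0 of a valid pointer is free

    static boolean isMarked(int pointer) { return (pointer & MARK_BIT) != 0; }
    static int mark(int pointer)         { return pointer | MARK_BIT; }
    static int unmark(int pointer)       { return pointer & ~MARK_BIT; }
}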
So the algorithm so far is this. Run a thread in parallel with the mutators to cut down the pointers into a certain block. The incoming pointer list for the block will keep growing, so we can say that as soon as a certain percentage of the list has been processed, all the mutators are halted. The next step is to process the remaining part of the list for the block we are just about to garbage collect. That list should now contain a valid set of roots for the block. We can garbage collect the block and move all the reachable objects to the new heap that's growing, fix up the roots to point to the new object locations, and the old block is ready to be recycled.
The more observant readers will have asked about inter-block cycles. The technique I use is to have two categories of reached objects:
The idea of strongly reached objects is that there was a provable link to the object sometime during the last cycle. Weakly reached objects could either be junk or referenced. Strongly reached objects are copied out to a new heap. Weakly reached objects can either be left in place or copied to a weakly-reached heap. When there are no references from the strongly reached heap to anywhere else we know we can stop garbage collecting.
... more to come ...
This is where the graphics framework requirements & specification will come.
Current project members:
My current thinking on the graphics framework is that the graphics driver is passed the basic hardware resources it needs. It cannot access any other hardware resources.
The driver needs to provide a list of supported video modes, resolutions and refresh rates. The driver might also provide information about the attached video display device, e.g. flat panel, resolution, make, model, refresh rate etc.
There needs to be a standard way to query the driver about the supported display modes. Either we have the display driver implementing an interface or have an abstract method on a base class.
The DisplayMode interface might have the following methods:
The above interface raises the question whether there should be two sub-interfaces, TextDisplayMode and GraphicsDisplayMode. Should the graphics driver export the two lists separately? Most graphics cards enter a text display mode by default, I think. Some export their own special text display modes.
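Purely as an illustration of the split being discussed, the interfaces might relate roughly like this; the method names are assumptions, not the list from this proposal, and each interface would live in its own source file.

interface DisplayMode {
    int getRefreshRate();    // in Hz
}

interface TextDisplayMode extends DisplayMode {
    int getColumns();        // character cells per row
    int getRows();
}

interface GraphicsDisplayMode extends DisplayMode {
    int getWidth();          // resolution in pixels
    int getHeight();
    int getBitsPerPixel();   // colour depth
}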
Comments from Valentin:
My opinion is that we should pack this information that the driver can send back into an external class called GraphicDriverInfo, and that we should have two other interfaces called TextDisplayMode and GraphicsDisplayMode that extend the DisplayMode interface. So the list of classes/interfaces (and their methods) should look like this:
Everyone's input on this is welcomed.
Goal
Isolates are intended to separate Java programs from each other, in order to protect Java programs from (intentional) errors in other programs.
Proposal (in concept)
I propose to make an Isolates implementation that is somewhere between full isolation and full sharing. With the goal in mind, programs should be protected from each other, which means that they should not interfere with each other in terms of resources, which are typically:
This means that everything that can be used to interfere with other programs on these items must somehow be protected. Everything else must be shared as much as possible, in order to minimize resource usage.
This can be achieved by isolating certain resources, and making several services aware of the fact that there are isolates. E.g. a network stack should make sure that network bandwidth is shared fairly across all users, in order to prevent a single program from eating away all network bandwidth.
What is Isolated vs. Shared
Isolated:
Shared:
Explanation:
Static variables in JNode are implemented in statics tables. The index (of a static variable) in this table must be constant across all isolates, otherwise the compiled code needs to retrieve the index on every static variable get/set, which is very expensive. Having said this, it does imply that the statics table will become pretty large. This can be solved by implementing a cleanup strategy that frees indexes when they are no longer used.
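A small sketch of the idea, assuming the statics table is an object array indexed by a system-wide constant slot number; the class and method names are illustrative, not JNode's actual API.

final class StaticsTable {
    private final Object[] slots;

    StaticsTable(int capacity) {
        this.slots = new Object[capacity];
    }

    Object get(int index)               { return slots[index]; }
    void   set(int index, Object value) { slots[index] = value; }
}

// Per isolate:   StaticsTable table = new StaticsTable(capacity);
// Compiled code: table.set(FIELD_INDEX, value);   // FIELD_INDEX is the same in every isolate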
For the rest, the separation is made in terms of publicly accessible versus internal. E.g. a Java program can synchronize on a Class instance, but since these are isolated, this will not block all other programs.
Isolated Java heaps will have significant implications on the code of the shared services, so this will be something to do in step 2.
Isolating memory allocation will cause some problems that need to be dealt with.
Problems
When an object is allocated in one isolate, and this isolate is closed, the object is also removed. But what if this object is registered somewhere in another isolate?
Answer: a problem. Objects may not cross isolate boundaries.
The index.xml file will contain at least one entry for each installed module and all created clusters. It should look pretty much like a database (maybe we could use a small XML database engine here). The index file is loaded at first use of the installer and should be cached in memory. The index.xml structure should be something like this:
The process of installing a new application should be no more than downloading the JNLP file, copying the files described there and then modifying the index.xml file to add the newly installed application. Here we could have problems if another application with the same name is already installed. In this case I think we should just ask the user for an alternative name. Maybe in the index.xml we could store not just one name but two: originalname="app1" and name="editor". In this way one could install applications with the same JNLP name. A trickier problem is version handling.
The most important operation that the installer must support is updating packages that are already installed. This operation must take care of all dependent packages and should start by downloading the new JNLP file into a temp folder. Let's assume the following scenario:
The user runs in console the following command:
"/> installer –u system"
What the installer should do first is find out what is associated with the name system. Let's say that system is a cluster of modules; then the installer should update all modules from the "system" cluster. This means finding all the modules that belong to this cluster or its sub-clusters, reading their updateURL entries, downloading the new JNLP files and executing them. In the process of installing a new version the installer must check whether the external resources of the new version have the version described in the JNLP file, and if they don't, then these resources must be updated as well. The update of a resource must be done only if no other module uses the current version of that resource. In our example, let's say that module service-manager has a new version which needs ant version 1.6, but locally we have ant 1.5 installed; then ant must be updated as well if no other modules use ant 1.5. If ant 1.5 is still used, then the new version of ant must be installed separately and the old version kept.
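A hedged sketch of the resource-update rule described above; Module and its requires() method are hypothetical stand-ins for the installer's real model.

import java.util.List;

interface Module {
    boolean requires(String resource, String version);
}

final class UpdatePolicy {
    // The old version of a shared resource (e.g. ant 1.5) may only be removed
    // when no other installed module still depends on exactly that version;
    // otherwise the new version is installed alongside the old one.
    static boolean canReplace(String resource, String oldVersion,
                              List<Module> installed, Module beingUpdated) {
        for (Module m : installed) {
            if (m == beingUpdated) continue;
            if (m.requires(resource, oldVersion)) return false;
        }
        return true;
    }
}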
Addendums
markhale: Details on use of JNLP files.
Three types of modules have been identified: applications/applets, libraries and system-level plugins (including drivers). The purpose of this note is to detail how each type of module is described by a JNLP file.
To allow for finer-grained control of security, introduce a jnode:permission child element of the jnlp security element.
jnode:permission will have the following required attributes: class, name, actions.
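As an illustration of how these three attributes could map onto a java.security.Permission when the plugin is loaded; the factory below is an assumption, relying on the common (name, actions) constructor convention of Permission subclasses, not a defined JNode API.

import java.lang.reflect.Constructor;
import java.security.Permission;

final class PermissionFactory {
    // class, name and actions correspond to the three required attributes above.
    static Permission create(String className, String name, String actions) throws Exception {
        Class<?> cls = Class.forName(className);
        Constructor<?> ctor = cls.getConstructor(String.class, String.class);
        return (Permission) ctor.newInstance(name, actions);
    }
}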
It is time for us to start thinking about how JNode should run from a normal harddisk based system, how it should be installed and maintained.
An essential part of this, is how and where system files should be stored, and what system files are. This page answers the question of what system files are and proposes a tree structure for system files to be stored.
What are system files
In JNode there are a few different types of system files. These are:
Tree structure
In JNode system files should be placed in a container in a structure that is "well known".
A container can be a filesystem, but can also be some kind of database. See the next paragraph for a discussion on this.
The proposed tree is as follows:
/jnode           The root of the container
/jnode/system    Contains the kernel and the initial jars
/jnode/plugins   Contains the plugins
/jnode/boot      Contains the bootloader files
/jnode/config    Contains the configuration data
System file container
Traditionally the system files are stored in a directory on a "root" filesystem. This filesystem is identified by a parameter, or some flag in a partition table.
This method is easy, because all normal filesystem tools can be used on the files, but it makes it harder to protect these files against viruses, ignorant users etc. This method also limits the system files to being harddisk based. (E.g. look at the trouble Linux had to go through to support an NFS root filesystem.)
For these reasons, I propose a more generic method: an abstract container interface for system file access. This interface can have implementations ranging from a single-file-based container to a network loader. The essential part is that the actual implementation is hidden from the part of JNode that uses it (either to load files, or to install them).
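A hypothetical sketch of such a container interface; the method set is an assumption, not a fixed API.

import java.io.IOException;
import java.io.InputStream;

interface SystemFileContainer {
    // Opens a system file, e.g. "/jnode/plugins/foo.jar", for reading.
    InputStream open(String path) throws IOException;

    // Lists the entries directly below the given path.
    String[] list(String path) throws IOException;

    // True if the container can also be written to (e.g. during install).
    boolean isWritable();
}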
This is all for now, please comment on this proposal.
Ewout
This is a branch for the proposals of the networking subsystem
Networking framework
The goals of this proposal:
- Flexibility, more capabilities
- Simplicity, better Object-Oriented design
The following are the 3 basic types of entities within this framework:
- Network devices
- Network filters
- Applications.
The network devices are the device drivers of the network interfaces, or virtual network drivers (like the local loopback). We assume that such a device driver takes a packet stored in a memory buffer and writes it to the media (wire, air or anything else), and that it receives a packet from the media and writes it to a memory buffer. Device drivers that do some of the work of the networking system, such as checksum calculation or crypto work, may also be treated as network filters (these will be discussed later).
At the opposite end are the applications. This is any software that creates packets and injects them, or receives packets from the networking subsystem. For example: a ping command, DNS server software, and also the Java SocketImpl system.
Both network device drivers and applications are the endpoints of this model; through them packets enter and leave the networking subsystem of the OS. Between them we have the filters.
The filters take packets from any component and forward them to other components. A subcategory of them, are the protocol stacks. The filters are the components that are used to transform the packets or do other things with these packets. For example a TCP/IP stack is a filter that parses the incoming byte arrays (RawPacket from the device driver) to TCP/IP packets, or encapsulates data into TCP/IP packets to be sent later over another protocol. There is more that the TCP/IP stack will do internally, but it’s a matter of the TCP/IP stack implementation and not of this framework proposal.
These filters have one or more PacketStreams; most components will have two packet streams. Every packet stream implements two interfaces, PacketListener and PacketFeeder. Any PacketListener may be registered with a feeder to receive packets from it. This way we can have chains of packet streams, some of which may split (?)*. A filter may also be just a PacketFeeder or just a PacketListener, for example a component that captures packets directly to the filesystem, a counter for some specific packets, a traffic generator, etc. (but these may not be treated as endpoints).
For performance reasons we can use a "listening under criteria" system, where the feeder only sends a packet to a listener when that listener's criteria are matched. We can have an interface ListenerCriteria with a simple listener-specific method that reads some bytes or anything else from the incoming packet and returns true or false. This method will be called by the packet feeder before it sends the packet to the listener. For example an IPListenerCriteria will check the EtherType bytes if the packet feeder is the EthernetLayer, or another ListenerCriteria implementer may check the class of a packet to see if it is an instance of ICMPPacket. Listening under criteria will be a way to filter the packets that are passed from stream to stream.
The PacketFeeders will have a registry where they store the listeners currently registered with them. For this registry I suggest using a list of bindings, where every binding in this list has as its key a ListenerCriteria instance and as its value a list of the PacketListeners that are listening under those criteria.
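A hedged sketch of the interfaces and the binding-list registry described above; the names follow the text, but the method signatures are assumptions, and Packet here is just a placeholder for the representation discussed in the next proposal.

import java.util.ArrayList;
import java.util.List;

interface Packet { /* see the packet-representation proposal below */ }

interface PacketListener {
    void processPacket(Packet packet);
}

interface ListenerCriteria {
    // True if the listener wants this packet, e.g. an EtherType check or an
    // instanceof test on the packet class.
    boolean accept(Packet packet);
}

interface PacketFeeder {
    void register(ListenerCriteria criteria, PacketListener listener);
}

// A possible registry: a list of bindings, each pairing one criteria object
// with the listeners registered under it.
final class ListenerRegistry {
    private static final class Binding {
        final ListenerCriteria criteria;
        final List<PacketListener> listeners = new ArrayList<PacketListener>();
        Binding(ListenerCriteria criteria) { this.criteria = criteria; }
    }

    private final List<Binding> bindings = new ArrayList<Binding>();

    void register(ListenerCriteria criteria, PacketListener listener) {
        for (Binding b : bindings) {
            if (b.criteria == criteria) { b.listeners.add(listener); return; }
        }
        Binding b = new Binding(criteria);
        b.listeners.add(listener);
        bindings.add(b);
    }

    // Called by the feeder for every packet it wants to pass on.
    void deliver(Packet packet) {
        for (Binding b : bindings) {
            if (b.criteria.accept(packet)) {
                for (PacketListener l : b.listeners) {
                    l.processPacket(packet);
                }
            }
        }
    }
}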
For performance reasons, a packet feeder may have two or more different registries of listeners: for instance one for high-priority listeners and one for the rest, or one for protocol-to-protocol submission and one for everything else (where such listeners exist).
To avoid deadlocks between components and performance degradation, when a feeder passes a packet to a listener, the listener may use an incoming queue and have its own thread to handle the packet, except if the packet handling it does is very quick.
Another issue is how all these relations between the components will be managed. A command or a set of commands will be useful. This is mainly a part of Ifconfig.
The result of all this will be a web of networking components, where everyone can communicate with everyone. Think of the possibilities; I have found many already.
This is an abstract of what I am thinking. The details are better discussed here.
Pavlos Georgiadis
Packet representation
The goal of this proposal is mainly to speed up the packet handling and to provide the developers a simpler and more Object Oriented packet representation. It aims to remove the current SocketBuffer and replace it with Packets.
Currently the packets are represented with the SocketBuffer class. The SocketBuffer is something like a dynamic buffer (to be more accurate, it is more like a list of dynamic buffers). When a program wants to send a network packet, it creates the protocol headers, and then the headers are inserted in this buffer list together with the packet payload (if there is a payload). Finally the NIC driver copies all the bytes of the packet into another fixed array to be used by the driver.
When we send a packet we move all the packet data twice (from the headers to the SocketBuffer and from it to the final fixed array). When we receive a packet, the SocketBuffer acts like a fixed array which provides us with some useful methods to read data from it.
What I suggest is to represent the packets as…Packets. All the protocol packets are the same thing. They have a header and a payload (some of them have a tail too). Every packet is able to know its size and how to handle its data (including the payload).
So let’s say we have the interface Packet.
What we need from the Packet implementers:
- Set and get methods for the data of the packets (class specific; these won't be in the Packet interface)
- A way to store the data in an object and to have a method that will return the entire packet as a byte[] (when we send the packet).
- Methods that will parse a byte array to create the Packet object (when we receive a packet).
Any packet is represented by a class that implements the Packet interface, for example IPv4Packet, ICMPPacket, TCPPacket etc. Every component of the networking system can access the data of a packet with the appropriate set and get methods (a program that uses a packet knows its specific accessors).
To accomplish the second and the third we need the following methods for the interface Packet:
public void getData(byte[] b);
public void getData(byte[] b, int offset);
public int getSize();
The getData(byte[] b) writes the packet in the given byte array. If the length of this array is not enough we can throw an exception. The getData(byte[] b, int offset) writes a packet to the array b, starting from the offset position. The getSize method will return the length of this packet, including the length of its payload.
These methods will be called mainly from the network drivers. This is the point where the packet is converted from object to a memory buffer and vice versa.
When the Ethernet protocol sends a packet to the driver, the driver will call getSize() on the Ethernet packet to determine how big the array that will store the entire packet needs to be. The Ethernet packet's getSize() method will return, for example, 14 + payload.getSize(). Remember that the payload is also a packet, let's say an IP packet, which may return for example 20 + payload.getSize(). Once the driver has determined the length of the entire packet, it will create the memory buffer b and call getData(byte[] b) on the Ethernet packet, which will write the first 14 bytes and also internally call payload.getData(b, 14). This way we move the data only once, from the objects to the final byte array.
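A hedged sketch of this size/serialisation chaining; the field layout is simplified and EthernetPacket's internals are assumptions.

interface Packet {
    void getData(byte[] b);
    void getData(byte[] b, int offset);
    int getSize();
}

final class EthernetPacket implements Packet {
    private static final int HEADER_SIZE = 14;               // dst MAC + src MAC + EtherType
    private final byte[] header = new byte[HEADER_SIZE];
    private final Packet payload;                            // e.g. an IPv4Packet

    EthernetPacket(Packet payload) { this.payload = payload; }

    public int getSize() { return HEADER_SIZE + payload.getSize(); }

    public void getData(byte[] b) { getData(b, 0); }

    public void getData(byte[] b, int offset) {
        System.arraycopy(header, 0, b, offset, HEADER_SIZE);  // write the 14 header bytes
        payload.getData(b, offset + HEADER_SIZE);             // payload writes itself after them
    }
}

// Driver side: one allocation, one pass over the data.
// byte[] frame = new byte[packet.getSize()];
// packet.getData(frame);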
A common class that implements the Packet interface is RawPacket, which is a packet that maps a byte array, or a portion of such an array, as a packet.
The RawPacket will be mainly used:
- To store the data payload of a packet (for example the data payload of a TCP packet)
- To map a received packet before it is parsed by the networking components.
A practical example for the second:
When a packet is received by a NIC, the driver will create a RawPacket with the array that stores the received frame. Later this RawPacket will be sent, for example, to the Ethernet protocol, which will parse the first 14 bytes into its attributes and create another (or modify the same) RawPacket that maps the same byte[] from position 14 (just after the header) to the end. This RawPacket will be the payload of this Ethernet packet, which later will be sent, for example, to the IPv4 stack, and so on.
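A hedged sketch of RawPacket along these lines, implementing the Packet interface sketched above; the constructor signatures are assumptions.

final class RawPacket implements Packet {
    private final byte[] data;    // the buffer being mapped
    private final int start;      // first mapped position
    private final int length;     // number of mapped bytes

    RawPacket(byte[] data) { this(data, 0, data.length); }

    RawPacket(byte[] data, int start, int length) {
        this.data = data;
        this.start = start;
        this.length = length;
    }

    public int getSize() { return length; }

    public void getData(byte[] b) { getData(b, 0); }

    public void getData(byte[] b, int offset) {
        System.arraycopy(data, start, b, offset, length);
    }
}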
Pavlos Georgiadis