Garbage collection expert

We're looking for an expert in the area of garbage collection to help us create a fast & safe garbage collector. Please contact Ewout

Potential GC algorithm

Sorry me again. I know a few people probably want me to shut up and get on with the graphics implementation, but if I could have my tupence worth... Smiling

I had an idea for Java GC. Of course minimum pause times and overall efficiency are the goals plus portability across platforms.

My idea is to break the memory into chunks and GC each chunk individually. While the chunk is being collected all the threads are halted. This simplifies the interaction between the mutator and the garbage collector. When collecting a block all we need to know are the roots. I suggest using a very thin pointer write barrier and having the block size as a power of 2. This way in three assembler intructions we can jump if the address being written is outside the current block the where the current object being written into is.

Consider the following fake x86 assembler:

XOR reg src, dest

SHR reg log2ChunkSize ; shift right

JNZ differentChunk ; jump not zero

Of course the bigger the blocksize the more likely the pointer write will be local. Care should be taken when compressing to preserve locality. The faster the computer the larger amount of memory it should be able to process in a given timeslice. The bigger the chunksize and the more efficient the Garbage Collection.

Hopefully I'm making sense so far. So why the write barrier across chunks. Well when a pointer is written across chunks, we have a new root into the said chunk. Each chunk has a stack of roots. We just point the source address of the pointer to the root stack.

The observant will notice that these stack can grow quick large and will contain a lot of duplicates and stale pointers. E.g. pointers that are now local or pointing to different chunks. I suggest having a separate low-priority thread that runs through this list prior to the chunk being duplicated and cuts out the chaff. This can be done when all threads are running as new pointers are added onto the end of the list. We can run through and mark all pointers that we've visited.

The more observant of you will note that chunks are compressed and objects moved on but a new object could take its place. The root stack has a location to a pointer, but the pointer is invalid, a new object is in its place. What I suggest for this is that when a chunk is compressed it pushes a special value to the root-stacks of all the other chunks. This says that in this point in time all preceding pointers to that chunk should be ignored. This also means that when pruning root-stacks we go from the newest to the oldest.

Now we have a GC algorithm we can run through memory compressing as we go. We can do full memory GC in parallel with generational GC.

Well we're still not out of the woods because as i'm sure you'll have spotted we cant trap cycles across chunks. With this scheme they'll never be collected.

I've found a way round this, but it'll have to wait to tomorrow as i'm tired and want to go bed. I know if i dont post this now it might be some time before I get round to posting.

If this post makes no sense at all (as I'm sure it doesn't) please feel free to flame and generally call me rude names.

night Nathan

Existing algorithms

There are a couple of existing algorithms that use this kind of idea. The first is
, which uses a very similar barrier and remset to your algorithm. The second is MC^2.

Another algorithm that MMTk implements right now that combines low pause times with good throughput is
Ulterior Reference Counting
(GenRC in MMTk).

For operating systems with paging, bookmarking collection could be of particular interest.

Robin Garner
ANU, Canberra


I have found it, i thinks it is a good reference

0) marked and visited bitset is reset
1) all users thread need mark object in his stack and local table.
GC task mark object referer from static field.
2) now we can restart all users therad (only if the stack is not full).
New object allocate need marcked (work make by the opcode "new object" or is the default value).
A GC task start on all object marked, all object in referance of this object is marked too.
We have an others bitset for mark object visited (for cyrcular ref.)
3) we can free or finnalize all objet no marked
4) sleep all thread and defragment the ram

it is a good way ?