Project: High Performance Dispatch in GC Tracing Loops

Project: High Performance Dispatch in GC Tracing Loops

Overview

The most performance-critical part of a tracing garbage collector is in its tracing loop. This loop works through a list of objects (a mark stack), and for each, takes it off the list and processes it. The processing of the object will typically involve first checking whether the object has already been visited, and then if it has not, it will do a number of different things depending on the algorithm. It may move the object (conditional on various things), and it will typically identify each of the references within the object (if any) and add those to the mark stack for subsequent processing. The details will vary from algorithm to algorithm and implementation to implementation. This step may be repeated millions of times for each garbage collection, which is why it is performance-critical.

The different treatment of individual objects can be thought of in terms of two distinct concerns: a) the shape of the particular object (how many pointers does this particular object have and where are they located?), and b) the policy to be applied to this particular object (for example, during a nursery garbage collection, non-nursery objects can be ignored completely). The paper below on scanning techniques evaluates options for efficiently dealing with the first of these (a) in some detail. The second (b) has not been studied in detail as far as I am aware.

The paper below contains a neat trick (there’s only a paragraph or so describing it), which removes a level of indirection. Normally, to identify the shape of the object in a language like Java, we must look at the object’s type, which means indirecting from a type pointer that is stored in the header of every object. The paper studies the different shapes and ranks them by importance, identifying that the most frequently encountered shapes (e.g. no pointers, all pointers, just the first word is a pointer, etc) can be encoded in just a few bits. This means that with say, four bits, we can encode 15 different shapes (plus one bit pattern to identify all other cases, which are treated using the standard method), and this will cover the vast majority of objects encountered (near 100%). The clever part of the trick is that we encode those four (N) bits in the address of the type. We do this at the time we allocate the type object (which of course must be fore any object of that type is allocated). We calculate the shape of objects of that type and figure out its N-bit encoding, then when we request space in which to store the type object, we use a special allocator that allocates the space in such a way that the shape encoding is reflected in the address. Having done that, object tracing is faster because when we look at an object we can simply look at the type pointer, mask out the shape encoding and then trace the object (unless the encoding says that this is the exceptional case, in which case we use the standard method of dereferencing the type to find out how to trace the object).

This project is different and complementary to the above, but it shares some of the same spirit.

Garbage collectors often colocate objects that are subject to the same policy (eg young objects are collocated). In this project, we will use special allocators to ensure the high order bits of the address of each object reflect the policy that should be applied to the object. This means that when we trace an object, we can simply look at the high order bits of the object’s address to determine how to trace it. This could be done a number of different ways. It could be done using a switch statement, or an array of function pointers (with the high order bits indexing the switch or the array).

Themes

  • Garbage collection

Requirements

  • Coding This project will be conducted in MMTk, which is written in Rust. Prior experience with Rust will be helpful but is not essential.

  • Analysis We will use our labs performance analysis frameworks to evaluate the performance of different implementations.

Project Length

This could be a one or two semester project, or an initial project for a performance-focussed PhD.

Project Outputs

The project will result in code for the open source MMTk project. If the analysis goes well, we’ll write the work up in a similar style to the paper below.

References