Project: Garbage Collector Fast Path Correctness

Project: Garbage Collector Fast Path Correctness

Overview

The fast path describes a performance-critical, frequently-executed portion of code. On the other hand, the slow path is used to describe an important by infrequently taken path in the code. Two places where this pattern is visible in garbage collection implementation is on the boundary between the garbage collector and the runtime, specifically the allocation sequence and the code for write barriers (and read barriers). The former is executed every time the running program allocates a new object, while the latter is executed every time the application write (reads) a pointer. These are very frequent operations, and can often be expressed in just a few machine instructions. Therefore it is highly desirable for these past paths to be inlined into the application code. On the other hand the fast paths will conditionally call a slow path. For example, a common allocator implementation will allocate from a large buffer by simply incrementing a pointer into a preallocated region of memory and checking that the region has not been exhausted (the fast path). Once the region is exhausted, then a relatively expensive operation must be conducted to acquire another region of memory (the slow path). Because the slow path is executed orders of magnitude less frequently and is orders of magnitude more instructions, it makes sense for the slow path to be an out-of-line function call.

MMTk is written in Rust, which offers nice guarantees about safety. However, the VMs that use MMTk may be written in a variety of languages (C++ is one of the most common choices). This language boundary presents a problem for these fast path implementations. The C++ runtime won’t be able to inline the MMTk Rust source code, which leaves the runtime with the choice of either calling the Rust-implemented fast path out of line or re-implementing the fast path in C++ and calling the slow path in Rust. The former option simple and maintains Rust’s safety guarantees, but will perform poorly. The latter option raises two issues. First, the safety guarantees offered by Rust are lost at this critical element of the garbage collector implementation., and second, the fast path source code is now duplicated, possibly numerous times. Right now MMTk assumes runtimes will duplicate the fast path, but considers the Rust implementation of each fast path to be the definitive, canonical implementation, while the others are strictly derivative.

This project will explore how to maximize safety guarantees for the fast path implementations, considering both the change of language and the duplication of code. Many solutions are possible, including a framework for comparing the semantics of a non-Rust fast path with its Rust counter part, and more ambitiously, automatically generating fast paths using the Rust implementation as the template.

Themes

  • Garbage collection
  • Programming Languages

Requirements

  • Coding This project will not involve a large amount of coding, but will depend on deeply understanding the semantics of a few lines of performance and safety critical code in Rust, C++, Java, and perhaps other languages.

  • Analysis This project will require a student who is interested in semantics and programming languages and can think creatively about this problem.

Project Length

This project is very open-ended and could be undertaken as a single semester undergraduate project, or as the basis of a PhD.

Project Outputs

The project may produce tools for helping ensure safety guarantees, and it may produce a published paper.

References

This is a fairly novel problem. I have to do more work to gather some references…