Typing Theory

Announcing Shredder 0.2.0!

Others — Fri, 13 Aug 2021 23:23:22 GMT

Introduction

I'm happy to announce a new release of shredder! In case you don't know, shredder is a garbage collection library for Rust. It's safe, easy to use, and concurrency friendly. Here is a small example of it in action:

use std::cell::RefCell;

use shredder::{
    number_of_active_handles, 
    number_of_tracked_allocations, 
    run_with_gc_cleanup, 
    Gc, Scan,
};

#[derive(Scan)]
struct Node {
    data: String,
    directed_edges: Vec>>,
}

fn main() {
    // Using `run_with_gc_cleanup` is good practice, since it helps ensure destructors are run 
    // (but this is not required for correctness)
    run_with_gc_cleanup(|| {
        let a = Gc::new(RefCell::new(Node {
            data: "A".to_string(),
            directed_edges: Vec::new(),
        }));

        let b = Gc::new(RefCell::new(Node {
            data: "B".to_string(),
            directed_edges: Vec::new(),
        }));

        // Usually would need `get` for `Gc` data, but `RefCell` is a special case
        a.borrow_mut().directed_edges.push(b.clone());
        b.borrow_mut().directed_edges.push(a);
        // We now have cyclical data!
    });
    // Everything was cleaned up!
    assert_eq!(number_of_tracked_allocations(), 0);
    assert_eq!(number_of_active_handles(), 0);
}

And you can read more on Github or in my previous blog post!

While most of the feature work for this was done a while ago, I'm bad at actually writing notes and releasing things, so this announcement/crates.io release is a bit delayed.

New Features

Trait System Refactor (@Others)

The main goal of this release was to enable some sort of Deref support. After thinking on this for a while, I came to the conclusion this was impossible with the original trait design. In 0.1, there were two critical traits: GcSafe and Scan. GcSafe captured every invariant required for data to be stored in a Gc. But in 0.2, with multiple smart pointers, that design falls apart.

That means that we needed to break down into more marker traits. Now in 0.2 we have:

GcSafe = captures being Send + 'static, or enough like Send + 'static to be safely scanned and managed in the background with exclusive access
GcDrop = can safely run the destructor of this data in the background (allowable to access this data in a Drop implementation run by the collector)
GcDeref = Sync + can be safely stored in a DerefGc (see below)
Scan = captures the scanning logic, as previously

These are of course imprecise definitions of unsafe traits--if you're curious, the exact rules for implementing these can be found in the crate documentation.

DerefGc (@Others)

Now for the most exciting new feature! The new smart pointer DerefGc.

First, a discussion of the issues with deref in the context of shredder. shredder lets you store !Sync data in Gc, so it's not possible to simply implement Deref for Gc (since data is scanned in the background). Further complicating this is the issue of Drop and interior mutability. shredder being able to run Drop implementations in the background is only possible due to the get guard we use to prevent access of data inside a dead Gc. This guard is also needed to prevent interior mutability being used to modify the object graph while the collector is running.

All that said, it is still possible to implement a DerefGc type. However we need to set the following restrictions:

The data stored in DerefGc must be Sync
We can't naively run the Drop implementation of the interior data, or access the DerefGc in any Drop implementation run by the collector
The data inside the DerefGc must be immutable through &T (no interior mutability), except if that interior mutable data is nested in a Gc

To assure these conditions, we utilize the new trait system. DerefGc is !GcDrop, assuring (2). Then, for any T in DerefGc, T must be GcDeref, a new trait capturing condition (1) and (3).

While these restrictions (especially 3) are strict, it still leaves us with as useful a Deref type as I believe is possible with shredder's current restrictions. The nice thing is that accessing data inside a DerefGc can never block (unlike calling get on a regular Gc)

AtomicGc (@Others)

Along with enabling DerefGc, implementing the new trait system also gave me the tools to implement an atomic version of Gc. This is still pretty experimental, but it is pretty similar to crossbeams Atomic type (but with completely different performance characteristics). This type is !GcDrop for similar reasons to DerefGc.

This feature is pretty raw, and I'm pushing it for experimentation. It needs a bit more performance optimization to be truly useful.

Finalize (@Others)

With these new smart pointer types, just using Drop is no longer good enough for shredder. While Gc itself will continue to default to Droping things, many new types are !GcDrop. Thus, like all gc crates before us, we are introducing a Finalize trait.

This is quite similar to Drop, but unsafe to implement, and with many restrictions to prevent unsoundness. (see crate documentation) This is what you'll need for cleanup logic for DerefGc. Gc also has support if you'd prefer not to use Drop.

Derive Improvements (@Others)

The Scan derive has been updated to support enums, and implementing the new traits described above. GcDrop is opt-out, and GcDeref is opt-in. (This is for backwards compatibility reasons.) Please read the crate documentation for details.

Also in this release is a new Finalize derive, which eases implementation of Finalize for structs with finalizable fields.

Both derives also now support enums.

Scan for fXX, bool and BTreeMap and BTreeSet (@alekratz, @DrGabble)

In the realm of "this shows someone else has run this code." Some new types got Scan implementations in this release! There are tons more standard library types that need traits implemented, and these are easy contributions if you want to get started with the codebase.

Unsized Types (@alekratz)

In the most sizable contribution from another human so far, @alekratz contributed support for storing unsized types in Gcs! Since CoerceUnsized is nightly only (with no obvious route to stabilization?), this feature is a lot more useful if you are on the dark side, where you can now store whatever unsized type you want in a Gc:

let data: Gc = Gc::new(0u32);

That being said, we also have janky stable support using the ToScan trait. This comes with a few asterisks. In order to get a Gc on stable you basically need to control the definition of D and your data needs to be 'static. Until we have better ways of working with unsized data on stable, or someone contributes a cleaner hack, that is.

Downcast (@alekratz, @Others)

Along with being able to store trait objects, you also can now downcast them! While it's easy to forget the Any trait, for some applications a level of dynamic reflection is a must. With Gc now supporting unsized types and downcasting, there is definitely some cool interpreter to be built here!

Collector Refactoring (@Others)

As part of onboarding other contributors to the codebase, the core collector code was refactored and better documented. This should helpfully make it easier to read and work on this slightly thorny code. (Which is ultimately just a simple mark and sweep collector, with some Rust specific tweaks.)

Also as part of this work, the concurrent HashMaps we were using behind the scenes were removed. They were replaced with a nifty chunked-linked-list plus arc-swap combo. Huge shoutout to the arc-swap crate, which is a beautiful piece of engineering, and also indirectly powers the DerefGc mentioned above.

Soundness Fixes (@Others)

Also while working in the collector, Inoticed a subtle soundness hole in the collector. The issue was related to how garbage is destructed in the background. Here is a diagram:

The essential problem is that destructors for garbage could hold handles to other garbage data (which would be otherwise unreachable), and send them to other threads. The solution is to mark all garbage as "dead" before running any destructors. (Which is slightly less efficient, but sound.) This allows "zombie" Gcs to exist, but since dead Gcs can't be turned into references, there is no soundness hole.

The Future (0.3.0?)

In the future, I hope to spend some more time optimizing the crate. While both memory usage and CPU usage have been vastly reduced since initial versions, I have more ideas to try and optimize, and I don't want to leave performance on the table. ( I wanted to save this work until 0.2 has baked in the wild .) The two major optimizations are: (1) switching from tracking handles individually to simple handle counts, and (2) shortening the "stop the world" phase of collection with a more greedy approach.

I also feel shredder is reaching the point where a major limiting factor is lack of use cases. This supports my initial hypothesis that garbage collection is not needed in 99% of Rust programs. I do think there is room for me to experiment in places it could be useful. (For example, it'd be awesome to have a concurrent hash-map backed by AtomicGc rather than crossbeam's epoch based reclamation, if only just to compare performance!) And of course if you have a use-case well served by garbage collection, please get in touch!

In terms of feature work, I'd quite like the crate to support no-std use-cases where alloc is available. While there is no fundamental reason why this should be impossible, the pervasive use of rayon, parking-lot and threading internally poses an issue. I think solving this properly requires writing a shim layer to abstract away the concurrent aspects from the collector implementation.

If you're interested in helping with this project (either to work on something mentioned in this post, or a smaller issue), feel free to have a look at the Github issues, where I've documented what known issues and features need some love!

Shredder: Garbage Collection as a Library for Rust

Others — Thu, 11 Jun 2020 04:41:48 GMT

I'm excited to announce shredder, a library for Rust that provides a garbage collected smart pointer, Gc. This smart pointer is useful in the same sorts of situations as Rc, but can handle reference cycles. It's available on crates.io and Github. Here's how you could use it:

use std::cell::RefCell;

use shredder::{
    Gc, Scan,
    run_with_gc_cleanup,
    number_of_active_handles,
    number_of_tracked_allocations,
};

#[derive(Scan)]
struct Node {
    data: String,
    directed_edges: Vec>>,
}

fn main() {
    // Using `run_with_gc_cleanup` is good practice
    // (it helps ensure destructors are run)
    run_with_gc_cleanup(|| {
        let a = Gc::new(RefCell::new(Node {
            data: "A".to_string(),
            directed_edges: Vec::new(),
        }));

        let b = Gc::new(RefCell::new(Node {
            data: "B".to_string(),
            directed_edges: Vec::new(),
        }));

        // Usually would need `get` for `Gc` data,
        // but `RefCell` is a special case
        a.borrow_mut().directed_edges.push(b.clone());
        b.borrow_mut().directed_edges.push(a);
        // We now have cyclical data!
    });
    // Everything was cleaned up!
    assert_eq!(number_of_tracked_allocations(), 0);
    assert_eq!(number_of_active_handles(), 0);
}

In this example we had #[derive(Scan)] on our struct. By implementing this trait, we let shredder know how it can find what Gc handles this data contains. This is an essential part of the tracing garbage collection that shredder implements. After we've done that, using the smart pointer is basically just plug and play!

The only ergonomic issue to note is the need for guard objects. In shredder, you always need a guard object when accessing data inside a Gc. Usually you need to use .get to get a GcGuard. (Guarded access is key to implementing shredder's guarantees in a performant way.) Now, notice how we didn't have to do this in the above example. In general, Gc is only useful with interior mutability, so Gc>, Gc> and Gc> have special methods that let you skip this step (see documentation for more details).

People have experimented with garbage collection in Rust before, but I wasn't happy with existing libraries. I do not believe there is an existing Rust library providing an easy to use garbage collection solution that can robustly work in concurrent programs, as well as handle non-'static data. shredder does that and more. Let me break down what I consider the best features of shredder. (If you're interested in a deep dive into how it all works, I hope to provide a follow post on the design.)

1 Safety First) It should be impossible to cause unsafe behavior in safe code using shredder. The only unsafe parts of the API are writing manual implementations of the core traits: Scan, GcSafe and Finalize. I've done my best to ensure they have complete documentation explaining how to implement them safely.

2 Ergonomics Second) I believe a lot of the existing solutions are fairly hard to grok, requiring complex macros or weird setups with lifetimes. In shredder, Gc is basically just a fancy Arc, and I hope that makes the library easier to use. (And in the future I even want to make the guard object optional, assuming I can make it safe and performant!)

3 Ready for Fearless Concurrency) shredder was always designed to deal with multithreaded programs. You can send a Gc around just as if it was an Arc, and I think that's really cool! Keeping with this, I've designed the collector so that even if a thread holds onto a ton of GcGuards, collection can still proceed as normal. The only operation that should implicitly block on collection in shredder is calling .get (or one of the helper methods wrapping .get).

4 Seamless Destruction) By default, Gc::new only takes 'static data, and when it is found to be unreachable, it is dropped and its destructor is called. (There are safety checks in place to prevent you from exploiting cyclical data in the destructor.) This saves users the pain of having to deal with Finalize in the common case. The run_with_gc_cleanup method is also provided, and helps ensure destructors are run at the end of your program.

5 Clean Finalization) Sometimes, however, you need non-'static data. Unfortunately, since we may scan data after lifetimes have expired, you cannot use raw references. Instead you must replace &'a T with R<'a, T> and &'a mut T with RMut<'a, T>. Outside of Scan and Finalize implementations, these can be used as if they were regular references. Speaking of Finalize, if you implement it, and use Gc::new_with_finalizer, then you can use unsafe code to define code to be run right before your data is deallocated.

6 Concurrent Collection and Destruction) Finally, as a nice side effect of shredders design, automatic collection always happens in a background thread, and hopefully doesn't implement your regular processing too badly. Same goes for running destructors, which might actually be a performance win if your destructors are expensive.

shredder isn't perfect. It needs a lot more performance optimization, and I think the background collection can become even more concurrent in the future. It also has a non-trivial memory overhead. And of course it is very new and immature.

However, I think that's okay for now. I designed shredder to be another tool in the Rust programmer's toolbox. Gc is a very extreme escape hatch that lets you escape usual ownership rules. It's something that should be used carefully in small doses, where giving up the superior performance of RAII memory management is worth it. (Perhaps for user provided data with unpredictable loops, or interestingly shaped graphs.) I'm excited to see what people can do with it!

It's available on crates.io and Github. (If you've read this far, maybe you're interested in contributing? I've put some issues on the Github, and that might be a good place to start!)