Back

The Case for WeakMap

For quite a long time I hesitated to use WeakMap and WeakSet - they just seemed to be some sort of sorcery only needed for some niche scenarios: The term weak always brought up memories of strong retain cycles which can be an issue in Objective-C and Swift - but why would I need WeakMap in JavaScript? JavaScript is generally garbage-collected: All objects which can't be accessed anymore are removed from time to time, reference cycles are also taken care of. As I got used to JavaScript and its APIs I began to understand what the "weak" part really was about and nowadays I really see opportunities to use WeakMap everywhere: They enable us to implement type-safe good-looking APIs without making it too difficult.

The API

The API surface which WeakMap is providing is pretty small, there are only four methods defined on the prototype of WeakMap: delete, get, has and set (clear is planned, but its browser-support is lacking at the moment). This makes WeakMaps pretty similar to normal Maps, Maps have a couple of additional methods, the most important ones being forEach, keys, values and entries. The common property of these methods is that they allow you to get information about the entries without having a key, whereas all methods of WeakMap require a key.

The methods itself are pretty self-explanatory: get(key) returns the value which was saved for that key, set(key, value) sets value as the new value for key, delete(key) removes `key from the mapping and has(key) returns true if and only if a value is set for key and false otherwise.

Garbage Collection

To understand why WeakMaps are useful we have to understand how memory is accessed in garbage-collected programming languages:

Conceptually objects are just pieces of data stored in some collection, they also each have a unique name, which are just some sort of identifier. If we just consider the programming language they could be anything: A number, a name like Hans or some alphanumeric string - the only important thing is that when I have the name (also called reference) of the object I can access the object itself, reading from it or modifying the piece of data that is stored in that object. In JavaScript many things are objects: Arrays are a sort of object, class instances are objects and we can also create new plain objects with the object literal syntax ({}). When I write

const x = { a: 42 };

in a program what happens conceptually is that a new object is created when the {} expression is evaluated - let's call that object bq42 - the property a of that object is set to the value 42 and the local variable x is set to bq42. Notice that variable doesn't store the object itself, but just its name. When x.a = 13 is executed we'll have to first check which object x refers to (bq42 in this case) and then we can set the property of that object to 42. This point of view also makes it easier to think about shared references:

const x = { a: 42 };
const y = x;
y.a = 13;
console.log(x.a);

What does this program print? And why? Let's go through the program: Initially x and y both are empty and our heap (the collection of objects) is empty as well. The first assignment statement creates a new object (let's call it tf13) in the heap and assigns that reference to x. Our current state is thus x = tf13, y = _ and we have a heap object called tf13. The assignment statement in the next line simply copies the value of x to y. The value of x is tf13 so we simply copy that over, resulting in our local state to be x = tf13, y = tf13. y.a = 13 will look up which object y refers to (it refers to the one called tf13) and will set its property a to 13. There's still only one object so console.log(x.a) will also result in 13. It should be obvious why this program didn't print 42 - with only one object there's no way it could be 42 after we set its a property to a. y = x didn't copy the object, it just copied the reference.

So why do we need a "garbage collector"? I previously mentioned that every time an object literal expression is encountered a new object is created and there are several other cases where this happens: Every time a function is defined, every time an object is constructed, every time a constructor is called. All of these cases add up and all of these objects add up. Computer memory is still finite so at some point no new object can be created. To solve this problem we can make a useful observation: When no one has the name of the object it doesn't make a difference whether it still exists or it doesn't. This is where garbage collectors come into play: They analyze all variables, all places where references could be found and remove objects which aren't required anymore. Modern garbage collectors use sophisticated algorithms so I won't go into detail about their internals.

Let's get back to WeakMap: Like all class instances WeakMaps are objects and it stores values which can be accessed using the get method. Therefore the garbage collector would normally need to preserve the values. If I have two objects a, b, a WeakMap with the mapping a -> b the garbage collector can't remove b yet: There's still a way to access b through the reference of the WeakMap, using the get method. But let's suppose that there's no reference to a anymore and we just have a reference to the WeakMap itself. Now there's no way to access b because no method allows us to get b without having a. Due to the API we always need both a reference to the key and a reference to the WeakMap to get the value corresponding to the key. This also makes it clear why the restriction that WeakMap keys have to be objects exist: Primitives like strings aren't objects and we could construct new instances easily, but then our garbage collection trick doesn't work anymore. If we were to store something in a WeakMap with a key 42 (the number) we would always have a way to get the corresponding value by simply calling get(42) - we can always do that and therefore the garbage collector can only remove the WeakMap entry when there's no reference to the WeakMap itself anymore.

Where's the difference to normal Map? There are methods of Map instances which allow the user to access the list of values without having to have a reference to a key. The fact that these methods are missing is what makes WeakMap so powerful: It can get special treatment by the garbage collector. If our access pattern already allows us to use the WeakMap API we don't even have to think about having unnecessary references.

Use cases

Caching

Let's construct a hypothetical use case: We want to write a function which computes the sum of elements of an array:

function sum(array: number[]) {
  return array.reduce((acc, elem) => acc + elem, 0);
}

So far so good, but when this function is called repeatably with the same array it will always compute the sum again. We could apply caching here, storing the result of previous calls and returning the cached value when there is one:

const sumCache = new Map<number[], number>();

/**
 * Return the sum of the elements. Assumes that if the array was passed to `sum` once
 * its content won't change.
 */
function sum(array: number[]) {
  const existingResult = sumCache.get(array);
  if (existingResult !== undefined) {
    return existingResult;
  }
  const result = array.reduce((acc, elem) => acc + elem, 0);
  sumCache.set(array, result);
  return result;
}

This will work as well, but we have to be careful as the contract of the function changed. Due to the way arrays work in JavaScript this implementation could actually return the wrong value in some cases:

const x = [12, 13];
console.log(sum(x)); // 25
x.push(42);
console.log(sum(x)); // Still 25, cached

This is specified in the documentation of the sum function so it's just something we have to be aware of. So whats the problem now with the sum implementation presented above? It actually holds references to all arrays that have been passed to sum in the lifetime of the program. The array instances are used as keys and in theory we could access the .keys() of the map to get a list of all arrays the function was called on. Although it's very likely that we would want to do this, the arrays are all held in memory without them being used. This is what is called a memory leak: We have objects floating around which won't ever be used. although they could be accessed in theory the garbage collector can't remove them. This is where WeakMap can come in handy: We can simply replace Map with WeakMap and get the expected result: When we have the array we can get the cached value, but without having the array there's no way to get the cached values. The garbage collector can also happily remove the arrays and the cached values when we no longer have a reference to that array.

Membership

Another useful situation where WeakSet can be useful is when we only need to know whether an object has a certain property or not. If we for example would want to compute the set subtraction between two arrays we could use WeakSets to accomplish the task:

const subtract = <T>(x: T[], y: T[]) => {
  const set = new WeakSet<T>();
  for (let i = 0; i < y.length; i++) {
    set.add(y[i]);
  }
  return x.filter((elem) => !set.has(elem));
};

We don't need to iterate over the set here so a WeakMap works fine. A Map works fine as well and has similar performance characteristics in this example, although I would have expected WeakMap to have a slight advantage but it seems like browser optimizations are currently tailored to Map more than to WeakMap.

Implementation

Sometimes I find it interesting to look at the internals of JavaScript implementations to get a feeling for how things work. Engines like v8 are very complex programs, but exploring them a bit can already give you insights into their inner workings: A majority of the code for Map / Set / WeakMap / WeakSet is located here: chromium.googlesource.com, but there's also some code scattered in the src/objects folder, where the Torque code can be found - torque is a language which v8 uses to generate fast code with their JIT compiler.

In v8 both WeakMap and Map are generated from C++, they mostly share the same source code, but they use a different class to store their internal data: The weak variants utilize a EphemeronHashTable whereas the non-weak variants require a sorted hash table variant. The ephemeron part of EphemeronHashTable benefits from the garbage-collection benefits mentioned previously: This is outlined by the following comment: EphemeronHashTable is similar to ObjectHashTable but gets special treatment by the GC. The GC treats its entries as ephemerons: both key and value are weak references, however if the key is strongly reachable its corresponding value is also kept alive.

The source code of the gc scavenger of v8 contains the following comments, giving these objects their special gc treatment:

// Clear ephemeron entries from EphemeronHashTables in new-space whenever the
// entry has a dead new-space key.
void ScavengerCollector::ClearYoungEphemerons(EphemeronTableList* ephemeron_table_list)
...
// Clear ephemeron entries from EphemeronHashTables in old-space whenever the
// entry has a dead new-space key.
void ScavengerCollector::ClearOldEphemerons()

This enables the garbage collector to remove the entry when there is no strong reference to the key. The non-weak variants don't get this special treatment and instead have to rely on a sorted list: In JavaScript the order in which entries, values and keys return their elements is determined by the order in which the elements where inserted, so they need to keep that order stored somewhere to make them work properly.

That's it for today, I hope you learned something and that you'll consider using WeakMap or WeakSet in your next project. Cheers!