For quite a long time I hesitated to use WeakMap
and WeakSet
- they just seemed to be some sort of sorcery only needed for some niche scenarios: The term weak always brought up memories of strong retain cycles which can be an issue in Objective-C and Swift - but why would I need WeakMap in JavaScript? JavaScript is generally garbage-collected: All objects which can't be accessed anymore are removed from time to time, reference cycles are also taken care of. As I got used to JavaScript and its APIs I began to understand what the "weak" part really was about and nowadays I really see opportunities to use WeakMap everywhere: They enable us to implement type-safe good-looking APIs without making it too difficult.
The API surface which WeakMap
is providing is pretty small, there are only four methods defined on the prototype of WeakMap
: delete
, get
, has
and set
(clear
is planned, but its browser-support is lacking at the moment). This makes WeakMaps
pretty similar to normal Map
s, Map
s have a couple of additional methods, the most important ones being forEach
, keys
, values
and entries
. The common property of these methods is that they allow you to get information about the entries without having a key, whereas all methods of WeakMap
require a key.
The methods itself are pretty self-explanatory: get(key)
returns the value which was saved for that key, set(key, value)
sets value
as the new value for key
, delete(key)
removes `key
from the mapping and has(key)
returns true
if and only if a value is set for key
and false
otherwise.
To understand why WeakMaps are useful we have to understand how memory is accessed in garbage-collected programming languages:
Conceptually objects are just pieces of data stored in some collection, they also each have a unique name, which are just some sort of identifier. If we just consider the programming language they could be anything: A number, a name like Hans or some alphanumeric string - the only important thing is that when I have the name (also called reference) of the object I can access the object itself, reading from it or modifying the piece of data that is stored in that object. In JavaScript many things are objects: Arrays are a sort of object, class instances are objects and we can also create new plain objects with the object literal syntax ({}
). When I write
const x = { a: 42 };
in a program what happens conceptually is that a new object is created when the {}
expression is evaluated - let's call that object bq42 - the property a
of that object is set to the value 42
and the local variable x
is set to bq42. Notice that variable doesn't store the object itself, but just its name. When x.a = 13
is executed we'll have to first check which object x
refers to (bq42 in this case) and then we can set the property of that object to 42
. This point of view also makes it easier to think about shared references:
const x = { a: 42 };
const y = x;
y.a = 13;
console.log(x.a);
What does this program print? And why? Let's go through the program: Initially x
and y
both are empty and our heap (the collection of objects) is empty as well. The first assignment statement creates a new object (let's call it tf13) in the heap and assigns that reference to x
. Our current state is thus x = tf13, y = _
and we have a heap object called tf13
. The assignment statement in the next line simply copies the value of x
to y
. The value of x
is tf13 so we simply copy that over, resulting in our local state to be x = tf13, y = tf13
. y.a = 13
will look up which object y
refers to (it refers to the one called tf13) and will set its property a
to 13
. There's still only one object so console.log(x.a)
will also result in 13
. It should be obvious why this program didn't print 42
- with only one object there's no way it could be 42
after we set its a
property to a
. y = x
didn't copy the object, it just copied the reference.
So why do we need a "garbage collector"? I previously mentioned that every time an object literal expression is encountered a new object is created and there are several other cases where this happens: Every time a function is defined, every time an object is constructed, every time a constructor is called. All of these cases add up and all of these objects add up. Computer memory is still finite so at some point no new object can be created. To solve this problem we can make a useful observation: When no one has the name of the object it doesn't make a difference whether it still exists or it doesn't. This is where garbage collectors come into play: They analyze all variables, all places where references could be found and remove objects which aren't required anymore. Modern garbage collectors use sophisticated algorithms so I won't go into detail about their internals.
Let's get back to WeakMap
: Like all class instances WeakMaps are objects and it stores values which can be accessed using the get
method. Therefore the garbage collector would normally need to preserve the values. If I have two objects a
, b
, a WeakMap
with the mapping a -> b
the garbage collector can't remove b
yet: There's still a way to access b
through the reference of the WeakMap
, using the get
method. But let's suppose that there's no reference to a
anymore and we just have a reference to the WeakMap
itself. Now there's no way to access b
because no method allows us to get b
without having a
. Due to the API we always need both a reference to the key and a reference to the WeakMap
to get the value corresponding to the key. This also makes it clear why the restriction that WeakMap
keys have to be objects exist: Primitives like strings aren't objects and we could construct new instances easily, but then our garbage collection trick doesn't work anymore. If we were to store something in a WeakMap
with a key 42
(the number
) we would always have a way to get the corresponding value by simply calling get(42)
- we can always do that and therefore the garbage collector can only remove the WeakMap
entry when there's no reference to the WeakMap
itself anymore.
Where's the difference to normal Map
? There are methods of Map
instances which allow the user to access the list of values without having to have a reference to a key. The fact that these methods are missing is what makes WeakMap
so powerful: It can get special treatment by the garbage collector. If our access pattern already allows us to use the WeakMap
API we don't even have to think about having unnecessary references.
Let's construct a hypothetical use case: We want to write a function which computes the sum of elements of an array:
function sum(array: number[]) {
return array.reduce((acc, elem) => acc + elem, 0);
}
So far so good, but when this function is called repeatably with the same array it will always compute the sum again. We could apply caching here, storing the result of previous calls and returning the cached value when there is one:
const sumCache = new Map<number[], number>();
/**
* Return the sum of the elements. Assumes that if the array was passed to `sum` once
* its content won't change.
*/
function sum(array: number[]) {
const existingResult = sumCache.get(array);
if (existingResult !== undefined) {
return existingResult;
}
const result = array.reduce((acc, elem) => acc + elem, 0);
sumCache.set(array, result);
return result;
}
This will work as well, but we have to be careful as the contract of the function changed. Due to the way arrays work in JavaScript this implementation could actually return the wrong value in some cases:
const x = [12, 13];
console.log(sum(x)); // 25
x.push(42);
console.log(sum(x)); // Still 25, cached
This is specified in the documentation of the sum
function so it's just something we have to be aware of. So whats the problem now with the sum
implementation presented above? It actually holds references to all arrays that have been passed to sum
in the lifetime of the program. The array instances are used as keys and in theory we could access the .keys()
of the map to get a list of all arrays the function was called on. Although it's very likely that we would want to do this, the arrays are all held in memory without them being used. This is what is called a memory leak: We have objects floating around which won't ever be used. although they could be accessed in theory the garbage collector can't remove them. This is where WeakMap
can come in handy: We can simply replace Map
with WeakMap
and get the expected result: When we have the array we can get the cached value, but without having the array there's no way to get the cached values. The garbage collector can also happily remove the arrays and the cached values when we no longer have a reference to that array.
Another useful situation where WeakSet
can be useful is when we only need to know whether an object has a certain property or not. If we for example would want to compute the set subtraction between two arrays we could use WeakSet
s to accomplish the task:
const subtract = <T>(x: T[], y: T[]) => {
const set = new WeakSet<T>();
for (let i = 0; i < y.length; i++) {
set.add(y[i]);
}
return x.filter((elem) => !set.has(elem));
};
We don't need to iterate over the set
here so a WeakMap
works fine. A Map
works fine as well and has similar performance characteristics in this example, although I would have expected WeakMap
to have a slight advantage but it seems like browser optimizations are currently tailored to Map
more than to WeakMap
.
Sometimes I find it interesting to look at the internals of JavaScript implementations to get a feeling for how things work. Engines like v8 are very complex programs, but exploring them a bit can already give you insights into their inner workings: A majority of the code for Map
/ Set
/ WeakMap
/ WeakSet
is located here: chromium.googlesource.com, but there's also some code scattered in the src/objects
folder, where the Torque code can be found - torque is a language which v8 uses to generate fast code with their JIT compiler.
In v8 both WeakMap
and Map
are generated from C++, they mostly share the same source code, but they use a different class to store their internal data: The weak variants utilize a EphemeronHashTable
whereas the non-weak variants require a sorted hash table variant. The ephemeron part of EphemeronHashTable
benefits from the garbage-collection benefits mentioned previously: This is outlined by the following comment: EphemeronHashTable is similar to ObjectHashTable but gets special treatment by the GC. The GC treats its entries as ephemerons: both key and value are weak references, however if the key is strongly reachable its corresponding value is also kept alive.
The source code of the gc scavenger of v8 contains the following comments, giving these objects their special gc treatment:
// Clear ephemeron entries from EphemeronHashTables in new-space whenever the
// entry has a dead new-space key.
void ScavengerCollector::ClearYoungEphemerons(EphemeronTableList* ephemeron_table_list)
...
// Clear ephemeron entries from EphemeronHashTables in old-space whenever the
// entry has a dead new-space key.
void ScavengerCollector::ClearOldEphemerons()
This enables the garbage collector to remove the entry when there is no strong reference to the key. The non-weak variants don't get this special treatment and instead have to rely on a sorted list: In JavaScript the order in which entries
, values
and keys
return their elements is determined by the order in which the elements where inserted, so they need to keep that order stored somewhere to make them work properly.
That's it for today, I hope you learned something and that you'll consider using WeakMap
or WeakSet
in your next project. Cheers!