
Dynamic Programming in the Wild

Around a year ago I rewrote the frontend of community-solutions, a project of my student association (VIS) that I maintain in my free time. The frontend - written in React - hadn't received much attention previously and was still using React class components exclusively. As a major refactoring was definitely necessary, I started mostly from scratch, without looking at the old source code too much. As the rewrite grew I quickly lost track of which features I still had to implement and which ones were finished. As was to be expected, I messed something up - in this case (among other small things) the filter feature on the main page where users search for courses. Instead of implementing a fuzzy filter function I filtered the course list based on a substring match with the search term. Not too long after the initial deploy the first messages reached me and an issue was opened. At first I thought of this as an easy fix: I quickly looked through the git history of the project to find the code that was previously running and found out that it was using a pretty standard Levenshtein distance implementation, using DP to achieve a runtime of O(n * m). Happy about my findings I swiftly copied the code over to the rewrite, spun up a development environment, started my browser and tried it out: It worked. Everything behaved as expected and the Levenshtein implementation was correct, but I wasn't satisfied - the search results weren't as good as I had hoped.

Fuzzy searching / autocompletion is something very subjective - for some use-cases a substring match makes more sense than a ranking by edit distance, for others it's the other way around. But I understood that leaving it as it was wasn't an option either. I sometimes like to overengineer things and this problem seemed like a perfect opportunity to pour a couple of hours into. At first I tried a couple of existing solutions, Fuse.js being the most promising at first, but I also tested fuzzyset.js and several other packages - none of them really felt right. They were either intended for full-text search on bigger data sets or simply returned unsatisfactory results. As this is so subjective I needed to make sure I knew what I wanted, so I wrote down the characteristics I was looking for:

  • it should be typo-tolerant
  • it should prefer prefix matches
  • abbreviations should match with their unabbreviated counterparts

The last one was pretty specific to the problem: "DM" and "DiskMath" should both be optimal matches for the course with the name "Diskrete Mathematik". Students often use these kinds of abbreviations: Nobody wants to type "Formal Methods and Functional Programming", but searching for "Formal" results in too many matches - it makes much more sense to simply search for "FMFP" in this case. I then tried to refine my specification: The matching algorithm should be allowed to "jump" directly to the next word in case that gives the result a better score. At the same time there shouldn't be any penalty for only writing a prefix of the name: "Diskr" should match "perfectly" with "Diskrete Mathematik". This matches the way most abbreviations are constructed.

A Baseline

To be able to properly compare the different fuzzy filter functions we need to establish a baseline, both for performance and for the quality of the results. As a first example we can use a simple substring search, similar to the one I had previously implemented. We start off with an array of course names and return an array of objects representing the matched strings. I also wanted the matching parts to be highlighted; a simple interface for this is to return an array of substrings, each one annotated with a boolean indicating whether the part matches the filter or not.
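
As a small sketch of that interface (MatchPart and SearchResult are just names I'm using here for illustration, not taken from the actual code), the result type could look like this:

interface MatchPart {
  // true if this substring is part of the match, false otherwise
  matches: boolean;
  substring: string;
}

interface SearchResult {
  name: string;
  match: MatchPart[];
}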

Our simple substring matching function then would look like this:

function substringSearch(categories: string[], filter: string) {
  const lowercaseFilter = filter.toLowerCase();
  return categories
    .map((category) => {
      const index = category.toLowerCase().indexOf(lowercaseFilter);
      if (index < 0) return undefined;

      const match = [
        { matches: false, substring: category.substring(0, index) },
        { matches: true, substring: category.substring(index, index + filter.length) },
        { matches: false, substring: category.substring(index + filter.length) },
      ];
      return {
        name: category,
        match,
      };
    })
    .filter((x) => x !== undefined);
}

First we turn our filter into a lowercase string and then perform a simple map over our array of categories, returning undefined whenever the filter string can't be found and otherwise returning an object with the matching parts stored in a property. If we now use this function in a nice-looking UI we get the following result:

[Interactive demo: the substring filter applied to the course list, showing courses from "Advanced Algorithms" to "Computational Biology" plus "87 other courses matched."]

Results appear near-instantaneously - indexOf is implemented as a native function in the JavaScript engine and as such is faster than any matching function you could write on your own. The quality of the results however is definitely suboptimal: No typos are accepted, the matching isn't fuzzy at all and abbreviations aren't taken into account. As such the indexOf based implementation isn't really a good starting point for our custom solution.

Dynamic Programming

Dynamic Programming is a technique that helps to accelerate the computation of functions that are defined in terms of recursive calls. In essence it applies memoization to the recursive function, but instead of starting with the parameters for which we want the result, we compute them last and prepopulate the memoization cache (called the DP-array) with the results that this last invocation requires. DP is often used to speed up an algorithm because memoization can improve the asymptotic runtime, and compared to a memoized recursive function DP eliminates some overhead: no call stack is required and the access patterns are more predictable.
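
As a tiny illustration (Fibonacci - not part of the search code, just a sketch of the idea): instead of calling the recursive function and caching results on the way down, we fill the cache bottom-up and read off the answer at the end.

// Naive recursion: exponential number of calls.
function fibRecursive(n: number): number {
  return n <= 1 ? n : fibRecursive(n - 1) + fibRecursive(n - 2);
}

// Bottom-up DP: prepopulate the cache so that every value is ready
// when it is needed - no recursion, no call stack.
function fibDP(n: number): number {
  const dp = new Array<number>(n + 1).fill(0);
  if (n >= 1) dp[1] = 1;
  for (let i = 2; i <= n; i++) {
    dp[i] = dp[i - 1] + dp[i - 2];
  }
  return dp[n];
}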

The DP algorithm for the Levenshtein distance metric (this specific algorithm is called the Wagner-Fischer algorithm) is taught in most first-year algorithms lectures and is pretty simple to implement. The definition is straightforward: Deletions and insertions of arbitrary characters cost 1 unit, replacements also cost 1 unit and there is no cost when the characters in the two strings match up. The DP solution is as simple as a DP program can be: A two-dimensional array which can be filled row by row using the following recurrence:

for (let i = 1; i <= a.length; i++) {
  for (let j = 1; j <= b.length; j++) {
    // No cost if the characters match, 1 unit otherwise.
    let cost = 1;
    if (a[i - 1] === b[j - 1]) {
      cost = 0;
    }

    const insert = dp[i - 1][j] + 1;
    const remove = dp[i][j - 1] + 1;
    const replace = dp[i - 1][j - 1] + cost;
    dp[i][j] = Math.min(insert, remove, replace);
  }
}

In a working implementation we'd have to initialize the array first, but I'll leave out the details here as they can be found in numerous other resources. Apart from that this already works: To get the total cost we can simply read the array element with the highest index. Performance-wise this solution is okayish, but far worse than the indexOf solution, both in terms of asymptotic time complexity and real-life performance.
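
For completeness, a minimal self-contained version (my own sketch, not the code from the old frontend) with the usual initialization of the first row and column looks roughly like this:

function levenshtein(a: string, b: string): number {
  // dp[i][j] = edit distance between the first i characters of a
  // and the first j characters of b.
  const dp: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  // Transforming a prefix into the empty string costs its length.
  for (let i = 0; i <= a.length; i++) dp[i][0] = i;
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;

  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,        // skip a character of a
        dp[i][j - 1] + 1,        // skip a character of b
        dp[i - 1][j - 1] + cost, // match or replace
      );
    }
  }
  // The element with the highest index holds the total cost.
  return dp[a.length][b.length];
}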

Result-wise this is already closer to the intended behavior than our initial indexOf solution: We get typo-tolerance, but prefix matches aren't preferred - they cost as much as removing every character from the end of one string. Some of these properties can be fixed rather easily: To allow prefix matches we can simply read all values in the bottom row (or column, depending on the order of the parameters) and return the minimum as the result. For the rest we have to tweak our recurrence relation a bit.
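
With the parameter order of the sketch above (a being the course name, b the filter text) that means reading the minimum of the last column instead of the corner element - for example by replacing the return statement of the sketch with something like this:

// Instead of `return dp[a.length][b.length];`: take the best score over
// all prefixes of a (the course name), so that "Diskr" can match
// "Diskrete Mathematik" without paying for the trailing characters.
let best = dp[0][b.length];
for (let i = 1; i <= a.length; i++) {
  best = Math.min(best, dp[i][b.length]);
}
return best;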

The Metric

It's obvious that we have to make some modifications to the aforementioned DP algorithm to make it suitable for our use-case.

As abbreviations should be allowed, our algorithm should not penalize jumping to the next word or even starting with a word other than the first one. We can implement this with two modifications: making jumps to an arbitrary start location free, and increasing the cost for matches that don't start at the beginning of a word. The latter is a bit harder to implement as there is no easy way to check whether a match is non-continuous, but we can fix this by adding another array to our DP: A lastMatch DP array which has the same size as the cost DP array and stores the index of the last match that produced the result stored in the cost array at the corresponding location. We initialize it with zeroes, set it to the index of the string we match against whenever a match occurs and otherwise just copy it from the result we used (i.e. if our Math.min used the "insert" case we'll also use the lastMatch result from the same location).

Now we can simply check whether the lastMatch of the "replace" case is one less than the index we are currently checking; if it is, the cost of the "replace" operation is set to zero. Other replace operations aren't free - we are deviating from the Levenshtein metric here: Instead of counting all the inserts / removes in between we simply have one large cost for a non-continuous match. The insert / remove operations are still there: Ignoring a character in our target string is (temporarily) free because the cost gets applied when the next character matches. If there is no next matching character we don't have a problem either, because we actually want that behavior - it is totally acceptable to ignore characters at the end. Ignoring characters in our filter text however should be expensive: Typos can happen and we want to allow them, but at the same time we don't want to bloat our search results with matches that don't make sense at all.

I tried to formalize these metrics and came up with the following:

We initialize the replacement cost with a rather expensive value; "normal replacements" are typos in the middle of a text and as such we make them expensive. Note that due to the "non-continuous matches result in an additional cost" rule the cost of a typo is actually split into two parts: The typo replacement itself and the next matching character, which isn't a continuous match.

let replaceCost = 3;

When the characters match we lower the replacement cost to 2 - we assume a non-continuous match here:

const matched = a[i] === b[j];
if (matched) {
  replaceCost = 2;
}

We then check whether this is a continuous match and if it is we lower the cost to 0. We also make matches at the start of the string and directly after a space free:

const t = lastMatch[j - 1][i - 1];
if (matched && t === i - 1) {
  replaceCost = 0;
}
if (matched && (i === 1 || a[i - 1] === " ")) {
  replaceCost = 0;
}

Now that we have our custom replaceCost calculation in place, we can calculate the cost of all three alternatives and take the minimum of them:

const sa = cost[j][i - 1];                   // skip a character of the course name (free)
const sb = cost[j - 1][i] + 6;               // skip a character of the filter text (expensive)
const sc = cost[j - 1][i - 1] + replaceCost; // match / replace
cost[j][i] = Math.min(sa, sb, sc);

We still have to take care of the new lastMatch array we introduced:

if (sa < sb && sa < sc) {
  lastMatch[j][i] = lastMatch[j][i - 1];
} else if (sb < sc) {
  lastMatch[j][i] = lastMatch[j - 1][i];
} else {
  lastMatch[j][i] = matched ? i : lastMatch[j - 1][i - 1];
}

This is basically all the code required for the scoring part of the matching algorithm. We only have to initialize the first row and column such that jumping in the filter string is expensive but jumping in the other string is cheap, and then apply our recurrence relation to every cell of our DP array. The result can be extracted just like before by accessing the bottom-right element.
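
The exact initialization isn't spelled out above, but under the assumptions used in the snippets (a is the course name indexed by i, b is the filter text indexed by j, and skipping a filter character costs 6) it could look roughly like this:

// Starting anywhere in the course name is free ...
for (let i = 0; i <= a.length; i++) {
  cost[0][i] = 0;
  lastMatch[0][i] = 0;
}
// ... but skipping characters of the filter text is expensive.
for (let j = 0; j <= b.length; j++) {
  cost[j][0] = j * 6;
  lastMatch[j][0] = 0;
}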

Extracting the Path

Having a score is great, but having some sort of visual feedback is always a good idea - in this case I'd like to see which characters matched the filter text: As a user this makes it easier to spot the filter text and as a developer it's always nice to have a way to visualize the internals of your program. Just like in the regular DP solution for Levenshtein we simply have to keep track of which rule (replace / insert / delete) was applied to get the value that is stored in the cost DP array. We could also store the position the value comes from, but in this case it is easier to simply store a 0, 1, 2 or 3 to indicate which rule was applied. We differentiate between matches and replacements / typos here so that it is easier to extract the solution. We can simply extend the code we used for lastMatch above and store the rule that was applied.

To get the indices which matched we simply start in the bottom right corner, convert the integers back into rules and stop when we reach the top left. While doing that we keep track of the indices which matched by checking where the "match" rule was applied and adding that index to an array.

Implemented in TypeScript that looks like this:

let bj = b.length - 1;
let bi = a.length - 1;
const res: number[] = [];
while (bi >= 0 && bj >= 0) {
  // What kind of action was applied here?
  const p = pred[bj][bi];
  if (p === 0) {
    bi--;
  } else if (p === 1) {
    bj--;
  } else {
    bi--;
    bj--;
    // We found a match
    if (p === 2) res.push(bi);
  }
}

These indices are in the wrong order - we appended the last index first. Performance-wise there's not much to gain here, so we can just call res.reverse(); on our array once we have added all the indices.

Filtering

So far we've only implemented a scoring function, but what we actually want is a filter function: A function that takes an array and returns a new array. To filter an array we simply apply the scoring function to each element and check whether the score is below a certain limit. This gives us a new parameter we can customize: The maxScore doesn't even have to be a constant, we can generate it dynamically. In my experience it makes sense to adjust the maxScore in relation to the length of the input: The more characters our filter text has, the more typos we want to allow. A linear function works fine for this use-case.
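
A sketch of such a wrapper could look like this (the score function is passed in as a parameter and the concrete constants are placeholders, not the values used in community-solutions):

// Assumed scoring function from the previous sections: lower is better.
type ScoreFn = (category: string, filter: string) => number;

function fuzzyFilter(
  categories: string[],
  filter: string,
  score: ScoreFn,
): string[] {
  // Allow more typos for longer filter texts (a simple linear function).
  const maxScore = 2 + filter.length;
  return categories.filter((category) => score(category, filter) <= maxScore);
}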

Optimizations

A pretty easy improvement can be made by using one of the TypedArray classes of JavaScript: they make it easier for engines to optimize the code and the memory layout can be more efficient due to the guarantee that we will never store anything else in the array elements. We don't really need big numbers either - the maxScore of elements which should be displayed is mostly below 16, so we don't need to worry about the numbers we actually need getting too big. Overflows however could be a problem if our range is too small, and as a result Uint8ClampedArray is optimal for our use-case: The "Clamped" part means that when we insert a value below zero it will simply store a zero, and if we would overflow, the engine automatically "clamps" the value to 255.
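
The clamping behavior is easy to see in isolation:

const scores = new Uint8ClampedArray(3);
scores[0] = 42;  // stored as 42
scores[1] = -5;  // clamped to 0
scores[2] = 300; // clamped to 255
console.log(scores); // -> 42, 0, 255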

Using a Uint8ClampedArray for each single row would result in a lot of small individual allocations which could slow our code down, making the delay more noticeable to the user. Instead we can store our entire 2-D DP array (we actually need multiple DP arrays) in a single array by thinking of each row as a group of n consecutive elements. The rows are laid out sequentially without any gaps, so that array[i][j] in a 2-D array translates to array[i * stride + j] (the stride being the length of the second-level arrays) in our 1-D array. Using 1-D arrays also improves cache locality - in JavaScript there's no guarantee that a 2-D array (which is just an array of arrays in JS) is laid out in any particular way in memory. Especially for smaller matrices we get some benefits here.
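
As a small sketch (the names and dimensions are mine), the flattened layout and the index mapping could look like this:

// One flat buffer instead of an array of row arrays.
const rows = 16;    // e.g. one row per filter character plus a border row
const stride = 32;  // length of each row, i.e. the number of columns
const cost = new Uint8ClampedArray(rows * stride);

// cost[j][i] in the 2-D view becomes cost[j * stride + i] in the flat view.
const idx = (j: number, i: number) => j * stride + i;
cost[idx(3, 5)] = 7;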

Caching

There are several different types of caching that can be applied here: Most of our computing time is spent in the inner loop of our DP calculation and as such we want our memory accesses to be as fast as possible. We can ensure this by making all of the memory accesses as spatially local as possible. As we took control of the array layout by using a 1-D array we can optimize the way our loops are nested: From an algorithmic perspective it doesn't matter whether we go through the rows first and then through the columns or the other way around. Specifically we want the inner loop variable to be "scaled" as little as possible so that the next loop iteration can use the data we already loaded into the cache with our previous accesses.

for (let i = 0; i < 100; i++) {
  for (let j = 0; j < 100; j++) {
    sum += array[i * stride + j];
  }
}

would be preferred over

for (let j = 0; j < 100; j++) {
  for (let i = 0; i < 100; i++) {
    sum += array[i * stride + j];
  }
}

The only difference here is the order of the loops, but this tiny modification can make a significant difference in performance. In our case we only have to ensure that our 1-D array layout and our loop order match up - we perform the same number of accesses to the previous row as to the previous column: We only need the element one row above the current cell and the element left of the current cell.

There's also another type of caching we can use to speed our algorithm up: When a user filters the list of courses they will usually start with a single character and append to that search string. Occasionally they'll also delete a character at the end, but edits in the middle rarely happen - maybe in the rare case of a typo. How does that help us? If you look at the DP tables of the two comparisons, "Dis" against "Diskrete Mathematik" and "Disk" against the same course name, you'll notice that they largely match up. Due to the way our DP table is defined, a single character influences a single row or a single column (with the exception of a space) in the DP table. It turns out that we can easily reuse the DP table of a previous function call. There's one drawback though: We can only do that if the array we allocated is big enough. We previously decided on using TypedArrays, which can't easily be enlarged, and as a result we manually have to allocate additional space.

It turns out that we can reuse exactly the first n rows if the first n characters of the old and the new filter text match up. This check is pretty cheap, we can simply iterate through both strings with a single for loop:

function prefixLength(a: string, b: string) {
  const maxIndex = Math.min(a.length, b.length);
  let i = 0;
  while (i < maxIndex && a[i] === b[i]) i++;

  return i;
}

The caching wrapper around our DP function first checks whether the cached DP array is big enough to fit the new word. If that is the case we call our DP function, telling it to skip the first prefixLength(a, b) rows, return the result and update our cache. If the arrays aren't big enough we have to create a new entry, update our cache and call our DP function.
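
A rough sketch of what such a wrapper could look like - the names and the shape of the cache entry are my own, the real code lives in useSearch.ts:

interface CacheEntry {
  filter: string;          // filter text the cached tables were computed for
  cost: Uint8ClampedArray; // flattened DP table(s)
  stride: number;          // length of one row
  rows: number;            // number of allocated rows
}

// Assumed signature: fills the DP tables in `entry` starting at `startRow`
// and returns the score of `name` against `filter`.
type DpFn = (name: string, filter: string, entry: CacheEntry, startRow: number) => number;

function cachedScore(name: string, filter: string, cache: Map<string, CacheEntry>, runDp: DpFn): number {
  const rows = filter.length + 1;
  const stride = name.length + 1;
  const entry = cache.get(name);

  if (entry !== undefined && entry.rows >= rows && entry.stride === stride) {
    // Big enough: only recompute the rows after the common prefix.
    const startRow = prefixLength(entry.filter, filter) + 1;
    const result = runDp(name, filter, entry, startRow);
    entry.filter = filter;
    return result;
  }

  // Too small or missing: allocate a new entry and compute everything.
  const fresh: CacheEntry = { filter, cost: new Uint8ClampedArray(rows * stride), stride, rows };
  cache.set(name, fresh);
  return runDp(name, filter, fresh, 1);
}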

Benchmark

All of these optimizations would be pretty useless if they didn't work - when trying to optimize a function we always have to test whether the performance benefit is worth the added complexity. In this case we are comparing three implementations with each other. The first one is our custom DP function with all optimizations: 1-D typed arrays, cache-friendly indexing and DP table caching. The second implementation has DP table caching disabled - I just pass an empty cache to the function every time. The third implementation is a standard Levenshtein distance implementation using a regular 2-D array. This benchmark actually favors the Levenshtein distance implementation pretty heavily: It doesn't need to keep track of the last matching index and therefore only needs two DP tables instead of three (score, lastMatch, pred). I also don't actually extract the matching indices and only compute the pred matrix. The indexOf variant is excluded here - it's just unbeatably fast.

The test case tries to emulate real-world usage: The filter text is written character by character, a typo is being made, the search text gets deleted and a second word is entered character by character.

The time per search is the time that a single update of the search field takes: evaluating the DP function for all >100 courses, effectively everything that has to be done when you change the filter text once.

[Interactive benchmark: "Time per search" for the three implementations - With Caching, Without Caching and the Standard Levenshtein Distance Implementation.]

On my machine I see a speedup of around 2x over the non-caching implementation, which I'd call a success. DP table caching really makes the biggest difference here as it reduces the time complexity of some appends and deletions (at the end) to O(n). Occasionally however we need to allocate a new array and start from scratch, and as such our average time complexity is still O(m * n) even when a user only appends - deletions are fine because they need less space. In theory we could make the arrays twice as big in each step, but that doesn't really make sense for our use-case - considering asymptotic performance is only marginally useful if one of your variables has to be manually entered by a user.

The Final Result

This blog post would be pretty unsatisfying if it didn't include a demo of the finished filter function, so I have included one here:

[Interactive demo: the finished fuzzy filter over the full course list, again showing courses from "Advanced Algorithms" to "Computational Biology" plus "87 other courses matched."]

I hope this was somewhat interesting. The full source code can be found here: useSearch.ts. That's it for today - Cheers.