--- ---

Bogado.net

Tree node growth and the Fibonacci sequence.

I was doing the LeetCode problem called "Decode Ways" when I met a very neat property of growth on trees. The problem statement is you have a message in lowercase letters you encode it by converting each letter to a number that represents it’s position on the alphabet, 1 for 'a', 2 for 'b' and so on until you get to 26 for 'z'. Given a concatenation of the codes in how many ways can you translate this message into a set of letters.

At first it does not look like it has anything to do with trees, much less the Fibonacci sequence. If you’re curious keep reading.

Let’s count

First let’s do some rough counts, "1" can only be "a", but "11" can be both "aa" or "k". But take "4" for "d" instead, there’s no number you can add after that "4" that will change that this is indeed a "d" followed by something else. Right of the bet we can divide the digits into two categories, the ones that can change depending on who is showing after it and another group that will not change in function of numbers after them.

To the reader that is an expert in Dynamic Programing might recognize that this is starting to look a little bit about a DP problem. You can simply solve the problem somewhat recursively by noting that the number of solution can be calculated in function of the number of future solutions after removing the initial character. Look at the "11" from before, you and think that the 2 possible decode ways are two from the first "1" multiplied by the number of solutions on the rest of the string, a boring "1" in this case.

Things get interesting once you have more than one "change-able" number in a single string.

Bifurcations.

Lets imagine that each possible decode is a path in a tree, this tree start with no node before the beginning of the input string of numbers. Then it will be linked to a possible interpretation for the first letter. So in our trivial example we have the following tree.

G 0 a a 0->a k k 0->k aa aa a->aa

Now lets try with something more interesting, lets say "114411".

G cluster_0 cluster1 cluster2 cluster3 0 a a 0->a k k 0->k aa aa a->aa kd kd k->kd aad aad aa->aad aadd aadd aad->aadd aadda aadda aadd->aadda aaddk aaddk aadd->aaddk aaddaa aaddaa aadda->aaddaa kdd kdd kd->kdd kdda kdda kdd->kdda kddk kddk kdd->kddk kddaa kddaa kdda->kddaa

From now on we will colapse the single path parts of the graph, the ones shown in blue.

G 0 aadd aadd 0->aadd kdd kdd 0->kdd aaddaa aaddaa aadd->aaddaa aaddk aaddk aadd->aaddk kddaa kddaa kdd->kddaa kddk kddk kdd->kddk

Observe how the 4, the class of characters that do not deal with the state before it lead to identical subtrees on both sides of this. By looking at this we can simply derive that this problem only needs to be solved on sequences that creates bifurcations on the selection tree. This means that we can simply ignore all sequences that have a single meaning. By burning a few more neurons one can trivially verify that we only need to look at the length of such sequences to count the number of leaf nodes we will have at the end.

Fibonacci tree

Since we now only care about the length of sequence that generates bifurcations on the tree lets look at the sequence that is composed of 'n' times "1". The tree bellow is showing the case where 'n = 4'.

G cluster1 1 cluster2 11 cluster3 111 cluster4 1111 0 a a 0->a k k 0->k aa aa a->aa ak ak a->ak ka ka k->ka kk kk k->kk aaa aaa aa->aaa aak aak aa->aak aka aka ak->aka aaaa aaaa aaa->aaaa kaa kaa ka->kaa

The figure is getting very clear now, each level has exactly one contribution of the level just above it and another contribution of the level that is two steps up from that one. The number of leaf nodes on this tree is following the Fibonacci sequence that start with 1 for the input "1" and 2 for the input "11".

Solution

Now we can get to a nice solution that scans the string only once. Starting with a count of "1", go through each character in the input and check if it is a forking point. If it is a forking point measure how long the sequence of forking points is and then multiply the corresponding Fibonacci number on the previous count. Return the count when you reached the end.

Bellow is an implementation of the solution in C++.

unsigned fibonacci(unsigned);

unsigned count_decoding(string encoded) {
    unsigned fork_length = 0;
    unsigned count = 1;
    for (unsigned i = 0; i < encoded.size(); i++) {
        char current = encoded[i];
        char next = (i+1 < encoded.size())?encoded[i+1]:'X';
        if ( // is a forking point:
            (current == '1' && next != '0') ||
            (current == '2' && next > '0' && next < '7')
        ) {
            fork_length++;
        } else if (fork_length > 0) {
            // first non-forking after a sequence of forks.
            count *= fibonacci(fork_length);
            fork_length = 0;
        }
    }
    if (fork_length > 0) {
        count *= fibonacci(fork_length);
    }
    return count;
}

unsigned fibonacci(unsigned n) {
    static unordered_map<unsigned, unsigned> cache;
    if (n == 0 || n == 1) {
        return 1;
    }

    auto it = cache.find(n);
    if (it == cache.end()) {
        unsigned result = fibonacci(n - 1) + fibonacci(n - 2);
        cache.emplace(n, result);
        return result;
    }
    return it->second;
}