String Compression in Python, Imperative & Functional Ways

1. Motivation

I was given this challenge at the coding part of interview for one of the companies. There I was solving in C#. However, I would never miss the opportunity to do that in Python! So here we go.

The assignment was to write a function to count similar letters in the input string and give the output in compressed form {number}{letter}. Below are several examples in the format
input => output:

  • AAAA => 4A
  • AABB => 2A2B
  • AABA => 2A1B1A

There are many ways to implement this requirement in code. Below I will give two possible solutions. The first is more ‘standard” if you will, because that’s most likely how you can find it online. The other solution is the one I would most likely favor. But it depends on many other circumstances. So, let’s take a look.

2. "More traditional" way to implement string compression

I am using words “more traditional” loosely because based on what I see online, function recursion is not the first choice for many solutions. However, that would most likely be for me. Having PROLOG as my first “real” language many years ago, my whole programming paradigm was built on recursion. And I like implementing it in my code today whenever I see it to be a god fit.

I called the function parse, which perhaps is not the best name. Bit it is what it is. Let’s discuss what we have here.

  1. The very first “if” check serves as a guardian clause to avoid None or empty string, and it also stops the recursion by returning an empty string.
  2. Now we can set counter to 0, and set first equal to the first character of the string. Since it passed through guardian clause, it is not empty.
  3. The “for” loop simply goes over the letters in the string comparing each next letter to the first one and increasing the counter. As soon as it sees the first letter different from the current block of letters, the loop stops.
  4. Finally, the function returns f'{counter}{first}’ + result of calling itself recursively. However, as an argument it passes the slice of the original string, dropping the number of characters equal to the counter. I used string interpolation here, but I could just convert integer value to string.

Hopefully this simple algorithm is easy to figure out. Let’s take a look at another possible solution. I will call it “functional” way.

3. Functional way to implement string compression

This is more functional way to solve the challenge.The whole return value is basically a single line of code split in 3 lines using “\” at the end. This may be not as easy to follow as the first solution. Since this was a quick “proof of concept” solution, I don’t expect it to be the best, and certainly not the only way to solve it. But that’s a start.

You can click on the picture to see it full size.

Lets look at each function used in the code above

  1. Right after return you can see Python variation of ternary operator – [value_true] if [condition] else [value_false].
    So line return ” if not s is identical to the guardian clause in the first algorithm. The else return value is implementing the rest of the logic.
  2. The loop is implemented as takewhile(lambda c: c == s[0], s)
    This will iterate over characters of the string argument for as long as the letter equal to the first letter of the string argument (s[0]). I used “s” just to make it shorter on the screen. However, “takewhile” returns a generator, therefore we need the next function
  3. And the next function in this chain is list([generator]). Generator returns one item at a time, and list() function gets all items out of it.
    However, it returns a list, so string “AAA” will become [‘A’, ‘A’, ‘A’]
  4. By using len([list]) function we get the number of elements found in this substring. By combining it together with the first element of the argument, we get “3A” out of [‘A’, ‘A’, ‘A’]. And this creates a “head” of every recursive call. Now we need to create a tail.
  5. The tail is created by using dropwhile(lambda c: c == s[0], s) function, which drops the number of elements already taken by the “head”.
  6. Since dropwhile is also a generator, we use list() as well.
  7. But we cannot pass a list with recursive call, so “”.join() function will join elements of the list into a string, and the resulting string will be passed as a new argument into recursive call.
  8. The recursion will stop when we take all characters out of the string and pass in an empty string.

I hope this all makes sense. There will definitely be more examples of using functional style, list comprehensions, lambdas etc. So stay tuned.

Make a wonderful day happen!
Gene