String Compression Algorithm in C# Using LINQ

1. Motivation

Yesterday I wrote another post about string compression using Python. You can find it here. But my original code was done in C#. So I thought I need to put it in a separate article. It will repeat the Python version, but obviously it will be done using C# and LINQ.

The assignment was to write a function to count similar letters in the input string and give the output in compressed form {number}{letter}. Below are several examples in the format
input => output:

  • AAAA => 4A
  • AABB => 2A2B
  • AABA => 2A1B1A

There are many ways to write this algorithm. Since I like using recursion wherever possible, I used it here in two ways. One is using foreach loop, and another – using LINQ.

2. Basic Recursion

Loops have been around since.. Oh well, at least for as long as I have been programming. And they still exist. Here I will show how I solved the string compression challenge

  • The first line in this method a guardian clause. It is protecting the code from Null values, and also it is working as a STOP switch for recursive calls. As soon as an empty string comes in, the recursion stops. Of course, if we call it with an empty string to begin with, the recursion will net even start.
  • After that we set counter to 0 and start looping over the input argument like we would iterate over the collection of characters; we increase the counter with each iteration
  • If next character does not equal to the first character, the loop stops
  • After that we call the method recursively passing remainder of the string argument as an argument

There’s one interesting comment about C#. Recently it embraced the idea of string slicing. Just what Python has been using for a long time. I thought I put it here for a change, and to demonstrate how it works. Look at the line

MakeSubst(str[counter..]);

The parentheses around counter (in the code slide) are just a redundancy. The notation [counter..] tells that we need to take a portion of our variable str, starting at position counter up to the end of the string. Since string is treated as 0-based array, we will be taking substring starting at the position after what we already processed. Hope this makes sense.

3. Now we can compress string using LINQ

I personally like LINQ, and I use it quite extensively in my code. It may be somewhat confusing sometimes, but when properly used, it makes code much cleaner.

The method is just one line of code split into several lines for readability.

Now let’s go and review this code line by line.

  1. Guardian clause is the first element in our turnery operator. Just what we had in the previous version
  2. Using ToList() converts the string into a list of characters. Now LINQ will treat it just like a regular collection
  3. TakeWhile() will use filter to only take those character that we need. It return IEnumerable<char>
  4. GroupBy() will group elements by the character. And since char is the only type it holds, we specify l => l
    Keep in mind that after grouping function we get IEnumerable<IGrouping<>> as a result
  5. Select() function will work with collection of IGrouping<> objects.
    And since Takewhile() only selected similar characters, we will only have a single IGrouping object in the collection, from which we need Count() and First() element to add to the final output string
  6. The recursive call is coming from inside Select() function. As a new argument we send our input argument, first removing Count() characters starting from position 0.
  7. The recursion stops when we reach the end, meaning sending empty string as an argument

I hope this is not too confusing.

Stay positive, keep coding!
Gene