Programming for COWARDS: Algorithms

Showing posts with label Algorithms. Show all posts

Thursday, January 21, 2010

A Misleading Name of Computer Science Concept: Dynamic Programming

Have you heard of these words before?
Probably yes!
First let's make a clear (well, not so clear) difference, clearer. Dynamic Programming is not about Dynamic Typing. Dynamic typing is a property of a particular programming language. Languages like Lisp, Python, and Ruby are all dynamically typed languages. Unlike those, languages like C/C++, Java, and C# are statically typed language. The difference relies on whether the compiler checks certain things (types, overload resolution, etc.) before running the program or not.
Dynamic Programming on the other hand is something very different. It's an ancient method for solving problems by basically dividing these problems into smaller, and easier to solve problems.

It's mainly focused on addressing these two issues:

Overlapping subproblems
Optimal Substructures

I think the best way to explain these two fuzzy concepts is by using examples. To start let's see an example of using a famous dynamic programming technique called Memorization. Assume that we need to implement a function that calculates a Fibonacci sequence. I know this is easy, but let's look at this first implementation:

static int FibClassic(int n, ref int numberOfStepsTaken)
        {
            numberOfStepsTaken += 1;
            if (n <= 1)
                return 1;
            Console.WriteLine("FibClassic called with: {0}", n);
            return FibClassic(n - 1, ref numberOfStepsTaken) + FibClassic(n - 2, ref numberOfStepsTaken);
        }

Note: on the above code I used two counter variables to count the number of times the method executed. I also printed the input on which the method is called every time, just to give you a hint of how dividing a problem can cause the same subproblem to be computed more than once --Subproblem Overlapping, remember! Now let's run the code given the number 6 as input and see what happens:

int z = 0;
            int x = FibClassic(6, ref z);
            Console.WriteLine("{0}: {1}", x, z);

And here's the result of this call:

As you can see the method has been called first with input 6 which is the initial input we passed in, then 5, 4, 3, 2, 2, ... oops!! Can you spot it? the method is being called on the same input more than once! Think about this for a while, pretty logical, yeah!? Our strategy is based on dividing the problem into simpler subproblems. For example to get the 6th item in the Fibonacci sequence we divide the problem into two smaller problems, getting the 5th item, and 4th item and adding them together, then to get the 5th, you should get the 4th and 3rd, etc. The next figure shows how is this working.

As you notice the overlapping happens when solving one part of the problem includes solving another part of the problem, in such a case we can take advantage of this and simply memorize the solution for the overlapped problem and each time we need that result, we don't have to compute it again, we just supply it from wherever we stored it. Here's the modified method to do it:

static int FibFast(int n, ref int numberOfStepsTaken, Dictionary store)
        {
            numberOfStepsTaken += 1;
            if (n <= 1)
                return 1;
            if (!store.ContainsKey(n))
                store[n] = FibFast(n - 1, ref numberOfStepsTaken, store) + FibFast(n - 2, ref numberOfStepsTaken, store);
            return store[n];
        }

If you run that same method on the same input (6) you should get 13 as the result (the same old result) but the number of iterations would be 11 which is about half the number of iterations the first method take. This doesn't seem to be a very huge enhancement, but let's see how the two methods act for bigger numbers. Running the first method of input equals to 30 we get the result 1346269 and number of iterations 2692537. Now running the second method on the same input (30) we get the result 1346269 -which is the same result- and number of iterations 59! That's a HUGE difference!

Now back to core, what does this have to do with the term Dynamic Programming??
Actually, it's a very misleading term, historically it was invented by a mathematician called Bellmen and he was at the time being paid by the US defense department to work on something else. He didn't want them to know what he was doing, so he made up a name that he was sure it has no clue what it actually meant. Now we have to live with this forever :)

Thursday, January 14, 2010

A Quick Tip: The Different Classes Of Algorithms

Algorithms are the heart of computer science. They are the thoughts, the ideas, and the most fun part of this industry. Scientists categorize the various known algorithms into 4 classes: Logarithmic, Linear, Quadratic and Exponential. Let's look at those briefly:
1- Logarithmic Algorithms:
This type is the fastest of those 4 classes. It has a run curve that looks something like this:

Where the x here is the number of items to be processed by the algorithm, and y is the time takes by the algorithm. As you can see from the figure, the time taken increases slowly when the number of items to be processed increase. Binary Search is a perfect example for a logarithmic algorithm. If you recall, binary search, divides the array into two halves and excludes one half each time it does a search. Here's a code example of binary search implementation in C#:

bool BSearch(int[] list, int item, int first, int last)
{
 if(last - first < 2)
             return list[first] == item || list[last] == item;
             
        int mid = (first + last)/2;
        if(list[mid] == item)
             return true;
             
        if (list[mid] > item)
                return BSearch(list, item, first, mid - 1);
 
 return BSearch(list, item, mid + 1, last);
}

2- Linear Algorithms:
Runs in time, linear to the input items. It's curve looks like the following:

A linear search is a typical example of that, where one would traverse the array or, whatever data structure, item by item. An implementation of linear search looks like this:

bool LinearSearch(int[] list, int item)
{
     for(int i = 0; i < list.Length; i++)
          if(list[i] == item)
              return true;
        return false;
}

3- Quadratic Algorithm: This one runs time grows to to the power of 2, with each increase in the input sizes, which means while processing 2 items the algorithm will do 4 steps, 3 items will take 9 steps, 4 items will take 16 steps, etc. The next figure shows how a quadratic curve might look like:

A famous example of an algorithm with quadratic growth is Selection Sort:

void SelectionSrot(int[] list)
{
            int i, j;
            int min, temp;

            for (i = 0; i < list.Length - 1; i++)
            {
                min = i;

                for (j = i + 1; j < list.Length; j++)
                {
                    if (list[j] < list[min])
                    {
                        min = j;
                    }
                }

                temp = list[i];
                list[i] = list[min];
                list[min] = temp;
            }
}

4- Exponential Algorithm:
This is the super slow of this list of four. It grows exponentially (that is, for 2 items it takes 2 ^ 2, for 3 items it takes 2^3, for 4 items it takes 2 ^ 4, etc.
Again algorithms are the key to computer science, the most fun part of programmer's job. Choose your algorithms carefully and always try to improve.

Hope this helps.

Wednesday, January 13, 2010

Remove Duplicate Items From an Array - A Classic Puzzle

Today I cam a cross a kinda cool problem. A friend of mine who is at the same time a colleague working with me at the same office was trying to remove a duplicate item from an array of positive integers.
The array has n items all unique except only one item. We need to come up with an algorithm that tells us which item is duplicated in maximum time of
O(n) where n is the length of the array.

To do that, let's first come up with some (less efficient) working algorithms. Here's the first one:

We loop over the array starting from the item at index 0, then we traverse the remaining part of the array (1 : n-1) searching for the first item
The C# code for this algorithm looks like the following:

  static int FirstDuplicate(int[] arr)
         {
             for(int i = 0; i < arr.Length - 1; i++)
             {
                 for(int j = i + 1; j < arr.Length; j++)
                 {
                     if (arr[i] == arr[j])
                         return arr[i];
                 }
             }
             return -1;
        }

As you can see, this algorithm is pretty bad. Foreach item in the array an inner loop is initiated to linerally look for that specific element in the
rest of the array. If you do the math, you shall find that this algorithm runs in O(n2) order of growth.

One way to improve this is to use an extra HashSet to store the items, and then look up each item in the HashSet. This is considered improvement as the
lookup inside the HashSet is really fast.
Here's the code in C#:

 static int FirstDuplicateWithHashSet(int[] arr)
        {
            HashSet hashHset = new HashSet(); 
            for(int i = 0; i < arr.Length; i++)
            {
                if (hashHset.Contains(arr[i]))
                    return arr[i];
                hashHset.Add(arr[i]);
            }

            return 0;
        }

This is pretty good, but still not O(n).
The next algorithm is quite tricky. The idea simply is to create a second array and insert each elemnt in the first array at an index equivalent to its
value in the second array.
For example if we have a list of 5 items [2, 4, 5, 2, 6], where the item 2 is duplicated at the 0 index and third index, and the maximum value in this
array is 6. Now to find the duplicates in this list we create a second list with length 6 ( the length of the second array equals the maximum value in
the first array). After creating this second list we loop over the first array take the first item (2 in our case) and insert it at the index 2 in
the second array, then we take the second item (which is 4) and insert it at the 4th index in the second array, and so on. Each time we try to insert
an item in the second array we check if it has a value first, if it is, then this item is duplicated.
here's how the code would look like:

static int FirstDuplicate(int[] arr, int maxVal)
        {
            int[] temp = new int[maxVal+1];
            for(int i =0; i < arr.Length; i++)
            {
                if (temp[arr[i]] == arr[i])
                    return arr[i];
                temp[arr[i]] = arr[i];
            }
            return 0;
        }

Note: The method expects the maximum value as an input, however, if you don't get the maximum value you can create the temp array with a size
equal to int.MaxValue (which is not a good idea)

This algorithm is probably not realistic but it doesn run in O(n) time.
One more trick to add here, if you know the range of the items in the array (e.g. from 1 to 10) you can get the sum of the numbers of the array
then subtract from it the sum of the numbers from 1 to 10, the remainder is the duplicated value.

That was a quick tip that I thought is cool and wanted to share with ya! So what do you think dear fellows? Do you know of any possibly better
algorithms? Do you suggest any optimization to the current ones?