Thursday, January 21, 2010

A Misleading Name of Computer Science Concept: Dynamic Programming

Have you heard of these words before?
Probably yes!
First let's make a clear (well, not so clear) difference, clearer. Dynamic Programming is not about Dynamic Typing. Dynamic typing is a property of  a particular programming language. Languages like Lisp, Python, and Ruby are all dynamically typed languages. Unlike those, languages like C/C++, Java, and C# are statically typed language. The difference relies on whether the compiler checks certain things (types, overload resolution, etc.)  before running the program or not.
Dynamic Programming on the other hand is something very different. It's an ancient method for solving problems by basically dividing these problems into smaller, and easier to solve problems.

It's mainly focused on addressing these two issues:

  1. Overlapping subproblems 
  2. Optimal Substructures
I think the best way to explain these two fuzzy concepts is by using examples. To start let's see an example of using a famous dynamic programming technique called  Memorization. Assume that we need to implement a function that calculates a Fibonacci sequence. I know this is easy, but let's look at this first implementation:
static int FibClassic(int n, ref int numberOfStepsTaken)
        {
            numberOfStepsTaken += 1;
            if (n <= 1)
                return 1;
            Console.WriteLine("FibClassic called with: {0}", n);
            return FibClassic(n - 1, ref numberOfStepsTaken) + FibClassic(n - 2, ref numberOfStepsTaken);
        }
Note: on the above code I used two counter variables to count the number of times the method executed. I also printed the input on which the method is called every time, just to give you a hint of how dividing a problem can cause the same subproblem to be computed more than once --Subproblem Overlapping, remember! Now let's run the code given the number 6 as input and see what happens:
int z = 0;
            int x = FibClassic(6, ref z);
            Console.WriteLine("{0}: {1}", x, z);

And here's the result of this call:

As you can see the method has been called first with input 6 which is the initial input we passed in, then 5, 4, 3,  2, 2, ... oops!! Can you spot it? the method is being called on the same input more than once! Think about this for a while, pretty logical, yeah!? Our strategy is based on dividing the problem into simpler subproblems. For example to get the 6th item in the Fibonacci sequence we divide the problem into two smaller problems, getting the 5th item, and 4th item and adding them together, then to get the 5th, you should get the 4th and 3rd, etc. The next figure shows how is this working.



As you notice the overlapping happens when solving one part of the problem includes solving another part of the problem, in such a case we can take advantage of this and simply memorize the solution for the overlapped problem and each time we need that result, we don't have to compute it again, we just supply it from wherever we stored it. Here's the modified method to do it:
static int FibFast(int n, ref int numberOfStepsTaken, Dictionary store)
        {
            numberOfStepsTaken += 1;
            if (n <= 1)
                return 1;
            if (!store.ContainsKey(n))
                store[n] = FibFast(n - 1, ref numberOfStepsTaken, store) + FibFast(n - 2, ref numberOfStepsTaken, store);
            return store[n];
        }

If you run that same method on the same input (6) you should get 13 as the result (the same old result) but the number of iterations would be 11 which is about half the number of iterations the first method take. This doesn't seem to be a very huge enhancement, but let's see how the two methods act for bigger numbers. Running the first method of input equals to 30 we get the result  1346269 and number of iterations  2692537. Now running the second method on the same input (30) we get the result 1346269 -which is the same result- and number of iterations 59! That's a HUGE difference!

Now back to core, what does this have to do with the term Dynamic Programming??
Actually, it's a very misleading term, historically it was invented by a mathematician called Bellmen and he was at the time being paid by the US defense department to work on something else. He didn't want them to know what he was doing, so he made up  a name that he was sure it has no clue what it actually meant. Now we have to live with this forever :)

The Zen Of Python

I found these fantastic guiding principles of the design of Python published here by Tim Peters, and  once I saw them, I was like, Man! I ought to share these, so look them up on the site, or if you don't want to leave programming for COWARDS just yet, I'm quoting them here for you ;)  :

The Zen of Python


Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Those quotes are pure gold. One could think of writing them down on a board and keep that board hanged in front of his eyes, right above the computer screen!

Awesome, awesome stuff!

Tuesday, January 19, 2010

Into Git and Loving It

As it is gaining popularity day by day,and as everyone I know likes it, I thought .. well, let's give it a try.
Git is a powerful source control system. It's free, and open source. It's pretty simple to use, it's intended to handle both small and large projects, and it's so freakingly FAST.



Unlike SVN and TFS, Git is not centeralized. This means that every Git clone is a fully fledged repository with all versions and history information available.

Git is written in C (well, mostly) and is available for many platforms. If you are a linux guy you can download Git from here and if you are a Windows fellow then download msygit.

This post is aimed to be a tutorial introduction to Git. Together we will walk through the process of creating the repository, adding initial files to the repository, committing those files, changing them, creating branches, viewing differences, resolving conflicts, merging and finally pushing to the centeralized repository.

Now to get started let's assume that you want to create a simple project which is basically an html file, a bunch of javascript files and a css file. Pick whatever directory you wish to create your initial files. After you're done right click the directory containing your files and select "Git Bash Here" (assuming that you already installed msygit). A command window will show up, to initialize your repository enter:
git init-db
This will create the repository for you (and will also create a default "master" branch). Now enter:
git add -a
to add all the files in this directory to the repository. You can select specific files by specifying the required file name like so:
git add filename.css
The selected files are now tracked by Git to your repsoitory, to commit those files to the repository (the LOCAL repository, remember?) :
git commit -a
This will open vim for you to enter a commit message. Enter your message, and quite vi, then Git will commit your changes to the DB.

Note: 
If you're not familiar with vim or vi, to quite the editor saving the changes type ":wq" -that's colon wq- or ":q" to quit without saving. Or you can specify -m to the git commit command followed by the message you want in double quotes, like so: 
git commit -a -m "my initial commit"

Now try editing some files and then, to view the changes, type:
git diff
This will highlight the changes between your uncommitted version and the last committed version. If you want to see the history of your changes at any time, user:
git whatchanged
or use:
git whatchanged -p
to see the complete differences at each change.

The very cool thing about Git is how easily it enables you to create new branches. For example, if you want to create an expreimental function inside a class file you're working on or inside a javascript file but you want it away from the master branch, you can easily create a new branch, checkout this branch, edit your file, and all the changes will be completely isolated from the master branch. To create a new branch:
git branch testBranch
git branch
The first command will create testBranch for you and the second command will list all the available branches on your repository (by far there should be only testBranch and the master branch "master").

Note: The selected branch has an * displayed before it. 


Now enter :
git  checkout testBranch
to switch to the newly created branch. Try editing somefiles, and show the diffs.
git diff
You should now see the changes you made, commit those changes. And switch to the master branch:
git commit -a
git checkout master
If you look at the files you just edited when you were on testBranch -after switching to the master branch- you would notice that the changes you made while you were on testBranch are gone. However, if you switch to the testBranch again you will see your changes there. If you're happy with the changes you made on testBranch, you can merge those changes to the master branch by using the following command:
git merge testBranch
If there is no confilcts, you're allset. If there are conflicts you will have to resolve them manually and then commit the file. Git will show you which files have conflicts.
Now, if you are done with the test branch and want to delte it, use:
git branch -d testBranch

If you used to be a TFS and VSS guy like myself, take a deep sigh of relief and enjoy Git.
I will try to blog more on Git on the next posts, stay tuned!

Saturday, January 16, 2010

The Selected ORM and Isolation Framework

A while ago I posted two little polls about the favorited isolation framework, and ORM. The results of the two posts were as follows

Which ORM?
1- NHibernate 49%(44 votes)
2- LLBLGEN 34%(31 votes)
3- Entity Framework 6%(6 votes)
4- LINQ to SQL 6%(6 votes)
5- Subsonice 2%(2 votes)

Which Isolation Framework?
1- Typemock Isolator 50%(10 votes)
2- Moq 45%(9 votes)
3- NMock2 5%(1 votes)
4- RhinoMocks 0%(0 votes)
5- Stubs 0%(0 votes)

For myself I voted for NHibernate and Moq. I know all of those tools are purely awesome, though.


So, dear reader, do you have any other suggestions for an ORM or an Isolation Framework?

Thursday, January 14, 2010

A Quick Tip: The Different Classes Of Algorithms

Algorithms are the heart of computer science. They are the thoughts, the ideas, and the most fun part of this industry. Scientists categorize the various known algorithms into 4 classes: Logarithmic, Linear, Quadratic and Exponential. Let's look at those briefly:
1- Logarithmic Algorithms:
   This type is the fastest of those 4 classes. It has a run curve that looks something like this:




Where the x here is the number of items to be processed by the algorithm, and y is the time takes by the algorithm. As you can see from the figure, the time taken increases slowly when the number of items to be processed increase. Binary Search is a perfect example for a logarithmic algorithm. If you recall, binary search, divides the array into two halves and excludes one half each time it does a search. Here's a code example of binary search implementation in C#:
bool BSearch(int[] list, int item, int first, int last)
{
 if(last - first < 2)
             return list[first] == item || list[last] == item;
             
        int mid = (first + last)/2;
        if(list[mid] == item)
             return true;
             
        if (list[mid] > item)
                return BSearch(list, item, first, mid - 1);
 
 return BSearch(list, item, mid + 1, last);
}

2- Linear Algorithms:
  Runs in time, linear to the input items. It's curve looks like the following:



A linear search is a typical example of that, where one would traverse the array or, whatever data structure, item by item. An implementation of linear search looks like this:

bool LinearSearch(int[] list, int item)
{
     for(int i = 0; i < list.Length; i++)
          if(list[i] == item)
              return true;
        return false;
}
3- Quadratic Algorithm:    This one runs time grows to to the power of 2, with each increase in the input sizes, which means while processing 2 items the algorithm will do 4 steps, 3 items will take 9 steps, 4 items will take 16 steps, etc. The next figure shows how a quadratic curve might look like:


A famous example of an algorithm with quadratic growth is Selection Sort:

void SelectionSrot(int[] list)
{
            int i, j;
            int min, temp;

            for (i = 0; i < list.Length - 1; i++)
            {
                min = i;

                for (j = i + 1; j < list.Length; j++)
                {
                    if (list[j] < list[min])
                    {
                        min = j;
                    }
                }

                temp = list[i];
                list[i] = list[min];
                list[min] = temp;
            }
}

4- Exponential Algorithm:
This is the super slow of this list of four. It grows exponentially (that is, for 2 items it takes 2 ^ 2, for 3 items it takes 2^3, for 4 items it takes 2 ^ 4, etc.
Again algorithms are the key to computer science, the most fun part of programmer's  job. Choose your algorithms carefully and always try to improve.

Hope this helps.

Tips For Web Developers: Minify Javascript Using Google's Closure Compiler

A faster web site is a goal for every develoepr. We spend a lot of time optimizing server code and processes, parallelizing stuff, indexing
and trying to enhance database performance. The goal behind all of these complicated actions is the ultimate goal: A Faster Site.
These actions are necessary and handy to decrease your site response time, however, there are a lot of stuff that we, developers, usually ignore. Those are the optimizations required to happen on the client side. These client side optimizations are as important (probably more imprtant) as the server side optimizations.
There are a lot of techniques to optimize the client side performance. These techniques include:


  • Making Less HTTP Requests
  • Optimizing JavaScript
  • Optimizing CSS


And many more. For full details about all the possible techniques, check out Google's page speed initiative.
In this post I will focus on optimizing javascript by minifying js files. For this I leverage Closure Compiler, which is a very awesome tool to optimize Javascript developed by Google.
Now suppose that I have an html document that looks like the following:




This document shall do a very little job actually. Simply a user keys his name in the textbox and clicks salute me which we will display a simple alert saying hello to this user.

Below are two buttons -hide, and show. As the name of each button implies, the hide button will hide the area including the label, text box, and the Salute me button.

To do this I'm gonna make use of jQuery 1.3.2. Here's the code needed to achieve the required functionality:
/*
 The follwoing code is not part of jqueyr framework
*/
$(document).ready(function() { 
 $('#hiFiver').click(function() { 
  var userName = $('#txtName').val();
  alert ("Hello " + userName); 
 });
 
 $('#showButton').click(function() { 
  $('#hidden').show();
 });
 
 $('#hideButton').click(function() { 
  $('#hidden').hide(); 
 });
});
 For the sake of this demo I will append my code at the end of the actual jQuery code itself. Now my app is working as required, and my Javascript file  size is 122 KB.

 Now let's run Closure Compiler and try to minify this.
 Closure Compiler is a java application, which means that you will need jre (Java Runtime Engine) to run it. You can download it from here

 Now I got my files (sample.html, script.js, compiler.jar) in one directory. To run the compiler, launch your terminal, and enter the follwoing command:

 java -jar compiler.jar --js script.js --js_output_file scriptMini.js

 Check your directory. You should find a new file with the name "scriptMini.js" created. The new script file is 55 KB in size, which is less the half size  of the original file. To make sure it's working, change the script src attribute in the sample document to point to the new file. You should see that  the app is still functioning  properly.

 If you examin the newly created file, you will find that it doesn't contain any comments or spaces, all the optional semi-colons are removed, variable names are  changed to shorter names (usually one character long).

 The closure complier is a fascinating tool, it's really handy and easy to use.
 From now on you should develop the habit of always minifing your Javascript files.

Wednesday, January 13, 2010

Why is O(n) Is Pronounced "Big Oh" of n And Why Is "Geek" So Close To "Greek"?



This one is really short, Today, some guy on the internet asked why is the asymptotic form O(n) is pronounced Big Oh? I instantly answered "because we are using the capital letter O to write it".

This answer is wrong, the letter used is the Greek letter Omicron.
So that's why it's pronounced Big Oh!




And the same reason applies to Big Theta, and Big Omega, because of using the Greek letters Theta and Omega represented in the above picture (the second and third figure respectively). It's all coming from the Greeks buddy!
And probably this is also why the word Geek is so close to the word Greek in spelling, I'm not sure about this one though.


Remove Duplicate Items From an Array - A Classic Puzzle

 Today I cam a cross a kinda cool problem. A friend of mine who is at the same time a colleague working with me at the same office was trying to remove a duplicate item from an array of positive integers.
 The array has n items all unique except only one item. We need to come up with an algorithm that tells us which item is duplicated in maximum time of
 O(n) where n is the length of the array.

 To do that, let's first come up with some (less efficient) working algorithms. Here's the first one:

 We loop over the array starting from the item at index 0, then we traverse the remaining part of the array (1 : n-1) searching for the first item
 The C# code for this algorithm looks like the following:
  static int FirstDuplicate(int[] arr)
         {
             for(int i = 0; i < arr.Length - 1; i++)
             {
                 for(int j = i + 1; j < arr.Length; j++)
                 {
                     if (arr[i] == arr[j])
                         return arr[i];
                 }
             }
             return -1;
        }
 As you can see, this algorithm is pretty bad. Foreach item in the array an inner loop is initiated to linerally look for that specific element in the
rest of the array. If you do the math, you shall find that this algorithm runs in O(n2) order of growth.

One way to improve this is to use an extra HashSet to store the items, and then look up each item in the HashSet. This is considered improvement as the
lookup inside the HashSet is really fast.
Here's the code in C#:

 static int FirstDuplicateWithHashSet(int[] arr)
        {
            HashSet hashHset = new HashSet(); 
            for(int i = 0; i < arr.Length; i++)
            {
                if (hashHset.Contains(arr[i]))
                    return arr[i];
                hashHset.Add(arr[i]);
            }

            return 0;
        }
This is pretty good, but still not O(n).
The next algorithm is quite tricky. The idea simply is to create a second array and insert each elemnt in the first array at an index equivalent to its
value in the second array.
For example if we have a list of 5 items [2, 4, 5, 2, 6], where the item 2 is duplicated at the 0 index and third index, and the maximum value in this
array is 6. Now to find the duplicates in this list we create a second list with length 6 ( the length of the second array equals the maximum value in
the first array). After creating this second list we loop over the first array take the first item (2 in our case) and insert it at the index 2 in
the second array, then we take the second item (which is 4) and insert it at the 4th index in the second array, and so on. Each time we try to insert
an item in the second array we check if it has a value first, if it is, then this item is duplicated.
here's how the code would look like:

static int FirstDuplicate(int[] arr, int maxVal)
        {
            int[] temp = new int[maxVal+1];
            for(int i =0; i < arr.Length; i++)
            {
                if (temp[arr[i]] == arr[i])
                    return arr[i];
                temp[arr[i]] = arr[i];
            }
            return 0;
        }
 Note: The method expects the maximum value as an input, however, if you don't get the maximum value you can create the temp array with a size
 equal to  int.MaxValue (which is not a good idea)

 This algorithm is probably not realistic but it doesn run in O(n) time.
 One more trick to add here, if you know the range of the items in the array (e.g. from 1 to 10) you can get the sum of the numbers of the array
 then subtract from it the sum of the numbers from 1 to 10, the remainder is the duplicated value.

 That was a quick tip that I thought is cool and wanted to share with ya! So what do you think dear fellows? Do you know of any possibly better
 algorithms? Do you suggest any optimization to the current ones?

Wednesday, January 6, 2010

Scalability Tip In ASP.NET And The MachineKey Element

Yesterday, I came a cross a pretty annoying problem with a web application I'm working on nowadays.
In short, the app is an ASP.NET MVC 1 app, that uses forms authentications to handle users logins.
The app was working just fine, but when I started to scale the app and deploy it to more servers in the server farm, the weired behavior started to show up. When a user logs in to the application, the application preforms the operation successfully, logs in the user to the system and takes him to his personalized page. That sounds normal, however, if the user navigated to another part of the application (or just refreshes the current page), the application no longer recognizes him as a logged in user! If he refreshes two or three times, the app will see him as a logged in user again, a few more refreshes and he's nor more logged in and so on.

Similar Problem: 
I encountered  a similar problem before, but the other one was because of accidental session expiration, and this happened because I was saving SessionState InProc, which (as you may have guessed) will be stored on the server memory (i.e. will not be shared between all server in the web farm).

This Problem: 
This problem is different because I'm not using SessionState at all. I'm just using cookies and you know that cookies are stored on the client, and is sent to the server with every request, so it will be sent to all servers (i.e. all servers should be able to read the cookie and determine if the user is logged in or not).

How Cookies Are Written: 
This got me thinking, the problem must be with the cookie itself. It seems like some servers can read the cookie successfully, and some can't. Why would that be?!!

Different Encryption/Decryption Key/Algorithm:
Aha .... Cookies are encrypted before they are written on the client and decrypted before they are read again by the server. When the server that served the login request wrote the cookie, it encrypted it first (using an AutoGenerated encryption key and its chosen encryption algorithm. So apparently the chosen encryption keys and/or algorithms are different across the severs!

The Solution: 
The solution is quite simple actually. All I need to do is to ensure that all the servers use the same encryption algorithms and keys.
This can be done by explicitly specifying the keys and algorithms in web.config inside the machineKey  tag. 
It should look something like this:



PS: The keys lengths depend on the algorithms selected

  • For SHA1, set the validationKey to 64 bytes (128 hexadecimal characters).
  • For AES, set the decryptionKey to 32 bytes (64 hexadecimal characters).
  • For 3DES, set the decryptionKey to 24 bytes (48 hexadecimal characters).


The keys can be generated whatever way you like. Here's a simple function for generating these keys: 




static string GenerateKey(int requiredLength)
    {
        byte[] buffer = new byte[requiredLength / 2];
        RNGCryptoServiceProvider rng = new
                                RNGCryptoServiceProvider();
        rng.GetBytes(buffer);
        StringBuilder sb = new StringBuilder(requiredLength);
        foreach (byte t in buffer)
            sb.Append(string.Format("{0:X2}", t));
        return sb.ToString();
    }





For more information about the tag see here, and here to how to configure it, and if you wanna scroll a full page see this for recommendations about deploying to server farms. 

Hope this helps.

Mix 2010 and My Chosen Sessions

Mix 2010 is open for public!



Microsoft is using a different strategy for a major conference (Mix) this year. Developers and designers can now submit their sessions and those sessions will be voted up by the community. The chooses sessions will be included in the conference.
Here's a list of all the sessions available for voting. The voting started yesterday, Jan 5th. and will last for for 10 days. The selected sessions will be announced Jan 18th.
Go ahead and vote for your session of choice. A lot of sessions out there? Would you like me to recommend some sessions to you?
O.K I will ....
Here are the sessions that I voted for:

Go ahead now, visit VisitMix.com and vote!