Translate

Monday, February 27, 2012

Algorithm Complexity



Runtime complexity of an algorithm is always a huge confuse for me when I was at the college. Further, the math of variously termed Big O notation was hard to realize in practice. Trying to find a definition from Google, I would agree “Big O notation is used in Computer Science to describe the performance or complexity of an algorithm”. Generally Big O describes the worst-case scenario, and can further be used to describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm. I found the best way to understand Big O thoroughly was to produce some examples in code, which in fact not my own original thought. For Big O, I generally would compare the complexity of algorithm comparative to the input. Big O notation denotes the limiting factor of an algorithm. It’s a simplified expression of how run time of an algorithm scales with relation to the input.

For example (in Java):

/** takes an array of strings and concatenates them

* @param strings the array of strings to concatenate

* @returns a string that is a result of the concatenation of all the strings

* in the array

*/

public static String badConcat(String[] Strings){

String totalString = "";

for(String s: strings){

for(int i = 0; i < s.length(); i++){

totalString+=s.charAt(i);

}

}

return totalString;

}

Now think about what this is actually doing. It is going to throw every character of input and adding them together. This seems straightforward. The problem is that String is immutable. So every time you add a letter onto the string you have to create a new String. To do this you have to copy the values from the old string into the new string and add the new character. This means You will be copying the first letter n times where n is the number of characters in the input, you will be copying the character n-1 times... so total there will be (n-1)*n/2 copies. This is (n^2-n)/2 and for big O notation we use only the highest magnitude factor(usually) and drop any constants that are multiplied times it and we end up with big O(n^2). Using something like a StringBuilder will be along the lines of O(nLog(n)). If you calculate the number of characters at the beginning and set the capacity of the StringBuilder you can get it to be O(n). So if we had 1000 characters of input, the first example would perform roughly a million operations. The simple StringBuilder would perform 10,000, and the StringBuilder with set capacity would perform 1000 operations to do the same thing. This is rough estimate, but O(n) notation is about orders of magnitudes, not exact run times.

O(1)

O(1) describes an algorithm that will always execute in the same time (or space) regardless of the size of the input data set.

bool checkFirstBlock(String[] strings)

{

if(strings[0] == null)

{

return true;

}

return false;

}

O(N)

O(N) describes an algorithm whose performance will grow linearly and in direct proportion to the size of the input data set. The example below also demonstrates how Big O favors the worst-case performance scenario; a matching string could be found during any iteration of the for loop and the function would return early, but Big O notation will always assume the upper limit where the algorithm will perform the maximum number of iterations.

bool ContainsValue(String[] strings, String value)

{

for(int i = 0; i < strings.Length; i++)

{

if(strings[i] == value)

{

return true;

}

}

return false;

}

O(N2)

O(N2) represents an algorithm whose performance is directly proportional to the square of the size of the input data set. This is common with algorithms that involve nested iterations over the data set. Deeper nested iterations will result in O(N3), O(N4) etc.

bool ContainsDuplicates(String[] strings)

{

for(int i = 0; i < strings.Length; i++)

{

for(int j = 0; j < strings.Length; j++)

{

if(i == j) // Don't compare with self

{

continue;

}

if(strings[i] == strings[j])

{

return true;

}

}

}

return false;

}

O(2N)

O(2N) denotes an algorithm whose growth will double with each additional element in the input data set. The execution time of an O(2N) function will quickly become very large.

Logarithms

Binary search is a technique used to search sorted data sets. It works by selecting the middle element of the data set, essentially the median, and compares it against a target value. If the values match it will return success. If the target value is higher than the value of the probe element it will take the upper half of the data set and perform the same operation against it. Likewise, if the target value is lower than the value of the probe element it will perform the operation against the lower half. It will continue to halve the data set with each iteration until the value has been found or until it can no longer split the data set.

This type of algorithm is described as O(log N). The iterative halving of data sets described in the binary search example produces a growth curve that peaks at the beginning and slowly flattens out as the size of the data sets increase e.g. an input data set containing 10 items takes one second to complete, a data set containing 100 items takes two seconds, and a data set containing 1000 items will take three seconds. Doubling the size of the input data set has little effect on its growth as after a single iteration of the algorithm the data set will be halved and therefore on a par with an input data set half the size. This makes algorithms like binary search extremely efficient when dealing with large data sets.

Oh, and do I use it?

Yes, all the time - when I'm figuring out how efficient my code is it gives a great 'back-of-the-envelope- approximation to the cost. Obviously, you may find 100 examples in internet for Big O. They may include polynominal, factorial, logarithm and other boring math. However many experienced developers I often use to consult confused at why they really need this information and could that be useful for them to save the day. Well, what the Big-O Is Good For? The good news is that the Big-O belongs to an entire family of notation. It also serves as a good indicator of what algorithm to use once you take your individual circumstances into consideration. I do use Big O notation on occasion to convey algorithmic complexity to fellow programmers. I use the underlying theory (e.g. Big O analysis techniques) all of the time when I think about what algorithms to use.

In a nutshell, the Big-O of a given algorithm combined with the specific problem knowledge is a great way to choose the best algorithm for your situation. I also agree that the Big-O lives in the land of theory and doesn't care very much about the real world.

So why is Big-O commonly associated with worst-case running times, and why is that imprecise?

It's because when considering the worst possible case, it is natural to give a limit on how bad that worst case can be, not how good it can me. That is, we want to give an upper bound on its degree of badness. Similarly, we often want to give a lower bound on how good the best-case is (i.e, even on good inputs, there's still a limit on how fast the algorithm can go; what is that limit?), so Big-Omega gets associated with best-case.

That's why Big-O gets associated with worst-case running times and Big-Omega with best-case. And it's true that if someone just says "the running time" is O(n^2), then n^2 is indeed "closer" to the worst-case running time than to the best-case running time, in the sense that n^2 is "bigger" than all possible running times, and the worst-case running time is "bigger" than the best-case running time. But O(n^2) doesn't mean that the worst-case running time actually is n^2, just that it is at most n^2.

Myths about Big O

You cannot use Big-O to compare the speed of two algorithms. Big-O only says how much slower an algorithm will get (approximately), if you double the number of items processed or how much faster it will get, if you cut the number in half.

However, if you have to entirely different algorithms and one (A) is O(n^2) and the other one (B) is O(log n), it is not said that A is slower than B. Actually with 100 items, A might be ten times faster than B. It only says that with 200 items, A will grow slower by the factor n^2 and B will grow slower by the factor log n. So if you benchmark both and you know how much time A takes to process 100 item and how much time B needs for the same 100 items, and A is faster than B, you can calculate at how many items B will overtake A in speed (as the speed of B decreases much slower than the one of A, it will overtake A sooner or later, this is for sure).

Math of Big O

'Big-O' notation is used to compare the growth rates of two functions of a variable (say n) as n gets very large. If function f grows much more quickly than function g we say that g = O(f) to imply that for large enough n, f will always be larger than g up to a scaling factor.

It turns out that this is a very useful idea in the analysis of algorithms, because we are often precisely concerned with the growth rates of functions which represent, for example, the time taken by two different algorithms. Very coarsely, we can determine that an algorithm with run-time t1(n) is more efficient than an algorithm with run-time t2(n) if t1 = O(t2) for large enough n which is typically the 'size' of the problem - like the length of the array or number of nodes in the graph or whatever.

This stipulation, that n gets large enough, allows us to pull a lot of useful tricks. Perhaps the most often used one is that you can simplify functions down to their fastest growing terms. For example n^2 + n = O(n^2) because as n gets large enough, the n^2 term gets so much larger than n that the n term is practically insignificant. So we can drop it from consideration.

However, it does mean that big-O notation is less useful for small n, because the slower growing terms that we've forgotten about are still significant enough to affect the run-time.

What we now have is a tool for comparing the costs of two different algorithms, and a short hand for saying that one is quicker or slower than the other. Big-O notation can be abused which is a shame as it is imprecise enough already! There are equivalent terms for saying that a function grows less quickly than another, and that two functions grow at the same rate.

Tuesday, February 14, 2012

Network Security !!!

At a time when Iran safely highjack a UAV, and China was accused on attacks for pentagon website, I was invited to do a presentation to the EDEXL – HNDE final year students. They were so curious to know what the attacks are ?, and how complex these attacks are?. Once been a security sustain engineer for Microsoft's R&D, it was a loving feel to share my experience with them. Many believe that different kind of those painful hacker attacks are based on some super brains and super technologies. But I believe it is not. Many attacks we faced today are just careful thoughts and skilled Phishing – I suggested.

My teaching approach was to develop an attack tree with the students. First we attack a single computer, then a point to point network of 2 machines - finally internet...



Friday, February 3, 2012

Agile Fad

Many businesses I am consulting these days are undergoing a severe wave of fad called "Agile methodology in to practice" at their either projects or non-project operations. My observation is that only a handful amount of small business organizations have been able to be success from the prospects of true Agile. Like almost similar to many other cases, many businesses start to practice this kind of revolutionary perceived new ideas, without a proper study due to the illusive faddy emotional appeal which compel to see Agile as a modern trend for success. However I believe this project management term called Agile was originally originated in the software industry due to various reasons, including

  1. High employee turnover: As a result since employees stay in one company for very short periods like one to two years, the project managers have less time to train the engineers and ramp up the new recruits.

  2. Enhancement in availability of industrial knowledge : Software developers are no longer fall in to a niche. Homogeneous level of basic skills and technical knowledge of the fresh graduates and recruited individuals due to high standardization and the huge competition in the industry. There fore companies don't have to worry much about acquiring the right talents and adopting them. Eg: An entry level graduate Java programmer with SCJP would have the basic skills to grasp the simple basic things n use his own skills to survive in an average project.

  3. Dynamic nature: From the requirements to the technologies, software industry is the industry which is most dynamically changing. Today’s technology may be obsolete tomorrow. Therefore employees have to adopt themselves to be self-learned or else employers have to recruit new employees with sufficient knowledge on the new technologies. Out of which latter is not feasible for many businesses, as it is unethical and in many countries according the laws you can’t fire employees to the reason that technology changes. Hence flexibility is required to train at every aspect of the business from including requirement identification, system development and test, delivery and maintenance etc.

At the early days of software business, different software development methodologies like waterfall, spiral, and iterative came in to practice. However no universal software development model came to cater the greatest ever changing challenges with in the industry identified above.
Therefore software organizations started to practice blend of above methods, to have the competitive edge over their rivalries. This is how agile emerge and evolved. Nowadays agile is the most passionate term in the project oriented or even in non-project oriented organizations. However as I realize most of these organization are perceived to follow agile due to their perceptual temptation and not with a visionary strategic relevance to the business. Therefore I have decided to discuss my experience in brief about what really agile is and how organization can actually benefit out.

In Sri Lanka Wesak is the festival which is celebrated greatly islandwide to celebrate most important event of the Sri Lankan Buddhist community – variously term birth, enlightment and death of Lord Buddha. Preparing a Wesak lantern for the sacred Wesak Poya day is the tradition of most Sri Lankans disregarding the religion.
The Wesak lantern they made out of eight bamboo square frames called "Ata Patama" is the most famous of all. Generally many Sri Lankans must have atleast seen and know how to make this 8 fold lantern called “Ata Pattama (අට පට්ටම)”. When the month of may reached, in every village, groups of both adults and kids come to the temple keeping few days ahead to the great Poya day and start to prepare these lanterns. Generally the established tradition in our temple is that nobody takes leadership to whole this program but may be the chief priest or somebody senior gentleman in the village will say that we are going to prepare this number of lanterns, from the budget they have. Labor and many other resources are freely volunteered from the families of the village. Villegers get togetther and naturally form few teams and will start preparing these eight fold lanterns. It is hard to find an actual leader who is delegated to this task. The task will not take more than two three days of work. Everybody is happy to contribute with their own ideas and eventually and situationally each individual may take the leadership of different tasks without any preassignment. People or team members generally obey the situational emergence of individual at the proper time. For example some may be good at finding bamboos, so he take the ownership and leadership for the task of collecting bamboos. Other may be good at cutting bamboos, He automatically rises to the leadership to the situational requirement and also trains others with basic knowledge to cut bamboos and have the frames with the correct length. Another may be good at making the bamboo frames. He was automatically kept to the ownership/leadership of preparing the frames. While one takes the implicit ownership of the task all others support him n participate in related activities. Beauty is that there is nobody to assign roles or control and monitor. Nobody to evaluate performance. Yet things happen smoothly and precisely like somebody coordinate and control from behind. In parallel the new commers (specially small kids) will develop the required skills in to them. These skills included preparation of gum, cutting the papers, decorating the papers, what to dos and what not to dos etc. No individual is let alone. Automatically each will get a chance to show their performance at some place. The performers will be recognized and rewarded with laughter and smiles during their success. Sometimes the failures are kindly humiliated at times. The positivity is that it discourage pad performance. If it is understood that somebody need further training/assistance a person who have experience will explicitly help without a call. If one is failed at a place the same may be very good at other works. So generally in a cyclical wave everybody are getting enough chances to learn, show and be recognized with what they are capable of. Decorating time is the time which is most crucial of the event. People who have the greatest experience and talents will eventually and automatically come to front. There will also be an implicit competition between the multiple teams as well. That will enhance the performance in terms of beauty and in terms of speed of completion. Knowledge and experience is effortlessly distributed among the participants, well communicated and shared among the teams. Limited Resources are getting equally and efficiently used. For example for 5 teams 21 people had only one line cutter and just one coil of aluminum wire. A solution came. One guy came forward. Start to cut the wires with the proper length and distribute it among others who use these wire pieces to bind the frames. Dynamic resources and requirement are managed as and when they arise. There are also automatic groups which form automatically at every other needs of these lantern teams. Eg; Food and refreshments. The nature of such teams is that, they eventually self organized yet properly managed by default. No pre assignments are there in the temple for Wesak lanterns campaign.

"Atapattama (අට පට්ටම)
However this doesn’t mean that there are no conflicts at all. But within very short most these conflicts are resolved and objectives are achieved. Chief priest (the product owner) often visit the site, enjoy the time with the lantern teams in the temple, obviously motivate them and will provide some requirements in a participative manner than a bureaucratic authoritarian. Look at his tone ... " Shall we have a lantern to the road? I want a lantern to the mango tree in front .., let’s put the other lantern on the Sermon room … " so n so. Whatever with everything, these team members are always participative decision makers.

Finally the great Poya day arrives. Lanterns are ready. Team members came to the temple with family members. Wow!!! Everybody is nicely and peacefully dressed. Time has up to celebrate the achievement. Happy teams light the lanterns and hook them nicely. All are happy. Time to go to the shrine rooms!!!


This is repeating in most temples in the Sri Lankan villages for generations. I believe the above provides a very good yet a practical example for agile from local culture.


Now let me draw the facts beside this scheme.


  1. It is obvious that everybody had a general idea about what is happening, how to do and what to do ... etc
  2. Each individual rose to the throne automatically and contingently base on the requirement and ability to cater the emerged requirement. Scrum master is the facilitator. Facilitator or the leader emerged situationally. (eg: those who good in the team for decoration will lead the decoration operation). But not self-appointed nor team appointed. Neither pre-assigned. Yet automatic.
  3. No explicit control. Otherwise motivation and positive feel is everywhere. There will be a product owner (Chief priest). But he is rather a motivator than a dictator or a requirement provider.
  4. Self-managed. Each individual feel what is to be done next? Not preplanned. Yet were flexible to take whatever the task it needs. Each individual learn things them self. Self-conducive environment is there. Everybody is reluctant to criticize. Instead motivated to participate and learn.
  5. Nobody is isolated. Sense of belonging to the task is everywhere. Every body felt it was there own work. Every body thought it is their duty which can not deny.
  6. Faults are tolerated, but not the careless faults. Self unimprovement will cause humiliation and demotivated. Learners draw others attention for help. Non participants were eventually dropped or find other places to move if they feel hard to get on. However nobody is punished and not very painfully hurt.
  7. Recognition is in air. Reward is appreciation.
  8. Strong trust and understanding about each individual.
  9. Finally achievements were celebrated by everybody with laughter and Serenity.
  10. After all, be careful !!! This schema is so Developer oriented. Hence application of agile should be carefully brought to the business where money matters.

Provided the above ten factors derived from practical rural village temple annual campaign for Atapattama (අට පට්ටම)Wesak lantern, the very similar kind of same shape model project management approach in western is now called Agile. Yes, there’s a big difference. In agile you are not volunteered. Instead you are paid. Money is partially or completely involved as a factor for motivation and it is indeed use to reward the recognized. Taking the above Atapattama same example, it must also know that the same model will not work if the Lantern is a shape of Lotus. Only a very few in the village could understand the complex nature of making a lotus shape lantern and very limited people have hands on experience with making such. This challenging nature could be a motivator to one but a demotivator to many others in the village lantern crew. There will be conflicts about what to do, when to do, how to do and why we do etc.? For sure time will not be properly managed. Teams will be working overtime through many sleepless nights. But they may not finish their work towards the end of Wesak night. That is why you see most these complex shaped lanterns in the village are lit at a day after the Wesak day. In contrast most the time what you see in the temple on the Poya day are these simple lantern models like Ata pattama, Star and buckets etc. However half-finished complex shaped Lotus lanterns are also not a very rare view of the Poya night in the village. The teams have to come across the complexity and unawareness of what to dos, how to dos etc. They should be explicitly practiced with mockups, or find the volunteer experts. Teaching the team and training them will require lot of time and resources. On the other hand lot of deformations and rework is there. Waste is high. Continuous troubles including sleepless nights could demotivate team members and may induce arguments and frictions. So it is evident that an agile team to be success, the individuals must have homogeneous levels of competencies, mentality and attitudes. The designs and solutions should not be extremely complex. From the beginning such teams must develop noncomplex solutions which everybody can easily absorb without a big effort to avoid fatigue at the latter phases of your project.

You see this lantern schema is very practical in co-operate world today. Not knowing what really is agile, people deform their organizations and loose the potential momentum at least at the level what they had before. One other great mistake is that from the term Scrum many get the bad view. This may be due their cultural and work related mentality from their previous work experience.
They tend to think Agile is like a pressured scrum you have seen in rugby union. In a rugby scrum: each member is allocated with a completely different task based to the place he stands. Eg. Two Prop forwards: loose head got a different role, while tight head keep tight. Hooker is expected to hook the ball through the correct channels; Second rowers are expected to jump at different places to get the ball. In scrum they have to correctly channel and control the ball while covering and supporting the big front rows. Likewise all members in the scrum have predefined roles and duties during the match. Don’t you think this is exactly what is happening in modern cooperation’s under the term agile? Nobody is self-managed; instead their role is concretely predefined and hard coded to their mentality by the scrum master. Scrum masters believe they are flexible. But just think!!! How can you be flexible when you hard code precise diverse directions and instructions to the heads of your scrum members? How are you going to bring the expected flexibility to the team? Therefore dear Scrum leaders, let it come automatically. Facilitate it to be automatic and eventual than prejudiced, predefined and judgmental. Please be patient and listen to see what happen. Then you will realize that enough nice ideas and simple solutions are pouring from your people than you thought before.