Coding, and use of Data Structures:

A general trend of CSE/IT students is in believing that coding is very different than learning basic c/c++. Clearly there is a difference but the fact remains that language that we use is only a “Language” that we use to work out the problem. The problem solving

is like puzzle solving, considering the fact that a computer can compute and iterate faster than us, so the best way that I personally prefer is to try to solve by hand for a very small test case, then generalize the algorithm for larger test case. Problem comes when we do not know how to apply what we used to solve on a computer. There comes the use of data structures.

People are often confused about when and which data structure to use in solving a problem. Especially we, Indian students have a kind of fear of using complex data structures because we think they are unmanageable and also typically by seeing the 2-3 page codes in Yashwant Kanetkar for implementing each data structure. The thing is the data structures are used to achieve 2 purposes, the way to store a data (or Information about data), and a way to sequence our execution. Hence comes the famous LIFO, FIFO

and priority based approaches.

Clearly the problem statement imposes several restrictions to the above approach, i.e. by increasing the size of a key variable so that the solution has time, or space constraints. Then the algorithm needs to be optimized. Sometimes whole new approach is needed. And lastly some problems are just to check whether we can solve a problem at all or not, there the BRUTE FORCE technique works. Quite surprisingly, many of the ICPC problems can be solved by this technique. We just need to present an algorithm that solves the problem, and constraints are minimal.

To show the power of data structures, just consider one of the most trivial algorithm, the MST(or minimum spanning tree) problem. One way we solve it is by start solving from 1 node and including a node per iteration (Prim’s Algorithm).

At each step we have a MST of given nodes, and we increment the number of nodes at each step considering the edge of current MST to the outside graph with minimum cost. Here the red line represent current choices of edges that can be taken, and the blue circle shows chosen minimum node. Now drawing this on paper is fairly easy to solve a problem, however while implementing it in a c/c++ program, we have to use some data structure to

represent the graph, the lines, the circles.

There are various methods to represent the graph, here we consider data structure for line and circle. The driver of the algorithm is the line because out of the edges the line cut, we chose the edge with minimum cost, (1)so if we only have to solve it without considering time much, we can just add a flag for each edge whether it is on the line or not. In each node that is added, we unflag all the edges of the node connecting to previous selected

nodes, and flag the edges connected for node to the nodes not yet included in the MST.

But for every iteration, we parse through all the edges to select minimum cost edge that is flagged. A better approach is (2)considering the fact that each outside node can have only one edge connected to inside graph with minimum cost on the line. So instead of flagging all the edges, we can add edge number,(or link) to each outside node, and update for every edge connected to node with less cost.

In this method, we parse through all the nodes for finding minimum cost. Instead, (3)we can maintain a list of all the edges on the line and add or remove the edges on each addition of node.Then in each iteration find the minimum.

This method can further be simplified by(4) using a priority queue to get minimum directly instead of parsing the complete list. Now we may note that the drastic effect of choice of data structure.

For each case we iterate N times(N=number of nodes), find minimum cost edge on the line, update the line and circles. Consider the time complexity of each method (E=number of edges, C=connected edges to the node added, L=number of edges on the line, can be reduced to min cost edge per outside node hence nearly n/2).

- N*(E + C)
- N*(N +C)
- N*(L +C)
- N*(1+C*log L)

Where range of E,L and C are

E=(N,N*N ),

L=(0,N), avg.N/2

C=(1,N)

That’s why David J. ha rightly said,

“Get your data structures correct first, and the rest of the program will write itself..”