Saturday, 19 March 2016

toposort using queue theory

Define the indegree of a vertex as the number of edges pointing to it.
  • A node of indegree 0 can be placed at the start of the sorted order, since there is clearly nothing that must precede it.
  • There must be at least one vertex of indegree 0.
    • If not, the graph has a cycle and no topological sort exists.
  • Once we have placed all nodes of indegree 0 in the list, we can then add all nodes whose indegree would be zero except for edges from the nodes already placed.
Repeating this process yields a topological sort.
Here is the code for a topological sort.
Let's consider it a step at a time.
In step 1, we get the indegrees of all the vertices, putting them into a map whose key type is Vertex and whose associated data isunsigned. I've chosen to use a hash_map for this code. hash_map is a hash-table implementation of the standard map interface.hash_map isn't actually part of the standard library, but it is a commonly available addition. If I wanted to stay entirely within the standard library, I could just use map, but at the cost of somewhat slower access. Alternatively, I could (and have, in the past) implemented a vector-based implementation of the map interface that simply uses the Vertex::id() function to index into the vector, which would give me true O(1) access time.
As we do this, we also add any vertices whose indegree is zero into a queue, q.
In step 2, we repeatedly remove vertices from the queue and add them to the sorted list output, sorted. We can do this because we know that there is nothing in the graph that needs to come before these vertices.
We then look at the outgoing edges of each vertex, and reduce the inDegree values of the neighboring vertices to simulate having removed v from the graph. If doing this causes any of their (simulated) indegrees to become zero, we add them to the queue, because we know that there is nothing remaining in the graph that needs to come before these vertices.
Finally, in step 3, we check to see if all the vertices have been removed from the graph and placed into the sorted list. If so, we have successfully found a topological sort. If not, then no topological sort is possible (the graph must have a cycle).
Try running a topological sort.

Analysis

In analyzing this algorithm, we will assume that the graph is implementing using adjacency lists, and that the inDegree map is implemented using a vector-like structure.
In step 1, therefore, we see that everything in the loop body is O(1). The loop itself goes around once for every vertex in the graph. Following our definition that says that a graph G=(V,E) is a set V of vertices and a set E of edges, we can say that the number of iterations of this loop is |V|, the number of vertices.
We therefore conclude that the entire step 1 loop is O(|V|).
(We could also write this as O(g.numVertices()), but |V| is shorter and is the usual way that people describe graphs.)
Looking at the inner loop of step 2, we see that everything in the body is O(1).
And the simple statements in the outer loop (everything except for the inner loop) are O(1).
Now, at this point our normal copy-and-paste approach breaks down. The number of iterations of the inner loop may be different for each vertex visited by the outer loop.
But, let's just stop and think about what's going on here.
  • Each vertex goes into the queue at most once
  • So, in a successful sort, the outer loop will execute |V| times, once for each vertex.
  • The inner loop simply visits the edges emanating from the vertex being visited by the outer loop.
  • So if the outer loop visits every vertex, and the inner one visits every edge leaving that vertex, over the course of all the outer loop iterations, the inner loop will visit every edge in the graph exactly once.
So the statements in the body of the inner loop get executed |E| times; the other statements in the outer loop get visited |V| times. Since all of these statements are O(1), the total cost is O(|V|+|E|).
In step 3, the only non-trivial operations is clearing the sorted list (done when we can't find a solution). Since this list is actually a list of vertices that we have successfully sorted, it contains at most |V| elements, and so the clear operation is O(|V|).
That makes the worst case for the step 3 if statement O(|V|) as well.
So the total cost of the topological sort is O(|V| + |E|)
(Note: this does not mean that topological sorts are faster than conventional sorts --- the number of edges can be as high as |V|2, so this is actually more comparable to the slowest of our conventional sorting algorithms. That's the penalty we pay for working with partial orders. We don't write the cost of this algorithm as O(|V|2), though, because the number of edges varies widely in practical problems, and we may sometimes know that |E| will be far less than that maximum, so O(|V| + |E|) is a more accurate portrayal of the behavior.

No comments:

Post a Comment