topological_sort: toposort using queue theory

Define the indegree of a vertex as the number of edges pointing to it.

A node of indegree 0 can be placed at the start of the sorted order, since there is clearly nothing that must precede it.
There must be at least one vertex of indegree 0.
- If not, the graph has a cycle and no topological sort exists.
Once we have placed all nodes of indegree 0 in the list, we can then add all nodes whose indegree would be zero except for edges from the nodes already placed.

Repeating this process yields a topological sort.
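The process just described can be sketched in a self-contained form. This is not the DiGraph-based routine discussed in this post. It assumes the vertices are numbered 0 to n-1 and that the graph is supplied as a vector of adjacency lists, and the name kahnOrder is made up for illustration.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper (not part of the DiGraph code in this post).
// adj[u] lists the vertices that u points to; vertices are 0..adj.size()-1.
// Returns a topological order, or an empty vector if the graph has a cycle
// (an empty graph also yields an empty vector).
std::vector<int> kahnOrder(const std::vector<std::vector<int>>& adj)
{
  const std::size_t n = adj.size();

  // Count the edges pointing at each vertex.
  std::vector<unsigned> inDegree(n, 0);
  for (const auto& outgoing : adj)
    for (int dest : outgoing)
      ++inDegree[dest];

  // Start with every vertex that nothing must precede.
  std::queue<int> ready;
  for (std::size_t v = 0; v < n; ++v)
    if (inDegree[v] == 0)
      ready.push(static_cast<int>(v));

  std::vector<int> order;
  while (!ready.empty())
    {
      int v = ready.front();
      ready.pop();
      order.push_back(v);

      // "Remove" v: each successor loses one incoming edge.
      for (int dest : adj[v])
        if (--inDegree[dest] == 0)
          ready.push(dest);
    }

  // Any vertex left over is on a cycle or reachable from one.
  if (order.size() != n)
    order.clear();
  return order;
}
```

For the graph with edges 0→1, 0→2, 1→3 and 2→3, this sketch places 0 first, then 1 and 2, and finally 3, since 3 only reaches indegree 0 after both of its predecessors have been placed.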

// A topological sort of a directed graph is any listing of the vertices
// in g such that v1 precedes v2 in the listing only if there exists no
// path from v2 to v1.
//
// The following routine attempts a topological sort of g. If the sort is
// successful, the return value is true and the ordered listing of
// vertices is placed in sorted.  If no topological sort is possible
// (because the graph contains a cycle), false is returned and sorted will
// be empty.
//
bool topologicalSort (const DiGraph& g, list<Vertex>& sorted)
{
  // Step 1: get the indegrees of all vertices. Place vertices with
  // indegree 0 into a queue.
  hash_map<Vertex, unsigned, VertexHash> inDegree;
  queue<Vertex, list<Vertex> > q;
  for (AllVertices v = g.vbegin(); v != g.vend(); ++v)
    {
      unsigned indeg = g.indegree(*v);
      inDegree[*v] = indeg;
      if (indeg == 0) 
        q.push(*v);  
    }

  // Step 2. Take vertices from the q, one at a time, and add to sorted.
  // As we do, pretend that we have deleted these vertices from the graph,
  // decreasing the indegree of all adjacent nodes. If any nodes attain an
  // indegree of 0 because of this, add them to the queue.
  while (!q.empty())
    {
      Vertex v = q.front();
      q.pop();

      sorted.push_back(v);

      for (AllOutgoingEdges e = g.outbegin(v); e != g.outend(v); ++e)
        {
          Vertex adjacent = (*e).dest();
          inDegree[adjacent] = inDegree[adjacent] - 1; 
          if (inDegree[adjacent] == 0)
            q.push (adjacent);        
        }
    }
  
  // Step 3:  Did we finish the entire graph?
  if (sorted.size() == g.numVertices())
    return true; 
  else
    {
      sorted.clear();
      return false;
    }
}

Here is the code for a topological sort.
Let's consider it a step at a time.

bool topologicalSort (const DiGraph& g, list<Vertex>& sorted)
{
  // Step 1: get the indegrees of all vertices. Place vertices with
  // indegree 0 into a queue.
  hash_map<Vertex, unsigned, VertexHash> inDegree;
  queue<Vertex, list<Vertex> > q;
  for (AllVertices v = g.vbegin(); v != g.vend(); ++v)
    {
      unsigned indeg = g.indegree(*v);
      inDegree[*v] = indeg;
      if (indeg == 0) 
        q.push(*v);  
    }

In step 1, we get the indegrees of all the vertices, putting them into a map whose key type is Vertex and whose associated data is unsigned. I've chosen to use a hash_map for this code. hash_map is a hash-table implementation of the standard map interface. hash_map isn't actually part of the standard library, but it is a commonly available addition. If I wanted to stay entirely within the standard library, I could just use map, but at the cost of somewhat slower access. Alternatively, I could write (and have, in the past, written) a vector-based implementation of the map interface that simply uses the Vertex::id() function to index into the vector, which would give me true O(1) access time.
As we do this, we also add any vertices whose indegree is zero into a queue, q.
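A vector-based table of the kind mentioned above might look roughly like the following. This is only a sketch: DemoVertex is a hypothetical stand-in for the real Vertex class (only an id() function is assumed), VertexTable is an invented name, and the ids are assumed to be dense integers in the range 0 to numVertices-1.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the post's Vertex; only id() matters here.
struct DemoVertex
{
  std::size_t ident;
  std::size_t id() const { return ident; }
};

// Map-like table keyed by vertex id and backed by a vector.
// Assumes every id is less than the size given to the constructor.
template <typename Value>
class VertexTable
{
public:
  explicit VertexTable(std::size_t numVertices)
    : data(numVertices, Value()) {}

  // Mimics map's operator[]: returns a modifiable reference, found by
  // plain indexing rather than hashing or tree search.
  Value& operator[](const DemoVertex& v) { return data[v.id()]; }
  const Value& operator[](const DemoVertex& v) const { return data[v.id()]; }

private:
  std::vector<Value> data;
};
```

With a table like this, a statement such as inDegree[adjacent] = inDegree[adjacent] - 1 would keep the same shape as in the hash_map version, while each lookup becomes a single vector index.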

  // Step 2. Take vertices from the q, one at a time, and add to sorted.
  // As we do, pretend that we have deleted these vertices from the graph,
  // decreasing the indegree of all adjacent nodes. If any nodes attain an
  // indegree of 0 because of this, add them to the queue.
  while (!q.empty())
    {
      Vertex v = q.front();
      q.pop();

      sorted.push_back(v);

      for (AllOutgoingEdges e = g.outbegin(v); e != g.outend(v); ++e)
        {
          Vertex adjacent = (*e).dest();
          inDegree[adjacent] = inDegree[adjacent] - 1; 
          if (inDegree[adjacent] == 0)
            q.push (adjacent);        
        }
    }

In step 2, we repeatedly remove vertices from the queue and add them to the sorted list output, sorted. We can do this because we know that there is nothing in the graph that needs to come before these vertices.
We then look at the outgoing edges of each vertex, and reduce the inDegree values of the neighboring vertices to simulate having removed v from the graph. If doing this causes any of their (simulated) indegrees to become zero, we add them to the queue, because we know that there is nothing remaining in the graph that needs to come before these vertices.

  // Step 3:  Did we finish the entire graph?
  if (sorted.size() == g.numVertices())
    return true; 
  else
    {
      sorted.clear();
      return false;
    }
}

Finally, in step 3, we check to see if all the vertices have been “removed” from the graph and placed into the sorted list. If so, we have successfully found a topological sort. If not, then no topological sort is possible (the graph must have a cycle).
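To see what a failed sort leaves behind, here is a small self-contained sketch that reports the vertices that never reach indegree 0. It is an illustration under assumptions, not part of the routine above: the graph is a plain vector of adjacency lists over vertices 0 to n-1, and stuckVertices is a made-up name.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper: returns the vertices that never reach indegree 0,
// i.e. those that lie on a cycle or are reachable from one.
// An empty result means a topological sort exists.
std::vector<int> stuckVertices(const std::vector<std::vector<int>>& adj)
{
  const std::size_t n = adj.size();
  std::vector<unsigned> inDegree(n, 0);
  for (const auto& outgoing : adj)
    for (int dest : outgoing)
      ++inDegree[dest];

  std::queue<int> ready;
  for (std::size_t v = 0; v < n; ++v)
    if (inDegree[v] == 0)
      ready.push(static_cast<int>(v));

  // Same simulated removal as in step 2, but we only record who got placed.
  std::vector<bool> placed(n, false);
  while (!ready.empty())
    {
      int v = ready.front();
      ready.pop();
      placed[v] = true;
      for (int dest : adj[v])
        if (--inDegree[dest] == 0)
          ready.push(dest);
    }

  std::vector<int> stuck;
  for (std::size_t v = 0; v < n; ++v)
    if (!placed[v])
      stuck.push_back(static_cast<int>(v));
  return stuck;
}
```

For the graph with edges 0→1, 1→2, 2→1 and 2→3, only vertex 0 is ever placed. Vertices 1 and 2 block each other, and 3 is stuck behind them, which is why the size comparison in step 3 fails.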

Analysis

bool topologicalSort (const DiGraph& g, list<Vertex>& sorted)
{
  // Step 1: get the indegrees of all vertices. Place vertices with
  // indegree 0 into a queue.
  hash_map<Vertex, unsigned, VertexHash> inDegree;
  queue<Vertex, list<Vertex> > q;
  for (AllVertices v = g.vbegin(); v != g.vend(); ++v)
    {
      unsigned indeg = g.indegree(*v);  // O(1)
      inDegree[*v] = indeg;             // O(1) 
      if (indeg == 0)                   // O(1)
        q.push(*v);                     // O(1) 
    }
    ⋮

In analyzing this algorithm, we will assume that the graph is implemented using adjacency lists, and that the inDegree map is implemented using a vector-like structure.
In step 1, therefore, we see that everything in the loop body is O(1). The loop itself goes around once for every vertex in the graph. Following our definition that says that a graph G=(V,E) is a set V of vertices and a set E of edges, we can say that the number of iterations of this loop is |V|, the number of vertices.

bool topologicalSort (const DiGraph& g, list<Vertex>& sorted)
{
  // Step 1: get the indegrees of all vertices. Place vertices with
  // indegree 0 into a queue.
  hash_map<Vertex, unsigned, VertexHash> inDegree;
  queue<Vertex, list<Vertex> > q;
  for (AllVertices v = g.vbegin(); v != g.vend(); ++v) // |V| iterations, O(1) each
    {
      unsigned indeg = g.indegree(*v);  // O(1)
      inDegree[*v] = indeg;             // O(1)
      if (indeg == 0)                   // O(1)
        q.push(*v);                     // O(1)
    }
    ⋮

We therefore conclude that the entire step 1 loop is O(|V|).
(We could also write this as O(g.numVertices()), but |V| is shorter and is the usual way that people describe graphs.)

bool topologicalSort (const DiGraph& g, list<Vertex>& sorted)
{
  O(|V|)

  // Step 2. Take vertices from the q, one at a time, and add to sorted.
  // As we do, pretend that we have deleted these vertices from the graph,
  // decreasing the indegree of all adjacent nodes. If any nodes attain an
  // indegree of 0 because of this, add them to the queue.
  while (!q.empty())
    {
      Vertex v = q.front();
      q.pop();

      sorted.push_back(v);

      for (AllOutgoingEdges e = g.outbegin(v); e != g.outend(v); ++e)
        {
          Vertex adjacent = (*e).dest();
          inDegree[adjacent] = inDegree[adjacent] - 1; 
          if (inDegree[adjacent] == 0)
            q.push (adjacent);        
        }
    }
    ⋮

Looking at the inner loop of step 2, we see that everything in the body is O(1).

bool topologicalSort (const DiGraph& g, list<Vertex>& sorted)
{
  O(|V|)

  // Step 2. Take vertices from the q, one at a time, and add to sorted.
  // As we do, pretend that we have deleted these vertices from the graph,
  // decreasing the indegree of all adjacent nodes. If any nodes attain an
  // indegree of 0 because of this, add them to the queue.
  while (!q.empty())
    {
      Vertex v = q.front();  // O(1)
      q.pop();               // O(1)

      sorted.push_back(v);   // O(1)

      for (AllOutgoingEdges e = g.outbegin(v); e != g.outend(v); ++e)  // O(1)
        {
          O(1)
        }
    }
    ⋮
}

And the simple statements in the outer loop (everything except for the inner loop) are O(1).

bool topologicalSort (const DiGraph& g, list<Vertex>& sorted)
{
  O(|V|)

  while (!q.empty())
    {
      O(1)
      for (AllOutgoingEdges e = g.outbegin(v); e != g.outend(v); ++e)  // O(1)
        {
          O(1)
        }
    }
   ⋮

Now, at this point our normal copy-and-paste approach breaks down. The number of iterations of the inner loop may be different for each vertex visited by the outer loop.
But, let's just stop and think about what's going on here.

Each vertex goes into the queue at most once.
So, in a successful sort, the outer loop will execute |V| times, once for each vertex.
The inner loop simply visits the edges emanating from the vertex being visited by the outer loop.
So if the outer loop visits every vertex, and the inner one visits every edge leaving that vertex, over the course of all the outer loop iterations, the inner loop will visit every edge in the graph exactly once.

So the statements in the body of the inner loop get executed |E| times; the other statements in the outer loop get visited |V| times. Since all of these statements are O(1), the total cost is O(|V|+|E|).
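One way to check this counting argument is to instrument the two loops and count their iterations. The sketch below does so under assumptions of its own: the graph is a vector of adjacency lists over vertices 0 to n-1, and LoopCounts and countIterations are names invented for this illustration. Unlike the routine above, it computes the indegrees by scanning the edges rather than calling g.indegree.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical instrumentation, for illustration only.
struct LoopCounts
{
  std::size_t outer = 0;  // iterations of the while loop
  std::size_t inner = 0;  // iterations of the for loop, summed over all vertices
};

LoopCounts countIterations(const std::vector<std::vector<int>>& adj)
{
  const std::size_t n = adj.size();
  std::vector<unsigned> inDegree(n, 0);
  for (const auto& outgoing : adj)
    for (int dest : outgoing)
      ++inDegree[dest];

  std::queue<int> ready;
  for (std::size_t v = 0; v < n; ++v)
    if (inDegree[v] == 0)
      ready.push(static_cast<int>(v));

  LoopCounts counts;
  while (!ready.empty())
    {
      int v = ready.front();
      ready.pop();
      ++counts.outer;           // one tick per vertex taken from the queue
      for (int dest : adj[v])
        {
          ++counts.inner;       // one tick per outgoing edge of v
          if (--inDegree[dest] == 0)
            ready.push(dest);
        }
    }
  return counts;
}
```

On an acyclic graph the outer count comes out equal to the number of vertices and the inner count equal to the number of edges, however unevenly the edges are spread among the vertices. On a graph with a cycle both counts fall short, so |V| and |E| are upper bounds in that case.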

bool topologicalSort (const DiGraph& g, list<Vertex>& sorted)
{
  O(|V|)
  O(|V| + |E|)
  
  // Step 3:  Did we finish the entire graph?
  if (sorted.size() == g.numVertices())
    return true;      // O(1)
  else
    {
      sorted.clear(); // O(|V|)
      return false;   // O(1)
    }
}

In step 3, the only non-trivial operation is clearing the sorted list (done when we can't find a solution). Since this list is actually a list of vertices that we have successfully sorted, it contains at most |V| elements, and so the clear operation is O(|V|).
That makes the worst case for the step 3 “if” statement O(|V|) as well.

bool topologicalSort (const DiGraph& g, list<Vertex>& sorted)
{
  O(|V|)
  O(|V| + |E|)
  O(|V|)  
}

So the total cost of the topological sort is O(|V| + |E|).
(Note: this does not mean that topological sorts are faster than conventional sorts. The number of edges can be as high as |V|², so this is actually more comparable to the slowest of our conventional sorting algorithms. That's the penalty we pay for working with partial orders. We don't write the cost of this algorithm as O(|V|²), though, because the number of edges varies widely in practical problems, and we may sometimes know that |E| will be far less than that maximum, so O(|V| + |E|) is a more accurate portrayal of the behavior.)


Saturday 19 March 2016
