Reach all vertices

Can I reach all vertices from one vertex?

In another programming challenge from Rosalind I was asked to find a vertex, if any, from which all other vertices are reachable. In a DAG.

The very first solution that comes to mind is to simply go through the list of edges and have a couple of sets recording the FROM vertex and TO vertex of each edge.

After reviewing all the edges, a simple set difference (FROM - TO) will yield the vertices that have no incident edges.

If this difference set has more than one element, that means that those vertices won't be reachable from those other vertices.

This looks very easy, lets look at a couple of examples first and see if we can spot some weird cases.

In [1]:
# Here I will set up networkx for the images and graph samples.
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

def showgraph(graph):
    np.random.seed(0)
    G=nx.DiGraph()
    for node, edges in graph.items():
        for edge in edges:
            G.add_edge(node, edge)

    pos=nx.spring_layout(G)
    nx.draw_networkx_nodes(G, pos)
    nx.draw_networkx_edges(G, pos, arrows = True)
    nx.draw_networkx_labels(G, pos)

    plt.axis('off')
    plt.show()

Let's define 4 graphs. 1. Graph with cycles. 2. Graph with multiple vertices with no incident edges. 3. Ideal graph for our naive solution. 4. Graph with two separate cycles.

In [2]:
graph1 = {
    0:set([1]),
    1:set([4, 2]),
    2:set([3]),
    3:set([1]),
    4:set([0, 3])
}
showgraph(graph1)
In [3]:
graph2 = {
    0:set([5]),
    1:set([5]),
    2:set([5]),
    3:set([5]),
    4:set([5]),
    5:set([5])
}
showgraph(graph2)
In [4]:
graph3 = {
    0:set([1]),
    1:set([3]),
    2:set([1]),
    3:set([0]),
    4:set([2, 5]),
    5:set([])
}
showgraph(graph3)
In [5]:
graph4 = {
    0:set([1, 4]),
    1:set([2]),
    2:set([0]),
    3:set([4]),
    4:set([5]),
    5:set([3])
}
showgraph(graph4)

Graph 1

graph1 presents a situation where we can use our naive approach to infer the solution. Why? because every vertex appears at least once in the from and to sides of an edge. The set difference in the end will yield an empty set.

An empty set with our approach means that we can use any vertex as the starting point and we could reach every other vertex using the cycles.

Graph 2

graph2 is clearly a good graph for our naive approach since the difference set will have more than one element, meaning that there is no way to reach all vertices.

Graph 3

This graph is a good fit for our approach since the difference set will leave us with the vertex labeled 4. From which we can reach all other vertices.

Graph 4

graph4 is a case where our naive approach falls short. This is basically a mixture between graph1 and graph3. In this kind of graph we have a couple of cycles with one edge from one cycle to the other.

For graphs that have a structure like the one presented on graph4. This is what I did.

My approach

Making use of the Strongly Connected Components (I talked about it in Strongly Connected Components in a graph) on a graph I reduced and simplified a graph from something like graph4 to something more like graph3. Running Tarjan's algorithm over graph4 will leave me with something like graph5 (below), where I can easily use the naive approach and pick any of the vertices from the mega-vertex on the simplified graph.

In [6]:
graph5 = {
    "0, 1, 2":set(["3, 4, 5"]),
    "3, 4, 5":set([]),
}
showgraph(graph5)

From this new representation, we know that any node from the "0, 1, 2" vertex is a good answer for our problem.

In general, the steps were: 1. Reduce and simplify the graph with Tarjan's SCC algorithm. 2. Create the FORM and TO sets with the mega-vertices or components. 3. Check the FROM-TO set for elements, if it has one, pick any element from that component; if it has zero elements, pick any element from the whole graph; if it has more than one, there is no solution.

Note:

Remember Tarjan's SCC Algorithm I presented for the other problem at Strongly Connected Components in a graph.

And here is the implementation for Tarjan's Algorithm I used before.

In [7]:
def tarjan(graph):
    #input: graph G = (V, E)
    #output: Array of strongly connected components (sets of vertices)
    n = len(graph)
    sccs = []
    index = [0]
    indexes = [-1] * n
    lows = [float('Inf')] * n
    S = []

    def strongconnect(v):
        # Set the depth index for v to the smallest unused index
        indexes[v] = index[0]
        lows[v] = index[0]
        index[0] += 1
        S.append(v)

        # Consider successors of v
        for chld in graph[v]:
            if indexes[chld] == -1:
                # Successor chld has not yet been visited; recurse on it
                strongconnect(chld)
                lows[v] = min(lows[v], lows[chld])
            elif chld in S:
                # Successor w is in stack S and hence in the current SCC
                lows[v] = min(lows[v], lows[chld])

        # If v is a root node, pop the stack and generate an SCC
        if lows[v] == indexes[v]:
            scc = set([v])
            w = S.pop()
            while w != v:
                scc.add(w)
                w = S.pop()
            sccs.append(scc)

    for v in graph.keys():
        if indexes[v] == -1:
            strongconnect(v)

    return sccs
In [8]:
def vertices_that_reach_all(graph):
    scc = tarjan(graph)
    ncomp = len(scc)

    if ncomp > 1:
        sources = set()
        dests = set()
        for incomp in range(ncomp):
            for node in scc[incomp]:
                for edge in graph[node]:
                    for outcomp in range(ncomp):
                        if incomp != outcomp and edge in scc[outcomp]:
                            sources.add(incomp)
                            dests.add(outcomp)
                            break

        sources.difference_update(dests)

        if len(sources) == 1:
            comp = sources.pop()
            return list(scc[comp])
        else:
            return [-1]
    else:
        return graph.keys()
In [9]:
for num, graph in enumerate([graph1, graph2, graph3, graph4]):
    print "graph", num , vertices_that_reach_all(graph)
graph 0 [0, 1, 2, 3, 4]
graph 1 [-1]
graph 2 [4]
graph 3 [0, 1, 2]

This method gives us what we expected to be, and for the test cases for the challenge on Rosalind, it ran pretty well.

Implementation notes:

4 nested loops is a piece of code you usually never want to see and I am sure that changing the graph representation would simplify that procedure. In the meantime what I am trying to achieve with this is the following:

The loops are basically creating the FROM and TO sets, but to get to that simplification, I need to go through every vertex on every component. From every to-vertex on the edges, add the component instead of the vertex.

  1. Visit every component.
  2. Visit every vertex on the component.
  3. Visit every edge for the vertex.
  4. Visit components to see if the other vertex is a member.

One way to remove one loop would be to add a 'lookup table' where we could easily get the component of a given vertex.

Any comments? Know another way to solve this problem? Let me know on @sergiobuj

social