Re: [igraph] Faster way to detect duplicate graphs in a vector ptr?

Hi,

first, all this would be much-much simpler in R or Python, so you might consider using a high level language.

Second, you can define a hash function on the graph, and then calculate that for each graph. If the hash function is good, then you don't need to compare most pairs of graphs, because their hash will be different if they are different. If their hash is the same, then you need to compare them. This can be done by querying their edge lists, sorting them and then comparing them.

An additional trick would be to sort the graphs according to their hash, and then only compare neighboring elements in the list with the same hash.

Gabor

On Tue, Jan 28, 2014 at 5:29 PM, Timothy Rice <address@hidden> wrote:

Hello,

I have a question which may be a little similar to the backtracking matrix
question, although I don't think the solution will be the same.

TL;DR my code has a bottle neck when repetitively calling a function that
checks an igraph_vector_ptr_t for graphs identical to another given graph.

Overall, the program is intended to count the number of candidate
aggregates of two input trees. The algorithm starts with the edge
intersection, then incrementally add edges from the symmetric difference
whenever this preserves acyclicity.

My initial implementation didn't check for duplicates so the count
ended up greatly exaggerated. Now that I'm checking for duplicates, each
iteration requires re-checking the vector of previously discovered trees.
The duplicate_check function takes the vector ptr and two matrices. One of
the matrices gives the adjacencies of the new candidate tree. The other
matrix is used in a loop. It holds the adjacencies from the vector of
existing trees. The code is as follows:

igraph_bool_t duplicate_found(igraph_vector_ptr_t *N,
igraph_matrix_t *adj_g,
igraph_matrix_t *adj_N) {
long int n;
for(n = 0; n < igraph_vector_ptr_size(N); ++n) {
igraph_get_adjacency(VECTOR(*N)[n], adj_N, IGRAPH_GET_ADJACENCY_UPPER, 0);
if(igraph_matrix_all_e(adj_g, adj_N)) {
return 1;
}
}
return 0;
}

If the function returns 0, no duplicates were found, so the graph
corresponding to adj_g can be added to the end of the vector N.

My code runs okay for small input trees, say up to 10 nodes, but as the
powerset of the symmetric difference grows something like exponentially,
things start slowing down a lot after that. When I compile with
"-pg" and run gprof, I see that the biggest bottleneck is the above
duplicate-checking function. Therefore, I'd like to squeeze a bit more
efficiency out of that function if possible.

It has occurred to me that at the same time as adding new trees to the
vector N, I could also be adding their matrices to another vector; or I
could define a structure that contains both a graph and the graph's
adjacency matrix, and make an igraph_vector_ptr_t that holds instances of
this struct. This would save recomputing *adj_N more than once per tree.

Another idea was to try using threads to check multiple trees in N
simultaneously.

Unfortunately, implementing either of these ideas would add a lot more
complexity to my code. I'd rather avoid so much overhead if I can help it.

So my question is, is there some faster way of scanning the vector of trees
for duplicates that isn't going to add crazy extra complexity to my program?
To be honest, I'm a little pessimistic that any simple solution exists, but
maybe I've overlooked something due to inexperience with C and igraph.

Cheers,

Tim Rice
--
PhD Candidate
A: Room 210, Richard Berry (Maths & Stats), Melbourne University
E: address@hidden
W: http://www.ms.unimelb.edu.au/~trice
G: https://github.com/cryptarch

_______________________________________________
igraph-help mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/igraph-help

From:	Gábor Csárdi
Subject:	Re: [igraph] Faster way to detect duplicate graphs in a vector ptr?
Date:	Tue, 28 Jan 2014 21:25:36 -0500