igraph-help

Re: [igraph] 'decompose.graph' versus 'clusters'


From: Gabor Csardi
Subject: Re: [igraph] 'decompose.graph' versus 'clusters'
Date: Mon, 21 Jul 2008 03:40:14 -0500
User-agent: Mutt/1.5.17+20080114 (2008-01-14)

David,

this is what decompose.graph does:

1) the same as 'clusters': it finds the connected components of the graph.
2) for each component it does the following:
   - copies the whole graph.
   - removes the vertices that are not in this component; this involves
     creating a new igraph object and subsetting the vertex/edge attributes.

All these operations are O(V+E), but the ones in 2) are performed 
once for each component. 
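
In R terms it is roughly equivalent to something like this (an untested
sketch built from 'clusters'; note that the membership vector is 0-based
in igraph 0.5.x and 1-based in 0.6, so adjust the offset to your version):

    ## rough equivalent of decompose.graph, using 'clusters'
    cl <- clusters(g)
    comps <- lapply(seq_len(cl$no), function(i) {
      ## copy g, then keep only the vertices of component i (0-based here)
      subgraph(g, V(g)[cl$membership == i - 1])
    })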

I had some good reasons for writing decompose.graph this way: namely,
to minimize the number of functions that use the igraph data
structures directly, and to handle vertex/edge attributes properly.
But probably you're not interested in the details.

You're right, it is too slow; it should be faster. I'll
do something about it, but this is not a five-minute job.

Btw. you can speed it up a bit if you remove attributes from 
the graph. I assume you want to keep track of the vertices, 
so try removing everything except for a numeric vertex identifier 
attribute.
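
For example (an untested sketch; the 'id' attribute name is just a
placeholder):

    ## keep a numeric vertex id, drop every other vertex/edge attribute
    V(g)$id <- seq_len(vcount(g))
    for (a in setdiff(list.vertex.attributes(g), "id"))
      g <- remove.vertex.attribute(g, a)
    for (a in list.edge.attributes(g))
      g <- remove.edge.attribute(g, a)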

Btw. 2: you can try running the new betweenness.estimate even
without decomposing the graph first. Try it on some smaller graphs
first.
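
Something along these lines (cutoff=3 is just an example value):

    ## estimated betweenness on the full graph, counting only paths
    ## of length <= 3
    b <- betweenness.estimate(g, cutoff = 3)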

I'll let you know if there is something new, or you can also 
follow the issue here:
http://code.google.com/p/igraph/issues/detail?id=94

Gabor

On Sat, Jul 19, 2008 at 10:38:51AM -0700, David Hunkins wrote:
> Thanks Gabor. Don't you take a break on the weekends?! I'm going to try
> what you suggest, since that seems the best path to find the most
> interesting 'connectors' in the large clusters. By eliminating clusters up
> to size 5, I can get rid of 73% of the clusters, which should improve on
> the O(c(E+V)) calculation. I am loath to kill the clusters up to size 10
> since that represents 'interesting' green growth of the social network
> (and only kills another 10%). Also, thanks for the new version of
> betweenness--I'll let you know the results. Because 'clusters' appears to
> be performing so well, could you look at the following and tell me what
> you think?
> 
> I had done some tests on 'clusters' versus 'decompose.graph', and I found
> the following table of results, which suggest to me that somehow
> 'clusters' is very much faster than 'decompose.graph':
> 
> graph size   clusters    decompose.graph
> 5k           0.005 sec   1.155 sec
> 10k          0.008 sec   3.824 sec
> 20k          0.008 sec   12.257 sec
> 5M           2.905 sec   <did not finish after 24 hours>
> 
> 'Clusters' is amazingly fast. I hadn't realized that constructing the
> subgraph from a large graph would be so computationally intensive compared
> to just identifying the members of a cluster. But I wonder, if the
> subgraph were constructed at the same time as the members of the cluster
> are detected, would it run much faster? I realize I'm in over my head, so
> feel free to let that one go...
> 
> Dave
> 
> David Hunkins
> address@hidden
> im: davehunkins
> 415 336-8965
> 
> 
> On Jul 19, 2008, at 2:18 AM, Gabor Csardi wrote:
> 
> 
>     Hi David,
> 
>     yes, you're right, decompose.graph is not O(V+E); it is in fact
>     O(c(V+E)), where 'c' is the number of components. I'll correct that.
> 
>     'clusters' gives back the membership of the vertices (in the
>     'membership' component), so you could use this to create subgraphs.
>     But that does not make much sense, since it is exactly what
>     decompose.graph is doing, so it would be just as slow.
> 
>     What you can try is to eliminate the trivial components from your graph
>     first, i.e. the ones with one or two vertices, maybe up to ten, and
>     then (if there are far fewer components left) decompose the graph.
>     Remember, however, that you cannot run betweenness on a graph with a
>     hundred thousand vertices or more. Most networks have a giant component,
>     so if you have 5M vertices in the full graph, you might still end up
>     with 1M in the largest component. Check this first with 'clusters'.
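> 
>     For example (cl$csize is the vector of component sizes):
> 
>         cl <- clusters(g)
>         max(cl$csize)    # number of vertices in the largest component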
> 
>     I've been working on speeding up betweenness.estimate; it is much
>     better now, but of course I'm still not sure that it is fast enough
>     for your graph, since that depends on the graph structure as well,
>     not only on the size of the graph. You can give it another try; here
>     is the new package:
>     http://cneurocvs.rmki.kfki.hu/igraph/download/igraph_0.6.tar.gz
> 
>     I think a viable approach could be to
>     1) eliminate the small clusters from the graph,
>     2) decompose the remainder into components,
>     3) run betweenness.estimate on the components, with cutoff=2 or 3
>        (a rough sketch follows below).
>     It is a question, however, whether such a small cutoff is enough.
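> 
>     An untested sketch of those three steps (assuming 1-based component
>     membership, as in igraph 0.6; with 0.5.x the membership vector is
>     0-based, so adjust accordingly):
> 
>         cl <- clusters(g)
>         small <- which(cl$csize <= 10)             # 1) find tiny clusters
>         g2 <- delete.vertices(g, V(g)[cl$membership %in% small])
>         comps <- decompose.graph(g2)               # 2) split into components
>         res <- lapply(comps, betweenness.estimate, # 3) estimate betweenness
>                       cutoff = 3)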
> 
>     Speeding up decompose.graph has been on the TODO list for a long time;
>     I have given it more priority now.
> 
>     G.
> 
>     On Fri, Jul 18, 2008 at 01:31:35PM -0700, David Hunkins wrote:
> 
>         Hi, I'm working on a large disconnected graph (5M vertices, 10M
>         edges, 500k clusters). I'm trying to reduce the time it takes to
>         compute betweenness for each vertex by breaking the graph up into
>         connected components. decompose.graph does this in a very
>         convenient way, since it returns graph objects that I can run
>         betweenness on:
> 
> 
> 
>         comps <- decompose.graph(g10k)
>         for (i in 1:length(comps)) {
>           write(rbind(V(comps[[i]])$id, betweenness(comps[[i]])),
>                 file = "outfile", nc = 2, sep = ",", append = TRUE)
>         }
> 
> 
> 
>         However decompose.graph is very slow compared with clusters, which
>         appears to do the same thing in almost no time. (I can compute
>         no.clusters on my graph in a few seconds, whereas decompose.graph,
>         run on the same graph, does not finish in 24 hours.) The docs for
>         the C functions indicate that 'clusters' and 'decompose.graph' both
>         have O(V + E) time complexity, but I have not found this to be true.
> 
> 
> 
>         It appears that others have selected 'clusters' to partition large
>         graphs:
> 
> 
> 
>         http://lists.gnu.org/archive/html/igraph-help/2007-12/msg00046.html
> 
> 
> 
>         Does anybody have some R 'glue code' that makes clusters return a
>         list of graphs like decompose.graph does? (I'm an R newbie.) Or
>         other suggestions / clarifications?
> 
> 
> 
>         Thanks,
> 
> 
> 
>         Dave
> 
> 
> 
>         David Hunkins
> 
>         address@hidden
> 
>         im: davehunkins
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>     --
>     Csardi Gabor <address@hidden>    UNIL DGM
> 
> 
> 
> 

> _______________________________________________
> igraph-help mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/igraph-help


-- 
Csardi Gabor <address@hidden>    UNIL DGM



