igraph-help
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [igraph] Calculating metrics for very large / complex graphs


From: Gábor Csárdi
Subject: Re: [igraph] Calculating metrics for very large / complex graphs
Date: Wed, 14 Jan 2009 14:24:11 +0100

On Wed, Jan 14, 2009 at 2:11 PM, Chris Wj <address@hidden> wrote:
> I am starting to calculate metrics for very large graphs. I am looking to
> calculate betweenness centrality, closeness centrality, average path length,
> edge betweenness and maybe some others. Performing betweenness centrality
> calculation for all vertices on a graph with 300K vertices and 100K edges
> took a 3 GHz processor 53 minutes. I upped the edge count to 600K, left it
> running overnight, and it still hasn't finished (over 16 hours).

These are all (at least) quadratic.

> I was wondering what approaches others have taken to calculate metrics for
> large graphs. I see that there are methods of estimating these numbers, but
> I am not sure the best way to pick the cutoff values for estimation.

Just start with 2 and increase it one by one. But in the 0.5.x version
these functions are not actually that fast, so probably you gain much
only if you use a low threshold.

> Ideally, I want each metric calculation to take no longer than 6 hours to
> perform, obtaining as high a degree of accuracy as possible for those 6
> hours. Could I use the big-o notation for each function to estimate the
> duration and then assign a proper cutoff value?

Not really, at least no by itself, because that notation lacks any
constants. You can probably make some runs with networks of different
sizes, plot a graph of the running times (it should follow the formula
given in the O() notation), and then determine the constants. Keep the
degree distribution, or at least the density of your networks constant
while making the runs.

> What have others done?

You can use different measures, e.g. eigenvector centrality or page
rank.  For average shortest paths you can probably to some sampling.
I've seen a case where they used sampling for the diameter, on
networks with millions of vertices and edges.

Or you can try to parallelize, but that requires writing the code for
yourself. Reusing some igraph code can help you doing this.

G.

ps. I am back

> _______________________________________________
> igraph-help mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/igraph-help
>
>



-- 
Gabor Csardi <address@hidden>     UNIL DGM




reply via email to

[Prev in Thread] Current Thread [Next in Thread]