Re: [igraph] Performance issue regarding when calculating induced

Hi Tamás,

First of all, thank you for your reply, and thank you again for the personal consultation. We tried your solution: after we removed the name attribute from our graph, the calculations seem to finish within a reasonable time. However, when we ran exactly your code, we got quite different results:

> n = 15000
> radius = 0.2 / ((n/100) ** 0.5)
> g = grg.game(n, radius)
> cl = label.propagation.community(g)
> system.time(lapply(groups(cl), function(x){induced.subgraph(g, x)}))
   user  system elapsed
   1.14    0.00    1.14
> V(g)$name = 10000000:(10000000+n-1)
> cl = label.propagation.community(g)
> system.time(lapply(groups(cl), function(x){induced.subgraph(g, x)}))
   user  system elapsed
   8.86    0.08    8.93
> V(g)$name = sapply(10000000:(10000000+n-1), toString)
> cl = label.propagation.community(g)
> system.time(lapply(groups(cl), function(x){induced.subgraph(g, x)}))
   user  system elapsed
   1.46    0.04    1.51

We got the biggest slowdown when the name attribute was numeric. With a string name attribute, as in your third run, the overhead was small.
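If the numeric-vs-string difference holds in general, one possible workaround is to convert numeric IDs to character strings before assigning them as vertex names. Below is a minimal base-R sketch, assuming the IDs are whole numbers; the helper name ids_to_names is hypothetical. It uses sprintf("%.0f", ...) because as.character() on a double may switch to scientific notation (for example "1e+07"), which would silently change the names.

```r
# Hypothetical helper (not part of igraph): turn whole-number IDs,
# integer or double, into character strings without scientific notation.
ids_to_names <- function(ids) {
  stopifnot(is.numeric(ids), all(ids == round(ids)))
  sprintf("%.0f", as.double(ids))
}

n <- 5
ids <- 10000000 + 0:(n - 1)   # a double vector, like numeric IDs read from a file
nm <- ids_to_names(ids)
print(nm)

# With igraph loaded, the name assignment would become, e.g.:
# V(g)$name <- ids_to_names(10000000:(10000000 + n - 1))
```

Whether this fully removes the slowdown would still need to be confirmed with system.time() on the real graph.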

There is also another odd thing with the authority.score function. First, here is our script:

# deleting variables
rm(list=ls())

# if not installed then install.packages("igraph")
# if not installed then install.packages("plyr")
library("igraph")
library("plyr")

# set working directory
setwd("********************")

# reading and creating graph
g_in = read.csv("SNA_05_Net.csv", sep=" ")
g = graph.data.frame(g_in, directed=FALSE)
node_list = as.matrix(as.numeric(V(g)$name))
write.table(node_list, file="SNA_Node_List.csv", quote=FALSE, sep="§", col.names=FALSE)
g = remove.vertex.attribute(g, "name")

# creating clusters -- cluster_optimal?
clust = groups(cluster_label_prop(g, weights=E(g)$weight))

# exporting clusters
cl = ldply(clust, data.matrix)
write.table(as.matrix(cl), file="sna_R.csv", quote=FALSE, sep="§", col.names=FALSE)

# creating sub-graphs
g_sub = lapply(clust, function(x){induced.subgraph(g, x)})

# creating cluster/subgraph KPIs
auth = lapply(g_sub, function(x){authority_score(x)$vector})
auth_scr = ldply(auth, data.matrix)
write.table(as.matrix(auth_scr), file="sna_R_ath_scr.csv", quote=FALSE, sep="§", col.names=FALSE)

edg = lapply(g_sub, function(x){edge_density(x, loops=FALSE)})
edg_dens = ldply(edg, data.matrix)
write.table(edg_dens, file="sna_R_edg_dens.csv", quote=FALSE, sep="§", col.names=FALSE)

dgr_o = lapply(g_sub, function(x){degree(x, mode=c("out"), loops=FALSE)})
dgr_out = ldply(dgr_o, data.matrix)
write.table(as.matrix(dgr_out), file="sna_R_dgr_out.csv", quote=FALSE, sep="§", col.names=FALSE)

dgr_i = lapply(g_sub, function(x){degree(x, mode=c("in"), loops=FALSE)})
dgr_in = ldply(dgr_i, data.matrix)
write.table(as.matrix(dgr_in), file="sna_R_dgr_in.csv", quote=FALSE, sep="§", col.names=FALSE)

eigv = lapply(g_sub, function(x){eigen_centrality(x)$vector})
eigv_cent = ldply(eigv, data.matrix)
write.table(as.matrix(eigv_cent), file="sna_R_eigv_cent.csv", quote=FALSE, sep="§", col.names=FALSE)

el = lapply(g_sub, function(x){as_edgelist(x)})
edg_list = ldply(el, data.matrix)
write.table(as.matrix(edg_list), file="sna_R_subgraps.csv", quote=FALSE, sep="§", col.names=FALSE)
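Because the script removes the name attribute before clustering, the members listed in clust (and therefore in sna_R.csv) are igraph's internal vertex indices rather than the original node IDs. Since node_list stores the original IDs in vertex order, indexing it with those indices should map each cluster back. A minimal base-R sketch with made-up toy data; map_members is a hypothetical helper.

```r
# Hypothetical helper: replace internal vertex indices in each cluster
# by the original node IDs saved before the name attribute was removed.
map_members <- function(clusters, original_ids) {
  lapply(clusters, function(members) original_ids[members])
}

# Toy data standing in for node_list and clust from the script above.
node_list <- as.matrix(c(10000007, 10000003, 10000005, 10000001))
clust <- list(`1` = c(1, 3), `2` = c(2, 4))

named_clust <- map_members(clust, as.vector(node_list))
str(named_clust)
```

lapply() keeps the cluster names, so the result can go through ldply() exactly like clust does in the script.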

Everything works fine except the "lapply(g_sub, function(x){authority_score(x)$vector})" statement, because there is one group on which the authority_score function fails. This cheeky bastard is cluster number 293863. If I run "lapply(g_sub[1:293862], function(x){authority_score(x)$vector})" or "lapply(g_sub[293864:length(g_sub)], function(x){authority_score(x)$vector})", both work fine, but "lapply(g_sub, function(x){authority_score(x)$vector})" fails with the same error message as "lapply(g_sub[293863], function(x){authority_score(x)$vector})". This is the error message:

Error in .Call("R_igraph_authority_score", graph, scale, weights, options, :
  At arpack.c:944 : ARPACK error, No shifts could be applied during a cycle of the Implicitly restarted Arnoldi iteration. One possibility is to increase the size of NCV relative to NEV
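Until the cause is clear, one way to keep a single failing cluster from aborting the whole lapply() is to catch the error per subgraph and record a placeholder instead. Below is a minimal base-R sketch; lapply_safely is a hypothetical helper, and returning NA scores for failed clusters is an assumption about what the downstream export can tolerate. The igraph usage appears only in a comment.

```r
# Hypothetical helper: apply f to every element of xs; if f throws an error,
# call fallback(x, e) instead so the remaining elements are still processed.
lapply_safely <- function(xs, f, fallback) {
  lapply(xs, function(x) {
    tryCatch(f(x), error = function(e) fallback(x, e))
  })
}

# Toy example: f fails on the element of length 4, like cluster 293863.
xs <- list(a = 1:2, b = 1:4, c = 1:3)
res <- lapply_safely(
  xs,
  f = function(x) if (length(x) == 4) stop("ARPACK-like failure") else x / sum(x),
  fallback = function(x, e) {
    message("skipped element: ", conditionMessage(e))
    rep(NA_real_, length(x))
  }
)

# With igraph loaded, the authority-score step could become:
# auth <- lapply_safely(g_sub,
#                       function(x) authority_score(x)$vector,
#                       function(x, e) rep(NA_real_, vcount(x)))
```

This only hides the failure; the NA rows in the output still mark which clusters need a closer look.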

I searched Google to understand what causes the problem, but I didn't find anything useful. Maybe the ARPACK manual has something, but I would definitely need more time for that. Here are the details of the subgraph for group 293863:

> g_sub[293863]
$`293863`
IGRAPH U-W- 4 3 --
+ attr: weight (e/n)
+ edges:
[1] 1--2 1--3 1--4

> E(g_sub[[293863]])$weight
[1] 270 5677 3032

I don't see why the authority_score function can't run on this kind of graph: it is a classic star graph, and I think there are many of them, since there are about 440,000 clusters/subgraphs.
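One possible explanation, which is an assumption and not confirmed: as far as I understand, for an undirected graph the authority scores are the leading eigenvector of t(A) %*% A = A %*% A, where A is the weighted adjacency matrix. For a star whose edge weights form the vector w, that matrix splits into the centre entry sum(w^2) and the leaf block w %*% t(w), whose only non-zero eigenvalue is also sum(w^2). The largest eigenvalue therefore appears twice, so the dominant eigenvector is not unique, which is the kind of situation that can make an iterative solver such as ARPACK struggle. The base-R sketch below checks this for the weights of cluster 293863 using eigen() rather than ARPACK. It does not explain why other star-shaped clusters apparently succeed.

```r
# Weights of the three edges 1--2, 1--3, 1--4 in cluster 293863.
w <- c(270, 5677, 3032)

# Weighted adjacency matrix of the star: vertex 1 is the centre.
A <- matrix(0, nrow = 4, ncol = 4)
A[1, 2:4] <- w
A[2:4, 1] <- w

# crossprod(A) computes t(A) %*% A; for a symmetric A this equals A %*% A.
M <- crossprod(A)
ev <- eigen(M, symmetric = TRUE)$values
print(ev)

# The two largest eigenvalues coincide (both equal sum(w^2)),
# so the leading eigenvector is not uniquely determined.
print(all.equal(ev[1], ev[2]))
```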

I hope you can suggest some kind of solution to this problem. Thanks in advance.

Best regards,

Adam Sohonyai

From:	AaaSDFfff
Subject:	Re: [igraph] Performance issue regarding when calculating induced_subgra
Date:	Mon, 2 May 2016 17:31:19 +0200 (CEST)
