igraph-help
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [igraph] Re: Calculating measures for a bunch of ego-networks


From: Magnus Torfason
Subject: Re: [igraph] Re: Calculating measures for a bunch of ego-networks
Date: Thu, 06 May 2010 10:36:32 -0400
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4

Thanks for your suggestion Gábor,

Here is a somewhat lengthy reply, and I hope it is a bit useful, even if it does apply to the 0.5.2-2 version.

I ran the example (my vertex info is already in the form of IDs, so I had to use get.edgelist(g, names=FALSE), and as can be seen below, it is in fact much faster than the %--% method, but it is interesting to see why it is so - it seems to be due to edge sequence subsetting.

==========================
> el <- get.edgelist(g, names=FALSE)
> system.time({ res.ownroll.num  =
            which(el[,1] %in% myfrom & el[,2] %in% myto)-1 })
   user  system elapsed
   0.19    0.01    0.20
> system.time({ res.ownroll.seq  =
      E(g)[ which(el[,1] %in% myfrom & el[,2] %in% myto)-1 ]  })
   user  system elapsed
   2.11    0.46    2.59
> system.time({ res.official     = E(g)[ myto %--% myfrom ] })
   user  system elapsed
   1.83    0.47    2.37
> res.ownroll.num
[1]      0      1      2 136174
> res.ownroll.seq
Edge sequence:

[0]      307939736  -- 100000978
[1]      1101644524 -- 100000978
[2]      2078085089 -- 100000978
[136174] 2078085089 -- 307939736
> res.official
Edge sequence:

[0]      307939736  -- 100000978
[1]      1101644524 -- 100000978
[2]      2078085089 -- 100000978
[136174] 2078085089 -- 307939736
==========================

Constructing an edge sequence from the four item vector takes two seconds here, which is pretty long. I do wonder if this has anything to do with the attributes (as I said, I do have quite a few attributes on both edges and vertices).

This is good information, but .2 seconds per vertex are still quite a long time when I have to loop through everything, as would be expected since this is a brute-force search through a vector of length ecount(g).

I do wonder if E(g,P=...) and are.connected() rely on a similar brute force search. I am thinking surely not, since they are so much faster. As can be seen by the fourth method I applied to solve this problem (and I think this is a general method of solving the %--% in a way that is much faster than the brute force way):

==========================
> system.time({
+     temp.func = function(x, myto)
+     {
+         ego = x[1]
+         others = x[-1]
+         as.vector(t(cbind( ego, others[ others %in% myto ] )))
+     }
+
+     temp = unlist( sapply( neighborhood(g, 1, myfrom),
              temp.func, myto=myto ) )
+     # Here, we are pretty much ready, we have found all the pairs.
+     # The following is only needed to get rid of duplicates in
+     # the resulting edge list, since dupes cause E(g,P=...)
+     # to error out
+     temp = matrix(temp, ncol=2, byrow=2)
+     temp = t(apply(temp,1,sort))
+     temp = temp[!duplicated(temp), ]
+     temp = as.vector(t(temp))
+
+     res.ownroll.fast = E(g, P=temp)
+ })
   user  system elapsed
      0       0       0
>
> res.ownroll.fast
Edge sequence:

[0]      307939736  -- 100000978
[1]      1101644524 -- 100000978
[2]      2078085089 -- 100000978
[136174] 2078085089 -- 307939736
==========================

Again, I find it very interesting that E(g, P=temp) is much much faster than E(g)[edge.ids]. It seems that there must be some problem with the subsetting operator [] for edge sequences. One last code sample here:

==========================
> full.edge.sequence=E(g)
> system.time(full.edge.sequence[100000]) # !
   user  system elapsed
   0.11    0.00    0.11
> system.time(full.edge.sequence[c(100000,100001)]) # !!!!!
   user  system elapsed
   3.20    0.08    3.37
==========================

Now, unfortunately all of the above applies to version 0.5.2-2.

I have downloaded the 0.6 version of igraph and am looking into installing it, but I am finding the prospect a little bit daunting. I'm on Windows, and have never installed R packages from source. Are there directions on the web somewhere specific to igraph, or is this a matter of installing R-tools and that whole environment as per the (somewhat lengthy) guidelines for developing your own packages on Windows?

Best regards,
Magnus


On 5/6/2010 8:34 AM, Gábor Csárdi wrote:
Magnus,

you can try the 0.6 development version, it has some improvements for
V() and E() and also for %--%. You can download it from
http://code.google.com/p/igraph/downloads/list

If you want all edges that connect two vertex sets, and your graph is
not changing much, then you can query the edge list of the graph and
extract the edges from there:

el<- get.edgelist(G)
which(el[,1] %in% myfrom &  el[,2] %in% myto)-1

On a second thought, this is more or less what %--% is doing, so I am
not sure that it will be any faster.

Please tell me if the 0.6 version solves your problem. Thanks,
G.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]