[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [igraph] Re: Calculating measures for a bunch of ego-networks
From: |
Magnus Torfason |
Subject: |
Re: [igraph] Re: Calculating measures for a bunch of ego-networks |
Date: |
Thu, 06 May 2010 10:36:32 -0400 |
User-agent: |
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4 |
Thanks for your suggestion Gábor,
Here is a somewhat lengthy reply, and I hope it is a bit useful, even if
it does apply to the 0.5.2-2 version.
I ran the example (my vertex info is already in the form of IDs, so I
had to use get.edgelist(g, names=FALSE), and as can be seen below, it is
in fact much faster than the %--% method, but it is interesting to see
why it is so - it seems to be due to edge sequence subsetting.
==========================
> el <- get.edgelist(g, names=FALSE)
> system.time({ res.ownroll.num =
which(el[,1] %in% myfrom & el[,2] %in% myto)-1 })
user system elapsed
0.19 0.01 0.20
> system.time({ res.ownroll.seq =
E(g)[ which(el[,1] %in% myfrom & el[,2] %in% myto)-1 ] })
user system elapsed
2.11 0.46 2.59
> system.time({ res.official = E(g)[ myto %--% myfrom ] })
user system elapsed
1.83 0.47 2.37
> res.ownroll.num
[1] 0 1 2 136174
> res.ownroll.seq
Edge sequence:
[0] 307939736 -- 100000978
[1] 1101644524 -- 100000978
[2] 2078085089 -- 100000978
[136174] 2078085089 -- 307939736
> res.official
Edge sequence:
[0] 307939736 -- 100000978
[1] 1101644524 -- 100000978
[2] 2078085089 -- 100000978
[136174] 2078085089 -- 307939736
==========================
Constructing an edge sequence from the four item vector takes two
seconds here, which is pretty long. I do wonder if this has anything to
do with the attributes (as I said, I do have quite a few attributes on
both edges and vertices).
This is good information, but .2 seconds per vertex are still quite a
long time when I have to loop through everything, as would be expected
since this is a brute-force search through a vector of length ecount(g).
I do wonder if E(g,P=...) and are.connected() rely on a similar brute
force search. I am thinking surely not, since they are so much faster.
As can be seen by the fourth method I applied to solve this problem (and
I think this is a general method of solving the %--% in a way that is
much faster than the brute force way):
==========================
> system.time({
+ temp.func = function(x, myto)
+ {
+ ego = x[1]
+ others = x[-1]
+ as.vector(t(cbind( ego, others[ others %in% myto ] )))
+ }
+
+ temp = unlist( sapply( neighborhood(g, 1, myfrom),
temp.func, myto=myto ) )
+ # Here, we are pretty much ready, we have found all the pairs.
+ # The following is only needed to get rid of duplicates in
+ # the resulting edge list, since dupes cause E(g,P=...)
+ # to error out
+ temp = matrix(temp, ncol=2, byrow=2)
+ temp = t(apply(temp,1,sort))
+ temp = temp[!duplicated(temp), ]
+ temp = as.vector(t(temp))
+
+ res.ownroll.fast = E(g, P=temp)
+ })
user system elapsed
0 0 0
>
> res.ownroll.fast
Edge sequence:
[0] 307939736 -- 100000978
[1] 1101644524 -- 100000978
[2] 2078085089 -- 100000978
[136174] 2078085089 -- 307939736
==========================
Again, I find it very interesting that E(g, P=temp) is much much faster
than E(g)[edge.ids]. It seems that there must be some problem with the
subsetting operator [] for edge sequences. One last code sample here:
==========================
> full.edge.sequence=E(g)
> system.time(full.edge.sequence[100000]) # !
user system elapsed
0.11 0.00 0.11
> system.time(full.edge.sequence[c(100000,100001)]) # !!!!!
user system elapsed
3.20 0.08 3.37
==========================
Now, unfortunately all of the above applies to version 0.5.2-2.
I have downloaded the 0.6 version of igraph and am looking into
installing it, but I am finding the prospect a little bit daunting. I'm
on Windows, and have never installed R packages from source. Are there
directions on the web somewhere specific to igraph, or is this a matter
of installing R-tools and that whole environment as per the (somewhat
lengthy) guidelines for developing your own packages on Windows?
Best regards,
Magnus
On 5/6/2010 8:34 AM, Gábor Csárdi wrote:
Magnus,
you can try the 0.6 development version, it has some improvements for
V() and E() and also for %--%. You can download it from
http://code.google.com/p/igraph/downloads/list
If you want all edges that connect two vertex sets, and your graph is
not changing much, then you can query the edge list of the graph and
extract the edges from there:
el<- get.edgelist(G)
which(el[,1] %in% myfrom & el[,2] %in% myto)-1
On a second thought, this is more or less what %--% is doing, so I am
not sure that it will be any faster.
Please tell me if the 0.6 version solves your problem. Thanks,
G.
Re: [igraph] Calculating measures for a bunch of ego-networks, Tamas Nepusz, 2010/05/06