Re: [igraph] R help: Interpretation of PageRank scores for different set

Hi Tamas,

So, I have tried several types of graphs and 'PRPACK', 'ARPACK' and I also wrote simple power iteration in R. Here are some observations so far.
1.) For small networks 'PRPACK' seems to have problem to deal with combination of edge weights and directed=F / as.undirected(g, 'each'). One would expect that PageRank for directed=F / as.undirected(g,'each') / as.undirected(g)

should be the same, however only the last option(which sums edge weights) gives the same PageRank as 'ARPACK' or power iteration(with weight matrix instead of adjacency, where weights are summed).
2.) Another example is again small network, directed and weighted, but with multiple edges(considering directions). Then ' PRPACK' gives different scores then other methods. And moreover if it's at the same time personalized version, 'PRPACK' scores doesn't add up to 1! Only after simplification simplify(g, edge.attr.comb = 'sum') it coincides with other two methods.
3.) If I create network with 'sink' nodes, 'ARPACK' (directed) seems to have problem for personalized version of PageRank.

4.) There also seems to be problem with as.undirected(g) vs. as.undirected(g,'each'). Please see attached .R file with sample network(bipartite) and demonstration of this behavior. So maybe that's the reason why 'PRPACK' gives strange scores?

###

Now when I move from these small sample networks to my huge weighted, directed network(millions of links) I got bit different results.
1.) If network contains loops, 'ARPACK' seems to give wrong scores.

2.) Again surprisingly problem no. 2.) from previous part, is no longer present.

So for now I have to stick with the power method. I'll find out something related to this problem, I'll let you know.

With regards,
Daniel

2016-11-30 9:38 GMT+01:00 Tamas Nepusz <address@hidden>:

Hi,

Sorry for the late response. Most likely there is a bug in how PRPACK treats directed graphs; in theory, there should be no difference between the ARPACK and PRPACK implementations (apart from the latter being faster and more stable), both should treat directed=F and as.undirected() the same way. I was briefly looking at the conversion between an igraph graph and PRPACK's internal data structures in the source code, but I could not find anything yet that could explain the difference; feel free to take a look yourself as well:

https://github.com/igraph/igraph/blob/master/src/prpack/prpack_igraph_graph.cpp

I have added an issue to Github but I don't think I will be able to get around to fixing this any time in the near future, so please chime in if you managed to figure out anything in the meanwhile:

https://github.com/igraph/igraph/issues/972

T.

On Sat, Nov 19, 2016 at 11:59 AM, danopiacek . <address@hidden> wrote:
Hello to everyone,

First of all, I would like to thank the authors of these amazing package. I use it extensively in my master thesis.
I am sorry for duplicate if there already exists same thread.

I have a huge network with couple of millions of edges and would like to calculate personalized PageRank. My network is attributed and I would like to use edge weights to calculate PageRank. I can either consider my network as directed or undirected, but in both cases my network is multigraph. I have encountered several 'strange' results when calculating weighted PageRank.

Here I will try to present it on simple networks without 'personalization'.

1a.) Example with directed symmetric network, with no multiple edges and no weights.
A = matrix(c(0,1,1,1,0,0,1,0,0),byrow=T,nrow=3)
g = graph.adjacency(A)

Then I compute PageRank with page_rank function and get the same result for both "PRPACK" and "ARPACK" algo. I can also compute via matrix inverse as follows:
D = diag(1/apply(A,1,sum),nrow(A)); Beta = (1-0.85)*rep(1/nrow(A),nrow(A))
(pr.in = Beta%*%solve(diag(1,nrow(A))-0.85*D%*%A))
And as expected I get the same PageRank score. In addition, if I consider network as undirected I get the same scores, as expected.

1b.) The same example with edge weights.

set.seed(1)
E(g)$weight = round(runif(ecount(g)),1)
And again, using page_rank I get the same scores for both algorithms. Or I can get the same scores with weight matrix W as:
W = matrix(c(0,0.3,0.4,0.6,0,0,0.9,0,0),byrow=T,nrow=3)
D = diag(1/apply(W,1,sum),nrow(W)); Beta = (1-0.85)*rep(1/nrow(W),nrow(W))
  (pr.in = Beta%*%solve(diag(1,nrow(W))-0.85*D%*%W))
However, if I consider this network being undirected, I get different scores, namely:
  g_UN = as.undirected(g,'each') # do not collapse
  (prUn1 = page_rank(g,algo='PRPACK', directed = F)$vector)
(prUn2 = page_rank(g,algo='ARPACK', directed = F)$vector)
  (prUn3 = page_rank(g_UN, algo='PRPACK')$vector)
  (prUn4 = page_rank(g_UN, algo='ARPACK')$vector)
Here scores given by 'ARPACK' are different to those given by 'PRPACK'. Moreover, with 'ARPACK' I get same scores for both g_UN(undirected network) and for directed=F. However 'PRPACK' gives different scores.
I can replicate results for 'ARPACK' with weight matrix, where I take sum/average of edge weights:
W_UN = matrix(c(0,0.9,1.3,0.9,0,0,1.3,0,0),byrow=T,nrow=3)
D = diag(1/apply(W_UN,1,sum),nrow(W_UN))
(pr.in = Beta%*%solve(diag(1,nrow(W_UN))-0.85*D%*%W_UN))

2a.) Example with directed multigraph.

A2 = matrix(c(0,2,1,1,0,0,1,1,0),byrow=T,nrow=3)
g2 = graph.adjacency(A2)
Without considering edge weights both algorithms give the same scores, which can be also found as
  D = diag(1/apply(A2,1,sum),nrow(A2))
  (pr.in = Beta%*%solve(diag(1,nrow(A2))-0.85*D%*%A2))

2b.) Same graph as undirected.
Here again both algorithms give the same score for both directed=F and as.undirected(g2,'each'), or:
A2_UN = matrix(c(0,3,2,3,0,1,2,1,0),byrow=T,nrow=3)
D = diag(1/apply(A2_UN,1,sum),nrow(A2_UN))
  (pr.in = Beta%*%solve(diag(1,nrow(A2_UN))-0.85*D%*%A2_UN))
gives the same scores.

2c.) Same graph with edge weights-directed.
set.seed(1)
E(g2)$weight = round(runif(ecount(g2)),1)
And again I get different scores
  page_rank(g2,algo='PRPACK')$vector
  page_rank(g2,algo='ARPACK')$vector
Where 'ARPACK' gives the same scores as with weight matrix W where edge weights for multiple
edges(considering direction) are summed
W2 = matrix(c(0,0.7,0.6,0.9,0,0,0.2,0.9,0),byrow=T,nrow=3)
D = diag(1/apply(W2,1,sum),nrow(W2))
(pr.in = Beta%*%solve(diag(1,nrow(W2))-0.85*D%*%W2))

2d.) Same graph with edge weights-undirected.
  g2_UN = as.undirected(g2,'each') # do not collapse
Again, 'PRPACK' and 'ARPACK' give different scores. Moreover, 'PRPACK' differs for
(prUn1 = page_rank(g2,algo='PRPACK',directed = F)$vector)
(prUN2 = page_rank(g2_UN,algo='PRPACK')$vector)
, while 'ARPACK' gives same scores for both. Or with weight matrix W I can replicate 'ARPACK' by summing edge weights:
W2_UN = matrix(c(0,1.6,0.8,1.6,0,0.9,0.8,0.9,0),nrow=3,byrow = T)
D = diag(1/apply(W2_UN,1,sum),nrow(W2_UN))
(pr.in = Beta%*%solve(diag(1,nrow(W2_UN))-0.85*D%*%W2_UN))

I have examined .c files (centrality.c, arpack.c) and I know that 'ARPACK' treats directed=F and as.undirected() same.

So the main questions are, how does 'PRPACK' treat these two options and moreover why 'PRPACK' gives different results than 'ARPACK'/weight matrix.

Thank you very much.
Daniel Piacek



_______________________________________________
igraph-help mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/igraph-help

_______________________________________________
igraph-help mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/igraph-help

From:	danopiacek .
Subject:	Re: [igraph] R help: Interpretation of PageRank scores for different setups(weighted/mutligraph)
Date:	Thu, 1 Dec 2016 10:13:06 +0100