igraph-help
Re: [igraph] How to read in a large graph (and output a sparse matrix)


From: Raphael C
Subject: Re: [igraph] How to read in a large graph (and output a sparse matrix)
Date: Mon, 1 Aug 2016 13:37:48 +0100

Thank you for the quick reply. My system is certainly 64-bit. The
problem is just the amount of RAM that

g = Graph.Read_Ncol('edges.txt')

seems to use.

Here is some code to produce a fake edge list that reproduces my problem.

import random

# Number of edges (m) and vertices (n)
m = 62500000
n = m // 2

for i in range(m):
    # Zero-pad each random vertex ID to 9 digits
    fromnode = str(random.randint(0, n - 1)).zfill(9)
    tonode = str(random.randint(0, n - 1)).zfill(9)
    print(fromnode, tonode)

If I produce a file edges.txt using this code and then run

from igraph import Graph
g = Graph.Read_Ncol('edges.txt')

it runs out of RAM.

To get a better picture of the RAM usage I ran the same test with m =
20000000 (that is about one third of the edges and vertices).

/usr/bin/time -v python ./test.py

shows

Maximum resident set size (kbytes): 3172988

With m = 30000000 I see Maximum resident set size (kbytes): 4750440
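
As a rough cross-check of these two measurements, the sketch below
converts the peak resident set sizes into bytes per edge and
extrapolates linearly to the full 62500000 edges. It assumes Python 3,
that the "kbytes" reported by /usr/bin/time are units of 1024 bytes,
and that memory grows linearly with m; these are assumptions, not
something that was measured. The 32 and 16 byte figures are the
per-edge and per-vertex costs given in the quoted reply below.

```python
KIB = 1024  # assumption: /usr/bin/time reports units of 1024 bytes

# (number of edges, maximum resident set size in kbytes) from the runs above
measurements = [(20000000, 3172988), (30000000, 4750440)]


def bytes_per_edge(edges, rss_kbytes):
    """Peak resident memory divided by the number of edges read."""
    return rss_kbytes * KIB / edges


per_edge = [bytes_per_edge(m, rss) for m, rss in measurements]

# Linear extrapolation to the full graph (assumption: linear growth in m)
full_m = 62500000
full_n = full_m // 2
projected_bytes = max(per_edge) * full_m

# Core graph storage according to the quoted estimate:
# 32 bytes per edge plus 16 bytes per vertex, attributes excluded
core_bytes = 32 * full_m + 16 * full_n

for (m, _), b in zip(measurements, per_edge):
    print("m = %d: about %.0f bytes per edge" % (m, b))
print("projected peak for m = %d: about %.1f GiB"
      % (full_m, projected_bytes / KIB ** 3))
print("core graph estimate: about %.1f GB" % (core_bytes / 1e9))
```

Under these assumptions the projected peak comes out above 8 GiB,
which would be consistent with the full run exhausting memory, while
the core graph estimate is 2.5 GB.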

Maybe one solution is to relabel the nodes myself externally so I can
avoid the overhead of Read_Ncol?
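
A minimal sketch of what such external relabelling could look like,
assuming Python 3 and a whitespace-separated edge list. The helper
names relabel_edges and relabel_file are hypothetical and not part of
igraph. Each distinct label is mapped to the next unused integer, so
the output IDs fall in the range [0; |V|-1].

```python
def relabel_edges(lines, ids=None):
    """Yield (source, target) pairs with labels mapped to 0..|V|-1.

    `ids` is a plain dict from original label to new integer ID; pass
    one in to inspect the mapping afterwards. Hypothetical helper, not
    an igraph function.
    """
    if ids is None:
        ids = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # setdefault assigns the next free ID the first time a label is seen
        u = ids.setdefault(parts[0], len(ids))
        v = ids.setdefault(parts[1], len(ids))
        yield u, v


def relabel_file(src_path, dst_path):
    """Write a relabelled copy of src_path; return the vertex count."""
    ids = {}
    with open(src_path) as src, open(dst_path, "w") as dst:
        for u, v in relabel_edges(src, ids):
            dst.write("%d %d\n" % (u, v))
    return len(ids)
```

The relabelled file could then be given to Graph.Read_Edgelist, which
according to the quoted reply assumes IDs in [0; |V|-1]. Note that the
dict holding every original label also costs memory, so whether this
saves enough on a graph of this size is untested.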

Raphael




On 1 August 2016 at 10:23, Tamas Nepusz <address@hidden> wrote:
> Hello,
>
> Read_Edgelist() won't work because that assumes that the vertex IDs
> are in the range [0; |V|-1] so it would create lots of isolated
> vertices if your vertex ID range has "gaps" in it. Read_Ncol() is the
> way to go, but it has an additional space penalty as it has to
> maintain a mapping from the numeric IDs in the file to the range [0;
> |V|-1].
>
> igraph requires 32 bytes per edge and 16 bytes per vertex to store the
> graph itself, plus additional data structures to store the vertex/edge
> attributes. Therefore, a graph of your size would require ~2.5 GB of
> memory plus the attributes. 8 GB of RAM should therefore be enough --
> however, note that Python might not be able to utilize all that
> memory. In particular, 32-bit Python on Windows is limited to 2 or 3
> GB of memory (see
> https://msdn.microsoft.com/en-us/library/aa366778(v=vs.85).aspx#memory_limits
> ). If you happen to use a 32-bit Python on a 64-bit machine, you will
> need to install a 64-bit Python with a corresponding igraph package
> that is also built for 64-bit, and then try again.
>
> Best,
> T.
>
>
> On Mon, Aug 1, 2016 at 9:52 AM, Raphael C <address@hidden> wrote:
>> I have 8GB of RAM and I have a simple edge list text file of size
>> 1.2GB. It has 62500000 edges and about half that many vertices. Each
>> line looks like
>>
>>      287111206 357850135
>>
>> I would like to read in the graph and output a sparse adjacency
>> matrix. I am failing on all counts. I have tried
>>
>>
>> g = Graph.Read_Edgelist('edges.txt')
>>
>> but this fails immediately with
>>
>> MemoryError: Error at vector.pmt:439: cannot reserve space for vector,
>> Out of memory
>>
>> This seems unrelated to the size of the graph and is just a function of
>> the node IDs being large.
>>
>> So instead I tried
>>
>> g = Graph.Read_Ncol('edges.txt')
>>
>> This eats up all the RAM in my PC forcing me to kill the code.
>>
>> In fact, I tested g = Graph.Read_Ncol('edges.txt') with the first 1/5 of
>> the edges and have the same memory problem.
>>
>> Each node id is a 32 bit integer so the graph should fit easily in 8GB of 
>> RAM.
>>
>> What can I do?
>>
>> Thanks very much for any help.
>> Raphael
>>
>> _______________________________________________
>> igraph-help mailing list
>> address@hidden
>> https://lists.nongnu.org/mailman/listinfo/igraph-help


