
Re: [igraph] How to read in a large graph (and output a sparse matrix)


From: Raphael Clifford
Subject: Re: [igraph] How to read in a large graph (and output a sparse matrix)
Date: Mon, 1 Aug 2016 20:58:26 +0100

It does seem to work now that I have relabeled the node ids as we
discussed. igraph still takes 4GB of RAM for only 62.5 million edges
and 31.25 million vertices, but at least that fits.
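
A minimal sketch of that kind of external relabeling, assuming one
whitespace-separated pair of ids per line. The function name relabel is
hypothetical and this is not necessarily the exact script used; it only
shows how arbitrary ids can be mapped onto the range [0; |V|-1] that
Read_Edgelist() expects.

```python
def relabel(lines, ids=None):
    """Yield (u, v) pairs with node ids remapped to 0, 1, 2, ...

    `ids` maps each original id string to its new integer and is filled
    in as new ids are encountered.
    """
    if ids is None:
        ids = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # setdefault hands out the next unused integer the first time
        # an id is seen, and returns the existing one afterwards
        u = ids.setdefault(parts[0], len(ids))
        v = ids.setdefault(parts[1], len(ids))
        yield u, v


sample = ["287111206 357850135", "357850135 000000042"]
for u, v in relabel(sample):
    print("%d %d" % (u, v))
```

Because it is a generator, the same function can be fed an open file and
its output written line by line to a new file, so only the id dictionary
has to stay in memory rather than the whole edge list.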

Is there a function to write a sparse adjacency matrix of a graph to a
file? I see "write_adjacency", but the docs don't indicate that it
writes a sparse matrix.

Raphael
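
One possible workaround, sketched here without relying on any igraph
writer: dump the edge list directly in Matrix Market coordinate format,
a plain-text sparse format that tools such as scipy.io.mmread can load.
The function name write_matrix_market is hypothetical. The sketch
assumes the vertex ids are already in the range [0; n-1], treats the
edges as directed, and stores the number of parallel edges as the
matrix entry.

```python
import io
from collections import Counter


def write_matrix_market(edges, n, out):
    """Write (u, v) pairs as an n x n sparse matrix to the text file `out`."""
    counts = Counter(edges)  # (row, col) -> number of parallel edges
    out.write("%%MatrixMarket matrix coordinate integer general\n")
    # size line: rows, columns, number of stored entries
    out.write("%d %d %d\n" % (n, n, len(counts)))
    for (u, v), c in sorted(counts.items()):
        # Matrix Market indices are 1-based
        out.write("%d %d %d\n" % (u + 1, v + 1, c))


buf = io.StringIO()
write_matrix_market([(0, 1), (1, 2), (0, 1)], 3, buf)
print(buf.getvalue())
```

With python-igraph the pairs could presumably come from
g.get_edgelist() and n from g.vcount(). Note that the Counter holds
every distinct edge in memory, so for a graph of this size it may be
preferable to sort the edge file externally and stream it instead.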


On 1 August 2016 at 14:57, Tamas Nepusz <address@hidden> wrote:
> Yes, it's probably best if you do the relabeling externally. Let
> us know if it still doesn't work after using Read_Edgelist() with a
> relabeled file.
> T.
>
>
> On Mon, Aug 1, 2016 at 2:37 PM, Raphael C <address@hidden> wrote:
>> Thank you for the quick reply. My system is certainly 64 bit. The
>> problem is just the amount of RAM that
>>
>> g = Graph.Read_Ncol('edges.txt')
>>
>> seems to use.
>>
>> Here is some code to produce a fake edge list that reproduces my problem.
>>
>> import random
>>
>> #Number of edges, vertices
>> m = 62500000
>> n = m/2
>>
>> for i in xrange(m):
>>     fromnode = str(random.randint(0, n-1)).zfill(9)
>>     tonode = str(random.randint(0, n-1)).zfill(9)
>>     print fromnode, tonode
>>
>> If I produce a file edges.txt using this code and then run
>>
>> from igraph import Graph
>> g = Graph.Read_Ncol('edges.txt')
>>
>> it runs out of RAM.
>>
>> To get a better picture of the RAM usage I ran the same test with m =
>> 20000000 (that is about one third of the edges and vertices).
>>
>> /usr/bin/time -v python ./test.py
>>
>> shows
>>
>> Maximum resident set size (kbytes): 3172988
>>
>> With m = 30000000 I see Maximum resident set size (kbytes): 4750440
>>
>> Maybe one solution is to relabel the nodes myself externally so I can
>> avoid the overhead of Ncol?
>>
>> Raphael
>>
>>
>>
>>
>> On 1 August 2016 at 10:23, Tamas Nepusz <address@hidden> wrote:
>>> Hello,
>>>
>>> Read_Edgelist() won't work because that assumes that the vertex IDs
>>> are in the range [0; |V|-1] so it would create lots of isolated
>>> vertices if your vertex ID range has "gaps" in it. Read_Ncol() is the
>>> way to go, but it has an additional space penalty as it has to
>>> maintain a mapping from the numeric IDs in the file to the range [0;
>>> |V|-1].
>>>
>>> igraph requires 32 bytes per edge and 16 bytes per vertex to store the
>>> graph itself, plus additional data structures to store the vertex/edge
>>> attributes. Therefore, a graph of your size would require ~2.5 GB of
>>> memory plus the attributes. 8 GB of RAM should therefore be enough --
>>> however, note that Python might not be able to utilize all that
>>> memory. In particular, 32-bit Python on Windows is limited to 2 or 3
>>> GB of memory (see
>>> https://msdn.microsoft.com/en-us/library/aa366778(v=vs.85).aspx#memory_limits
>>> ). If you happen to use a 32-bit Python on a 64-bit machine, you will
>>> need to install a 64-bit Python with a corresponding igraph package
>>> that is also built for 64-bit, and then try again.
>>>
>>> Best,
>>> T.
>>>
>>>
>>> On Mon, Aug 1, 2016 at 9:52 AM, Raphael C <address@hidden> wrote:
>>>> I have 8GB of RAM and I have a simple edge list text file of size
>>>> 1.2GB. It has 62500000 edges and about half that many vertices. Each
>>>> line looks like
>>>>
>>>>      287111206 357850135
>>>>
>>>> I would like to read in the graph and output a sparse adjacency
>>>> matrix. I am failing on all counts.  I have tried
>>>>
>>>>
>>>> g = Graph.Read_Edgelist('edges.txt')
>>>>
>>>> but this fails immediately with
>>>>
>>>> MemoryError: Error at vector.pmt:439: cannot reserve space for vector,
>>>> Out of memory
>>>>
>>>> This seems unrelated to the size of the graph and is just a function
>>>> of the node ids being large.
>>>>
>>>> So instead I tried
>>>>
>>>> g = Graph.Read_Ncol('edges.txt')
>>>>
>>>> This eats up all the RAM in my PC forcing me to kill the code.
>>>>
>>>> In fact I tested g = Graph.Read_Ncol('edges.txt') with the first 1/5 of
>>>> the edges and had the same memory problem.
>>>>
>>>> Each node id is a 32 bit integer so the graph should fit easily in
>>>> 8GB of RAM.
>>>>
>>>> What can I do?
>>>>
>>>> Thanks very much for any help.
>>>> Raphael
>>>>
>>>> _______________________________________________
>>>> igraph-help mailing list
>>>> address@hidden
>>>> https://lists.nongnu.org/mailman/listinfo/igraph-help
>>>
>>
>


