igraph-help
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [igraph] Python - Creating new vertices and edges


From: Tamás Nepusz
Subject: Re: [igraph] Python - Creating new vertices and edges
Date: Thu, 27 Feb 2014 10:51:47 +0100

Hi,  

> I was wondering what's the rationale behind not returning a reference object 
> to newly created vertices and edges in Python (or at least an index)?
References to newly created vertices and edges are not returned because there 
is actually no new vertex/edge object created when you do the addition. The 
core of igraph is written in C, so adding a vertex involves increasing a 
variable that stores the number of vertices, plus extending a few index 
vectors; similarly, adding an edge involves extending some vectors and 
recreating some indices. The Vertex and Edge objects you see in Python are 
actually dead simple proxies that store a reference to the graph they are a 
part of, plus the index of the vertex or edge that they represent. Creating 
such a proxy object just for the sake of returning it is an overhead, 
especially considering that they will be thrown away immediately.

You are right that we could at least return the index of the newly created edge 
or vertex; we do not do this because igraph simply allocates the next available 
vertex or edge ID. So, if you have n vertices and add one more, the newly 
created vertex will have ID=n (and the same for edges).

> If I am reading a large graph file (special-format, let's assume), checking 
> if a vertex (or an edge between two vertices) has been already created is 
> really problematic.
How about using g.are_connected(source, target)? This method automatically 
accepts vertex names and converts them to vertex IDs.
  
> I am assuming that "find" takes O(n) time where n is number of nodes. So this 
> is an O(n*n) time operation.

True, but most methods accept vertex names in place of vertex IDs, and they use 
an internal dictionary that maps vertex names back to vertex IDs to do the 
conversion. So, for instance, using g.vs.find(name=“whatever”).degree() is 
O(n), but using g.degree(“whatever”) is O(1). This is because the find() method 
cannot assume that the VertexSeq it is called on represents the entire vertex 
set (it might represent a subset or a permutation of it) so we cannot use a 
plain dictionary lookup.

For what it’s worth, my goal is to allow the usage of vertex names in place of 
vertex IDs for every igraph method that accepts vertices or lists of vertices. 
Some of the methods have slipped through the cracks, so for instance 
g.get_eid(“A”, “B”) does not work yet (even though g.are_connected(“A”, “B”) 
does). If you find a method that should accept vertex names but does not accept 
it yet, please file a bug report at https://github.com/igraph/igraph and I’ll 
fix it.

By the way, if you are constructing a large graph with repeated calls to 
add_vertex() and add_edge(), then you will probably encounter another 
bottleneck as your graph grows. igraph’s data structures are optimized for 
static graphs (i.e. graphs where vertex and edge mutations are far less 
frequent than queries). Every time you add an edge, some internal data 
structures have to be rebuilt from scratch, which effectively means that it is 
almost as fast to add, say, thousands of edges at the same time as the addition 
of a single edge. In most cases, it is way faster to create the edge list of 
your graph in advance and then add all the edges at once.

Alternatively, you may take a look at the Graph.DictList() constructor, which 
takes two iterables, one for the vertices and one for the edges. Both iterables 
should yield dictionaries; keys of the dictionaries are converted to 
vertex/edge attribute names and the corresponding values are converted to 
vertex/edge attribute values. The “source” and “target” keys in the 
dictionaries corresponding to edges are automatically used to look up the 
source and target vertex efficiently, and Graph.DictList() is also capable of 
“batching” edge additions in larger chunks to alleviate the overhead of 
repeated calls to add_edges().

All the best,
T.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]