From: Dan Suthers
Subject: Re: [igraph] Specifying multiple output nodes from 1 input node
Date: Wed, 8 Apr 2020 16:28:08 -1000
Use tidyverse packages for data manipulation. They are excellent
at this sort of thing.
I had a similar problem. I used readr::read_delim to read a .csv
of Twitter data produced by Twint into a tibble, 'tweets'.
Each tweet mentions several users in the tweets$mentions field, in
the same format as yours but as a string, for example
"['repadamschiff', 'realdonaldtrump']"
I used stringr::str_extract_all to turn this string into a list,
and then tidyr::unnest_longer to turn the single row into one row
per value in that list:
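    library(dplyr)    # mutate(), %>%
    library(stringr)  # str_extract_all(), boundary()
    library(tidyr)    # unnest_longer(), drop_na()

    mention_edges <-
      tweets %>%
      # Extract lists of mentioned users from the string representation.
      # Inside mutate() the column is referred to by its bare name.
      mutate(mentioned_user = str_extract_all(mentions, boundary("word"))) %>%
      # Unnest each mention into its own row
      unnest_longer(mentioned_user) %>%
      # drop tweets that don't mention anyone
      drop_na(mentioned_user)
      # ... continues with other processing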
It is done in memory, but I have been able to run this on a
fairly large data set.
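
A minimal sketch of the same idea applied to a frame shaped like the one
in the quoted question below, not a tested solution for the real data. It
assumes the list-like columns arrive as character strings, renames the
columns to edge_weights and id_attr so they are valid R names, and treats
the graph as directed; all of these are assumptions. Because node ids such
as '64837-1' contain a hyphen, it uses a regular expression rather than
boundary("word"), which would split the id at the hyphen.

```r
library(tibble)
library(dplyr)
library(stringr)
library(tidyr)
library(igraph)

# Toy data copied from the first three rows of the question.
# Assumption: output_nodes and edge_weights are read in as strings.
nodes <- tibble(
  input_node   = c("11347-5", "116228-0", "39644-0"),
  output_nodes = c("['64837-1', '116228-0']", "['14328-3']", "['116228-0']"),
  edge_weights = c("[0.01001617, 0.01778383]", "[0.3505]", "[0.10184362]"),
  id_attr      = c(82249852, 82283186, 82273700),
  attribute    = c(372856, 372892, 372878)
)

edges <- nodes %>%
  mutate(
    # Everything that is not a bracket, quote, comma or whitespace is an id.
    output_node = str_extract_all(output_nodes, "[^\\[\\]',\\s]+"),
    # Pull out the numeric tokens and convert each list element to numeric.
    weight = lapply(str_extract_all(edge_weights, "[0-9.eE+-]+"), as.numeric)
  ) %>%
  # Unnest both list-columns in parallel: one row per (output node, weight).
  unnest(c(output_node, weight)) %>%
  # igraph reads the first two columns as the edge endpoints;
  # the remaining columns become edge attributes.
  select(input_node, output_node, weight, id_attr, attribute)

g <- graph_from_data_frame(edges, directed = TRUE)
```

The per-row attributes are repeated on every edge that came from that row,
so E(g)$weight, E(g)$id_attr and E(g)$attribute are all available afterwards.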
-- Dan
I have many large dataframes of the following structure, with 1 input node in each row and multiple output nodes and edge weights.

  input_node  output_nodes             edge-weights              id-attr   attribute
1 11347-5     ['64837-1', '116228-0']  [0.01001617, 0.01778383]  82249852  372856
2 116228-0    ['14328-3']              [0.3505]                  82283186  372892
3 39644-0     ['116228-0']             [0.10184362]              82273700  372878
4 116228-0    ['116228-0']             [0.21326264]              82278451  372887
5 116228-0    ['64827-1', '116228-0']  [0.02947139, 0.08275262]  82249816  372855

For example, rows 1 and 5 have 1 input node, 2 output nodes, the corresponding 2 edge weights (they are numbers), and a few attributes; rows 2 through 4 have 1 input node and 1 output node each.
How do I read this dataframe into igraph to make a graph while retaining the attributes? Typically igraph asks for the first 2 columns of the dataframe to be the input and output nodes. This is a large dataframe in which the number of output nodes could be large in some rows.
I can imagine doing this with a "for" loop and regex. But that would be too slow, and the new dataframe would require more memory. I would appreciate any suggestions.
Thank you. Sid
--
Dan Suthers
Professor and Graduate Program Chair
Dept. of Information and Computer Sciences
University of Hawaii at Manoa
1680 East West Road, POST 309, Honolulu, HI 96822
(808) 956-3890 office
Personal: http://www2.hawaii.edu/~suthers/
Lab: http://lilt.ics.hawaii.edu/
Department: http://www.ics.hawaii.edu/