Module raphtory_io::graph_loader::example::reddit_hyperlinks
Load (a subset of) the Reddit hyperlinks dataset into a graph. The dataset is available at http://snap.stanford.edu/data/soc-redditHyperlinks-title.tsv
The hyperlink network represents the directed connections between two subreddits (a subreddit is a community on Reddit). Subreddit embeddings are also provided with the dataset. The network is extracted from publicly available Reddit data of 2.5 years from Jan 2014 to April 2017.
NOTE: Downloading the dataset may take a while.
Dataset statistics
- Number of nodes (subreddits): 35,776
- Number of edges (hyperlinks between subreddits): 137,821
- Timespan: Jan 2014 - April 2017
Source
S. Kumar, W.L. Hamilton, J. Leskovec, D. Jurafsky. Community Interaction and Conflict on the Web. World Wide Web Conference, 2018.
Properties
- SOURCE_SUBREDDIT: the subreddit where the link originates
- TARGET_SUBREDDIT: the subreddit where the link ends
- POST_ID: the post in the source subreddit that starts the link
- TIMESTAMP: the time of the post
- POST_LABEL: label indicating whether the source post is explicitly negative towards the target post. The value is -1 if the source is negative towards the target, and 1 if it is neutral or positive. The label is created using crowd-sourcing and by training a text-based classifier, and is better than simple sentiment analysis of the posts. Please see the reference paper for details.
- POST_PROPERTIES: a vector representing the text properties of the source post, given as a comma-separated list of numbers. The meaning of each entry is described on the source website.
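The properties above map naturally onto one record per row of the TSV file. The following is a minimal sketch of such a record and a parser for a single line, using only the standard library. It is not the loader used by this module: the `RedditLink` struct and the `parse_row` function are hypothetical names introduced for illustration, the column order is assumed to match the order of the list above, and the timestamp is kept as raw text because its format is not specified here.

```rust
/// One row of the hyperlinks TSV, mirroring the properties listed above.
/// (Hypothetical type for illustration; not part of raphtory_io.)
#[derive(Debug, PartialEq)]
pub struct RedditLink {
    pub source_subreddit: String,
    pub target_subreddit: String,
    pub post_id: String,
    /// Kept as raw text; the timestamp format is an open assumption.
    pub timestamp: String,
    /// -1 = explicitly negative, 1 = neutral or positive.
    pub post_label: i8,
    /// Text properties of the source post.
    pub post_properties: Vec<f64>,
}

/// Parse one tab-separated line into a `RedditLink`.
/// Assumes six columns in the order of the property list above.
pub fn parse_row(line: &str) -> Result<RedditLink, String> {
    // Drop a trailing line ending, then split on tabs.
    let line = line.trim_end_matches(&['\r', '\n'][..]);
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() != 6 {
        return Err(format!(
            "expected 6 tab-separated fields, found {}",
            fields.len()
        ));
    }

    // POST_LABEL must be exactly -1 or 1.
    let post_label: i8 = fields[4]
        .trim()
        .parse()
        .map_err(|e| format!("bad POST_LABEL {:?}: {}", fields[4], e))?;
    if post_label != -1 && post_label != 1 {
        return Err(format!("POST_LABEL must be -1 or 1, got {}", post_label));
    }

    // POST_PROPERTIES is a comma-separated list of numbers.
    let post_properties = fields[5]
        .split(',')
        .map(|v| {
            v.trim()
                .parse::<f64>()
                .map_err(|e| format!("bad POST_PROPERTIES value {:?}: {}", v, e))
        })
        .collect::<Result<Vec<f64>, String>>()?;

    Ok(RedditLink {
        source_subreddit: fields[0].to_string(),
        target_subreddit: fields[1].to_string(),
        post_id: fields[2].to_string(),
        timestamp: fields[3].to_string(),
        post_label,
        post_properties,
    })
}
```

Calling `parse_row` on a line with six tab-separated fields returns `Ok(RedditLink { .. })`. A line whose label column is not -1 or 1, such as a header row if the file has one, returns `Err`, so a caller can skip or report such lines.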
Example:
use raphtory_io::graph_loader::example::reddit_hyperlinks::reddit_graph;
use raphtory::db::graph::Graph;
use raphtory::db::view_api::*;
let graph = reddit_graph(1, 120, false);
println!("The graph has {:?} vertices", graph.num_vertices());
println!("The graph has {:?} edges", graph.num_edges());
Functions
- Download the dataset and return the path to the file
- Load the Reddit hyperlinks dataset into a graph and return it