
Load (a subset of) the Reddit hyperlinks dataset into a graph. The dataset is available at http://snap.stanford.edu/data/soc-redditHyperlinks-title.tsv

The hyperlink network represents the directed connections between subreddits (a subreddit is a community on Reddit). The dataset source also provides subreddit embeddings. The network is extracted from 2.5 years of publicly available Reddit data, from Jan 2014 to April 2017.

NOTE: It may take a while to download the dataset.

Dataset statistics

  • Number of nodes (subreddits): 35,776
  • Number of edges (hyperlinks between subreddits): 137,821
  • Timespan: Jan 2014 - April 2017

Source

S. Kumar, W.L. Hamilton, J. Leskovec, D. Jurafsky. Community Interaction and Conflict on the Web. World Wide Web Conference, 2018.

Properties

  • SOURCE_SUBREDDIT: the subreddit where the link originates
  • TARGET_SUBREDDIT: the subreddit where the link ends
  • POST_ID: the post in the source subreddit that starts the link
  • TIMESTAMP: the time of the post
  • POST_LABEL: label indicating whether the source post is explicitly negative towards the target post. The value is -1 if the source is negative towards the target, and 1 if it is neutral or positive. The label is created using crowd-sourcing and by training a text-based classifier, and is better than simple sentiment analysis of the posts. Please see the reference paper for details.
  • POST_PROPERTIES: a vector representing the text properties of the source post, given as a list of comma-separated numbers. Details can be found on the source website.
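
To make the row layout concrete, here is a minimal sketch of turning one row of the file into a typed record. It assumes each row holds the six properties above as tab-separated fields in the listed order. The names `RedditLink` and `parse_link` are hypothetical and are not part of this crate, and the loader itself may parse rows differently. The timestamp is kept as raw text because its exact format is not stated here.

```rust
/// One row of the hyperlinks file (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct RedditLink {
    pub source_subreddit: String,
    pub target_subreddit: String,
    pub post_id: String,
    /// Kept as raw text; no timestamp format is assumed.
    pub timestamp: String,
    /// -1 (negative) or 1 (neutral or positive).
    pub post_label: i8,
    /// Text properties of the source post.
    pub post_properties: Vec<f64>,
}

/// Parse one row, assuming six tab-separated fields in the documented order.
/// Returns a descriptive error message for malformed rows.
pub fn parse_link(line: &str) -> Result<RedditLink, String> {
    // Drop a trailing line ending, then split on tabs.
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() != 6 {
        return Err(format!(
            "expected 6 tab-separated fields, found {}",
            fields.len()
        ));
    }

    // The label must be exactly -1 or 1.
    let post_label: i8 = fields[4]
        .trim()
        .parse()
        .map_err(|e| format!("invalid POST_LABEL {:?}: {}", fields[4], e))?;
    if post_label != -1 && post_label != 1 {
        return Err(format!("POST_LABEL must be -1 or 1, found {}", post_label));
    }

    // The properties are a comma-separated list of numbers.
    let post_properties = fields[5]
        .split(',')
        .map(|v| {
            v.trim()
                .parse::<f64>()
                .map_err(|e| format!("invalid property {:?}: {}", v, e))
        })
        .collect::<Result<Vec<f64>, String>>()?;

    Ok(RedditLink {
        source_subreddit: fields[0].to_string(),
        target_subreddit: fields[1].to_string(),
        post_id: fields[2].to_string(),
        timestamp: fields[3].to_string(),
        post_label,
        post_properties,
    })
}
```

Rows with the wrong number of fields, a label other than -1 or 1, or a non-numeric property are rejected with an error rather than silently skipped, which makes it easier to notice a header line or a truncated download.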

Example:

use raphtory_io::graph_loader::example::reddit_hyperlinks::reddit_graph;
use raphtory::db::graph::Graph;
use raphtory::db::view_api::*;

let graph = reddit_graph(1, 120, false);

println!("The graph has {:?} vertices", graph.num_vertices());
println!("The graph has {:?} edges", graph.num_edges());

Functions

  • Download the dataset and return the path to the file
  • Load the Reddit hyperlinks dataset into a graph and return it