networkit-rs 0.1.0

Rust bindings for Networkit
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# NetworKit Randomization Tutorial"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The randomization module implements algorithms to perturb existing graphs. This is commonly used to obtain a null-model for hypothesis testing in network analysis (see below for an example). Intuitively one tests whether an observation in original also appears in *similar* graphs. By doing so one can quantify the statistical significance of the observation. The ensemble of *similar* networks is commonly chosen as the set of (simple) graphs that have the same degree sequence. \n",
    "\n",
    "Edge Switching is a commonly used Markov Chain approach for this purpose. In every step it selects two edges uniformly at random and exchanges one endpoint of each. While no practical rigorous bounds on the mixing time are known, in literature typically 10 to 100 times the number of edges is used.\n",
    "\n",
    "The Curveball algorithms creates a random sample with the same degree sequence, by repeatedly selecting two random nodes and randomly exchanging their neighbourhoods (this is referred to as a trade). This corresponds to a random walk on the space of all graphs in the ensemble. The more trades are carried out, the less correlated input and output are. While there a no practical lower bounds, one typically choose 10 to 100 times the number n of nodes.\n",
    "\n",
    "GlobalCurveball is a related algorithm that is typically faster than Curveball. Also it's implementation is more versatile (e.g., it support directed and undirected graphs). In each step it carries out n/2 trades involving all nodes. A typical choice of the number of GlobalTrades is 10 to 100. **We recommend the usage of GlobalCurveball over Curveball for performance reasons**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import networkit as nk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Global Curveball"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Global Curveball class is an implementation of EM-GCB proposed in `Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades\", Carstens et al., ESA 2018`. The algorithm perturbs an input graph, by iteratively randomizing the neighbourhoods of node pairs. For a large number of global trades this process is shown to produce a uniform sample from the set of all graphs with the same degree sequence as the input graph.\n",
    "\n",
    "The `GlobalCurveball(G, number_of_global_rounds=20, allowSelfLoops=False, degreePreservingShufflePreprocessing=True)` constructor expecs a graph, followed by the parameter `number_of_global_rounds` which dictactes the number of global rounds to carry out. To accelerate the perturbation process we recommend to set degreePreservingShufflePreprocessing to true. This jump-starts the randomization process and yields faster convergence of the algorithm (see \"Smaller Universes for Uniform Sampling of 0,1-matrices with fixed row and column sums\" * by Annabell Berger, Corrie Jacobien Carstens [https://arxiv.org/abs/1803.02624]).\n",
    "For directed simple graphs the algorithm will not yield a uniform sample unless the preprocessing is enabled."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read graph\n",
    "G = nk.graphio.readGraph(\"../input/karate.graph\", nk.Format.METIS)\n",
    "print(G.numberOfNodes(), G.numberOfEdges())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize algorithm\n",
    "globalCurve = nk.randomization.GlobalCurveball(G)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run algorithm\n",
    "globalCurve.run()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get randomized graph\n",
    "randomGlobalG = globalCurve.getGraph()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Verify\n",
    "print(randomGlobalG.numberOfNodes(), randomGlobalG.numberOfEdges())\n",
    "assert(randomGlobalG.numberOfNodes() == G.numberOfNodes())\n",
    "for u in range (randomGlobalG.upperNodeIdBound()):\n",
    "    assert(randomGlobalG.degree(u) == G.degree(u))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Application example\n",
    "\n",
    "Our hypothesis is that Hyperbolic Graphs (RHGs) have a high clustering. To test the hypothesis\n",
    "we first obtain an RHG and compute its local clustering coefficient. Then, we compute five random graphs with the same degree sequence and observe that their mean LCC score is much lower.\n",
    "This gives empirical support towards our hypothesis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def llcScore(G):\n",
    "    \"\"\"Compute average local clustering coefficient\"\"\"\n",
    "    return sum(nk.centrality.LocalClusteringCoefficient(G).run().scores()) / G.numberOfNodes()\n",
    "\n",
    "# Hyperbolic graphs are known to have an above average clustering\n",
    "G = nk.generators.HyperbolicGenerator(1000, 10).generate()\n",
    "print(\"Avg. clustering of input:         %.3f\" % llcScore(G))\n",
    "\n",
    "for i in range(5):\n",
    "    # Take 5 random graphs with the same degree sequence\n",
    "    sampledGraph = nk.randomization.GlobalCurveball(G).run().getGraph()\n",
    "    print(\"Avg. clustering of random sample: %.3f\" % llcScore(sampledGraph))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Curveball"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Curveball class in an implementation of IM-CB proposed in `Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades\", Carstens et al., ESA 2018`. The algorithm perturbs an undirected and unweighted input graph, by iteratively randomizing the neighbourhoods of node pairs.\n",
    "\n",
    "The `Curveball(G)` constructor expects an undirected, unweighted graph. This class does not support the `run()` method; it instead provides the `run(trades)` method. The parameter `trades` is a vector of pairs of nodes. The `run` method can be called multiple times.\n",
    "\n",
    "Following is an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read graph\n",
    "G = nk.graphio.readGraph(\"../input/celegans_metabolic.thrill\", nk.Format.ThrillBinary)\n",
    "print(G.numberOfNodes(), G.numberOfEdges())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize algorithm\n",
    "curve = nk.randomization.Curveball(G)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `CurveballUniformTradeGenerator(num_trades, num_nodes)` class can be used to generate trades. `num_trades` is the number of trades to generate while `num_nodes` is the number of node indices to draw from. It generates a trade sequence consisting of `num_trades` many single trades. Each trade contains two different node indices drawn uniformly at random from the interval [0, num_nodes)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate trade sequence\n",
    "utg = nk.randomization.CurveballUniformTradeGenerator(G.numberOfNodes(), G.numberOfNodes())\n",
    "trades = utg.generate()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run algorithm\n",
    "curve.run(trades)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get randomized graph\n",
    "randomCurveG = curve.getGraph()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Verify\n",
    "print(randomCurveG.numberOfNodes(), randomCurveG.numberOfEdges())\n",
    "assert(randomCurveG.numberOfNodes() == G.numberOfNodes())\n",
    "for u in range (randomCurveG.upperNodeIdBound()):\n",
    "    assert(randomCurveG.degree(u) == G.degree(u))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DegreePreservingShuffle"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DegreePreservingShuffle is available as a standalone module. It will relabel nodes while keeping their degrees but wont change the topology of the graph. The constructor `DegreePreservingShuffle(G)` expects a directed or undirected graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate graph\n",
    "G = nk.generators.ErdosRenyiGenerator(200, 0.2).generate()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialze algorithm\n",
    "dps = nk.randomization.DegreePreservingShuffle(G)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run algorithm\n",
    "dps.run()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "randomG = dps.getGraph()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Verify\n",
    "for u in range(G.upperNodeIdBound()):\n",
    "    assert(G.degree(u) == randomG.degree(u))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Edge Switching Markov Chain\n",
    "\n",
    "Edge Switching Markov Chain [\"The markov chain simulation method for generating connected power law random graphs\", Mihail and Zegura] perturbs simple directed or undirected graphs while preserving the node degrees. In each step, two edges are selected uniformly at random, and two endpoints exchanged. Swaps that introduce multi-edges or self-loops are rejected *without* replacement -- this is necessary to allow uniform sampling [see \"Switching edges to randomize networks: what goes wrong and how to fix it\", Carstens and Horadam]. \n",
    "\n",
    "In general, simple edge switching does not yield a uniform distribution on simple **directed** graphs because the orientation of a directed triangles cannot be changed. Using DegreePreservingShuffle as a preprocessing step overcomes this limitation. For convenience this is done by default for any graph to heuristically accelerate mixing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate graph\n",
    "G = nk.generators.ErdosRenyiGenerator(200, 0.2).generate()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialze algorithm\n",
    "es = nk.randomization.EdgeSwitching(G)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run algorithm\n",
    "es.run()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "randomG = es.getGraph()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Verify\n",
    "for u in range(G.upperNodeIdBound()):\n",
    "    assert(G.degree(u) == randomG.degree(u))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Inplace Edge Switching\n",
    "\n",
    "We offer an in-place variant of `nk.randomization.EdgeSwitching` which is more efficient if the original graph is not needed anymore.\n",
    "The wrapper also takes ownership of the graph, so it's safe to let the original graph name go out of scope.\n",
    "\n",
    "In the following we pick up the previous Local Clustering Coefficient example.\n",
    "This time, however, we are not interested in the average value, but rather in the progress the made as we carry out an increasing number of Markov Chain steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "G = nk.generators.HyperbolicGenerator(1000, 10).generate()\n",
    "alg = nk.randomization.EdgeSwitchingInPlace(G, 0.1)\n",
    "\n",
    "for i in range(10):\n",
    "    if i > 0: \n",
    "        # do a few switches\n",
    "        alg.run() # this will update G directly\n",
    "        \n",
    "    score =  sum(nk.centrality.LocalClusteringCoefficient(G).run().scores()) / G.numberOfNodes()\n",
    "    print(\"After % 5d switches the llc score is %.3f\" % (500 * i, score))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}