{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Community Detection with NetworKit "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook we will cover some community detection algorithms implemented in the `community` module of NetworKit. Community detection is concerned with identifying groups of nodes which are significantly more densely connected to each other than to the rest of the network. As a first step we import NetworKit:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import networkit as nk"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `community` module provides a top-level function, [detectCommunities(G, algo=None, inspect=True)](https://networkit.github.io/dev-docs/python_api/community.html?highlight=detect#networkit.community.detectCommunities) to perform community detection of a given graph with a suitable algorithm, and print some statistics about the result. If no algorithm is specified via the `algo` parameter, community detection is performed using the [PLM](https://networkit.github.io/dev-docs/python_api/community.html?highlight=plm#networkit.community.PLM) algorithm.\n",
"\n",
"This function can be used as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Read graph\n",
"G = nk.readGraph(\"../input/karate.graph\", nk.Format.METIS)\n",
"communities = nk.community.detectCommunities(G)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sections cover two popular community detection algorithms, `PLM` and `PLP`, and will illustrate how to use them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## PLM"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NetworKit provides a parallel implementation of the well-known Louvain method, which can be found in the [PLM](https://networkit.github.io/dev-docs/python_api/community.html?highlight=plm#networkit.community.PLM) class. It yields a high-quality solution at reasonably fast running times. The constructor `PLM(Graph, refine=False, gamma=0.1, par='balance', maxIter=32, turbo=True, recurse=True)` expects a [networkit.Graph](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=graph#networkit.Graph) as a mandatory parameter. If the parameter `refine` is set to true, the algorithm performs a second move phase to refine the communities. The parameter `gamma` defines the multi-resolution modularity parameter. The string `par` defines the openmp parallelization strategy. `maxIter` is the maximum number of iterations for move phase. When `turbo` is set to true, the algorithm is faster but uses O(n) additional memory per thread. Set `recurse`to true in order to use recursive coarsening. Refer to [this]( http://journals.aps.org/pre/abstract/10.1103/PhysRevE.89.049902) for more details on recursive coarsening.\n",
"\n",
"In the example below we run PLM with `refine` set to true while leaving the rest of the parameters to their default values.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Choose and initialize algorithm \n",
"plmCommunities = nk.community.detectCommunities(G, algo=nk.community.PLM(G, True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The output of the `detectCommunities` function is a partition of the nodes of the graph. It is represented by the [Partition](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=partition#networkit.Partition) data structure, which provides several methods for inspecting and manipulating a partition of a set of elements."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"{0} elements assigned to {1} subsets\".format(plmCommunities.numberOfElements(),\n",
" plmCommunities.numberOfSubsets()))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"the biggest subset has size {0}\".format(max(plmCommunities.subsetSizes())))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The contents of a partition object can be written to file in a simple format, in which the `i`-th line contains an integer representing the subset id of node `i`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nk.community.writeCommunities(plmCommunities, \"output/communtiesPLM.partition\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## PLP "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Label Propagation algorithm is an algorithm for finding communities in a graph. NetworKit provides a parallel implementation, [PLP(G, updateThreshold=none, maxIterations=none)](https://networkit.github.io/dev-docs/python_api/community.html?highlight=plp#networkit.community.PLP). The constructor expects a [networkit.Graph](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=graph#networkit.Graph) as a mandatory parameter. The parameter `updateThreshold` dictates the number of nodes that have to be changed in each iteration so that a new iteration starts, and `maxIterations` is the maximum number of iterations. `none` is NetworKit constant set to the maximum value of a 64-bit integer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Read graph\n",
"G = nk.readGraph(\"../input/jazz.graph\", nk.Format.METIS)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Choose and initialize algorithm \n",
"plpCommunities = nk.community.detectCommunities(G, algo=nk.community.PLP(G))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"{0} elements assigned to {1} subsets\".format(plpCommunities.numberOfElements(),\n",
" plpCommunities.numberOfSubsets()))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"the biggest subset has size {0}\".format(max(plpCommunities.subsetSizes())))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nk.community.writeCommunities(plpCommunities, \"output/communtiesPLP.partition\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}