socksfinder 0.4.0

Search engine for sock puppets on Wikimedia projects


socksfinder is a search engine for sock puppets on Wikimedia projects.


Usage: socksfinder build <index>
       socksfinder query [--cooccurrences | --threshold=<threshold>] <index> <user>...
       socksfinder -h | --help
       socksfinder --version

    build                    Build an index from a MediaWiki XML dump (read on the standard input).
    query                    Search pages modified by several users in the index.

    index                    Index built from a MediaWiki dump.
    user                     User which has modified pages to look for.

    --cooccurrences          Show the co-occurrences matrix instead of the page names.
    -h, --help               Show this screen.
    --threshold=<threshold>  Number of different contributors, 0 for all of them [default: 0].
    --version                Show version.


Run cargo build --release in your working copy.


Building an index from the last dump of the French Wikipedia

Building an index can take quite a while and use a significant amount of memory, depending on the size of the dump. For the French Wikipedia, it takes about 45 minutes with a fast internet connection and uses close to 500 MiB of RAM.

curl -s "" |
    gunzip |
    socksfinder build frwiki-latest.idx
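Conceptually, an index of this kind maps each contributor to the set of pages they modified. The sketch below only illustrates that idea. It assumes the dump has already been reduced to (page title, contributor) pairs, for instance by taking the contributor of each revision in the XML. The Index type and the build_index function are hypothetical names, not socksfinder's actual on-disk format or API.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory index: contributor name -> titles of modified pages.
/// (This is an illustration, not socksfinder's real index format.)
type Index = HashMap<String, BTreeSet<String>>;

/// Builds the index from (page, contributor) pairs, as they might be
/// extracted from the revisions of a MediaWiki XML dump.
fn build_index<'a, I>(revisions: I) -> Index
where
    I: IntoIterator<Item = (&'a str, &'a str)>,
{
    let mut index = Index::new();
    for (page, user) in revisions {
        // A set is used so that many edits of the same page by the same
        // user are recorded only once.
        index
            .entry(user.to_string())
            .or_default()
            .insert(page.to_string());
    }
    index
}
```

For example, feeding it the pairs (Paris, Alice), (Paris, Bob), (Lyon, Alice) and (Paris, Alice) again yields two entries: Alice with Lyon and Paris, and Bob with Paris only. In this sketch, the index therefore grows with the number of distinct (user, page) pairs rather than with the number of revisions.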

This only needs to be done once, though, and the resulting index can be redistributed to other users who don't have a fast enough internet connection or a powerful enough computer. For the French Wikipedia, the index is almost 600 MiB and compresses quite efficiently for distribution (less than 300 MiB with gzip --best).

Searching for pages modified by editors from a list

Searching for pages modified by one or several editors usually requires very little memory by today's standards, around 20 to 30 MiB of RAM. It is usually fast as well, around 10 to 50 milliseconds per user depending on your CPU and the number of unique modified pages, though it can take up to a few seconds for editors who have modified several hundred thousand distinct pages.

socksfinder query frwiki-latest.idx Arkanosis Arktest Arkbot
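With an index that maps each user to the set of pages they modified, the default query (pages modified by all the listed users) amounts to a set intersection. Here is a minimal sketch, assuming a hypothetical in-memory HashMap whose values are BTreeSets of page titles; pages_modified_by_all is an illustrative name, and socksfinder's own data structures may differ.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory index: contributor name -> titles of modified pages.
type Index = HashMap<String, BTreeSet<String>>;

/// Returns the pages modified by every user in `users`.
/// An unknown user (or an empty list) yields an empty result.
fn pages_modified_by_all(index: &Index, users: &[&str]) -> BTreeSet<String> {
    let mut sets = users.iter().map(|user| index.get(*user));
    // Start from the first user's pages, then narrow down user by user.
    let mut result = match sets.next() {
        Some(Some(pages)) => pages.clone(),
        _ => return BTreeSet::new(),
    };
    for pages in sets {
        match pages {
            Some(pages) => result = result.intersection(pages).cloned().collect(),
            None => return BTreeSet::new(),
        }
        if result.is_empty() {
            break; // no page can be shared by all users any more
        }
    }
    result
}
```

For example, if Alice modified Paris and Lyon and Bob modified Paris and Nice, querying both users returns only Paris.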

By default, only pages modified by all the users in the list are returned. To get pages modified by at least a given number of users from the list, use the --threshold option.

socksfinder query --threshold=2 frwiki-latest.idx Arkanosis Arktest Arkbot
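One way to read the --threshold option is to count, for each page, how many distinct listed users modified it, and to keep the pages whose count reaches the threshold, with 0 standing for the number of listed users. The sketch below assumes the same kind of hypothetical in-memory index, a HashMap from user name to a BTreeSet of page titles; pages_over_threshold is an illustrative name, not part of socksfinder.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Hypothetical in-memory index: contributor name -> titles of modified pages.
type Index = HashMap<String, BTreeSet<String>>;

/// Returns the pages modified by at least `threshold` distinct users from
/// `users`; a threshold of 0 means "all of them", mirroring the CLI default.
fn pages_over_threshold(index: &Index, users: &[&str], threshold: usize) -> BTreeSet<String> {
    // Ignore duplicates so that listing a user twice does not count twice.
    let unique: BTreeSet<&str> = users.iter().copied().collect();
    let required = if threshold == 0 { unique.len() } else { threshold };

    // Count, for each page, how many of the listed users modified it.
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for user in &unique {
        if let Some(pages) = index.get(*user) {
            for page in pages {
                *counts.entry(page.as_str()).or_insert(0) += 1;
            }
        }
    }

    // Keep only the pages that reach the required number of users.
    counts
        .into_iter()
        .filter(|&(_, count)| count >= required)
        .map(|(page, _)| page.to_string())
        .collect()
}
```

With Alice (Paris, Lyon), Bob (Paris, Nice) and Carol (Nice), a threshold of 2 returns Nice and Paris, whereas a threshold of 0 returns nothing, since no page was modified by all three.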

Instead of the list of modified pages, you can get the co-occurrences matrix, that is, the matrix of the number of pages modified by each pair of editors from the list.

socksfinder query --cooccurrences frwiki-latest.idx Arkanosis Arktest Arkbot
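Each cell of the co-occurrence matrix can be computed as the size of the intersection of two users' page sets. Here is a minimal sketch, again assuming a hypothetical HashMap from user name to a BTreeSet of page titles. The cooccurrences function is illustrative, and socksfinder's actual output layout (including what it shows on the diagonal) may differ: in this sketch, the diagonal holds each user's own number of modified pages, and an unknown user gets a row of zeros.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory index: contributor name -> titles of modified pages.
type Index = HashMap<String, BTreeSet<String>>;

/// Builds a symmetric matrix where cell (i, j) is the number of pages
/// modified by both users[i] and users[j].
fn cooccurrences(index: &Index, users: &[&str]) -> Vec<Vec<usize>> {
    // Unknown users are treated as having modified no page.
    let empty: BTreeSet<String> = BTreeSet::new();
    let sets: Vec<&BTreeSet<String>> = users
        .iter()
        .map(|user| index.get(*user).unwrap_or(&empty))
        .collect();

    let n = sets.len();
    let mut matrix = vec![vec![0; n]; n];
    for i in 0..n {
        // The matrix is symmetric, so only the upper triangle is computed.
        for j in i..n {
            let count = sets[i].intersection(sets[j]).count();
            matrix[i][j] = count;
            matrix[j][i] = count;
        }
    }
    matrix
}
```

For Alice (Paris, Lyon), Bob (Paris, Nice) and Carol (Nice), the rows of this matrix are [2, 1, 0], [1, 2, 1] and [0, 1, 1].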

Contributing and reporting bugs

Contributions are welcome through GitHub pull requests.

Please report bugs and submit feature requests using GitHub issues.


socksfinder is copyright (C) 2020 Jérémie Roquet and licensed under the ISC license.