.. vtext documentation master file, created by
   sphinx-quickstart on Sun Apr 14 23:28:11 2019.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

vtext
=====

NLP in Rust with Python bindings

This package aims to provide a high-performance toolkit for ingesting textual data for
machine learning applications.

Features
--------

- Tokenization: regular expression tokenizer, Unicode segmentation with
  language-specific rules
- Stemming: Snowball stemmer (15-20x faster than NLTK when called from Python)
- Token counting: converting token counts to sparse matrices for use
  in machine learning libraries, similar to ``CountVectorizer`` and
  ``HashingVectorizer`` in scikit-learn
- Levenshtein edit distance; Sørensen-Dice, Jaro and Jaro-Winkler string
  similarities
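
To illustrate what hashing-based token counting involves, here is a minimal
plain-Python sketch. It is not the vtext API: ``tokenize`` and
``hashing_count_csr`` are hypothetical names. It assumes scikit-learn's default
token pattern for tokenization and uses ``zlib.crc32`` as a deterministic
stand-in hash (scikit-learn's ``HashingVectorizer`` uses MurmurHash3). It
returns the three arrays that define a CSR sparse matrix.

.. code-block:: python

   import re
   import zlib
   from collections import Counter

   # Same as scikit-learn's default ``token_pattern``: runs of 2+ word characters.
   TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


   def tokenize(text):
       """Lowercase the text and split it with a regular expression."""
       return TOKEN_RE.findall(text.lower())


   def hashing_count_csr(documents, n_features=2**10):
       """Count tokens per document into CSR arrays (indptr, indices, data).

       Each token is mapped to a column by hashing, so no vocabulary is stored.
       """
       indptr, indices, data = [0], [], []
       for doc in documents:
           # Hash every token to a column index and count occurrences per column.
           counts = Counter(
               zlib.crc32(tok.encode("utf-8")) % n_features for tok in tokenize(doc)
           )
           # CSR convention: column indices sorted within each row.
           for col in sorted(counts):
               indices.append(col)
               data.append(counts[col])
           # indptr[i + 1] marks where row i ends in ``indices`` and ``data``.
           indptr.append(len(indices))
       return indptr, indices, data

Row ``i`` occupies ``indices[indptr[i]:indptr[i + 1]]``. If SciPy is available,
``scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_documents, n_features))``
turns the arrays into a sparse matrix. Hashing avoids storing a vocabulary at
the cost of occasional collisions between tokens that map to the same column.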
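
The edit distance and similarity measures in the last item can be sketched in a
few lines of plain Python. The functions below are hypothetical illustrations,
not vtext's implementation. ``levenshtein`` is the textbook dynamic-programming
algorithm. ``dice_similarity`` assumes the coefficient is computed on sets of
character bigrams, which is one common convention (others use multisets or
padded n-grams). Jaro and Jaro-Winkler are not covered here.

.. code-block:: python

   def levenshtein(a, b):
       """Minimum number of single-character insertions, deletions and substitutions."""
       # The distance is symmetric, so keep the shorter string in the inner loop.
       if len(a) < len(b):
           a, b = b, a
       prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
       for i, ca in enumerate(a, start=1):
           curr = [i]
           for j, cb in enumerate(b, start=1):
               curr.append(min(
                   prev[j] + 1,               # delete ca
                   curr[j - 1] + 1,           # insert cb
                   prev[j - 1] + (ca != cb),  # substitute (free if equal)
               ))
           prev = curr
       return prev[-1]


   def dice_similarity(a, b):
       """Sørensen-Dice coefficient on character bigram sets: 2|X & Y| / (|X| + |Y|)."""
       x = {a[i:i + 2] for i in range(len(a) - 1)}
       y = {b[i:i + 2] for i in range(len(b) - 1)}
       if not x and not y:
           # Neither string has a bigram (length < 2): fall back to exact equality.
           return 1.0 if a == b else 0.0
       return 2 * len(x & y) / (len(x) + len(y))

For example, ``levenshtein("kitten", "sitting")`` is 3, and
``dice_similarity("night", "nacht")`` is 0.25 because the only shared bigram is
``"ht"``.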

.. toctree::
   :maxdepth: 2
   :caption: Contents

   install
   benchmarks
   rust-api
   python-api
   contributing