xan
Warning: this repository stores SciencesPo's médialab fork of BurntSushi's xsv command line tool.
Feel free to use it, if you feel its added features are useful to your own workflows.
Presentation
xan is a command line program for indexing, slicing, analyzing, splitting
and joining CSV files. Commands should be simple, fast and composable:
- Simple tasks should be easy.
- Performance trade-offs should be exposed in the CLI.
- Composition should not come at the expense of performance.
This README contains information on how to install xan, in addition to
a quick tour of several commands.
Dual-licensed under MIT or the UNLICENSE.
How to install
xan can be installed using cargo:
cargo install xan
New Features
- xan agg
- xan behead
- xan bins
- xan datefmt
- xan dedup
- xan enum
- xan explode
- xan flatmap
- xan filter
- xan foreach
- xan from
- xan glob
- xan groupby
- xan hist
- xan implode
- xan join --prefix-left/--prefix-right
- xan kway
- xan map
- xan range
- xan rename
- xan reverse --in-memory
- xan search --exact
- xan search --flag col
- xan shuffle
- xan sort -u
- xan transform
- xan transpose
- xan view
Available commands
- agg - Aggregate data from CSV file.
- behead - Drop headers from CSV file.
- bins - Dispatch numeric columns into bins.
- cat - Concatenate CSV files by row or by column.
- count - Count the rows in a CSV file. (Instantaneous with an index.)
- datefmt - Add a column with the date from a CSV column in a specified format and timezone.
- dedup - Deduplicate a CSV file.
- enum - Enumerate CSV file by prepending an index column.
- explode - Explode rows into multiple ones by splitting a column value based on the given separator.
- filter - Only keep some CSV rows based on an evaluated expression.
- fixlengths - Force a CSV file to have same-length records by either padding or truncating them.
- flatmap - Emit one row per value yielded by an expression evaluated for each CSV row.
- flatten - A flattened view of CSV records. Useful for viewing one record at a time, e.g., xan slice -i 5 data.csv | xan flatten.
- fmt - Reformat CSV data with different delimiters, record terminators or quoting rules. (Supports ASCII delimited data.)
- foreach - Loop over a CSV file to execute bash commands.
- frequency - Build frequency tables of each column in CSV data. (Uses parallelism to go faster if an index is present.)
- from - Convert a variety of formats to CSV.
- glob - Create a CSV file with paths matching a glob pattern.
- groupby - Aggregate data by groups of a CSV file.
- headers - Show the headers of CSV data. Or show the intersection of all headers between many CSV files.
- implode - Collapse consecutive identical rows based on a diverging column.
- index - Create an index for a CSV file. This is very quick and provides constant time indexing into the CSV file.
- input - Read CSV data with exotic quoting/escaping rules.
- join - Inner, outer and cross joins. Uses a simple hash index to make it fast.
- kway - Merge multiple similar already sorted CSV files.
- lang, optional - Add a column with the language detected in a given CSV column.
- map - Create a new column by evaluating an expression on each CSV row.
- partition - Partition CSV data based on a column value.
- pseudo - Pseudonymise the values of the given column by replacing them with incremental identifiers.
- sample - Randomly draw rows from CSV data using reservoir sampling (i.e., use memory proportional to the size of the sample).
- range - Create a CSV file from a numerical range.
- rename - Rename columns of a CSV file.
- reverse - Reverse order of rows in CSV data.
- search - Run a regex over CSV data. Applies the regex to each field individually and shows only matching rows.
- select - Select or re-order columns from CSV data.
- shuffle - Shuffle rows of a CSV file.
- slice - Slice rows from any part of a CSV file. When an index is present, this only has to parse the rows in the slice (instead of all rows leading up to the start of the slice).
- sort - Sort CSV data.
- split - Split one CSV file into many CSV files of N chunks.
- stats - Show basic types and statistics of each column in the CSV file. (i.e., mean, standard deviation, median, range, etc.)
- transform - Transform a column by evaluating an expression on each CSV row.
- transpose - Transpose a CSV file.
- view - Preview a CSV file in a human-friendly way.
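
The sample entry above mentions reservoir sampling, which is why its memory use depends on the sample size rather than on the input size. As a rough illustration of the idea only (this is not xan's actual code), here is a minimal Python sketch of the classic "Algorithm R" variant. The function name `reservoir_sample`, the `seed` parameter and the in-memory CSV data are all hypothetical and exist only for this example.

```python
import csv
import io
import random


def reservoir_sample(rows, k, seed=None):
    """Return up to k rows drawn uniformly at random from an iterable.

    Only the k-row reservoir is held in memory, whatever the input size.
    (Hypothetical helper for illustration; not part of xan.)
    """
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Pick a slot in [0, i]; the row is kept only if the slot
            # falls inside the reservoir, i.e. with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


# Made-up CSV data standing in for a large file read as a stream.
data = "id,value\n" + "".join(f"{i},{i * i}\n" for i in range(1000))
reader = csv.reader(io.StringIO(data))
header = next(reader)  # keep the header row out of the sample
sample = reservoir_sample(reader, k=5, seed=42)

print(header)
print(sample)
```

Because the rows are consumed one at a time from the reader, the same approach works on input that does not fit in memory or whose length is not known in advance.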