GFFx Command Line Manual
GFFx is a high-performance, Rust-based toolkit for extracting and querying annotations from GFF3 files. It supports fast indexing and feature retrieval with several subcommands. It can be used both as a command-line tool and as a Rust library.
Breaking Changes
Starting from GFFx v0.3.1 and all later versions, the following breaking changes are in effect. These changes introduce important updates to default behavior, improve performance, and fix bugs.
- Default extraction/intersection mode changed: Feature-Only Mode is now the default. To preserve the previous behavior (returning the entire gene model), users must explicitly pass the -F/--full-model flag. Feature-Only Mode and Full-Model Mode have comparable runtime performance.
- To avoid short option conflicts in extract, two options were renamed: -f (feature id) -> -e, and -F (feature file) -> -E.
Table of Contents
GFFx version 0.3.2
- Installation
- Basic Usage
- Example Use Cases
- Using GFFx as a Rust Library
- Available Public APIs
- Index File Types
- License
- Citation
Installation
Option 1: Install via crates.io
Option 2: Install from source
# Optionally copy the binary
Requires Rust 1.70 or later. You can install or update Rust using rustup.
Basic Usage
Available subcommands:
- [index] Build index files
- [intersect] Extract features by region
- [extract] Extract features by ID
- [search] Search features by attribute
index
Builds index files from a GFF file to accelerate downstream operations.
Options:
| Option | Description |
|---|---|
| -i, --input | Input GFF file |
| -a, --attribute | Attribute key to extract (default: gene_name) |
| -v, --verbose | Enable verbose output |
| -h, --help | Print help |
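The -a/--attribute option names a key in the GFF3 attributes column (column 9), where entries are written as key=value pairs separated by semicolons. The following is a minimal sketch of looking up one such key. The function name attribute_value is hypothetical and is not part of GFFx, and the sketch does not handle URL-escaped values.

```rust
/// Return the value stored under `key` in a GFF3 attributes column.
/// `attribute_value` is an illustrative helper, not a GFFx API.
fn attribute_value<'a>(attributes: &'a str, key: &str) -> Option<&'a str> {
    attributes
        .split(';')
        // Each entry is expected to look like `key=value`; malformed entries are skipped.
        .filter_map(|pair| pair.trim().split_once('='))
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v)
}

fn main() {
    // Made-up attributes column for demonstration purposes.
    let attrs = "ID=gene0001;gene_name=BRCA2;biotype=protein_coding";
    println!("{:?}", attribute_value(attrs, "gene_name"));
}
```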
intersect
Extracts models that intersect given regions of a GFF file. Regions are supplied either as a single region or in a BED file.
Options:
Required
| Option | Description |
|---|---|
| -i, --input <INPUT> | Input GFF file path |
| -r, --region <REGION> | Single region in chr:start-end format |
| -b, --bed <BED> | BED file containing multiple regions |
Note: Exactly one of --region or --bed must be specified.
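To illustrate the chr:start-end format accepted by --region, here is a small parsing sketch. The Region struct and parse_region function are hypothetical names, not GFFx's own parser, and the sketch makes no assumption about whether coordinates are 0-based or 1-based.

```rust
/// A parsed region; hypothetical type used only in this sketch.
#[derive(Debug, PartialEq)]
struct Region {
    seqid: String,
    start: u64,
    end: u64,
}

/// Parse a `chr:start-end` string, returning None for malformed input.
fn parse_region(text: &str) -> Option<Region> {
    // Split on the last ':' so sequence names that contain ':' still parse.
    let (seqid, range) = text.rsplit_once(':')?;
    let (start, end) = range.split_once('-')?;
    let start: u64 = start.parse().ok()?;
    let end: u64 = end.parse().ok()?;
    // Reject empty sequence names and reversed ranges.
    if seqid.is_empty() || start > end {
        return None;
    }
    Some(Region { seqid: seqid.to_string(), start, end })
}

fn main() {
    println!("{:?}", parse_region("chr1:1000-2000"));
    println!("{:?}", parse_region("chr1:2000-1000"));
}
```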
Optional
| Option | Description |
|---|---|
| -o, --output <OUT> | Output file path (default: stdout) |
| -F, --full-model | Enable "full-model" mode, which returns the non-redundant gene models for all matched features instead of only the directly matched features |
| -v, --invert | Invert selection (exclude matched features) |
| -T, --types <TYPES> | Filter output to include only features of specified types (e.g., gene,exon) |
| -t, --threads <NUM> | Number of threads [default: 12] |
| -V, --verbose | Enable verbose output |
| -h, --help | Show help message |
| (one of) | |
| -c, --contained | Only keep features fully contained within the region |
| -C, --contains-region | Only keep features that fully contain the region |
| -O, --overlap | Keep features that partially or fully overlap (default mode) |
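The three matching modes (-c, -C, -O) can be expressed as simple interval predicates. The sketch below assumes closed intervals (both endpoints inclusive). The Mode enum and matches function are hypothetical names and do not reflect how GFFx implements its interval-tree queries.

```rust
/// The three matching modes described in the table; names are illustrative.
#[derive(Clone, Copy)]
enum Mode {
    Contained,      // like -c: feature lies entirely inside the region
    ContainsRegion, // like -C: feature spans the entire region
    Overlap,        // like -O: feature and region share at least one position
}

/// Test a feature interval against a region interval (both closed, start <= end).
fn matches(mode: Mode, feature: (u64, u64), region: (u64, u64)) -> bool {
    let (fs, fe) = feature;
    let (rs, re) = region;
    match mode {
        Mode::Contained => rs <= fs && fe <= re,
        Mode::ContainsRegion => fs <= rs && re <= fe,
        Mode::Overlap => fs <= re && rs <= fe,
    }
}

fn main() {
    let region = (100, 200);
    // A feature that straddles the right edge of the region.
    let feature = (190, 250);
    println!("contained: {}", matches(Mode::Contained, feature, region));
    println!("contains region: {}", matches(Mode::ContainsRegion, feature, region));
    println!("overlap: {}", matches(Mode::Overlap, feature, region));
}
```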
extract
Extracts annotation models by feature ID(s), including their parent models.
Options:
Required
| Option | Description |
|---|---|
| -i, --input <INPUT> | Input GFF file path |
| (one of) | |
| -e, --feature-id <FEATURE_ID> | Extract by a single feature ID |
| -E, --feature-file <FEATURE_FILE> | Extract by a file listing multiple feature IDs |
Optional
| Option | Description |
|---|---|
| -o, --output <OUT> | Output file path (default: stdout) |
| -F, --full-model | Enable "full-model" mode, which returns the non-redundant gene models for all matched features instead of only the directly matched features |
| -T, --types <TYPES> | Filter output to include only features of specified types (e.g., gene,exon) |
| -t, --threads <NUM> | Number of threads [default: 12] |
| -V, --verbose | Enable verbose output |
| -h, --help | Show help message |
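Full-model extraction depends on finding the top-level model a matched feature belongs to. As a rough illustration, assuming a child-to-parent map like the one the .prt index is described as holding, the root can be found by following parent links. The root_of function and the in-memory HashMap are hypothetical and say nothing about GFFx's actual data layout.

```rust
use std::collections::HashMap;

/// Follow child -> parent links until a feature with no parent is reached.
/// `root_of` is an illustrative helper, not a GFFx API.
fn root_of<'a>(parents: &HashMap<&'a str, &'a str>, id: &'a str) -> &'a str {
    let mut current = id;
    // Take at most len + 1 steps so malformed, cyclic input cannot loop forever.
    for _ in 0..=parents.len() {
        match parents.get(current) {
            Some(&parent) => current = parent,
            None => break,
        }
    }
    current
}

fn main() {
    // Made-up feature IDs: exon1 belongs to mRNA1, which belongs to gene1.
    let parents: HashMap<&str, &str> =
        HashMap::from([("exon1", "mRNA1"), ("mRNA1", "gene1")]);
    println!("{}", root_of(&parents, "exon1"));
}
```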
search
Searches for features using a specified attribute value and retrieves the full annotation models.
Options:
Required
| Option | Description |
|---|---|
| -i, --input <INPUT> | Input GFF file path |
| (one of) | |
| -a, --attr <ATTRIBUTE_VALUE> | Search a single attribute value/pattern |
| -A, --attr-list <ATTRIBUTE_LIST> | Search attribute values/patterns defined in a text file |
Optional
| Option | Description |
|---|---|
| -o, --output <OUT> | Output file path (default: stdout) |
| -F, --full-model | Enable "full-model" mode, which returns the non-redundant gene models for all matched features instead of only the directly matched features |
| -r, --regex <REGEX> | Enable regex matching for attribute values |
| -T, --types <TYPES> | Filter output to include only features of specified types (e.g., gene,exon) |
| -t, --threads <NUM> | Number of threads [default: 12] |
| -V, --verbose | Enable verbose output |
| -h, --help | Show help message |
Example Use Cases
# Build index
# Extract all features overlapping with a region
# Extract models from a list of gene IDs
# Search by gene name and extract the full model
Using GFFx as a Rust Library
You can use GFFx as a Rust library in your own project.
Add to Cargo.toml
[dependencies]
gffx = "^0.2.0" # Please check the latest version in crates.io
Example: Manually extract features from region using index files
The following example runs inside a main() -> Result<()> context:
use ;
use Path;
// End of function
Available Public APIs
Index building & checking (index_builder)
- build_index
Index loading (index_loader)
- load_gof, load_prt, load_fts, load_atn, load_a2f, load_sqs
- safe_mmap_readonly
- GofMap, PrtMap, FtsMap, A2fMap
Interval querying data structures (utils::serial_interval_trees)
- IntervalTree, Interval
- save_multiple_trees, write_offsets_to_file
Other utilities (utils::common)
- CommonArgs, append_suffix
- write_gff_output, write_gff_output_filtered
- check_index_files_exist
Index File Types
| File Extension | Purpose |
|---|---|
| .gof | Byte offset index for GFF feature blocks |
| .fts | Feature ID table |
| .prt | Child to parent mapping |
| .a2f | Attribute to feature ID mapping |
| .atn | Attribute value table |
| .sqs | Sequence ID table |
| .rit | Interval tree index |
| .rix | Byte offset index for interval trees in the .rit file |
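A byte offset index lets a reader jump directly to a record instead of scanning the whole file. The sketch below shows the general idea on an in-memory buffer, recording the byte range of each line. It is not the .gof or .rix format, whose layouts are not described here, and line_offsets is a hypothetical name.

```rust
/// Record the (start, end) byte offsets of every line in `data`,
/// excluding the trailing newline. Illustrative only; not a GFFx API.
fn line_offsets(data: &[u8]) -> Vec<(usize, usize)> {
    let mut offsets = Vec::new();
    let mut start = 0;
    for (i, &byte) in data.iter().enumerate() {
        if byte == b'\n' {
            offsets.push((start, i));
            start = i + 1;
        }
    }
    // Keep a final line that has no trailing newline.
    if start < data.len() {
        offsets.push((start, data.len()));
    }
    offsets
}

fn main() {
    // Two made-up, abbreviated records standing in for GFF lines.
    let data = b"chr1\t.\tgene\t1\t100\nchr1\t.\texon\t1\t50\n";
    let index = line_offsets(data);
    // Jump straight to the second record using its stored offsets.
    let (start, end) = index[1];
    println!("{}", String::from_utf8_lossy(&data[start..end]));
}
```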
Notes
- Make sure you run gffx index before using intersect, extract, or search.
License
GFFx is released under the MIT or Apache-2.0 License.
Citation
If you use GFFx, please cite our preprint:
Chen, B., Wu, D., & Zhang, G. (2025).
GFFx: A Rust-based suite of utilities for ultra-fast genomic feature extraction.
bioRxiv 2025.08.08.669426. https://doi.org/10.1101/2025.08.08.669426