GFFx Command Line Manual
GFFx is a high-performance, Rust-based toolkit for extracting and querying annotations from GFF3 files. It supports fast indexing and feature retrieval with several subcommands. It can be used both as a command-line tool and as a Rust library.
GFFx version 0.3.1
GFFx v0.3.1 is a major release that introduces changes to default behavior, performance improvements, and bug fixes.
Breaking Changes
- Default extraction/intersection mode changed: Non Full-Model Mode is now the default. To preserve the previous behavior (returning the entire gene model), pass the -F/--full-model flag explicitly. Non Full-Model Mode and Full-Model Mode have comparable runtime performance.
- To avoid short option conflicts in extract, two options were renamed:
  - -f (feature id) -> -e
  - -F (feature file) -> -E
Table of Contents
- Installation
- Basic Usage
- Example Use Cases
- Using GFFx as a Rust Library
- Available Public APIs
- Index File Types
- License
- Citation
Installation
Option 1: Install via crates.io
Option 2: Install from source
# Optionally copy the binary
Requires Rust 1.70 or later. You can install or update Rust using rustup.
Basic Usage
Available subcommands:
- index: Build index files
- intersect: Extract features by region
- extract: Extract features by ID
- search: Search features by attribute
index
Builds index files from a GFF file to accelerate downstream operations.
Options:
| Option | Description |
|---|---|
| -i, --input | Input GFF file |
| -a, --attribute | Attribute key to extract (default: gene_name) |
| -v, --verbose | Enable verbose output |
| -h, --help | Print help |
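To illustrate what an attribute key such as gene_name refers to, here is a minimal sketch that reads the ninth column of a GFF3 record and looks up one key. The function names are hypothetical and this is not GFFx's own parsing code; it also assumes plain key=value pairs without percent-encoded characters.

```rust
use std::collections::HashMap;

/// Parse the ninth GFF3 column ("key=value;key=value") into a map.
/// Pairs without an '=' are skipped.
fn parse_attributes(column9: &str) -> HashMap<&str, &str> {
    column9
        .split(';')
        .filter_map(|pair| pair.trim().split_once('='))
        .collect()
}

/// Return the value of `key` from a full GFF3 record line, if present.
/// (Hypothetical helper, not part of the GFFx API.)
fn attribute_value<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    // GFF3 records have nine tab-separated columns; attributes are the last.
    let column9 = line.split('\t').nth(8)?;
    parse_attributes(column9).get(key).copied()
}

fn main() {
    let line = "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=gene1;gene_name=ABC1";
    println!("{:?}", attribute_value(line, "gene_name"));
}
```

Records that lack the chosen key yield None in this sketch; how GFFx itself treats such records is not described here.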
intersect
Extracts models that intersect a single region or the regions listed in a BED file.
Options:
Required
| Option | Description |
|---|---|
| -i, --input <INPUT> | Input GFF file path |
| -r, --region <REGION> | Single region in chr:start-end format |
| -b, --bed <BED> | BED file containing multiple regions |
Note: Exactly one of --region or --bed must be specified.
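As a rough illustration of the chr:start-end format accepted by --region, the sketch below parses such a string into its parts. The Region type and parse_region function are hypothetical, and the manual does not state the coordinate convention, so no assumption is made beyond start <= end.

```rust
/// A genomic region parsed from a "chr:start-end" string.
/// (Hypothetical type for illustration, not part of the GFFx API.)
#[derive(Debug, PartialEq)]
struct Region {
    seqid: String,
    start: u64,
    end: u64,
}

/// Parse "chr:start-end"; returns None for malformed input or start > end.
fn parse_region(text: &str) -> Option<Region> {
    // Split on the last ':' so sequence names containing ':' still work.
    let (seqid, range) = text.rsplit_once(':')?;
    let (start, end) = range.split_once('-')?;
    let start: u64 = start.parse().ok()?;
    let end: u64 = end.parse().ok()?;
    if seqid.is_empty() || start > end {
        return None;
    }
    Some(Region { seqid: seqid.to_string(), start, end })
}

fn main() {
    println!("{:?}", parse_region("chr1:10000-20000"));
    println!("{:?}", parse_region("chr1:20000"));
}
```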
Optional
| Option | Description |
|---|---|
| -o, --output <OUT> | Output file path (default: stdout) |
| -F, --full-model | Enable full-model mode, which returns the non-redundant gene models for all matched features instead of only the directly matched features |
| -v, --invert | Invert selection (exclude matched features) |
| -T, --types <TYPES> | Filter output to include only features of the specified types (e.g., gene,exon) |
| -t, --threads <NUM> | Number of threads [default: 12] |
| -V, --verbose | Enable verbose output |
| -h, --help | Show help message |
| (one of) | |
| -c, --contained | Only keep features fully contained within the region |
| -C, --contains-region | Only keep features that fully contain the region |
| -O, --overlap | Keep features that partially or fully overlap the region (default mode) |
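The three matching modes differ only in the interval test applied to each feature. The sketch below expresses them as predicates on closed intervals [start, end]; the Mode enum and matches function are hypothetical names, and the closed-interval convention is an assumption rather than something this manual specifies.

```rust
/// The three matching modes, named after the CLI flags.
/// (Hypothetical enum for illustration, not part of the GFFx API.)
#[derive(Debug, Clone, Copy)]
enum Mode {
    Contained,      // -c: feature lies entirely inside the region
    ContainsRegion, // -C: feature entirely covers the region
    Overlap,        // -O: feature and region share at least one position
}

/// Test a feature against a region; both are closed intervals (start, end).
fn matches(mode: Mode, feature: (u64, u64), region: (u64, u64)) -> bool {
    let (fs, fe) = feature;
    let (rs, re) = region;
    match mode {
        Mode::Contained => rs <= fs && fe <= re,
        Mode::ContainsRegion => fs <= rs && re <= fe,
        Mode::Overlap => fs <= re && rs <= fe,
    }
}

fn main() {
    let region = (100, 200);
    for feature in [(120, 180), (50, 150), (50, 250), (10, 50)] {
        for mode in [Mode::Contained, Mode::ContainsRegion, Mode::Overlap] {
            println!("{:?} {:?}: {}", feature, mode, matches(mode, feature, region));
        }
    }
}
```

Under these definitions, any feature accepted by Contained or ContainsRegion is also accepted by Overlap, which is consistent with Overlap being the most permissive mode.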
extract
Extracts annotation models by feature ID(s), including their parent models.
Options:
Required
| Option | Description |
|---|---|
| -i, --input <INPUT> | Input GFF file path |
| (one of) | |
| -e, --feature-id <FEATURE_ID> | Extract by a single feature ID |
| -E, --feature-file <FEATURE_FILE> | Extract by a file containing multiple feature IDs |
Optional
| Option | Description |
|---|---|
| -o, --output <OUT> | Output file path (default: stdout) |
| -F, --full-model | Enable full-model mode, which returns the non-redundant gene models for all matched features instead of only the directly matched features |
| -T, --types <TYPES> | Filter output to include only features of the specified types (e.g., gene,exon) |
| -V, --verbose | Enable verbose output |
| -h, --help | Show help message |
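Returning a feature together with its parent model implies walking from a feature ID up to the top-level feature it belongs to. The sketch below shows one way to do that over a child-to-parent map; resolve_root_id is a hypothetical helper and does not reflect the signature or implementation of GFFx's own resolve_root.

```rust
use std::collections::HashMap;

/// Follow child -> parent links until a feature with no parent is reached.
/// Returns None if the links form a cycle.
/// (Hypothetical helper for illustration, not part of the GFFx API.)
fn resolve_root_id<'a>(parents: &HashMap<&'a str, &'a str>, id: &'a str) -> Option<&'a str> {
    let mut current = id;
    // An acyclic hierarchy needs at most parents.len() hops, plus one
    // final lookup that finds no parent.
    for _ in 0..=parents.len() {
        match parents.get(current) {
            Some(&parent) => current = parent,
            None => return Some(current),
        }
    }
    None
}

fn main() {
    let mut parents = HashMap::new();
    parents.insert("exon1", "mrna1");
    parents.insert("mrna1", "gene1");
    println!("{:?}", resolve_root_id(&parents, "exon1"));
}
```

An ID that has no entry in the map is treated as its own root in this sketch.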
search
Searches for features using a specified attribute value and retrieves the full annotation models.
Options:
Required
| Option | Description |
|---|---|
| -i, --input <INPUT> | Input GFF file path |
| (one of) | |
| -a, --attr <ATTRIBUTE_VALUE> | Search a single attribute value/pattern |
| -A, --attr-list <ATTRIBUTE_LIST> | Search attribute values/patterns defined in a text file |
Optional
| Option | Description |
|---|---|
| -o, --output <OUT> | Output file path (default: stdout) |
| -r, --regex <REGEX> | Enable regex matching for attribute values |
| -T, --types <TYPES> | Filter output to include only features of the specified types (e.g., gene,exon) |
| -V, --verbose | Enable verbose output |
| -h, --help | Show help message |
Example Use Cases
# Build index
# Extract all features overlapping with a region
# Extract models from a list of gene IDs
# Search by gene name and extract the full model
Using GFFx as a Rust Library
You can use GFFx as a Rust library in your own project.
Add to Cargo.toml
[dependencies]
gffx = "^0.2.0" # Please check the latest version on crates.io
Example: Manually extract features from region using index files
The following example runs inside a main() -> Result<()> context:
use Result;
use ;
use PathBuf;
use FxHashMap;
Available Public APIs
Index building & checking (index_builder)
- build_index
Index loading (index_loader)
- load_gof, load_prt, load_fts, load_atn, load_a2f, load_sqs
- safe_mmap_readonly
- GofEntry, PrtEntry, A2fEntry
Interval querying data structures (utils::serial_interval_trees)
- IntervalTree, Interval
- save_multiple_trees, write_offsets_to_file
Other utilities (utils::common)
- CommonArgs (for command-line compatibility)
- write_gff_output, check_index_files_exist
- RootMatched, resolve_root, extract_root_matches, roots_to_offsets
Index File Types
| File Extension | Purpose |
|---|---|
| .gof | Byte offset index for GFF feature blocks |
| .fts | Feature ID table |
| .prt | Child to parent mapping |
| .a2f | Attribute to feature ID mapping |
| .atn | Attribute value table |
| .sqs | Sequence ID table |
| .rit | Interval tree index |
| .rix | Byte offset index for interval trees in the .rit file |
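A byte offset index is useful because it lets a reader jump straight to a feature block instead of scanning the file from the top. The sketch below shows the general idea with a hypothetical read_block helper that takes a start and end offset; the actual on-disk layout of the .gof file is not described in this manual and is not assumed here.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Read the bytes in [start, end) from any seekable source.
/// (Hypothetical helper for illustration, not part of the GFFx API.)
fn read_block<R: Read + Seek>(source: &mut R, start: u64, end: u64) -> io::Result<Vec<u8>> {
    let len = end.checked_sub(start).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "end offset precedes start offset")
    })?;
    // Jump directly to the block instead of reading everything before it.
    source.seek(SeekFrom::Start(start))?;
    let mut block = vec![0u8; len as usize];
    source.read_exact(&mut block)?;
    Ok(block)
}

fn main() -> io::Result<()> {
    // An in-memory stand-in for a GFF file with two records.
    let gff = "chr1\t.\tgene\t1\t9\t.\t+\t.\tID=g1\nchr1\t.\tgene\t20\t30\t.\t+\t.\tID=g2\n";
    // Offset of the second record: one past the first newline.
    let second = gff.find('\n').map(|i| i as u64 + 1).unwrap_or(0);
    let mut file = Cursor::new(gff.as_bytes());
    let block = read_block(&mut file, second, gff.len() as u64)?;
    print!("{}", String::from_utf8_lossy(&block));
    Ok(())
}
```

The same function works on a std::fs::File, since it only requires Read and Seek.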
Notes
- Make sure you run gffx index before using intersect, extract, or search.
License
GFFx is released under the MIT or Apache-2.0 License.
Citation
If you use GFFx, please cite our preprint:
Chen, B., Wu, D., & Zhang, G. (2025).
GFFx: A Rust-based suite of utilities for ultra-fast genomic feature extraction.
bioRxiv 2025.08.08.669426. https://doi.org/10.1101/2025.08.08.669426