TSG - Transcript Segment Graph
TSG is a Rust library and command-line tool for creating, manipulating, and analyzing transcript segment graphs. It provides a comprehensive framework for modeling segmented transcript data, analyzing non-linear splicing events, and working with genomic structural variants.

Features
- Parse and write TSG format files
- Build and manipulate transcript segment graphs
- Support for multiple graphs within a single file
- Analyze paths and connectivity between transcript segments
- Support for various element types: nodes, edges, groups, and chains
- Export graphs to DOT format for visualization
- Traverse the graph to identify valid transcript paths
- Read identity tracking to ensure biological validity
- Build graphs from chains and validate path traversals
- Support for genomic coordinates with strand information
- Support for read evidence with types
- Inter-graph links for fusion events and other cross-graph relationships
Installation
Library
Add this to your Cargo.toml:
[]
= "0.1.0"
Command-line Tool
Install the CLI tool:
Library Usage
Loading a TSG file
use TSGraph;
use Path;
Working with Multiple Graphs
use ;
use BString;
Building Graphs from Chains
use ;
use HashMap;
Finding Valid Paths Through Specific Graphs
use TSGraph;
CLI Usage
The TSG command-line tool provides a convenient interface for common operations:
# Display help
# Parse and validate a TSG file
# List all graphs in a TSG file
# Convert a specific graph to DOT format for visualization
# Extract statistics from a TSG file
# Find all paths through a specific graph
# Find all inter-graph links
TSG File Format
The TSG format is a tab-delimited text format representing transcript assemblies as graphs. It supports multiple independent graphs within a single file.
Multi-Graph Support
TSG supports multiple graphs within a single file using a graph namespace approach. Each element in the file can be associated with a specific graph using a graph ID prefix:
graph_id:element_id
For example, gene_a:n1 refers to node n1 in the graph identified as "gene_a".
Record Types
Each line in a TSG file starts with a letter denoting the record type:
H- Header information (including graph definitions)N- Node definition (exon or transcript segment)E- Edge definition (splice junction or structural variant)U- Unordered group (set of elements)P- Path (ordered traversal through the graph)C- Chain (alternating nodes and edges)A- Attribute for any element (metadata)L- Inter-graph link (connections between different graphs)
Conceptual Model
In the TSG model:
- Graphs (G) represent independent transcript graphs, each with its own set of nodes and edges.
- Chains (C) are used to build each graph's structure.
- Paths (P) are traversals through the constructed graphs.
- Links (L) establish relationships between elements in different graphs.
This distinction is important: chains define what each graph is, paths define ways to traverse each graph, and links define relationships between graphs.
Example with Multiple Graphs
# Global headers
H TSG 1.0
H reference GRCh38
# First graph
G gene_a name:Z:BRCA1 locus:Z:chr17q21.31
# Nodes for gene_a
N n1 chr17:+:41196312-41196402 read1:SO,read2:SO ACGTACGT
N n2 chr17:+:41199660-41199720 read2:IN,read3:IN TGCATGCA
N n3 chr17:+:41203080-41203134 read1:SI,read2:SI CTGACTGA
# Edges for gene_a
E e1 n1 n2 chr17,chr17,41196402,41199660,splice
E e2 n2 n3 chr17,chr17,41199720,41203080,splice
# Chains for gene_a
C chain1 n1 e1 n2 e2 n3
# Paths for gene_a
P transcript1 n1+ e1+ n2+ e2+ n3+
# Sets for gene_a
U exon_set n1 n2 n3
# Attributes for gene_a elements
A N n1 expression:f:10.5
A N n1 ptc:i:10
A P transcript1 tpm:f:8.2
# Second graph
G gene_b name:Z:BRCA2 locus:Z:chr13q13.1
# Nodes for gene_b
N n1 chr13:+:32315480-32315652 read4:SO,read5:SO GATTACA
N n2 chr13:+:32316528-32316800 read4:IN,read5:IN TACGATCG
N n3 chr13:+:32319077-32319325 read4:SI,read5:SI CGTACGTA
# Edges for gene_b
E e1 n1 n2 chr13,chr13,32315652,32316528,splice
E e2 n2 n3 chr13,chr13,32316800,32319077,splice
# Chains for gene_b
C chain1 n1 e1 n2 e2 n3
# Paths for gene_b
P transcript1 n1+ e1+ n2+ e2+ n3+
# Sets for gene_b
U exon_set n1 n2 n3
# Attributes for gene_b elements
A P transcript1 tpm:f:3.7
# Inter-graph links (appears after all graph sections)
L fusion1 gene_a:n3 gene_b:n1 fusion type:Z:chromosomal
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.