# xportrs
**Pure Rust SAS XPORT (XPT) reader and writer for CDISC clinical trial data submissions.**
`xportrs` provides a safe, DataFrame-agnostic implementation of XPT v5 I/O with built-in regulatory compliance validation for FDA, PMDA, and NMPA submissions.
## Features
- **DataFrame-agnostic** - Works with any in-memory table representation
- **Agency compliance** - Built-in validation for FDA, PMDA, and NMPA requirements
- **Auto file splitting** - Automatically splits large files to meet agency size limits (5 GB)
- **XPT v5 support** - Full read and write support for SAS XPORT v5 format
- **Configurable** - Text encoding modes, validation strictness, and more
## Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
xportrs = "0.0.2"
```
With optional features:
```toml
[dependencies]
xportrs = { version = "0.0.2", features = ["serde", "tracing"] }
```
## Quick Start
### Reading XPT Files
```rust
use xportrs::Xpt;

// Simple: read the first dataset
let dataset = Xpt::read("ae.xpt")?;
println!("Domain: {}", dataset.domain_code);
println!("Rows: {}", dataset.nrows);

// Read a specific member from a multi-dataset file
let dm = Xpt::reader("study.xpt")?.read_member("DM")?;

// Read all members
let datasets = Xpt::reader("study.xpt")?.read_all()?;

// Inspect file metadata without loading data
let info = Xpt::inspect("data.xpt")?;
for name in info.member_names() {
    println!("Member: {}", name);
}
```
### Writing XPT Files
```rust
use xportrs::{Xpt, DomainDataset, Column, ColumnData};

// Create a dataset
let dataset = DomainDataset::new(
    "AE".to_string(),
    vec![
        Column::new("USUBJID", ColumnData::String(vec![
            Some("01-001".into()),
            Some("01-002".into()),
        ])),
        Column::new("AESEQ", ColumnData::I64(vec![Some(1), Some(1)])),
        Column::new("AESTDY", ColumnData::F64(vec![Some(15.0), Some(22.0)])),
    ],
)?;

// Write with structural validation only
Xpt::writer(dataset)
    .finalize()?
    .write_path("ae.xpt")?;
```
## Agency Compliance
When submitting clinical trial data to regulatory agencies, use the
`agency()` method to enable agency-specific validation rules:
```rust
use xportrs::{Xpt, Agency, DomainDataset};

let dataset = DomainDataset::new("AE".into(), vec![/* ... */])?;

// FDA submission - applies all FDA validation rules
let files = Xpt::writer(dataset)
    .agency(Agency::FDA)
    .finalize()?
    .write_path("ae.xpt")?;

// Returns Vec<PathBuf> - multiple files if splitting occurred
println!("Created {} file(s)", files.len());
```
### Supported Agencies
| Agency | Description | Max File Size |
|--------|-------------|---------------|
| `Agency::FDA` | U.S. Food and Drug Administration | 5 GB |
| `Agency::PMDA` | Japan Pharmaceuticals and Medical Devices Agency | 5 GB |
| `Agency::NMPA` | China National Medical Products Administration | 5 GB |
### Agency Validation Rules
When an agency is specified, the following validations are applied:
- **ASCII-only** names, labels, and character values
- **Dataset names**: max 8 bytes, uppercase alphanumeric, must start with letter
- **Variable names**: max 8 bytes, uppercase alphanumeric with underscores
- **Labels**: max 40 bytes
- **Character values**: max 200 bytes
- **File naming**: dataset name must match file stem (case-insensitive)
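The rules above are simple enough to express as standalone checks. The following is a minimal sketch using only the standard library. All function names here are hypothetical and are not part of the `xportrs` API; the crate's own validation may differ in detail.

```rust
use std::path::Path;

/// Dataset names: 1 to 8 bytes, uppercase ASCII letters or digits, starting with a letter.
fn is_valid_dataset_name(name: &str) -> bool {
    let bytes = name.as_bytes();
    !bytes.is_empty()
        && bytes.len() <= 8
        && bytes[0].is_ascii_uppercase()
        && bytes.iter().all(|b| b.is_ascii_uppercase() || b.is_ascii_digit())
}

/// Variable names: 1 to 8 bytes, uppercase ASCII letters, digits, or underscores.
fn is_valid_variable_name(name: &str) -> bool {
    let bytes = name.as_bytes();
    !bytes.is_empty()
        && bytes.len() <= 8
        && bytes
            .iter()
            .all(|b| b.is_ascii_uppercase() || b.is_ascii_digit() || *b == b'_')
}

/// Labels: ASCII only, at most 40 bytes.
fn is_valid_label(label: &str) -> bool {
    label.is_ascii() && label.len() <= 40
}

/// Character values: ASCII only, at most 200 bytes.
fn is_valid_char_value(value: &str) -> bool {
    value.is_ascii() && value.len() <= 200
}

/// File naming: the file stem must equal the dataset name, ignoring ASCII case.
fn file_stem_matches(path: &str, dataset_name: &str) -> bool {
    Path::new(path)
        .file_stem()
        .and_then(|stem| stem.to_str())
        .map_or(false, |stem| stem.eq_ignore_ascii_case(dataset_name))
}

fn main() {
    println!("AE as dataset name: {}", is_valid_dataset_name("AE"));
    println!("1AE as dataset name: {}", is_valid_dataset_name("1AE"));
    println!("AE_SEQ as variable name: {}", is_valid_variable_name("AE_SEQ"));
    println!("label ok: {}", is_valid_label("Sequence Number"));
    println!("value ok: {}", is_valid_char_value("HEADACHE"));
    println!("ae.xpt matches AE: {}", file_stem_matches("out/ae.xpt", "AE"));
}
```

Because every rule is expressed in bytes and restricted to ASCII, `str::len` (which counts bytes) is the right length measure in this sketch.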
## Automatic File Splitting
Large XPT files are automatically split when an agency is specified:
```rust
use xportrs::{Xpt, Agency, DomainDataset};

// Large dataset with millions of rows
let large_dataset = DomainDataset::new("LB".into(), /* ... */)?;

// Files > 5 GB are automatically split into numbered parts
let files = Xpt::writer(large_dataset)
    .agency(Agency::FDA)
    .finalize()?
    .write_path("lb.xpt")?;

// Result: ["lb_001.xpt", "lb_002.xpt", ...] if split
// Result: ["lb.xpt"] if no split needed
```
You can also manually configure the size limit:
```rust
use xportrs::{Xpt, Config, DomainDataset};

let mut config = Config::default();
config.write.max_size_gb = Some(2.0); // Split at 2 GB

let files = Xpt::writer(dataset)
    .config(config)
    .finalize()?
    .write_path("ae.xpt")?;
```
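To make the splitting behaviour concrete, here is a rough sketch of how a size limit could translate into part names and row counts. It assumes that parts are split on row boundaries, that each part carries the same fixed header overhead, and that rows are fixed-width; none of this is confirmed by the crate, and `plan_parts` is a hypothetical helper. Only the `lb_001.xpt` naming pattern is taken from the example above.

```rust
/// Plans output parts as (file name, row count) pairs.
/// Returns `None` if the limit is too small to hold the header plus one row.
fn plan_parts(
    stem: &str,
    header_bytes: u64,
    row_bytes: u64,
    nrows: u64,
    max_bytes: u64,
) -> Option<Vec<(String, u64)>> {
    // No split needed: keep the original file name.
    let total = header_bytes + row_bytes * nrows;
    if total <= max_bytes {
        return Some(vec![(format!("{stem}.xpt"), nrows)]);
    }
    if row_bytes == 0 {
        return None;
    }

    // Rows that fit in one part after the (assumed) per-part header.
    let rows_per_part = max_bytes.saturating_sub(header_bytes) / row_bytes;
    if rows_per_part == 0 {
        return None;
    }

    let mut parts = Vec::new();
    let mut remaining = nrows;
    let mut index = 1u32;
    while remaining > 0 {
        let rows = remaining.min(rows_per_part);
        // Zero-padded, 1-based part number, as in "lb_001.xpt".
        parts.push((format!("{stem}_{index:03}.xpt"), rows));
        remaining -= rows;
        index += 1;
    }
    Some(parts)
}

fn main() {
    // Small numbers stand in for real sizes to keep the example readable.
    if let Some(parts) = plan_parts("lb", 1_000, 100, 250, 10_000) {
        for (name, rows) in parts {
            println!("{name}: {rows} rows");
        }
    }
}
```

The sketch takes the limit in bytes on purpose: whether a setting such as `max_size_gb` means 10^9 or 2^30 bytes per gigabyte is not stated here, so that conversion is left to the caller.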
## Configuration Options
### Read Options
```rust
use xportrs::{Xpt, ReadOptions, TextMode};

let dataset = Xpt::reader("ae.xpt")?
    .options(ReadOptions::new()
        .with_text_mode(TextMode::Latin1) // or Utf8 (default), Strict
        .with_preserve_blanks(true))      // Keep trailing spaces
    .read()?;
```
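As a rough illustration of what a text mode and blank preservation mean for a fixed-width, space-padded character field, the sketch below decodes raw bytes in two ways. The function names are hypothetical, and the exact semantics of `TextMode` and `with_preserve_blanks` in `xportrs` are assumed rather than confirmed.

```rust
use std::str::Utf8Error;

/// Strips trailing space padding unless the caller asked to keep it.
fn trim_padding(text: &str, preserve_blanks: bool) -> &str {
    if preserve_blanks {
        text
    } else {
        text.trim_end_matches(' ')
    }
}

/// Latin-1 decoding never fails: each byte maps to the Unicode code point
/// with the same value.
fn decode_latin1(field: &[u8], preserve_blanks: bool) -> String {
    let text: String = field.iter().map(|&b| b as char).collect();
    trim_padding(&text, preserve_blanks).to_string()
}

/// Strict UTF-8 decoding rejects any invalid byte sequence.
fn decode_utf8_strict(field: &[u8], preserve_blanks: bool) -> Result<String, Utf8Error> {
    let text = std::str::from_utf8(field)?;
    Ok(trim_padding(text, preserve_blanks).to_string())
}

fn main() {
    // "café" in Latin-1, padded with spaces to a width of 7 bytes.
    let field = b"caf\xE9   ";
    println!("latin1, trimmed: {:?}", decode_latin1(field, false));
    println!("latin1, preserved: {:?}", decode_latin1(field, true));
    println!("utf8 strict: {:?}", decode_utf8_strict(field, false).is_ok());
}
```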
### Write Options
```rust
use xportrs::{Xpt, Config, DomainDataset};

let mut config = Config::default();
config.strict_checks = true;          // Fail on validation errors
config.write.max_size_gb = Some(5.0); // Enable file splitting

Xpt::writer(dataset)
    .config(config)
    .finalize()?
    .write_path("ae.xpt")?;
```
## Data Types
`xportrs` supports the following column data types:
| Type | XPT Type | Description |
|------|----------|-------------|
| `ColumnData::F64` | Numeric | 64-bit floating point |
| `ColumnData::I64` | Numeric | 64-bit integer (stored as float) |
| `ColumnData::String` | Character | Variable-length text (1-200 bytes) |
All types support `Option<T>` for missing values (SAS missing = `.`).
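Because integer columns are stored as floating point, very large values may not round-trip exactly. The sketch below shows one way to check this up front, assuming the conversion goes through an IEEE 754 `f64`; the helper names are hypothetical and the crate's actual behaviour for out-of-range integers is not documented here.

```rust
/// Every integer with magnitude up to 2^53 is exactly representable as an f64.
const MAX_EXACT_INT: u64 = 1 << 53;

/// Conservative check: true if the value is inside the exactly representable range.
fn fits_f64_exactly(value: i64) -> bool {
    value.unsigned_abs() <= MAX_EXACT_INT
}

/// Converts an integer column to floats, keeping `None` as missing.
/// Returns the first value that falls outside the exact range as the error.
fn to_numeric_column(values: &[Option<i64>]) -> Result<Vec<Option<f64>>, i64> {
    values
        .iter()
        .map(|v| match v {
            Some(n) if !fits_f64_exactly(*n) => Err(*n),
            Some(n) => Ok(Some(*n as f64)),
            None => Ok(None),
        })
        .collect()
}

fn main() {
    println!("{:?}", to_numeric_column(&[Some(1), None, Some(-7)]));
    println!("{:?}", to_numeric_column(&[Some(i64::MAX)]));
}
```

Sequence numbers and study days are far below this bound, so the check mostly matters for identifiers or timestamps that have been packed into integers.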
## Validation Issues
The library provides detailed validation feedback:
```rust
use xportrs::{Xpt, Agency, DomainDataset};
let plan = Xpt::writer(dataset)
    .agency(Agency::FDA)
    .finalize()?;

// Check for issues before writing
if plan.has_errors() {
    for issue in plan.issues() {
        eprintln!("{}", issue);
    }
}
if plan.has_warnings() {
    // Handle warnings here, e.g. log them for review
}

plan.write_path("ae.xpt")?;
```
## CDISC Terminology
This crate uses CDISC SDTM vocabulary:
| Term | Meaning |
|------|---------|
| **Domain dataset** | A table identified by a domain code (e.g., "AE", "DM", "LB") |
| **Observation** | One row/record in the dataset |
| **Variable** | One column; may have a role (Identifier/Topic/Timing/Qualifier/Rule) |
## Feature Flags
| Feature | Description |
|---------|-------------|
| `serde` | Enable serialization/deserialization support |
| `tracing` | Enable structured logging with the `tracing` crate |
| `full` | Enable all optional features |
```toml
# Enable all features
xportrs = { version = "0.0.2", features = ["full"] }
```
## Temporal Utilities
Convert between Rust chrono types and SAS date/time values:
```rust
use xportrs::temporal::{
    sas_days_since_1960,
    sas_seconds_since_1960,
    date_from_sas_days,
    datetime_from_sas_seconds,
};
use chrono::NaiveDate;

// Convert Rust date to SAS days
let date = NaiveDate::from_ymd_opt(2024, 6, 15).unwrap();
let sas_days = sas_days_since_1960(&date);

// Convert SAS days back to Rust date
let back = date_from_sas_days(sas_days);
```
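For readers unfamiliar with the SAS epoch, the arithmetic behind these helpers can be sketched with the standard library alone. SAS dates count days since 1960-01-01 and SAS datetimes count seconds since 1960-01-01 00:00:00. The functions below are hypothetical stand-ins, not the crate's implementation, and use a well-known civil-calendar day-count algorithm for the proleptic Gregorian calendar.

```rust
/// Days from 1970-01-01 to the given proleptic Gregorian date.
fn days_from_civil(year: i64, month: u32, day: u32) -> i64 {
    // Shift the year so that it starts in March; leap days then fall at year end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y.rem_euclid(400);
    let shifted_month = (month as i64 + 9) % 12; // March = 0
    let day_of_year = (153 * shifted_month + 2) / 5 + day as i64 - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// SAS date value: days since 1960-01-01.
fn sas_days_from_ymd(year: i64, month: u32, day: u32) -> i64 {
    days_from_civil(year, month, day) - days_from_civil(1960, 1, 1)
}

/// SAS datetime value: seconds since 1960-01-01 00:00:00.
fn sas_seconds_from_ymd_hms(
    year: i64,
    month: u32,
    day: u32,
    hour: u32,
    minute: u32,
    second: u32,
) -> i64 {
    sas_days_from_ymd(year, month, day) * 86_400
        + hour as i64 * 3_600
        + minute as i64 * 60
        + second as i64
}

fn main() {
    println!("2024-06-15 as SAS days: {}", sas_days_from_ymd(2024, 6, 15)); // 23542
    println!(
        "1960-01-02T00:00:00 as SAS seconds: {}",
        sas_seconds_from_ymd_hms(1960, 1, 2, 0, 0, 0)
    );
}
```

Dates before 1960 come out negative, which is consistent with how SAS represents them.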
## Safety
This crate is built with `#![forbid(unsafe_code)]`. All binary parsing and encoding use safe Rust constructs. The library has been designed with security in mind:
- No unsafe code blocks
- No external C dependencies
- Comprehensive input validation
- Protection against malformed files
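As an illustration of the kind of bounds-checked parsing these points describe, the sketch below validates the first 80-byte record of an XPT v5 file without indexing past the end of the input. The names `HeaderError` and `check_library_header` are hypothetical and this is not the crate's actual parser; only the leading library header marker is checked.

```rust
/// XPT v5 files are organised in 80-byte records.
const RECORD_LEN: usize = 80;

/// Leading marker of the first record in an XPT v5 file.
const LIBRARY_HEADER_PREFIX: &[u8] = b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!";

#[derive(Debug, PartialEq)]
enum HeaderError {
    /// Fewer than 80 bytes were available.
    Truncated { found: usize },
    /// The first record does not start with the library header marker.
    BadMagic,
}

fn check_library_header(data: &[u8]) -> Result<(), HeaderError> {
    // `get` returns `None` instead of panicking when the input is too short.
    let record = data
        .get(..RECORD_LEN)
        .ok_or(HeaderError::Truncated { found: data.len() })?;

    if record.starts_with(LIBRARY_HEADER_PREFIX) {
        Ok(())
    } else {
        Err(HeaderError::BadMagic)
    }
}

fn main() {
    // Build a record that starts with the marker, padded to 80 bytes.
    let mut record = LIBRARY_HEADER_PREFIX.to_vec();
    record.resize(RECORD_LEN, b'0');

    println!("{:?}", check_library_header(&record));
    println!("{:?}", check_library_header(b"short"));
    println!("{:?}", check_library_header(&[b' '; RECORD_LEN]));
}
```

Returning a typed error for truncated or unrecognised input, rather than panicking, lets callers report a malformed file cleanly.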
## Minimum Supported Rust Version (MSRV)
The minimum supported Rust version is **1.92**.
## Related Projects
- [xportr (R)](https://github.com/atorus-research/xportr) - The R package that inspired this crate
- [Trial Submission Studio](https://github.com/rubentalstra/trial-submission-studio) - Desktop application using xportrs
## License
MIT License - see [LICENSE](LICENSE) for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Run tests (`cargo test`)
4. Run clippy (`cargo clippy --all-targets --all-features -- -D warnings`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request