# polars_readstat_rs
Rust library for reading SAS (.sas7bdat), Stata (.dta), and SPSS (.sav/.zsav) files with Polars.
The crate provides:

- format-specific readers (`Sas7bdatReader`, `StataReader`, `SpssReader`)
- a format-agnostic scan API (`readstat_scan`)
- metadata/schema helpers
- Arrow FFI export helpers
- Stata/SPSS writers
This was nearly completely coded by Claude Code and Codex, but with a very particular setup that I hope makes it less likely to be a mess than other mostly-AI code repositories. It was meant to directly replace the C++ and C code in a relatively small, existing codebase (polars_readstat, v0.11.1), with the ability to exactly validate the new code's output against the old. For any given regression, the AI models could be told to refer directly to the spot in the code where the prior implementation did the same operation, to try to figure out how to solve the issue. They could also compare any output to the output produced by similar tools such as pandas and pyreadstat. I'm sure it's not the most beautiful code, but I'm an economist and I wanted a tool that was faster than what's out there and implemented in Rust (so many build issues with using C++ and C across systems. So many...), but I didn't want to spend months figuring out the record layouts and encodings of SAS, Stata, and SPSS files. Hence my first attempt, which just plugged directly into other tools (polars_readstat, v0.11.1), and this AI-first version.
## Install

```toml
[dependencies]
polars_readstat_rs = "0.1"
```
## Core API
### 1) Read directly to a DataFrame

```rust
use polars_readstat_rs::Sas7bdatReader;

let df = Sas7bdatReader::open("data.sas7bdat")?
    .read()
    .finish()?;
```
### 2) Read with projection/offset/limit

```rust
use polars_readstat_rs::StataReader;

let df = StataReader::open("data.dta")?
    .read()
    .with_columns(vec!["id".to_string(), "income".to_string()]) // project a subset of columns
    .with_offset(100)   // skip the first 100 rows
    .with_limit(1_000)  // read at most 1,000 rows
    .finish()?;
```
### 3) Format-agnostic lazy scan

```rust
use polars::prelude::*;
use polars_readstat_rs::{readstat_scan, ScanOptions};

let opts = ScanOptions::default();
let lf = readstat_scan("data.sav", opts)?;
let out = lf.select([col("age")]).collect()?;
```
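The scan is used above like an ordinary polars `LazyFrame`, so the usual lazy operations compose with it. A small illustrative follow-up (the column names are made up, and the scan call mirrors the sketch above):

```rust
let out = readstat_scan("data.dta", ScanOptions::default())?
    .filter(col("year").gt_eq(lit(2015)))
    .select([col("id"), col("income")])
    .collect()?;
```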
### 4) Metadata and schema

```rust
use polars_readstat_rs::{readstat_metadata_json, readstat_schema};

let metadata_json = readstat_metadata_json("data.sas7bdat")?;
let schema = readstat_schema("data.sas7bdat")?;
```
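A quick way to inspect the results. This is a sketch only: it assumes `metadata_json` is a JSON string, that `schema` is a polars `Schema`, and that `serde_json` is available in your project.

```rust
// Pretty-print the file-level metadata (assumes serde_json is a dependency).
let meta: serde_json::Value = serde_json::from_str(&metadata_json)?;
println!("{}", serde_json::to_string_pretty(&meta)?);

// List column names and dtypes (assumes a polars Schema).
for (name, dtype) in schema.iter() {
    println!("{name}: {dtype}");
}
```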
### 5) Writing (Stata/SPSS)

```rust
use polars_readstat_rs::{SpssWriter, StataWriter};

// `df` is a polars DataFrame, e.g. from one of the readers above.
StataWriter::new("out.dta").write_df(&df)?;
SpssWriter::new("out.sav").write_df(&df)?;
```
### 6) SPSS writer with schema and labels

(Column names, label text, and schema contents below are illustrative; the part that matters is the builder chain.)

```rust
use polars::prelude::*;
use polars_readstat_rs::{SpssValueLabelMap, SpssValueLabels, SpssVariableLabels, SpssWriteSchema, SpssWriter};
use std::collections::HashMap;

let df = df!("status" => &[1i64, 2, 3])?;

// Describe the output layout (variable formats, widths, etc.).
let schema = SpssWriteSchema { /* ... */ };

// Numeric value labels for the `status` variable.
let mut status_map: SpssValueLabelMap = HashMap::new();
status_map.insert(1.0, "active".to_string());
status_map.insert(2.0, "inactive".to_string());
status_map.insert(3.0, "unknown".to_string());

let value_labels: SpssValueLabels = HashMap::from([("status".to_string(), status_map)]);
let variable_labels: SpssVariableLabels =
    HashMap::from([("status".to_string(), "Account status".to_string())]);

SpssWriter::new("out.sav")
    .with_schema(schema)
    .with_value_labels(value_labels)
    .with_variable_labels(variable_labels)
    .write_df(&df)?;
```
SPSS writer behavior and current limits:

- Variable names are validated as non-empty and <= 64 bytes.
- SPSS short names are generated automatically (ASCII, uppercase, unique, max 8 chars) when needed (see the sketch after this list).
- Strings are fixed-width in bytes and limited to <= 255 bytes per value.
- Numeric output supports integer/float/bool/date/datetime/time columns (written as SPSS numeric values).
- Value labels are currently supported for numeric variables only.
- String value labels are not currently supported.
- Output encoding is selected automatically: Windows-1252 when possible, otherwise UTF-8 with an SPSS encoding record.
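For the short-name bullet above, here is a rough sketch of the kind of transformation involved (ASCII-only, uppercased, unique, capped at 8 characters). It is illustrative only, not the crate's actual implementation, and `short_name` is a made-up helper; in the crate this happens automatically when a short name is needed.

```rust
use std::collections::HashSet;

/// Illustrative only: derive an SPSS-style short name (ASCII, uppercase,
/// unique, max 8 chars) from a longer variable name.
fn short_name(long: &str, used: &mut HashSet<String>) -> String {
    // Keep ASCII alphanumerics, uppercase them, and truncate to 8 characters.
    let base: String = long
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| c.to_ascii_uppercase())
        .take(8)
        .collect();
    let base = if base.is_empty() { "VAR".to_string() } else { base };

    let mut candidate = base.clone();
    let mut n = 1u32;
    // Append a numeric suffix until the name is unique, keeping the 8-char cap.
    while !used.insert(candidate.clone()) {
        let suffix = n.to_string();
        let keep = base.len().min(8usize.saturating_sub(suffix.len()));
        candidate = format!("{}{}", &base[..keep], suffix);
        n += 1;
    }
    candidate
}
```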
## Arrow export

```rust
use polars_readstat_rs::sas_arrow_output::*;

let mut schema = read_to_arrow_schema_ffi("data.sas7bdat")?;
let mut stream = read_to_arrow_stream_ffi("data.sas7bdat")?;
```

See ARROW_EXPORT.md for FFI details.
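How a consumer ingests these handles depends on the contract documented in ARROW_EXPORT.md. As one hedged example: if `stream` is an arrow-rs `FFI_ArrowArrayStream` (an assumption, not something stated above), an arrow-rs consumer could drain it like this:

```rust
use arrow::ffi_stream::ArrowArrayStreamReader;

// Assumption: `stream` is an arrow-rs FFI_ArrowArrayStream produced by
// read_to_arrow_stream_ffi; check ARROW_EXPORT.md for the actual type.
let reader = unsafe { ArrowArrayStreamReader::from_raw(&mut stream)? };
for batch in reader {
    let batch = batch?;
    println!("batch with {} rows", batch.num_rows());
}
```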
## Basic validation and benchmarks

- Compare against Python reference outputs (pandas/pyreadstat).
- Read performance checks.