A fast, chunked, memory-bounded Rust engine for electrophysiology signal processing — Neuropixels-scale, callable from Python.
Segovia is a lazy-evaluated, chunked, concurrent compute engine for massive multi-channel electrophysiology time-series (Neuropixels-scale: 30 kHz × thousands of channels). It is written in Rust, exposed to Python via PyO3, and built to slot into the existing neuroscience stack — SpikeInterface, SpikeGLX, Zarr, and NWB — rather than replace it. The aim is out-of-core, bounded-memory streaming preprocessing (bandpass filtering, common-median referencing, whitening) with GIL-released shared-memory threads instead of the process-pool / pickle / per-process-copy model that makes Python spike-sorting pipelines run out of memory.
Status
Early development — pre-MVP. This repository is currently architecture docs plus a project scaffold; the compute engine is not built yet and nothing is published to crates.io or PyPI. The install and quickstart below describe the target API, not a shipped one. Follow the roadmap for progress. The whole premise rests on one make-or-break benchmark — see The benchmark gate.
Contents
- Why Segovia
- How it works
- Install
- Quickstart
- The benchmark gate
- Architecture
- Roadmap
- Why the name
- Contributing
- Citation
- License
Why Segovia
A neuroscience lab can record a brain faster than its software can read it back. A single high-density Neuropixels probe writes roughly 80 GB/hour (~22 MB/s); standard Python pipelines load that at double size and then copy it wholesale into every worker process. Documented failures include a 26 GiB memory error filtering a modest recording and a 102 GiB blow-up during motion correction. The data is fine — the plumbing leaks.
Segovia targets that plumbing. It is CPU-first (the workload is IO/memory-bound, so a GPU would
spend more time waiting on the PCIe bus than computing), reuses mature Rust storage crates
(zarrs, hdf5-metno, arrow-rs) instead of reinventing them, and earns its keep through one
concrete advantage: true shared-memory threading in Rust with the GIL released. This is
out-of-core spike-sorting preprocessing — bounded memory regardless of recording length, real-time
capable, and callable from the Python tools researchers already use.
How it works
flowchart LR
A["Storage<br/>SpikeGLX .bin · Zarr · NWB/HDF5"] --> B["Chunked source<br/>channels × samples tiles"]
B --> C["Op chain<br/>bandpass → CMR → whiten → detect"]
C --> D["Sink<br/>zero-copy NumPy / Arrow"]
C -. "Rayon over chunks, GIL released" .-> C
style A fill:#0B1020,stroke:#5A6B8C,color:#F5F7FA
style B fill:#0B1020,stroke:#5A6B8C,color:#F5F7FA
style C fill:#0B1020,stroke:#CE422B,color:#F5F7FA
style D fill:#0B1020,stroke:#DEA584,color:#F5F7FA
Data is read in chunks (spans of channels × samples), streamed through an operation chain, and returned to Python zero-copy. Only a bounded window is ever resident in memory — the metaphor is the Aqueduct of Segovia, a continuous stream carried span-by-span across a row of stone arches.
Install
Not yet published — this is the planned install once the first release ships.
Quickstart
Target API (illustrative, not yet shipped). Read a SpikeGLX recording, run the bandpass → common-median-reference → whiten chain in bounded memory, and get a zero-copy NumPy result.
=
=
=
The benchmark gate
Segovia's existence hinges on one measurable claim (call it SC1): on a real 1-hour Neuropixels
recording, the Rust bandpass + CMR + whiten chain must run in under 2 GB of peak memory and
be faster than the equivalent spikeinterface(n_jobs=N) call on Windows and macOS. If that cannot
be shown, the premise is wrong and the project says so. This benchmark is built first, not last.
Result: pending — see the roadmap.
Architecture
The full architecture document set lives in docs/architecture/:
ARD.md— requirements, NFRs, risks, decisions.candidate-architectures.md— four options, trade-offs, recommendation.tech-stack.md— concrete crate choices and their sharp edges.roadmap.md— the milestone-level plan.adr/— Architecture Decision Records.
Roadmap
ROADMAP.md is the single source of truth for version and scope. In short: learn the
domain and de-risk the toolchain (M0–2), prove the benchmark win (M2–4, the go/no-go gate), grow
into a real engine with a Python API (M4–7), add breadth and correctness (M7–10), and ship as a
SpikeInterface preprocessing backend (M10–12). A deferred, gated single-cell vertical sits beyond
that — see docs/future/leukemia-direction.md.
Why the name
Segovia is named for Claudio Segovia, a friend who died of leukemia at 26. The name also evokes the Aqueduct of Segovia — a continuous stream carried across a long row of segmented stone arches, which is exactly this engine's chunked, span-by-span streaming model.
The connection is honest, not a marketing claim. An electrophysiology engine does not cure cancer, and
saying otherwise would be dishonest. But the underlying computational problem — data too large for
memory, and a Python layer that copies it until it chokes — is shared with single-cell genomics,
the computational backbone of modern leukemia research (clonal evolution, drug resistance, CAR-T).
Segovia's core is kept domain-neutral so the same machinery could one day help with that work too:
aided by the tool, not a tool made for it. That direction is deliberately deferred and gated —
the honest details, including disconfirming evidence, are in
docs/future/leukemia-direction.md.
Contributing
Contributions are welcome — see CONTRIBUTING.md. The project is Windows-first,
uses a Rust + PyO3 + maturin toolchain, conventional commits, and STAR-format PRs.
Citation
If you use Segovia in your research, please cite it via CITATION.cff (GitHub shows a
"Cite this repository" button). A DOI will be added on the first archived release.
License
Segovia is licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later).
This is deliberate: Segovia is free for everyone — researchers, individuals, and non-profits — and the copyleft terms keep it that way. Anyone who distributes Segovia, or runs a modified version as a network service, must release their complete corresponding source under the same license, so the project cannot be taken closed-source or proprietary.
Unless you explicitly state otherwise, any contribution you submit for inclusion is licensed under AGPL-3.0-or-later, without any additional terms or conditions.