crossref-rs 0.3.0

Crossref literature metadata & BibTeX tool (Nushell plugin + universal CLI)
# crossref-rs Project Specification (spec.md)

**Project Name**: crossref-rs  
**Version**: v0.2.0  
**Status**: Revised  
**Last Updated**: 2026-04-01  
**Authors**: TWu20 (Tony Wu)  
**Goal**: Rewrite the original Nushell Crossref tool as a high-performance, cross-platform Rust implementation that works both as a native Nushell plugin and as a standard CLI usable from any shell.

## 1. Project Goals and Vision

### 1.1 Core Objectives

- Provide a unified command-line tool for literature metadata querying and BibTeX management.
- Ship two front ends built on one shared library crate:
  - Nushell native plugin (structured data, pipeline-friendly)
  - Standard CLI for any shell (zsh, fish, bash, PowerShell, etc.)
- Fully preserve and enhance the core functionality of the original Nushell script.
- **Strong emphasis on first-time user experience** to lower the configuration barrier.
- Prioritize **performance**, **reliability**, **usability**, **maintainability**, and **extensibility**.
- Comply with Crossref's polite-usage guidelines (identify every request with a contact email), backed by caching, configuration, and proper error handling.

## 2. Functional Requirements

### 2.1 Core Commands

All commands support the following common options:

- `--format` / `-f`: `table` (default), `json`, `bibtex`, `yaml`, `fzf`
- `--config`: Path to config file (default: `~/.config/crossref.toml`)
- `--no-cache`: Disable cache
- `--verbose` / `-v`: Verbose output
- `--email`: Override polite email (highest priority)

#### 2.1.1 `crossref fetch-meta <DOI>`

- Input: Single DOI or list of DOIs
- Output: Literature metadata (Record / Table)
- Enhancement: Include Unpaywall OA information (`is_oa`, `oa_status`, `pdf_url`)

#### 2.1.2 `crossref fetch-bib <DOI>`

- Input: Single or multiple DOIs
- Output: BibTeX string or save to file
- Enhancements:
  - Intelligent citation key generation (AuthorYear + title abbreviation + conflict suffix a/b/c)
  - `--append <file.bib>`: Automatically deduplicate and append
  - `--key-style`: `author-year` (default), `short-title`, etc.
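
The citation key rule above (AuthorYear, a title abbreviation, and an a/b/c suffix on conflict) could be sketched roughly as follows. This is only an illustration: the function names, the stop-word list, and the exact key layout (`Smith2020Deep`) are assumptions, not the tool's actual format.

```rust
use std::collections::HashSet;

/// Keep only ASCII letters and digits so the key is safe in any .bib file.
fn ascii_alnum(s: &str) -> String {
    s.chars().filter(char::is_ascii_alphanumeric).collect()
}

/// Upper-case the first character of a word.
fn capitalize(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        Some(first) => first.to_ascii_uppercase().to_string() + chars.as_str(),
        None => String::new(),
    }
}

/// Build `<Family><Year><TitleWord>` and append a, b, c, ... when the key is taken.
/// `taken` holds the keys already present in the target .bib file (hypothetical input).
fn citation_key(family: &str, year: u16, title: &str, taken: &HashSet<String>) -> String {
    // Assumed stop-word list; the real tool may use a different one.
    const STOP_WORDS: [&str; 6] = ["a", "an", "the", "of", "on", "in"];

    // Title abbreviation: the first word that is not a stop word.
    let title_word = title
        .split_whitespace()
        .map(ascii_alnum)
        .find(|w| !w.is_empty() && !STOP_WORDS.contains(&w.to_lowercase().as_str()))
        .map(|w| capitalize(&w))
        .unwrap_or_default();

    let base = format!("{}{}{}", ascii_alnum(family), year, title_word);

    // Try the bare key first, then the suffixes a..z.
    let candidate = std::iter::once(String::new())
        .chain(('a'..='z').map(String::from))
        .map(|suffix| format!("{base}{suffix}"))
        .find(|key| !taken.contains(key));

    // If every suffix is taken, fall back to the bare key (a simplification).
    candidate.unwrap_or(base)
}
```

With an empty key set, `citation_key("Smith", 2020, "The Deep Learning Revolution", &taken)` yields `Smith2020Deep`; once that key exists, the next call yields `Smith2020Deepa`.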

#### 2.1.3 `crossref search`

- Rich filters:
  - `--title`, `--author`, `--year-from`, `--year-to`
  - `--type` (journal-article, book, etc.)
  - `--open-access`, `--rows`, `--sort`
- Output: Structured table (supports piping: `| where score > 80 | crossref fetch-bib`)
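
As a rough sketch of how the year and type flags might translate into a Crossref `filter=` query value: `from-pub-date`, `until-pub-date`, and `type` are filter names from the Crossref REST API, but the `SearchFilters` struct and the flag-to-filter mapping are assumptions made for illustration.

```rust
/// Search filters mirroring some of the CLI flags above (hypothetical struct).
#[derive(Default)]
struct SearchFilters<'a> {
    year_from: Option<u16>,     // --year-from
    year_to: Option<u16>,       // --year-to
    work_type: Option<&'a str>, // --type
}

/// Join the filters that are set into a comma-separated `filter=` value.
/// Unset filters are skipped; no filters at all gives an empty string.
fn filter_param(filters: &SearchFilters) -> String {
    vec![
        filters.year_from.map(|y| format!("from-pub-date:{y}")),
        filters.year_to.map(|y| format!("until-pub-date:{y}")),
        filters.work_type.map(|t| format!("type:{t}")),
    ]
    .into_iter()
    .flatten()
    .collect::<Vec<String>>()
    .join(",")
}
```

Keeping each flag as an `Option` lets the builder skip absent filters without any branching.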

#### 2.1.4 `crossref pdf <DOI>`

- Return the best available PDF link, trying in order: the Unpaywall open-access PDF, the configured EZproxy (CityU by default), then the DOI landing page
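
The fallback order can be expressed as a chain of `Option` combinators. The sketch below assumes the proxy setting is a host name that replaces `doi.org` in the URL (as the default `doi-org.ezproxy.cityu.edu.hk` suggests); the function name and URL shapes are illustrative, not confirmed behavior.

```rust
/// Pick the best link: open-access PDF, then the proxied DOI, then the DOI landing page.
/// `oa_pdf_url` would come from Unpaywall, `proxy_host` from the `proxy` config field.
fn best_pdf_link(oa_pdf_url: Option<&str>, proxy_host: Option<&str>, doi: &str) -> String {
    oa_pdf_url
        .map(str::to_string)
        // Assumed proxy URL shape: https://<proxy host>/<doi>
        .or_else(|| proxy_host.map(|host| format!("https://{host}/{doi}")))
        .unwrap_or_else(|| format!("https://doi.org/{doi}"))
}
```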

#### 2.1.5 Helper Commands

- `crossref config`: View/edit configuration
- `crossref cache clear`
- `crossref version`

### 2.2 Configuration Loading and First-Run Guidance (Key Feature)

#### 2.2.1 Configuration Priority (Strict Order)

1. **Command-line arguments** (`--email`, `--config`, etc.) — highest priority
2. **Environment variables** (`CROSSREF_EMAIL`, `CROSSREF_PROXY`, `CROSSREF_ROWS`, `CROSSREF_CACHE_TTL_DAYS`, `CROSSREF_DEFAULT_FORMAT`, `CROSSREF_FUZZY_FINDER`, etc.)
3. **Configuration file** (`~/.config/crossref.toml` or path specified by `--config`)
4. **Built-in defaults**
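
A minimal sketch of this priority order for a single setting, assuming each source has already been read into an `Option`. The function name is hypothetical, and treating blank values as unset is an assumption rather than something this spec states.

```rust
/// Resolve one setting by walking the sources in priority order:
/// command-line argument, environment variable, config file, built-in default.
/// Empty or whitespace-only values count as "not set" (an assumption).
fn resolve_setting(
    cli: Option<&str>,
    env: Option<&str>,
    file: Option<&str>,
    default: &str,
) -> String {
    [cli, env, file]
        .iter()
        .copied()
        .flatten()
        .map(str::trim)
        .find(|value| !value.is_empty())
        .unwrap_or(default)
        .to_string()
}
```

For example, `resolve_setting(None, Some("env@x.org"), Some("file@x.org"), "d")` picks the environment value because no command-line value was given.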

#### 2.2.2 First-Run Smart Guidance (CLI-only)

**Enabled only in the standard CLI binary (`crossref`)**. The Nushell plugin does **not** trigger automatic creation.

**Trigger Condition**:

- No `email` provided via command-line argument, **and**
- `CROSSREF_EMAIL` environment variable is not set, **and**
- Config file does not exist or `email` field is empty/missing
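
The trigger condition reduces to a single predicate over the three email sources. In this sketch, `file_email` is `None` when the config file is absent or lacks an `email` field; the names are hypothetical, and a blank command-line or environment value is treated as unset, which goes slightly beyond what the conditions above say.

```rust
/// True when a source provides a non-blank value.
fn is_set(value: Option<&str>) -> bool {
    value.map_or(false, |v| !v.trim().is_empty())
}

/// Returns true when first-run guidance should be shown:
/// no email from the command line, the environment, or the config file.
fn needs_first_run_guidance(
    cli_email: Option<&str>,
    env_email: Option<&str>,
    file_email: Option<&str>,
) -> bool {
    !is_set(cli_email) && !is_set(env_email) && !is_set(file_email)
}
```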

**Behavior**:

- Automatically create a default configuration file at the standard XDG path (`~/.config/crossref.toml`)
- The generated file contains **detailed comments** (in both English and Chinese) explaining each field
- Display a **prominent, well-formatted guidance message** in the terminal including:
  - Full path of the created config file
  - Strong recommendation to edit the `email` field immediately
  - Instructions on how to configure via **environment variables** for different shells
  - Example commands for Bash/Zsh, Fish, Nushell, and PowerShell
  - Message: “Edit the file and re-run your command to continue.”
- After showing guidance, exit gracefully (exit code 0) without executing the original command

**Default Config Template (key sections)**:

```toml
# crossref-rs Default Configuration File
# Auto-generated on {timestamp}

# [REQUIRED] Email for Crossref API (polite usage)
# Replace with your real email to avoid rate limiting
email = "your.name@example.com"

# CityU EZproxy (commonly used by users in Hong Kong)
proxy = "doi-org.ezproxy.cityu.edu.hk"

# Default number of search results
default_rows = 10

# Cache expiration in days
cache_ttl_days = 30

# Optional: custom cache directory
# cache_dir = "/path/to/cache"

# Default output format (table, json, yaml, bibtex, fzf)
# default_format = "table"

# Fuzzy finder program for interactive selection (default: fzf)
# fuzzy_finder_cmd = "fzf"
```
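
One way the `cache_ttl_days` setting from the template might be applied when reading a cache entry is sketched below. The function name and the handling of clock skew are assumptions; the spec does not describe how cache timestamps are stored.

```rust
use std::time::{Duration, SystemTime};

/// True while a cache entry is younger than `ttl_days`.
/// An entry stamped in the future (clock skew) is treated as stale (an assumption).
fn is_fresh(stored_at: SystemTime, now: SystemTime, ttl_days: u64) -> bool {
    let ttl = Duration::from_secs(ttl_days.saturating_mul(24 * 60 * 60));
    now.duration_since(stored_at)
        .map(|age| age < ttl)
        .unwrap_or(false)
}
```

Passing `now` in explicitly, rather than calling `SystemTime::now()` inside, keeps the check deterministic and easy to unit-test.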

**Terminal Guidance Example** (with color support):

```text
╔══════════════════════════════════════════════════════════════╗
║                 crossref-rs First-Run Setup                  ║
╟──────────────────────────────────────────────────────────────╢
║ A default configuration file has been created for you at:    ║
║   ~/.config/crossref.toml                                    ║
║                                                              ║
║ Please open it now and set your email address:               ║
║   email = "your.real.email@example.com"                      ║
║                                                              ║
║ Alternatively, set via environment variable (quick setup):   ║
║   • Bash / Zsh   : export CROSSREF_EMAIL=you@example.com     ║
║   • Fish         : set -gx CROSSREF_EMAIL you@example.com    ║
║   • Nushell      : $env.CROSSREF_EMAIL = "you@example.com"   ║
║   • PowerShell   : $env:CROSSREF_EMAIL = "you@example.com"   ║
║                                                              ║
║ After editing, re-run your command.                          ║
╚══════════════════════════════════════════════════════════════╝
```

#### 2.2.3 Nushell Plugin Configuration Handling

- **Does not** auto-create the config file
- When critical configuration (email) is missing, output a clear error message guiding the user to set the environment variable or create the config manually

## 3. Technical Architecture

### 3.1 Project Structure

```text
crossref-rs/
├── Cargo.toml
├── README.md
├── spec.md
├── src/
│   ├── lib.rs                    # Public library interface
│   ├── models.rs                 # Data models (Work, BibEntry, etc.)
│   ├── client.rs                 # API client abstraction
│   ├── cache.rs                  # Caching layer
│   ├── bibtex.rs                 # BibTeX generation & parsing
│   ├── config.rs                 # Config loading + first-run guidance
│   ├── utils.rs                  # Pure utility functions
│   ├── error.rs                  # Custom error types
│   └── bin/
│       ├── cli.rs                # Standard CLI binary
│       └── nu_plugin_crossref.rs # Nushell plugin binary
├── tests/                        # Integration tests (unit tests sit next to the code, see 5.2)
└── benches/                      # Optional performance benchmarks
```

### 3.2 Key Dependencies

- `directories` or `dirs`: Cross-platform config directory detection
- `confy` or `config`: Config file loading
- `colored`: Colored terminal output for the first-run guidance message
- `chrono`: Timestamp in the generated config (optional)
- Plus the core crates carried over from earlier revisions (`reqwest`, `clap`, `nu-plugin`, etc.)

### 3.3 Config Module Responsibilities (`config.rs`)

- Merge command-line args + environment variables + config file
- Provide `Config::load()` and `Config::load_with_guidance()`
- Implement `create_default_config()` and `print_first_run_guidance()`

## 4. Non-Functional Requirements

### 4.1 Code Quality & Maintainability (Critical)

- **Modular architecture** strictly following the **Single Responsibility Principle (SRP)**: each module, struct, and function has exactly one reason to change.
- **Test-Driven Development (TDD)** is mandatory for all core logic:
  - Write failing tests first
  - Implement the minimal code to make tests pass
  - Refactor while keeping tests green
- Unit test coverage target: ≥ 85% for the library modules (everything under `src/` except `src/bin/`)
- Integration tests for CLI and plugin entry points
- **Builder pattern** **must** be used for all complex struct creation.
  - Use the **`derive_builder`** crate (`#[derive(Builder)]`) for all non-trivial structs (`Config`, `SearchQuery`, `BibEntry`, `Work`, etc.).
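  - This provides clean, ergonomic builders with method chaining. `derive_builder` defaults to the mutable pattern, so set `#[builder(pattern = "immutable")]` where the builder itself must stay immutable.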
- **Functional programming style** must be used as much as possible:
  - Prefer iterators (`iter()`, `map`, `filter`, `filter_map`, `flat_map`, `fold`, `collect`, etc.) over imperative `for` loops.
  - Minimize mutable state; favor immutable transformations and method chaining.
  - Use higher-order functions and combinators wherever they improve clarity and reduce side effects.
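
To illustrate the iterator-first style, the sketch below selects the DOIs of high-scoring search results without a `for` loop or mutable state, mirroring the `where score > 80` pipeline from section 2.1.3. The `Work` struct here is a minimal stand-in with hypothetical fields, not the real model from `models.rs`.

```rust
/// Minimal stand-in for a search result row (hypothetical fields).
struct Work {
    doi: String,
    score: f64,
}

/// Collect the DOIs of all works scoring above `min_score`, preserving input order.
fn dois_above(works: &[Work], min_score: f64) -> Vec<&str> {
    works
        .iter()
        .filter(|work| work.score > min_score)
        .map(|work| work.doi.as_str())
        .collect()
}
```

The function borrows its input and returns string slices, so nothing is cloned or mutated along the way.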

## 5. Development Standards and Workflow

### 5.1 Code Structure Principles

- **Modular design**: Every file and module must follow SRP
- **Clear separation of concerns** (API, config, cache, bibtex, CLI, plugin)
- **No god modules or god functions**
- All public APIs in `lib.rs` must be well-documented with Rust doc comments
- Every complex struct **must** derive `Builder` via `derive_builder`, and every new algorithm must follow the functional style described in section 4.1

### 5.2 Test-Driven Development (TDD) Mandate

- All new features or bug fixes must be accompanied by tests written **before** the implementation code
- Use `#[cfg(test)]` modules colocated with the code under test
- Mock external HTTP calls with `wiremock` or `httpmock`
- Run `cargo test` before every commit

### 5.3 Code Style & Tooling

- Enforce `cargo fmt --check` and `cargo clippy --all-targets -- -D warnings`, both locally and in CI
- Conventional Commits for all git messages
- Semantic Versioning (SemVer)

### 5.4 Git Workflow

- `main` branch is always stable and releasable
- Feature branches prefixed with `feat/`, `fix/`, `refactor/`, `test/`
- Every PR must:
  - Pass all tests (including TDD tests)
  - Update this `spec.md` if requirements change
  - Include relevant documentation updates

## 6. Installation & Usage Examples

(Examples will be expanded in README.md)

## 7. Roadmap

- Phase 1: Core commands + first-run guidance + caching + configuration + **SRP/TDD/Builder/FP foundation**
- Phase 2: Deep Unpaywall integration, smart key generation, etc.
- Phase 3 (Current): Fuzzy finder integration (`--format fzf`), configurable default output format, configurable fuzzy finder program

## 8. Contribution Guidelines

- Fork → Feature branch → PR
- All new code must adhere to the modular SRP + TDD + Builder pattern + Functional style rules above
- Keep this `spec.md` updated with any changes

## 9. License

MIT