file_to_json 0.1.0

Convert arbitrary text-based files into JSON using local parsers and an OpenRouter-powered fallback.

file_to_json

file_to_json is a Rust library that converts arbitrary text-based files into JSON. It parses a set of common structured formats locally (CSV, JSON, YAML, TOML) and falls back to an OpenRouter-hosted LLM for any format it does not recognise.

Features

  • Local parsers for CSV, JSON, YAML, and TOML.
  • Automatic PDF text extraction before calling the LLM.
  • OpenRouter LLM fallback (default model: openrouter/polaris-alpha).
  • Automatic chunking for large text payloads to stay within LLM limits.
  • Safeguards against sending large or non-UTF-8 payloads to the LLM.
  • Simple API returning serde_json::Value.
  • Configurable fallback strategies for large files (chunking or code generation).

Installation

Add the crate to your project:

cargo add file_to_json --git https://github.com/your-org/file_to_json

(Replace the repository URL with the location where you host the crate.)

Usage

use file_to_json::{Converter, FallbackStrategy, OpenRouterConfig};

fn main() -> Result<(), file_to_json::ConvertError> {
    // Option 1: from environment variables
    let converter = Converter::from_env()?;

    // Option 2: custom configuration
    // let converter = Converter::new(OpenRouterConfig {
    //     api_key: "sk-or-...".into(),
    //     model: "openrouter/polaris-alpha".into(),
    //     timeout: std::time::Duration::from_secs(60),
    //     fallback_strategy: FallbackStrategy::CodeGeneration,
    // })?;

    let value = converter.convert_path("data/sample.csv")?;
    println!("{}", serde_json::to_string_pretty(&value)?);
    Ok(())
}

Environment variables

  • OPENROUTER_API_KEY – required. Your OpenRouter API key.
  • OPENROUTER_MODEL – optional. Defaults to openrouter/polaris-alpha.
  • OPENROUTER_FALLBACK_STRATEGY – optional. chunked (default) or code.
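
The variables above could be resolved roughly as follows. This is a minimal sketch using only the standard library, not the crate's actual Converter::from_env: the names Strategy, Settings, resolve, and from_env are hypothetical, and rejecting an unrecognised strategy value with an error is an assumption about how such input might be handled.

```rust
use std::env;

/// Hypothetical stand-in for the crate's fallback strategy setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Chunked,
    Code,
}

/// Hypothetical settings struct; the crate's real `OpenRouterConfig` may differ.
#[derive(Debug, PartialEq, Eq)]
struct Settings {
    api_key: String,
    model: String,
    strategy: Strategy,
}

/// Builds settings from optional raw values, applying the documented defaults.
fn resolve(
    api_key: Option<String>,
    model: Option<String>,
    strategy: Option<String>,
) -> Result<Settings, String> {
    // The API key is the only required value.
    let api_key = api_key.ok_or("OPENROUTER_API_KEY is required")?;
    // The model falls back to the documented default.
    let model = model.unwrap_or_else(|| "openrouter/polaris-alpha".to_string());
    // A missing strategy means `chunked`; unknown values are rejected (assumption).
    let strategy = match strategy.as_deref() {
        None | Some("chunked") => Strategy::Chunked,
        Some("code") => Strategy::Code,
        Some(other) => return Err(format!("unknown fallback strategy: {other}")),
    };
    Ok(Settings { api_key, model, strategy })
}

/// Reads the three variables from the process environment.
fn from_env() -> Result<Settings, String> {
    resolve(
        env::var("OPENROUTER_API_KEY").ok(),
        env::var("OPENROUTER_MODEL").ok(),
        env::var("OPENROUTER_FALLBACK_STRATEGY").ok(),
    )
}
```

Keeping resolve separate from from_env makes the defaulting rules testable without touching the process environment.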

Behaviour

  1. If the file extension is recognised, the crate parses it locally.
  2. Otherwise it sends the UTF-8 content (after extracting text for PDFs) to OpenRouter. For inputs that exceed 128 KiB the fallback strategy determines how to proceed:
    • chunked (default): splits the input into ≤128 KiB segments, converts each chunk, and merges the returned JSON (arrays concatenated, objects shallow-merged, mixed types wrapped in an array). Works best when each chunk shares a compatible structure.
    • code: sends the first/middle/last 10 lines to the model, asks for Python 3 code that can parse the full file, writes that code to a temporary script, and executes it locally to produce JSON (requires python3 on the PATH).
  3. The result is returned as serde_json::Value.
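
The two large-input strategies in step 2 can be illustrated with a small standard-library sketch. It is not the crate's implementation: Json is a stand-in for serde_json::Value so the example needs no dependencies, and split_chunks, merge, and sample_lines are hypothetical names. Letting later keys win in the shallow merge, and the exact way the "middle" lines are chosen, are assumptions.

```rust
use std::collections::BTreeMap;

/// Stand-in for a JSON value so this sketch needs no external crates.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
    Scalar(String),
}

/// Splits `text` into pieces of at most `max_bytes` bytes without cutting
/// through a multi-byte UTF-8 character.
fn split_chunks(text: &str, max_bytes: usize) -> Vec<&str> {
    assert!(max_bytes >= 4, "a chunk must fit any UTF-8 character");
    let mut chunks = Vec::new();
    let mut rest = text;
    while rest.len() > max_bytes {
        let mut end = max_bytes;
        while !rest.is_char_boundary(end) {
            end -= 1; // back up to the start of the current character
        }
        let (head, tail) = rest.split_at(end);
        chunks.push(head);
        rest = tail;
    }
    if !rest.is_empty() {
        chunks.push(rest);
    }
    chunks
}

/// Merges per-chunk results: arrays are concatenated, objects are
/// shallow-merged (later keys win, an assumption), and any other mix is
/// wrapped in an array.
fn merge(parts: Vec<Json>) -> Json {
    if parts.iter().all(|p| matches!(p, Json::Array(_))) {
        let mut out = Vec::new();
        for part in parts {
            if let Json::Array(items) = part {
                out.extend(items);
            }
        }
        return Json::Array(out);
    }
    if parts.iter().all(|p| matches!(p, Json::Object(_))) {
        let mut out = BTreeMap::new();
        for part in parts {
            if let Json::Object(map) = part {
                out.extend(map);
            }
        }
        return Json::Object(out);
    }
    Json::Array(parts)
}

/// Picks the first, middle, and last `n` lines, as the `code` strategy
/// does with n = 10 before prompting the model.
fn sample_lines(text: &str, n: usize) -> Vec<&str> {
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() <= 3 * n {
        return lines; // short enough to use in full
    }
    let mid = (lines.len() - n) / 2;
    let mut out = Vec::with_capacity(3 * n);
    out.extend_from_slice(&lines[..n]);
    out.extend_from_slice(&lines[mid..mid + n]);
    out.extend_from_slice(&lines[lines.len() - n..]);
    out
}
```

With the documented limit, the chunked path would call split_chunks(text, 128 * 1024), convert each piece, and pass the results to merge. The shallow merge is also why chunks with incompatible structures tend to produce less useful output: mismatched results simply end up side by side in one array.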

Binary files are rejected unless they can be converted to UTF-8 text (e.g. PDFs via the built-in extractor) or handled by the code-generation fallback.
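
The UTF-8 check behind this rule can be sketched with the standard library alone. Payload and classify are hypothetical names, and how the crate actually reports a rejected file is not shown here.

```rust
use std::str;

/// Hypothetical classification of raw file bytes before any LLM call.
#[derive(Debug, PartialEq, Eq)]
enum Payload<'a> {
    /// Valid UTF-8 that could be sent as text.
    Text(&'a str),
    /// Not valid UTF-8; `valid_up_to` is the length of the valid prefix.
    Binary { valid_up_to: usize },
}

/// Decides whether `bytes` can be treated as text.
fn classify(bytes: &[u8]) -> Payload<'_> {
    match str::from_utf8(bytes) {
        Ok(text) => Payload::Text(text),
        Err(err) => Payload::Binary { valid_up_to: err.valid_up_to() },
    }
}
```

A caller could forward Payload::Text to the fallback and route Payload::Binary to a PDF extractor, the code-generation strategy, or an error.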

Testing

cargo test

License

This project is distributed under the terms of the MIT license.