sds-converter
CLI tool for bidirectional conversion between Safety Data Sheet (SDS) documents (Word/PDF) and the Japanese Ministry of Health, Labour and Welfare (MHLW) standard JSON format.
Supports Japanese, English, Simplified Chinese, and Traditional Chinese.
Embedding in your Rust project? Use `sds-converter-core` directly.
Installation
Commands
to-json — Convert PDF/Word → MHLW standard JSON
Usage patterns:
- Single file (Anthropic Claude, default)
- Specify source language
- Batch mode: process a whole directory
- OpenAI GPT (defaults to gpt-4o)
- Google Gemini (defaults to gemini-2.0-flash)
- Local LLM via Ollama (any OpenAI-compatible endpoint)
- From pre-extracted text (skip PDF parsing)
| Flag | Default | Description |
|---|---|---|
| `--input` | none | Input PDF, DOCX, or TXT file |
| `--input-dir` | none | Input directory (batch mode: processes all .pdf/.docx) |
| `--output` | none | Output JSON file |
| `--output-dir` | none | Output directory (batch mode: created if absent) |
| `--provider` | `anthropic` | LLM provider: `anthropic`, `openai`, `gemini` |
| `--api-key` | env var | API key (fallback: `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` / `GEMINI_API_KEY`) |
| `--model` | per-provider | Model name (defaults: `claude-sonnet-4-6` / `gpt-4o` / `gemini-2.0-flash`) |
| `--base-url` | none | Custom OpenAI-compatible endpoint (Ollama, vLLM, etc.) |
| `--lang` | auto-detect | Source document language: `ja`, `en`, `zh-cn`, `zh-tw` |
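The `to-json` flags can be combined as in the following sketch. It assumes the installed binary is named `sds-converter`. The file names, the `./sds` and `./json` directories, the Ollama URL, and the `llama3` model name are placeholders chosen for illustration, not values taken from the tool. With `DRY_RUN=1` (the default in this sketch) each command is printed instead of executed.

```shell
# Assumption: the binary is on PATH as `sds-converter`; all paths are placeholders.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Single file with the default provider (anthropic)
run sds-converter to-json --input sheet.pdf --output sheet.json

# Explicit source language
run sds-converter to-json --input sheet.pdf --output sheet.json --lang en

# Batch mode over a directory
run sds-converter to-json --input-dir ./sds --output-dir ./json

# Other hosted providers
run sds-converter to-json --input sheet.pdf --output sheet.json --provider openai
run sds-converter to-json --input sheet.pdf --output sheet.json --provider gemini

# Local OpenAI-compatible endpoint (URL and model name are assumptions)
run sds-converter to-json --input sheet.pdf --output sheet.json \
  --provider openai --base-url http://localhost:11434/v1 --api-key dummy --model llama3

# Pre-extracted text as input
run sds-converter to-json --input sheet.txt --output sheet.json
```

Setting `DRY_RUN=0` before sourcing the script runs the commands for real, which requires the binary and, for hosted providers, an API key.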
to-docx — Convert MHLW standard JSON → Word document
Usage patterns:
- Single file
- Batch mode
| Flag | Default | Description |
|---|---|---|
| `--input` | none | Input JSON file |
| `--input-dir` | none | Input directory (batch mode: processes all .json) |
| `--output` | none | Output DOCX file |
| `--output-dir` | none | Output directory (batch mode) |
| `--lang` | `ja` | Output language: `ja`, `en`, `zh-cn`, `zh-tw` |
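Because `--lang` selects the output language, one JSON file can be rendered once per supported language. The helper below is a sketch that only prints the commands. `docx_commands` is a hypothetical name, the binary is assumed to be `sds-converter`, and the `<name>.<lang>.docx` naming scheme is an illustration rather than anything the tool prescribes.

```shell
# Assumption: binary name `sds-converter`; output names are derived from the input path.
docx_commands() {
  input="$1"
  base="${input%.json}"   # strip the .json suffix
  for lang in ja en zh-cn zh-tw; do
    echo "sds-converter to-docx --input $input --output $base.$lang.docx --lang $lang"
  done
}

# Prints four commands, one per language code.
docx_commands sheet.json
```

Piping the output to `sh` (`docx_commands sheet.json | sh`) would execute the commands. Paths containing spaces would need quoting first.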
extract-text — Extract raw text from PDF/DOCX
Extracts the text that the LLM would receive, without making any API call. Useful for inspecting extraction quality or running the LLM step separately.
Usage patterns:
- Save to file
- Print to stdout
- Then feed the saved text back into to-json
validate — Check a JSON file for structural issues
Output modes:
- Human-readable output (exit code 0 = OK, 1 = warnings found)
- JSON array output for CI/scripting
Checks that key sections (Identification, HazardIdentification, ToxicologicalInformation, etc.) are populated. Exits with code 1 if any issues are found.
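In a CI script, the exit code is the simplest signal to act on. The sketch below wraps any command and maps the two documented codes to a message. `report_validation` is a hypothetical helper name, and since the arguments accepted by `validate` are not listed in this section, the helper takes the complete command line instead of assuming any flags.

```shell
# Run the given command and translate its exit code:
# 0 = OK, 1 = issues found (as documented for `validate`), anything else is unexpected.
report_validation() {
  status=0
  "$@" || status=$?
  case "$status" in
    0) echo "OK" ;;
    1) echo "issues found" ;;
    *) echo "unexpected exit code $status" ;;
  esac
  return "$status"
}

# Usage (arguments are placeholders for whatever `validate` expects):
#   report_validation sds-converter validate <arguments>
```

The helper returns the original exit code, so a CI step that calls it still fails when issues are reported.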
Language Support
| Language | `--lang` | Source documents | Output DOCX headings |
|---|---|---|---|
| Japanese | `ja` | JIS Z 7253 compliant SDS | JIS Z 7253 |
| English | `en` | GHS/OSHA HazCom format | GHS Rev.10 / ISO 11014 |
| Simplified Chinese | `zh-cn` | GB/T 16483 format | GB/T 16483-2012 |
| Traditional Chinese | `zh-tw` | CNS 15030 format | CNS 15030 |
Requirements
- Rust 1.75+
- An LLM API key (for `to-json` only)
  - Anthropic: set `ANTHROPIC_API_KEY` or use `--api-key`
  - OpenAI: set `OPENAI_API_KEY` or use `--api-key`
  - Google Gemini: set `GEMINI_API_KEY` or use `--api-key`
  - Local LLM (Ollama etc.): use `--provider openai --base-url <url> --api-key dummy`
- Input files must be text-based PDF or DOCX
- Encrypted PDFs are not supported
- Scanned/image-only PDFs are not supported (no text to extract)
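A pre-flight check for the API key requirement might look like the sketch below. It relies only on the provider names and environment variable names listed above. The function names `api_key_var` and `has_api_key` are hypothetical, and the check does not apply to local endpoints, where a dummy key is passed with `--api-key`.

```shell
# Print the environment variable a provider falls back to; fail for unknown providers.
api_key_var() {
  case "$1" in
    anthropic) echo "ANTHROPIC_API_KEY" ;;
    openai)    echo "OPENAI_API_KEY" ;;
    gemini)    echo "GEMINI_API_KEY" ;;
    *)         return 1 ;;
  esac
}

# Succeed only if the fallback variable for the provider is set and non-empty.
has_api_key() {
  var="$(api_key_var "$1")" || return 1
  eval "value=\${$var:-}"   # safe: $var is one of the three fixed names above
  [ -n "$value" ]
}

# Usage:
#   has_api_key anthropic || echo "set ANTHROPIC_API_KEY or pass --api-key" >&2
```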
License
Licensed under either of:
- Apache License, Version 2.0
- MIT License
at your option.