sds-converter
GUI + CLI tool for bidirectional conversion between Safety Data Sheet (SDS) documents (Word/PDF) and the Japanese Ministry of Health, Labour and Welfare (MHLW) standard JSON format.
Supports Japanese, English, Simplified Chinese, and Traditional Chinese.
Embedding in your Rust project? Use sds-converter-core directly.
Download
| Platform | Download |
|---|---|
| macOS (Universal — Apple Silicon + Intel) | sds-converter-macos.zip |
| Windows (Portable .exe — no install required) | sds-converter-windows-portable.zip |
| Rust / CLI | cargo install sds-converter |
GUI Mode
Launch the graphical interface by running sds-converter without any arguments.
The GUI window (820×640) opens with five tabs:
| Tab | Function |
|---|---|
| Convert | SDS document (PDF/DOCX/XLSX/HTML/URL) → MHLW standard JSON |
| Generate | MHLW JSON → DOCX / HTML / PDF (with optional DOCX template) |
| Validate | Structural validation of MHLW JSON with colored OK/Warning/Error results |
| Extract Text | Raw text extraction from documents — no LLM API required |
| Settings | API key, model name, base URL, quality, language, UI language |
Drag & drop files onto any tab to fill the input field automatically.
Settings are saved to ~/.config/sds-converter/config.toml and restored on next launch.
The GUI and CLI share the same conversion engine (tasks.rs), so results are identical.
Commands
to-json — Convert PDF/Word → MHLW standard JSON
# Single file (Anthropic Claude, default)
# Specify source language
# Batch mode — process a whole directory
# OpenAI GPT (defaults to gpt-4o-mini)
# Google Gemini (defaults to gemini-2.0-flash)
# Local LLM via Ollama (any OpenAI-compatible endpoint)
# From pre-extracted text (skip PDF parsing)
| Flag | Default | Description |
|---|---|---|
| --input | — | Input PDF, DOCX, XLSX, or TXT file |
| --input-dir | — | Input directory (batch — processes all .pdf/.docx/.xlsx/.xls) |
| --output | — | Output JSON file |
| --output-dir | — | Output directory (batch — created if absent) |
| --provider | anthropic | LLM provider: anthropic, openai, gemini, mistral, groq, cohere, local |
| --api-key | env var | API key (see provider defaults below) |
| --model | per-provider | Model name override |
| --base-url | — | Custom OpenAI-compatible endpoint (for --provider local) |
| --lang | auto-detect | Source document language: ja, en, zh-cn, zh-tw |
| --quality | medium | Preset: low (fast/cheap), medium, high (accurate) |
| --concurrency | 4 | Max parallel files in batch mode |
| --suggested-name | — | Rename output to SDS_<IssueDate>_<ProductCode>.json (MHLW §2.1.2 recommended convention) |
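The naming convention used by --suggested-name can be sketched in a few lines. This is an illustration only: the function name `suggested_name` and the rule for replacing unsafe characters are assumptions, not the tool's actual implementation.

```rust
/// Builds a file name following the `SDS_<IssueDate>_<ProductCode>.json` pattern.
/// Hypothetical helper for illustration; the real tool may differ.
fn suggested_name(issue_date: &str, product_code: &str) -> String {
    // Assumption: any character other than letters, digits, '-' and '.'
    // is replaced with '-' so the result is safe as a file name.
    let clean = |s: &str| -> String {
        s.trim()
            .chars()
            .map(|c| if c.is_alphanumeric() || c == '-' || c == '.' { c } else { '-' })
            .collect()
    };
    format!("SDS_{}_{}.json", clean(issue_date), clean(product_code))
}
```

For example, `suggested_name("2024-04-01", "ABC 123")` yields `SDS_2024-04-01_ABC-123.json`.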
Provider defaults:
| --provider | Default model | Environment variable |
|---|---|---|
| anthropic | claude-haiku-4-5-20251001 (low/medium) · claude-sonnet-4-6 (high) | ANTHROPIC_API_KEY |
| openai | gpt-4o-mini | OPENAI_API_KEY |
| gemini | gemini-2.0-flash | GEMINI_API_KEY |
| mistral | mistral-small-latest | MISTRAL_API_KEY |
| groq | llama-3.3-70b-versatile | GROQ_API_KEY |
| cohere | command-r-plus | COHERE_API_KEY |
| local | llama3 | LOCAL_LLM_API_KEY (optional; defaults to ollama) |
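The provider defaults can be read as a lookup followed by a key resolution step. The sketch below transcribes the table into a `match`. The function names and the precedence (flag first, then environment variable, then the `ollama` placeholder for the local provider) are assumptions made for illustration, not the tool's confirmed behavior.

```rust
use std::env;

/// Default model and API-key environment variable per provider,
/// transcribed from the provider defaults table.
/// For anthropic this is the low/medium preset; high uses claude-sonnet-4-6.
fn provider_defaults(provider: &str) -> Option<(&'static str, &'static str)> {
    match provider {
        "anthropic" => Some(("claude-haiku-4-5-20251001", "ANTHROPIC_API_KEY")),
        "openai" => Some(("gpt-4o-mini", "OPENAI_API_KEY")),
        "gemini" => Some(("gemini-2.0-flash", "GEMINI_API_KEY")),
        "mistral" => Some(("mistral-small-latest", "MISTRAL_API_KEY")),
        "groq" => Some(("llama-3.3-70b-versatile", "GROQ_API_KEY")),
        "cohere" => Some(("command-r-plus", "COHERE_API_KEY")),
        "local" => Some(("llama3", "LOCAL_LLM_API_KEY")),
        _ => None,
    }
}

/// Assumed precedence: explicit flag value, then the environment variable,
/// then the "ollama" placeholder for the local provider only.
fn resolve_api_key(provider: &str, flag: Option<&str>) -> Option<String> {
    if let Some(key) = flag {
        return Some(key.to_string());
    }
    let (_, var) = provider_defaults(provider)?;
    match env::var(var) {
        Ok(value) if !value.is_empty() => Some(value),
        _ if provider == "local" => Some("ollama".to_string()),
        _ => None,
    }
}
```

Returning `None` for an unknown provider or a missing key lets the caller decide how to report the problem.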
to-docx — Convert MHLW standard JSON → Word document
# Single file (built-in layout)
# Batch mode (built-in layout)
# Fill a Word template with {{Placeholder}} substitution
# Batch mode with template
Word template format
Prepare a .docx file with {{FieldName}} placeholders where FieldName is
a leaf key from the MHLW JSON schema. The full dot-path is also accepted for
disambiguation.
{{TradeNameJP}} → product name (Japanese)
{{CompanyName}} → company name
{{Phone}} → phone number
{{IssueDate}} → issue date
{{Identification.SupplierInformation.CompanyName}} → full dot-path form
Placeholders can appear anywhere in the document — paragraphs, table cells, headers, and footers. Word sometimes splits typed text across internal runs; the tool automatically merges such splits before substitution.
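The substitution step itself can be sketched as a plain string pass over already-merged text. This is a minimal illustration, not the tool's code: `fill_placeholders` is a hypothetical name, the field map is assumed to hold both leaf keys and dot-paths, and leaving unknown placeholders untouched is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Replaces every `{{Key}}` in `text` with its value from `fields`.
/// Unknown placeholders are left untouched (an assumption of this sketch).
fn fill_placeholders(text: &str, fields: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find("{{") {
        // Copy everything before the opening braces.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        match after_open.find("}}") {
            Some(end) => {
                let key = after_open[..end].trim();
                match fields.get(key) {
                    Some(value) => out.push_str(value),
                    None => {
                        // Keep the placeholder as written.
                        out.push_str("{{");
                        out.push_str(&after_open[..end]);
                        out.push_str("}}");
                    }
                }
                rest = &after_open[end + 2..];
            }
            None => {
                // Unterminated "{{": copy the remainder verbatim and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

With a map containing `CompanyName`, the input `Supplier: {{CompanyName}}` becomes `Supplier: ` followed by the mapped value, while a placeholder with no matching key stays in the output unchanged.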
| Flag | Default | Description |
|---|---|---|
| --input | — | Input JSON file |
| --input-dir | — | Input directory (batch — processes all .json) |
| --output | — | Output DOCX file |
| --output-dir | — | Output directory (batch) |
| --lang | ja | Output language: ja, en, zh-cn, zh-tw (without --template) |
| --template | — | Word template with {{FieldName}} placeholders |
extract-text — Extract raw text from PDF/DOCX
Extracts the text that the LLM would receive, without making any API call. Useful for inspecting extraction quality or running the LLM step separately.
# Save to file
# Print to stdout
# Then feed back into to-json
validate — Check a JSON file for structural issues
# Human-readable output (exits 0 = OK, 1 = warnings found)
# JSON array output for CI/scripting
Checks that key sections (Identification, HazardIdentification, ToxicologicalInformation, etc.) are populated. Exits with code 1 if any issues are found.
Language Support
| Language | --lang | Source documents | Output DOCX headings |
|---|---|---|---|
| Japanese | ja | JIS Z 7253 compliant SDS | JIS Z 7253 |
| English | en | GHS/OSHA HazCom format | GHS Rev.10 / ISO 11014 |
| Simplified Chinese | zh-cn | GB/T 16483 format | GB/T 16483-2012 |
| Traditional Chinese | zh-tw | CNS 15030 format | CNS 15030 |
Requirements
- Rust 1.75+
- An LLM API key (for to-json only): set the provider's environment variable or pass --api-key
  - Anthropic: ANTHROPIC_API_KEY
  - OpenAI: OPENAI_API_KEY
  - Google Gemini: GEMINI_API_KEY
  - Mistral: MISTRAL_API_KEY
  - Groq: GROQ_API_KEY
  - Cohere: COHERE_API_KEY
  - Local LLM (Ollama etc.): use --provider local --base-url <url> (no API key required)
- Input files must be text-based PDF or DOCX
  - Encrypted PDFs are not supported
  - CID font / Shift-JIS encoded PDFs (common in Japanese documents): handled by pdftotext (poppler) fallback
  - Scanned/image-only PDFs: automatically retried via pdftoppm + tesseract OCR (if installed), or via Claude Vision API (when using --provider anthropic)
  - Full 3-tier PDF fallback: pdf-extract -> pdftotext -> OCR/Vision
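The 3-tier fallback can be pictured as an ordered list of extractors where the first tier to return non-empty text wins. The sketch below shows only that control flow. All names are hypothetical, and the three tier functions are stand-ins that return fixed results instead of calling pdf-extract, pdftotext, or an OCR/Vision backend.

```rust
/// One extraction tier: returns the extracted text, or an error message.
type Extractor = fn(&str) -> Result<String, String>;

/// Tries each tier in order and returns the first non-empty result,
/// together with the name of the tier that produced it.
fn extract_with_fallback(
    path: &str,
    tiers: &[(&'static str, Extractor)],
) -> Result<(&'static str, String), String> {
    let mut errors = Vec::new();
    for (name, extract) in tiers {
        match extract(path) {
            Ok(text) if !text.trim().is_empty() => return Ok((*name, text)),
            Ok(_) => errors.push(format!("{name}: empty text")),
            Err(e) => errors.push(format!("{name}: {e}")),
        }
    }
    // Every tier failed: report what each one said.
    Err(errors.join("; "))
}

// Stand-in tiers for illustration only.
fn tier_pdf_extract(_path: &str) -> Result<String, String> {
    Err("unsupported font encoding".to_string())
}
fn tier_pdftotext(_path: &str) -> Result<String, String> {
    Ok(String::new()) // e.g. an image-only PDF with no text layer
}
fn tier_ocr(_path: &str) -> Result<String, String> {
    Ok("1. Identification ...".to_string())
}

/// The tier order described in the list: pdf-extract, pdftotext, OCR/Vision.
fn demo_tiers() -> Vec<(&'static str, Extractor)> {
    vec![
        ("pdf-extract", tier_pdf_extract as Extractor),
        ("pdftotext", tier_pdftotext as Extractor),
        ("ocr-or-vision", tier_ocr as Extractor),
    ]
}
```

With these stand-ins, `extract_with_fallback("sample.pdf", &demo_tiers())` skips the first two tiers and returns the text from the third. Treating empty output as a failure is what lets a scanned PDF fall through to OCR in this sketch.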
Rust Library
The conversion engine is available as a standalone library:
| Crate | crates.io | Description |
|---|---|---|
| sds-converter-core | sds-converter-core | LLM-based extraction, DOCX/HTML generation, MHLW schema |
[dependencies]
sds-converter-core = "0.3"
References
- MHLW — SDS Standard Data Exchange Format (official page) (Japanese)
- SDS Data Exchange Format Developer Manual (PDF) (Japanese)
License
Licensed under either of:
- Apache License, Version 2.0
- MIT License
at your option.


