sds-converter-core
A Rust library for bidirectional conversion between Safety Data Sheet (SDS) documents (Word/PDF) and the Japanese Ministry of Health, Labour and Welfare (MHLW) standard JSON format.
Supports documents in Japanese, English, Simplified Chinese, and Traditional Chinese.
Looking for the CLI? Install sds-converter instead.
Features
- SDS document → JSON: Extracts text from PDF/DOCX and converts it to the MHLW SDS data exchange format v1.0 via an LLM API.
- JSON → DOCX: Generates a JIS Z 7253-compliant 16-section Word document from the standard JSON, with localized section headings.
- Multilingual: Handles source documents in ja/en/zh-CN/zh-TW.
- Extensible LLM backend: Ships with Anthropic Claude, OpenAI GPT, and Google Gemini backends. Bring your own by implementing LlmBackend.
Installation
[dependencies]
sds-converter-core = "0.1"
Library Usage
Convert SDS document to JSON (Anthropic Claude)
use ;
async
Convert JSON to Word document
use ;
OpenAI GPT or Google Gemini backend
use ;
// OpenAI GPT
let config = LlmConfig ;
let backend = openai;
// Google Gemini
let config = LlmConfig ;
let backend = gemini;
// Any OpenAI-compatible endpoint
let backend = new;
Extract raw text from a document
Use extract_text to pull the raw text out of a PDF, DOCX, or plain-text file without making an LLM call. Useful for building custom pipelines or inspecting what the LLM receives.
use extract_text;
async
Supported extensions: .pdf, .docx, .xlsx, .txt.
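A custom pipeline may want to reject unsupported files before calling into the library at all. The following is a minimal sketch of an extension check based on the list above. `InputKind` and `classify_input` are hypothetical names introduced for illustration and are not part of this crate.

```rust
use std::path::Path;

/// Hypothetical classification of input files; not a type from this crate.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum InputKind {
    Pdf,
    Docx,
    Xlsx,
    PlainText,
}

/// Maps a path to an input kind by its extension (case-insensitive).
/// Returns `None` when the extension is missing or not in the supported list.
fn classify_input(path: &Path) -> Option<InputKind> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pdf" => Some(InputKind::Pdf),
        "docx" => Some(InputKind::Docx),
        "xlsx" => Some(InputKind::Xlsx),
        "txt" => Some(InputKind::PlainText),
        _ => None,
    }
}
```

A caller could match on the returned `Option` and skip or report files that yield `None`. Note that the extension alone says nothing about whether a PDF is encrypted or image-only.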
Validate an extracted SdsRoot
validate checks the structural completeness of an SdsRoot and returns a list of warning strings. It does not hard-fail — partial results remain usable.
use ;
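To illustrate the "warn, don't fail" design described above, here is a self-contained sketch of a validator that collects warning strings rather than returning an error. `MiniSds` and `collect_warnings` are hypothetical stand-ins invented for this example. The real `SdsRoot` has far more fields, and the actual checks performed by `validate` are not shown in this README.

```rust
/// Hypothetical, heavily simplified stand-in for an extracted SDS record.
#[derive(Default)]
struct MiniSds {
    product_name: Option<String>,
    supplier_name: Option<String>,
    /// Section numbers (1..=16) that were found in the source document.
    sections_present: Vec<u8>,
}

/// Returns true when the optional string is absent or only whitespace.
fn is_blank(value: &Option<String>) -> bool {
    value.as_deref().map_or(true, |s| s.trim().is_empty())
}

/// Collects human-readable warnings instead of returning an error,
/// so a partially filled record stays usable by the caller.
fn collect_warnings(sds: &MiniSds) -> Vec<String> {
    let mut warnings = Vec::new();
    if is_blank(&sds.product_name) {
        warnings.push("product name is missing".to_string());
    }
    if is_blank(&sds.supplier_name) {
        warnings.push("supplier name is missing".to_string());
    }
    // Report every one of the 16 sections that was not found.
    for n in 1..=16u8 {
        if !sds.sections_present.contains(&n) {
            warnings.push(format!("section {n} is missing"));
        }
    }
    warnings
}
```

Returning a `Vec<String>` leaves the policy to the caller: a batch job might log the warnings and continue, while a stricter workflow could treat any non-empty list as a failure.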
Custom LLM backend
Implement the LlmBackend trait to use any LLM provider:
use ;
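The general shape of a pluggable backend can be sketched without any network dependency. The example below is an assumption-laden illustration: `TextBackend`, `CannedBackend`, and `run_conversion` are hypothetical names, and the trait is synchronous for brevity. The crate's real `LlmBackend` trait may have a different method set and is likely async, so consult the crate documentation for the actual signature.

```rust
/// Hypothetical stand-in for a backend trait. This is NOT the crate's
/// `LlmBackend`; its real signature is not reproduced here.
trait TextBackend {
    /// Sends a system prompt plus document text and returns the model output.
    fn complete(&self, system_prompt: &str, user_text: &str) -> Result<String, String>;
}

/// A backend that returns a fixed response, useful for offline tests.
struct CannedBackend {
    response: String,
}

impl TextBackend for CannedBackend {
    fn complete(&self, _system_prompt: &str, user_text: &str) -> Result<String, String> {
        // Refuse empty input, as a real provider call would be wasted on it.
        if user_text.trim().is_empty() {
            return Err("no input text to convert".to_string());
        }
        Ok(self.response.clone())
    }
}

/// Pipeline code depends only on the trait, so any provider can be plugged in.
fn run_conversion(backend: &dyn TextBackend, document_text: &str) -> Result<String, String> {
    backend.complete("Convert this SDS text to JSON.", document_text)
}
```

A canned backend like this is handy for exercising the parts of a pipeline that surround the LLM call without spending API credits.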
JSON Format
The output JSON conforms to the MHLW SDS Data Exchange Format v1.0 (厚生労働省SDS情報交換のための標準的フォーマット, published 2025-03-31).
The schema covers all 16 sections of JIS Z 7253 with ~200 structured fields.
Language Support
| Language | source_language / output_language | Source document standard | Output DOCX headings |
|---|---|---|---|
| Japanese | Language::Japanese | JIS Z 7253 | JIS Z 7253 |
| English | Language::English | GHS/OSHA HazCom | GHS Rev.10 / ISO 11014 |
| Simplified Chinese | Language::ChineseSimplified | GB/T 16483 | GB/T 16483-2012 |
| Traditional Chinese | Language::ChineseTraditional | CNS 15030 | CNS 15030 |
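The table can be read as a lookup from language to heading standard. The sketch below mirrors it with a local enum so that it compiles on its own. The variant names match those in the table, but the enum is redefined here for illustration, and `language_from_tag` and `output_heading_standard` are hypothetical helpers, not functions exported by this crate.

```rust
/// Local mirror of the variants listed in the table; defined here only so
/// the example is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    Japanese,
    English,
    ChineseSimplified,
    ChineseTraditional,
}

/// Parses the language tags used in this README (ja, en, zh-CN, zh-TW),
/// ignoring ASCII case. Returns `None` for any other tag.
fn language_from_tag(tag: &str) -> Option<Language> {
    match tag.to_ascii_lowercase().as_str() {
        "ja" => Some(Language::Japanese),
        "en" => Some(Language::English),
        "zh-cn" => Some(Language::ChineseSimplified),
        "zh-tw" => Some(Language::ChineseTraditional),
        _ => None,
    }
}

/// Returns the standard that the output DOCX headings follow, per the table.
fn output_heading_standard(lang: Language) -> &'static str {
    match lang {
        Language::Japanese => "JIS Z 7253",
        Language::English => "GHS Rev.10 / ISO 11014",
        Language::ChineseSimplified => "GB/T 16483-2012",
        Language::ChineseTraditional => "CNS 15030",
    }
}
```

Because the `match` in `output_heading_standard` is exhaustive, adding a new variant to the enum forces the mapping to be updated at compile time.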
Requirements
- Rust 1.75+
- An LLM API key (for convert_to_json only)
  - Anthropic: Get API key
  - OpenAI: Get API key
  - Google Gemini: Get API key
- Input files must be text-based PDF or DOCX
  - Encrypted PDFs are not supported (text extraction will fail)
  - Scanned/image-only PDFs are not supported (no text to extract)
References
- MHLW — SDS Standard Data Exchange Format (official page) (Japanese)
- SDS Data Exchange Format Developer Manual (PDF) (Japanese)
License
Licensed under either of:
- Apache License, Version 2.0
- MIT License
at your option.