# hs-predict
[crates.io](https://crates.io/crates/hs-predict) · [docs.rs](https://docs.rs/hs-predict) · [CI](https://github.com/kent-tokyo/hs-predict/actions) · [License](#license)

**HS (Harmonized System) code prediction for chemical products.**
`hs-predict` uses an **Akinator-style interactive session** — asking targeted questions one at a time — to collect just enough information to classify your product, then applies a hybrid rule-based engine to produce a six-digit HS 2022 code.
> **Disclaimer**: Predictions are advisory only and must not be used as the sole basis for a customs declaration. Always verify with a qualified trade-compliance expert or the relevant customs authority.
---
## Features
- **Akinator-style UX** — ask only what's needed; no upfront form to fill in
- **Hybrid classification pipeline** — static rule table → SMILES functional-group engine → LLM fallback (priority order)
- **Physical-form awareness** — same compound, different form = different HS code (e.g. NaOH solid → 2815.11, solution → 2815.12)
- **98-compound static rule table** — common industrial chemicals, mostly in Chapters 28, 29 and 72–81
- **SMILES functional-group detection** *(v0.3)* — 20 functional groups, organic/inorganic classification, heading-level hint (≤ 0.70 confidence)
- **Mixture support** — enter each component identifier and weight fraction (w/w%) progressively
- **IUPAC name → SMILES** — auto-resolved via [`chem-name-resolver`](https://crates.io/crates/chem-name-resolver)
- **PubChem enrichment** *(v0.2, `pubchem` feature)* — fills missing identifiers from CAS / IUPAC / SMILES
- **LLM integration** *(v0.4, `llm` feature)* — **trait-hook design**: implement `LlmClassifier` with your HTTP client; library supplies `PromptBuilder` (EN/JA), `LlmResponse`, validation, and `MockLlmClassifier` for tests
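- **Japan tariff codes** — the 9-digit Japanese statistical code (統計品目番号) is returned in `jp_tariff_code` alongside the HS code when available, based on Japan's applied tariff schedule (実行関税率表) as of 2026-04-01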
---
## Quick start
### Interactive mode (Akinator-style)
```rust
use hs_predict::session::{ClassificationSession, Answer, SessionResult};
use hs_predict::pipeline::HsPipeline;
let mut session = ClassificationSession::new();
let pipeline = HsPipeline::new();
let q = session.start();
println!("{}", q.prompt()); // "Please enter a CAS number, IUPAC name, SMILES, or InChIKey"
match session.answer(Answer::Text("1310-73-2".to_string()))? {
    SessionResult::NeedMoreInfo { next_question } => {
        println!("{}", next_question.prompt()); // "Is this a mixture?"
    }
    SessionResult::Ready => {
        let product = session.to_product_description();
        let prediction = pipeline.classify(&product)?;
        println!("HS code: {}", prediction.display()); // "28.15.11"
        if let Some(jp) = &prediction.jp_tariff_code {
            println!("Japan tariff: {}", jp); // "281511000"
        }
    }
    _ => {}
}
# Ok::<(), hs_predict::HsPredictError>(())
```
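The Japanese code in the example output (`281511000`) has nine digits, and its first six are the HS subheading (`281511`). The std-only sketch below shows how a caller might split such a code and check that it agrees with the HS code. `split_jp_code` and `jp_matches_hs` are hypothetical helpers, not part of `hs-predict`, and the sketch assumes the Japanese code arrives as a plain nine-digit string.

```rust
/// Hypothetical helper (not part of hs-predict): split a 9-digit Japanese
/// statistical code into its 6-digit HS part and 3-digit national suffix.
fn split_jp_code(jp: &str) -> Option<(&str, &str)> {
    // Reject anything that is not exactly nine ASCII digits.
    if jp.len() != 9 || !jp.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // All-ASCII input, so byte index 6 is a valid char boundary.
    Some(jp.split_at(6))
}

/// Hypothetical check: true when the Japanese code extends the given HS code.
fn jp_matches_hs(jp: &str, hs: &str) -> bool {
    matches!(split_jp_code(jp), Some((hs_part, _)) if hs_part == hs)
}

fn main() {
    if let Some((hs, national)) = split_jp_code("281511000") {
        println!("HS part {hs}, national suffix {national}");
    }
    println!("consistent with 281511: {}", jp_matches_hs("281511000", "281511"));
}
```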
### Japanese session
```rust
use hs_predict::session::ClassificationSession;
let mut session = ClassificationSession::new_ja(); // Japanese prompts
let q = session.start();
println!("{}", q.prompt()); // "CAS番号、IUPAC名、SMILES、InChIKey のいずれかを入力してください" ("Please enter a CAS number, IUPAC name, SMILES, or InChIKey")
```
### Direct mode (known CAS + physical form)
```rust
use hs_predict::pipeline::HsPipeline;
use hs_predict::types::{ProductDescription, SubstanceIdentifier, PhysicalForm};
let pipeline = HsPipeline::new();
let product = ProductDescription {
    identifier: SubstanceIdentifier::from_cas("1310-73-2"), // Sodium hydroxide
    physical_form: Some(PhysicalForm::Solid),
    purity_pct: None,
    purity_type: None,
    mixture_components: None,
    intended_use: None,
    additional_context: None,
};
let p = pipeline.classify(&product)?;
assert_eq!(&p.hs_code, "281511");
assert_eq!(p.display(), "28.15.11");
# Ok::<(), hs_predict::HsPredictError>(())
```
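`hs_code` holds the bare six digits (`281511`), while `display()` renders them as `28.15.11`: chapter, then the remaining heading digits, then the subheading digits. A minimal, hypothetical re-implementation of that formatting follows; `format_hs` is an illustrative name, not the crate's code, and it also rejects anything that is not six ASCII digits.

```rust
/// Hypothetical re-implementation (not the crate's code) of the dotted
/// display: "281511" -> "28.15.11" (chapter 28, heading 28.15, subheading .11).
fn format_hs(code: &str) -> Option<String> {
    // HS 2022 subheadings are exactly six digits.
    if code.len() != 6 || !code.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let (chapter, rest) = code.split_at(2);
    let (heading, subheading) = rest.split_at(2);
    Some(format!("{chapter}.{heading}.{subheading}"))
}

fn main() {
    for code in ["281511", "291511", "76071"] {
        match format_hs(code) {
            Some(dotted) => println!("{code} -> {dotted}"),
            None => println!("{code} -> invalid (expected six digits)"),
        }
    }
}
```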
---
## Classification pipeline
```
Input: ProductDescription
                       │
                       ▼
┌──────────────────────────────────────────────────────────┐
│ Priority 1: User mapping (confidence = 1.0)              │
│   pipeline.with_mapping("64-19-7", "291511")             │
└──────────────────────┬───────────────────────────────────┘
                       │ miss
                       ▼
┌──────────────────────────────────────────────────────────┐
│ Priority 2: Static rule table (98 chemicals)             │
│   CAS + physical form + purity → exact HS subheading     │
└──────────────────────┬───────────────────────────────────┘
                       │ miss
                       ▼
┌──────────────────────────────────────────────────────────┐
│ Priority 3: SMILES functional-group engine (v0.3)        │
│   20 functional groups → heading-level hint (≤ 0.70)     │
└──────────────────────┬───────────────────────────────────┘
                       │ miss / low confidence
                       ▼
┌──────────────────────────────────────────────────────────┐
│ Priority 4: LLM classifier (v0.4, trait hook)            │
│   impl LlmClassifier for YourClient { ... }              │
└──────────────────────┬───────────────────────────────────┘
                       │
                       ▼
                 HsPrediction
          { hs_code, confidence, notes,
            jp_tariff_code, recommended_action }
```
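The diagram amounts to "the first stage that answers wins". Below is a simplified, hypothetical model of that ordering using plain function pointers. The stage functions and their one-row lookups are illustrative stand-ins, not the crate's internals, and the sketch ignores the low-confidence fall-through at Priority 3.

```rust
/// Hypothetical, simplified model of the pipeline's priority order (not the
/// crate's internals): each stage returns a six-digit HS code or passes.
type Stage = (&'static str, fn(&str) -> Option<&'static str>);

/// Priority 1 stand-in: a user override, like `with_mapping` above.
fn user_mapping(cas: &str) -> Option<&'static str> {
    match cas {
        "64-19-7" => Some("291511"),
        _ => None,
    }
}

/// Priority 2 stand-in: one row of the static rule table (solid NaOH assumed).
fn static_table(cas: &str) -> Option<&'static str> {
    match cas {
        "1310-73-2" => Some("281511"),
        _ => None,
    }
}

/// Priorities 3 and 4 would go here; this stub always misses.
fn later_stages(_cas: &str) -> Option<&'static str> {
    None
}

const STAGES: [Stage; 3] = [
    ("user mapping", user_mapping),
    ("static table", static_table),
    ("later stages", later_stages),
];

/// Stages run in order; the first hit wins and later stages never run.
fn classify(cas: &str) -> Option<(&'static str, &'static str)> {
    STAGES
        .iter()
        .find_map(|(name, stage)| stage(cas).map(|code| (*name, code)))
}

fn main() {
    for cas in ["64-19-7", "1310-73-2", "7732-18-5"] {
        match classify(cas) {
            Some((stage, code)) => println!("{cas}: {code} (from {stage})"),
            None => println!("{cas}: no stage matched"),
        }
    }
}
```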
---
## SMILES functional-group detection (v0.3)
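When a SMILES string is available (entered by the user, resolved from an IUPAC name, or auto-filled by PubChem), the engine detects the following functional groups and maps each to an HS heading hint (Chapter 29 for organic compounds, Chapter 28 for inorganic ones) with the confidence shown: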

| Functional group | Heading hint | Confidence |
|---|---|---|
| Anhydride | 29.15 | 0.65 |
| Isocyanate | 29.29 | 0.70 |
| Nitrile | 29.26 | 0.70 |
| Epoxide | 29.10 | 0.70 |
| Sulphonic acid | 29.04 | 0.68 |
| Amide | 29.24 | 0.67 |
| Aldehyde | 29.12 | 0.67 |
| Ketone | 29.14 | 0.67 |
| Carboxylic acid | 29.15 | 0.60 |
| Ester | 29.15 | 0.55 |
| Phenol | 29.07 | 0.67 |
| Alcohol | 29.05 | 0.60 |
| Amine | 29.21 | 0.63 |
| Organohalide | 29.03 | 0.65 |
| Ether | 29.09 | 0.63 |
| Thiol / Sulphide | 29.30 | 0.65 |
| Phosphate | 29.20 | 0.62 |
| Nitro | 29.04 | 0.60 |
| Inorganic (no C–C/C–H) | Ch. 28 | 0.55 |
```rust
use hs_predict::smiles::classify_smiles;
let r = classify_smiles("CC(C)=O").unwrap(); // acetone
assert_eq!(r.heading_hint.heading, Some(2914)); // 29.14 ketone
```
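When a molecule carries several of these groups, the HS nomenclature has its own tie-breaker: Note 3 to Chapter 29 classifies goods that could fall in two or more headings of the chapter in the heading that occurs last in numerical order. Below is a naive, hypothetical sketch of applying that note to the hints above. `Group` and `resolve` are illustrative names, and the crate's engine may resolve conflicts differently. Real classification also has dedicated headings for some combinations (for example 29.22 for oxygen-function amino-compounds), which this sketch ignores.

```rust
/// Hypothetical sketch (not the crate's engine): resolve several detected
/// groups to one heading with Note 3 to Chapter 29, i.e. pick the heading
/// that occurs last in numerical order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Group {
    Alcohol,
    Phenol,
    Ether,
    Ketone,
}

/// (heading, confidence) pairs copied from the table above.
fn hint(group: Group) -> (u16, f64) {
    match group {
        Group::Alcohol => (2905, 0.60),
        Group::Phenol => (2907, 0.67),
        Group::Ether => (2909, 0.63),
        Group::Ketone => (2914, 0.67),
    }
}

/// Highest heading number wins; `None` when no group was detected.
fn resolve(groups: &[Group]) -> Option<(u16, f64)> {
    groups.iter().map(|&g| hint(g)).max_by_key(|&(heading, _)| heading)
}

fn main() {
    // Alcohol + ketone, e.g. diacetone alcohol (CC(=O)CC(C)(C)O).
    // Phenol + ether, e.g. guaiacol (COc1ccccc1O).
    let examples = [[Group::Alcohol, Group::Ketone], [Group::Phenol, Group::Ether]];
    for groups in examples {
        if let Some((heading, confidence)) = resolve(&groups) {
            println!(
                "{groups:?} -> heading {}.{:02} (hint confidence {confidence})",
                heading / 100,
                heading % 100
            );
        }
    }
}
```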
---
## LLM integration — design philosophy (v0.4)
### Why a trait hook, not a built-in client
HS code errors carry legal and financial consequences. Building an LLM API client directly into the library would:
- **Lock users into a specific provider** (Anthropic, OpenAI, …)
- **Create non-determinism in a compliance context** — the same compound might return different codes on different calls
- **Add secret management burden** to a library (API keys in `Cargo.toml`?)
- **Embed network latency and failure modes** into a synchronous classification call
Instead, `hs-predict` defines a trait. You implement it with whatever HTTP client, model, and prompt customisation your application needs. The library provides the structured input and validates the output.
```rust
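# #[cfg(feature = "llm")]
# async fn example(product: hs_predict::types::ProductDescription) -> hs_predict::Result<()> {
// v0.4 (`llm` feature): implement this trait with your preferred LLM client
use hs_predict::llm::{LlmClassifier, LlmPrompt, LlmResponse};
use hs_predict::pipeline::HsPipeline;
use futures::future::BoxFuture;

struct MyClaudeClient { api_key: String }

impl LlmClassifier for MyClaudeClient {
    fn classify<'a>(&'a self, prompt: &'a LlmPrompt) -> BoxFuture<'a, hs_predict::Result<LlmResponse>> {
        Box::pin(async move {
            // 1. Call your LLM API with self.api_key, using prompt.system_text / prompt.user_text
            // 2. Parse the JSON response into LlmResponse
            // 3. Return it; the library then validates hs_code format and chapter consistency
            let _ = (&self.api_key, prompt);
            // The explicit type annotation lets the async block's output type be inferred
            let response: hs_predict::Result<LlmResponse> = todo!("call your LLM provider");
            response
        })
    }
}

// Attach to the pipeline; no API key is stored in the library
let pipeline = HsPipeline::new().with_llm(MyClaudeClient { api_key: "...".into() });
let prediction = pipeline.classify_with_llm(&product).await?;
println!("{}", prediction.display());
# Ok(())
# }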
```
The library provides:
- `LlmPrompt` — pre-built system prompt + user message (product info + SMILES hints)
- `LlmResponse` — the expected return type (`hs_code`, `confidence`, `rationale`, `alternatives`)
- Chapter-consistency validation (LLM code vs. SMILES engine hint)
- `MockLlmClassifier` under the `mock` feature for testing
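
To illustrate this division of labour without any network code, here is a synchronous, std-only sketch: a mock that always returns the same answer (so tests are repeatable) and a library-side check for six-digit format and chapter agreement with the SMILES hint. `Classifier`, `Response`, `MockClassifier`, `Invalid` and `validate` are hypothetical stand-ins; the real `LlmClassifier` is async, and its validation rules may differ.

```rust
/// Hypothetical, synchronous stand-ins for the crate's LLM types; the real
/// `LlmClassifier` is async and its validation rules may differ.
#[derive(Debug, Clone)]
struct Response {
    hs_code: String,
}

trait Classifier {
    fn classify(&self, user_text: &str) -> Response;
}

/// Deterministic mock: always returns the canned answer, so tests repeat.
struct MockClassifier {
    canned: Response,
}

impl Classifier for MockClassifier {
    fn classify(&self, _user_text: &str) -> Response {
        self.canned.clone()
    }
}

#[derive(Debug, PartialEq)]
enum Invalid {
    Format,
    ChapterMismatch { expected: u8, got: u8 },
}

/// Library-side check: six ASCII digits and, when a SMILES hint exists,
/// the same two-digit chapter as that hint.
fn validate(resp: &Response, hint_chapter: Option<u8>) -> Result<(), Invalid> {
    let code = resp.hs_code.as_str();
    if code.len() != 6 || !code.bytes().all(|b| b.is_ascii_digit()) {
        return Err(Invalid::Format);
    }
    let got: u8 = code[..2].parse().map_err(|_| Invalid::Format)?;
    match hint_chapter {
        Some(expected) if expected != got => Err(Invalid::ChapterMismatch { expected, got }),
        _ => Ok(()),
    }
}

fn main() {
    let llm = MockClassifier {
        canned: Response { hs_code: "291411".into() }, // acetone
    };
    let resp = llm.classify("acetone, liquid, industrial use");
    println!("{} -> {:?}", resp.hs_code, validate(&resp, Some(29)));
    println!("{} -> {:?}", resp.hs_code, validate(&resp, Some(28)));
}
```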
---
## PubChem enrichment (v0.2)
PubChem integration fills in missing identifier fields before classification.
It is **factual data retrieval** (deterministic), not classification — a different role from the LLM fallback.
```rust
# #[cfg(feature = "pubchem")]
# async fn example() -> hs_predict::Result<()> {
use hs_predict::pipeline::HsPipeline;
use hs_predict::pubchem::PubChemClient;
use hs_predict::types::{ProductDescription, SubstanceIdentifier, PhysicalForm};
let pipeline = HsPipeline::new().with_pubchem(PubChemClient::new());
let mut product = ProductDescription {
    identifier: SubstanceIdentifier::from_cas("1310-73-2"),
    physical_form: Some(PhysicalForm::Solid),
    purity_pct: None,
    purity_type: None,
    mixture_components: None,
    intended_use: None,
    additional_context: None,
};
pipeline.enrich(&mut product).await?; // fills SMILES, InChI, IUPAC name …
let prediction = pipeline.classify(&product)?;
println!("{}", prediction.display()); // "28.15.11"
# Ok(())
# }
```
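"Fills in missing identifier fields" can be read as a merge rule: never overwrite what the user supplied, only fill gaps. Here is a minimal sketch of that rule, assuming a hypothetical `Identifiers` struct with optional fields (the crate's `SubstanceIdentifier` may be shaped differently) and a lookup result that has already been fetched. The helpers `fill` and `fill_missing` are illustrative.

```rust
/// Hypothetical identifier record; the crate's `SubstanceIdentifier`
/// may be shaped differently.
#[derive(Debug, Default, Clone, PartialEq)]
struct Identifiers {
    cas: Option<String>,
    smiles: Option<String>,
    inchikey: Option<String>,
    iupac_name: Option<String>,
}

/// Copy a looked-up value into a field only if the user left it empty.
fn fill(slot: &mut Option<String>, value: &Option<String>) {
    if slot.is_none() {
        *slot = value.clone();
    }
}

/// Merge a lookup result into the product without overwriting user input.
fn fill_missing(target: &mut Identifiers, lookup: &Identifiers) {
    fill(&mut target.cas, &lookup.cas);
    fill(&mut target.smiles, &lookup.smiles);
    fill(&mut target.inchikey, &lookup.inchikey);
    fill(&mut target.iupac_name, &lookup.iupac_name);
}

fn main() {
    // The user only supplied a CAS number.
    let mut product = Identifiers {
        cas: Some("1310-73-2".into()),
        ..Default::default()
    };
    // Stand-in for a lookup result (sodium hydroxide values used in this README).
    let lookup = Identifiers {
        cas: Some("1310-73-2".into()),
        smiles: Some("[Na+].[OH-]".into()),
        inchikey: Some("HEMHJVSKTPXQMS-UHFFFAOYSA-M".into()),
        iupac_name: Some("sodium hydroxide".into()),
    };
    fill_missing(&mut product, &lookup);
    println!("{product:#?}");
}
```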
---
## Akinator question flow
```
Q1: CAS / IUPAC name / SMILES / InChIKey?
 │
 ├─ PubChem lookup (pubchem feature) ─────┐
 │                                        │
 ▼                                        ▼
Q2: Is this a mixture?
 │
 ├─ Yes ──► Q: How many components?
 │           └─ For each component:
 │              ├─ Q: Identifier?
 │              └─ Q: Weight fraction (w/w%)?
 │
 └─ No ───► Q3: Physical form?
                (Solid / Powder / Granules / Liquid /
                 Solution / Gas / Foil / Ingot / Unknown)
             │
             ├─ Solution ──► Q: Concentration (w/w%)?
             │
             ▼
            Q4: Intended use?
                (Industrial / Pharmaceutical / Agricultural /
                 Food / Cosmetic / Other)
             │
             ├─ No SMILES ──► Q5: Organic or Inorganic?
             │                 │
             │                 └─ Organic ──► Q6: Functional groups?
             ▼
            Classification pipeline (Priorities 1–4)
```
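The point of the flow is that each answer decides which question, if any, comes next. A hypothetical sketch of that rule as a pure function over the answers collected so far follows. The names are illustrative, not the crate's `ClassificationSession` API, the mixture branch is collapsed into a single step that ends the session, and the scripted run assumes PubChem or the name resolver supplied a SMILES, so Q5 and Q6 are skipped.

```rust
/// Hypothetical model of the question flow above (not the crate's
/// `ClassificationSession` API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Question {
    Identifier,         // Q1
    IsMixture,          // Q2
    Components,         // mixture branch, collapsed to one step
    PhysicalForm,       // Q3
    Concentration,      // solutions only
    IntendedUse,        // Q4
    OrganicOrInorganic, // Q5, only when no SMILES is known
    FunctionalGroups,   // Q6, organics only
}

/// Answers collected so far; `None` / `false` means "not answered yet".
#[derive(Debug, Default)]
struct Answers {
    identifier: Option<String>,
    has_smiles: bool,
    is_mixture: Option<bool>,
    components_done: bool,
    form: Option<&'static str>,
    concentration_pct: Option<f64>,
    intended_use: Option<&'static str>,
    organic: Option<bool>,
    groups_done: bool,
}

/// The next question that still applies, or `None` when the product can be
/// handed to the classification pipeline.
fn next_question(a: &Answers) -> Option<Question> {
    use Question::*;
    if a.identifier.is_none() {
        return Some(Identifier);
    }
    match a.is_mixture {
        None => return Some(IsMixture),
        Some(true) => return (!a.components_done).then_some(Components),
        Some(false) => {}
    }
    if a.form.is_none() {
        return Some(PhysicalForm);
    }
    if a.form == Some("Solution") && a.concentration_pct.is_none() {
        return Some(Concentration);
    }
    if a.intended_use.is_none() {
        return Some(IntendedUse);
    }
    if !a.has_smiles {
        match a.organic {
            None => return Some(OrganicOrInorganic),
            Some(true) if !a.groups_done => return Some(FunctionalGroups),
            _ => {}
        }
    }
    None
}

fn main() {
    // Scripted session: solid NaOH entered by CAS number, SMILES assumed known.
    let mut a = Answers::default();
    let mut asked = Vec::new();
    while let Some(q) = next_question(&a) {
        asked.push(q);
        match q {
            Question::Identifier => {
                a.identifier = Some("1310-73-2".into());
                a.has_smiles = true;
            }
            Question::IsMixture => a.is_mixture = Some(false),
            Question::PhysicalForm => a.form = Some("Solid"),
            Question::IntendedUse => a.intended_use = Some("Industrial"),
            other => unreachable!("not reached in this script: {other:?}"),
        }
    }
    println!("asked: {asked:?}");
}
```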
---
## Supported identifiers
| Identifier | Example | Supported |
|---|---|---|
| CAS number | `1310-73-2` | ✅ |
| IUPAC systematic name | `sodium hydroxide` | ✅ (fallback) |
| SMILES | `[Na+].[OH-]` | ✅ |
| InChI | `InChI=1S/Na.H2O/h;1H/q+1;/p-1` | ✅ |
| InChIKey | `HEMHJVSKTPXQMS-UHFFFAOYSA-M` | ✅ |
> Only IUPAC systematic names are accepted as text input. Trade names and common aliases (e.g. "caustic soda") are not supported — they cannot be reliably resolved.
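
Telling these formats apart is mostly syntactic, and CAS numbers carry a check digit: weight the digits before it by 1, 2, 3, … counting from the right, sum them, and take the result modulo 10. For `1310-73-2`: 3·1 + 7·2 + 0·3 + 1·4 + 3·5 + 1·6 = 42, and 42 mod 10 = 2. Below is a hypothetical std-only sketch; `classify_input`, `cas_is_valid` and `is_inchikey` are illustrative, not the crate's parser, and separating SMILES from IUPAC names is left to a real SMILES parser or name resolver.

```rust
/// Hypothetical sketch (not the crate's parser): guess which identifier
/// type a text input is, validating CAS numbers by their check digit.
#[derive(Debug, PartialEq)]
enum IdKind {
    Cas,
    InChI,
    InChIKey,
    SmilesOrName, // telling these apart needs a SMILES parser or name resolver
}

fn all_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// CAS RN: 2–7 digits, 2 digits, 1 check digit. The check digit equals the
/// sum of the other digits, weighted 1, 2, 3, … from the right, modulo 10.
fn cas_is_valid(s: &str) -> bool {
    let parts: Vec<&str> = s.split('-').collect();
    let [body, mid, check] = parts.as_slice() else {
        return false;
    };
    if !(2..=7).contains(&body.len()) || mid.len() != 2 || check.len() != 1 {
        return false;
    }
    if !(all_digits(body) && all_digits(mid) && all_digits(check)) {
        return false;
    }
    let sum: u32 = body
        .bytes()
        .chain(mid.bytes())
        .rev()
        .enumerate()
        .map(|(i, b)| (i as u32 + 1) * u32::from(b - b'0'))
        .sum();
    sum % 10 == u32::from(check.as_bytes()[0] - b'0')
}

/// InChIKey layout: 14 uppercase letters, '-', 10 letters, '-', 1 letter.
fn is_inchikey(s: &str) -> bool {
    let parts: Vec<&str> = s.split('-').collect();
    parts.len() == 3
        && parts
            .iter()
            .zip([14, 10, 1])
            .all(|(p, n)| p.len() == n && p.bytes().all(|b| b.is_ascii_uppercase()))
}

fn classify_input(input: &str) -> IdKind {
    let s = input.trim();
    if s.starts_with("InChI=") {
        IdKind::InChI
    } else if is_inchikey(s) {
        IdKind::InChIKey
    } else if cas_is_valid(s) {
        IdKind::Cas
    } else {
        IdKind::SmilesOrName
    }
}

fn main() {
    for input in [
        "1310-73-2",
        "1310-73-3", // wrong check digit, so not accepted as CAS
        "HEMHJVSKTPXQMS-UHFFFAOYSA-M",
        "InChI=1S/Na.H2O/h;1H/q+1;/p-1",
        "[Na+].[OH-]",
    ] {
        println!("{input} -> {:?}", classify_input(input));
    }
}
```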
---
## Feature flags
| Feature | Enables | Extra dependencies |
|---|---|---|
| *(none)* | Rule-based + SMILES engine (Priorities 1–3) | — |
| `pubchem` | PubChem identifier enrichment | `reqwest`, `moka`, `governor` |
| `llm` | `LlmClassifier` trait + pipeline Priority 4 | — |
| `mock` | `MockLlmClassifier` for unit testing | — |
```toml
[dependencies]
hs-predict = { version = "0.4", features = ["pubchem"] }
```
---
## Example chemicals (static rule table)
| CAS | Substance | Form / purity | HS code |
|---|---|---|---|
| 1310-73-2 | Sodium hydroxide | Solid | 2815.11 |
| 1310-73-2 | Sodium hydroxide | Solution | 2815.12 |
| 7664-93-9 | Sulphuric acid | Any | 2807.00 |
| 7697-37-2 | Nitric acid | ≥ 98% | 2808.10 |
| 7697-37-2 | Nitric acid | < 98% | 2808.90 |
| 7664-41-7 | Ammonia | Gas | 2814.10 |
| 7664-41-7 | Ammonia | Solution | 2814.20 |
| 7429-90-5 | Aluminium | Ingot ≥ 99% | 7601.10 |
| 7429-90-5 | Aluminium | Powder | 7603.10 |
| 7429-90-5 | Aluminium | Foil | 7607.11 |
| 67-56-1 | Methanol | Liquid | 2905.11 |
| 64-17-5 | Ethanol | Liquid | 2207.10 |
| 67-64-1 | Acetone | Liquid | 2914.11 |

98 compounds in total, mostly in Chapters 28, 29 and 72–81 (ethanol, listed above, falls under heading 22.07). See [`src/rules/static_table.rs`](src/rules/static_table.rs) for the full list.
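
The table's key point is that the CAS number alone is not always enough: the match also depends on the physical form and, for some rows, purity. Below is a hypothetical sketch of such a lookup using a few rows copied from the table above. The `Rule`/`Cond` layout and the `lookup` function are illustrative, not the structure of `static_table.rs`.

```rust
/// Hypothetical sketch (not the layout of `static_table.rs`): a few rows
/// from the table above, matched on CAS number plus form and purity.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Form {
    Solid,
    Solution,
    Liquid,
    Gas,
    Powder,
    Foil,
    Ingot,
}

/// Extra condition a row places on the product.
#[derive(Debug, Clone, Copy)]
enum Cond {
    Any,
    Form(Form),
    FormMinPurity(Form, f64), // form plus minimum purity in %
}

struct Rule {
    cas: &'static str,
    cond: Cond,
    hs: &'static str,
}

const RULES: &[Rule] = &[
    Rule { cas: "1310-73-2", cond: Cond::Form(Form::Solid), hs: "281511" },
    Rule { cas: "1310-73-2", cond: Cond::Form(Form::Solution), hs: "281512" },
    Rule { cas: "7664-93-9", cond: Cond::Any, hs: "280700" },
    Rule { cas: "7664-41-7", cond: Cond::Form(Form::Gas), hs: "281410" },
    Rule { cas: "7664-41-7", cond: Cond::Form(Form::Solution), hs: "281420" },
    Rule { cas: "7429-90-5", cond: Cond::FormMinPurity(Form::Ingot, 99.0), hs: "760110" },
    Rule { cas: "7429-90-5", cond: Cond::Form(Form::Powder), hs: "760310" },
    Rule { cas: "7429-90-5", cond: Cond::Form(Form::Foil), hs: "760711" },
    Rule { cas: "67-56-1", cond: Cond::Form(Form::Liquid), hs: "290511" },
];

fn cond_holds(cond: Cond, form: Option<Form>, purity_pct: Option<f64>) -> bool {
    match cond {
        Cond::Any => true,
        Cond::Form(f) => form == Some(f),
        Cond::FormMinPurity(f, min) => form == Some(f) && purity_pct.map_or(false, |p| p >= min),
    }
}

/// First row whose CAS matches and whose condition holds; `None` means more
/// information (or another pipeline stage) is needed.
fn lookup(cas: &str, form: Option<Form>, purity_pct: Option<f64>) -> Option<&'static str> {
    RULES
        .iter()
        .find(|r| r.cas == cas && cond_holds(r.cond, form, purity_pct))
        .map(|r| r.hs)
}

fn main() {
    println!("NaOH, solid:    {:?}", lookup("1310-73-2", Some(Form::Solid), None));
    println!("NaOH, solution: {:?}", lookup("1310-73-2", Some(Form::Solution), None));
    println!("NaOH, unknown:  {:?}", lookup("1310-73-2", None, None));
}
```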
---
## Roadmap
| Version | Status | Highlights |
|---|---|---|
| 0.1.0 | ✅ Released | Core rule engine + Akinator session + Japan tariff codes |
| 0.2.0 | ✅ Released | PubChem API integration |
| 0.3.0 | ✅ Released | SMILES functional-group detection (20 groups, Priority 3) |
| 0.4.0 | ✅ Released | `LlmClassifier` trait hook + `PromptBuilder` (EN/JA) + `MockLlmClassifier` |
---
## Minimum Supported Rust Version (MSRV)
Rust **1.75**.
---
## Contributing
Bug reports, rule-table additions, and PRs are welcome.
For new entries in the static rule table, please cite the HS 2022 nomenclature chapter/note that supports the classification.
---
## License
Licensed under either of:
- [MIT License](LICENSE-MIT)
- [Apache License 2.0](LICENSE-APACHE)
at your option.