docs.rs failed to build content-semantic-0.1.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
UCFP Semantic Fingerprinting
This crate handles turning text into meaning-aware vectors. Given canonicalized text, it spits out dense embeddings you can use for similarity search, clustering, or whatever semantic stuff you're into.
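For the similarity-search case, a common approach is to rank documents by cosine similarity to a query embedding. The sketch below assumes an embedding can be viewed as a `&[f32]` slice (the exact return type of `semanticize` isn't shown here), and `cosine_similarity` is a hypothetical helper, not part of this crate.

```rust
/// Hypothetical helper: cosine similarity between two equal-length embeddings.
/// Returns a value in [-1.0, 1.0], or 0.0 if either vector has zero length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

fn main() {
    // Toy 3-dimensional vectors standing in for real embeddings.
    let query = [1.0_f32, 0.0, 1.0];
    let docs = [("doc-1", [1.0_f32, 0.0, 1.0]), ("doc-2", [0.0_f32, 1.0, 0.0])];

    // Score every document against the query, then sort highest first.
    let mut ranked: Vec<(&str, f32)> = docs
        .iter()
        .map(|(id, v)| (*id, cosine_similarity(&query, v)))
        .collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    for (id, score) in &ranked {
        println!("{id}: {score:.3}");
    }
}
```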
We support a few modes:
- ONNX mode - Run models locally. Requires model files.
- API mode - Call out to Hugging Face (router endpoint is your friend)
- Stub mode - For testing. Generates fake but consistent vectors.
The nice thing is the fallback behavior. If a model file is missing or an API call fails, we fall back to stub mode instead of panicking. Saved our bacon more than once in production.
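The stub-plus-fallback idea can be sketched roughly as follows. `stub_embedding` and `embed_with_fallback` are hypothetical names, and the hashing scheme (an FNV-1a seed expanded with xorshift64, then L2-normalized) is just one way to get "fake but consistent" vectors; the crate's actual stub generator isn't documented here.

```rust
/// Hypothetical stub embedder: the same text always yields the same vector.
fn stub_embedding(text: &str, dim: usize) -> Vec<f32> {
    // FNV-1a 64-bit hash of the input bytes, used as a seed.
    let mut seed: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in text.bytes() {
        seed ^= u64::from(byte);
        seed = seed.wrapping_mul(0x0000_0100_0000_01b3);
    }

    // Expand the seed into `dim` values in [-1.0, 1.0) with xorshift64.
    let mut state = seed | 1; // xorshift must never start at zero
    let mut v: Vec<f32> = (0..dim)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            ((state >> 40) as f32 / (1u64 << 24) as f32) * 2.0 - 1.0
        })
        .collect();

    // L2-normalize so stub vectors look like normalized real embeddings.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

/// Try the real backend (ONNX or API); on any error, log it and use the stub.
fn embed_with_fallback<F>(text: &str, dim: usize, backend: F) -> Vec<f32>
where
    F: FnOnce(&str) -> Result<Vec<f32>, String>,
{
    match backend(text) {
        Ok(v) => v,
        Err(err) => {
            eprintln!("semantic backend failed ({err}); using stub embedding");
            stub_embedding(text, dim)
        }
    }
}

fn main() {
    // Simulate a missing model file: the backend errors, so the stub kicks in.
    let v = embed_with_fallback("This is a test.", 8, |_| Err("model.onnx not found".to_string()));
    // Same text, same stub vector.
    assert_eq!(v, stub_embedding("This is a test.", 8));
    println!("{} dims, first value {:.4}", v.len(), v[0]);
}
```

In a setup like this, the stub dimension should match the real model's so that vectors produced during an outage still fit whatever index they get stored in.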
Threading notes
Tokenizers and ONNX sessions get cached per-thread. First call on any thread does the expensive setup. After that, it's fast. Batches work too.
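Per-thread caching like this is usually built on `thread_local!`. The sketch below uses a hypothetical `Session` type as a stand-in for a tokenizer or ONNX session and counts how often the expensive setup runs; it illustrates the pattern, not this crate's internals.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Counts how many times the expensive setup ran, across all threads.
static SETUPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a tokenizer or ONNX session.
struct Session;

impl Session {
    fn load() -> Self {
        // Real code would read model/tokenizer files here.
        SETUPS.fetch_add(1, Ordering::SeqCst);
        Session
    }

    fn token_count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

thread_local! {
    // One lazily created session per thread; `None` until first use.
    static SESSION: RefCell<Option<Session>> = RefCell::new(None);
}

/// Runs `text` through this thread's session, creating it on the first call.
fn with_session(text: &str) -> usize {
    SESSION.with(|cell| {
        let mut slot = cell.borrow_mut();
        let session = slot.get_or_insert_with(Session::load);
        session.token_count(text)
    })
}

fn main() {
    let workers: Vec<_> = (0..2)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..3 {
                    with_session("a short batch of text");
                }
            })
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }
    // 2 threads x 3 calls, but setup ran once per thread: prints "setups: 2".
    println!("setups: {}", SETUPS.load(Ordering::SeqCst));
}
```

Thread-local storage keeps locks off the hot path; the trade-off is one loaded copy of the model per thread that touches it.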
Quick example
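```rust
use semantic::{semanticize, SemanticConfig};
use std::path::PathBuf;

// #[tokio::main] needs tokio with the `macros` and `rt-multi-thread` features.
#[tokio::main]
async fn main() {
    // Local ONNX model plus its tokenizer; every other field keeps its default.
    let cfg = SemanticConfig {
        model_path: PathBuf::from("path/to/your/model.onnx"),
        tokenizer_path: Some(PathBuf::from("path/to/your/tokenizer.json")),
        ..Default::default()
    };

    // Embed one document. The first call on this thread does the expensive setup.
    let embedding = semanticize("doc-1", "This is a test.", &cfg).await.unwrap();
}
```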
API mode example
```rust
use semantic::{semanticize, SemanticConfig};

#[tokio::main]
async fn main() {
    // Remote inference through the Hugging Face router endpoint.
    let cfg = SemanticConfig {
        mode: "api".into(),
        api_url: Some("https://router.huggingface.co/hf-inference/models/BAAI/bge-small-en-v1.5/pipeline/feature-extraction".into()),
        // Replace with your own Hugging Face access token.
        api_auth_header: Some("Bearer YOUR_HF_TOKEN".into()),
        api_provider: Some("auto".into()),
        ..Default::default()
    };

    let embedding = semanticize("doc-2", "Another test.", &cfg).await.unwrap();
}
```
Env vars to know
- UFP_SEMANTIC_API_URL - Override the API endpoint
- UFP_SEMANTIC_API_TOKEN - Your HF token
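A rough sketch of how those two overrides might be applied on top of a config. `ApiSettings` and `apply_env_overrides` are hypothetical, and turning the token into a `Bearer ...` header is an assumption based on the header format in the API mode example; the crate may apply these variables differently.

```rust
use std::env;

/// Hypothetical subset of the API settings; the real SemanticConfig has more fields.
#[derive(Debug, Default, PartialEq)]
struct ApiSettings {
    api_url: Option<String>,
    api_auth_header: Option<String>,
}

/// Applies env overrides using any lookup function, so tests can fake the environment.
fn apply_env_overrides<F>(mut cfg: ApiSettings, lookup: F) -> ApiSettings
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(url) = lookup("UFP_SEMANTIC_API_URL") {
        cfg.api_url = Some(url);
    }
    if let Some(token) = lookup("UFP_SEMANTIC_API_TOKEN") {
        // Assumption: the token is sent as a standard bearer header.
        cfg.api_auth_header = Some(format!("Bearer {token}"));
    }
    cfg
}

fn main() {
    // Read overrides from the real process environment.
    let cfg = apply_env_overrides(ApiSettings::default(), |key| env::var(key).ok());
    println!("{cfg:?}");
}
```

Passing the lookup in as a closure keeps the function testable without touching the real process environment.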
Full example at examples/api_embed.rs.