# flodl-hf playground
This directory was scaffolded by `fdl add flodl-hf` inside a flodl project. It's a standalone cargo crate that depends on `flodl-hf` and shows the one-liner `AutoModel` API over BERT / RoBERTa / DistilBERT.
## Getting started
```bash
fdl flodl-hf classify
# or, bypassing the parent fdl.yml:
cd flodl-hf && cargo run --release
```
The playground takes an optional Hugging Face repo id as its first argument. If none is given, it defaults to `cardiffnlp/twitter-roberta-base-sentiment-latest`. Any fine-tuned BERT / RoBERTa / DistilBERT classification checkpoint works, for example:
```bash
cargo run --release -- nlptown/bert-base-multilingual-uncased-sentiment
cargo run --release -- lxyuan/distilbert-base-multilingual-cased-sentiments-student
```
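For orientation, here is a minimal sketch of what the playground does with that argument. It uses only the names this README mentions (`AutoModelForSequenceClassification`, `HfTokenizer`, `from_pretrained`); every method signature beyond those names is an assumption, so treat this as a sketch against the real `flodl-hf` API rather than documented usage:

```rust
use flodl_hf::models::auto::AutoModelForSequenceClassification;
use flodl_hf::tokenizer::HfTokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // First CLI argument, falling back to the default checkpoint.
    let repo_id = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "cardiffnlp/twitter-roberta-base-sentiment-latest".to_string());

    // Assumed API shape: download (or reuse the cached) checkpoint from the Hub.
    let model = AutoModelForSequenceClassification::from_pretrained(&repo_id)?;
    let tokenizer = HfTokenizer::from_pretrained(&repo_id)?;

    // Encode a sample sentence and run the classification head.
    let encoding = tokenizer.encode("flodl makes this a one-liner")?;
    let logits = model.forward(&encoding)?;
    println!("{logits:?}");
    Ok(())
}
```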
## Feature flavors
`flodl-hf` ships with three profiles, selected via cargo features in `Cargo.toml`:
| Profile | Cargo features | What you get |
| --- | --- | --- |
| Full | `default` (`hub` + `tokenizer`) | Load any model from the Hub, encode text, predict |
| Vision-only | `default-features = false, features = ["hub"]` | ViT / CLIP towers, no `tokenizers` crate pulled in |
| Offline | `default-features = false` | `safetensors` loader only, for air-gapped pipelines |
Edit the `flodl-hf = "=X.Y.Z"` line in `Cargo.toml` to pick one; the playground uses the default (full) profile.
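For example, the vision-only profile would look like this (keep `=X.Y.Z` as the placeholder for your pinned version; the feature names come straight from the table above):

```toml
[dependencies]
flodl-hf = { version = "=X.Y.Z", default-features = false, features = ["hub"] }
```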
## `.bin`-only repos
Some older checkpoints (e.g. `nateraw/bert-base-uncased-emotion`) ship only `pytorch_model.bin` and no `model.safetensors`. Convert once by hand:
```bash
pip install torch transformers safetensors
python - <<'PY'
from transformers import AutoModelForSequenceClassification
from safetensors.torch import save_file
import os, pathlib

repo_id = "nateraw/bert-base-uncased-emotion"
# Load with the classification head so the classifier weights survive the conversion.
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
# Mirror the layout flodl-hf reads from: $HF_HOME/flodl-converted/<repo_id>/
dest = pathlib.Path(os.environ.get("HF_HOME", pathlib.Path.home() / ".cache/huggingface")) / "flodl-converted" / repo_id
dest.mkdir(parents=True, exist_ok=True)
# safetensors requires contiguous tensors.
state = {k: v.contiguous() for k, v in model.state_dict().items()}
save_file(state, dest / "model.safetensors")
print(f"wrote {dest / 'model.safetensors'}")
PY
```
After conversion, `AutoModel::from_pretrained(repo_id)` picks up the local safetensors transparently via the `$HF_HOME/flodl-converted/<repo_id>/` cache. You only need to convert each checkpoint once.
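Once the converted file is in place, the usual invocation should find it without any extra flags (assuming the playground resolves through the same `$HF_HOME`):

```bash
cargo run --release -- nateraw/bert-base-uncased-emotion
```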
If you prefer the committed script from the flodl repo, grab
[`convert_bin_to_safetensors.py`](https://github.com/fab2s/floDl/blob/main/flodl-hf/scripts/convert_bin_to_safetensors.py) and run it directly:
```bash
pip install torch transformers safetensors
python convert_bin_to_safetensors.py nateraw/bert-base-uncased-emotion
```
## Wiring flodl-hf into your main code
This scaffold is a side project for exploration. When you're ready to call flodl-hf from your actual training code, add the same dep to your main `Cargo.toml`:
```toml
flodl-hf = "=X.Y.Z" # same version as the flodl you already have
```
Example imports:
```rust
use flodl_hf::models::auto::AutoModelForSequenceClassification;
use flodl_hf::models::bert::BertModel;
use flodl_hf::tokenizer::HfTokenizer;
```
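And a hedged sketch of using those imports from your own code, with the same caveat as the playground sketch above (method names beyond `from_pretrained` are assumptions): load once, then reuse the model and tokenizer across your loop:

```rust
use flodl_hf::models::auto::AutoModelForSequenceClassification;
use flodl_hf::tokenizer::HfTokenizer;

// Assumed signatures; mirrors the playground sketch earlier in this README.
fn label_batch(
    model: &AutoModelForSequenceClassification,
    tokenizer: &HfTokenizer,
    texts: &[&str],
) -> Result<(), Box<dyn std::error::Error>> {
    for text in texts {
        let encoding = tokenizer.encode(text)?;
        let logits = model.forward(&encoding)?;
        println!("{text}: {logits:?}");
    }
    Ok(())
}
```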
## Making this part of a cargo workspace (optional)
By default this playground has its own `target/` dir. To share compilation with your main crate, add a `[workspace]` table to your project's root `Cargo.toml`:
```toml
[workspace]
members = [".", "flodl-hf"]
```
## Docs and next steps
- **Architecture reference**: <https://flodl.dev/guide>
- **`AutoModel` module**: `flodl_hf::models::auto`
- **Per-family modules**: `flodl_hf::models::{bert, roberta, distilbert}` for task-head constructors and custom loading
- **Hub + tokenizer**: `flodl_hf::hub` / `flodl_hf::tokenizer`