openai-reassembler 0.1.0

Reassemble OpenAI-compatible SSE streaming responses into non-streaming format

# Reassemble OpenAI
A crate that takes the server-sent events (SSE) of a streamed `completion` or `chat_completion` response and reconstructs the equivalent non-streamed response.
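The general idea can be sketched with the standard library alone. This is a minimal illustration, not this crate's API: the `Delta` and `Message` types and the `sse_payloads` and `merge` functions are hypothetical names, and JSON decoding of each payload is left out. It assumes the usual OpenAI-compatible framing, where each event is a `data:` line and the stream ends with a `data: [DONE]` sentinel.

```rust
/// Hypothetical, simplified view of one streamed chat chunk
/// (an assumption for illustration, not a type from this crate).
#[derive(Debug, Clone, Default)]
struct Delta {
    role: Option<String>,
    content: Option<String>,
    finish_reason: Option<String>,
}

/// Hypothetical reassembled message.
#[derive(Debug, Default, PartialEq)]
struct Message {
    role: String,
    content: String,
    finish_reason: Option<String>,
}

/// Collect the payload of every `data:` line in a raw SSE body,
/// stopping at the `[DONE]` sentinel. Other lines are skipped.
fn sse_payloads(body: &str) -> Vec<&str> {
    let mut payloads = Vec::new();
    for line in body.lines() {
        let Some(rest) = line.strip_prefix("data:") else {
            continue;
        };
        let payload = rest.trim();
        if payload == "[DONE]" {
            break;
        }
        if !payload.is_empty() {
            payloads.push(payload);
        }
    }
    payloads
}

/// Fold the deltas, in arrival order, into a single message:
/// the role is taken from whichever delta carries it, content
/// fragments are concatenated, and the last finish reason wins.
fn merge(deltas: &[Delta]) -> Message {
    let mut message = Message::default();
    for delta in deltas {
        if let Some(role) = &delta.role {
            message.role = role.clone();
        }
        if let Some(text) = &delta.content {
            message.content.push_str(text);
        }
        if delta.finish_reason.is_some() {
            message.finish_reason = delta.finish_reason.clone();
        }
    }
    message
}
```

In this sketch, each string returned by `sse_payloads` would be decoded into a `Delta` before being passed to `merge`. A real reassembler also has to handle multiple choices, tool calls, and usage data, which are omitted here.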

## Testing
`cargo test` runs the tests against existing fixtures, which were generated using vLLM with this configuration:

```
docker run -d --name vllm-cpu --cpus 4 -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai-cpu:latest \
  --model Qwen/Qwen2-0.5B-Instruct \
  --dtype float32 \
  --max-model-len 128 \
  --max-num-seqs 4 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

Each test sends a deterministic request and records both the streamed and the non-streamed response. The reassembly algorithm is then run on the streamed version, and its output is compared with the non-streamed version.
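The comparison step might look roughly like the following. This is only a sketch of the pattern: `Fixture` and `check_fixture` are hypothetical names, the fixture holds plain text fragments rather than the recorded JSON responses, and the reassembler is passed in as a closure so the example stays self-contained.

```rust
/// Hypothetical fixture pair (an assumption for illustration;
/// the real fixtures are recorded responses from the server).
struct Fixture {
    /// Content fragments in the order the stream delivered them.
    streamed_fragments: Vec<&'static str>,
    /// Content of the non-streamed response to the same request.
    non_streamed: &'static str,
}

/// Run `reassemble` over the streamed fragments and compare the
/// result with the non-streamed content, describing any mismatch.
fn check_fixture<F>(fixture: &Fixture, reassemble: F) -> Result<(), String>
where
    F: Fn(&[&str]) -> String,
{
    let actual = reassemble(fixture.streamed_fragments.as_slice());
    if actual == fixture.non_streamed {
        Ok(())
    } else {
        Err(format!(
            "reassembled {:?} but expected {:?}",
            actual, fixture.non_streamed
        ))
    }
}
```

This kind of comparison is only meaningful because the request is deterministic: both responses should then carry the same generated content.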

You can regenerate the fixtures by running `BASE_URL=xxxx MODEL=xxxx cargo test`, replacing each `xxxx` with the base URL of your OpenAI-compatible server and the model name.
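One way a test harness could choose between recorded fixtures and a live server is to check whether both variables are set. The sketch below is an assumption about how such a switch might be written, not a description of this crate's test code; `FixtureMode`, `fixture_mode`, and `fixture_mode_from_env` are hypothetical names.

```rust
use std::env;

/// Hypothetical switch between replaying recorded fixtures and
/// regenerating them against a live server.
#[derive(Debug, PartialEq)]
enum FixtureMode {
    Recorded,
    Regenerate { base_url: String, model: String },
}

/// Regenerate only when both values are present and non-empty;
/// otherwise fall back to the recorded fixtures.
fn fixture_mode(base_url: Option<String>, model: Option<String>) -> FixtureMode {
    match (base_url, model) {
        (Some(base_url), Some(model)) if !base_url.is_empty() && !model.is_empty() => {
            FixtureMode::Regenerate { base_url, model }
        }
        _ => FixtureMode::Recorded,
    }
}

/// Read `BASE_URL` and `MODEL` from the process environment.
fn fixture_mode_from_env() -> FixtureMode {
    fixture_mode(env::var("BASE_URL").ok(), env::var("MODEL").ok())
}
```

Keeping the decision in a pure function (`fixture_mode`) makes it testable without touching the process environment.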