<!--
This file contains the content for GitHub issue #1.
Create with: gh issue create --title "..." --body-file ISSUE_THINKING_MODELS.md
Delete this file after the issue is created.
-->

# Ollama thinking models exhaust token budget before producing output

## The Problem

Models with thinking/reasoning enabled (e.g., `qwen3:4b` with default settings) prepend `<think>...</think>` blocks before their actual JSON response. With the default `num_predict: 256` token budget, the thinking tokens consume most or all of the output budget, leaving insufficient tokens for the commit message JSON.

Under the hood, the model spends all 256 tokens writing its internal reasoning inside `<think>...</think>` tags, and the budget runs out before it ever starts the actual JSON commit message. The result is either an empty response or a truncated one that can't be parsed.

In practice, this is what you'll see:

```txt
 WARN empty response from LLM, skipping candidate=1
commitbee::provider::error

  × Provider 'ollama' error: No valid commit messages generated
```

You can verify this by calling the Ollama API directly with the same 256-token budget:

```bash
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3:4b",
  "prompt": "Respond with ONLY this JSON: {\"type\": \"docs\", \"scope\": null, \"subject\": \"update README badges\", \"body\": null, \"breaking_change\": null}",
  "stream": false,
  "options": { "num_predict": 256 }
}' | python3 -m json.tool
```

The response shows exactly what's happening:

```json
{
    "response": "",
    "thinking": "We are given a git diff for README.md. The summary says: 1 file modified... The change is adding new badges (and removing old ones? but the diff shows old",
    "done": true,
    "done_reason": "length"
}
```

- **`"response": ""`** — completely empty, no JSON produced
- **`"thinking"`** — the entire 256-token budget was spent on internal reasoning
- **`"done_reason": "length"`** — the model hit the token limit mid-thought, before it ever started writing the actual output
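This failure mode is easy to detect programmatically. The sketch below is illustrative only (the `diagnose` helper is hypothetical, not part of commitbee); it classifies a parsed `/api/generate` response using the same fields shown above:

```python
def diagnose(resp: dict) -> str:
    """Classify a parsed Ollama /api/generate response body."""
    # An empty response that stopped on "length" means the output
    # budget ran out; if "thinking" is populated, the budget was
    # consumed by the <think> block before any JSON was produced.
    if resp.get("done_reason") == "length" and not resp.get("response"):
        if resp.get("thinking"):
            return "budget exhausted by thinking tokens"
        return "budget exhausted"
    return "ok"

sample = {
    "response": "",
    "thinking": "We are given a git diff for README.md. ...",
    "done": True,
    "done_reason": "length",
}
print(diagnose(sample))  # prints "budget exhausted by thinking tokens"
```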

## Who Is Affected

Anyone using `qwen3:4b` with thinking mode enabled (the default), which is the model listed in the README's quick start section.

## The Recommended Model

The default model is now **`qwen3.5:4b`**, which does not use thinking mode and works reliably within the default 256-token budget. It's also smaller (3.4GB vs. 4.3GB for `qwen3:4b`), produces clean JSON output, and has a simpler model tag.

To use it:

```bash
ollama pull qwen3.5:4b
```

Then in your config (`commitbee init` to create one):

```toml
model = "qwen3.5:4b"
```

## Fix

The default `num_predict` has been bumped from 256 to 1024, giving thinking models enough room for both the `<think>` block and the JSON response. The sanitizer also strips `<think>` and `<thought>` blocks from LLM output before parsing, so the thinking content doesn't interfere with JSON extraction.
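The stripping step can be illustrated with a short sketch. This is not commitbee's actual sanitizer, just the general approach, assuming the thinking content arrives inline in the output text: remove any `<think>`/`<thought>` blocks, then parse what remains as JSON.

```python
import json
import re

# Match <think>...</think> and <thought>...</thought> blocks,
# including across newlines (re.DOTALL), non-greedily.
THINK_RE = re.compile(r"<think>.*?</think>|<thought>.*?</thought>", re.DOTALL)

def strip_thinking(raw: str) -> str:
    """Remove thinking blocks so only the JSON payload remains."""
    return THINK_RE.sub("", raw).strip()

raw = (
    "<think>badges were added, so type is docs</think>\n"
    '{"type": "docs", "scope": null, "subject": "update README badges"}'
)
msg = json.loads(strip_thinking(raw))
print(msg["type"])  # prints "docs"
```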

If you're on v0.3.0, you can work around the issue by setting `num_predict = 1024` in your config file (`commitbee init` to create one).
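For example, assuming `num_predict` sits at the top level of the config alongside `model` (check the file `commitbee init` generates for the exact placement):

```toml
# v0.3.0 workaround: give thinking models room for both the
# <think> block and the JSON commit message
model = "qwen3:4b"
num_predict = 1024
```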

## Notes on Cloud Providers

I've also tested CommitBee with the **Anthropic API** (Claude Sonnet 4.6), which generally works and produces higher-quality commit messages than local Ollama models. However, there are some provider-specific edge cases I'm still investigating. Every provider and model has its own quirks — I'd encourage users to try different combinations and find what works best for their workflow. Feedback on provider/model experiences is welcome.