Ollama Chat Provider

Ollama local LLM chat provider for the conversational interface.

Ollama chat provider using the OpenAI-compatible /v1 endpoint.

Ollama exposes an OpenAI-compatible API, so this provider uses the openai Python SDK. The main complexity is format conversion: the rest of the codebase uses Anthropic-style tool definitions and message structures, so this module translates between the two formats on every request and response.
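As a rough illustration of the tool-definition side of that translation (the helper name here is hypothetical, not the module's actual API): Anthropic-style tools carry a top-level `name`, `description`, and `input_schema`, while the OpenAI format nests them under a `function` key and calls the schema `parameters`.

```python
def anthropic_tool_to_openai(tool: dict) -> dict:
    """Convert one Anthropic-style tool definition to OpenAI function-calling
    format. Illustrative sketch; the module's real conversion may differ."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # Anthropic names the JSON schema "input_schema";
            # OpenAI names the same object "parameters".
            "parameters": tool["input_schema"],
        },
    }

converted = anthropic_tool_to_openai({
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
    },
})
```

An analogous mapping runs in the other direction on responses, turning OpenAI `tool_calls` back into Anthropic-style `tool_use` content blocks.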

Useful for local or self-hosted LLM inference where Anthropic API access is unavailable or cost-prohibitive.

Classes

OllamaChatProvider

OllamaChatProvider(model: str = 'qwen2.5:32b-instruct-q3_K_M', base_url: str = 'http://localhost:11434/v1', keep_alive: str = '1h')

Bases: ChatProvider

Chat provider using Ollama's OpenAI-compatible endpoint.

Source code in src/chat_providers/ollama.py
def __init__(
    self,
    model: str = "qwen2.5:32b-instruct-q3_K_M",
    base_url: str = "http://localhost:11434/v1",
    keep_alive: str = "1h",
):
    from openai import AsyncOpenAI

    self._client = AsyncOpenAI(base_url=base_url, api_key="ollama")
    self._model = model
    self._keep_alive = keep_alive
    # Derive Ollama API root by stripping /v1 suffix
    self._ollama_api_root = base_url.rstrip("/").removesuffix("/v1")
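The API-root derivation in the last line of the constructor can be checked in isolation. This standalone sketch (the function name is illustrative) mirrors that logic: drop any trailing slash, then strip a trailing `/v1`, yielding the native Ollama API root used by endpoints such as `/api/ps`.

```python
def derive_api_root(base_url: str) -> str:
    # Mirrors the constructor: trailing slash first, then the "/v1" suffix.
    return base_url.rstrip("/").removesuffix("/v1")

print(derive_api_root("http://localhost:11434/v1"))   # http://localhost:11434
print(derive_api_root("http://localhost:11434/v1/"))  # http://localhost:11434
```

Note that `str.removesuffix` requires Python 3.9+.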

Functions

is_model_loaded async
is_model_loaded() -> bool

Check if the configured model is currently loaded in Ollama.

Hits /api/ps to list running models. Returns False if the model is not in the list (cold start expected). Fail-open: returns True on any error so callers never block on a failed probe.

Source code in src/chat_providers/ollama.py
async def is_model_loaded(self) -> bool:
    """Check if the configured model is currently loaded in Ollama.

    Hits ``/api/ps`` to list running models.  Returns ``False`` if the
    model is not in the list (cold start expected).  Fail-open: returns
    ``True`` on any error so callers never block on a failed probe.
    """
    def _probe() -> bool:
        url = f"{self._ollama_api_root}/api/ps"
        req = urllib.request.Request(url, method="GET")
        req.add_header("Accept", "application/json")
        with urllib.request.urlopen(req, timeout=5) as resp:
            data = json.loads(resp.read())
        # Model name in /api/ps may include tag — compare base names
        model_base = self._model.split(":")[0]
        for entry in data.get("models", []):
            entry_name = entry.get("name", "").split(":")[0]
            if entry_name == model_base:
                return True
        return False

    try:
        return await asyncio.to_thread(_probe)
    except Exception:
        return True  # fail-open
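The base-name comparison inside `_probe` matters because `/api/ps` reports model names with their tag attached. A minimal standalone sketch of that comparison (the helper name is illustrative):

```python
def same_base_model(configured: str, running: str) -> bool:
    # "/api/ps" may report tagged names like "qwen2.5:32b-instruct-q3_K_M",
    # so compare only the portion before the first colon.
    return configured.split(":")[0] == running.split(":")[0]

same_base_model("qwen2.5:32b-instruct-q3_K_M", "qwen2.5:latest")  # True
same_base_model("qwen2.5:32b-instruct-q3_K_M", "llama3:8b")       # False
```

Comparing base names means any loaded tag of the configured model counts as "loaded", which is the desired behavior for a warm-start probe: a quantization or tag mismatch should not force callers down the cold-start path.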

Functions