LiteLLM passes extra parameters as top-level JSON fields in the request
body. _extract_agent_name() now reads agent_id and agent_name from the
body first, then falls back to X-Agent-Name / X-Agent-Id headers.
Critically, both fields are stripped from the body before any upstream
call — otherwise Claude/LM Studio reject the unknown parameters.
Applied to all four route handlers: /v1/chat/completions, /v1/messages,
/api/chat, /api/generate.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When X-Agent-Name or X-Agent-Id is present and matches an agent_models
entry, Festinger routes the main inference request to the configured
provider — not just the memory-writing utility model.
Protocol translation:
- Incoming OpenAI → outgoing Claude: system-message extraction,
max_tokens defaulting, response translated back to OpenAI format
- Incoming OpenAI → outgoing LM Studio/OpenAI: model + base_url swap
- All responses returned as OpenAI-compatible JSON or SSE
Also adds streaming synthesis for /v1/chat/completions (OpenAI SSE)
and X-Agent-Id fallback in _agent_name_from_headers so numeric
AGENT_ID env vars work without needing AGENT_NAME.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Festinger now reads X-Agent-Name from every intercepted request and
resolves the utility LLM model in priority order:
1. agent_models table — agent-specific (e.g. gunnar → claude, rind → qwen)
2. write_model_id config — global default
3. Request mirror — same provider/model Agent0 is currently using
New API: GET/PUT/DELETE /agent-models
New admin UI: "Agent models" section with assignment form and table.
Agent0 side: add a custom header X-Agent-Name: <name> in the LLM
provider config per agent container (AGENT_NAME env var can drive this).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Festinger now extracts provider/model/api-key from every intercepted
request and passes it to the context-discover queue as a fallback_model.
_process_context_discover uses it when write_model_id is not configured,
so Agent0's current model (LM Studio, Ollama, Anthropic) is automatically
reused for utility LLM calls without any extra setup.
Priority: write_model_id (explicit override) > fallback_model (request mirror)
Also updates upstream_openai default in config.yaml to LM Studio's
local address (host.docker.internal:1234).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>