- All agents now use lm_studio provider → http://festinger:11434
- ctx_length set to 32768 for Omega13 (128GB RAM); reduce for smaller machines
- Model: qwen2.5-7b-instruct (update to larger model on Omega13)
- Each agent has a unique A0_PERSISTENT_RUNTIME_ID for stable mcp_server_token
- agent_profile=agent0 and mcp_server_enabled=true set in all settings.json
- agents/agent0/prompts/ placeholder created for pull-on-start persona override
- pull-agent-identity.py now writes to usr/agents/agent0/prompts/ (correct override path)
- festinger: agent_frameworks table auto-seeded on startup with all 5 agents
- festinger: num_ctx injection, agent_frameworks CRUD + admin UI, /chat endpoint
- festinger: removed debug system_prompt logging
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
LM Studio and Ollama run one model on one GPU — concurrent requests
cause crashes. Two fixes:
1. Per-upstream semaphore (concurrency=1) in _route_agent_chat for
lm-studio/ollama providers. All agent-routed calls to the same
base URL queue instead of hitting the GPU simultaneously.
2. skip_discovery=True when routing to a local model. Context discovery
would fire a second LM Studio call alongside the main inference.
Novel words are still registered in SOAS (low saliency) but the
LLM confirmation step waits. Configure write_model_id or a separate
agent model pointing at a cloud/remote model to re-enable live
context discovery.
3. _LLM_CONCURRENCY 2 → 1 in write_queue for the same reason.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two fixes:
1. Add /chat/completions alias (no /v1 prefix) — LiteLLM custom_openai
and openai_like providers post here directly.
2. Passthrough now redirects any path ending in chat/completions to the
proper /v1/chat/completions handler instead of forwarding blindly.
This catches v1/messages/chat/completions and other wrong paths that
result from misconfigured api_base in Agent0.
Both routes get full agent routing, recollection injection, and loop
detection — they're not raw passthroughs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
call_openai() (httpx-based) appends /v1/chat/completions to the upstream
URL. But base_url in the models table typically ends in /v1 (matching the
OpenAI SDK convention used by the resolution job). Combining them produced
/v1/v1/chat/completions → 404 from LM Studio.
Strip a trailing /v1 from the stored base_url before passing it to
call_openai() in the agent routing path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
LiteLLM passes extra parameters as top-level JSON fields in the request
body. _extract_agent_name() now reads agent_id and agent_name from the
body first, then falls back to X-Agent-Name / X-Agent-Id headers.
Critically, both fields are stripped from the body before any upstream
call — otherwise Claude/LM Studio reject the unknown parameters.
Applied to all four route handlers: /v1/chat/completions, /v1/messages,
/api/chat, /api/generate.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When X-Agent-Name or X-Agent-Id is present and matches an agent_models
entry, Festinger routes the main inference request to the configured
provider — not just the memory-writing utility model.
Protocol translation:
- Incoming OpenAI → outgoing Claude: system-message extraction,
max_tokens defaulting, response translated back to OpenAI format
- Incoming OpenAI → outgoing LM Studio/OpenAI: model + base_url swap
- All responses returned as OpenAI-compatible JSON or SSE
Also adds streaming synthesis for /v1/chat/completions (OpenAI SSE)
and X-Agent-Id fallback in _agent_name_from_headers so numeric
AGENT_ID env vars work without needing AGENT_NAME.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>