Two-Level Concurrency Model

Per-repo issue queues + per-LLM-backend connection pools

Level 1 — Per-repo issue queues
# repos.yaml
repos:
  - repo: owner/repo-A
    parallel_issues: 2    # process 2 issues at once
  - repo: owner/repo-B
    parallel_issues: 1    # sequential per repo
  - repo: owner/repo-C
    parallel_issues: false # also sequential
Repo A processes two issues at a time without blocking Repo B's queue, because each repo has its own independent worker pool.
Repo A
Queue: [#3, #5, #8]
Running: #3, #5
Repo B
Queue: [#12]
Running: #12
Repo C
Queue: []
Running: —
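One possible shape for the per-repo worker pools, sketched with the standard library's ThreadPoolExecutor. The helper names (normalize_parallelism, build_repo_pools) are hypothetical and not taken from watcher.py. Treating a boolean or missing parallel_issues value as "one worker" is an assumption based on the "also sequential" comment in the config above.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize_parallelism(value) -> int:
    """Map a parallel_issues setting to a worker count (minimum 1)."""
    # Check bool before int: in Python, bool is a subclass of int.
    if value is None or isinstance(value, bool):
        return 1  # assumption: false, true, or missing all mean sequential
    return max(1, int(value))


def build_repo_pools(repos: list[dict]) -> dict[str, ThreadPoolExecutor]:
    """Create one independent executor per repo entry."""
    return {
        entry["repo"]: ThreadPoolExecutor(
            max_workers=normalize_parallelism(entry.get("parallel_issues")),
            thread_name_prefix=entry["repo"].replace("/", "-"),
        )
        for entry in repos
    }


# Mirrors the repos.yaml example, already parsed into Python objects.
REPOS = [
    {"repo": "owner/repo-A", "parallel_issues": 2},
    {"repo": "owner/repo-B", "parallel_issues": 1},
    {"repo": "owner/repo-C", "parallel_issues": False},
]
```

Submitting work then goes through the pool that owns the repo, for example pools["owner/repo-A"].submit(process_issue, 3), where process_issue stands in for whatever runs the pipeline on one issue. A backlog in one repo's pool never delays another repo's pool.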
Level 2 — Per-LLM backend pools
# config.yaml
llm:
  pools:
    ollama:     1   # only 1 call at a time
    openai:     10  # plenty of capacity
    anthropic:  5
    opencode:   3
Each agent knows its backend. Before making an LLM call, it acquires a slot from that backend's pool semaphore, and it releases the slot once the response returns.
🟡 Ollama pool (slots: 1/1 used)
EngineerAgent #3 ▶ calling... [other Ollama agents BLOCKED until it finishes]
🟢 OpenAI pool (slots: 3/10 used)
PMAgent #5 ▶ calling...
ArchAgent #8 ▶ calling...
ReviewAgent #12 ▶ calling...
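A minimal sketch of the acquire/release step, assuming one threading.BoundedSemaphore per backend, sized from llm.pools. The LLMPools class, its slot() method, and call_llm are hypothetical names. The real base_agent.py may structure this differently.

```python
import threading
from contextlib import contextmanager

# Mirrors llm.pools in config.yaml.
POOL_SIZES = {"ollama": 1, "openai": 10, "anthropic": 5, "opencode": 3}


class LLMPools:
    """One bounded semaphore per backend, shared by every agent thread."""

    def __init__(self, sizes: dict[str, int]):
        self._semaphores = {
            backend: threading.BoundedSemaphore(size)
            for backend, size in sizes.items()
        }

    @contextmanager
    def slot(self, backend: str):
        """Block until the backend has a free slot; release it on exit."""
        semaphore = self._semaphores[backend]
        semaphore.acquire()
        try:
            yield
        finally:
            # Runs even if the LLM call raises, so a slot is never leaked.
            semaphore.release()


POOLS = LLMPools(POOL_SIZES)


def call_llm(backend: str, prompt: str) -> str:
    """Hypothetical agent-side wrapper around a backend call."""
    with POOLS.slot(backend):
        # Placeholder for the real HTTP or SDK request.
        return f"[{backend}] response to: {prompt}"
```

Wrapping the call in a context manager ties the release to the end of the call, including the failure path. That matters most for a pool of size 1, where one leaked slot would stall every later Ollama call.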
Mixed-backend scenario

You could configure your feature pipeline to use OpenAI for PM/Architect (fast) and Ollama for Engineer (free, local). The Ollama pool ensures only one local call happens at a time even if 5 pipelines are running simultaneously.

agents:
  product_manager: { backend: openai, model: gpt-4.1 }
  architect:        { backend: openai, model: gpt-4.1 }
  engineer:         { backend: ollama, model: qwen2.5-coder }  # joins Ollama pool
  code_reviewer:    { backend: openai, model: gpt-4.1-mini }
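To check the claim that the Ollama pool serializes local calls even with five pipelines in flight, the scenario can be simulated without any real backend. This is an illustrative sketch only: simulate and its helpers are hypothetical, the LLM call is replaced by a short sleep, and only the two backends used in the agent mapping above are modeled.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Role-to-backend mapping from the agents config above.
AGENT_BACKENDS = {
    "product_manager": "openai",
    "architect": "openai",
    "engineer": "ollama",
    "code_reviewer": "openai",
}
POOL_SIZES = {"ollama": 1, "openai": 10}


def simulate(pipelines: int = 5, call_seconds: float = 0.01) -> dict[str, int]:
    """Run fake pipelines concurrently; return peak concurrent calls per backend."""
    semaphores = {b: threading.BoundedSemaphore(n) for b, n in POOL_SIZES.items()}
    lock = threading.Lock()
    active = {b: 0 for b in POOL_SIZES}
    peak = {b: 0 for b in POOL_SIZES}

    def fake_call(backend: str) -> None:
        with semaphores[backend]:  # acquire a slot; released on exit
            with lock:
                active[backend] += 1
                peak[backend] = max(peak[backend], active[backend])
            time.sleep(call_seconds)  # stands in for the LLM round trip
            with lock:
                active[backend] -= 1

    def run_pipeline(_issue: int) -> None:
        # Roles run in config order; dicts preserve insertion order.
        for backend in AGENT_BACKENDS.values():
            fake_call(backend)

    with ThreadPoolExecutor(max_workers=pipelines) as executor:
        list(executor.map(run_pipeline, range(pipelines)))
    return peak
```

Calling simulate() returns the observed peaks. The "ollama" entry can never exceed 1 because the semaphore caps it, while the "openai" entry depends on thread timing and is bounded by the number of pipelines.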
What changes
repos.yaml gains: parallel_issues: N per repo entry
config.yaml gains: llm.pools.<backend>: N
watcher.py is refactored to use per-repo thread pools (not one global loop)
base_agent.py gains: acquire/release on the shared per-backend LLM pool semaphore