# Ask LLM — Full Reference for AI Agents

> MCP servers for AI-to-AI collaboration — bridge your AI client with Gemini, Codex, and Ollama.

This file contains everything an AI agent needs to integrate with Ask LLM MCP servers: tool schemas, configuration examples, provider details, and plugin setup.

## Architecture

Ask LLM is a monorepo containing four published npm packages, one internal shared library, and one Claude Code plugin:

```
packages/shared/         @ask-llm/shared     Internal shared code (registry, logger, progress tracker)
packages/gemini-mcp/     ask-gemini-mcp      MCP server for Google Gemini CLI
packages/codex-mcp/      ask-codex-mcp       MCP server for OpenAI Codex CLI
packages/ollama-mcp/     ask-ollama-mcp      MCP server for local Ollama LLMs
packages/llm-mcp/        ask-llm-mcp         Unified MCP server (auto-detects providers)
packages/claude-plugin/  @ask-llm/plugin     Claude Code plugin (agents, skills, hooks)
```

## Prerequisites

- Node.js >= 20.0.0
- For Gemini: `npm install -g @google/gemini-cli && gemini login`
- For Codex: Codex CLI installed and authenticated
- For Ollama: Ollama installed (download from https://ollama.com) and running locally, with at least one model pulled

## Installation

### Claude Code (user scope — available across all projects)

```bash
claude mcp add --scope user gemini -- npx -y ask-gemini-mcp
claude mcp add --scope user codex -- npx -y ask-codex-mcp
claude mcp add --scope user ollama -- npx -y ask-ollama-mcp
```

### Claude Code (local scope, the default: current project only)

```bash
claude mcp add gemini -- npx -y ask-gemini-mcp
claude mcp add codex -- npx -y ask-codex-mcp
claude mcp add ollama -- npx -y ask-ollama-mcp
```

### Claude Desktop (claude_desktop_config.json)

```json
{
  "mcpServers": {
    "gemini": {
      "command": "npx",
      "args": ["-y", "ask-gemini-mcp"]
    },
    "codex": {
      "command": "npx",
      "args": ["-y", "ask-codex-mcp"]
    },
    "ollama": {
      "command": "npx",
      "args": ["-y", "ask-ollama-mcp"]
    }
  }
}
```

### Cursor (.cursor/mcp.json)

```json
{
  "mcpServers": {
    "gemini": {
      "command": "npx",
      "args": ["-y", "ask-gemini-mcp"]
    },
    "codex": {
      "command": "npx",
      "args": ["-y", "ask-codex-mcp"]
    },
    "ollama": {
      "command": "npx",
      "args": ["-y", "ask-ollama-mcp"]
    }
  }
}
```

### Codex CLI (~/.codex/config.toml)

```toml
[mcp_servers.gemini]
command = "npx"
args = ["-y", "ask-gemini-mcp"]
```

### Any MCP Client (STDIO transport)

```json
{
  "transport": {
    "type": "stdio",
    "command": "npx",
    "args": ["-y", "ask-gemini-mcp"]
  }
}
```

### Unified Server (all providers in one)

```bash
claude mcp add ask-llm -- npx -y ask-llm-mcp
```

The unified server auto-detects installed providers at startup and registers only available tools.

## Tool Reference

### ask-gemini

Send prompts to Google Gemini CLI. Supports @ file syntax for including files in context.

- **Package:** ask-gemini-mcp
- **Parameters:**
  - `prompt` (string, required): The question, code review request, or analysis task. Use @ syntax to include files (e.g., "@src/main.ts explain this code").
  - `model` (string, optional): Do not set unless user explicitly requests it. Default: gemini-3.1-pro-preview. Falls back to gemini-3-flash-preview on quota errors.
- **Returns:** Gemini's text response with optional stats footer (model, tokens, thinking tokens, session ID).
- **Annotations:** readOnlyHint=false, destructiveHint=false, openWorldHint=true
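
For agents that talk to the server without a client SDK, the parameter list above maps onto a standard MCP `tools/call` request. The following is a minimal sketch of that request body; the helper name `buildAskGeminiCall` and the `ToolCallRequest` type are illustrative and not part of the package. In most clients the SDK builds this envelope for you.

```typescript
// JSON-RPC 2.0 envelope for an MCP `tools/call` request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a call to the ask-gemini tool (hypothetical helper).
// `model` is left out unless the user explicitly asked for one,
// so the server applies its own default and fallback.
function buildAskGeminiCall(id: number, prompt: string, model?: string): ToolCallRequest {
  const args: Record<string, unknown> = { prompt };
  if (model !== undefined) {
    args.model = model;
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "ask-gemini", arguments: args },
  };
}

const request = buildAskGeminiCall(1, "@src/main.ts explain this code");
console.log(JSON.stringify(request, null, 2));
```

The same envelope applies to the other tools in this reference; only `params.name` and `params.arguments` change.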

### ask-gemini-edit

Send a code edit request to Gemini and get structured OLD/NEW edit blocks. Gemini analyzes files and returns precise, applicable code changes.

- **Package:** ask-gemini-mcp
- **Parameters:**
  - `prompt` (string, required): Describe the code changes you want. Reference files with @ syntax.
  - `model` (string, optional): Default: gemini-3.1-pro-preview.
  - `includeDirs` (string[], optional): Additional directories to include in Gemini's context. Useful for monorepos.
- **Returns:** Structured edit format with OLD/NEW blocks, or chunked response with cache key for large edits.
- **Annotations:** readOnlyHint=false, destructiveHint=false, openWorldHint=true
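
Once an OLD/NEW pair has been extracted from the response, applying it is a guarded string replacement. The sketch below assumes the pair is already available as two strings; how the blocks are delimited in the raw text is not specified in this reference, so no parser is shown. The `EditBlock` type and `applyEdit` function are hypothetical names.

```typescript
// One OLD/NEW pair, already extracted from the tool's response.
// ASSUMPTION: extraction happened elsewhere; the raw format is not shown here.
interface EditBlock {
  oldText: string;
  newText: string;
}

// Apply a single edit, refusing to guess when OLD is missing or ambiguous.
function applyEdit(source: string, edit: EditBlock): string {
  const first = source.indexOf(edit.oldText);
  if (first === -1) {
    throw new Error("OLD block not found in source");
  }
  const second = source.indexOf(edit.oldText, first + 1);
  if (second !== -1) {
    throw new Error("OLD block matches more than once; add more context");
  }
  return source.slice(0, first) + edit.newText + source.slice(first + edit.oldText.length);
}

const before = "const port = 3000;\nconst host = 'localhost';\n";
const after = applyEdit(before, {
  oldText: "const port = 3000;",
  newText: "const port = 8080;",
});
console.log(after);
```

Failing on a missing or repeated OLD block is a deliberate choice in this sketch: a silent first-match replacement can modify the wrong location.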

### fetch-chunk

Retrieve subsequent chunks from cached large responses (used after ask-gemini-edit returns chunked output).

- **Package:** ask-gemini-mcp
- **Parameters:**
  - `chunkIndex` (number, required): 1-based index of the chunk to retrieve.
  - `chunkCacheKey` (string, required): Cache key returned by the original chunked response.
- **Returns:** The requested chunk content.
- **Annotations:** readOnlyHint=true, idempotentHint=true, openWorldHint=false
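
A chunked response is typically drained with one `fetch-chunk` call per remaining chunk. The sketch below only builds the argument objects for those calls. It assumes that chunk 1 arrived with the original response and that the caller already knows the total chunk count; `totalChunks` and the helper name are hypothetical, since this reference does not say how the count is reported.

```typescript
// Arguments accepted by the fetch-chunk tool, per the parameter list above.
interface FetchChunkArgs {
  chunkIndex: number; // 1-based
  chunkCacheKey: string;
}

// Build the calls needed to retrieve the rest of a chunked response.
// ASSUMPTION: the first chunk came with the original response, and
// `totalChunks` was read from it by the caller (hypothetical input).
function remainingChunkCalls(chunkCacheKey: string, totalChunks: number): FetchChunkArgs[] {
  if (!Number.isInteger(totalChunks) || totalChunks < 1) {
    throw new RangeError("totalChunks must be a positive integer");
  }
  const calls: FetchChunkArgs[] = [];
  for (let chunkIndex = 2; chunkIndex <= totalChunks; chunkIndex++) {
    calls.push({ chunkIndex, chunkCacheKey });
  }
  return calls;
}

console.log(remainingChunkCalls("example-key", 3));
```

Because `fetch-chunk` is annotated as read-only and idempotent, re-requesting a chunk after a transient failure should be safe.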

### ask-codex

Send prompts to OpenAI Codex CLI.

- **Package:** ask-codex-mcp
- **Parameters:**
  - `prompt` (string, required): The question, code review request, or analysis task.
  - `model` (string, optional): Do not set unless user explicitly requests it. Default: gpt-5.4. Falls back to gpt-5.4-mini on quota errors.
- **Returns:** Codex's text response.
- **Annotations:** readOnlyHint=false, destructiveHint=false, openWorldHint=true

### ask-ollama

Send prompts to a local Ollama LLM via HTTP. No API keys are needed, and with the default `OLLAMA_HOST` no requests leave the machine.

- **Package:** ask-ollama-mcp
- **Parameters:**
  - `prompt` (string, required): The question, code review request, or analysis task.
  - `model` (string, optional): Do not set unless user explicitly requests it. Default: qwen2.5-coder:7b. Falls back to qwen2.5-coder:1.5b if not found.
- **Returns:** Ollama's text response.
- **Annotations:** readOnlyHint=false, destructiveHint=false, openWorldHint=false
- **Environment:** Set OLLAMA_HOST to customize the Ollama server address (default: http://localhost:11434).
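
An agent can check which address will be used before calling the tool. This is a minimal sketch of resolving `OLLAMA_HOST` with the documented default and deriving the URL of Ollama's model-listing endpoint (`GET /api/tags`). The function names are illustrative, and the trimming and trailing-slash handling are assumptions rather than the server's documented behavior.

```typescript
const DEFAULT_OLLAMA_HOST = "http://localhost:11434";

// Resolve the Ollama base URL from an environment-like object.
// ASSUMPTION: blank values fall back to the default; trailing slashes are dropped.
function resolveOllamaHost(env: Record<string, string | undefined>): string {
  const raw = env.OLLAMA_HOST?.trim();
  const host = raw ? raw : DEFAULT_OLLAMA_HOST;
  return host.replace(/\/+$/, "");
}

// URL of Ollama's model-listing endpoint (GET /api/tags).
function modelsUrl(env: Record<string, string | undefined>): string {
  return `${resolveOllamaHost(env)}/api/tags`;
}

console.log(modelsUrl({})); // http://localhost:11434/api/tags
console.log(modelsUrl({ OLLAMA_HOST: "http://192.168.1.20:11434/" })); // http://192.168.1.20:11434/api/tags
```

In a Node.js process, `process.env` can be passed as the `env` argument.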

### ping

Test MCP server connectivity. Available in all packages.

- **Parameters:**
  - `message` (string, optional): A message to echo back.
- **Returns:** The echoed message or a default pong response. Ollama's ping lists locally available models.
- **Annotations:** readOnlyHint=true, idempotentHint=true, openWorldHint=false

## Models

### Gemini
| Model | Use Case |
|-------|----------|
| gemini-3.1-pro-preview | Default — best quality reasoning, 1M+ token context |
| gemini-3-flash-preview | Automatic fallback on quota errors; faster, suited to large codebases |

### Codex
| Model | Use Case |
|-------|----------|
| gpt-5.4 | Default — highest capability |
| gpt-5.4-mini | Automatic fallback on quota errors |

### Ollama
| Model | Use Case |
|-------|----------|
| qwen2.5-coder:7b | Default — good balance of speed and capability |
| qwen2.5-coder:1.5b | Automatic fallback if 7b not available |

## Usage Patterns

### Code review (second opinion)
```
Ask Gemini to review @src/auth.ts for security issues
```

### Codebase analysis
```
Ask Gemini to summarize @. (the current directory)
```

### Architecture debate
```
Ask Codex: should we use a message queue or direct HTTP calls for this service?
```

### Local private review
```
Ask Ollama to review the changes in @src/payments.ts
```

### Multi-provider review (Claude Code plugin)
```
/multi-review
```
Launches Gemini and Codex reviews in parallel with consensus highlighting.

## Claude Code Plugin

### Installation

```
/plugin marketplace add Lykhoyda/ask-llm
/plugin install ask-llm@ask-llm-plugins
```

### Skills (slash commands)

| Skill | Description |
|-------|-------------|
| /multi-review | Parallel Gemini + Codex code review with validation pipeline and consensus highlighting |
| /gemini-review | Gemini-only code review with confidence filtering |
| /codex-review | Codex-only code review with confidence filtering |
| /ollama-review | Local Ollama code review — no data leaves machine |
| /brainstorm [providers] topic | Multi-LLM brainstorm (default: gemini,codex) |
| /brainstorm-all topic | Brainstorm with all three providers |

### Agents

| Agent | Color | Description |
|-------|-------|-------------|
| gemini-reviewer | cyan | 4-phase review: context, prompt, synthesis, validation |
| codex-reviewer | green | 4-phase review: context, prompt, synthesis, validation |
| ollama-reviewer | yellow | 4-phase review: context, prompt, synthesis, validation (local) |
| brainstorm-coordinator | magenta | Parallel multi-LLM consultation with synthesis |

### Hooks

| Hook | Trigger | Action |
|------|---------|--------|
| Stop | Session end | Sends worktree diff to Gemini for 3-bullet advisory review |
| PreToolUse (Bash) | Before git commit | Reviews staged changes, warns about critical issues |

## Error Handling

The MCP servers handle common failures as follows:
- **Quota errors (Gemini, Codex):** Automatic fallback to the lighter model (gemini-3.1-pro-preview→gemini-3-flash-preview, gpt-5.4→gpt-5.4-mini)
- **Model not found (Ollama):** Automatic fallback from qwen2.5-coder:7b to qwen2.5-coder:1.5b
- **CLI not found:** Clear error message with installation instructions
- **Timeout:** 5-minute default, configurable via the GMCPT_TIMEOUT_MS environment variable
- **Large responses:** Automatic chunking with fetch-chunk retrieval (Gemini only)
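
The fallback behavior can be pictured as a single retry with the second model in each provider's pair. The sketch below is an illustration of that pattern, not the servers' actual implementation: `runWithFallback` and the `isRetryable` predicate are hypothetical, and real detection of quota or missing-model errors is provider-specific. It is synchronous to keep the example short; real calls would be asynchronous.

```typescript
type Provider = "gemini" | "codex" | "ollama";

// Default and fallback models as listed in the Models section.
const MODELS: Record<Provider, { primary: string; fallback: string }> = {
  gemini: { primary: "gemini-3.1-pro-preview", fallback: "gemini-3-flash-preview" },
  codex: { primary: "gpt-5.4", fallback: "gpt-5.4-mini" },
  ollama: { primary: "qwen2.5-coder:7b", fallback: "qwen2.5-coder:1.5b" },
};

// Try the primary model, then the fallback once if the failure is retryable.
// ASSUMPTION: `isRetryable` stands in for provider-specific error detection.
function runWithFallback<T>(
  provider: Provider,
  attempt: (model: string) => T,
  isRetryable: (err: unknown) => boolean,
): T {
  const { primary, fallback } = MODELS[provider];
  try {
    return attempt(primary);
  } catch (err) {
    if (!isRetryable(err)) throw err;
    return attempt(fallback);
  }
}

// Usage with a stubbed call that fails on the primary model.
const answer = runWithFallback(
  "gemini",
  (model) => {
    if (model === "gemini-3.1-pro-preview") throw new Error("quota exceeded");
    return `answered by ${model}`;
  },
  (err) => err instanceof Error && err.message.includes("quota"),
);
console.log(answer); // answered by gemini-3-flash-preview
```

Since the servers already perform this fallback, agents normally do not need to retry with a different model themselves; this is also why `model` should stay unset unless the user requests one.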

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| GMCPT_TIMEOUT_MS | 300000 (5 min) | Process timeout for CLI commands |
| GMCPT_LOG_LEVEL | warn | Log level: debug, info, warn, error |
| OLLAMA_HOST | http://localhost:11434 | Ollama server address |
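
These variables are set on the server process, which for JSON-configured clients usually means an `env` object on the server entry. The following sketch generates such entries in the same shape as the JSON configs in the Installation section; `serverEntry` is a hypothetical helper, and support for the `env` key is an assumption to confirm in your client's documentation (Claude Desktop accepts it).

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Build one `mcpServers` entry, optionally with environment overrides.
// ASSUMPTION: the target client reads an `env` object per server.
function serverEntry(pkg: string, env: Record<string, string> = {}): McpServerEntry {
  const entry: McpServerEntry = { command: "npx", args: ["-y", pkg] };
  if (Object.keys(env).length > 0) {
    entry.env = { ...env };
  }
  return entry;
}

const config = {
  mcpServers: {
    // 10-minute timeout and verbose logging for the Gemini server.
    gemini: serverEntry("ask-gemini-mcp", {
      GMCPT_TIMEOUT_MS: "600000",
      GMCPT_LOG_LEVEL: "debug",
    }),
    // Explicit Ollama address (same as the default, shown for illustration).
    ollama: serverEntry("ask-ollama-mcp", {
      OLLAMA_HOST: "http://localhost:11434",
    }),
  },
};
console.log(JSON.stringify(config, null, 2));
```

Environment values are strings, so numeric settings such as the timeout are written as `"600000"` rather than `600000`.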

## Links

- Source: https://github.com/Lykhoyda/ask-llm
- Docs: https://lykhoyda.github.io/ask-llm/
- npm (gemini): https://www.npmjs.com/package/ask-gemini-mcp
- npm (codex): https://www.npmjs.com/package/ask-codex-mcp
- npm (ollama): https://www.npmjs.com/package/ask-ollama-mcp
- npm (unified): https://www.npmjs.com/package/ask-llm-mcp
