# Dive

Dive is a Go library for building AI agents and LLM-powered applications. Use it
to build CLIs, add AI to back-end services, or run agents within workflow
orchestrators. It is licensed under Apache 2.0 and requires Go 1.25 or later.

Dive gives you consistent access to 8+ LLM providers, a tool-calling system
aligned with Claude Code patterns, and an agent loop with hooks. Images,
documents, and structured output work across all providers. The library is
unopinionated. It has no hidden prompts or library-imposed behaviors. You
provide the system prompt and decide which tools and hooks to install.

A single `CreateResponse` call runs the full agent loop. It generates, calls
tools, feeds results back to the LLM, and repeats until the model produces a
final response or hits the iteration limit. Agents are stateless by default.
Set `AgentOptions.Session` to enable automatic history loading and saving, or
accumulate messages manually across calls.
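
The loop described above can be sketched with the standard library alone. This is an illustration of the generate, call tools, feed back, repeat pattern and not Dive's implementation: `toolCall`, `reply`, `model`, `runLoop`, and `fakeModel` are hypothetical names, and returning an error at the iteration limit is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for illustration; these are not Dive types.
type toolCall struct{ Name, Input string }

type reply struct {
	Text  string     // final text when no tools are requested
	Calls []toolCall // tool calls requested by the model
}

// model maps the conversation so far to the next reply.
type model func(history []string) reply

// runLoop generates, runs any requested tools, appends their results to the
// history, and repeats until the model replies without tool calls or the
// iteration limit is reached.
func runLoop(m model, tools map[string]func(string) string, input string, maxIter int) (string, error) {
	history := []string{"user: " + input}
	for i := 0; i < maxIter; i++ {
		r := m(history)
		if len(r.Calls) == 0 {
			return r.Text, nil // final response
		}
		for _, c := range r.Calls {
			fn, ok := tools[c.Name]
			if !ok {
				history = append(history, "tool_error: unknown tool "+c.Name)
				continue
			}
			history = append(history, "tool_result: "+fn(c.Input))
		}
	}
	return "", fmt.Errorf("no final response after %d iterations", maxIter)
}

// fakeModel requests one tool call, then answers from the tool result.
func fakeModel(history []string) reply {
	last := history[len(history)-1]
	if strings.HasPrefix(last, "tool_result: ") {
		return reply{Text: "The answer is " + strings.TrimPrefix(last, "tool_result: ")}
	}
	return reply{Calls: []toolCall{{Name: "add", Input: "2+2"}}}
}

func main() {
	tools := map[string]func(string) string{
		"add": func(string) string { return "4" },
	}
	out, err := runLoop(fakeModel, tools, "What is 2+2?", 5)
	fmt.Println(out, err) // The answer is 4 <nil>
}
```

The iteration limit is what keeps a model that never stops requesting tools from looping forever.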

Module: `github.com/deepnoodle-ai/dive`

Install: `go get github.com/deepnoodle-ai/dive`

Everything outside `experimental/` is stable. Everything inside may change.

## Credentials

Each provider reads its API key from an environment variable:

- `ANTHROPIC_API_KEY` - Anthropic
- `OPENAI_API_KEY` - OpenAI, OpenAI Completions
- `GEMINI_API_KEY` or `GOOGLE_API_KEY` - Google
- `XAI_API_KEY` or `GROK_API_KEY` - Grok
- `MISTRAL_API_KEY` - Mistral
- `OLLAMA_API_KEY` - Ollama (optional; defaults to "ollama")
- `OPENROUTER_API_KEY` - OpenRouter
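
To check which keys are available before constructing a provider, a small helper like the one below may be useful. It is a sketch that uses only `os.Getenv`; `firstEnv` is a hypothetical helper, and the order in which the two Google variables are tried is an assumption, not something this document specifies.

```go
package main

import (
	"fmt"
	"os"
)

// firstEnv returns the value of the first variable in names that is set to a
// non-empty string, or fallback if none is.
func firstEnv(fallback string, names ...string) string {
	for _, name := range names {
		if v := os.Getenv(name); v != "" {
			return v
		}
	}
	return fallback
}

func main() {
	// Assumed precedence: GEMINI_API_KEY first, then GOOGLE_API_KEY.
	googleKey := firstEnv("", "GEMINI_API_KEY", "GOOGLE_API_KEY")
	// Mirrors the documented Ollama default of "ollama".
	ollamaKey := firstEnv("ollama", "OLLAMA_API_KEY")
	fmt.Println("google key set:", googleKey != "")
	fmt.Println("ollama key:", ollamaKey)
}
```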

## Agent with tools

```go
agent, _ := dive.NewAgent(dive.AgentOptions{
    SystemPrompt: "You are a senior software engineer.",
    Model:        anthropic.New(),
    Tools: []dive.Tool{
        toolkit.NewReadFileTool(),
        toolkit.NewTextEditorTool(),
        toolkit.NewBashTool(),
    },
})
response, _ := agent.CreateResponse(ctx, dive.WithInput("Fix the failing test"))
fmt.Println(response.OutputText())
```

## Streaming events

```go
agent.CreateResponse(ctx,
    dive.WithInput("Generate a report"),
    dive.WithEventCallback(func(ctx context.Context, item *dive.ResponseItem) error {
        switch item.Type {
        case dive.ResponseItemTypeModelEvent:
            if item.Event != nil && item.Event.Delta != nil {
                fmt.Print(item.Event.Delta.Text)
            }
        case dive.ResponseItemTypeToolCall:
            fmt.Printf("Tool: %s\n", item.ToolCall.Name)
        case dive.ResponseItemTypeToolCallResult:
            if item.ToolCallResult.Result != nil {
                for _, c := range item.ToolCallResult.Result.Content {
                    fmt.Printf("Result: %s\n", c.Text)
                }
            }
        }
        return nil
    }),
)
```

## Direct LLM usage (no agent loop)

```go
model := google.New(google.WithModel("gemini-3-flash-preview"))
response, _ := model.Generate(ctx,
    llm.WithMessages(llm.NewUserMessage(
        llm.NewTextContent("What is in this image?"),
        llm.NewImageContent(llm.ContentURL("https://example.com/photo.jpg")),
    )),
    llm.WithMaxTokens(1024),
)
fmt.Println(response.Message().Text())
```

## Custom tool (FuncTool — recommended for simple tools)

Use `FuncTool` for tools without struct state. The schema is generated automatically from the input struct's tags:

```go
type WeatherInput struct {
    City  string `json:"city" description:"City name"`
    Units string `json:"units,omitempty" description:"Temperature units" enum:"celsius,fahrenheit"`
}

weatherTool := dive.FuncTool("get_weather", "Get current weather for a city",
    func(ctx context.Context, input *WeatherInput) (*dive.ToolResult, error) {
        return dive.NewToolResultText(fmt.Sprintf("72°F in %s", input.City)), nil
    },
    dive.WithFuncToolAnnotations(&dive.ToolAnnotations{ReadOnlyHint: true}),
)
```
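
To make the tag-driven schema less opaque, here is a minimal reflection sketch of how `json`, `description`, and `enum` tags could be turned into a schema. It is not Dive's generator: `property`, `schema`, `schemaFor`, and `jsonType` are hypothetical, and treating fields without `omitempty` as required is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// property and schema are hypothetical, simplified types for this sketch.
type property struct {
	Type        string
	Description string
	Enum        []string
}

type schema struct {
	Properties map[string]property
	Required   []string
}

type WeatherInput struct {
	City  string `json:"city" description:"City name"`
	Units string `json:"units,omitempty" description:"Temperature units" enum:"celsius,fahrenheit"`
}

// jsonType maps a Go kind to a JSON Schema type name.
func jsonType(k reflect.Kind) string {
	switch k {
	case reflect.String:
		return "string"
	case reflect.Bool:
		return "boolean"
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return "integer"
	case reflect.Float32, reflect.Float64:
		return "number"
	case reflect.Slice, reflect.Array:
		return "array"
	default:
		return "object"
	}
}

// schemaFor walks the exported fields of a struct (or pointer to struct) and
// builds a schema from their tags. Assumption: no omitempty means required.
func schemaFor(v any) schema {
	t := reflect.TypeOf(v)
	if t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	s := schema{Properties: map[string]property{}}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name, opts, _ := strings.Cut(f.Tag.Get("json"), ",")
		if !f.IsExported() || name == "-" {
			continue
		}
		if name == "" {
			name = f.Name
		}
		p := property{Type: jsonType(f.Type.Kind()), Description: f.Tag.Get("description")}
		if e := f.Tag.Get("enum"); e != "" {
			p.Enum = strings.Split(e, ",")
		}
		s.Properties[name] = p
		if !strings.Contains(opts, "omitempty") {
			s.Required = append(s.Required, name)
		}
	}
	return s
}

func main() {
	s := schemaFor(WeatherInput{})
	fmt.Println(s.Required, s.Properties["units"].Enum) // [city] [celsius fahrenheit]
}
```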

## Custom tool (TypedTool — for tools with struct state)

Use `TypedTool[T]` when the tool needs dependencies (DB clients, API clients, config):

```go
type WeatherTool struct{
    DB *sql.DB
}
type WeatherInput struct {
    City string `json:"city"`
}

func (t *WeatherTool) Name() string        { return "get_weather" }
func (t *WeatherTool) Description() string { return "Get current weather for a city" }
func (t *WeatherTool) Annotations() *dive.ToolAnnotations {
    return &dive.ToolAnnotations{ReadOnlyHint: true}
}
func (t *WeatherTool) Schema() *dive.Schema {
    return &dive.Schema{
        Type: "object", Required: []string{"city"},
        Properties: map[string]*dive.SchemaProperty{
            "city": {Type: "string", Description: "City name"},
        },
    }
}
func (t *WeatherTool) Call(ctx context.Context, input WeatherInput) (*dive.ToolResult, error) {
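    var temp float64
    err := t.DB.QueryRowContext(ctx, "SELECT temp FROM weather WHERE city = ?", input.City).Scan(&temp)
    if errors.Is(err, sql.ErrNoRows) {
        return dive.NewToolResultError(fmt.Sprintf("city not found: %s", input.City)), nil
    }
    if err != nil {
        // Any other failure is a database problem, not a missing city.
        return dive.NewToolResultError(fmt.Sprintf("weather lookup failed: %v", err)), nil
    }
    return dive.NewToolResultText(fmt.Sprintf("%.1f°F in %s", temp, input.City)), nil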
}

// Register: dive.ToolAdapter(&WeatherTool{DB: db})
```

## Dynamic tools (Toolset)

Use `Toolset` for tools that are resolved at runtime, such as tools discovered from MCP servers or filtered by permission:

```go
agent, _ := dive.NewAgent(dive.AgentOptions{
    Model: anthropic.New(),
    Toolsets: []dive.Toolset{
        &dive.ToolsetFunc{
            ToolsetName: "mcp-tools",
            Resolve: func(ctx context.Context) ([]dive.Tool, error) {
                return discoverMCPTools(ctx)
            },
        },
    },
})
```
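
As one illustration of runtime resolution, the sketch below filters a tool catalog by the caller's role on each request. Everything in it is hypothetical (`tool`, `roleKey`, `withRole`, `resolve`, `visibleFor`, and the admin rule); it only shows the kind of logic a resolve function might contain.

```go
package main

import (
	"context"
	"fmt"
)

// tool is a hypothetical stand-in carrying only what the filter needs.
type tool struct {
	Name     string
	ReadOnly bool
}

type roleKey struct{}

var catalog = []tool{
	{Name: "read_file", ReadOnly: true},
	{Name: "bash", ReadOnly: false},
}

// withRole stores the caller's role on the context.
func withRole(ctx context.Context, role string) context.Context {
	return context.WithValue(ctx, roleKey{}, role)
}

// resolve returns the tools visible for this request: admins see every
// tool, everyone else sees only read-only tools.
func resolve(ctx context.Context, all []tool) []tool {
	role, _ := ctx.Value(roleKey{}).(string)
	var out []tool
	for _, t := range all {
		if role == "admin" || t.ReadOnly {
			out = append(out, t)
		}
	}
	return out
}

// visibleFor resolves the catalog for a role and returns the tool names.
func visibleFor(role string) []string {
	var names []string
	for _, t := range resolve(withRole(context.Background(), role), catalog) {
		names = append(names, t.Name)
	}
	return names
}

func main() {
	fmt.Println(visibleFor("admin")) // [read_file bash]
	fmt.Println(visibleFor("guest")) // [read_file]
}
```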

## Hooks

All hooks receive `*dive.HookContext`. Hooks are grouped in the `dive.Hooks` struct on `AgentOptions`.

```go
agent, _ := dive.NewAgent(dive.AgentOptions{
    Model: anthropic.New(),
    Hooks: dive.Hooks{
        PreGeneration: []dive.PreGenerationHook{
            func(ctx context.Context, hctx *dive.HookContext) error {
                hctx.SystemPrompt += "\nToday is Monday."
                return nil
            },
        },
        PreToolUse: []dive.PreToolUseHook{
            func(ctx context.Context, hctx *dive.HookContext) error {
                if hctx.Tool.Annotations() != nil && hctx.Tool.Annotations().ReadOnlyHint {
                    return nil // allow read-only tools
                }
                return fmt.Errorf("tool %s requires approval", hctx.Tool.Name())
            },
        },
        PostGeneration: []dive.PostGenerationHook{
            dive.UsageLogger(func(usage *llm.Usage) {
                slog.Info("done", "in", usage.InputTokens, "out", usage.OutputTokens)
            }),
        },
    },
})
```

Hook types:
- `PreGenerationHook` — runs before the LLM generation loop
- `PostGenerationHook` — runs after the generation loop completes
- `PreToolUseHook` — runs before each tool execution (can modify input via `hctx.UpdatedInput`)
- `PostToolUseHook` — runs after a tool call succeeds
- `PostToolUseFailureHook` — runs after a tool call fails
- `StopHook` — runs when the agent is about to stop; can return `Continue: true` to re-enter the loop
- `PreIterationHook` — runs before each LLM call within the loop
- `OnSuspendHook` — runs when a tool returns `SuspendResult`, before `PostGeneration` and before the suspended turn is persisted

Hook flow: PreGeneration -> [PreIteration -> LLM -> PreToolUse -> Execute -> PostToolUse]* -> Stop -> PostGeneration

On suspend: OnSuspend -> PostGeneration -> persist -> return

Hook helpers: `InjectContext`, `CompactionHook`, `UsageLogger`, `UsageLoggerWithSlog`, `MatchTool`, `MatchToolPost`, `MatchToolPostFailure`
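
The hook flow can be traced with a toy simulation. This sketch only records the order of the hook points named above; it does not call Dive, and `trace` and its parameters are hypothetical. `continueOnce` models a stop hook that asks to re-enter the loop a single time.

```go
package main

import "fmt"

// trace lists the hook points hit during one turn. toolCalls[i] is the number
// of tools the model requests on iteration i; later iterations request none.
func trace(toolCalls []int, continueOnce bool) []string {
	events := []string{"PreGeneration"}
	for i := 0; ; i++ {
		events = append(events, "PreIteration", "LLM")
		n := 0
		if i < len(toolCalls) {
			n = toolCalls[i]
		}
		for t := 0; t < n; t++ {
			events = append(events, "PreToolUse", "Execute", "PostToolUse")
		}
		if n > 0 {
			continue // tool results go back to the model
		}
		events = append(events, "Stop")
		if continueOnce {
			continueOnce = false
			continue // the stop hook asked for another pass
		}
		break
	}
	return append(events, "PostGeneration")
}

func main() {
	for _, e := range trace([]int{1}, false) {
		fmt.Println(e)
	}
}
```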

## Extensions

Extensions bundle tools, hooks, and system prompt rules into a composable unit.
Any type implementing `dive.Extension` can be passed to `AgentOptions.Extensions`.
`NewAgent` merges extensions in the order given, after the fields set directly on `AgentOptions`.

```go
type Extension interface {
    Tools() []Tool
    Hooks() Hooks
    Rules() string
}
```
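
The merge order can be pictured with simplified stand-ins. In this sketch a tool is just a name and hooks are just labels; `options`, `merge`, and `gitExt` are hypothetical, and appending each extension's rules to the system prompt is an assumption about how rules are applied.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins so the merge order is easy to see; not Dive's types.
type Tool string

type Hooks struct{ PreToolUse []string }

type Extension interface {
	Tools() []Tool
	Hooks() Hooks
	Rules() string
}

type options struct {
	SystemPrompt string
	Tools        []Tool
	Hooks        Hooks
	Extensions   []Extension
}

// merge starts from the directly set fields, then appends each extension's
// tools, hooks, and rules in extension order.
func merge(o options) (string, []Tool, Hooks) {
	parts := []string{o.SystemPrompt}
	tools := append([]Tool(nil), o.Tools...)
	hooks := Hooks{PreToolUse: append([]string(nil), o.Hooks.PreToolUse...)}
	for _, ext := range o.Extensions {
		tools = append(tools, ext.Tools()...)
		hooks.PreToolUse = append(hooks.PreToolUse, ext.Hooks().PreToolUse...)
		if r := ext.Rules(); r != "" {
			parts = append(parts, r)
		}
	}
	return strings.Join(parts, "\n\n"), tools, hooks
}

// gitExt is a hypothetical extension bundling one tool, one hook, one rule.
type gitExt struct{}

func (gitExt) Tools() []Tool { return []Tool{"git_status"} }
func (gitExt) Hooks() Hooks  { return Hooks{PreToolUse: []string{"confirm-push"}} }
func (gitExt) Rules() string { return "Never force-push." }

func main() {
	prompt, tools, hooks := merge(options{
		SystemPrompt: "You are a senior software engineer.",
		Tools:        []Tool{"read_file"},
		Extensions:   []Extension{gitExt{}},
	})
	fmt.Printf("%q\n%v\n%v\n", prompt, tools, hooks.PreToolUse)
}
```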

## Skills

Skills are markdown-based instruction sets that extend agent behavior. They can
be auto-invoked by the agent based on triggers/descriptions or manually triggered
by users via `/name` syntax. Place skill files in `.dive/skills/`, `.claude/skills/`,
or `.agents/skills/`.

The skill `Loader` implements `dive.Extension`:

```go
skills, _ := skill.Load(ctx, skill.LoaderOptions{
    ProjectDir:     ".",
    ShellExpansion: true, // enable !{command} substitution
})

agent, _ := dive.NewAgent(dive.AgentOptions{
    Model:      anthropic.New(),
    Tools:      tools,
    Extensions: []dive.Extension{skills},
})
```

Skills support variable expansion (`$ARGUMENTS`, `$1`-`$9`, `!{command}`),
trigger matching (keyword and regex), and pluggable providers for loading
from custom sources. See [Skills Guide](docs/guides/skills.md).
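
For intuition about `$ARGUMENTS` and `$1`-`$9`, here is a minimal expansion sketch. It is not the skill package's implementation: `expand` is hypothetical, arguments are split on whitespace, missing positions expand to the empty string, and `!{command}` is deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// expand replaces $ARGUMENTS with the full argument string and $1..$9 with
// the individual whitespace-separated arguments. Replaced text is not
// scanned again, so argument values are never expanded a second time.
func expand(body, arguments string) string {
	args := strings.Fields(arguments)
	pairs := []string{"$ARGUMENTS", arguments}
	for i := 1; i <= 9; i++ {
		val := ""
		if i <= len(args) {
			val = args[i-1]
		}
		pairs = append(pairs, fmt.Sprintf("$%d", i), val)
	}
	return strings.NewReplacer(pairs...).Replace(body)
}

func main() {
	fmt.Println(expand("Review PR $1 on branch $2. Notes: $ARGUMENTS", "42 main"))
	// Review PR 42 on branch main. Notes: 42 main
}
```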

## Key types

- `dive.Agent` - created via `dive.NewAgent(AgentOptions)`, runs the generate-call-repeat loop
- `dive.Extension` - interface: Tools, Hooks, Rules; composable agent capability bundle; set on AgentOptions.Extensions
- `dive.Session` - interface: ID, Messages, SaveTurn; set on AgentOptions or pass via WithSession
- `dive.Tool` - interface: Name, Description, Schema, Annotations, Call
- `dive.TypedTool[T]` - generic tool with typed input; wrap with `dive.ToolAdapter()`
- `dive.FuncTool[T]` - create a Tool from a function with auto-generated schema from T's struct tags
- `dive.Toolset` - interface: Name, Tools(ctx) — dynamic tool resolution per LLM request
- `dive.ToolsetFunc` - adapts a function into a Toolset
- `dive.Response` - returned by CreateResponse; use OutputText() or Items
- `dive.ResponseItem` - message, tool_call, tool_call_result, or model_event
- `dive.ToolCallResult` - completed tool call; fields: ID (string), Name (string), Input (any), Result (*ToolResult), Error (error)
- `dive.ToolResult` - tool output; tagged union (regular fields XOR `Suspend`); create with NewToolResultText(), NewToolResultError(), or NewSuspendResult()
- `dive.SuspensionState` - payload describing a suspended turn; carried on `Response.Suspension`
- `dive.PendingToolCall` - tool call awaiting external result; fields: ID, Name, Input, Prompt, Metadata
- `dive.SuspendableSession` - optional `Session` extension: LoadSuspension, SaveSuspendedTurn, SaveResumedTurn
- `llm.LLM` - interface: Name, Generate
- `llm.StreamingLLM` - extends LLM with Stream()
- `llm.Message` - create with NewUserTextMessage(), NewUserMessage(), NewAssistantMessage()
- `llm.Content` - interface; primary types are listed under "Content types (llm/)"
- `dive.Dialog` - interface for user prompts; built-ins: AutoApproveDialog, DenyAllDialog

## ModelSettings

Set via `AgentOptions.ModelSettings`. All fields are optional; most are pointer
types, set with `dive.Ptr`.

```go
agent, _ := dive.NewAgent(dive.AgentOptions{
    Model: anthropic.New(),
    ModelSettings: &dive.ModelSettings{
        MaxTokens:         dive.Ptr(16000),
        Temperature:       dive.Ptr(0.7),
        ReasoningBudget:   dive.Ptr(4000),          // token budget for extended thinking
        ReasoningEffort:   llm.ReasoningEffortHigh, // or Medium, Low
        Caching:           dive.Ptr(true),          // enable prompt caching
        ParallelToolCalls: dive.Ptr(true),
        PresencePenalty:   dive.Ptr(0.1),
        FrequencyPenalty:  dive.Ptr(0.1),
    },
})
```

## Response

`dive.Response` is returned by `agent.CreateResponse()`.

```go
response.OutputText()       // text from the last assistant message
response.ToolCallResults()  // all tool call results
response.Items              // []*ResponseItem in chronological order
response.OutputMessages     // []*llm.Message — assistant + tool result messages for conversation continuation
response.Usage              // *llm.Usage (InputTokens, OutputTokens, CacheCreationInputTokens, CacheReadInputTokens)
response.Model              // model name string
response.CreatedAt          // time.Time
response.FinishedAt         // *time.Time
```

Each `ResponseItem` has a `Type` and one populated field:

- `ResponseItemTypeMessage` - `item.Message` (*llm.Message)
- `ResponseItemTypeToolCall` - `item.ToolCall` (*llm.ToolUseContent)
- `ResponseItemTypeToolCallResult` - `item.ToolCallResult` (*dive.ToolCallResult)
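- `ResponseItemTypeModelEvent` - `item.Event` (*llm.Event, for streaming deltas)
- `ResponseItemTypeSuspended` - terminal item emitted when a turn suspends (see "Suspend and resume")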

## Sessions

Sessions provide persistent conversation state. The agent automatically loads
history before generation and saves new messages after.

```go
// In-memory session
sess := session.New("my-session")
agent, _ := dive.NewAgent(dive.AgentOptions{
    SystemPrompt: "You are a helpful assistant.",
    Model:        anthropic.New(),
    Session:      sess,
})

resp, _ := agent.CreateResponse(ctx, dive.WithInput("Hi, my name is Alice."))
resp, _ = agent.CreateResponse(ctx, dive.WithInput("What's my name?"))
// "Your name is Alice."

// Persistent session (JSONL files)
store, _ := session.NewFileStore("~/.myapp/sessions")
fileSess, _ := store.Open(ctx, "my-session")
fileAgent, _ := dive.NewAgent(dive.AgentOptions{
    Model:   anthropic.New(),
    Session: fileSess,
})

// Per-call override (one agent, many sessions); userSession is any dive.Session
resp, _ = fileAgent.CreateResponse(ctx,
    dive.WithInput("Hello"),
    dive.WithSession(userSession),
)
```

Session types (`session` package):
- `session.New(id)` - in-memory session
- `session.NewMemoryStore()` - in-memory store
- `session.NewFileStore(dir)` - JSONL file store
- `store.Open(ctx, id)` - open/create session connected to store
- `store.Put(ctx, sess)` - save session (e.g. after Fork)
- `store.List(ctx, opts)` - list session summaries
- `store.Delete(ctx, id)` - delete session
- `sess.Fork(newID)` - deep-copy into new session
- `sess.Compact(ctx, summarizer)` - replace events with summary
- `sess.TotalUsage()` - sum token usage
- `session.ForkSession(ctx, store, fromID, newID)` - fork + persist
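
Why a fork is a deep copy can be shown with a small sketch. The types here (`message`, `sess`) and the `fork` method are hypothetical stand-ins, not the `session` package's own; the point is only that a fork which shares slices with its parent would let edits leak between sessions.

```go
package main

import "fmt"

// message and sess are hypothetical, simplified types for this sketch.
type message struct {
	Role  string
	Parts []string
}

type sess struct {
	ID       string
	Messages []*message
}

// fork copies every message and its Parts slice so the new session shares
// no memory with the original.
func (s *sess) fork(newID string) *sess {
	out := &sess{ID: newID, Messages: make([]*message, 0, len(s.Messages))}
	for _, m := range s.Messages {
		cp := &message{Role: m.Role, Parts: append([]string(nil), m.Parts...)}
		out.Messages = append(out.Messages, cp)
	}
	return out
}

func main() {
	orig := &sess{ID: "main", Messages: []*message{{Role: "user", Parts: []string{"hi"}}}}
	branch := orig.fork("experiment")
	branch.Messages[0].Parts[0] = "edited in the fork"
	fmt.Println(orig.Messages[0].Parts[0]) // hi
}
```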

## Suspend and resume

A tool can pause the agent mid-turn by returning `dive.NewSuspendResult(prompt, metadata)`.
`CreateResponse` returns `(*Response, nil)` with `Status == ResponseStatusSuspended` and
`Suspension *SuspensionState` describing the pending tool calls, completed siblings, and
the in-progress turn messages. The caller resumes later — possibly in a different process —
by invoking `CreateResponse` again with `WithToolResults` (session-backed) or
`WithResume(state, results)` (stateless). Partial resumes are allowed: supply a subset of
pending results and the agent re-suspends with a shrunk pending list.

```go
// Tool signals suspension
func (t *DeployTool) Call(ctx context.Context, in DeployInput) (*dive.ToolResult, error) {
    return dive.NewSuspendResult("Approve deploy?", map[string]any{"req": in.ID}), nil
}

// Observe a suspended response
resp, _ := agent.CreateResponse(ctx, dive.WithInput("Please deploy v1.4.2"))
if resp.Status == dive.ResponseStatusSuspended {
    for _, p := range resp.Suspension.PendingToolCalls {
        // route p.ID + p.Input + p.Prompt to a review queue and return
    }
    return
}

// Resume later (session-backed); pendingID is a p.ID saved from the loop above
final, _ := agent.CreateResponse(ctx, dive.WithToolResults(map[string]*dive.ToolResult{
    pendingID: dive.NewToolResultText("approved"),
}))

// Or resume statelessly (no Session): pass the prior history and the saved state
final, _ = agent.CreateResponse(ctx,
    dive.WithMessages(preHistory...),
    dive.WithResume(savedSuspensionState, results),
)
```
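
Partial resume is easier to reason about as a pure state transition. The sketch below models it with hypothetical types (`pendingCall`, `suspension`) and a hypothetical `resume` function; it is not Dive's code. Supplying some results moves those calls to the completed set, the rest stay pending, and a result for an unknown ID is rejected.

```go
package main

import "fmt"

// Hypothetical, simplified types for this sketch; not Dive's own.
type pendingCall struct{ ID, Name string }

type suspension struct {
	Pending   []pendingCall
	Completed map[string]string // tool call ID -> result text
}

// resume applies the supplied results to a suspended turn. It returns the
// updated state and whether every pending call now has a result.
func resume(s suspension, results map[string]string) (suspension, bool, error) {
	pending := map[string]bool{}
	for _, p := range s.Pending {
		pending[p.ID] = true
	}
	for id := range results {
		if !pending[id] {
			return s, false, fmt.Errorf("unknown pending tool call %q", id)
		}
	}
	next := suspension{Completed: map[string]string{}}
	for id, r := range s.Completed {
		next.Completed[id] = r
	}
	for _, p := range s.Pending {
		if r, ok := results[p.ID]; ok {
			next.Completed[p.ID] = r
		} else {
			next.Pending = append(next.Pending, p) // still waiting
		}
	}
	return next, len(next.Pending) == 0, nil
}

func main() {
	s := suspension{Pending: []pendingCall{{ID: "a", Name: "deploy"}, {ID: "b", Name: "notify"}}}
	s, done, _ := resume(s, map[string]string{"a": "approved"})
	fmt.Println(done, len(s.Pending)) // false 1
	s, done, _ = resume(s, map[string]string{"b": "sent"})
	fmt.Println(done, len(s.Pending)) // true 0
}
```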

Key types and helpers:
- `dive.SuspendResult{Prompt, Metadata}` — signal returned from a tool; construct with `NewSuspendResult`
- `dive.ToolResult.Suspend` — tagged union; never set alongside `Content`/`Display`/`IsError`
- `dive.Response.Status` — `ResponseStatusCompleted` or `ResponseStatusSuspended`
- `dive.Response.Suspension *SuspensionState` — `PendingToolCalls`, `CompletedToolCalls`, `TurnMessages`
- `dive.PendingToolCall.UnmarshalInput(&v)` / `dive.DecodePendingInput[T](p)` — decode original tool input
- `dive.WithResume(state, results)` / `dive.WithToolResults(results)` — resume options
- `dive.OnSuspendHook` — fires before persistence; abort to prevent the transition
- `dive.SuspendableSession` — optional `Session` extension for auto-persistence
- `session.ListOptions{Suspended: dive.Ptr(true)}` — sweep for stale suspended sessions
- Errors: `ErrResumeRequired`, `ErrInputOnSuspendedSession`, `ErrNoSuspendedTurn`, `ErrUnknownPendingToolCall`
- Streaming: a terminal `ResponseItemTypeSuspended` item is emitted carrying the same pending/completed lists

See [Suspend & Resume Guide](docs/guides/suspend-resume.md) and the `examples/suspend/` directory.

## Multi-turn without sessions

Accumulate messages manually using `response.OutputMessages`:

```go
var messages []*llm.Message
resp, _ := agent.CreateResponse(ctx, dive.WithInput("Hi, my name is Alice."))
messages = append(messages, llm.NewUserTextMessage("Hi, my name is Alice."))
messages = append(messages, resp.OutputMessages...)
messages = append(messages, llm.NewUserTextMessage("What's my name?"))
resp, _ = agent.CreateResponse(ctx, dive.WithMessages(messages...))
```

`OutputMessages` includes both assistant messages and tool result messages
in the correct order for the LLM.

## Content types (llm/)

All implement `llm.Content` and are used in `llm.Message.Content`.

- `TextContent` - Plain text (most common)
- `ImageContent` - Image from URL or inline bytes
- `DocumentContent` - Document (PDF, etc.) from URL or inline bytes, with optional citations
- `ToolUseContent` - Tool call requested by the model (ID, Name, Input)
- `ToolResultContent` - Result returned to the model after a tool call
- `ThinkingContent` - Extended thinking / chain-of-thought from the model
- `RedactedThinkingContent` - Encrypted thinking flagged by safety systems
- `RefusalContent` - Model declined to respond
- `SummaryContent` - Compacted conversation summary replacing full history

## Providers

All providers support tool calling, and each registers itself in `init()`. Providers with their own go.mod require a separate `go get`.

- `providers/anthropic` - Claude models
- `providers/openai` - OpenAI Responses API (own go.mod)
- `providers/google` - Gemini models (own go.mod)
- `providers/openaicompletions` - OpenAI Chat Completions API
- `providers/grok` - X.AI Grok models
- `providers/mistral` - Mistral models
- `providers/ollama` - Local models via Ollama
- `providers/openrouter` - Multi-provider proxy

## Built-in tools (toolkit/)

All constructors return `*dive.TypedToolAdapter[T]` (satisfies `dive.Tool`).

- `NewReadFileTool` - Read file contents with optional offset/limit for large files
- `NewWriteFileTool` - Create or overwrite a file
- `NewEditTool` - Exact string replacement in files (unique match or replace_all)
- `NewGlobTool` - Find files matching glob patterns (**, *, ?, {a,b})
- `NewGrepTool` - Search file contents with regex, glob filters, and context lines
- `NewListDirectoryTool` - List directory contents
- `NewBashTool` - Execute shell commands with optional timeout
- `NewTextEditorTool` - Multi-command editor: view, create, str_replace, insert
- `NewWebFetchTool` - Fetch webpage contents as markdown
- `NewWebSearchTool` - Web search returning URLs, titles, and descriptions
- `NewAskUserTool` - Collect user input: confirm, select, multiselect, or free-form text

## Docs and examples

- [README](README.md)
- [Quick Start](docs/guides/quick-start.md)
- [Agents](docs/guides/agents.md)
- [Tools](docs/guides/tools.md)
- [Custom Tools](docs/guides/custom-tools.md)
- [Hooks](docs/guides/hooks.md)
- [Suspend & Resume](docs/guides/suspend-resume.md)
- [LLM Providers](docs/guides/llm-guide.md)
- [Permissions](docs/guides/permissions.md)
- [Skills](docs/guides/skills.md)
- [Tracing](docs/guides/tracing.md) - OpenTelemetry adapter (separate module: `github.com/deepnoodle-ai/dive/otel`)
- [GoDoc](https://pkg.go.dev/github.com/deepnoodle-ai/dive)
- [examples/](examples/) - runnable examples for each provider and feature

## Experimental (experimental/)

The `experimental/` tree contains Compaction, Subagent, Sandbox, MCP, Settings, Todo, an extended Toolkit, and a CLI.

Guides: docs/guides/experimental/

## Related projects

- [Wonton](https://github.com/deepnoodle-ai/wonton) - companion Go library for CLIs: TUI framework, HTML-to-Markdown, HTTP utilities
- [Workflow](https://github.com/deepnoodle-ai/workflow) - lightweight Go library for composing multi-step agent workflows
