what is accuretta?
a local-first LLM wrapper. it runs entirely on your machine — no cloud API, no telemetry, no account. the model lives in llama.cpp, the bridge is a small python server, the UI is served on localhost:8787 and is reachable from any device on your LAN at http://<your-ip>:8787.
what are the modes?
IDE — model replies with one HTML document and the right pane renders it live, saving every version. good for building landing pages, mocks, dashboards.
Agent — model calls tools to list files, read, write, run powershell. every write/run asks for your approval first.
Auto — bridge picks IDE vs. Agent based on what you asked.
what is sniff mothership?
a one-click prompt under the button that asks the model to scan this machine's active TCP/UDP connections, listening ports, and recent DNS cache, then flag anything unusual (unknown processes phoning home, suspicious remote IPs, etc.). a chart card renders inline showing connection counts, top processes, and top remote endpoints. all connection data is gathered locally via PowerShell and stays on your machine; the only outbound requests are the favicon lookups for remote endpoints, which go to DuckDuckGo.
does it render markdown?
yes — pipe tables, ATX headings (#…######), ordered/unordered lists, horizontal rules, inline **bold**/*italic*/`code`, and fenced code blocks all render properly in chat bubbles.
can i preview an existing HTML file from the workspace?
yes. add a folder in the Workspace panel, expand it, and click the lightning bolt next to any .html file. the page loads into the right pane with its real CSS, JS, and images intact. the bridge serves files through /api/wsfs/<token>/<rel> with strict path-traversal hardening — the resolved path is checked against the workspace root via os.path.commonpath on every request, so the iframe (and any script in it) can never reach a file outside that folder. symlinks, .. escapes, absolute paths, and ignored files (.accurettaignore) are all rejected. external CDN scripts still load normally — turn off "Allow web resources" in Settings to lock that down too.
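a minimal sketch of the containment check described above, assuming a hypothetical helper name (resolve_in_root) and leaving out the token lookup and the .accurettaignore filter. it resolves symlinks and .. segments first, then compares against the workspace root with os.path.commonpath, so a link pointing outside the folder fails the same way as a .. escape. the bridge's actual code may differ.

```python
import os


def resolve_in_root(root: str, rel: str) -> str:
    """Return the real path of `rel` inside `root`, or raise ValueError.

    Hypothetical helper for illustration; not the bridge's actual function.
    """
    if os.path.isabs(rel):
        raise ValueError("absolute paths are not allowed")
    root_real = os.path.realpath(root)
    # realpath collapses ".." segments and follows symlinks before the check
    candidate = os.path.realpath(os.path.join(root_real, rel))
    # commonpath equals the root only when candidate sits at or below it
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise ValueError("path escapes the workspace root")
    return candidate
```

comparing with commonpath rather than a plain string prefix avoids the classic mistake where /work/site-private passes a startswith("/work/site") test.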
can i syntax-check a python file?
yes. click the checkmark next to any .py file in the workspace tree. the bridge runs compile(code, name, "exec") on it, which parses the source and compiles it to a code object without executing a single line: no imports run and nothing in the file has side effects. you get a green banner if it parses, or a red one with line + column + message and the offending line highlighted. it doesn't catch import errors, type errors, or runtime errors; that would mean actually running the code, which is intentionally out of scope.
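a small sketch of that check, assuming a hypothetical function name (check_syntax) and result shape. the only real API here is the built-in compile(), which raises SyntaxError with lineno, offset, and msg attributes when the source does not parse.

```python
def check_syntax(code: str, name: str = "<workspace file>") -> dict:
    """Compile `code` without running it and report the first syntax error.

    Hypothetical helper; the field names are chosen for illustration.
    """
    try:
        compile(code, name, "exec")  # parses and compiles, never executes
    except SyntaxError as err:
        return {
            "ok": False,
            "line": err.lineno,
            "column": err.offset,
            "message": err.msg,
        }
    return {"ok": True}
```

a file that imports a missing module or raises at import time still passes, since none of its statements run.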
how do i fit a 35B MoE model on a 12 GB GPU?
the one-click path: open Settings, pick the model, pick your VRAM budget under Auto-tune (it tries to detect your GPU automatically), then click Suggest values for current model. the suggester reads the GGUF header directly (actual layer count, expert count, GQA config), so the math is based on the model's real dimensions, not eyeballed. it fills in context, GPU layers, KV cache quantization, batch size, and --n-cpu-moe (the key flag: it keeps the expert tensors of the first N layers in system RAM so the rest fits in VRAM). it also disables speculative decoding for MoE models because public benchmarks show it's net-negative there. save and the model reloads.
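for a sense of what reading the GGUF header involves, here is a sketch that parses only the fixed preamble of a GGUF file (version 2 or later, little-endian): the magic bytes, the format version, the tensor count, and the number of metadata key/value pairs. the layer count, expert count, and attention head counts live in the metadata pairs that follow, which this sketch does not parse. read_gguf_header is a hypothetical name, not the suggester's code.

```python
import struct
from typing import BinaryIO


def read_gguf_header(f: BinaryIO) -> dict:
    """Read the fixed GGUF preamble (assumes version >= 2, little-endian)."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    (version,) = struct.unpack("<I", f.read(4))  # uint32 format version
    # two uint64 values: tensor count, then metadata key/value count
    tensor_count, metadata_kv_count = struct.unpack("<QQ", f.read(16))
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": metadata_kv_count,
    }
```

usage: open the model file in binary mode, e.g. open(path, "rb"), and pass the handle in.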
if the suggester sees that the model needs more than half its layers offloaded, it shows a quant downshift banner pointing at a smaller quant from the same repo — usually a 3-5x throughput win. the willbnu/Qwen3-16G repo is a good reference: their 27-30B-A3B at Q3_K_S runs all-GPU on a 16 GB card at 80-130 tok/s, vs. Q4_K_M with offload at 30-50 tok/s. same model, same card.
everything is overrideable in the Advanced llama-server section right below the suggester. anything the UI doesn't surface goes in the Extra flags field, e.g. --alias my-model --rope-scaling linear.
what is ACCURETTA.md?
a scan of your machine (OS, known folders, drives) written to data/ACCURETTA.md on first boot. it's injected into every chat so the model knows where your Desktop, Documents, Screenshots live. edit it freely — it's re-read each turn. rescan from Settings.
why tokens and not streaming text?
the model itself decides when to speak. small local models often spend their whole reply budget inside <think> tags and never produce a visible answer. bump Max reply tokens in Settings if that happens, or switch to a different model.
how is my data stored?
everything lives in ./data: chat history, workspace, settings, versions, system context. nothing leaves the machine unless you explicitly allow it.
how does memory work in accuretta?
every local model gets three layers of memory:
working memory — the live context window for this turn. shown as the ring gauge in the sidebar. when it fills up, the bridge silently drops the oldest non-system messages so you never have to start a new chat. hover any assistant bubble's meta line to see how many tokens that reply used.
short-term memory — the model's prior <think> reasoning, kept alive between turns inside the same chat. most local chat templates (Qwen, DeepSeek, GLM…) discard prior thinking by default; accuretta rewraps it as a "short-term memory" entry so the model's own train of thought survives. shown inline as a small italic block with a notepad icon. toggle with preserve_prior_thinking in data/settings.json.
long-term memory — durable lessons the model saves across sessions via the remember tool: a working command, a file layout, a user preference. manage them in Settings → Long-term memory.
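as a rough illustration of the short-term layer, the sketch below pulls the <think> blocks out of a prior assistant reply and re-emits them as a labeled block ahead of the visible answer. the bracketed wrapper format and the function name are assumptions made for this example; the format accuretta actually injects is not documented here.

```python
import re

# non-greedy match so each <think>...</think> pair is captured separately
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def rewrap_prior_thinking(reply: str, label: str = "short-term memory") -> str:
    """Move <think> content into a labeled block (hypothetical format)."""
    notes = [m.strip() for m in THINK_RE.findall(reply) if m.strip()]
    visible = THINK_RE.sub("", reply).strip()
    if not notes:
        return visible
    memory = "\n".join(notes)
    return f"[{label}]\n{memory}\n[/{label}]\n{visible}"
```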
what does the working-memory ring show?
how full the current session's prompt is relative to num_ctx. the count is conservative (≈3 chars per token) so it errs on the safe side. once it hits the outer ring, the bridge silently drops the oldest non-system messages to stay under budget — you don't need to start a new session.
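the trimming rule can be sketched as follows, assuming messages are dicts with role and content keys and using the same ≈3 chars per token estimate. the function names and the message shape are assumptions for illustration, not the bridge's real code.

```python
def estimate_tokens(text: str, chars_per_token: int = 3) -> int:
    """Conservative token estimate: ceiling of len(text) / chars_per_token."""
    return -(-len(text) // chars_per_token)


def trim_history(messages: list, budget: int) -> list:
    """Drop the oldest non-system messages until the estimate fits `budget`."""
    kept = list(messages)  # work on a copy, leave the caller's list alone

    def total() -> int:
        return sum(estimate_tokens(m["content"]) for m in kept)

    while total() > budget:
        # index of the oldest message that is not a system message
        idx = next((i for i, m in enumerate(kept) if m["role"] != "system"), None)
        if idx is None:
            break  # only system messages remain; nothing more to drop
        del kept[idx]
    return kept
```

system messages are never dropped, so the injected machine context and instructions survive even in a long chat.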
what is kv cache quant?
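controls the precision of the KV cache, the per-token attention keys and values the model keeps for the current context. q4_0 uses roughly half the VRAM of q8_0 with barely noticeable quality loss, which makes it a good fit for 16GB cards. f16 gives the best quality but takes the most VRAM, about twice as much as q8_0.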
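a back-of-envelope sketch of the arithmetic, assuming a plain transformer where every layer stores one key and one value vector per KV head per token. the per-element sizes come from the ggml block layouts (q8_0 packs 32 values into 34 bytes, q4_0 packs 32 values into 18 bytes). the model dimensions in the usage note are made up, and hybrid or sliding-window models will not follow this formula exactly.

```python
# bytes per cached element for each cache type
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values + one fp16 scale per block
    "q4_0": 18 / 32,  # 32 4-bit values + one fp16 scale per block
}


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, cache_type: str) -> int:
    """Estimate KV cache size; the leading 2 counts keys and values."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return int(elements * BYTES_PER_ELEMENT[cache_type])
```

for a hypothetical model with 32 layers, 8 KV heads, head size 128, and an 8192-token context, kv_cache_bytes(32, 8, 128, 8192, "f16") comes to exactly 1 GiB, and the q4_0 figure is a little over half of the q8_0 one.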
does the composer save drafts?
yes — whatever you type is saved to browser storage per chat. if you refresh the page or switch chats, your draft is restored when you come back. it is cleared when you hit send.
keyboard shortcuts?
⌘⏎ / Ctrl⏎ send · Shift⏎ newline · Ctrl+K / ⌘K session switcher · click the model pill (bottom-right of composer) to swap models.
on mobile / iPad?
the UI adapts: composer chips collapse into the button, send/stop fill the bottom row, and tap targets follow the iOS 44px minimum. swipe left or right on the sidebar to dismiss it back to the chat. the preview pane is hidden on mobile — open the page on a desktop browser to use IDE mode.
can i use a different browser?
yes. start.bat firefox launches in Firefox; start.bat chrome, edge, brave, opera, and vivaldi work the same way. start.bat none skips auto-launch entirely. the value can also be set through the ACCURETTA_BROWSER environment variable.