Agent harness. BYOLLM.
Runs on your machine.

Agentic, private, zero telemetry. Powered by Ollama, LM Studio, Anthropic, OpenAI, Groq, Mistral, Gemini, and any OpenAI-compatible API.

Install for VS Code View on GitHub

Everything you need, nothing you don't

💬

Chat

Streaming chat with PLAN and BUILD modes. Multiple sessions, compact history, export to Markdown. Type / for filterable slash commands — explain, refactor, commit, and more.

🤖

Agentic Loop

Enable the ⚡ Tools toggle in BUILD mode and the model reads, writes, searches, and runs terminal commands in your workspace. Off by default — turn it on when you need it. Per-action approval for destructive operations.

🔍

RAG

Your codebase is indexed with BM25 + optional semantic embeddings. Relevant files are automatically attached to every message.

✦

Inline Autocomplete

Ghost-text completions as you type. Adaptive debounce, word-by-word accept, per-language model routing.

✏️

Inline Edit

Select code, press Ctrl+Shift+I, describe what you want. Review the diff, accept or reject.

🔌

MCP Support

Connect any Model Context Protocol server. Tools are namespaced and gated behind per-action approval.

📎

@ Context

Attach files, selected code, git diffs, terminal output, web pages, and VS Code diagnostics with @.

📓

Notebook Support

Jupyter notebooks are indexed and readable by the agent. Cell content is extracted and understood as code.

👁️

Vision

Paste or upload images into chat. Works with Ollama vision models (llava, qwen2-vl) and all cloud providers that support it.

🧠

Memory

Persistent instructions injected into every chat. Auto-saves as you type. Amber dot on the icon when active.

🎙️

Voice Input

Speak your prompts. Audio is transcribed on-device using Whisper — nothing is sent to a server. Six models from Tiny EN (~40 MB) to Small (~244 MB). Push-to-talk with model pre-warming for instant first use.

Floating Panel

Pop Grom out of the sidebar into a standalone window. Ideal for multi-monitor setups — keep your file tree and editor visible while Grom floats on a second screen. Persists across restarts.

Voice Input

Grom can transcribe your voice directly into the prompt input — entirely on your device, with no cloud involvement.

Enable the mic button

Open Settings → Voice Input and toggle Enable voice input. The mic button appears in the toolbar.

Download ffmpeg (once)

Grom uses ffmpeg for audio capture. On first use it offers to download a ~50 MB static binary into its own storage folder — nothing is added to your system PATH. Remove it anytime from Settings → Voice Input.

Choose your Whisper model

Pick from six models in Settings → Voice Input — from Tiny EN (~40 MB, fast) up to Small (~244 MB, best accuracy). English-only .en variants are more accurate for English speakers. Models download on demand and are cached locally. Download multiple and switch at any time.

Record

Click the mic button or press Ctrl+Shift+M to start recording. Click again to stop — the transcript is appended to your prompt. The model pre-warms on startup so the first utterance transcribes without delay.

Audio is transcribed entirely on your device using Transformers.js and OpenAI Whisper. Nothing leaves your machine. Voice input is optional and designed for those who want or need it as an accessibility tool.

Platform support: Windows (DirectShow), macOS (avfoundation), Linux (PulseAudio / PipeWire / ALSA).

Floating Panel

Pop Grom out of the sidebar into its own window — perfect for multi-monitor setups where you want Grom on a second screen without sacrificing your file tree.

Click the float button

Press the button in the Grom header to detach the panel into a standalone window. The sidebar shows a floating pill and a banner confirming it's active.

Use Grom normally

Chat, voice input, model switching, PLAN and BUILD modes — everything works in the floating window. Grom's icon switches to the cloud variant so you always know which panel is live.

Close when done

Close the floating window or click Close floating in the sidebar banner to return to the sidebar. The panel persists across VS Code restarts and reopens automatically.

Your choice of model

Local models work out of the box with no account required. Cloud providers are optional — bring your own key.

Provider	Notes
Ollama	Local — `127.0.0.1:11434`, recommended for privacy
LM Studio	Local — `127.0.0.1:1234`
Open Code	`api.opencode.ai` — API key stored in OS keychain
OpenAI	GPT-4o, o1, o3-mini — API key stored in OS keychain
Anthropic	Claude Sonnet, Claude Opus — API key stored in OS keychain
Groq	Llama 3, Mixtral — fast cloud inference
Mistral	Mistral Large, Small, Codestral
Gemini	Gemini 2.5 Pro, Flash — Google AI API
Custom	Any OpenAI-compatible or Anthropic-compatible endpoint

API keys are stored in the OS keychain (Windows Credential Manager, macOS Keychain, libsecret on Linux) — never in settings.json.

@ Context mentions

Mention	What it includes
`@selection`	Currently selected text in the active editor
`@filename`	Any workspace file — open tabs shown first
`@problems`	All current VS Code errors and warnings
`@git`	Your current uncommitted diff
`@terminal`	Recent output from the integrated terminal
`@url:https://...`	Fetches and strips a web page
`@docs`	Searches all indexed documentation sources
`@docs:name`	Searches a specific documentation source by name

Setting up documentation sources

Add URLs to grom.docSources in your VS Code settings. Grom crawls each URL, follows same-origin links (up to 40 pages per source), and builds a searchable index. Any HTTP/HTTPS URL works — external doc sites or a local dev server running on localhost.

"grom.docSources": [
  { "name": "react",  "url": "https://react.dev/reference" },
  { "name": "mdn",    "url": "https://developer.mozilla.org/en-US/docs/Web/API" },
  { "name": "mylib",  "url": "http://localhost:3000/docs" }
]

Each entry needs a name (short identifier used with @docs:name) and a url (the root page to start crawling from). Grom follows links within the configured path only, up to 40 pages per source. Indexing runs at startup and whenever the setting changes.

Note: Grom fetches pages directly and does not run JavaScript. Sites that render content client-side (pure SPAs) will yield little or no usable text. Use a server-side rendered URL, a static export, or a local dev server instead.

Quick start

Install Grom

Search for Grom in the VS Code Extensions panel, or install from the marketplace.

Pick a provider

For local: install Ollama and pull a model — ollama pull qwen2.5-coder. For cloud: select Anthropic, OpenAI, Groq, or Mistral and add your API key when prompted.

Start chatting

Open the Grom panel from the activity bar. Switch to BUILD mode and hit the ⚡ Tools button when you want Grom to read and write files in your workspace.

Agent harness. BYOLLM.Runs on your machine.

Everything you need, nothing you don't

Chat

Agentic Loop

RAG

Inline Autocomplete

Inline Edit

MCP Support

@ Context

Notebook Support

Vision

Memory

Voice Input

Floating Panel

Voice Input

Enable the mic button

Download ffmpeg (once)

Choose your Whisper model

Record

Floating Panel

Click the float button

Use Grom normally

Close when done

Your choice of model

@ Context mentions

Setting up documentation sources

Quick start

Install Grom

Pick a provider

Start chatting

Agent harness. BYOLLM.
Runs on your machine.