🔧
Mac Playbook
⏱ 15 min

Vibe Coding with Continue.dev

Local AI coding assistant with Ollama + VS Code

Replaces DGX Spark: Vibe Coding in VS Code
tools, coding

Basic idea

"Vibe coding" means using an AI assistant to write code by describing what you want in natural language, letting it handle the boilerplate and syntax while you focus on the logic and architecture. Continue.dev is an open-source VS Code extension that connects to your local Ollama models, providing tab autocomplete, an in-editor chat panel, and context-aware code generation, all running locally on your Mac with no data sent to any server.

This is a fully local alternative to GitHub Copilot. The tradeoff is quality for privacy: a local 32B model is good for chat but not at GPT-4o's level, while the 7B autocomplete model is fast enough to feel real-time.

What you'll accomplish

VS Code with Continue.dev configured with two Ollama models: qwen2.5-coder:32b for high-quality chat and code generation, and qwen2.5-coder:7b for fast inline tab autocomplete. Plus nomic-embed-text for the `@codebase` feature that lets Continue search your entire codebase semantically. All responses are local: no API keys, no usage limits, no data leaving your machine.
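Assuming the stock Ollama CLI, pulling the three models looks like this (download sizes are approximate and depend on the quantization Ollama serves by default):

```shell
# Pull the three models Continue will use. Approximate download sizes:
# chat model ~20GB, autocomplete model ~4.7GB, embedding model ~270MB.
ollama pull qwen2.5-coder:32b
ollama pull qwen2.5-coder:7b
ollama pull nomic-embed-text

# Verify all three show up locally
ollama list
```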

What to know before starting

LLM inference latency for code completion: Tab autocomplete needs to feel instant, under 200ms, or it breaks your flow. A 7B model responds in ~100-150ms on an M2 Pro. A 32B model takes 500-800ms, too slow for autocomplete but fine for chat.
Context window: Continue sends the current file, cursor position, and recently-viewed files as context with each request. Larger context = better suggestions, but more tokens = slower response. Continue automatically trims context to fit the model's window.
RAG for codebases: `@codebase` uses Retrieval-Augmented Generation: Continue indexes your codebase into a local vector database (using the embedding model), then retrieves the most relevant files for each query. This is how it can answer "how is authentication implemented?" over a large codebase.
Chat templates: Coding models are fine-tuned with specific prompt formatting (ChatML, Alpaca, etc.). Ollama handles this automatically β€” you don't need to configure it, but it's why using the Ollama provider matters.
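As a concrete sketch of how the three models map onto Continue's roles, here is a minimal config using Continue's legacy JSON format (an assumption: recent Continue releases read `~/.continue/config.yaml` instead, with equivalent fields; back up any existing config before overwriting it):

```shell
# Minimal sketch of a Continue config wiring up the three Ollama models.
# Assumes the legacy JSON format; newer Continue releases use
# ~/.continue/config.yaml with equivalent fields.
# Back up any existing config file before running this.
mkdir -p ~/.continue
cat > ~/.continue/config.json <<'EOF'
{
  "models": [
    { "title": "Qwen2.5 Coder 32B", "provider": "ollama", "model": "qwen2.5-coder:32b" }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5 Coder 7B", "provider": "ollama", "model": "qwen2.5-coder:7b"
  },
  "embeddingsProvider": { "provider": "ollama", "model": "nomic-embed-text" }
}
EOF
```

The chat model appears in Continue's model dropdown, while the autocomplete and embeddings entries are used implicitly by tab completion and `@codebase`.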

Prerequisites

• VS Code installed (see VS Code playbook)
• Ollama running (`ollama serve`)
• `qwen2.5-coder:7b` pulled (for autocomplete)
• 16GB+ unified memory (32B chat model needs ~20GB)
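A quick sanity check before installing the extension, assuming Ollama's default port of 11434:

```shell
# Ollama's HTTP API listens on localhost:11434 by default;
# /api/tags returns the models available locally
curl -s http://localhost:11434/api/tags

# Inspect the chat template Ollama applies for a model
# (this is what makes using the Ollama provider matter)
ollama show qwen2.5-coder:7b --template
```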

Time & risk

Duration:: 15 minutes
Risk level:: None β€” extension can be disabled or uninstalled from VS Code Extensions panel at any time