How to Run LLMs Locally on Your Old Computer (Yes, It's Possible)
Hey folks! If you’re a student or junior dev and think running LLMs locally requires an RTX 4090, I’ve got good news: it doesn’t.
The truth is the open source community has made massive progress in recent months. Today there are small, optimized, quantized models that run entirely on the CPU—yes, even on that 8GB RAM laptop you use for college.
In this article, I’ll show you how to set everything up from scratch, which models to choose for your hardware, and how to have a 100% local and private coding assistant.
Why Run Local?
Before we dive in, let’s understand the benefits:
- Total privacy — Your code never leaves your machine
- No recurring costs — No API fees, tokens, or billing surprises
- Works offline — Perfect for unstable internet connections
- Learning — You actually understand how LLMs work under the hood
What You’ll Need
Don’t worry, nothing crazy:
| Hardware | Minimum | Recommended |
|---|---|---|
| RAM | 4GB | 8GB+ |
| Storage | 5GB free | 20GB+ |
| CPU | Any x64 | Recent multi-core |
| GPU | Not required | Any (helps but not mandatory) |
If your computer boots and runs VS Code, it can probably run a small LLM.
Step 1: Installing Ollama
Ollama is the simplest way to run LLMs locally. It handles everything for you: downloading models, serving quantized builds out of the box, and giving you a chat interface in the terminal.
On macOS/Linux:
curl -fsSL https://ollama.com/install.sh | sh
On Windows:
Download the installer from ollama.com and follow the wizard.
To verify installation:
ollama --version
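If a later command complains that it can't reach the Ollama server, you can start it yourself in a separate terminal. On Linux the installer usually registers Ollama as a background service, so this step is often unnecessary; the commands below are standard Ollama CLI subcommands:

```bash
# Start the Ollama server manually (leave this terminal open)
ollama serve

# In another terminal, confirm the CLI can reach it
# (the list will be empty until you download your first model)
ollama list
```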
Step 2: Choosing the Right Model for Your Hardware
Here’s the secret. There’s no point downloading the most hyped model if it’s going to freeze your computer.
If You Have 4GB RAM (Very Basic Computer)
ollama run gemma:2b
Google’s Gemma 2B weighs ~1.4GB and works surprisingly well for simple tasks.
If You Have 8GB RAM (Average Laptop)
ollama run phi3:mini
Microsoft’s Phi-3 Mini has 3.8B parameters in just ~2.3GB. Excellent for code.
Another option:
ollama run llama3.2:3b
Meta’s Llama 3.2 3B is great for general use.
If You Have 16GB+ RAM
ollama run codellama:7b
Now you can run larger code-focused models.
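Whichever tier you land in, a few housekeeping commands are worth knowing before you commit gigabytes of disk space. These are standard Ollama subcommands; the model names are just the examples from above:

```bash
# Download a model without opening a chat session
ollama pull phi3:mini

# See what's installed locally and how much space each model takes
ollama list

# Remove a model you no longer need and reclaim the disk space
ollama rm gemma:2b
```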
Step 3: Testing Your Model
After downloading, just run:
ollama run phi3:mini
A chat interface opens in your terminal. Test something simple:
>>> Explain recursion in Python with an example
If it responded, congrats! You have an LLM running locally.
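Ollama also exposes a local HTTP API (on port 11434 by default), which is handy for scripting or for a quick sanity check outside the interactive chat. A minimal sketch with curl, assuming the server is running and you pulled phi3:mini:

```bash
# Ask the model a question through the local REST API
# "stream": false returns one JSON object instead of a token-by-token stream
curl http://localhost:11434/api/generate -d '{
  "model": "phi3:mini",
  "prompt": "Explain recursion in Python with an example",
  "stream": false
}'
```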
Step 4: Integrating with Your Editor
Terminal chat is nice, but the real power is in workflow integration.
VS Code with Continue
- Install the Continue extension
- Configure it to use local Ollama
- Select the model you downloaded
Now you have AI autocomplete and chat directly in VS Code, 100% local.
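For reference, the versions of Continue I've used read their settings from ~/.continue/config.json; newer releases may use a YAML file instead, so treat this as a sketch and check the extension's docs for the current schema. A minimal config pointing it at local Ollama could look like this:

```bash
# Write a minimal Continue config that points at local Ollama.
# WARNING: this overwrites any existing config -- back it up first.
cat > ~/.continue/config.json <<'EOF'
{
  "models": [
    {
      "title": "Phi-3 Mini (local)",
      "provider": "ollama",
      "model": "phi3:mini"
    }
  ]
}
EOF
```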
Neovim
If you use Neovim, the ollama.nvim plugin handles integration.
Performance Tips
A few things that make a difference:
- Close other programs — LLMs use a lot of RAM
- Use quantized models — Ollama's default downloads are already quantized, so you get this for free
- Prefer smaller models — A fast 3B beats a laggy 7B
- Monitor RAM usage — if it climbs past 90%, the model will slow down badly (a quick way to check is shown right after this list)
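On that last point, recent Ollama versions include an ollama ps subcommand that shows which models are loaded and how much memory they're using, which makes it easy to catch problems before your machine starts swapping:

```bash
# Show currently loaded models and their memory footprint
# (requires a reasonably recent Ollama release)
ollama ps

# General memory check on Linux; use Activity Monitor on macOS
free -h
```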
Real-World Performance Comparison
Tested on my old laptop (i5 8th gen, 8GB RAM, no dedicated GPU):
| Model | Tokens/second | Usability |
|---|---|---|
| Gemma 2B | ~15 t/s | Very smooth |
| Phi-3 Mini | ~8 t/s | Good |
| Llama 3.2 3B | ~6 t/s | Acceptable |
| CodeLlama 7B | ~2 t/s | Slow but works |
For context: 8 tokens/second is already comfortable for interactive use.
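Your numbers will differ, so it's worth measuring on your own machine. Recent Ollama builds accept a --verbose flag on ollama run that prints timing stats, including an eval rate in tokens per second, after each reply:

```bash
# Print generation stats (including tokens/second) after each response
ollama run phi3:mini --verbose
```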
What to Expect (And What Not to Expect)
Works well for:
- Explaining code
- Generating simple functions
- Answering syntax questions
- Suggesting small refactors
Don’t expect:
- GPT-4 level performance
- Huge context (small models cap at ~4k tokens)
- Generating entire projects
But look, for a student learning to code? It’s more than enough.
Next Steps
Once you get the hang of it:
- Try other models — ollama list shows everything you already have downloaded
- Create custom prompts — Modelfiles customize behavior (see the sketch after this list)
- Explore local RAG — Add your own documents as context
- Contribute to the community — Share your findings
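As a concrete example of the custom-prompts item above, here's a minimal Modelfile sketch that wraps phi3:mini with a system prompt and a lower temperature, then registers it under a new name (study-buddy is just a placeholder name):

```bash
# Write a Modelfile that customizes an existing model
cat > Modelfile <<'EOF'
FROM phi3:mini
PARAMETER temperature 0.2
SYSTEM """You are a patient coding tutor. Explain answers step by step and keep examples short."""
EOF

# Build the custom model and chat with it
ollama create study-buddy -f Modelfile
ollama run study-buddy
```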
The Bottom Line
Running LLMs locally is no longer a privilege for those with expensive hardware. With Ollama and optimized models, any student can have a private, free, and offline coding assistant.
The secret is choosing the right model for your hardware. Start small, test, and scale up as your computer can handle it.
Questions or want to share your setup? Find me on Twitter or LinkedIn.
Happy building!