How to Run LLMs Locally on Your Old Computer (Yes, It's Possible)

Hey folks! If you’re a student or junior dev and think running LLMs locally requires an RTX 4090, I’ve got good news: it doesn’t.

The truth is the open source community has made massive progress in recent months. Today there are small, optimized, quantized models that run entirely on CPU—yes, even on that 8GB RAM laptop you use for college.

In this article, I’ll show you how to set everything up from scratch, which models to choose for your hardware, and how to have a 100% local and private coding assistant.

Why Run Locally?

Before we dive in, let’s understand the benefits:

  1. Total privacy — Your code never leaves your machine
  2. No recurring costs — No API fees, tokens, or billing surprises
  3. Works offline — Perfect for unstable internet connections
  4. Learning — You actually understand how LLMs work under the hood

What You’ll Need

Don’t worry, nothing crazy:

| Hardware | Minimum | Recommended |
|----------|---------|-------------|
| RAM | 4GB | 8GB+ |
| Storage | 5GB free | 20GB+ |
| CPU | Any x64 | Recent multi-core |
| GPU | Not required | Any (helps but not mandatory) |

If your computer boots and runs VS Code, it can probably run a small LLM.
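
Not sure how much RAM you actually have? A quick terminal check on Linux or macOS (on Windows, the Performance tab in Task Manager shows the same thing):

free -h                # Linux: look at the "total" column
sysctl -n hw.memsize   # macOS: prints total RAM in bytes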

Step 1: Installing Ollama

Ollama is the simplest way to run LLMs locally. It handles everything for you: downloading models (already quantized), running them, and giving you a chat interface in the terminal.

On macOS/Linux:

curl -fsSL https://ollama.ai/install.sh | sh

On Windows:

Download the installer from ollama.ai and follow the wizard.

To verify installation:

ollama --version
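
The installer also sets up a small background server that the CLI (and, later, your editor plugins) talk to over HTTP on localhost port 11434. If anything seems stuck later on, this is a quick sanity check:

curl http://localhost:11434

It should reply with a short "Ollama is running" message. If it doesn't, start the server manually with ollama serve.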

Step 2: Choosing the Right Model for Your Hardware

Here’s the secret. There’s no point downloading the most hyped model if it’s going to freeze your computer.

If You Have 4GB RAM (Very Basic Computer)

ollama run gemma:2b

Google’s Gemma 2B weighs ~1.4GB and works surprisingly well for simple tasks.
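
If you just want to grab the model now and chat later (a good idea while your Wi-Fi is behaving), you can download it without opening the chat:

ollama pull gemma:2b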

If You Have 8GB RAM (Average Laptop)

ollama run phi3:mini

Microsoft’s Phi-3 Mini has 3.8B parameters in just ~2.3GB. Excellent for code.

Another option:

ollama run llama3.2:3b

Meta’s Llama 3.2 3B is great for general use.
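
Disk space disappears quickly if you try several of these, so two housekeeping commands are worth knowing:

ollama list             # shows the models you've downloaded and their sizes
ollama rm llama3.2:3b   # deletes one you no longer need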

If You Have 16GB+ RAM

ollama run codellama:7b

Now you can run larger code-focused models.

Step 3: Testing Your Model

After downloading, just run:

ollama run phi3:mini

A chat interface opens in your terminal. Test something simple:

>>> Explain recursion in Python with an example

If it responded, congrats! You have an LLM running locally.
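
Under the hood, that chat goes through a local HTTP API, which is also what editor plugins talk to. You can call it directly with curl; a minimal example, assuming you pulled phi3:mini:

curl http://localhost:11434/api/generate -d '{
  "model": "phi3:mini",
  "prompt": "Explain recursion in one sentence",
  "stream": false
}'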

Step 4: Integrating with Your Editor

Terminal chat is nice, but the real power is in workflow integration.

VS Code with Continue

  1. Install the Continue extension
  2. Configure it to use local Ollama
  3. Select the model you downloaded

Now you have AI autocomplete and chat directly in VS Code, 100% local.
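
As a rough sketch, the Ollama entry in Continue's config looked like this in the versions I've used (a config.json inside the ~/.continue folder; the format changes between releases, so follow the extension's own setup guide if yours looks different):

{
  "models": [
    {
      "title": "Phi-3 Mini (local)",
      "provider": "ollama",
      "model": "phi3:mini"
    }
  ]
}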

Neovim

If you use Neovim, the ollama.nvim plugin handles integration.

Performance Tips

A few things that make a difference:

  1. Close other programs — LLMs use a lot of RAM
  2. Use quantized models — Ollama does this automatically
  3. Prefer smaller models — A fast 3B beats a laggy 7B
  4. Monitor RAM usage — If it goes past 90%, the model will slow down (a quick check is shown below)
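
Ollama itself can tell you how much memory a loaded model is using, and free covers the rest on Linux (Activity Monitor or Task Manager do the same job elsewhere):

ollama ps   # loaded models, their memory footprint, and whether they run on CPU or GPU
free -h     # overall RAM usage on Linux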

Real-World Performance Comparison

Tested on my old laptop (i5 8th gen, 8GB RAM, no dedicated GPU):

| Model | Tokens/second | Usability |
|-------|---------------|-----------|
| Gemma 2B | ~15 t/s | Very smooth |
| Phi-3 Mini | ~8 t/s | Good |
| Llama 3.2 3B | ~6 t/s | Acceptable |
| CodeLlama 7B | ~2 t/s | Slow but works |

For context: 8 tokens/second is already comfortable for interactive use.
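
Don't take my numbers as gospel; measure your own. Running a model with the --verbose flag makes Ollama print timing stats after each answer, including the eval rate in tokens per second:

ollama run phi3:mini --verbose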

What to Expect (And What Not to Expect)

Works well for:

  • Explaining code
  • Generating simple functions
  • Answering syntax questions
  • Suggesting small refactors

Don’t expect:

  • GPT-4 level performance
  • Huge context windows (these small models only handle a few thousand tokens of input comfortably)
  • Generating entire projects

But look, for a student learning to code? It’s more than enough.

Next Steps

Once you get the hang of it:

  1. Try other models — ollama list shows the ones you already have installed
  2. Create custom prompts — Modelfiles customize behavior (see the sketch after this list)
  3. Explore local RAG — Add your own documents as context
  4. Contribute to the community — Share your findings
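
On point 2: a Modelfile is a small text file that bakes a base model, parameters, and a system prompt into a named model of your own. A minimal sketch (the name and system prompt here are just examples; tweak them to taste):

# Modelfile
FROM phi3:mini
PARAMETER temperature 0.2
SYSTEM """
You are a concise coding assistant for a student. Prefer short answers with small code examples.
"""

Build and run it with:

ollama create study-buddy -f Modelfile
ollama run study-buddy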

The Bottom Line

Running LLMs locally is no longer a privilege for those with expensive hardware. With Ollama and optimized models, any student can have a private, free, and offline coding assistant.

The secret is choosing the right model for your hardware. Start small, test, and scale up as your computer can handle it.


Questions or want to share your setup? Find me on Twitter or LinkedIn.

Happy building!
