Ollama is the easiest way to run powerful local AI models on your Mac. It supports Llama 3.3, Mistral, DeepSeek-R1, Phi-4, and thousands more open-source models. Here's everything you need to know to get started in 2026.
What You'll Need
- A Mac with Apple Silicon (M1, M2, M3, or M4)
- macOS 11 Big Sur or later
- At least 8GB of available RAM (16GB+ recommended)
- 10-30GB of free disk space for models
Note: Ollama runs best on Apple Silicon. Intel Macs are supported but performance is significantly slower due to lack of Metal GPU acceleration.
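Not sure which chip your Mac has? Check in Terminal:

```bash
# Prints "arm64" on Apple Silicon, "x86_64" on Intel
uname -m
```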
Method 1: Install Ollama Directly
The simplest way to get Ollama running on your Mac.
Step 1: Download Ollama
Go to ollama.com/download and download the macOS app. It's a ~150MB DMG file.
Step 2: Install
Double-click the DMG, drag Ollama into your Applications folder, then launch it. You'll see the Ollama icon appear in your menu bar.
Step 3: Verify It's Running
Open Terminal and run:
ollama --version
You should see something like ollama version 0.5.x. If you get a command not found error, try logging out and back in, or restart your Mac.
Step 4: Pull Your First Model
In Terminal, run:
ollama pull llama3.2
This downloads the Llama 3.2 model (~2GB). It might take 5-10 minutes depending on your internet speed.
Once done, verify:
ollama list
You should see llama3.2 in the list along with its size (about 2GB).
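Models are stored under ~/.ollama/models. If you ever want the disk space back:

```bash
# Remove a downloaded model from disk
ollama rm llama3.2
```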
Step 5: Start Chatting
Run:
ollama run llama3.2
You're now chatting with Llama 3.2 running entirely on your Mac. Type your question, press Enter, and get a response. Type /bye to exit.
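Inside the chat, type /? to see the other built-in commands. The most useful ones:

```
/?      list available commands
/clear  clear the session context
/bye    exit the chat
```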
Method 2: Install via Homebrew
If you use Homebrew (the Mac package manager), installing Ollama is even easier:
brew install ollama
Then launch it:
brew services start ollama
Or skip the service and just run `ollama serve` in a terminal when you need the server.
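Note that the Homebrew formula installs the command-line tool only; the menu bar app is a separate download (there is also a Homebrew cask for it). To confirm the background service is up:

```bash
# The Status column should read "started"
brew services list | grep ollama
```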
Method 3: MacMind (GUI Launcher)
If you want a polished graphical interface without touching the terminal, MacMind handles everything — Ollama installation, model management, and chat — in a native macOS window.
Steps with MacMind:
- Buy a license ($9.99) and download the app
- Launch MacMind — it guides you through Ollama installation with one click
- Enter a model name (like `llama3.2` or `deepseek-r1:8b`)
- Click download, then start chatting in the built-in panel
Which Model Should You Use?
Here's a quick guide to the most popular Ollama models in 2026:
| Model | Size | Best For | RAM Needed |
|---|---|---|---|
| llama3.2 | 2GB | Fast, lightweight, good all-rounder | 4GB+ |
| llama3.3 | 43GB | Best quality, if you have the memory | 64GB+ |
| deepseek-r1:8b | 5GB | Reasoning, coding, math | 8GB+ |
| phi4 | 9.1GB | Strong reasoning and code | 16GB+ |
| mistral | 4.1GB | Balanced, good instruction following | 8GB+ |
| qwen3:8b | 5GB | Multilingual, reasoning | 8GB+ |
For most users, llama3.2 is the best starting point — fast, small, and surprisingly capable. Upgrade to deepseek-r1:8b when you want more quality, or to llama3.3 if your Mac has enough memory for a 70B model.
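A quick way to compare candidates is a one-off prompt, which skips the interactive chat entirely; swap in any model you've pulled:

```bash
# Ask each model the same question and compare the answers
ollama run llama3.2 "Explain TCP vs UDP in two sentences."
ollama run deepseek-r1:8b "Explain TCP vs UDP in two sentences."
```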
How Much RAM Do You Need?
Apple Silicon unified memory is shared between CPU, GPU, and Neural Engine. Here's a rough guide:
- 8GB RAM — Can run 3B-4B models comfortably. 7B models may be tight.
- 16GB RAM — Can run up to 13B models. 8B models like deepseek-r1:8b run smoothly.
- 32GB RAM — Can run 33B+ models, or use larger context windows on smaller models.
- 64GB+ RAM — Full flexibility. 70B+ models at good speeds.
Ollama shows real-time memory usage when you run a model. If your Mac starts swapping (using disk as RAM), the model is too large for your available memory.
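You can check what's loaded at any time:

```bash
# Shows loaded models, their memory footprint, and the CPU/GPU split
ollama ps
```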
Speeding Things Up
Use the right model size
A small model (3B-4B) running locally on Apple Silicon often responds faster than a larger cloud model once network latency is counted. Don't default to the biggest model — find the smallest one that answers your questions well.
Keep Ollama running
Model loading takes 3-10 seconds. Once loaded, Ollama keeps the model in memory. If you quit Ollama, you pay the load time again on the next run.
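By default the model is unloaded after a few minutes of idle time. To keep it resident longer, pass a duration via the --keepalive flag, or set OLLAMA_KEEP_ALIVE server-wide:

```bash
# Keep the model in memory for an hour after the last request
ollama run llama3.2 --keepalive 60m

# Or set the default for every model the server loads
OLLAMA_KEEP_ALIVE=60m ollama serve
```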
Use quantization
By default, Ollama uses quantized models (Q4_K_M is a common choice — good quality, smaller size). You don't need to think about this, but if you want to dig deeper, Ollama supports custom model files with different quantization levels.
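If you do want a specific quantization, many models on ollama.com publish tagged variants. The exact tags vary per model, so check the Tags tab on the model's library page; as an example, llama3.2 offers a higher-precision q8_0 build:

```bash
# Larger download, slightly better quality than the default q4 build
ollama pull llama3.2:3b-instruct-q8_0
```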
Using Ollama with External Clients
Ollama runs a local API server that any OpenAI-compatible client can use. The macOS app starts this server automatically; if it isn't already running, start it yourself:
ollama serve
Then in another terminal, test it:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false
}'
This opens up integrations with Cursor, VS Code extensions, and any app that supports the OpenAI API. The /api/generate endpoint above is Ollama's native API; OpenAI-compatible clients should use the base URL http://localhost:11434/v1.
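For example, the same question through the OpenAI-compatible chat endpoint:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}]
  }'
```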
Troubleshooting
"ollama: command not found"
Ollama isn't in your PATH. Try restarting your Mac, or add it manually: echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.zshrc && source ~/.zshrc
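The macOS app normally symlinks the CLI into /usr/local/bin; you can check whether that happened:

```bash
# If this prints "No such file or directory", the CLI was never linked
ls -l /usr/local/bin/ollama
```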
Model downloads fail or are slow
Try a different network, or use a VPN. Ollama downloads models directly from their CDN. On some networks, CDN throttling can occur.
Out of memory errors
The model is too large for your available RAM. Try a smaller model like llama3.2 (2GB) instead of llama3.3 (43GB).
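Many models also ship smaller parameter counts under explicit tags; llama3.2, for instance, has a 1B variant:

```bash
# About 1.3GB, fits comfortably even on an 8GB Mac
ollama pull llama3.2:1b
```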
Ollama runs slow on Intel Mac
This is expected. Apple Silicon has Metal GPU acceleration for AI that Intel Macs don't have. Consider upgrading to an M-series Mac for the best local AI experience.
Want the easiest way to run Ollama on Mac?
MacMind gives you a native macOS GUI for Ollama — install in one click, manage models visually, and chat without any terminal commands.
Buy MacMind — $9.99
What's Next
Once you have Ollama running, explore these directions:
- Try different models — Experiment with `deepseek-r1` for reasoning, `mistral` for creative writing, or `phi4` for code.
- Connect to external clients — Use Ollama as a local API for Cursor AI, Continue.dev, or any OpenAI-compatible app.
- Set up Open WebUI — If you want a full web interface with conversation history and RAG, install Docker and set up Open WebUI.
- Use prompt presets — Save common prompts as templates for repeated tasks like code review or shell command generation; see the Modelfile sketch below.
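A minimal sketch of that last idea using Ollama's Modelfile mechanism; the preset name code-review and the system prompt are just examples:

```bash
# Bake a system prompt into a reusable named model
cat > Modelfile <<'EOF'
FROM llama3.2
SYSTEM You are a concise code reviewer. Point out bugs, style issues, and missing tests.
EOF

ollama create code-review -f Modelfile
ollama run code-review
```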
MacMind is a $9.99 native macOS launcher for Ollama. Ollama is free and open-source. This guide is not affiliated with Ollama or any model providers.