Ollama is the easiest way to run powerful local AI models on your Mac. It supports Llama 3.3, Mistral, DeepSeek-R1, Phi-4, and thousands more open-source models. Here's everything you need to know to get started in 2026.
What You'll Need
- A Mac with Apple Silicon (M1, M2, M3, or M4)
- macOS 11 Big Sur or later
- At least 8GB of available RAM (16GB+ recommended)
- 10-30GB of free disk space for models
Note: Ollama runs best on Apple Silicon. Intel Macs are supported but performance is significantly slower due to lack of Metal GPU acceleration.
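Not sure which chip your Mac has? Check in Terminal:

```bash
# Prints "arm64" on Apple Silicon, "x86_64" on Intel
uname -m
```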
Method 1: Install Ollama Directly
The simplest way to get Ollama running on your Mac.
Step 1: Download Ollama
Go to ollama.com/download and download the macOS app. It's a ~150MB DMG file.
Step 2: Install
Double-click the DMG, drag Ollama into your Applications folder, then launch it. You'll see the Ollama icon appear in your menu bar.
Step 3: Verify It's Running
Open Terminal and run:
ollama --version
You should see something like ollama version 0.5.x. If you get a command not found error, try logging out and back in, or restart your Mac.
Step 4: Pull Your First Model
In Terminal, run:
ollama pull llama3.2
This downloads the Llama 3.2 model (~2GB). It might take 5-10 minutes depending on your internet speed.
Once done, verify:
ollama list
You should see llama3.2 in the list along with its size (about 2GB).
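Models are stored under ~/.ollama/models. If you ever want the disk space back:

```bash
# Remove a downloaded model from disk
ollama rm llama3.2
```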
Step 5: Start Chatting
Run:
ollama run llama3.2
You're now chatting with Llama 3.2 running entirely on your Mac. Type your question, press Enter, and get a response. Type /bye to exit.
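Inside the chat, type /? to see the other built-in commands. The most useful ones:

```
/?      list available commands
/clear  clear the session context
/bye    exit the chat
```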
Method 2: Install via Homebrew
If you use Homebrew (the Mac package manager), installing Ollama is even easier:
brew install ollama
Then launch it:
brew services start ollama
Or skip the service and just run `ollama serve` in a terminal when you need the server.
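Note that the Homebrew formula installs the command-line tool only; the menu bar app is a separate download (there is also a Homebrew cask for it). To confirm the background service is up:

```bash
# The Status column should read "started"
brew services list | grep ollama
```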
Method 3: MacMind (GUI Launcher)
If you want a polished graphical interface without touching the terminal, MacMind handles everything — Ollama installation, model management, and chat — in a native macOS window.
Steps with MacMind:
- Buy a license ($9.99) and download the app
- Launch MacMind — it guides you through Ollama installation with one click
- Enter a model name (like `llama3.2` or `deepseek-r1:8b`)
- Click download, then start chatting in the built-in panel
Which Model Should You Use?
Here's a quick guide to the most popular Ollama models in 2026:
| Model | Size | Best For | RAM Needed |
|---|---|---|---|
| llama3.2 | 2GB | Fast, lightweight, good all-rounder | 4GB+ |
| llama3.3 | 43GB | Best quality, if you have the memory | 64GB+ |
| deepseek-r1:8b | 5GB | Reasoning, coding, math | 8GB+ |
| phi4 | 9.1GB | Strong reasoning and code | 16GB+ |
| mistral | 4.1GB | Balanced, good instruction following | 8GB+ |
| qwen3:8b | 5GB | Multilingual, reasoning | 8GB+ |
For most users, llama3.2 is the best starting point — fast, small, and surprisingly capable. Upgrade to deepseek-r1:8b when you want more quality, or to llama3.3 if your Mac has enough memory for a 70B model.
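A quick way to compare candidates is a one-off prompt, which skips the interactive chat entirely; swap in any model you've pulled:

```bash
# Ask each model the same question and compare the answers
ollama run llama3.2 "Explain TCP vs UDP in two sentences."
ollama run deepseek-r1:8b "Explain TCP vs UDP in two sentences."
```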
How Much RAM Do You Need?
Apple Silicon unified memory is shared between CPU, GPU, and Neural Engine. Here's a rough guide:
- 8GB RAM — Can run 3B-4B models comfortably. 7B models may be tight.
- 16GB RAM — Can run up to 13B models. 8B models like deepseek-r1:8b run smoothly.
- 32GB RAM — Can run 33B+ models, or use larger context windows on smaller models.
- 64GB+ RAM — Full flexibility. 70B+ models at good speeds.
Ollama shows real-time memory usage when you run a model. If your Mac starts swapping (using disk as RAM), the model is too large for your available memory.
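You can check what's loaded at any time:

```bash
# Shows loaded models, their memory footprint, and the CPU/GPU split
ollama ps
```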
Speeding Things Up
Use the right model size
A small model (3B-4B) running locally on Apple Silicon often responds faster than a larger cloud model once network latency is counted. Don't default to the biggest model — find the smallest one that answers your questions well.
Keep Ollama running
Model loading takes 3-10 seconds. Once loaded, Ollama keeps the model in memory. If you quit Ollama, you pay the load time again on the next run.
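By default the model is unloaded after a few minutes of idle time. To keep it resident longer, pass a duration via the --keepalive flag, or set OLLAMA_KEEP_ALIVE server-wide:

```bash
# Keep the model in memory for an hour after the last request
ollama run llama3.2 --keepalive 60m

# Or set the default for every model the server loads
OLLAMA_KEEP_ALIVE=60m ollama serve
```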
Use quantization
By default, Ollama uses quantized models (Q4_K_M is a common choice — good quality, smaller size). You don't need to think about this, but if you want to dig deeper, Ollama supports custom model files with different quantization levels.
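If you do want a specific quantization, many models on ollama.com publish tagged variants. The exact tags vary per model, so check the Tags tab on the model's library page; as an example, llama3.2 offers a higher-precision q8_0 build:

```bash
# Larger download, slightly better quality than the default q4 build
ollama pull llama3.2:3b-instruct-q8_0
```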
Using Ollama with External Clients
Ollama runs a local API server that any OpenAI-compatible client can use. The macOS app starts this server automatically; if it isn't already running, start it yourself:
ollama serve
Then in another terminal, test it:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false
}'
This opens up integrations with Cursor, VS Code extensions, and any app that supports the OpenAI API. The /api/generate endpoint above is Ollama's native API; OpenAI-compatible clients should use the base URL http://localhost:11434/v1.
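For example, the same question through the OpenAI-compatible chat endpoint:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}]
  }'
```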
Troubleshooting
"ollama: command not found"
Ollama isn't in your PATH. Try restarting your Mac, or add it manually: echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.zshrc && source ~/.zshrc
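The macOS app normally symlinks the CLI into /usr/local/bin; you can check whether that happened:

```bash
# If this prints "No such file or directory", the CLI was never linked
ls -l /usr/local/bin/ollama
```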
Model downloads fail or are slow
Try a different network, or use a VPN. Ollama downloads models directly from their CDN. On some networks, CDN throttling can occur.
Out of memory errors
The model is too large for your available RAM. Try a smaller model like llama3.2 (2GB) instead of llama3.3 (43GB).
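Many models also ship smaller parameter counts under explicit tags; llama3.2, for instance, has a 1B variant:

```bash
# About 1.3GB, fits comfortably even on an 8GB Mac
ollama pull llama3.2:1b
```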
Ollama runs slow on Intel Mac
This is expected. Apple Silicon has Metal GPU acceleration for AI that Intel Macs don't have. Consider upgrading to an M-series Mac for the best local AI experience.
Want the easiest way to run Ollama on Mac?
MacMind gives you a native macOS GUI for Ollama — install in one click, manage models visually, and chat without any terminal commands.
Buy MacMind — $9.99
What's Next
Once you have Ollama running, explore these directions:
- Try different models — Experiment with `deepseek-r1` for reasoning, `mistral` for creative writing, or `phi4` for code.
- Connect to external clients — Use Ollama as a local API for Cursor AI, Continue.dev, or any OpenAI-compatible app.
- Set up Open WebUI — If you want a full web interface with conversation history and RAG, install Docker and set up Open WebUI.
- Use prompt presets — Save common prompts as templates for repeated tasks like code review or shell command generation; see the Modelfile sketch below.
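A minimal sketch of that last idea using Ollama's Modelfile mechanism; the preset name code-review and the system prompt are just examples:

```bash
# Bake a system prompt into a reusable named model
cat > Modelfile <<'EOF'
FROM llama3.2
SYSTEM You are a concise code reviewer. Point out bugs, style issues, and missing tests.
EOF

ollama create code-review -f Modelfile
ollama run code-review
```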
MacMind is a $9.99 native macOS launcher for Ollama. Ollama is free and open-source. This guide is not affiliated with Ollama or any model providers.