Tutorial May 2026

How to Run Ollama on Mac

The complete step-by-step guide to installing Ollama on an Apple Silicon Mac and getting your first local AI model running in under 10 minutes.

Ollama is the easiest way to run powerful local AI models on your Mac. It supports Llama 3.3, Mistral, DeepSeek-R1, Phi-4, and thousands more open-source models. Here's everything you need to know to get started in 2026.

What You'll Need

  - A Mac with Apple Silicon (M1 or later) is strongly recommended
  - A few gigabytes of free disk space for the app and your first model
  - An internet connection for the downloads
  - About 10 minutes

Note: Ollama runs best on Apple Silicon. Intel Macs are supported, but performance is significantly slower due to the lack of Metal GPU acceleration.

Method 1: Install Ollama Directly

The simplest way to get Ollama running on your Mac.

Step 1: Download Ollama

Go to ollama.com/download and download the macOS app. It's a ~150MB DMG file.

Step 2: Install

Double-click the DMG, drag Ollama into your Applications folder, then launch it. You'll see the Ollama icon appear in your menu bar.

Step 3: Verify It's Running

Open Terminal and run:

ollama --version

You should see something like ollama version 0.5.x. If you get a command not found error, try logging out and back in, or restart your Mac.
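
Behind the menu bar app, Ollama also starts a local server on port 11434. A quick way to confirm it's up:

curl http://localhost:11434

If everything is working, the server replies with "Ollama is running".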

Step 4: Pull Your First Model

In Terminal, run:

ollama pull llama3.2

This downloads the Llama 3.2 model (~2GB). It might take 5-10 minutes depending on your internet speed.
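
The bare llama3.2 name pulls the default tag (the 3B version). If your Mac is short on memory, the model library also publishes, at the time of writing, a smaller 1B variant under an explicit tag:

ollama pull llama3.2:1b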

Once done, verify:

ollama list

You should see llama3.2 in the list along with its size (about 2GB).

Step 5: Start Chatting

Run:

ollama run llama3.2

You're now chatting with Llama 3.2 running entirely on your Mac. Type your question, press Enter, and get a response. Type /bye to exit.
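
ollama run also accepts a prompt directly as an argument, which is handy for one-off questions or shell scripts:

ollama run llama3.2 "Explain unified memory in one paragraph."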

Method 2: Install via Homebrew

If you use Homebrew (the Mac package manager), installing Ollama is even easier:

brew install ollama

Then launch it:

brew services start ollama

Or skip the service and run ollama serve in a terminal whenever you need the server; it stays in the foreground until you stop it.
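
Two Homebrew housekeeping commands worth knowing: check that the service is running, and upgrade Ollama when a new version ships:

brew services list

brew upgrade ollama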

Method 3: MacMind (GUI Launcher)

If you want a polished graphical interface without touching the terminal, MacMind handles everything — Ollama installation, model management, and chat — in a native macOS window.

Steps with MacMind:

  1. Buy a license ($9.99) and download the app
  2. Launch MacMind — it guides you through Ollama installation with one click
  3. Enter a model name (like llama3.2 or deepseek-r1:8b)
  4. Click download, then start chatting in the built-in panel

Which Model Should You Use?

Here's a quick guide to the most popular Ollama models in 2026:

Model            Size     Best For                               RAM Needed
llama3.2         2GB      Fast, lightweight, good all-rounder    4GB+
llama3.3         43GB     Best quality for general use           48GB+
deepseek-r1:8b   5GB      Reasoning, coding, math                8GB+
phi4             9.1GB    Strong reasoning, good for code        16GB+
mistral          4GB      Balanced, good instruction following   6GB+
qwen3:8b         5GB      Multilingual, reasoning                8GB+

For most users, llama3.2 is the best starting point: fast, small, and surprisingly capable. Upgrade to deepseek-r1:8b when you want stronger reasoning, or to llama3.3 if your Mac has the memory for it.

How Much RAM Do You Need?

Apple Silicon unified memory is shared between the CPU, GPU, and Neural Engine, so a model has to fit alongside everything else your Mac is doing. The RAM Needed column in the model table above is a rough guide.

If your Mac starts swapping (using disk as RAM) while a model is loaded, the model is too large for your available memory and you should drop down a size.
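
To see which models are currently loaded and how much memory each one is using, ask the running server:

ollama ps

The output lists each loaded model with its size and whether it's running on the GPU or CPU.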

Speeding Things Up

Use the right model size

A smaller model (3B-4B parameters) running locally on Apple Silicon often responds faster than a larger model served from the cloud. Don't default to the biggest model; find the smallest one that answers your questions well.

Keep Ollama running

Model loading takes 3-10 seconds. Once loaded, Ollama keeps the model in memory for a few minutes before unloading it. If you quit Ollama, you pay the load time again on the next run.
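
The default is five minutes. If you want a model to stay resident longer, Ollama's API exposes a keep_alive parameter (a duration string, or -1 to keep the model loaded indefinitely), and a generate request with no prompt just loads the model. For example, to load llama3.2 and keep it in memory for 30 minutes:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "keep_alive": "30m"
}'

There is also an OLLAMA_KEEP_ALIVE environment variable if you'd rather set this server-wide.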

Use quantization

By default, Ollama uses quantized models (Q4_K_M is a common choice — good quality, smaller size). You don't need to think about this, but if you want to dig deeper, Ollama supports custom model files with different quantization levels.
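
For example, many models on ollama.com publish alternate quantization builds under explicit tags (the exact tags vary per model, so check the model's Tags page); pulling one is just:

ollama pull llama3.2:3b-instruct-q8_0

A q8_0 build trades a larger download for quality closer to the unquantized weights.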

Using Ollama with External Clients

Ollama runs a local API server that any OpenAI-compatible client can talk to. The menu bar app starts it automatically; if you installed the CLI only (for example via Homebrew without the service), start it yourself:

ollama serve

Then in another terminal, test it:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

This opens up integrations with Cursor, VS Code extensions, and any app that supports the OpenAI API. The base URL is http://localhost:11434/v1.
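
For those OpenAI-compatible clients, the same request looks like this against the /v1 endpoint (no API key is needed, though some clients insist on a placeholder value):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}]
  }'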

Troubleshooting

"ollama: command not found"

Ollama isn't in your PATH. Try restarting your Mac, or add the install location to your PATH manually:

echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.zshrc && source ~/.zshrc
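
To check whether the CLI was linked at all, look for the binary directly (the macOS app normally links it into /usr/local/bin):

which ollama

ls -l /usr/local/bin/ollama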

Model downloads fail or are slow

Ollama downloads models from its CDN, and some networks throttle CDN traffic. Try a different network or a VPN. An interrupted ollama pull can simply be re-run; it resumes where it left off.

Out of memory errors

The model is too large for your available RAM. Try a smaller model like llama3.2 (2GB) instead of a large one like llama3.3 (43GB).

Ollama runs slowly on an Intel Mac

This is expected. Apple Silicon has Metal GPU acceleration for AI that Intel Macs don't have. Consider upgrading to an M-series Mac for the best local AI experience.

Want the easiest way to run Ollama on Mac?

MacMind gives you a native macOS GUI for Ollama — install in one click, manage models visually, and chat without any terminal commands.

Buy MacMind — $9.99

What's Next

Once you have Ollama running, explore these directions:

  1. Try other models from the table above and compare quality against speed
  2. Point an OpenAI-compatible app (Cursor, a VS Code extension) at the local API
  3. Dig into quantization by pulling different quantization tags or writing custom model files

MacMind is a $9.99 native macOS launcher for Ollama. Ollama is free and open-source. This guide is not affiliated with Ollama or any model providers.