Can I run local LLMs on iPhone and Mac?

Yes. On Device AI supports local models on iPhone, iPad, Mac, and Vision Pro. Which models are available depends on each device's capabilities, mainly its memory.

What engines does On Device AI use?

On Device AI runs GGUF models through llama.cpp and MLX models through Apple's MLX framework, which is optimized for Apple Silicon.

Can I import custom GGUF models?

Yes. On Device AI supports custom GGUF imports from sources such as Hugging Face.

Run a Local LLM on iPhone, iPad and Mac

Local LLM settings and GGUF parameters on macOS

1. Choose Your Inference Engine (GGUF vs. MLX)

On Device AI supports two major local inference frameworks on Apple platforms, so you can choose the engine that best suits your hardware:

GGUF (via llama.cpp): Offers broad model compatibility and runs across modern iOS, iPadOS, macOS, and visionOS devices. A good default for general open-weight models.
MLX (Apple Silicon native): Apple's open-source machine learning framework, designed specifically for Apple Silicon. MLX keeps arrays in unified memory, so the CPU and GPU can work on the same model weights without copying them, which makes it an efficient and fast option on Apple Silicon Macs.
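If you download models yourself, the file layout usually tells you which engine a model is meant for. A GGUF model is a single file that begins with the four ASCII bytes `GGUF`, while MLX models on Hugging Face (for example those in the mlx-community organization) are typically a folder containing `config.json` and one or more `.safetensors` weight files. The sketch below applies those two conventions; the function names are hypothetical, and this is not how On Device AI itself classifies models.

```python
from pathlib import Path
from typing import Iterable

GGUF_MAGIC = b"GGUF"  # every GGUF file starts with these four ASCII bytes


def is_gguf_header(first_bytes: bytes) -> bool:
    """True if the leading bytes match the GGUF magic."""
    return first_bytes[:4] == GGUF_MAGIC


def looks_like_mlx_dir(filenames: Iterable[str]) -> bool:
    """Heuristic (assumption): a config.json plus at least one .safetensors file."""
    names = set(filenames)
    return "config.json" in names and any(n.endswith(".safetensors") for n in names)


def detect_engine(model_path: str) -> str:
    """Hypothetical helper: guess the engine for a local model file or folder.

    Returns "llama.cpp" for a GGUF file, "mlx" for an MLX-style folder,
    and "unknown" for anything else (including paths that don't exist).
    """
    path = Path(model_path)
    if path.is_file():
        with path.open("rb") as f:
            return "llama.cpp" if is_gguf_header(f.read(4)) else "unknown"
    if path.is_dir():
        names = (p.name for p in path.iterdir())
        return "mlx" if looks_like_mlx_dir(names) else "unknown"
    return "unknown"


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        print(arg, "->", detect_engine(arg))
```

The folder check is only a heuristic: an ordinary Transformers checkpoint also contains `config.json` and `.safetensors` files, so a match indicates an MLX-style layout rather than a confirmed MLX conversion.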

2. Choose a Model Based on Device Memory

Because a local model has to fit in physical RAM (unified memory on Apple Silicon), On Device AI helps you pick models that match your device's hardware:

For iPhones and iPads (6-8 GB RAM): Select compact models such as DeepSeek-R1 1.5B, Qwen 2.5 1.5B/3B, or Gemma 2 2B. These fit comfortably in mobile memory without hitting the per-app memory limits that iOS and iPadOS enforce.
For iPads and Macs (8-16 GB RAM): Run more capable models such as Llama 3 8B or Qwen 2.5 7B. Larger models such as Phi-4 14B need roughly 8 GB for 4-bit weights alone, so they are best suited to the 16 GB end of this range.
For Pro Macs (24-128 GB unified memory): Run large models of 32B or even 70B parameters locally and entirely offline. A 70B model quantized to about 4 bits needs roughly 35-40 GB for its weights, so it calls for a machine toward the upper end of this range.
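The tiers above follow from simple arithmetic: a model's weights take about (parameters × bits per weight ÷ 8) bytes, so a 7B model at 4 bits needs roughly 3.5 GB before any runtime overhead. The sketch below turns that into a rough fit check. The 4.5 bits-per-weight figure matches llama.cpp's Q4_0 quantization; the `usable_fraction` and `overhead_gb` defaults (standing in for the OS's share of memory, the KV cache, and runtime buffers) are assumptions, not values On Device AI uses, and iOS may give a single app considerably less memory than this.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes).

    (params_billion * 1e9) weights * bits_per_weight / 8 bytes, divided by 1e9.
    """
    return params_billion * bits_per_weight / 8


def fits_in_memory(
    params_billion: float,
    bits_per_weight: float,
    device_ram_gb: float,
    usable_fraction: float = 0.7,  # ASSUMED share of RAM a model may occupy
    overhead_gb: float = 1.0,      # ASSUMED flat allowance for KV cache and buffers
) -> bool:
    """Rough check: do the weights plus overhead fit in the usable memory budget?"""
    needed_gb = weight_memory_gb(params_billion, bits_per_weight) + overhead_gb
    return needed_gb <= device_ram_gb * usable_fraction


if __name__ == "__main__":
    BITS = 4.5  # llama.cpp Q4_0 stores 32 weights in 18 bytes = 4.5 bits/weight
    for name, params, ram in [
        ("Qwen 2.5 3B on a 6 GB iPhone", 3, 6),
        ("Llama 3 8B on a 16 GB Mac", 8, 16),
        ("Phi-4 14B on an 8 GB iPad", 14, 8),
        ("70B model on a 64 GB Mac", 70, 64),
    ]:
        need = weight_memory_gb(params, BITS)
        print(f"{name}: ~{need:.1f} GB weights, fits={fits_in_memory(params, BITS, ram)}")
```

Longer context windows enlarge the KV cache, so a model that barely fits with a short context can fail with a long one. Treat the result as a first filter, not a guarantee.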

3. Custom Hugging Face GGUF Imports

Need a model that isn't in the built-in library? On Device AI includes a custom downloader: copy a direct GGUF download link from a repository such as Hugging Face, paste it into the app's Import section, and the app downloads the model for you. The imported model is then immediately available in chat and subagent workflows.
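On Hugging Face, a file's web page uses a `/blob/` URL, while the direct download link a downloader needs uses `/resolve/` in the same position (for example `https://huggingface.co/<owner>/<repo>/resolve/<revision>/<file>.gguf`). The hypothetical helper below converts a copied file-page link into that direct form and rejects links that don't point at a `.gguf` file. It only illustrates the URL convention; it is not On Device AI's import code, and the repository names in the example are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def to_direct_gguf_url(url: str) -> str:
    """Hypothetical helper: turn a Hugging Face file link into a direct download link.

    Accepts /owner/repo/blob/<revision>/<path>.gguf or the /resolve/ form,
    and returns the /resolve/ form over https (any query string is dropped).
    """
    parts = urlsplit(url)
    if parts.netloc != "huggingface.co":  # this sketch only accepts huggingface.co
        raise ValueError("not a huggingface.co URL")
    segments = parts.path.strip("/").split("/")
    # Expected layout: owner / repo / (blob|resolve) / revision / file path...
    if len(segments) < 5 or segments[2] not in ("blob", "resolve"):
        raise ValueError("expected /owner/repo/blob/<revision>/<file>")
    if not segments[-1].lower().endswith(".gguf"):
        raise ValueError("URL does not point at a .gguf file")
    segments[2] = "resolve"  # the raw-file endpoint
    return urlunsplit(("https", parts.netloc, "/" + "/".join(segments), "", ""))


if __name__ == "__main__":
    page = "https://huggingface.co/example-org/example-model-GGUF/blob/main/example-model.Q4_K_M.gguf"
    print(to_direct_gguf_url(page))
    # prints https://huggingface.co/example-org/example-model-GGUF/resolve/main/example-model.Q4_K_M.gguf
```

Gated or private repositories also require an access token, which this sketch does not handle.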

4. Private, Performant, and Fully Native

On Device AI is a native SwiftUI app rather than an Electron wrapper, and both of its inference engines use hardware acceleration on Apple devices. Because models run locally on your device's CPU and GPU, no text, conversations, or files ever leave your device.
