Apple Intelligence vs sending it to OpenAI — the actual tradeoffs

Most “Apple Intelligence vs OpenAI” comparisons end at marketing. Privacy on one side, capability on the other, vague hand-waving about the future of AI. Useless if you are deciding which one to ship in a product this quarter.

Here is the matrix that actually matters when you sit down to build.

the comparison

Same task: take a one-minute voice note, transcribe it, extract sentiment, pull named entities, tag the topic. Run it once a day for a year of journaling.

capability        │ Apple Intelligence            │ OpenAI (Whisper + GPT-4o)
────────────────────────────────────────────────────────────────────────────
transcription     │ SFSpeechRecognizer, on-device │ Whisper API, cloud
sentiment         │ NLTagger, on-device           │ GPT-4o call, cloud
entity extraction │ NLTagger, on-device           │ GPT-4o call, cloud
topic clustering  │ NLEmbedding, on-device        │ embeddings API, cloud
summarization     │ Foundation Models, on-device  │ GPT-4o call, cloud
context limit     │ 8k-32k tokens (by device)     │ 128k tokens
languages         │ ~12 with full on-device       │ 90+ all cloud
accuracy ceiling  │ ~6% WER, mid-tier reasoning   │ best-in-class
latency           │ 200-500 ms per task           │ 800-2500 ms per task
cost per entry    │ $0 ongoing                    │ ~$0.003 per minute
privacy class     │ Data Not Collected            │ data sent to OpenAI
offline           │ works                         │ does not

The “cost per entry” line is the one most product decks miss. For a free journaling app with a million users journaling daily, the OpenAI path costs around $3,000 per day in API fees at ~$0.003 per entry, roughly $1.1 million per year. The Apple Intelligence path costs zero ongoing. That number alone reshapes the unit economics of any privacy-focused consumer app.
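
The arithmetic behind that line is easy to check. A minimal sketch, taking the table's ~$0.003 per entry at face value and assuming one one-minute entry per user per day. CloudCostModel and its fields are hypothetical names for illustration, not part of any SDK.

```swift
import Foundation

// Hypothetical cost model. The per-entry figure comes from the table above;
// one entry per user per day is an assumption.
struct CloudCostModel {
    var costPerEntryUSD: Double = 0.003
    var entriesPerUserPerDay: Double = 1

    // Total API spend for one day across all active users.
    func dailyCost(users: Int) -> Double {
        Double(users) * entriesPerUserPerDay * costPerEntryUSD
    }

    // Same spend over a 365-day year.
    func yearlyCost(users: Int) -> Double {
        dailyCost(users: users) * 365
    }
}

let costModel = CloudCostModel()
print(String(format: "per day:  $%.0f", costModel.dailyCost(users: 1_000_000)))
print(String(format: "per year: $%.0f", costModel.yearlyCost(users: 1_000_000)))
```

At a million daily users this works out to about $3,000 per day, or roughly $1.1 million per year. The on-device column stays at zero no matter what you pass in for users.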

where Apple Intelligence wins

Latency for short tasks. Anything under 1,000 tokens finishes on-device in less time than a cloud round-trip takes (200-500 ms against 800-2500 ms in the table above). For a journaling app where the user is watching transcription appear live, this is the difference between a smooth experience and a janky one.

Privacy as a strict guarantee. When the audio never leaves the device, you do not have to convince the user that you will not look at it. The architecture does not permit it. A skeptical user can verify this by watching the app's traffic in Charles Proxy. There is no “trust the privacy policy” step.
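
In Speech framework terms, that guarantee hangs on one property, requiresOnDeviceRecognition. A minimal sketch, assuming a recorded audio file, the en-US locale, and that the user has already granted speech recognition permission. makeOnDeviceRequest, transcribeOnDevice, and TranscriptionError are hypothetical names.

```swift
import Foundation
import Speech

enum TranscriptionError: Error {
    case onDeviceUnavailable
}

// Builds a file-based request that is not allowed to fall back to the network.
func makeOnDeviceRequest(audioURL: URL) -> SFSpeechURLRecognitionRequest {
    let request = SFSpeechURLRecognitionRequest(url: audioURL)
    request.requiresOnDeviceRecognition = true
    return request
}

// Transcribes a voice note on-device, failing loudly instead of quietly
// using a server when on-device recognition is not supported.
@discardableResult
func transcribeOnDevice(
    audioURL: URL,
    locale: Locale = Locale(identifier: "en-US"),
    completion: @escaping (Result<String, Error>) -> Void
) -> SFSpeechRecognitionTask? {
    guard let recognizer = SFSpeechRecognizer(locale: locale),
          recognizer.isAvailable,
          recognizer.supportsOnDeviceRecognition else {
        completion(.failure(TranscriptionError.onDeviceUnavailable))
        return nil
    }
    let request = makeOnDeviceRequest(audioURL: audioURL)
    return recognizer.recognitionTask(with: request) { result, error in
        if let error {
            completion(.failure(error))
        } else if let result, result.isFinal {
            // Partial results are ignored here; only the final text is reported.
            completion(.success(result.bestTranscription.formattedString))
        }
    }
}
```

The design choice worth copying is the guard: if the locale has no on-device model, the function returns an error rather than degrading to a server round-trip the user never agreed to.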

Offline. On the train, on a flight, in a building with no signal. The on-device model just works. The cloud model fails or queues.

Cost. Zero marginal cost per request. At the table's ~$0.003 per entry, a daily user costs about $1.10 per year in API fees: small against a $5/month subscription, but an uncapped loss on every user who never pays. On-device is free forever.

where the cloud wins

Capability ceiling. GPT-4o or Claude can write a multi-paragraph summary of a year’s journal entries with thematic synthesis that the on-device foundation models cannot. The gap is not 20%. It is the difference between “summary” and “novel observation.”

Language coverage. Whisper handles 90+ languages well. Apple’s on-device coverage is solid for roughly 12. If your user base is global, the gap matters.

Long context. 128k tokens vs the on-device 8-32k. For “summarize my whole year,” only the cloud can hold the whole thing at once.
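
A back-of-envelope check shows why. This sketch assumes roughly 150 spoken words per minute and about 1.3 tokens per English word; both are rough rules of thumb, not measurements, and the function names are hypothetical. It also ignores the tokens needed for the prompt and the model's output.

```swift
import Foundation

// Estimates the token count of a run of transcribed voice notes.
// wordsPerMinute and tokensPerWord are assumed averages, not measured values.
func estimatedTokens(
    entries: Int,
    minutesPerEntry: Double = 1,
    wordsPerMinute: Double = 150,
    tokensPerWord: Double = 1.3
) -> Int {
    let tokens = Double(entries) * minutesPerEntry * wordsPerMinute * tokensPerWord
    return Int(tokens.rounded())
}

// True when the text alone fits in the given context window.
func fits(tokens: Int, contextWindow: Int) -> Bool {
    tokens <= contextWindow
}

let weekTokens = estimatedTokens(entries: 7)
let yearTokens = estimatedTokens(entries: 365)

print("week:", weekTokens, "fits 8k:", fits(tokens: weekTokens, contextWindow: 8_000))
print("year:", yearTokens, "fits 32k:", fits(tokens: yearTokens, contextWindow: 32_000))
print("year:", yearTokens, "fits 128k:", fits(tokens: yearTokens, contextWindow: 128_000))
```

Under those assumptions a week of entries is well inside even the smallest on-device window, while a year lands around 71k tokens: past 32k, comfortably under 128k.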

Stability of API surface. Apple’s on-device APIs are still moving. Foundation Models in particular is a year old. OpenAI’s API has been stable for two years. For a venture-backed app racing to ship, API churn matters.

the hybrid pattern

Most apps that take privacy seriously and need cloud-level capability end up here.

on-device           │ transcription, sentiment, entity, topic, encryption
                    │ daily / weekly digests (small context)
                    │ realtime UX (anything user-facing during write)
                    │
cloud (rare)        │ "annual review" with opt-in upload
                    │ user-triggered, explicit, ephemeral (no retention)
                    │ end-to-end encrypted in flight

The user knows when the cloud is being called. The cloud only sees the data the user explicitly sends. The default is on-device. The upgrade is on-demand.
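
One way to keep that promise honest is to put the rule in a single function that every AI call has to pass through. A minimal sketch; JournalTask, Route, and RoutingPolicy are hypothetical names, and the only cloud-eligible task here is the annual review from the table above.

```swift
import Foundation

enum JournalTask {
    case transcription, sentiment, entities, topics, digest, annualReview
}

enum Route: Equatable {
    case onDevice   // default for everything
    case cloud      // explicit, user-triggered upload
    case declined   // cloud-only task without consent: do nothing
}

struct RoutingPolicy {
    // Only tasks listed here may ever leave the device.
    let cloudEligible: Set<JournalTask> = [.annualReview]

    // Consent is passed per request, so a past opt-in never carries over.
    func route(_ task: JournalTask, userOptedInForThisRequest: Bool) -> Route {
        guard cloudEligible.contains(task) else { return .onDevice }
        return userOptedInForThisRequest ? .cloud : .declined
    }
}

let policy = RoutingPolicy()
print(policy.route(.sentiment, userOptedInForThisRequest: true))
print(policy.route(.annualReview, userOptedInForThisRequest: false))
print(policy.route(.annualReview, userOptedInForThisRequest: true))
```

Note that an opt-in flag does nothing for on-device tasks: sentiment stays local even if the caller passes true. The cloud path needs both an eligible task and consent for that specific request.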

For a journaling app this hybrid is hard to get right. The pressure from competitive products is to invisibly send everything to the cloud so the AI features are always magical. The discipline is to refuse. The value proposition is the privacy, and the privacy disappears the moment the cloud sees anything the user did not explicitly choose to send.

for journaling specifically

Five reasons on-device wins this category, against the general assumption that cloud wins everywhere.

Entry data is intimate. Journal entries describe health, relationships, finances, frustrations. The category fails if users self-censor because they think the data is being read.
Latency matters for capture. Anything that makes the open-and-talk loop feel laggy increases abandonment. On-device is faster on the user-facing path.
Cost would eat the unit economics. A daily journaling habit is 365 cloud-processed entries a year per user. At ~$0.003 per entry that is about $1.10 per user per year, paid for every active user whether or not they subscribe.
Data Not Collected is a privacy claim that compounds. Apple’s privacy nutrition labels surface the difference at install time. A “Data Not Collected” badge converts better than a “Data Collected, here’s our privacy policy” page.
The capability gap is small for the actual journal use case. Sentiment, entity extraction, weekly summarization — the on-device models handle these fine. The only thing they cannot do is the long-context annual review, which is a once-a-year opt-in feature, not a daily one.
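
For a sense of how little code the daily path needs, here is a minimal sketch of sentiment and entity extraction with NLTagger from the NaturalLanguage framework. EntryAnalysis and analyze(entry:) are hypothetical names, and the sketch assumes the entry is in a language the tagger supports; when no score is available it falls back to 0.

```swift
import Foundation
import NaturalLanguage

struct EntryAnalysis {
    var sentiment: Double      // -1.0 (negative) through 1.0 (positive)
    var entities: [String]     // people, places, organizations
}

func analyze(entry text: String) -> EntryAnalysis {
    guard !text.isEmpty else { return EntryAnalysis(sentiment: 0, entities: []) }

    let tagger = NLTagger(tagSchemes: [.sentimentScore, .nameType])
    tagger.string = text

    // The sentiment score comes back as a tag whose raw value is a numeric string.
    let (sentimentTag, _) = tagger.tag(at: text.startIndex,
                                       unit: .paragraph,
                                       scheme: .sentimentScore)
    let sentiment = sentimentTag.flatMap { Double($0.rawValue) } ?? 0

    // Collect named entities, joining multi-word names into one token.
    let wanted: Set<NLTag> = [.personalName, .placeName, .organizationName]
    var entities: [String] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .nameType,
                         options: [.omitWhitespace, .omitPunctuation, .joinNames]) { tag, range in
        if let tag, wanted.contains(tag) {
            entities.append(String(text[range]))
        }
        return true   // keep enumerating
    }
    return EntryAnalysis(sentiment: sentiment, entities: entities)
}

let analysis = analyze(entry: "Had lunch with Maria in Lisbon. It was a wonderful day.")
print(analysis.sentiment, analysis.entities)
```

Nothing in this function touches the network, so it runs on the capture path without adding a round-trip.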

the take

Apple Intelligence wins for products where privacy is the differentiator and the task fits in the on-device capability envelope. Cloud wins for products where capability is the differentiator and privacy is acceptable to compromise. The wrong answer is to assume cloud is always better. For a category like journaling, on-device is the only coherent architecture.

Pick your category. Pick your provider. Do not try to be both.
