Field Report · Tools / Research · Speech-to-Text on Mac · Updated 2026-05-07

Better than Wispr Flow? A 2026 audit.

For two years Wispr Flow was the obvious answer for serious dictation on a Mac. As of May 2026, that's no longer self-evident, though "no longer self-evident" doesn't necessarily mean "switch." ElevenLabs Scribe v2 leads the batch-WER leaderboard; Parakeet TDT v3 now runs locally on Apple Silicon at 30× realtime; Apple shipped its own SpeechAnalyzer API in macOS Tahoe. This page is a researched, sourced survey of where things actually stand, tuned for the priorities you stated (accuracy first, local second, cost third) and for the cases where Wispr Flow is still the right answer. Coverage now includes Android.

Audience · MacBook Pro M1 Pro
Currently using · Wispr Flow
Priority · Accuracy > Local > Cost
Snapshot · 2026-05-07
01
Wispr Flow's underlying ASR appears to be Whisper-tier — not the new accuracy frontier. Per their vendor case study with Baseten, the stack is Whisper-derived ASR + a fine-tuned Llama 3.1 polish step. Wispr does not publish benchmarks or expose a raw-ASR endpoint, so their precise WER is estimated, not measured. Scribe v2, Universal-3 Pro, gpt-4o-transcribe, and Canary-Qwen-2.5B all beat Whisper-large-v3 on independent leaderboards — but raw-ASR accuracy isn't what Wispr Flow optimises; the polish layer is.
02
Best batch accuracy is ElevenLabs Scribe v2 at ~2.2 % AA-WER. For dictation specifically use Scribe v2 Realtime (~2.9 % WER, <150 ms latency). Both cloud-only, ~$0.40/hr. AssemblyAI Universal-3 Pro (~3.3 %) and gpt-4o-transcribe (~4.1 %) follow.
03
Best local on Apple Silicon is Parakeet TDT v3 via Superwhisper or VoiceInk. Runs at 30× realtime on an M1 Pro through Argmax CoreML / FluidAudio. Sits at ~6 % WER on Open ASR Leaderboard: roughly 2–4 points behind the cloud leaders above (2.2–4.1 %), not 10.
04
The Wispr Flow "feel" comes from the Llama polish, not raw ASR. Auto-formatting, command mode, capitalization, filler-word stripping — that's what makes it pleasant. Switching to a more accurate model with worse polish often feels worse.
05
The cheapest credible upgrade is VoiceInk ($25 lifetime, GPL v3) running Parakeet streaming. Under $50 once, full local privacy, decent UX. Closest you can get to "free Wispr Flow alternative" without paying anyone monthly.
06
Apple shipped strong new developer APIs, not a consumer Dictation upgrade. macOS Tahoe's SpeechAnalyzer / SpeechTranscriber APIs match Whisper-large-v3-turbo on accuracy and ran 55 % faster in MacRumors' single-file test. But the user-facing Dictation feature still has the legacy 30-second timeout, no command mode, and shallow custom vocabulary. The API capability hasn't reached the consumer product yet.
07
Free vibe-coded Wispr Flow clones now exist and several are credible. Handy (21k stars, MIT), OpenWhispr (closest to Superwhisper feature parity), FreeFlow (native Swift), MacParakeet (Apple-Silicon-only, Parakeet-first). The honest gap to Wispr Flow is the Llama-3.1 polish step — OpenWhispr and FreeFlow narrow it; Handy explicitly skips it.
08
For Mac+Android coverage, Wispr Flow is essentially the only polished cross-platform answer. Superwhisper / VoiceInk / MacWhisper are Mac-only. Spokenly is on the Android waitlist. If you use both, your Pro sub already works on Android (currently unlimited as a launch promo). For local Android, FUTO Voice Input ($10 once, Whisper-based) is the standout backup.
09
The honest TCO doesn't always favour switching. Wispr Flow Pro is $144/yr flat. Superwhisper + Scribe v2 BYOK at heavy use is roughly $565/yr first year. Going fully local on Parakeet via Handy or MacParakeet is $0 but loses the Llama-style polish. There's no single "best, cheapest, local" pick — they're three different trade-offs.
~2.9 % · Best dictation WER · Scribe v2 Realtime (AA leaderboard)
~6–8 % · Best local WER · Parakeet TDT v2/v3 on M1 Pro
$0 · Local cost / hour · After $25 VoiceInk lifetime
$144 · Wispr Flow Pro / yr · Or $15/mo monthly

A decision tree for your priorities.

If you're optimising for accuracy first and local second, follow these branches. The end-of-line entries are the actual recommendations — details for each are in the deep dives below.

START · What matters most?
├─ You need one tool covering Mac AND Android with one bill
│  └─ → Stay on Wispr Flow Pro. Nothing else covers both well in 2026.
│
├─ You like Wispr Flow's polish (auto-format, command mode, "delete that")
│  └─ → Stay. Switching gives up the Llama 3.1 layer unless you wire one up yourself.
│
├─ NO cloud, ever · truly-local Mac path
│  ├─ want most-tested community option → Handy (cjpais/Handy, MIT, 21k stars)
│  ├─ want best UX, willing to pay one-time → Superwhisper + Parakeet v3 ($85/yr or $250 lifetime)
│  ├─ want native Swift, Parakeet-first, free → MacParakeet (single-maintainer; M1+, GPL-3.0)
│  ├─ want closest free Superwhisper feature parity → OpenWhispr (local + cloud BYOK)
│  ├─ cheap one-time, build-from-source option → VoiceInk ($25 lifetime)
│  ├─ transcribing files, not dictating live → MacWhisper Pro (€59 lifetime) or Aiko (free)
│  ├─ macOS Tahoe + bare-bones UX → built-in Dictation
│  └─ voice control + STT (accessibility) → Talon Voice (Patreon)
│
├─ OK with cloud / hybrid path on Mac
│  ├─ best dictation accuracy, willing to pay → ElevenLabs Scribe v2 Realtime (~$0.40/hr, <150 ms)
│  ├─ cheapest streaming → Groq Whisper-large-v3-turbo ($0.04/hr)
│  ├─ lowest dictation latency → Deepgram Nova-3 (~300 ms TTFT)
│  └─ full BYOK control on Mac → Superwhisper + Scribe v2 / Deepgram / GPT-5 keys (TCO ~$565/yr at heavy use, vs $144 Wispr)
│
└─ Android side
   ├─ already paying Wispr Flow Pro → same account works on Android, currently unlimited (launch promo)
   ├─ want local/offline backup → FUTO Voice Input ($10 once, Whisper-based, IME plugin)
   └─ on a Pixel 9/10 → Gboard Smart Dictation (free, partial on-device via Gemini Nano)
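The Mac branches of the tree can also be read as a small lookup. The sketch below is one way to encode them; the `Needs` fields and the short keys such as `"community"` or `"latency"` are hypothetical labels invented for this example, and the Android branch is left out.

```python
from dataclasses import dataclass

# Leaves of the "truly-local Mac path" branch, keyed by hypothetical labels.
LOCAL_PICKS = {
    "community": "Handy",
    "best_ux": "Superwhisper + Parakeet v3",
    "native_free": "MacParakeet",
    "feature_parity": "OpenWhispr",
    "cheap_one_time": "VoiceInk",
    "files": "MacWhisper Pro or Aiko",
    "bare_bones": "built-in Dictation",
    "voice_control": "Talon Voice",
}

# Leaves of the "OK with cloud / hybrid" branch.
CLOUD_PICKS = {
    "accuracy": "ElevenLabs Scribe v2 Realtime",
    "cheapest": "Groq Whisper-large-v3-turbo",
    "latency": "Deepgram Nova-3",
    "byok": "Superwhisper + BYOK keys",
}


@dataclass
class Needs:
    mac_and_android: bool = False  # one tool, one bill, both platforms
    wants_polish: bool = False     # auto-format, command mode, "delete that"
    cloud_ok: bool = True          # False means strictly local
    want: str = "accuracy"         # one of the keys in the tables above


def recommend(needs: Needs) -> str:
    # The first two branches of the tree both end at "stay on Wispr Flow".
    if needs.mac_and_android or needs.wants_polish:
        return "Wispr Flow Pro"
    picks = CLOUD_PICKS if needs.cloud_ok else LOCAL_PICKS
    if needs.want not in picks:
        raise ValueError(f"no branch for {needs.want!r}; options: {sorted(picks)}")
    return picks[needs.want]


print(recommend(Needs(cloud_ok=False, want="community")))
```

Unknown combinations raise instead of guessing, since the tree itself has no branch for them.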

The comparison matrix.

Twelve options on the table. Accuracy tier is calibrated to Artificial Analysis WER for cloud APIs and Open ASR Leaderboard for local models — A++ is <3 %, A+ is 3–5 %, A is 5–8 %, A− is 8–10 %, B is 10–15 %.

| Tool | Price | Local? | Accuracy | Latency | Default model |
|---|---|---|---|---|---|
| Wispr Flow (SF startup, the current baseline) | Free / $12–15 mo | No | A (with polish) | A · ~700 ms p99 | Whisper-derived ensemble + Llama 3.1 polish |
| Superwhisper (local + BYOK cloud) | Free / $85 yr / $250 lifetime | Yes (or hybrid) | A (local) · A++ (BYOK) | A · sub-100 ms | Parakeet v3 (Argmax CoreML) |
| VoiceInk (open source, GPL v3) | $25–49 lifetime | Yes | A | A · sub-100 ms | Whisper.cpp + FluidAudio Parakeet |
| MacWhisper (transcription-focused) | Free / €59 lifetime Pro | Yes | A | B · batch | Whisper-large-v3 / large-v3-turbo |
| Aiko (by Sindre Sorhus) | Free | Yes | A | C · file-based | Whisper-large-v3 (CoreML) |
| Apple Dictation (built-in, macOS 26 Tahoe) | Free | Yes (Apple Silicon) | B+ | A · ~150 ms | Apple SpeechAnalyzer / SpeechTranscriber |
| Talon Voice (accessibility + voice control) | Free / $25 mo Patreon | Yes | B+ | A · ~80 ms | Custom Conformer (wav2letter) |
| ElevenLabs Scribe v2 (API only) | ~$0.40/hr | No | A++ · 2.2 % WER | B (batch) / A+ Realtime <150 ms | Scribe v2 / v2 Realtime |
| OpenAI gpt-4o-transcribe (API only) | $0.36/hr | No | A+ · 4.1 % WER | B · ~1–3 s | gpt-4o-transcribe (Mar 2025) |
| Deepgram Nova-3 (API only) | $0.29/hr stream / $0.46 batch | No | A · 5.3–6.8 % WER | A+ · ~250–300 ms | Nova-3 |
| AssemblyAI Universal-3 Pro (API only) | $0.21/hr | No | A+ · 3.3 % WER | B (batch) · Universal-Streaming separate | Universal-3 Pro |
| Groq Whisper-large-v3-turbo (API only) | $0.04/hr | No | A− · ~7–9 % WER | A+ · 228× realtime | OpenAI Whisper-large-v3-turbo |
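The accuracy tiers in the matrix follow the WER bands stated above it. A minimal sketch of that mapping, assuming each band includes its lower bound and excludes its upper bound (the page does not say which way the edges fall). The "B+" used for Apple Dictation and Talon has no stated band, so this function never returns it, and "A-" is written with an ASCII hyphen.

```python
def accuracy_tier(wer_percent: float) -> str:
    """Map a WER percentage to the tier bands quoted for the matrix."""
    if wer_percent < 0:
        raise ValueError("WER cannot be negative")
    # (exclusive upper bound, tier), checked in ascending order.
    bands = [(3.0, "A++"), (5.0, "A+"), (8.0, "A"), (10.0, "A-"), (15.0, "B")]
    for upper, tier in bands:
        if wer_percent < upper:
            return tier
    return "unrated"  # the page defines no tier at or above 15 %


for name, wer in [("Scribe v2", 2.2), ("Universal-3 Pro", 3.3), ("Parakeet TDT v3", 6.0)]:
    print(f"{name}: {accuracy_tier(wer)}")
```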

The deep dives.

What each one actually is, what it ships with, what it gets right, where it falls short.

Current baseline · cross-platform

Wispr Flow

$12–15/mo · cloud only · Mac/Win/iOS/Android

Cloud dictation app from Wispr AI. Per their vendor case study (Baseten), the stack is Whisper-derived ASR plus a fine-tuned Llama 3.1 polish step. End-to-end latency ~700 ms p99 (vendor-reported). Single Pro account covers Mac+Win+iOS+Android simultaneously.

  • Strengths: Llama post-processing for auto-formatting and "command mode"; 100+ languages; tight UX in any text field; only polished Mac+Android unified answer in 2026.
  • Weaknesses: cloud-only on every platform — "Privacy Mode" disables retention but audio still transits Wispr's cloud; $144/yr Pro is steep vs free local alternatives; underlying ASR is Whisper-tier, not Scribe v2-tier.
  • Note on reviews: review aggregators (e.g. Voibe / getvoibe.com — itself a content-marketing site for another dictation app) cite a Trustpilot ~2.7/5; G2 sits at 4.5/5. We have not directly verified Trustpilot. Treat as anecdotal.
  • Note on stack: there is no public model called "Wispr Whisp" — the stack is Whisper-derived plus Llama 3.1, not an in-house ASR model.
Best polish; the cross-platform default. Stay if you (a) value the Llama polish layer, (b) need Mac+Android with one bill, or (c) don't want to wire up your own cloud BYOK pipeline. Switch only if accuracy + local + cost align toward a different stack.
Best UX local

Superwhisper

$85/yr / $250 lifetime · v2.13.2 (Apr 2026)

The leading local-first Mac dictation app. Default offline engine is Argmax's CoreML port of Parakeet TDT v3 (sub-100 ms streaming on M1 Pro). Optional BYOK cloud transcription (Scribe v2, Deepgram, gpt-4o-transcribe, Whisper API) and BYOK LLM polish (Claude, GPT-5, Gemini, Grok, Ollama).

  • Strengths: deep Mac integration (modes, hotkeys, app-aware "Super Mode"), 1,000-keyword custom vocabulary, claude/opencode agent integration in v2.13.
  • Weaknesses: $250 "lifetime" is real, but cloud BYOK still bills you separately; default model changed from Whisper to Parakeet in mid-2025, so old reviews are misleading.
If you want one tool that satisfies all three of your priorities (accuracy / local / cost), this is it. The $250 lifetime license breaks even against Wispr Flow Pro after ~17 months at the $15/mo monthly rate, or ~21 months against the $144/yr annual plan.
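The break-even point is the one-time price divided by the monthly price you would otherwise pay. A minimal sketch using the prices quoted on this page ($250 lifetime, Wispr Flow Pro at $15/mo or $144/yr); it ignores any separate BYOK cloud spend.

```python
def breakeven_months(one_time_price: float, monthly_price: float) -> float:
    """Months of subscription payments that add up to the one-time price."""
    if monthly_price <= 0:
        raise ValueError("monthly_price must be positive")
    return one_time_price / monthly_price


vs_monthly_plan = breakeven_months(250, 15)       # Wispr billed monthly
vs_annual_plan = breakeven_months(250, 144 / 12)  # Wispr billed annually

print(f"vs $15/mo:  {vs_monthly_plan:.1f} months")
print(f"vs $144/yr: {vs_annual_plan:.1f} months")
```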
Cheapest credible upgrade

VoiceInk

$25–49 lifetime · GPL v3 · v1.74 (Apr 2026)

Open-source local dictation menu-bar app (Beingpax/VoiceInk on GitHub). Uses Whisper.cpp for Whisper variants and FluidAudio for Parakeet streaming — the streaming was added in v1.73, putting it in the same league as Superwhisper for live dictation feel.

  • Strengths: "Power Mode" auto-formats per active app; system-wide hotkey; truly free if you build from source.
  • Weaknesses: less polished than Superwhisper; smaller community; no first-party BYOK cloud models (you can wire them via shell scripts).
The actual answer for "free Wispr Flow alternative" if you're OK with a bit of menu-bar friction. Under $50 once.
Free, file-focused

Aiko · MacWhisper · friends

Free / €59 lifetime · Whisper-focused

Aiko (Sindre Sorhus, free) and MacWhisper (Jordi Bruin, €59 Pro) are transcription-first apps. Best for "I have a meeting recording, give me a transcript" rather than "type in Slack while I talk."

  • Aiko: free, Apple Silicon native, Whisper-large-v3 on-device, lightweight UI.
  • MacWhisper: Pro adds large-v3-turbo, WhisperKit, faster-whisper, batch processing, language tools.
  • Hello Transcribe / Buzz / Whisper Memos / Spokenly: niche players in the same Whisper-on-Mac family.
Great for transcription. Wrong tool for the "I want to dictate into Slack" use case — use Superwhisper or VoiceInk for that.
Built-in

Apple Dictation (macOS 26 Tahoe)

Free · on-device · Apple Silicon native

The system Dictation feature runs on-device on Apple Silicon. macOS Tahoe (26) introduced the SpeechAnalyzer / SpeechTranscriber APIs — a 34-min file transcribed in 45 s, 55 % faster than Whisper-large-v3-turbo per MacRumors.

  • Strengths: free, in every text field, fully local, fast.
  • Weaknesses: no AI polish, no command mode, no third-party model swap, shallow custom vocabulary; consumer Dictation still has the legacy 30-second timeout (the new APIs are developer-facing).
Fine free fallback. Won't replace Wispr Flow because it has no rewrite layer — the thing that actually makes Wispr feel polished.
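The MacRumors numbers convert to a realtime factor with one division. The sketch below also shows that "55 % faster" can be read two ways, and each implies a different Whisper baseline time; which reading the test used is not stated here, so both are labelled as assumptions.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are transcribed per second of compute."""
    return audio_seconds / processing_seconds


apple_seconds = 45
apple_rtf = realtime_factor(34 * 60, apple_seconds)

# Reading 1 (assumption): Apple needed 55 % less wall-clock time.
baseline_if_less_time = apple_seconds / (1 - 0.55)
# Reading 2 (assumption): Apple's throughput was 1.55x the baseline's.
baseline_if_more_throughput = apple_seconds * 1.55

print(f"Apple: {apple_rtf:.1f}x realtime")
print(f"Implied baseline: {baseline_if_less_time:.0f} s or {baseline_if_more_throughput:.1f} s")
```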
Accessibility-first

Talon Voice

Free / $25/mo Patreon for beta

Voice-control + dictation system, originally built for RSI accessibility. Local custom Conformer ASR (wav2letter-derived, not Whisper). Strongest at command grammars and code dictation; weaker for free-form prose.

  • Strengths: low latency, scriptable, full UI control by voice.
  • Weaknesses: steep learning curve; not a Wispr-style "type into any field" tool out of the box.
If you want voice control of your whole Mac, Talon. If you just want dictation, Superwhisper or VoiceInk.
Cloud APIs · accuracy lead

ElevenLabs Scribe v2

~$0.40/hr · Realtime <150 ms

Released January 2026 (Realtime Jan 6, Batch Jan 9). The current accuracy champion on Artificial Analysis: ~2.2 % English WER, ~93.5 % accuracy across 30 languages on FLEURS. The March 2026 update extended this to 99 languages with 98 % speaker-label accuracy.

  • Use it via Superwhisper BYOK if you want Wispr Flow's polish UX with the world's best ASR underneath.
If accuracy is genuinely your #1 priority and you're OK with cloud, this is the answer. Wire it into Superwhisper via BYOK.
Cloud APIs · cheap streaming

Deepgram Nova-3 / Groq Whisper / OpenAI / AssemblyAI

$0.04 – $0.46 / hr

If you're rolling your own dictation flow:

  • Deepgram Nova-3: sub-300 ms TTFT, best dictation latency in the cloud tier.
  • Groq Whisper-large-v3-turbo: $0.04/hr, 228× realtime, accuracy of stock Whisper-turbo.
  • AssemblyAI Universal-3 Pro: 3.3 % WER, $0.21/hr, batch.
  • OpenAI gpt-4o-transcribe: 4.1 % WER, $0.36/hr, batch (Realtime API separate).
Pick by the metric you care about. Most people don't need to roll their own — use them through Superwhisper BYOK instead.

The vibe-coded Wispr Flow alternatives.

A wave of community-built free alternatives shipped in 2025–2026 — mostly MIT or GPL, mostly running Whisper or Parakeet locally. Five are worth caring about. None has Wispr Flow's Llama-3.1 polish out of the box, but several get close with cloud BYOK or local Ollama.

21.2k stars · community pick

Handy

cjpais/Handy · MIT · v0.8.3 (Apr 2026)

Cross-platform Tauri app (Rust + React/TypeScript). The most-starred and most-recommended free dictation tool of 2025–2026. Hold a hotkey, speak, release, paste. Author's own framing: not "the best STT app" but "the most forkable one."

  • Models: Whisper (Small / Medium / Turbo / Large with GPU) and Parakeet v2/v3 (CPU-optimised, auto language detection).
  • Strengths: "super polished" per HN consensus; Parakeet v3 reportedly excellent; ships with VAD silence filtering and direct text insertion.
  • Weaknesses: no LLM polish, so output is a verbatim transcript, often missing punctuation; one Medium reviewer reports 2–5 second response delay; AirPods add 1–2 s latency on top.
Best free option if you don't need post-processing. The community's default. github.com/cjpais/Handy
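VAD silence filtering means dropping the stretches of a recording that contain no speech before they reach the recogniser. The sketch below illustrates the idea with a plain energy threshold. It is a stand-in technique, not Handy's implementation (the detector Handy ships is not described here), and the frame length and threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def drop_silence(
    samples: Sequence[float],
    frame_len: int = 480,     # 30 ms at 16 kHz (assumed sample rate)
    threshold: float = 0.01,  # assumed level below which a frame is "silent"
) -> List[float]:
    """Keep only the frames whose RMS level reaches the threshold."""
    kept: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if rms(frame) >= threshold:
            kept.extend(frame)
    return kept


silence = [0.0] * 960
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(960)]
trimmed = drop_silence(silence + tone + silence)
print(len(trimmed), "of", 3 * 960, "samples kept")
```

A fixed threshold like this breaks down in noisy rooms, which is one reason shipping apps tend to use a trained detector instead.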
Closest to Superwhisper

OpenWhispr

OpenWhispr/openwhispr · MIT · v1.7.0 (May 2026) · 2.9k stars

The most feature-complete open-source clone. 1,364 commits, 76 releases. Sponsored by Neon (PostgreSQL provider) but remains MIT-licensed. No telemetry. Explicitly markets itself as "the open-source and free alternative to WisprFlow and Granola."

  • Local: OpenAI Whisper, NVIDIA Parakeet (via sherpa-onnx).
  • Cloud BYOK: GPT-5, Claude, Gemini, Groq.
  • Features: system-wide hotkey, AI agent conversations, meeting transcription with speaker diarization, local semantic search and notes.
  • Reviews: Product Hunt feedback — "insanely fast and accurate"; comparable to Wispr Flow when both use cloud (same underlying models). Recent fixes: meeting transcription, folder switching, clipboard pasting, Linux Wayland support.
If you want Superwhisper's whole feature surface for free, this is the closest match. Cross-platform, not Mac-native, but polished. github.com/OpenWhispr/openwhispr
Native Mac · AI cleanup

FreeFlow

zachlatta/freeflow · MIT · v0.3.3 (Apr 2026) · 1.6k stars

By Zach Latta (Hack Club founder); current maintainer Marc Bodea. 93.4% Swift, so actually a native macOS app. Hold Fn to talk, or toggle with Cmd-Fn.

  • Default backend: Groq's free-tier Whisper-large-v3-turbo for transcription + Groq LLM for context-aware cleanup.
  • BYOK: custom OpenAI-compatible providers and Ollama for fully-local mode.
  • Distinctive: "deep context" — reads on-screen content via screenshot so it spells names/jargon correctly when dictating into email, terminal, docs.
  • Caveats: Show HN got 277 points but the comment thread overwhelmingly preferred Handy. The cloud-default + screenshot-context trade adds latency. No streaming partials.
Right pick if you want a native Swift Mac app with optional LLM polish out of the box. github.com/zachlatta/freeflow
Apple Silicon · Parakeet-first

MacParakeet

moona3k/macparakeet · GPL-3.0 · v0.6.3 (May 2026) · 152 stars

Mac-native SwiftUI app, Apple-Silicon-only (M1+, macOS 14.2+). Default engine is Parakeet TDT 0.6B-v3 via FluidAudio CoreML on the Neural Engine; WhisperKit fallback for Korean / Japanese / Chinese. Tagline: "WisprFlow and Granola had a baby."

  • Author claims: ~155× realtime, ~2.5 % WER. (Author-stated; not independently submitted to Open ASR Leaderboard.)
  • Features: system-wide hotkey, push-to-talk + persistent recording, file/video/YouTube transcription, meeting recording with live preview, exports to TXT/MD/SRT/VTT/DOCX/PDF/JSON, optional AI summaries via cloud or local LLM.
  • Caveats: smaller community = less battle-tested than Handy; one-maintainer dependency risk.
Strongest fit for an M1 Pro user wanting native Mac feel + best-in-class local Parakeet. Free, GPL-3.0. github.com/moona3k/macparakeet
Open-source paid build

VoiceInk

Beingpax/VoiceInk · GPL-3.0 source / $25–49 binary

Open-source under GPL-3.0 if you build from source; paid pre-built binary at $25–49 lifetime. Voibe rates it #4 overall and "Best Open-Source Alternative." Mac Power Users praises one-time-fee transparency. One reviewer reports doubled productivity / 4.5 hours saved over two months.

  • Models: Whisper.cpp + FluidAudio Parakeet (streaming since v1.73).
  • Strengths: 100+ languages, deep Mac integration, Power Mode auto-formatting per app.
  • Weaknesses: "functional but basic" UI; steeper learning curve; mode-switching clunky; custom AI prompting limited to text reformatting.
Build from source for free. The most mainstream open-source Mac dictation app, just not the most beginner-friendly. github.com/Beingpax/VoiceInk
Honourable mentions

Other forks worth knowing

smaller communities, narrower scope

The long tail of vibe-coded Mac STT clones, ranked by usefulness:

  • whisper-mac (Explosion-Scratch/whisper-mac) — widest model menu: Parakeet, WhisperCPP, Vosk, Apple's native Speech, plus Gemini / Mistral cloud.
  • FluidVoice (altic-dev/FluidVoice) — "fastest macOS offline dictation," fully local.
  • speak2 (zachswift615/speak2) — hold-Fn-to-talk, on-device WhisperKit/Parakeet.
  • parakeet-dictation (osadalakmal/parakeet-dictation) — minimalist Parakeet-only script (Ctrl+Alt+A).
  • VoiceTypr (moinulmoin/voicetypr) — "open source" but trial-verification-gated; pay-once-use-forever, not actually free.
  • turbo-whisper (knowall-ai/turbo-whisper) — Linux-first, Mac supported; waveform UI.
Useful if Handy / OpenWhispr / MacParakeet don't fit your specific shape. Mostly smaller communities — verify last commit before depending on them.
Bonus · voice into Claude Code

If you actually want to dictate into the Claude Code CLI rather than as a system-wide tool, dedicated wrappers exist: enesbasbug/voice-to-claude (Claude Code plugin, whisper.cpp + Metal), mbailey/voicemode (natural 2-way voice), gmoqa/listen-claude-code (local Whisper via listen CLI), and Ashton-Sidhu/claude-whisper (push-to-talk with wake word, via Claude Agent SDK).

Honest read

For your priorities (accuracy → local → cost) on M1 Pro, the honest order is: Handy first (21k stars, MIT, most-tested community option — but no LLM polish, accept 2–5 s response delay per one Medium reviewer), MacParakeet second (Apple-Silicon-native, Parakeet-first; caveat: 152 stars and one maintainer — bus-factor risk), OpenWhispr third if you want AI agents / meeting transcription / cloud BYOK on top. None of these match Wispr Flow's Llama-3.1 polish out of the box — you'd wire that up via Ollama or BYOK cloud LLM yourself. The "free + best accuracy + truly local" intersection is empty; pick two.
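Wiring up a polish step through Ollama amounts to sending the raw transcript to a local model with a cleanup instruction. A minimal sketch, assuming an Ollama server on its default port with a model already pulled. The model tag, the prompt wording, and the function names are illustrative choices, and this is not a reconstruction of Wispr Flow's pipeline.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llama3.1"  # assumption: any locally pulled instruct model would do

PROMPT = (
    "Clean up this dictated text. Fix punctuation and capitalization, "
    "remove filler words, and change nothing else. Return only the text.\n\n"
)


def build_request(raw_transcript: str, model: str = MODEL) -> dict:
    """Build the JSON body for a single, non-streaming generate call."""
    return {"model": model, "prompt": PROMPT + raw_transcript, "stream": False}


def polish(raw_transcript: str) -> str:
    """Send the transcript to the local model and return the cleaned text."""
    body = json.dumps(build_request(raw_transcript)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())["response"].strip()
```

Calling `polish("um so send the report uh today")` needs the server running; the extra round trip adds latency on top of whatever the recogniser already takes.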

Android, briefly.

If you also dictate on a phone: the honest cross-platform answer is Wispr Flow (it works on Android with the same Pro account, and the Android tier is currently unlimited as a launch promo). Local-on-Android is real but narrow; FUTO Voice Input ($10 once) is the standout private option.

Cross-platform unified

Wispr Flow on Android

Native app since Feb 2026 · Android 13+ · Pro covers all platforms

Floating-bubble UI over text fields (not a keyboard / IME). Uses the Accessibility Service for text insertion, which means it coexists with Gboard / SwiftKey / Samsung Keyboard rather than replacing them. Cloud-only — no offline mode.

  • Pricing: currently unlimited free on Android (a "limited-time launch promotion" per Wispr's own page); desktop free tier is 2,000 words/wk and iPhone is 1,000/wk. Pro $144/yr covers all platforms simultaneously on the same account.
  • Self-reported accuracy: 96–97 % in quiet rooms, 88 % in noisy environments. Not third-party benchmarked; treat as vendor claim.
  • Missing on Android vs desktop: Dictionary, Snippets, Styles, "Spell names right." Pro Command Mode is desktop-first; Android parity not announced.
  • Known bugs: "paste bug" (transcript fails to insert), accessibility-permission silently revoked on aggressive battery-optimisation OEMs (OnePlus, Xiaomi, Huawei, Oppo, Vivo, Samsung), connection drop on app-switch.
If you already pay Pro on Mac, just install it on Android — same account, currently free unlimited, no extra cost. If the launch promo ever ends, your Pro sub already covers it anyway.
Local + private · Android

FUTO Voice Input

$10 one-time · Whisper-based · offline IME plugin

Whisper-derived, fully on-device speech recogniser that plugs into any Android keyboard implementing the standard speech-recognition intents (FUTO Keyboard, HeliBoard, FlorisBoard, AnySoftKeyboard, AOSP Keyboard, Grammarly, SwiftKey). The standout private, offline answer for Android in 2026.

  • Privacy: all processing on-device; no recordings stored or transmitted.
  • Pricing: $10 once. License also valid for FUTO Keyboard ($6.99 FUTO Pay / $11.99 Play Store).
  • Quality: English is its strong suit; quality drops for low-resource languages even though Whisper "supports" them all.
  • Use case: as a backup to Wispr Flow for offline / sensitive / no-cloud situations (medical, legal, journaling, sketchy networks).
$10 once. The local-private answer for Android. Pair with Wispr Flow on Android for the best both-worlds setup.
Built-in · flagship-only

Gboard Smart Dictation (Android 16 QPR2)

Free · Pixel 9/10 + flagship Snapdragon/MediaTek · Gemini Nano

Android 16 QPR2 added "Smart Dictation" inside Gboard, powered by Gemini Nano on-device. AI-mediated edits in dictation flow ("fix the last sentence and add punctuation"). Pixel Recorder remains the strong on-device transcription app for recorded audio.

  • Hardware gate: Pixel 9/10 (Tensor G4/G5), Snapdragon 8 Elite, MediaTek Dimensity 9400 — mid-range Android phones miss it.
  • Hybrid? Google has not publicly confirmed Smart Dictation is 100 % on-device; LLM-edit step may use cloud fallback. Treat as hybrid until clarified.
  • Non-Pixel fallback: Gboard voice typing on non-Pixel still routes to cloud Google Speech; quality good but not Smart-Dictation-grade.
If you're on a recent Pixel, this is the best free Android dictation experience. For older phones or non-Pixels, install Gboard but expect cloud-only.
Honourable mentions · Android

The rest of the field

Mostly worse than the three above
  • Samsung Voice Input — widely reported worse than Google Voice Typing on the same Galaxy hardware; switch the keyboard's voice engine to Google or install Gboard.
  • Galaxy AI Transcript Assist — meeting summary tool, not dictation IME. Don't conflate.
  • SwiftKey (Microsoft) — Azure Speech backend; one secondary blog claims ~8.5 % WER, single-source, unverified.
  • Otter.ai / Dragon Anywhere — meeting / structured-doc apps; do not insert into arbitrary text fields like an IME does. Often miscategorised in "best Android dictation" lists.
  • Argmax Pro SDK + Argmax Playground — real-time on-device Parakeet via LiteRT/NPU. Currently an SDK + dev demo, not a polished consumer dictation IME. Watch this space for 2027.
  • Spokenly, Superwhisper, VoiceInk, MacWhisper — all Mac-only. Not on Android in 2026.
If FUTO + Wispr + Gboard don't cover your use case, the rest of the Android field probably won't either.
Honest read · Android

Given you already pay Wispr Flow Pro, the optimal Android stack is: Wispr Flow as daily driver (free unlimited on Android right now, marginal cost zero) + FUTO Voice Input as $10 local backup (offline / private / sensitive contexts). If you're on a Pixel 9/10, also leave Gboard Smart Dictation enabled — it costs nothing and the Gemini-Nano edit verbs are useful. If you ever drop Wispr Pro for cost, the local-first answer is FUTO + Gboard for ~$10 once, and you lose ~2–4 WER points and the polish layer.

Models, briefly.

Most apps you might pick are wrappers around one of these. Knowing the model is the difference between picking by hype and picking by accuracy.

  • Parakeet TDT 0.6B v3 (NVIDIA, mid-2025). 25-language multilingual ASR, ~6 % English WER on Open ASR Leaderboard. Apple Silicon: mlx-community/parakeet-tdt-0.6b-v3 and FluidInference/parakeet-tdt-0.6b-v3-coreml. Used by Superwhisper (Argmax CoreML port) and VoiceInk (FluidAudio). Best local-on-Mac choice.
  • Whisper-large-v3 / large-v3-turbo (OpenAI, 2023–2024). The open baseline. ~6.4 % / ~7.5 % WER respectively. Fine on M1 Pro via WhisperKit, faster-whisper, or whisper.cpp. Turbo is the practical default for dictation.
  • Scribe v2 (ElevenLabs, Jan 2026). Closed-source cloud. ~2.2 % AA-WER. v2 Realtime variant runs <150 ms latency. The current accuracy leader by a clear margin.
  • Apple SpeechAnalyzer / SpeechTranscriber (Apple, macOS 26 Tahoe, Sept 2025). On-device, Neural-Engine accelerated, undisclosed architecture. 55 % faster than Whisper-large-v3-turbo on a single-file benchmark; no published WER. Good for transcription, not yet a Wispr replacement.
  • Other contenders: Canary-Qwen-2.5B (NVIDIA, ~5.6 % Open ASR), Granite-Speech-3.3-8B (IBM, ~5.85 %), Phi-4-Multimodal (Microsoft), Voxtral (Mistral), Moonshine (Useful Sensors, fast small-footprint streaming). Universal-3 Pro and gpt-4o-transcribe are cloud-only competitors to Scribe v2.
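The WER figures quoted for these models are word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. A minimal sketch of that calculation follows. Leaderboards also normalise text first (punctuation, numerals, casing); this version only lowercases and splits on whitespace, so its numbers will not match leaderboard scores exactly.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


print(wer("send the report today", "send a report to day"))
```

Because insertions count against the hypothesis, WER can exceed 100 % on very bad output.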

$/hour and total monthly cost.

Not all "$X/yr" subscriptions are alike. This table normalises everything to dollars per hour of dictation, plus what 100 hours of monthly use actually costs you.

Option Pricing model $ / hr 100 hr / mo Notes
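The "100 hr / mo" column is the per-hour rate multiplied by 100, while a flat subscription costs the same at any usage. A minimal sketch using the per-hour rates from the comparison matrix and Wispr Flow Pro at $144/yr; the Deepgram entry uses the streaming rate.

```python
RATES_PER_HOUR = {
    "ElevenLabs Scribe v2": 0.40,
    "OpenAI gpt-4o-transcribe": 0.36,
    "Deepgram Nova-3 (streaming)": 0.29,
    "AssemblyAI Universal-3 Pro": 0.21,
    "Groq Whisper-large-v3-turbo": 0.04,
}

WISPR_PRO_PER_MONTH = 144 / 12  # flat, regardless of hours used


def monthly_cost(rate_per_hour: float, hours: float = 100) -> float:
    """Metered cost in dollars for the given hours of audio per month."""
    return round(rate_per_hour * hours, 2)


for name, rate in RATES_PER_HOUR.items():
    print(f"{name:30s} ${rate:.2f}/hr  ${monthly_cost(rate):6.2f} at 100 hr/mo")
print(f"{'Wispr Flow Pro (flat)':30s}   n/a    ${WISPR_PRO_PER_MONTH:6.2f} at any usage")
```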

Things people get wrong.

Eight claims that look true on first read but mislead in 2026.

  1. P/01
    Whisper and Wispr are the same thing.

    They aren't. Whisper is OpenAI's open ASR model. Wispr is a company; Wispr Flow is their dictation app. Wispr Flow uses Whisper-derived models — it isn't the same product. The conflation is constant in blog posts.

  2. P/02
    Batch WER predicts dictation feel.

    It mostly doesn't. Open ASR Leaderboard measures clean batch transcription. Real dictation also wants streaming partials, low TTFT, AI cleanup, custom vocabulary, and tolerance for false starts. A "more accurate" model with worse polish often feels worse.

  3. P/03
    Stale benchmarks treat Whisper-large-v3 as the ceiling.

    That was true through mid-2025. Since then Scribe v2, Universal-3 Pro, gpt-4o-transcribe, Canary-Qwen-2.5B, and Parakeet TDT v3 have all leapfrogged it. Anything dated before Q3 2025 is obsolete on accuracy claims.

  4. P/04
    "Lifetime" pricing covers everything.

    Superwhisper's $250 lifetime gets you the app and unlimited offline Parakeet/Whisper. If you turn on cloud BYOK (Scribe v2, Deepgram, GPT-5), you pay those providers separately — budget $20–40/mo for heavy daily use. Local-only users pay $0/mo after the lifetime fee.

  5. P/05
    Apple SpeechAnalyzer = Apple Dictation.

    The new APIs are developer-facing and dramatically faster than Whisper. The consumer Dictation feature still inherits the old 30-second timeout, no command mode, no rewrite layer. If you want the API capability, you need a developer to build a tool around it (or wait for Apple to ship a new Dictation UX).

  6. P/06
    "Switching saves money."

    Often it doesn't. Wispr Flow Pro is $144/yr flat across Mac+Win+iOS+Android. Superwhisper + Scribe v2 BYOK at heavy use is roughly $565/yr first year. Going fully local on Parakeet via Handy / MacParakeet is $0 but loses the Llama-style polish. If you only count "save the $144" you'll often spend more on cloud BYOK or accept a worse-feeling output.

  7. P/07
    "Open-source dictation Mac apps cover Android too."

    None of them do as of May 2026. Superwhisper, VoiceInk, MacWhisper, Spokenly, FreeFlow, Handy — all Mac-only or desktop-only. Android local is a separate ecosystem (FUTO Voice Input, Gboard, Argmax SDK). Cross-platform unified dictation in 2026 is essentially Wispr Flow's one-product moat.

  8. P/08
    "Voibe / getvoibe.com is an independent reviewer."

    It's a content-marketing site for another dictation app (Voibe). Their "best alternatives to Wispr Flow / Superwhisper" listicles are SEO content with a vested interest in their rankings. Where this guide cites Voibe (e.g. on Wispr Flow's Trustpilot rating, VoiceInk's #4 placement), treat as one-sided secondary reporting, not an independent review.
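The cost claims in P/04 and P/06 reduce to one line of arithmetic: app fee per year plus twelve months of cloud spend. The sketch below shows that the ~$565 figure is consistent with Superwhisper's $85/yr plan plus the top of the $20–40/mo BYOK budget. That decomposition is an inference from the numbers on this page, not a sourced breakdown.

```python
def annual_tco(app_per_year: float, cloud_per_month: float) -> float:
    """Yearly total: app subscription plus twelve months of metered cloud use."""
    return app_per_year + 12 * cloud_per_month


scenarios = {
    "Wispr Flow Pro (flat)": annual_tco(144, 0),
    "Superwhisper + BYOK, heavy use": annual_tco(85, 40),
    "Superwhisper + BYOK, lighter use": annual_tco(85, 20),
    "Fully local via Handy / MacParakeet": annual_tco(0, 0),
}

for name, dollars in scenarios.items():
    print(f"{name:38s} ${dollars:.0f}/yr")
```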

How this page was made.

This guide was AI-generated from web research conducted on 2026-05-07 and AI fact-checked — not personally hands-on tested. Every quantitative claim (release dates, prices, version numbers, WER figures) was cross-checked against at least two independent sources where possible. Underlying research lives in 13 structured-claim markdown files in the project repo, each with confidence tiers (C1–C5), per-claim source URLs, and a "failures / counter-evidence" section. A separate critique-deep.md file in the repo runs an adversarial pass on the corpus — surfacing framing biases, source-quality issues (e.g. Voibe's content-marketing role), the ~$565/yr Superwhisper+BYOK total cost of ownership, the inconsistency between batch and dictation WER tiers, and the bus-factor risk on smaller OSS projects. The webapp has been edited to reflect those findings; the unedited research and the critique are both committed for transparency.

Things that move quickly: Open ASR Leaderboard rankings (weekly), provider pricing (multiple changes in 2026), Superwhisper / VoiceInk release cadence (multiple per month), and Apple's Speech APIs (next macOS release). If you're reading this in late 2026 and a vendor announcement contradicts something on this page, trust the vendor.
