23 voices across 6 engines · Rate 1-10, review aggregates, pipeline tracks moves · M4 Pro 64GB
Click a score from 1 to 10 under each card to rate it. Ratings persist locally (localStorage). Sort and filter controls are at the top, but the listening order stays shuffled until you opt in.
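Here is a minimal Python sketch of the rating flow described above. It assumes ratings are kept as one JSON object under a single storage key. A plain dict stands in for the browser's localStorage. The key name `tts_ratings`, the function names, and the voice IDs are hypothetical and are not taken from the page's actual code. Shuffling by default keeps card position from hinting at the engine. Sorting by score happens only when `sort_by_rating=True`.

```python
import json
import random

# Hypothetical stand-in for localStorage: string keys -> string values.
# The key name is an assumption for this sketch.
STORAGE_KEY = "tts_ratings"


def load_ratings(storage: dict) -> dict:
    """Return all saved ratings, or an empty dict if nothing is stored yet."""
    raw = storage.get(STORAGE_KEY)
    return json.loads(raw) if raw else {}


def save_rating(storage: dict, voice_id: str, score: int) -> None:
    """Record a 1-10 score for one voice card and persist it as JSON."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    ratings = load_ratings(storage)
    ratings[voice_id] = score
    storage[STORAGE_KEY] = json.dumps(ratings)


def listening_order(voice_ids, ratings, sort_by_rating=False, seed=None):
    """Shuffle by default; sort by saved score only when the listener opts in."""
    order = list(voice_ids)
    if sort_by_rating:
        # Unrated voices are treated as 0 and land at the end.
        order.sort(key=lambda v: ratings.get(v, 0), reverse=True)
    else:
        random.Random(seed).shuffle(order)
    return order
```

If you pass a fixed `seed`, the shuffle is reproducible for debugging. Leaving it as `None` gives a fresh order each session.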
Aggregated from your ratings. Updates live.
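One way to compute the aggregate view is sketched below in Python. It assumes each voice belongs to exactly one engine, and that unrated voices are simply missing from the ratings map. At this scale (23 voices), recomputing the summary on every rating change is cheap enough to make the view update live. The function and field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_engine(ratings, voice_to_engine):
    """Group per-voice scores by engine and summarize each group.

    ratings: voice_id -> score (1-10).
    voice_to_engine: voice_id -> engine name.
    Unrated voices are absent from `ratings` and do not count.
    """
    scores = defaultdict(list)
    for voice_id, score in ratings.items():
        scores[voice_to_engine[voice_id]].append(score)
    summary = {
        engine: {"n": len(s), "mean": mean(s), "best": max(s)}
        for engine, s in scores.items()
    }
    # Highest mean first; ties are broken by the number of ratings.
    return dict(
        sorted(summary.items(), key=lambda kv: (kv[1]["mean"], kv[1]["n"]), reverse=True)
    )
```
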
Per engine: what it is, how it works, its license, Apple Silicon viability, and the invocation that produced its sample.
fishaudio/s2-pro, 906 downloads, last updated Mar 2026). Zero-shot clone works.
nanovllm-voxcpm (CUDA fork).
Clone mode produces "fluent-foreigner accent" (HF discussion #14).
speaker_kv_scale): first public test of the post-PR-#18 configuration next
HF discussion #14 + VoxCPM issue #222, maintainer-confirmed:
--mode default --instruct "...") not clone
--cfg-value 1.5)
speaker_kv_scale for ref-adherence vs naturalness
<laugh>, <breath>, <sigh>, +7
lang="na"
[whisper], [excited], [angry], etc.

Local-only mandate. Default: Irodori v3 male clone (matsukaze ref), the JP-native consensus pick across April-May 2026 YouTube reviewers. MIT license, MLX 8-bit. Confirmed in Round-8 (2026-05-25).
Voice variety: VoxCPM2 Voice Design with character-instructs for non-narrator lines.
Avoid: AivisSpeech / SBV2 (transitive AGPL), all cloud TTS.
Default: top-rated MIT/Apache engine from your blind A/B.
For personal viewing, AGPL is acceptable if needed, but Irodori (MIT) and Supertonic (MIT) are cleaner defaults.
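The license rule in these recommendations can be sketched as a filter that runs before the best-rated engine is picked. The sketch assumes per-engine mean scores are already available, along with a hand-maintained table that lists each engine's SPDX license ID and whether it runs locally. The table contents and all names are hypothetical. By default, AGPL engines and cloud services are excluded. Widening `allowed` is one way to express the personal-viewing exception.

```python
from typing import Dict, Optional, Tuple

# Licenses treated as clean defaults in this sketch (SPDX identifiers).
PERMISSIVE = frozenset({"MIT", "Apache-2.0"})


def pick_default(
    engine_means: Dict[str, float],
    engine_info: Dict[str, Tuple[str, bool]],
    allowed=PERMISSIVE,
) -> Optional[str]:
    """Return the best-rated engine whose license is in `allowed` and which
    runs locally, or None if nothing qualifies.

    engine_info maps engine -> (SPDX license id, runs_locally).
    Engines missing from engine_info are skipped rather than assumed safe.
    """
    eligible = {
        engine: score
        for engine, score in engine_means.items()
        if engine in engine_info
        and engine_info[engine][0] in allowed
        and engine_info[engine][1]
    }
    if not eligible:
        return None
    return max(eligible, key=eligible.get)
```

For example, `pick_default({"engine-a": 9.0, "engine-b": 7.5}, {"engine-a": ("AGPL-3.0", True), "engine-b": ("MIT", True)})` returns `"engine-b"`. The AGPL engine is skipped even though it scored higher. The local-only check stays enforced whatever `allowed` contains.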
Local default: Irodori v3 with Voice Design instruct prompts for narration style. The MIT license keeps personal use simple.
Air-gapped Silo machine: no cloud. Local-only mandate.
Bet on the top-rated local Apache/MIT engine. Chunk per character, with distinct captions/instructs for each.
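Below is a sketch of per-character chunking. It assumes the script is already split into `(speaker, text)` pairs, and that each character has an instruct string (a Voice Design prompt or caption). Consecutive lines from the same speaker are merged, so each synthesis call carries one voice. The `Chunk` type, `chunk_by_speaker`, and any instruct strings are hypothetical. The call that sends each chunk to an engine is left out because it depends on that engine's own CLI or API.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Chunk:
    speaker: str
    text: str
    instruct: str


def chunk_by_speaker(lines, instructs, default_instruct="", sep=""):
    """Merge consecutive (speaker, text) lines into one chunk per speaker run.

    Each chunk carries that speaker's instruct prompt, or `default_instruct`
    if none is set. `sep` defaults to "" because Japanese text is not
    space-separated.
    """
    chunks = []
    # groupby only merges *adjacent* lines, so a speaker who returns later
    # gets a new chunk and the original line order is kept.
    for speaker, group in groupby(lines, key=lambda line: line[0]):
        text = sep.join(t for _, t in group)
        chunks.append(Chunk(speaker, text, instructs.get(speaker, default_instruct)))
    return chunks
```
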
Things that aren't changing soon.
The ml-explore/mlx team has no TTS work in flight in 2026.