mic
Type: Software
Status: In Progress
Deadline: TBD
Created: 2026-04-24
What is this?
A lightweight native iOS app for on-device voice transcription. Tap a Lock Screen control to start recording, tap again to stop. Transcriptions are timestamped, stored locally as markdown files, and synced via iCloud Documents. Works fully offline — no internet required. Automatically identifies speakers when multiple people are talking.
Motivation
Quick, frictionless voice capture with no cloud dependency for recording or transcription. The flashlight metaphor: always available, one tap, no app to open. Transcriptions persist locally and sync silently across devices through iCloud.
What does success look like?
- One-tap recording from Lock Screen (no app launch needed)
- Accurate on-device transcription via WhisperKit (Whisper Small)
- Speaker diarization — identifies and labels different speakers
- Transcriptions stored as timestamped markdown files in iCloud Documents
- Works completely offline
- Nothing-inspired UI: monochrome, industrial minimalism, Space Grotesk/Mono typography
- Battery-efficient recording
Technical
- Stack: SwiftUI, WhisperKit (Whisper Small ~460MB), SpeakerKit (Pyannote ~30MB), AVFoundation
- Repo: github.com/djt53/mic
- Deploy target: iOS 18+ (the Lock Screen control uses the ControlWidget API, which requires iOS 18)
- Intended users: Personal use, power users who want fast voice-to-text
Architecture
Storage Format
Transcription files stored in iCloud Documents container (iCloud~com~davidtingle~mic):
Documents/
  2026-04-24T14-30-00.md
  2026-04-24T15-45-22.md
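The file names above follow a timestamp pattern with hyphens in place of colons. A minimal sketch of producing such a name, assuming the timestamp is rendered in UTC (the spec does not say which time zone is used) and using a hypothetical helper name `transcriptionFileName(for:)`:

```swift
import Foundation

/// Builds a file name such as "2026-04-24T14-30-00.md" from a recording start date.
/// Assumption: the timestamp is rendered in UTC; the spec does not state the zone.
func transcriptionFileName(for date: Date) -> String {
    let formatter = DateFormatter()
    // A fixed POSIX locale keeps the output stable regardless of user settings.
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.timeZone = TimeZone(identifier: "UTC")
    // Hyphens replace colons in the time part, matching the listing above.
    formatter.dateFormat = "yyyy-MM-dd'T'HH-mm-ss"
    return formatter.string(from: date) + ".md"
}
```

Because the name sorts lexicographically in chronological order, a plain directory listing would already be ordered by recording time.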
Single speaker:
---
date: 2026-04-24T14:30:00Z
duration: 45s
title: Groceries
---
Transcribed text goes here.
Multi-speaker:
---
date: 2026-04-24T10:15:00Z
duration: 2m 8s
title: Launch Sync
speakers: true
---
**Speaker 1:** Let's push the launch to next Thursday.
**Speaker 2:** Sure, I'll have a draft by Monday.
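As an illustration of the two layouts above, here is a rough sketch of how a transcription might be rendered to markdown. The types `SpeakerTurn` and `TranscriptionDocument` and the functions `formatDuration` and `renderMarkdown` are hypothetical; the real model in Transcription.swift may look different. Front matter values are written unescaped, so a title containing a colon would need extra handling.

```swift
import Foundation

// Hypothetical types for illustration only.
struct SpeakerTurn {
    let speaker: Int        // 1-based speaker number
    let text: String
}

struct TranscriptionDocument {
    let date: Date
    let durationSeconds: Int
    let title: String
    let body: String        // used when there is a single speaker
    let turns: [SpeakerTurn] // non-empty for multi-speaker recordings
}

/// Formats 45 as "45s" and 128 as "2m 8s", matching the examples above.
func formatDuration(_ seconds: Int) -> String {
    let minutes = seconds / 60
    let rest = seconds % 60
    return minutes > 0 ? "\(minutes)m \(rest)s" : "\(rest)s"
}

/// Renders the front matter followed by either the plain body or one line per speaker turn.
func renderMarkdown(_ doc: TranscriptionDocument) -> String {
    let iso = ISO8601DateFormatter() // default options give e.g. 2026-04-24T14:30:00Z
    var lines = [
        "---",
        "date: \(iso.string(from: doc.date))",
        "duration: \(formatDuration(doc.durationSeconds))",
        "title: \(doc.title)",
    ]
    if !doc.turns.isEmpty {
        lines.append("speakers: true") // only present in multi-speaker files
    }
    lines.append("---")
    if doc.turns.isEmpty {
        lines.append(doc.body)
    } else {
        for turn in doc.turns {
            lines.append("**Speaker \(turn.speaker):** \(turn.text)")
        }
    }
    return lines.joined(separator: "\n")
}
```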
App Structure
mic/
├── micApp.swift                    # Entry point
├── Models/
│   └── Transcription.swift         # Model + markdown serialization + speaker turns
├── Services/
│   ├── AudioRecorder.swift         # AVAudioEngine recording (native rate, post-convert)
│   ├── TranscriptionEngine.swift   # WhisperKit + SpeakerKit wrapper
│   ├── StorageService.swift        # iCloud Documents read/write + local eviction
│   ├── RecordingManager.swift      # Lock Screen control ↔ app bridge
│   └── SharedContainer.swift       # iCloud container paths
├── Views/
│   ├── HomeView.swift              # List + search + record button
│   ├── TranscriptionView.swift     # Detail with speaker turn rendering
│   ├── TranscriptionRow.swift      # List row
│   ├── RecordingCard.swift         # Active recording state
│   └── ModelDownloadView.swift     # First-launch model download
├── Theme/
│   └── Theme.swift                 # Nothing-inspired design tokens
├── micControl/
│   ├── RecordControl.swift         # Lock Screen ControlWidget
│   └── micControlBundle.swift      # Widget bundle
└── Resources/Fonts/
    ├── SpaceGrotesk-Variable.ttf
    ├── SpaceMono-Regular.ttf
    └── SpaceMono-Bold.ttf
Key Components
- Lock Screen Control — ControlWidget (iOS 18+) toggles recording via App Group UserDefaults
- AudioRecorder — Records at native sample rate, converts to 16kHz mono WAV after stop (battery optimization)
- TranscriptionEngine — WhisperKit (transcription) + SpeakerKit (diarization), both on-device
- StorageService — FileManager reads/writes markdown to iCloud Documents; delete = local eviction (file stays in iCloud)
- Search — Filters transcriptions by title and body text
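The search behavior in the last item could be sketched roughly as follows. `TranscriptionSummary` and `filterTranscriptions` are hypothetical names, and case-insensitive matching is an assumption; the spec only says that titles and body text are filtered.

```swift
import Foundation

// Hypothetical minimal model for illustration.
struct TranscriptionSummary {
    let title: String
    let body: String
}

/// Returns the items whose title or body contains the query, ignoring case.
/// A blank query returns every item unchanged.
func filterTranscriptions(_ items: [TranscriptionSummary],
                          matching query: String) -> [TranscriptionSummary] {
    let trimmed = query.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmed.isEmpty else { return items }
    return items.filter { item in
        item.title.localizedCaseInsensitiveContains(trimmed)
            || item.body.localizedCaseInsensitiveContains(trimmed)
    }
}
```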
Battery Optimizations
- Record at native sample rate → single batch conversion after stop (no real-time resampling)
- Timer at 1Hz (not 10Hz) for elapsed time
- Audio level computed on every 4th buffer using vDSP (rather than on every buffer with a manual loop)
- Larger audio buffer (8192 frames) for fewer callbacks
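The level-metering item above can be sketched as a small throttle around an RMS calculation. This is an illustration under assumptions, not the app's actual code: `ThrottledLevelMeter` is a hypothetical type, RMS is assumed as the level measure, and a plain-Swift fallback is included so the sketch also compiles where Accelerate is unavailable.

```swift
import Foundation
#if canImport(Accelerate)
import Accelerate
#endif

/// Computes an RMS level on every Nth buffer and reuses the last value in between.
struct ThrottledLevelMeter {
    let interval: Int                      // 4 in the list above
    private var bufferCount = 0
    private(set) var lastLevel: Float = 0

    init(interval: Int = 4) {
        self.interval = max(1, interval)
    }

    /// Call once per audio buffer; returns the most recently computed level.
    mutating func process(_ samples: [Float]) -> Float {
        defer { bufferCount += 1 }
        // Skip the calculation unless this is a measured buffer (0, N, 2N, ...).
        guard bufferCount % interval == 0, !samples.isEmpty else { return lastLevel }
        lastLevel = rms(samples)
        return lastLevel
    }

    private func rms(_ samples: [Float]) -> Float {
        #if canImport(Accelerate)
        var result: Float = 0
        // Vectorized root mean square over the whole buffer.
        vDSP_rmsqv(samples, 1, &result, vDSP_Length(samples.count))
        return result
        #else
        // Portable fallback for platforms without Accelerate.
        let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
        return (sumOfSquares / Float(samples.count)).squareRoot()
        #endif
    }
}
```

In an audio tap the samples would come from the buffer's float channel data instead of a Swift array; the array keeps the sketch self-contained.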
Speaker Diarization
- Uses SpeakerKit (Pyannote) — bundled with WhisperKit, ~30MB additional models
- Models download in background after WhisperKit loads (non-blocking)
- WhisperKit runs with word-level timestamps + VAD chunking
- SpeakerKit diarizes the audio array, then aligns speaker segments with transcription using subsegment strategy
- Consecutive segments from the same speaker are merged into turns
- Speaker turns stored in markdown as **Speaker N:** text blocks
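The merge step (consecutive segments from the same speaker become one turn) might look roughly like this. `SpeakerSegment`, `Turn`, and `mergeIntoTurns` are hypothetical names, and the input is assumed to be transcript pieces that already carry a speaker number; the alignment between SpeakerKit output and WhisperKit timestamps is not shown.

```swift
import Foundation

// Hypothetical input: a piece of transcript already attributed to a speaker.
struct SpeakerSegment {
    let speaker: Int
    let text: String
}

struct Turn: Equatable {
    let speaker: Int
    var text: String
}

/// Joins neighboring segments that share a speaker into a single turn.
/// A speaker who talks again later starts a new turn.
func mergeIntoTurns(_ segments: [SpeakerSegment]) -> [Turn] {
    var turns: [Turn] = []
    for segment in segments {
        let text = segment.text.trimmingCharacters(in: .whitespaces)
        guard !text.isEmpty else { continue } // drop empty pieces
        if let last = turns.last, last.speaker == segment.speaker {
            // Same speaker as the previous turn: extend it.
            turns[turns.count - 1].text += " " + text
        } else {
            turns.append(Turn(speaker: segment.speaker, text: text))
        }
    }
    return turns
}
```

Each resulting turn maps to one `**Speaker N:**` line in the stored file.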
Design System (Nothing-inspired)
- Typography: Space Grotesk (body), Space Mono (timestamps/metadata)
- Palette: Monochrome — OLED black backgrounds, white/gray text
- Hierarchy: Display → Body → Metadata (three layers, strictly enforced)
- Accent: Signal red (#D71921) — recording state, speaker 1 label
- Speaker colors: Red, Blue, Green, Amber, Purple (cycled when there are more than five speakers)
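The color cycling can be expressed with a modulo over the palette. A minimal sketch, assuming 1-based speaker numbers and using a hypothetical `SpeakerColor` enum in place of the actual design tokens in Theme.swift:

```swift
// Hypothetical palette in the order given above.
enum SpeakerColor: String, CaseIterable {
    case red, blue, green, amber, purple
}

/// Maps a 1-based speaker number to a palette entry, wrapping around after the last color.
func color(forSpeaker number: Int) -> SpeakerColor {
    let palette = SpeakerColor.allCases
    // Numbers below 1 are clamped to the first speaker.
    let index = (max(number, 1) - 1) % palette.count
    return palette[index]
}
```

With this mapping, speaker 6 shares red with speaker 1 and speaker 7 shares blue with speaker 2.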