Hey HN,
I built Dictator because I wanted a lightweight, highly controllable voice-to-text tool for macOS that uses my own OpenAI API key instead of a monthly subscription service.
It’s a Lua-based extension for Hammerspoon.
How it works:
Hold Fn (or a custom hotkey) to record.
Release to transcribe.
The text is auto-pasted into your active application (or copied to clipboard).
Technical details & optimizations:
Audio Pipeline: Uses SoX to record directly to FLAC (16 kHz mono). This cuts upload size by roughly 50% compared to WAV, which noticeably shortens the round trip to the Whisper API.
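To make the recording step and the size comparison concrete, here is a rough sketch in Python (the extension itself is Lua, so this is an illustration, not its actual code). The function names and the output path are hypothetical, and the 16-bit sample depth is an assumption. The first function only builds a SoX command line for 16 kHz mono FLAC from the default input device; the second computes the size of the equivalent uncompressed WAV, which is the baseline the ~50% figure refers to.

```python
def sox_record_command(out_path, sample_rate=16000, channels=1):
    """Build a SoX argv that records from the default input device to a file."""
    return [
        "sox",
        "-d",                     # input: the default audio device
        "-r", str(sample_rate),   # output sample rate in Hz
        "-c", str(channels),      # output channel count (1 = mono)
        "-b", "16",               # assumed 16-bit samples
        out_path,                 # a .flac extension selects FLAC output
    ]


def pcm_wav_bytes(seconds, sample_rate=16000, channels=1, bits=16, header=44):
    """Size of an uncompressed PCM WAV file: 44-byte header plus raw samples."""
    return header + int(seconds * sample_rate * channels * (bits // 8))


if __name__ == "__main__":
    print(" ".join(sox_record_command("/tmp/clip.flac")))
    print(pcm_wav_bytes(10), "bytes of WAV for a 10 second clip")
```

At 16 kHz mono 16-bit, raw audio is 32,000 bytes per second, so a 10 second clip is about 320 kB as WAV. How much FLAC saves depends on the audio, so the sketch does not try to predict the compressed size.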
Reliability: Implements a token bucket rate limiter to prevent API abuse, plus exponential backoff to handle 429/5xx errors gracefully.
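For anyone unfamiliar with the two techniques, a minimal Python sketch follows (again, not the extension's Lua code). The class and function names, the bucket size, the refill rate, and the delay values are all assumptions chosen for illustration. The bucket refills continuously and lets a request through only when a whole token is available; the retry helper doubles the wait after each 429 or 5xx response up to a cap.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def backoff_delay(attempt, base=0.5, cap=8.0):
    """Seconds to wait before retrying after 0-based `attempt`: base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def is_retryable(status):
    """429 (rate limited) and 5xx (server errors) are worth retrying."""
    return status == 429 or 500 <= status <= 599


def call_with_retry(send, max_attempts=4, sleep=time.sleep):
    """Call send() -> (status, body), retrying retryable statuses with backoff."""
    for attempt in range(max_attempts):
        status, body = send()
        if not is_retryable(status) or attempt == max_attempts - 1:
            return status, body
        sleep(backoff_delay(attempt))
```

A caller would check `try_acquire()` before starting an upload and skip or delay the request when it returns False. Passing `clock` and `sleep` as parameters is only there to make the behaviour easy to test without real waiting.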
Debouncing: I added strict debouncing logic to ignore accidental short taps (<0.4s) and prevent double-triggers.
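One way the tap filtering could look, sketched in Python rather than the extension's Lua. The 0.4 s minimum hold comes from the description above; the class name, the method names, and the 0.3 s cooldown used to suppress double-triggers are assumptions made up for this example.

```python
class PushToTalk:
    """Decide when a hold-to-record key should start and submit a recording."""

    def __init__(self, min_hold=0.4, cooldown=0.3):
        self.min_hold = min_hold      # holds shorter than this are discarded
        self.cooldown = cooldown      # assumed quiet period after a release
        self.pressed_at = None
        self.last_release = None

    def key_down(self, now):
        """Return True if recording should start."""
        if self.pressed_at is not None:
            return False              # duplicate key-down while already holding
        if self.last_release is not None and now - self.last_release < self.cooldown:
            return False              # too soon after the previous release
        self.pressed_at = now
        return True

    def key_up(self, now):
        """Return True if the recording should be transcribed, False to discard it."""
        if self.pressed_at is None:
            return False              # release without a matching accepted press
        held = now - self.pressed_at
        self.pressed_at = None
        self.last_release = now
        return held >= self.min_hold
```

Timestamps are passed in explicitly so the logic stays independent of any particular event source; in practice they would come from a monotonic clock read in the key event handler.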
Security: Your API key is stored locally and sent directly to OpenAI; there is no intermediate server.
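To illustrate what "sent directly to OpenAI" means in practice, here is a hedged Python sketch that builds (but does not send) a request to OpenAI's audio transcription endpoint. It is not the extension's Lua code; the function name, the default filename, and the hand-rolled multipart body are assumptions for the example. The only host involved is `api.openai.com`, and the key travels in the `Authorization` header.

```python
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(api_key, audio_bytes, filename="clip.flac", model="whisper-1"):
    """Build a multipart/form-data POST carrying the model name and the audio file."""
    boundary = uuid.uuid4().hex
    parts = [
        # Form field: which transcription model to use.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # Form field: the recorded audio, sent as a file upload.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: audio/flac\r\n\r\n").encode(),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` and reading the `text` field from the JSON response; that step is left out here so the sketch can be run without a key or network access.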
Repo: https://github.com/Glossardi/Dictator-Speech-to-Text
I’d love to hear your thoughts on the push-to-talk UX versus a toggle approach, and if anyone has ideas on further reducing latency!
This is cool.
MS Windows (Win10 or Win11) has a dictation (speech-to-text) feature built in, but its accuracy isn't up to the mark.
Win11 users can use Win+H hotkeys for dictation: https://support.microsoft.com/en-us/windows/use-voice-typing...
MS Copilot (obviously) understands dictation a lot better, but Copilot's speech recognition is not (and should not be) available in every text box or editor on Windows.
Mac has it as well, but it's very slow and has poor accuracy.