Caption Overview
Caption is a macOS subtitle workspace built around four connected workflow stages, with a local-first boundary around sensitive material:
- Capture audio and video sources reliably.
- Follow the conversation with live captions during the session.
- Generate, revise, and export subtitles after the session.
- Find the exact moment again with timecode-level search.
This documentation covers the workflows already reflected in the product page and in the structure of the macOS app:
- Multi-source capture from microphone, system audio, app output, and camera.
- Real-time captions for meetings, calls, classes, and demos.
- Offline subtitle generation for local audio and video.
- Search and session history for locating the exact time a line appeared.
- Local-first handling for sensitive or internal material.
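As a rough illustration of what "locating the exact time a line appeared" can mean in practice, the sketch below searches a list of caption cues and reports an SRT-style timecode for each match. It is not Caption's implementation. The `Cue` shape, the function names `format_timecode` and `find_moments`, and the sample session are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption line with start and end times in milliseconds (hypothetical shape)."""
    start_ms: int
    end_ms: int
    text: str


def format_timecode(ms: int) -> str:
    """Render milliseconds as an SRT-style timecode, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def find_moments(cues: list[Cue], query: str) -> list[tuple[str, str]]:
    """Return (timecode, text) for every cue containing the query, ignoring case."""
    needle = query.casefold()
    return [
        (format_timecode(cue.start_ms), cue.text)
        for cue in cues
        if needle in cue.text.casefold()
    ]


# Sample session data, invented for this example.
session = [
    Cue(0, 2_500, "Welcome to the weekly sync."),
    Cue(2_500, 6_000, "First item: the export pipeline."),
    Cue(3_725_250, 3_729_000, "Let's revisit the export settings."),
]

for timecode, text in find_moments(session, "export"):
    print(timecode, text)
```

A linear scan like this is enough for a single session. Searching across a long session history would more plausibly use an index, but that is a design choice outside what this overview describes.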
Start here
Who Caption is for
Caption fits best when your work has both a live phase and a delivery phase.
Examples:
- Meetings where you need to understand the conversation while it is still happening.
- Demos, classes, or internal training where capture and later subtitle export both matter.
- Long-form media that needs revision, burn-in, archive, and later search.
Product boundary
Caption is not positioned as a generic cloud transcription service. The core story is a local-first macOS workflow:
- Capture first.
- Understand during the session.
- Finish the subtitle work after the session.
- Keep the result searchable later.