whisperVideo: an AI tool for meeting minutes from long videos, interview editing, and course summaries

WhisperVideo is an AI tool that transcribes the speech in long videos, automatically identifies who is speaking, matches each utterance to the speaker's face in the frame, and generates speaker-labeled subtitles and visualization panels. It handles hour-long footage, splitting it automatically into scenes and paragraphs, which makes it suitable for interviews, film editing, and reviewing meeting minutes. It uses WhisperX for transcription, Pyannote for speaker diarization, and SAM3 for face detection, then merges all of this information to produce the subtitles and panel views.
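The merging step can be illustrated with a minimal sketch. This is not whisperVideo's actual code: the `Segment` and `Turn` types, the `assign_speaker` and `to_srt` helpers, and the longest-overlap rule are all assumptions made for illustration. It assumes the transcription stage yields timed text segments and the diarization stage yields timed speaker turns, and it leaves out face matching entirely.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span (hypothetical shape of the transcription output)."""
    start: float  # seconds
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn (hypothetical shape of the diarization output)."""
    start: float  # seconds
    end: float
    speaker: str


def assign_speaker(seg, turns, default="UNKNOWN"):
    """Return the speaker whose turns overlap the segment for the longest time."""
    overlap = {}
    for t in turns:
        shared = min(seg.end, t.end) - max(seg.start, t.start)
        if shared > 0:
            overlap[t.speaker] = overlap.get(t.speaker, 0.0) + shared
    return max(overlap, key=overlap.get) if overlap else default


def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments, turns):
    """Render segments as SRT blocks, prefixing each line with its speaker ID."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        speaker = assign_speaker(seg, turns)
        blocks.append(
            f"{i}\n{srt_time(seg.start)} --> {srt_time(seg.end)}\n"
            f"[{speaker}] {seg.text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the output.
    segments = [
        Segment(0.0, 2.5, "Welcome to the show."),
        Segment(2.5, 5.0, "Thanks for having me."),
    ]
    turns = [Turn(0.0, 2.6, "SPEAKER_00"), Turn(2.6, 5.2, "SPEAKER_01")]
    print(to_srt(segments, turns))
```

Assigning by longest overlap is a simple choice that tolerates small misalignments between transcript and diarization boundaries. A real pipeline might instead assign speakers per word, or split segments where the speaker changes.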

3/5/2026