How Essence Cut AI works
A step-by-step walkthrough of the entire workflow — from importing a long video to publishing your first viral short. Every feature explained, including how to use the Master Prompt and how to get clip JSON from any AI chat of your choice.
The 7-step workflow
1. Import your video
On the Home screen of the app, you have two ways to bring in a source video:
- Paste a YouTube URL in the input field and click Get clips in 1 click. The app downloads the best-available quality plus the thumbnail using yt-dlp.
- Click Upload and pick a local video file (mp4, mov, mkv, webm). Drag-and-drop also works anywhere on the Home screen.
A new project is created instantly and appears in your All projects grid. Every project is stored in a local SQLite database, so you can come back to it anytime.
2. Automatic local transcription
The app transcribes the audio 100% on your computer using Faster-Whisper Large-v3 Turbo with Silero voice-activity detection. Every word gets a precise timestamp, which is what makes the rest of the pipeline (animated captions, audio cuts, AI clip selection) work cleanly.
During transcription you can already toggle:
- Auto Censor — beep over profanity in English, French and German.
- Remove Filler Words — drop "um", "uh", "like" automatically.
- Remove Pauses — collapse silences longer than a threshold you control.
No audio, transcript or video is uploaded to any cloud service during this step.
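To make the cleaning toggles concrete, here is a minimal sketch of how word-level timestamps can drive filler and pause removal. It is not the app's actual code: the Word type, the FILLERS set, the keep_ranges function and the 0.5 s default threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


# Hypothetical filler list, taken from the examples in the text above.
# Matching on the bare word is naive: a legitimate "like" is dropped too.
FILLERS = {"um", "uh", "like"}


def is_filler(word):
    return word.text.strip(".,!?").lower() in FILLERS


def keep_ranges(words, max_pause=0.5, remove_fillers=True):
    """Return the (start, end) audio ranges to keep.

    Consecutive words are merged into one range unless the silence
    between them exceeds max_pause or a filler word was dropped there.
    """
    ranges = []
    must_split = False  # force a cut right after a dropped filler
    for w in words:
        if remove_fillers and is_filler(w):
            must_split = True
            continue
        if ranges and not must_split and w.start - ranges[-1][1] <= max_pause:
            ranges[-1][1] = w.end
        else:
            ranges.append([w.start, w.end])
        must_split = False
    return [(s, e) for s, e in ranges]
```

Playing the returned ranges back to back yields audio with fillers and long silences removed. A production version would probably add a few milliseconds of padding around each range to avoid clipped syllables.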
3. Generate the Master Prompt
The Master Prompt is the heart of Essence Cut AI. It's a structured text containing your full transcript with sentence IDs plus crystal-clear instructions for an AI to pick the best viral moments and return a JSON response in the exact format the app expects.
On the left side of the Prompt Editor you'll find toggles for:
- Target clip duration (15s / 30s / 60s / 90s presets).
- Video genre (podcast, sermon, vlog, educational, comedy, news…).
- Auto Hook — generate a punchy top-bar title for each clip.
- Hook Intro — cold-open the first second with the hook line for a stronger thumb-stopper.
- Auto Captions — burn animated word-by-word captions into each clip.
- Smart Trimming — let the AI return omitted_ranges so it can tighten clips beyond just start/end.
The prompt updates live as you flip toggles. When you're happy, click Copy prompt.
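As a rough illustration of what such a prompt might look like when assembled, the sketch below builds a prompt from sentence IDs and a few of the toggles. The wording of the instructions and the field names title, start_id and end_id are hypothetical; only omitted_ranges is named in this guide, and the real Master Prompt is likely more detailed.

```python
def build_master_prompt(sentences, target_seconds=30, genre="podcast",
                        smart_trimming=True):
    """Assemble a prompt from (sentence_id, text) pairs and toggle values.

    Field names other than omitted_ranges are illustrative assumptions.
    """
    lines = [
        f"You are selecting viral short clips from a {genre} transcript.",
        f"Target clip duration: about {target_seconds} seconds.",
        "Refer to sentences only by their IDs.",
        "Return ONLY valid JSON, no markdown, no extra text.",
    ]
    fields = ["title", "start_id", "end_id"]
    if smart_trimming:
        # Only request omitted_ranges when the Smart Trimming toggle is on.
        fields.append("omitted_ranges")
    lines.append("Each clip object has the fields: " + ", ".join(fields) + ".")
    lines.append("")
    lines.append("TRANSCRIPT")
    lines += [f"[{sid}] {text}" for sid, text in sentences]
    return "\n".join(lines)
```

Referring to sentences by ID rather than by timestamp keeps the prompt compact and means the AI never has to reproduce exact times, which chat models tend to get wrong.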
4. Paste the prompt into any AI chat (you choose!)
Important: Essence Cut AI does not embed any AI API. You're free to use whichever chatbot you prefer — the same prompt format works everywhere:
- ChatGPT (OpenAI) — free GPT-4o tier handles 30–60 minute transcripts fine.
- Claude (Anthropic) — Claude 3.5 Sonnet on the free tier is excellent for long context.
- Gemini (Google) — Gemini Pro has a huge context window, perfect for long podcasts.
- Mistral, DeepSeek, Grok, or any other LLM with a long context window also works.
Open your favourite chat, paste the prompt, and send. The AI will reply with a JSON block describing the clips it picked — typically 5 to 15 clips, depending on your settings and source video length.
Tip: if the AI wraps the JSON in markdown fences or extra commentary, just ask: "return ONLY valid JSON, no markdown, no extra text".
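If you would rather clean a reply yourself than re-prompt, the idea is simple enough to sketch. The helper below is an assumption for illustration, not a description of what the app's parser does: it takes the contents of the first markdown fence if one exists, then parses the outermost brace or bracket pair.

```python
import json
import re

FENCE_MARK = "`" * 3  # three backticks, built this way to keep the example readable
FENCE = re.compile(FENCE_MARK + r"(?:json)?\s*(.*?)" + FENCE_MARK, re.DOTALL)


def extract_json(reply):
    """Pull a JSON value out of a chat reply that may contain extra text."""
    match = FENCE.search(reply)
    candidate = match.group(1) if match else reply
    # Locate the outermost JSON object or array inside the candidate text.
    starts = [i for i in (candidate.find("{"), candidate.find("[")) if i != -1]
    end = max(candidate.rfind("}"), candidate.rfind("]"))
    if not starts or end == -1:
        raise ValueError("no JSON object or array found in reply")
    return json.loads(candidate[min(starts):end + 1])
```

json.loads still raises json.JSONDecodeError when the extracted text is not valid JSON, so truncated or malformed replies are reported rather than silently accepted.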
5. Paste the JSON back into the app
Copy the JSON from the chat and paste it into the right-hand panel of the Prompt Editor. Click Parse JSON. The app:
- Validates the JSON structure.
- Resolves the sentence IDs back to precise audio timestamps using the local ID map saved during transcription.
- Applies configurable padding (default 0.2 s before / 0.3 s after each cut).
- Computes the final duration of each clip, accounting for any omitted_ranges.
- Renders a card for every clip with title, duration and a preview button.
You can edit each clip before rendering — change the hook title, tweak start/end, override caption style, lock face-crop mode, or remove the clip entirely.
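The resolution step can be sketched as follows. This is a simplified model under stated assumptions: the id_map layout (sentence ID to start/end seconds), the start_id and end_id field names, and the representation of omitted_ranges as pairs of sentence IDs are all hypothetical, since this guide does not document the JSON schema. Only the 0.2 s / 0.3 s padding defaults come from the text.

```python
def resolve_clip(clip, id_map, pad_before=0.2, pad_after=0.3,
                 source_duration=None):
    """Turn sentence IDs into padded timestamps and a final duration.

    id_map maps sentence_id -> (start_seconds, end_seconds).
    """
    start = max(0.0, id_map[clip["start_id"]][0] - pad_before)
    end = id_map[clip["end_id"]][1] + pad_after
    if source_duration is not None:
        end = min(end, source_duration)  # never run past the source video

    omitted = []
    for first_id, last_id in clip.get("omitted_ranges", []):
        # Clamp each omitted span to the clip so it cannot extend the cut.
        o_start = max(start, id_map[first_id][0])
        o_end = min(end, id_map[last_id][1])
        if o_end > o_start:
            omitted.append((o_start, o_end))

    # Final duration is the padded span minus everything that is cut out.
    duration = (end - start) - sum(e - s for s, e in omitted)
    return {"start": start, "end": end, "omitted": omitted,
            "duration": duration}
```

For simplicity the sketch pads only the outer start and end; whether the app also pads the edges of omitted ranges is not stated here.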
6. Render with your own hardware
Click Render N clips. The app uses FFmpeg with the best available encoder on your machine:
- NVIDIA NVENC (h264_nvenc / hevc_nvenc) — fastest on GeForce / RTX cards.
- AMD AMF — hardware encoding on Radeon GPUs.
- Intel QSV — Quick Sync on modern Intel CPUs/iGPUs.
- CPU x264 (libx264) — universal fallback that works everywhere.
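One plausible way to auto-select an encoder is to read the list printed by `ffmpeg -encoders` and take the first match from a preference order. The sketch below assumes that approach and an H.264-only preference list; the app's real detection logic is not documented in this guide. The encoder names h264_nvenc, h264_amf, h264_qsv and libx264 are standard FFmpeg names.

```python
import subprocess

# Assumed preference order: hardware encoders first, CPU fallback last.
PREFERENCE = ["h264_nvenc", "h264_amf", "h264_qsv", "libx264"]


def list_encoders():
    """Return the raw text printed by `ffmpeg -encoders`."""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def pick_encoder(encoders_output, preference=PREFERENCE):
    """Pick the first preferred video encoder present in the listing."""
    available = set()
    for line in encoders_output.splitlines():
        parts = line.split()
        # Video encoder rows start with a flags column such as "V....D".
        if len(parts) >= 2 and parts[0].startswith("V"):
            available.add(parts[1])
    for name in preference:
        if name in available:
            return name
    raise RuntimeError("no supported H.264 encoder found")
```

Typical use is pick_encoder(list_encoders()). Note that an encoder appearing in the list only means FFmpeg was built with it, not that matching hardware is installed, so a robust implementation would also try a short test encode before committing to a hardware encoder.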
The pipeline per clip: input keyframe cut → smart reframe (dynamic face-crop or letterbox + blur) → burn .ass captions → encode. Live progress is streamed per clip via WebSocket.
Output files are saved under data/projects/<project>/ and can be opened with the Open folder button on each clip card.
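For readers curious what a single-clip render could look like on the command line, here is a hedged sketch that builds an FFmpeg argument list. It uses a static 9:16 crop centred on one face position rather than the dynamic face tracking described above, and the function names, parameters and audio codec choice are assumptions. The crop and subtitles filters and the -ss / -to / -c:v options are standard FFmpeg features.

```python
def crop_filter(src_w, src_h, face_cx):
    """Full-height 9:16 crop window centred on face_cx, clamped to the frame."""
    crop_w = int(src_h * 9 / 16) // 2 * 2  # keep the width even for the encoder
    x = int(face_cx - crop_w / 2)
    x = max(0, min(x, src_w - crop_w))
    return f"crop={crop_w}:{src_h}:{x}:0"


def build_ffmpeg_args(src, out, start, end, vf_crop, ass_path,
                      encoder="libx264"):
    """Build the argument list for one clip: cut, reframe, captions, encode.

    Assumes ass_path contains no characters that need escaping inside a
    filter graph, and that caption times in the .ass file are relative to
    the clip start (seeking before -i resets timestamps to zero).
    """
    video_filters = f"{vf_crop},subtitles={ass_path}"
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}", "-to", f"{end:.3f}",  # input-side seek
        "-i", src,
        "-vf", video_filters,
        "-c:v", encoder,
        "-c:a", "aac",
        out,
    ]
```

The list can be passed straight to subprocess.run. Passing arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.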
7. Publish or schedule
From the Social accounts section in the sidebar, connect your social channels:
- YouTube — official Google OAuth, uploads to your channel via the YouTube Data API v3.
- TikTok — Login Kit + Content Posting API, supports both inbox (draft) and direct publish modes.
- Facebook — Graph API v21, publishes to any Page you administer including pages owned by a Business Portfolio.
Then, on any rendered clip click Publish, pick the target accounts (multi-select supported), write a caption, and either publish now or schedule for a later date/time.
Scheduled posts run on our backend server, so you can close the app or shut down your PC — the post still fires on time. The server auto-deletes uploaded clip files within 24 hours.
Why no built-in AI API?
The copy/paste workflow is a deliberate design choice. Here's why:
- You stay in control. No surprise bills, no usage caps, no API key to manage.
- You can use free tiers. ChatGPT, Claude, and Gemini all have free web tiers powerful enough for this task.
- Best model for the job. When a smarter model ships next month, you just use it — no app update required.
- Privacy. Your transcript only goes to the chat you personally chose. We never see it.
- Future-proof. The Master Prompt format is provider-agnostic. It will keep working as new AI products launch.
Everything the app can do
- YouTube import via yt-dlp (best quality + thumbnail).
- Local AI transcription (Faster-Whisper Large-v3 Turbo + Silero VAD).
- Multi-language profanity filter (English, French, German) with audio bleeping.
- Filler word and pause removal with adjustable thresholds.
- Smart 9:16 reframing with MediaPipe face detection and text-density fallback.
- Word-level animated captions burned in via FFmpeg subtitles filter.
- Hook title overlays (top-bar pill, configurable duration).
- Inline clip preview directly in the app.
- Hardware-aware encoding (NVENC / AMF / QSV / x264 auto-select).
- Multi-platform publishing (YouTube Shorts, TikTok, Facebook Pages) via official OAuth.
- Server-side scheduling — posts go out even with the PC off.
- Project manager with SQLite persistence, rename, delete.
- Per-clip locked settings so re-renders never reset your tweaks.
- Right-click context menu with cut / copy / paste / select all.
- View Logs button for easy debugging and support.
Tips for the best results
- Use a long-form talking-head video — interviews, podcasts, sermons, webinars, monologues. Music videos and gameplay won't transcribe well.
- If the AI returns invalid JSON, paste it back and ask "return ONLY valid JSON, no markdown fences".
- Render a single clip first to verify settings, then re-open the project and render the rest in bulk.
- Connect your social accounts before rendering, so the Publish button is ready the moment clips finish.
- For best caption readability, keep the source video at 1080p or higher.
- If your source has heavy on-screen text or graphics, the face-tracker will auto-switch to 16:9 letterbox to preserve readability.
Your data
All video processing — download, transcription, cleaning, rendering — runs locally on your computer. Nothing leaves your machine during those steps.
The only data uploaded to our backend is:
- Finished MP4 clips you choose to publish (auto-deleted within 24 hours).
- OAuth tokens for the social accounts you connect (encrypted at rest).
Disconnect a social account at any time from the Social accounts screen to wipe its tokens from our server immediately. Read the full Privacy Policy for details.