Video AI Repos, 2026 (Prompt Library)
The open-source GitHub stack powering AI video work right now. Pick how much code you want to write — we hide anything above your level so the catalog stays useful instead of overwhelming. Every repo ships with a starter Claude Code prompt you can copy and run.
How comfortable are you with code? This catalog is filtered for the Tinkerer tier: you can clone a repo, follow a README, run a workflow, and paste commands.
Video generation models
Open-weight foundation models you can run locally or self-host. The closed-source equivalents are Sora 2, Veo 3, Runway, Kling.
Wan 2.1: Alibaba's Apache-licensed text-to-video. The 1.3B variant runs on a mid-range gaming GPU.
Tiered model sizes (1.3B → 14B). The 1.3B fits in 8 GB VRAM and is the gentlest entry to local video gen. Multilingual, supports image-to-video and video-to-audio.
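As a starting point, here is a minimal sketch of driving the 1.3B checkpoint from Python. The `WanPipeline` class, the `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` repo id, and the "frame counts are 4k + 1" rule (a consequence of the temporal VAE's 4x compression) are assumptions based on recent Diffusers releases; verify them against the model card before building on this.

```python
def nearest_valid_frames(seconds: float, fps: int = 16) -> int:
    """Snap a duration to the nearest 4k + 1 frame count (assumed Wan constraint)."""
    n = round(seconds * fps)
    k = round((n - 1) / 4)
    return max(1, 4 * k + 1)

def generate(prompt: str, out_path: str = "out.mp4") -> None:
    # Heavy: downloads gigabytes of weights and needs a CUDA GPU,
    # so nothing here runs at import time.
    import torch
    from diffusers import WanPipeline
    from diffusers.utils import export_to_video

    pipe = WanPipeline.from_pretrained(
        "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # keeps the 1.3B variant inside ~8 GB VRAM
    frames = pipe(prompt=prompt, num_frames=nearest_valid_frames(5.0)).frames[0]
    export_to_video(frames, out_path, fps=16)
```

At 16 fps, a five-second request snaps to the familiar 81-frame default.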
Tinkerer · 8 GB VRAM (1.3B) · Apache 2.0

Wan 2.2: Mixture-of-Experts upgrade — separate high-noise and low-noise experts handle layout vs. detail.
Currently the gold standard for open-source quality. Worth the bigger VRAM ask if you have a 24 GB+ card.
Tinkerer · 24 GB+ VRAM · Apache 2.0

HunyuanVideo: Tencent's 13B unified text-to-video, image-to-video, and video-to-video model.
Best-in-class temporal consistency past 10 seconds. Strong character/face stability and prompt-driven camera moves. Quantized v1.5 with offload runs on 14 GB.
Tinkerer · 14 GB VRAM (quantized) · Tencent Hunyuan Community

LTX-Video: Lightricks' speed-tier model. Multiscale rendering — a fast low-res motion pass, then a refinement pass.
Real-time-ish generation. 4-second clips at 720p in under 30 seconds on a 4090. The pick when you want to iterate fast.
Tinkerer · 8 GB VRAM · Lightricks LTX-Video License

Mochi 1: 10B Asymmetric Diffusion Transformer with a custom VAE that compresses video 128x.
Best-in-class motion realism — water, fabric, jitterless camera. Apache 2.0, so safe for commercial pipelines.
Tinkerer · 16–24 GB VRAM · Apache 2.0

CogVideoX-5B: 5B model from Zhipu AI focused on prompt adherence and reproducibility.
The most researcher-friendly option. Solid Diffusers support, quantizes cleanly to 8-bit. Strong for cinematic story-driven content.
Tinkerer · 16 GB VRAM · Apache 2.0

SkyReels V1: HunyuanVideo fine-tuned on 10M+ film and TV clips for human-centric scenes.
If your output needs realistic people, this is the pick. Cinematic-quality faces and motion.
Tinkerer · 24 GB+ VRAM · Open — see repo
Workflow engines
The orchestration layer. Most AI video work happens inside one of these — they wire models, audio, and post-processing together.
ComfyUI: node-based GUI and backend for diffusion models. The de facto orchestration layer.
Every new video model — Wan, Hunyuan, LTX, Mochi — ships with a ComfyUI workflow file as standard. Used by Netflix, Apple, Ubisoft. Raised $30M at $500M valuation in April 2026.
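ComfyUI also runs headless: the UI is a client of a small HTTP API, and a workflow exported in API format can be queued with a POST to `/prompt`. A sketch, assuming a local instance on the default port 8188 and a `workflow.json` saved via the UI's API-format export:

```python
import json
import urllib.request

def build_payload(workflow: dict, client_id: str = "catalog-demo") -> dict:
    """Wrap an API-format workflow dict the way /prompt expects it."""
    return {"prompt": workflow, "client_id": client_id}

def queue_workflow(path: str, host: str = "127.0.0.1:8188") -> dict:
    # Network call: only works against a running ComfyUI instance.
    with open(path) as f:
        workflow = json.load(f)
    body = json.dumps(build_payload(workflow)).encode()
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # response includes the queued prompt_id
```

This is what agent tooling typically does under the hood: generate or mutate the workflow JSON, then queue it.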
Tinkerer · Any GPU · GPL-3.0

HeyGen's open-source HTML-to-video framework. Compose videos by writing HTML, CSS, and JS — no React, no DSL.
Explicitly agent-first. LLMs already know HTML inside-out, which means Claude Code and Cursor write compositions natively. Deterministic renders, swappable animation runtimes (GSAP, Lottie, Three.js). Inspired by Remotion but lighter to author.
Tinkerer · CPU / browser · Apache 2.0
Transcription & video understanding
Pull existing video into text the LLMs can chew on. Foundation of every video-to-LLM pipeline.
Whisper: OpenAI's reference speech-to-text. The default for serious transcription.
Foundation of nearly every video-to-LLM pipeline. Outperforms human transcribers in most conditions except heavy background noise.
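A sketch of the Python API (`pip install -U openai-whisper`), plus a small pure helper for turning `result["segments"]` timings into SRT timestamps. The segment shape matches openai-whisper's documented output; the model size is an arbitrary choice.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 61.5 -> '00:01:01,500'."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def transcribe(path: str) -> str:
    # Heavy: downloads the model on first run; GPU strongly recommended.
    import whisper
    model = whisper.load_model("small")
    result = model.transcribe(path)
    for seg in result["segments"]:
        print(srt_timestamp(seg["start"]), seg["text"])
    return result["text"]
```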
Tinkerer · CPU or GPU · MIT

whisper.cpp: C/C++ port of Whisper. Runs on Apple Silicon, Raspberry Pi, anywhere — no Python.
Embeds transcription into desktop and mobile apps. The pick when Python isn't an option.
Tinkerer · CPU (incl. Apple Silicon) · MIT

yt-dlp: downloads video and audio from YouTube, TikTok, Instagram, Twitter, and ~1,000 other sites.
First step of every 'make AI summarize this video' pipeline. Pair with Whisper for an LLM-ready transcript from any URL.
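The URL-to-transcript pipeline reduces to two CLI calls. A sketch that builds them as argument lists (the flags shown are standard yt-dlp and openai-whisper options, but verify against each tool's `--help`):

```python
def pipeline_commands(url: str, audio_file: str = "audio.mp3") -> list:
    """Build the two commands: extract audio from the URL, then transcribe it."""
    download = ["yt-dlp", "-x", "--audio-format", "mp3", "-o", audio_file, url]
    transcribe = ["whisper", audio_file, "--model", "small", "--output_format", "txt"]
    return [download, transcribe]

# Run with e.g.:
#   import subprocess
#   for cmd in pipeline_commands("https://youtu.be/abc123"):
#       subprocess.run(cmd, check=True)
```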
Tinkerer · CPU · Unlicense

100% local transcription + subtitling suite with a web UI. Powered by Faster-Whisper.
Self-hosted alternative to Otter.ai or Rev. Pulls from any yt-dlp-supported URL, edits subtitles, translates via LibreTranslate. Offline.
Non-coder · CPU or GPU · AGPL-3.0

Transcribe and summarize videos through any OpenAI-compatible API.
Subtitle-first architecture — uses native YouTube subtitles when present (instant) and falls back to Whisper. Much faster than naive pipelines.
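The fallback logic is simple enough to sketch in a few lines. Here `fetch_native_subtitles` and `whisper_transcribe` are hypothetical stand-ins for whatever subtitle fetcher (e.g. yt-dlp's `--write-subs`) and Whisper wrapper you use:

```python
def get_transcript(url, fetch_native_subtitles, whisper_transcribe):
    """Subtitle-first: use platform subtitles when present, else pay for inference."""
    text = fetch_native_subtitles(url)  # returns None when no subtitles exist
    if text:
        return text, "native"
    return whisper_transcribe(url), "whisper"
```

The speed win comes entirely from the first branch: fetching existing subtitles is a metadata request, not a GPU job.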
Tinkerer · CPU or GPU · MIT
End-to-end AI pipelines
AI agents that handle the full loop: script → voiceover → visuals → assemble → upload.
Edit videos by chatting with Claude Code. Drop raw footage in a folder, get a finished mp4.
Cuts filler words and dead space, auto color-grades, burns subtitles, generates animation overlays via HyperFrames / Remotion / Manim, and self-evaluates each cut for jumps and audio pops. Agent-native end-to-end editing — no presets, no menus.
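One of those steps, cutting filler words and dead air, can be sketched as a pure function over word-level transcript timings. The word-dict shape and thresholds here are illustrative, not the tool's actual internals:

```python
FILLERS = {"um", "uh", "like", "you know"}  # illustrative filler list

def cut_list(words, max_gap=0.75):
    """Return (start, end) spans to keep, skipping fillers and long silences.

    words: dicts shaped like {"word": str, "start": float, "end": float},
    the common shape of word-level ASR output.
    """
    keeps = []
    for w in words:
        if w["word"].lower().strip(".,") in FILLERS:
            continue  # drop the filler word entirely
        if keeps and w["start"] - keeps[-1][1] <= max_gap:
            keeps[-1] = (keeps[-1][0], w["end"])  # extend the current span
        else:
            keeps.append((w["start"], w["end"]))  # gap too long: start a new span
    return keeps
```

A downstream assembler would then render only the kept spans, e.g. as ffmpeg trim filters.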
Tinkerer · CPU + ElevenLabs API · Open — see repo

Open-source desktop app that turns novels and scripts into animated short dramas.
Integrates AI scriptwriting, storyboarding, character generation, and video gen into one cross-platform GUI. Aimed at indie creators batching content cheaply.
Non-coder · Desktop (CPU/GPU) · Open — see repo
Avatars & talking heads
Lip-sync, dubbing, AI presenters. Powers most multilingual-video and faceless-channel pipelines.
Wav2Lip: lip-sync any audio to any face video.
Foundation tech behind half the AI dubbing startups. Pair with Whisper + GPT translation + ElevenLabs voice cloning for full multilingual pipelines.
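That multilingual chain can be sketched as a small pipeline. The `transcribe`, `translate`, and `synthesize` callables are hypothetical stand-ins for your ASR, LLM, and TTS of choice; the final command uses Wav2Lip's documented `inference.py` flags:

```python
def dub(video, transcribe, translate, synthesize, target_lang="es"):
    """Transcribe -> translate -> synthesize, then return the Wav2Lip command."""
    text = transcribe(video)                   # e.g. Whisper
    translated = translate(text, target_lang)  # e.g. an LLM call
    audio = synthesize(translated)             # e.g. ElevenLabs; returns a wav path
    return [
        "python", "inference.py",
        "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
        "--face", video,
        "--audio", audio,
    ]
```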
Tinkerer · GPU · Research / non-commercial

Modern lip-sync that handles head pose and lighting better than Wav2Lip.
The drop-in upgrade when Wav2Lip outputs look too obviously synced. Better quality on natural-talking footage.
Tinkerer · GPU · Open — see repo
Honorable mentions
Not glamorous, but every serious video pipeline depends on at least one of these.
AnimateDiff: adds motion to Stable Diffusion outputs.
Plugs into existing SD workflows — your LoRAs and checkpoints still work. Cheapest way to get animation out of an image-gen stack you already run.
Tinkerer · 8–16 GB VRAM · Apache 2.0

Stable Video Diffusion: image-to-video from Stability AI. Older but still in active use.
When you need a known-quantity image-to-video baseline. Mature ComfyUI integrations.
Tinkerer · 16 GB VRAM · Stability Community License
Hardware cheat sheet
VRAM is the gating factor for almost every model in this catalog. Match your tier to the picks below.
| Tier | VRAM | Best models |
|---|---|---|
| Entry | 8–12 GB | Wan 2.1 (1.3B), LTX-Video, AnimateDiff |
| Mid | 16–24 GB | LTX-Video, CogVideoX-5B, HunyuanVideo (quantized), Mochi 1 |
| High | 24 GB+ | Wan 2.2 (14B), HunyuanVideo (full), SkyReels V1 |
| Cloud | 40–80 GB | Mochi 1 (full), Open-Sora-Plan, HunyuanVideo (uncompressed) |
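For scripting, the table folds into a small lookup. The names and thresholds below are taken straight from the rows above:

```python
# (VRAM floor in GB, tier name, recommended models) -- mirrors the cheat sheet.
TIERS = [
    (8,  "Entry", ["Wan 2.1 (1.3B)", "LTX-Video", "AnimateDiff"]),
    (16, "Mid",   ["LTX-Video", "CogVideoX-5B", "HunyuanVideo (quantized)", "Mochi 1"]),
    (24, "High",  ["Wan 2.2 (14B)", "HunyuanVideo (full)", "SkyReels V1"]),
    (40, "Cloud", ["Mochi 1 (full)", "Open-Sora-Plan", "HunyuanVideo (uncompressed)"]),
]

def recommend(vram_gb: float):
    """Return (tier, models) for the highest tier whose floor fits, else None."""
    best = None
    for floor, tier, models in TIERS:
        if vram_gb >= floor:
            best = (tier, models)
    return best  # None below 8 GB: nothing in this catalog will fit comfortably
```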
About this list
Stars, licenses, and VRAM numbers move fast — verify the repo before betting a pipeline on it. Found something missing? Tell us and we'll add it.