Video AI Repos, 2026 (Prompt Library)
The open-source GitHub stack powering AI video work right now. Pick how much code you want to write — we hide anything above your level so the catalog stays useful instead of overwhelming. Every repo ships with a starter Claude Code prompt you can copy and run.
How comfortable are you with code? This catalog is filtered for the Tinkerer tier: you can clone a repo, follow a README, run a workflow, and paste commands.
Video generation models
Open-weight foundation models you can run locally or self-host. The closed-source equivalents are Sora 2, Veo 3, Runway, Kling.
Wan 2.1: Alibaba's Apache-licensed text-to-video. The 1.3B variant runs on a mid-range gaming GPU.
Tiered model sizes (1.3B → 14B). The 1.3B fits in 8 GB VRAM and is the gentlest entry to local video gen. Multilingual, supports image-to-video and video-to-audio.
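As a starting point, here is a minimal sketch of driving the 1.3B checkpoint from Python. The `WanPipeline` class, the `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` repo id, and the "frame counts are 4k + 1" rule (a consequence of the temporal VAE's 4x compression) are assumptions based on recent Diffusers releases; verify them against the model card before building on this.

```python
def nearest_valid_frames(seconds: float, fps: int = 16) -> int:
    """Snap a duration to the nearest 4k + 1 frame count (assumed Wan constraint)."""
    n = round(seconds * fps)
    k = round((n - 1) / 4)
    return max(1, 4 * k + 1)

def generate(prompt: str, out_path: str = "out.mp4") -> None:
    # Heavy: downloads gigabytes of weights and needs a CUDA GPU,
    # so nothing here runs at import time.
    import torch
    from diffusers import WanPipeline
    from diffusers.utils import export_to_video

    pipe = WanPipeline.from_pretrained(
        "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # keeps the 1.3B variant inside ~8 GB VRAM
    frames = pipe(prompt=prompt, num_frames=nearest_valid_frames(5.0)).frames[0]
    export_to_video(frames, out_path, fps=16)
```

At 16 fps, a five-second request snaps to the familiar 81-frame default.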
Tinkerer · 8 GB VRAM (1.3B) · Apache 2.0

Wan 2.2: Mixture-of-Experts upgrade — separate high-noise and low-noise experts handle layout vs. detail.
Currently the gold standard for open-source quality. Worth the bigger VRAM ask if you have a 24 GB+ card.
Tinkerer · 24 GB+ VRAM · Apache 2.0

HunyuanVideo: Tencent's 13B unified text-to-video, image-to-video, and video-to-video model.
Best-in-class temporal consistency past 10 seconds. Strong character/face stability and prompt-driven camera moves. Quantized v1.5 with offload runs on 14 GB.
Tinkerer · 14 GB VRAM (quantized) · Tencent Hunyuan Community

LTX-Video: Lightricks' speed-tier model. Multiscale rendering — a fast low-res motion pass, then a refinement pass.
Real-time-ish generation. 4-second clips at 720p in under 30 seconds on a 4090. The pick when you want to iterate fast.
Tinkerer · 8 GB VRAM · Lightricks LTX-Video License

Mochi 1: 10B Asymmetric Diffusion Transformer with a custom VAE that compresses video 128x.
Best-in-class motion realism — water, fabric, jitterless camera. Apache 2.0, so safe for commercial pipelines.
Tinkerer · 16–24 GB VRAM · Apache 2.0

CogVideoX-5B: 5B model from Zhipu AI focused on prompt adherence and reproducibility.
The most researcher-friendly option. Solid Diffusers support, quantizes cleanly to 8-bit. Strong for cinematic story-driven content.
Tinkerer · 16 GB VRAM · Apache 2.0

SkyReels V1: HunyuanVideo fine-tuned on 10M+ film and TV clips for human-centric scenes.
If your output needs realistic people, this is the pick. Cinematic-quality faces and motion.
Tinkerer · 24 GB+ VRAM · Open — see repo
Workflow engines
The orchestration layer. Most AI video work happens inside one of these — they wire models, audio, and post-processing together.
ComfyUI: node-based GUI and backend for diffusion models. The de facto orchestration layer.
Every new video model — Wan, Hunyuan, LTX, Mochi — ships with a ComfyUI workflow file as standard. Used by Netflix, Apple, Ubisoft. Raised $30M at $500M valuation in April 2026.
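ComfyUI also runs headless: the UI is a client of a small HTTP API, and a workflow exported in API format can be queued with a POST to `/prompt`. A sketch, assuming a local instance on the default port 8188 and a `workflow.json` saved via the UI's API-format export:

```python
import json
import urllib.request

def build_payload(workflow: dict, client_id: str = "catalog-demo") -> dict:
    """Wrap an API-format workflow dict the way /prompt expects it."""
    return {"prompt": workflow, "client_id": client_id}

def queue_workflow(path: str, host: str = "127.0.0.1:8188") -> dict:
    # Network call: only works against a running ComfyUI instance.
    with open(path) as f:
        workflow = json.load(f)
    body = json.dumps(build_payload(workflow)).encode()
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # response includes the queued prompt_id
```

This is what agent tooling typically does under the hood: generate or mutate the workflow JSON, then queue it.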
Tinkerer · Any GPU · GPL-3.0

HeyGen's open-source HTML-to-video framework. Compose videos by writing HTML, CSS, and JS — no React, no DSL.
Explicitly agent-first. LLMs already know HTML inside-out, which means Claude Code and Cursor write compositions natively. Deterministic renders, swappable animation runtimes (GSAP, Lottie, Three.js). Inspired by Remotion but lighter to author.
Tinkerer · CPU / browser · Apache 2.0
Transcription & video understanding
Pull existing video into text the LLMs can chew on. Foundation of every video-to-LLM pipeline.
Whisper: OpenAI's reference speech-to-text. The default for serious transcription.
Foundation of nearly every video-to-LLM pipeline. Outperforms human transcribers in most conditions except heavy background noise.
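A sketch of the Python API (`pip install -U openai-whisper`), plus a small pure helper for turning `result["segments"]` timings into SRT timestamps. The segment shape matches openai-whisper's documented output; the model size is an arbitrary choice.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 61.5 -> '00:01:01,500'."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def transcribe(path: str) -> str:
    # Heavy: downloads the model on first run; GPU strongly recommended.
    import whisper
    model = whisper.load_model("small")
    result = model.transcribe(path)
    for seg in result["segments"]:
        print(srt_timestamp(seg["start"]), seg["text"])
    return result["text"]
```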
Tinkerer · CPU or GPU · MIT

whisper.cpp: C/C++ port of Whisper. Runs on Apple Silicon, Raspberry Pi, anywhere — no Python.
Embeds transcription into desktop and mobile apps. The pick when Python isn't an option.
Tinkerer · CPU (incl. Apple Silicon) · MIT

yt-dlp: downloads video and audio from YouTube, TikTok, Instagram, Twitter, and ~1,000 other sites.
First step of every 'make AI summarize this video' pipeline. Pair with Whisper for an LLM-ready transcript from any URL.
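The URL-to-transcript pipeline reduces to two CLI calls. A sketch that builds them as argument lists (the flags shown are standard yt-dlp and openai-whisper options, but verify against each tool's `--help`):

```python
def pipeline_commands(url: str, audio_file: str = "audio.mp3") -> list:
    """Build the two commands: extract audio from the URL, then transcribe it."""
    download = ["yt-dlp", "-x", "--audio-format", "mp3", "-o", audio_file, url]
    transcribe = ["whisper", audio_file, "--model", "small", "--output_format", "txt"]
    return [download, transcribe]

# Run with e.g.:
#   import subprocess
#   for cmd in pipeline_commands("https://youtu.be/abc123"):
#       subprocess.run(cmd, check=True)
```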
Tinkerer · CPU · Unlicense

100% local transcription + subtitling suite with a web UI. Powered by Faster-Whisper.
Self-hosted alternative to Otter.ai or Rev. Pulls from any yt-dlp-supported URL, edits subtitles, translates via LibreTranslate. Offline.
Non-coder · CPU or GPU · AGPL-3.0

Transcribe and summarize videos through any OpenAI-compatible API.
Subtitle-first architecture — uses native YouTube subtitles when present (instant) and falls back to Whisper. Much faster than naive pipelines.
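The fallback logic is simple enough to sketch in a few lines. Here `fetch_native_subtitles` and `whisper_transcribe` are hypothetical stand-ins for whatever subtitle fetcher (e.g. yt-dlp's `--write-subs`) and Whisper wrapper you use:

```python
def get_transcript(url, fetch_native_subtitles, whisper_transcribe):
    """Subtitle-first: use platform subtitles when present, else pay for inference."""
    text = fetch_native_subtitles(url)  # returns None when no subtitles exist
    if text:
        return text, "native"
    return whisper_transcribe(url), "whisper"
```

The speed win comes entirely from the first branch: fetching existing subtitles is a metadata request, not a GPU job.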
Tinkerer · CPU or GPU · MIT
End-to-end AI pipelines
AI agents that handle the full loop: script → voiceover → visuals → assemble → upload.
Edit videos by chatting with Claude Code. Drop raw footage in a folder, get a finished mp4.
Cuts filler words and dead space, auto color-grades, burns subtitles, generates animation overlays via HyperFrames / Remotion / Manim, and self-evaluates each cut for jumps and audio pops. Agent-native end-to-end editing — no presets, no menus.
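One of those steps, cutting filler words and dead air, can be sketched as a pure function over word-level transcript timings. The word-dict shape and thresholds here are illustrative, not the tool's actual internals:

```python
FILLERS = {"um", "uh", "like", "you know"}  # illustrative filler list

def cut_list(words, max_gap=0.75):
    """Return (start, end) spans to keep, skipping fillers and long silences.

    words: dicts shaped like {"word": str, "start": float, "end": float},
    the common shape of word-level ASR output.
    """
    keeps = []
    for w in words:
        if w["word"].lower().strip(".,") in FILLERS:
            continue  # drop the filler word entirely
        if keeps and w["start"] - keeps[-1][1] <= max_gap:
            keeps[-1] = (keeps[-1][0], w["end"])  # extend the current span
        else:
            keeps.append((w["start"], w["end"]))  # gap too long: start a new span
    return keeps
```

A downstream assembler would then render only the kept spans, e.g. as ffmpeg trim filters.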
Tinkerer · CPU + ElevenLabs API · Open — see repo

Open-source desktop app that turns novels and scripts into animated short dramas.
Integrates AI scriptwriting, storyboarding, character generation, and video gen into one cross-platform GUI. Aimed at indie creators batching content cheaply.
Non-coder · Desktop (CPU/GPU) · Open — see repo
Avatars & talking heads
Lip-sync, dubbing, AI presenters. Powers most multilingual-video and faceless-channel pipelines.
Wav2Lip: lip-sync any audio to any face video.
Foundation tech behind half the AI dubbing startups. Pair with Whisper + GPT translation + ElevenLabs voice cloning for full multilingual pipelines.
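That multilingual chain can be sketched as a small pipeline. The `transcribe`, `translate`, and `synthesize` callables are hypothetical stand-ins for your ASR, LLM, and TTS of choice; the final command uses Wav2Lip's documented `inference.py` flags:

```python
def dub(video, transcribe, translate, synthesize, target_lang="es"):
    """Transcribe -> translate -> synthesize, then return the Wav2Lip command."""
    text = transcribe(video)                   # e.g. Whisper
    translated = translate(text, target_lang)  # e.g. an LLM call
    audio = synthesize(translated)             # e.g. ElevenLabs; returns a wav path
    return [
        "python", "inference.py",
        "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
        "--face", video,
        "--audio", audio,
    ]
```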
Tinkerer · GPU · Research / non-commercial

Modern lip-sync that handles head pose and lighting better than Wav2Lip.
The drop-in upgrade when Wav2Lip outputs look too obviously synced. Better quality on natural-talking footage.
Tinkerer · GPU · Open — see repo
Honorable mentions
Not glamorous, but every serious video pipeline depends on at least one of these.
AnimateDiff: adds motion to Stable Diffusion outputs.
Plugs into existing SD workflows — your LoRAs and checkpoints still work. Cheapest way to get animation out of an image-gen stack you already run.
Tinkerer · 8–16 GB VRAM · Apache 2.0

Stable Video Diffusion: image-to-video from Stability AI. Older but still in active use.
When you need a known-quantity image-to-video baseline. Mature ComfyUI integrations.
Tinkerer · 16 GB VRAM · Stability Community License
Hardware cheat sheet
VRAM is the gating factor for almost every model in this catalog. Match your tier to the picks below.
| Tier | VRAM | Best models |
|---|---|---|
| Entry | 8–12 GB | Wan 2.1 (1.3B), LTX-Video, AnimateDiff |
| Mid | 16–24 GB | LTX-Video, CogVideoX-5B, HunyuanVideo (quantized), Mochi 1 |
| High | 24 GB+ | Wan 2.2 (14B), HunyuanVideo (full), SkyReels V1 |
| Cloud | 40–80 GB | Mochi 1 (full), Open-Sora-Plan, HunyuanVideo (uncompressed) |
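For scripting, the table folds into a small lookup. The names and thresholds below are taken straight from the rows above:

```python
# (VRAM floor in GB, tier name, recommended models) -- mirrors the cheat sheet.
TIERS = [
    (8,  "Entry", ["Wan 2.1 (1.3B)", "LTX-Video", "AnimateDiff"]),
    (16, "Mid",   ["LTX-Video", "CogVideoX-5B", "HunyuanVideo (quantized)", "Mochi 1"]),
    (24, "High",  ["Wan 2.2 (14B)", "HunyuanVideo (full)", "SkyReels V1"]),
    (40, "Cloud", ["Mochi 1 (full)", "Open-Sora-Plan", "HunyuanVideo (uncompressed)"]),
]

def recommend(vram_gb: float):
    """Return (tier, models) for the highest tier whose floor fits, else None."""
    best = None
    for floor, tier, models in TIERS:
        if vram_gb >= floor:
            best = (tier, models)
    return best  # None below 8 GB: nothing in this catalog will fit comfortably
```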
About this list
Stars, licenses, and VRAM numbers move fast — verify the repo before betting a pipeline on it. Found something missing? Tell us and we'll add it.