A new Nous Research skill turns ordinary HTML into rendered MP4, and the first thing I shipped with it is the post you're reading.
HyperFrames is a new optional skill in NousResearch/hermes-agent that captures HTML/CSS/GSAP compositions to MP4. Authors design in HTML, define a timeline, and the framework hands back a polished video. I installed it, defined a visual identity, wrote a thirty-second narration, generated a few AI images for backdrop, and rendered the summary video at the top of this page. Total wall-clock time on a fresh project: under fifteen minutes.
HyperFrames lives at NousResearch/hermes-agent/optional-skills/creative/hyperframes on GitHub. The pitch is genuinely cool: instead of fighting After Effects timelines or wrestling with Remotion's React-only world, you author videos as plain HTML documents. CSS handles layout. GSAP handles animation. A small set of conventions (data-start, data-duration, data-track-index, a registered window.__timelines entry) tells the renderer what to capture.
Then a CLI walks the timeline frame-by-frame in headless Chrome and stitches the result with FFmpeg. Output: standard H.264/AAC MP4. Plays everywhere.
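To make "walks the timeline frame-by-frame" concrete, here is a minimal sketch of the sampling arithmetic only, not the HyperFrames renderer. The name frameTimes and the 30 fps default are my assumptions; the real frame rate and capture mechanism are whatever the CLI uses.

```javascript
// Illustrative only: lists the timestamps a frame-by-frame capture would visit.
// `frameTimes` and the 30 fps default are hypothetical, not HyperFrames API.
function frameTimes(durationSeconds, fps = 30) {
  if (!(durationSeconds > 0) || !(fps > 0)) {
    throw new RangeError("duration and fps must be positive");
  }
  const frameCount = Math.round(durationSeconds * fps);
  const times = [];
  for (let i = 0; i < frameCount; i++) {
    // Derive each time from the frame index instead of accumulating 1/fps,
    // so rounding error does not drift over a long timeline.
    times.push(i / fps);
  }
  return times;
}

const times = frameTimes(30); // a thirty-second composition
console.log(times.length);    // 900 frames at 30 fps
```

A paused GSAP timeline can be moved to any of these times with its seek() method, which is presumably why compositions register their timelines paused: the renderer, not the clock, decides which moment gets drawn.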
What makes it interesting for an AI-agent workflow:
No Math.random(), no Date.now(), no network fetches in compositions: the framework lints for them. Every render is reproducible.
npm run check catches GSAP/CSS transform conflicts, missing classes, contrast issues, and layout overflows before you waste minutes on a render.
The skill is in the optional-skills/ tree, so it isn't loaded by default. The whole install on the VPS was three commands and a Node version check.
# Make sure Node 20+ is on PATH
node --version # I'm on v22.22.2
# Pull and add the skill (it ships with hermes-agent)
cd ~/.hermes/hermes-agent
git pull
# Add audio prerequisites for local TTS + transcription
pip install kokoro-onnx soundfile
The CLI itself runs via npx hyperframes, so no global install is needed. From a project directory you get npm run dev, npm run check, npm run render, and npm run publish. Everything else is just authoring HTML.
One command scaffolds a project:
mkdir -p ~/hyperframes-projects/hyperframes-blog-summary
cd ~/hyperframes-projects
npx hyperframes init hyperframes-blog-summary --non-interactive --example product-promo
That gave me a working index.html root composition, a compositions/ directory for sub-comps, an assets/ folder, and a meta.json. The example was fine to study, and easy to delete once I had my own scenes drafted.
Every project follows the same shape:
A DESIGN.md that locks the palette, fonts, and tone. The agent reads this before drafting compositions, which keeps brand drift out of the output.
SCRIPT.md: narration text in plain English, sized to the runtime you want. ~120 words gives you a comfortable thirty-second video.
Narration via edge-tts with en-US-AndrewNeural, for voice consistency with the existing book reviews on the site. The script writes both an MP3 and an SRT subtitle file in one pass.
Scenes as <div class="scene clip"> elements with data-start and data-duration. A single GSAP timeline orchestrates everything, paused, registered on window.__timelines under the composition's id.
npm run check validates structure, runs the page in headless Chrome, samples nine timeline points for layout/contrast issues, and flags any timing math that doesn't add up.
npm run render -- --quality high emits a 1080p MP4 in the renders/ folder.
Image dimensions. Venice's image API caps width at 1280, so requesting 1920×1080 returns a 400. Generate at 1280×720 (or 1024 square) and let CSS background-size: cover fill the frame.
Composition root. The root element needs id="root", data-composition-id, data-width, data-height, and data-start="0" with data-duration. Missing the duration attribute makes the inspector blow up with a cryptic totalDuration error.
Timeline key must match composition id. If your composition is data-composition-id="my-promo", your timeline must register as window.__timelines["my-promo"], not "root". The lint catches this; trust it.
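The three gotchas above can be sketched as small pre-flight checks. This is an illustration under stated assumptions, not the project's linter: fitToMaxWidth, checkRoot, and REQUIRED_ROOT_ATTRS are hypothetical names, the 1280 cap and the attribute list are taken from the notes above, and the attributes are passed in as a plain object so the sketch runs outside a browser.

```javascript
// Illustrative pre-flight checks for the three gotchas above.
// All names here are hypothetical; none of this is HyperFrames API.

// Gotcha 1: scale a requested size down so its width fits the cap.
function fitToMaxWidth(width, height, maxWidth = 1280) {
  if (width <= maxWidth) return { width, height };
  const scale = maxWidth / width;
  return { width: maxWidth, height: Math.round(height * scale) };
}

// Gotcha 2: attributes the root element needs, as listed in the post.
const REQUIRED_ROOT_ATTRS = [
  "id",
  "data-composition-id",
  "data-width",
  "data-height",
  "data-start",
  "data-duration",
];

// `attrs` maps attribute names to string values; `timelineKeys` stands in
// for Object.keys(window.__timelines) in the page.
function checkRoot(attrs, timelineKeys) {
  const problems = [];
  for (const name of REQUIRED_ROOT_ATTRS) {
    if (!(name in attrs)) problems.push(`missing ${name}`);
  }
  if ("id" in attrs && attrs.id !== "root") {
    problems.push('id must be "root"');
  }
  if ("data-start" in attrs && attrs["data-start"] !== "0") {
    problems.push('data-start must be "0"');
  }
  // Gotcha 3: the timeline must be registered under the composition id.
  const compositionId = attrs["data-composition-id"];
  if (compositionId !== undefined && !timelineKeys.includes(compositionId)) {
    problems.push(`no timeline registered under "${compositionId}"`);
  }
  return problems;
}

console.log(fitToMaxWidth(1920, 1080)); // { width: 1280, height: 720 }
console.log(
  checkRoot(
    {
      id: "root",
      "data-composition-id": "my-promo",
      "data-width": "1920",
      "data-height": "1080",
      "data-start": "0",
    },
    ["root"]
  )
);
```

The second call reports two problems: the missing data-duration attribute, and a timeline registered under "root" instead of "my-promo". The real npm run check covers far more than this; the sketch only shows how mechanical these particular mistakes are.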
The first thing I rendered was a thirty-second promo for my book review of The Four Agreements. The whole pipeline (DESIGN.md, SCRIPT.md, TTS, captions, animation, render, deploy) took one focused session. The result lives on a dedicated page where you can watch the promo with all the site styling and a link back to the full review.
That gave me enough confidence to use it for this very blog post. The video at the top of this article is itself a HyperFrames render: same pipeline, same Andrew Neural voice, same riverside palette. The composition source lives in ~/hyperframes-projects/hyperframes-blog-summary/index.html, the AI backdrop images came from the venice-ai-media skill, and the whole thing rendered in a couple of minutes at --quality high.
Most AI-built websites are walls of text with the occasional generated image. That's fine, but it's also flat. HyperFrames is the first tool I've used where an agent can author a real video as code (reviewable, versionable, lintable) without ever opening a non-deterministic editor. The output isn't a slideshow with Ken Burns transitions; it's actual motion graphics, with synced narration and captions, that an agent can iterate on the way it iterates on a CSS file.
Two practical things I'm now planning:
It's not magical. A few things to keep in mind:
While iterating, --quality draft is the right move; only switch to high when you're happy.
edge-tts emits SRT phrases, not per-word timestamps. If you want karaoke-style captions, route the audio through Whisper to get a word-level transcript.json first.
None of those are dealbreakers. They're the normal trade-offs you'd expect from a tool that lets you author videos as HTML, and they're well documented in the skill itself.
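On the caption limitation mentioned above, a short sketch shows what phrase-level SRT actually gives you. parseSrt and srtTimeToSeconds are hypothetical helpers, not part of edge-tts or HyperFrames, and the sample cues are made up for illustration; how captions get wired into a composition is not covered here.

```javascript
// Illustrative only: parse SRT text into phrase-level cues.
// `parseSrt` and `srtTimeToSeconds` are hypothetical helpers.

// SRT timestamps look like HH:MM:SS,mmm
function srtTimeToSeconds(stamp) {
  const match = /^(\d{2,}):(\d{2}):(\d{2}),(\d{3})$/.exec(stamp.trim());
  if (!match) throw new SyntaxError(`bad SRT timestamp: ${stamp}`);
  const [, h, m, s, ms] = match.map(Number);
  return h * 3600 + m * 60 + s + ms / 1000;
}

function parseSrt(text) {
  return text
    .replace(/\r\n/g, "\n")
    .trim()
    .split(/\n{2,}/)
    .map((block) => {
      const lines = block.split("\n");
      // lines[0] is the cue number, lines[1] the timing, the rest is the phrase.
      const [startStamp, endStamp] = lines[1].split(" --> ");
      const start = srtTimeToSeconds(startStamp);
      const end = srtTimeToSeconds(endStamp);
      return { start, duration: end - start, text: lines.slice(2).join(" ") };
    });
}

// Made-up sample cues, not output from a real run.
const sample = `1
00:00:00,100 --> 00:00:02,600
HyperFrames turns HTML into video.

2
00:00:02,600 --> 00:00:05,000
Every render is reproducible.`;

const cues = parseSrt(sample);
console.log(cues.length); // 2
```

Each cue carries one start and one duration for a whole phrase. That is enough to show and hide a caption line, but there is nothing inside a cue that says when an individual word is spoken, which is why word-by-word highlighting needs a separate word-level transcript.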
If you have a Hermes setup already, the skill is one git pull away. If you don't, the GitHub page has the source: github.com/NousResearch/hermes-agent. Worth the half-hour to bootstrap a project and render your first thing.
I'll be using it a lot.