The latest Seedance 2 model is incredibly powerful: you can feed it a reference image along with detailed descriptions of beat timings and dance moves, and it generates high-quality shots with a director’s sense of framing. I hardly had to do any rerolls, which is impressive given the length of the song.
Each generated segment can be up to 15 seconds long, but I made a silly mistake! It turns out the "full reference" feature supports all media formats: I could have input the music along with the visuals and generated the lip-sync in one go… Instead, I overcomplicated things and had to sync the lip movements manually afterward. Still, I’m pretty happy with how it turned out.
To clarify, I didn’t use any real human dance footage as reference for this video; everything was generated and then edited together. Each segment of my video is based on a prompt that generally includes the following elements (a sketch of one such prompt follows the list):

1. Overall atmosphere description
2. Key actions
3. Scene description: starting pose, mid-sequence body/hand movements over time, and ending pose
4. Dialogue/lyrics/sound effects at specific timestamps
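Purely for illustration (the scene, moves, and timestamps below are invented, not taken from the actual project), a prompt following that structure might look like:

```
Atmosphere: neon-lit rooftop at night, playful and energetic, shallow depth of field.
Key actions: a sharp four-count arm wave landing on the chorus, ending in a freeze.
Scene: starts facing away from camera in a relaxed stance; around the 2 s mark,
spins to face the camera and begins the arm wave, hands tracing a figure-eight;
ends frozen in a low pose with one hand raised.
Audio: at 0:03.5 the chorus lyric lands; a finger-snap sound effect on the final freeze.
```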
Seedance 2 automatically designs camera angles based on the content, though you can also specify camera movements precisely. In the raw clip below, I didn’t describe any camera angles at all. After generating the clips, I edited them: adding lip-sync, syncing them with the music, and adjusting the speed of some segments to match the beat.
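The speed adjustment is just arithmetic. Here’s a minimal sketch in Python, assuming you know the track’s BPM and how many beats a clip should span (the numbers are made up; the resulting factor works with an editor’s speed control or ffmpeg’s setpts filter):

```python
# Minimal sketch of the beat-matching arithmetic. Assumes a known BPM
# and a target beat count per clip; all numbers below are illustrative.

def speed_factor(clip_seconds: float, bpm: float, beats_to_fill: int) -> float:
    """Factor to play a clip faster (>1) or slower (<1) so it spans
    an exact number of beats."""
    beat_interval = 60.0 / bpm                  # seconds per beat
    target_seconds = beats_to_fill * beat_interval
    return clip_seconds / target_seconds

# e.g. an 8.2 s clip that should cover exactly 16 beats of a 120 BPM track:
factor = speed_factor(8.2, 120.0, 16)   # 8.2 / 8.0 = 1.025 → speed up ~2.5%
print(f"setpts=PTS/{factor:.3f}")       # usable as an ffmpeg video filter
```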
This was a force-of-habit mistake I made while working on this video. Initially, I followed the traditional video-model workflow: first generate reference images, then describe the actions, and so on. However, Seedance supports up to 9 images, 3 video clips, and 3 audio clips as reference material simultaneously for each generated segment.
This multimodal reference capability is quite rare among current AI video tools. In theory, I could have directly provided the model with edited music or voice clips along with reference images for generation. But for this project, I generated the clips first and then re-generated them to add lip-sync.
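To visualize that reference budget (purely a hypothetical sketch; these are not Seedance 2’s real API or parameter names):

```python
# Hypothetical sketch only: NOT Seedance 2's real API or field names,
# just a way to picture the per-segment reference budget described above
# (up to 9 images, 3 video clips, and 3 audio clips alongside the prompt).
from dataclasses import dataclass, field

@dataclass
class SegmentRequest:
    prompt: str
    images: list[str] = field(default_factory=list)  # up to 9 reference images
    videos: list[str] = field(default_factory=list)  # up to 3 reference video clips
    audio: list[str] = field(default_factory=list)   # up to 3 reference audio clips

    def validate(self) -> None:
        assert len(self.images) <= 9, "at most 9 reference images"
        assert len(self.videos) <= 3, "at most 3 reference video clips"
        assert len(self.audio) <= 3, "at most 3 reference audio clips"

# The one-shot approach described above: visuals plus the edited music cut,
# so lip-sync could come out of the same generation.
req = SegmentRequest(
    prompt="chorus segment: beat-timed arm wave, ending freeze",
    images=["character_ref.png", "outfit_ref.png"],
    audio=["chorus_cut.mp3"],
)
req.validate()
```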