Katsuhiro Otomo Style HunyuanVideo v0.3 (ID: 1285869)

Description

This LoRA is fine-tuned on screencaps from Katsuhiro Otomo's animated movies Akira and Steamboy. I haven't yet found a way to train a style on videos that delivers acceptable results (currently, I get better results with images than with videos), and the best parameters for training HV are not yet fully understood. However, this is not the last in my series of anime LoRAs for HunyuanVideo, so I hope to eventually determine the best fine-tuning parameters, as I'm confident that training with clips is superior to training with images.

As for this LoRA, while I really liked how it turned out, it's not perfect. I will try to resolve some problems in the next version soon enough (and it will probably involve video data). Katsuhiro Otomo deserves a better LoRA.

Usage

Tested using the default ComfyUI workflow with an added LoraLoaderModelOnly node. It should probably work with Kijai's wrapper, but I don't know for sure, since I don't use it. I use:

guidance: 7.0

steps: 30
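If you drive ComfyUI through its API instead of the UI, the setup above can be sketched as a patch on an API-format workflow dict. This is a minimal sketch, not part of the original workflow: it assumes node ids are numeric strings, links are `[node_id, output_index]` lists, the LoRA node is `LoraLoaderModelOnly` with inputs `model`, `lora_name` and `strength_model`, and that the step count and guidance live in whichever nodes expose `steps` and `guidance` inputs. The LoRA filename in the usage note is hypothetical.

```python
import copy


def add_lora_and_settings(workflow, model_node_id, lora_name,
                          strength=1.0, steps=30, guidance=7.0):
    """Return a patched copy of an API-format ComfyUI workflow.

    Assumptions (not from the original card): ids are numeric strings,
    links are [node_id, output_index], and sampling settings sit in
    nodes whose inputs contain "steps" / "guidance" keys.
    """
    wf = copy.deepcopy(workflow)  # leave the caller's workflow untouched
    lora_id = str(max(int(k) for k in wf) + 1)

    # Re-point every consumer of the model output at the future LoRA node.
    # Done before the LoRA node exists, so its own input is not rewired.
    for node in wf.values():
        for key, value in node["inputs"].items():
            if value == [model_node_id, 0]:
                node["inputs"][key] = [lora_id, 0]

    # Insert the LoRA loader between the model loader and its consumers.
    wf[lora_id] = {
        "class_type": "LoraLoaderModelOnly",
        "inputs": {
            "model": [model_node_id, 0],
            "lora_name": lora_name,
            "strength_model": strength,
        },
    }

    # Apply the sampling settings from the card: guidance 7.0, 30 steps.
    for node in wf.values():
        if "steps" in node["inputs"]:
            node["inputs"]["steps"] = steps
        if "guidance" in node["inputs"]:
            node["inputs"]["guidance"] = guidance
    return wf
```

Deep-copying keeps the original workflow reusable, so you can compare runs with and without the LoRA from the same base dict.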

Also, I changed temporal_size to 76 because of this.

Clips in the showcase were generated at 640x480 resolution with 73 frames (each clip took approximately 4m30s to generate on an RTX 3090 with triton/sage-attention enabled). HV output is resolution-dependent: lower resolutions were more likely to introduce unnecessary artifacts.
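To see why 73 frames and 640x480 are convenient sizes, here is a small sketch of the latent shape HunyuanVideo works with. It assumes the HV video VAE compresses 4x in time and 8x in space into 16 latent channels, so frame counts of the form 4k + 1 map cleanly onto latent frames; the multiple-of-16 check mirrors the step size I believe ComfyUI's empty-latent node uses and is an assumption.

```python
def hunyuan_latent_shape(width, height, frames, channels=16):
    """Sketch of HV's latent shape (channels, latent_frames, h, w).

    Assumes 4x temporal and 8x spatial VAE compression; the
    multiple-of-16 check is an assumption about ComfyUI's latent node.
    """
    if (frames - 1) % 4 != 0:
        raise ValueError(f"frame count {frames} is not of the form 4k + 1")
    if width % 16 or height % 16:
        raise ValueError("width and height should be multiples of 16")
    latent_frames = (frames - 1) // 4 + 1  # first frame kept, rest grouped by 4
    return (channels, latent_frames, height // 8, width // 8)


print(hunyuan_latent_shape(640, 480, 73))  # → (16, 19, 60, 80)
```

Raising the resolution or frame count grows every dimension of this latent, which is one plausible reason higher settings both cost more time and give the model more room to render the style.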

The trigger words are "Katsuhiro Otomo style". Most of the prompts for the gallery were generated by ChatGPT/Claude from the following input:

Use the following template to create 20 prompts for video generation model: "Katsuhiro Otomo style. {CAMERA MOVEMENT} camera. {CHARACTER, with a brief description of their appearance and key visual traits}, who is {specific, dynamic action with strong visual cues}. The background is {concise, vivid description of the environment with notable features and mood-setting details}."

Distinct camera movement types include zoom in, zoom out, pan up, pan down, pan left, pan right, tilt up, tilt down, tilt left, tilt right, around left, around right, static shot and handheld shot.

Use concise descriptions, split complex descriptions into multiple sentences. Avoid vague terms or abstract expressions.

The topic is blonde girls and various machinery in dynamic scenes in style of Akira or Steamboy, but without mentioning these titles.

The only changes I made were randomly switching the topic to something like 'seductive semi-naked blonde girls in post-apocalyptic environments' and experimenting with various shot types and camera movements, while keeping the core structure unchanged. (The only manual prompts are the ones that contain typos.)
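The prompt template above is regular enough to fill programmatically. The sketch below is only an illustration of its structure (the gallery prompts themselves came from ChatGPT/Claude): the camera list is copied from the input above, while the character, action and background strings are hypothetical examples.

```python
import random

TRIGGER = "Katsuhiro Otomo style"

# Camera movement types listed in the prompt-generation input.
CAMERA_MOVEMENTS = [
    "zoom in", "zoom out", "pan up", "pan down", "pan left", "pan right",
    "tilt up", "tilt down", "tilt left", "tilt right",
    "around left", "around right", "static shot", "handheld shot",
]


def build_prompt(camera, character, action, background):
    """Fill the template: trigger words, camera, character + action, background."""
    camera = camera[0].upper() + camera[1:]  # sentence-case the camera phrase
    return (f"{TRIGGER}. {camera} camera. "
            f"{character}, who is {action}. The background is {background}.")


rng = random.Random(0)  # fixed seed so the camera pick is reproducible
prompt = build_prompt(
    rng.choice(CAMERA_MOVEMENTS),
    "A blonde woman with gray eyes and a wide nose, wearing a red jacket",
    "jumping from a speeding motorcycle onto a steel walkway",
    "a ruined city at night, lit by burning refineries",
)
print(prompt)
```

Keeping the trigger words as the first sentence of every prompt matches the template and makes it easy to check that none were dropped.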

If the output leans toward a semi-realistic style or falls back to generic anime, then:

avoid photography terms such as "close-up" or "wide shot"; instead, describe subjects directly, e.g., "a blonde woman with gray eyes and a wide nose."
avoid abstract descriptors such as "noble warrior" or "fierce girl", because they may introduce ambiguity; be concrete
increase the resolution and/or frame count
change the seed
also, if the scene feels too static, try to enhance its dynamism by using words like "dynamic", "expressive", "reacting emotionally while...", "...with visible confusion", etc., to convey a sense of motion.
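The first two tips can be turned into a quick check before queueing a generation. This is a hypothetical helper, not something used for the gallery: the word lists contain only the examples named in the tips and are meant to be extended.

```python
import re

TRIGGER = "Katsuhiro Otomo style"

# Hypothetical word lists seeded with the examples from the tips; extend as needed.
PHOTO_TERMS = ["close-up", "wide shot"]
ABSTRACT_TERMS = ["noble", "fierce"]


def lint_prompt(prompt):
    """Return warnings for a prompt: missing trigger, photo terms, abstract terms."""
    warnings = []
    lowered = prompt.lower()
    if TRIGGER.lower() not in lowered:
        warnings.append("missing trigger words")
    for term in PHOTO_TERMS:
        if term in lowered:
            warnings.append(f"photography term: {term!r}")
    for term in ABSTRACT_TERMS:
        # Whole-word match so e.g. "fiercely" is not flagged as "fierce".
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            warnings.append(f"abstract term: {term!r}")
    return warnings
```

For example, `lint_prompt("Close-up of a fierce girl on a motorcycle.")` flags the missing trigger words, the photography term and the abstract term, while a concrete prompt such as "Katsuhiro Otomo style. Static shot camera. A blonde woman with gray eyes and a wide nose." passes cleanly.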

I apologize for the current inconveniences of using this LoRA, such as possible style inconsistency. I hope to address these issues and improve style consistency in the next version.

Fine-tuning details

As mentioned earlier, this LoRA was fine-tuned exclusively on images. A total of 103 screencaps (1500x806) were used: 62 from Akira and 41 from Steamboy (the dataset is included). Captioning was done with CogVLM2. I don't remember the exact captioning prompt; it was something like "Create a brief description of this image without describing style details". I suspect this was suboptimal because HV seems to dislike short prompts and has a preferred prompt structure.
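Before training on a dataset like this, it helps to confirm that every screencap has a non-empty caption. The sketch below assumes each caption sits next to its image as a `.txt` file with the same base name (the layout diffusion-pipe's README describes; verify against your version). It also reports caption word counts, since short captions are the suspected weakness here. Directory and file names in the test are hypothetical.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def check_captions(dataset_dir):
    """Pair each image with a same-named .txt caption.

    Returns (paired, missing): paired maps image name -> caption word
    count; missing lists images with no caption or an empty one.
    """
    paired, missing = {}, []
    for path in sorted(Path(dataset_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_EXTS:
            continue  # skip captions and any other non-image files
        caption = path.with_suffix(".txt")
        text = caption.read_text(encoding="utf-8").strip() if caption.exists() else ""
        if text:
            paired[path.name] = len(text.split())
        else:
            missing.append(path.name)
    return paired, missing
```

For this dataset you would expect 103 entries in `paired` (62 + 41) and an empty `missing` list; unusually low word counts point to captions worth expanding.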

Fine-tuning was done with diffusion-pipe, Windows 11 WSL2, 64 GB RAM, RTX 3090. The only modified training parameters were: