Katsuhiro Otomo Style HunyuanVideo, version v0.3 (ID: 1285869)

Description

This LoRA is fine-tuned on screencaps from Katsuhiro Otomo's animated movies Akira and Steamboy. I haven't yet figured out how to train a style on videos with acceptable results (currently, I get better results with images than with videos), and the best parameters for training HunyuanVideo (HV) are not yet fully understood. However, this is not the last in my series of anime LoRAs for HunyuanVideo, so I hope to eventually determine the best fine-tuning parameters, as I'm confident that training with clips should ultimately be superior to using images.

As for this LoRA, while I really liked how it turned out, it's not perfect. I will try to resolve some problems in the next version soon enough (and it will probably involve video data). Katsuhiro Otomo deserves a better LoRA.

Usage

Tested with the default ComfyUI workflow plus an added LoRALoaderModelOnly node. It should probably work with Kijai's wrapper, but I don't know for sure, since I don't use it. I use:

guidance: 7.0
steps: 30

Also I changed temporal_size to 76 because of this.

Clips in the showcase were generated at 640x480 resolution, 73 frames (each clip took approximately 4m30s to generate on an RTX 3090 with triton/sage-attention enabled). HV output is resolution-dependent: lower resolutions were more likely to introduce unnecessary artifacts.

The trigger words are "Katsuhiro Otomo style". Most of the prompts for the gallery were generated by ChatGPT/Claude from the following input:

Use the following template to create 20 prompts for video generation model: "Katsuhiro Otomo style. {CAMERA MOVEMENT} camera. {CHARACTER, with a brief description of their appearance and key visual traits}, who is {specific, dynamic action with strong visual cues}. The background is {concise, vivid description of the environment with notable features and mood-setting details}."
Distinct camera movement types include zoom in, zoom out, pan up, pan down, pan left, pan right, tilt up, tilt down, tilt left, tilt right, around left, around right, static shot and handheld shot.
Use concise descriptions, split complex descriptions into multiple sentences. Avoid vague terms or abstract expressions.
The topic is blonde girls and various machinery in dynamic scenes in style of Akira or Steamboy, but without mentioning these titles.

I only randomly switched topics to something like 'seductive semi-naked blonde girls in post-apocalyptic environments' and experimented with various shot types and camera movements, while keeping the core structure unchanged. (The only manual prompts are the ones that contain typos.)
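The template above can also be filled in programmatically. The following is a minimal sketch, not the workflow used for the gallery: the `build_prompt` helper and the example character, action, and background strings are hypothetical, and only the template wording and the list of camera movements come from the input shown above.

```python
import random

# Camera movements listed in the prompt-generation input above.
CAMERA_MOVEMENTS = [
    "zoom in", "zoom out", "pan up", "pan down", "pan left", "pan right",
    "tilt up", "tilt down", "tilt left", "tilt right",
    "around left", "around right", "static shot", "handheld shot",
]

# Same sentence structure as the template, with shorter placeholder names.
TEMPLATE = (
    "Katsuhiro Otomo style. {camera} camera. {character}, who is {action}. "
    "The background is {background}."
)


def build_prompt(character, action, background, camera=None, rng=None):
    """Fill the template; pick a random camera movement if none is given."""
    rng = rng or random.Random()
    if camera is None:
        camera = rng.choice(CAMERA_MOVEMENTS)
    if camera not in CAMERA_MOVEMENTS:
        raise ValueError(f"unknown camera movement: {camera!r}")
    return TEMPLATE.format(
        camera=camera.capitalize(),
        character=character,
        action=action,
        background=background,
    )


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    print(build_prompt(
        character="A blonde girl in a red jacket",
        action="riding a motorcycle",
        background="a neon-lit city street at night",
        camera="pan left",
    ))
```

Keeping the trigger words and sentence order fixed while varying only the slots mirrors the approach described above of changing topics without touching the core structure.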

If the output leans toward a semi-realistic style or falls back to generic anime, then:

  • try to get rid of terms related to photography, such as "close-up" or "wide shot"; describe the subjects directly instead, e.g., "a blonde woman with gray eyes and a wide nose."

  • try to avoid abstract descriptors such as "noble warrior" or "fierce girl", because they may introduce ambiguity; try to be concrete

  • increase resolution and/or frame count

  • change the seed

  • also, if the scene feels too static, try to enhance its dynamism by using words like "dynamic", "expressive", "reacting emotionally while...", "...with visible confusion", etc., to convey a sense of motion.
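The first two tips can be turned into a quick pre-flight check on a prompt. This is only a sketch under stated assumptions: the `flag_risky_terms` helper is hypothetical, and the two word lists contain only the examples named in the tips above, so extend them with whatever terms cause trouble in your own prompts.

```python
import re

# Only the examples mentioned in the tips above; both lists are meant to be extended.
PHOTOGRAPHY_TERMS = ["close-up", "wide shot"]
VAGUE_DESCRIPTORS = ["noble", "fierce"]


def flag_risky_terms(prompt):
    """Return the listed terms found in the prompt (case-insensitive, whole words)."""
    lowered = prompt.lower()
    found = []
    for term in PHOTOGRAPHY_TERMS + VAGUE_DESCRIPTORS:
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append(term)
    return found


if __name__ == "__main__":
    print(flag_risky_terms("Close-up of a fierce girl."))
    print(flag_risky_terms("A blonde woman with gray eyes and a wide nose."))
```

A match is only a hint to rephrase the prompt; the check says nothing about whether the output will actually drift in style.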

I apologize for the current inconveniences in using this LoRA, such as possible inconsistency in style. I hope to address these issues and enhance style consistency in the next version.

Fine-tuning details

As mentioned earlier, this LoRA was fine-tuned exclusively on images. A total of 103 screencaps (1500x806) were used: 62 from Akira and 41 from Steamboy (the dataset is included). Captioning was done with CogVLM2. I don't remember the exact captioning prompt; it was something like "Create a brief description of this image without describing style details". I suspect this was suboptimal, because HV seems to dislike short prompts and has a preferred prompt structure.

Fine-tuning was done with diffusion-pipe on Windows 11 WSL2, with 64 GB RAM and an RTX 3090. The only modified training parameters were:

rank = 16
lr = 5e-5

Dataset params were also default, except:

resolutions = [768]

Trigger words: Katsuhiro Otomo style

Name: katsuhiro_otomo_hv_v03_27.safetensors

Size (KB): 157520

Type: Model

Pickle scan result: Success

Pickle scan message: No Pickle imports

Virus scan result: Success

Name: katsuhiro_otomo_v03_dataset.zip

Size (KB): 6262

Type: Training Data

Pickle scan result: Success

Pickle scan message: No Pickle imports

Virus scan result: Success
