Style of Yoko Kuno版本v0.9 (ID: 1099140) 综合资源合集综合资源合集

Overview

Yoko Kuno (about her, interview) is a rising star in Japanese animation. Born in 1990 in Tsukuba City, Ibaraki Prefecture, she has experience as a director, animator, manga artist, and illustrator. She holds a background in graphic design from Tama Art University, graduating in 2013. Her graduation project, "Airy Me", was distinguished by a unique artistic style that blends expressionism and avant-garde elements.

In 2015, she was selected as the rotoscope animation director for Shunji Iwai's "Hana to Alice: Satsujin Jiken" ("The Murder Case of Hana & Alice"). This opportunity helped launch her career further in the industry. Since then, she has contributed to various projects, including directing and key animation for "Land of the Lustrous", concept design for "Penguin Highway", and character design, storyboarding, direction, and key animation for the film "Crayon Shin-chan: Mononoke Ninja Chinpūden".

She made her full-fledged directorial debut in mainstream feature-length animation in 2024 with "Bakeneko Anzu-chan" ("Ghost Cat Anzu"), which was first filmed in live action and then rotoscoped under Kuno's direction based on this footage. The live-action footage was directed by co-director Nobuhiro Yamashita. "Ghost Cat Anzu" received favorable reviews and was chosen as eligible for consideration for the 2025 Oscars in the Animated Feature category.

Yoko Kuno’s style is clearly not limited to the one demonstrated by this LoRA, she is a much more versatile artist and animator. Just watch her Airy Me, which, while being really disturbing (at least for me), at the same time, is mesmerizing. She is also a mangaka; see "A Horn and Love of Amagi Yuiko", winner of the 21st Manga Division New Face Award. I hope to eventually get this book ~~and steal her manga style as well~~.

But I personally learned about her work through McDonald's Japan’s Kiki's Delivery Service collaboration commercial. I really liked its style and wanted to reproduce it using the Flux LoRA. Shortly after that, I found out she has a full-fledged animated film being released - "Ghost Cat Anzu", which shares the same style as that commercial. So, while it's not entirely accurate to define her style by just these two works, I personally adore her mainstream animation projects' style and, at least for the initial version of the LoRA, I decided to use exclusively shots from these two works (see more details in the Train section below).

There is also something in her her rotoscoping animation that cannot be captured with static pictures. I highly recommend watching "Ghost Cat Anzu", it’s a wholesome and fun film, and while it’s not on Ghibli's level of storytelling magic, Yoko Kuno's narrative art style has a lot of potential.

Usage

All images published here contain ComfyUI metadata and were generated with the following settings:

Model: flux1-dev (fp8e4m3fn)

Text Encoder: t5pxxl_fp16

Sampler: euler

Scheduler: 24 steps (normal)

Flux Guidance: 4

LoRA Strength: 1

It seems to work without any trigger phrase. But I usually prefix all prompts with "In style of Yoko Kuno", just in case.

Training

For this LoRA, I used screencaps from various fragments of the Kiki's Delivery Service McDonald’s commercial and trailers for "Bakeneko Anzu-chan" (I don't yet have access to full film to get actual screencaps). I extracted every frame out of them using ffmpeg and selected the best. Then, I did cropping, inpainting watermarks, etc. This resulted in 205 high-quality images, which I captioned using CogVLM2-Chat-19B. The captioning prompt was:

"Describe this image without describing style details. Start description with phrase 'Image in style of Yoko Kuno, depicting...' and then goes your description."

I checked all captions (although CogVLM2 provides surgical accuracy in describing even complex scenes) and added explicit name tags for Kiki, Jiji, Karin, and Anzu. However, I didn't have high hopes for the tags to help in prompting these characters in the finalized LoRA, I added them mostly for completeness. And indeed, as expected, the concepts of Kiki and Karin, as well as Jiji and Anzu, tend to blend into each other (with Anzu’s features being more dominant). Even worse, therefore the model sometimes has some difficulties in correctly depicting Jiji, which is actually key metric for evaluating image generation model quality.

I plan to retrain this model with more high-quality screencaps from "Ghost Cat Anzu" once I gain access to the full film. This is why the current version isn't marked as version 1 yet. I'm also considering including data from her other works, such as "Hana to Alice: Satsujin Jiken", and possibly other projects, provided they don't conflict with her mainstream animation style. I'm also dissatisfied with how the backgrounds are rendered; they often fallback to a realistic style, and achieving their "watercolor" look wasn't successful.

Despite this, I think this is the best LoRA I’ve made so far. Its prompt adherence and anatomical coherence are really good (though I might be biased). Among all the images I’ve published here, there are very few cherry-picks; I almost always achieve correct anatomy and witness all the prompt’s details on the first try.

The LoRA was fine-tuned on an RTX 3090 using the AI Toolkit. Notable training hyperparameters include:

Rank: 8

Alpha: 16

Optimizer: prodigy

Steps: 10000

Batch size: 1

Learning rate: 1

Learning rate scheduler: constant

Decouple: true

Use bias correction: false

Betas: (0.9, 0.99)

Weight decay: 0.05

Noise offset: 0.1

And instead of the regular FLUX.1-dev, the LoRA was trained on Flux-Dev2Pro (link). While I don't have any empirical numerical evidence that this improved model quality, my personal observations suggest that it enhanced anatomy and prompt adherence. In fact, during testing, which took about two days and involved identifying the best checkpoint by simultaneously testing eight LoRAs and visually evaluating the generated images on the same seed and prompt - I occasionally noted that the original (no LoRA) images tended to follow prompts less accurately and exhibited more anatomy errors compared to the images generated with the LoRA applied.