Mercy / Mercy Cosplay - Flux.dev, version v1.0 (ID: 776886), comprehensive resource collection

This LoRA lets you generate a realistic Mercy / Mercy cosplay from Overwatch in Flux.dev.

More specifically, the default outfit, the witch outfit, and a little bit of the winged victory and Dr. Ziegler outfits are available ... and everything the creative Flux can dream of!

I provided two versions in this upload. Their example images use exactly the same seed, prompt and all other settings. The only difference is in the captioning of the dataset; more about that in the training section. ¹=main variant, ²=_sc variant (short caption)

Main trigger: mercy² (/mercy cosplay, mercy outfit, ...)¹

Individual cosplay elements have been tagged, so you can – and even might have to – use them.

Halo² (/golden halo¹)
Staff (broom staff)
Pistol
Iconic hair¹, iconic hair wig¹, wig² (... or more verbose captions like iconic mercy hair wig)
Wings ([white, black, ....] mechanical wings (with [golden, yellow, glowing, transparent, ...] blades), feathered wings, angelic wings)

Overall outfits which have been tagged:

Mercy (This is the default outfit, feel free to add “plastic white body armor” or similar to get more of a plastic look, otherwise add words like fabric)
Witch mercy (this is sufficient for the outfit, however if something doesn’t work you can add some more details or specifics like: book on hip, broom staff, witch hat, ...)
Winged victory mercy (there seems to be too little training data to do this consistently or well, so it tends to fall back to the default mercy. The following keywords help the LoRA remember a bit: feathered wings, white and blue robe with gold trim, ivy, ... or similar words that are close to the look of that skin)
Dr. Ziegler Mercy (also too little training data, best results by adding lab coat, iconic hair wig, ...)

Suggested LoRA weight: 0.7 – 1.0, depending on the style you want.
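
To illustrate how the tags above combine for the two variants, here is a minimal Python sketch of a prompt helper. The function name build_prompt, the layout of the TAGS table and the comma-joined prompt format are assumptions made for illustration and are not part of the LoRA; only the tag words themselves come from the lists above.

```python
# Variant-specific tag words, taken from the lists above.
# "main" = long-caption LoRA, "sc" = short-caption (_sc) LoRA.
TAGS = {
    "main": {"trigger": "mercy cosplay", "halo": "golden halo", "hair": "iconic hair wig"},
    "sc": {"trigger": "mercy", "halo": "halo", "hair": "wig"},
}


def build_prompt(variant, outfit="", items=(), scene=""):
    """Join outfit + trigger, item tags and an optional scene into one prompt."""
    tags = TAGS[variant]
    head = f"{outfit} {tags['trigger']}".strip()
    # "halo" and "hair" are translated per variant; other items pass through.
    parts = [head] + [tags.get(item, item) for item in items]
    if scene:
        parts.append(scene)
    return ", ".join(parts)


print(build_prompt("sc", outfit="winged victory", items=["pistol", "wings", "hair"]))
print(build_prompt("main", outfit="witch", items=["halo", "broom staff"]))
```

The same item list can be reused for both variants this way, which makes side-by-side comparisons with a fixed seed easier.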

And with that we can talk a bit about the

Training

I had an idea for this dataset and just executed it. Then I came across the post “FLUX is smarter than you! - and other surprising findings on making the model your own” by Pyro and decided to test it out.

The dataset contains 178 images, and they are as diverse as I could get them (although no Harold or chihuahuas). I then ran InternVL2-8b to give all of them a base caption. I hoped the model would be smart enough to determine the outfit with some prompting, but that wasn’t the case (at least not with my prompting skills). So I changed the main prompt and just had it describe the general image, with very little about the main subject. Then I did a manual pass describing everything I wanted (the outfit type and the specific individual items).

I then copied that dataset and reduced the captions to only the outfit and relevant items to experiment with the findings Pyro described.

This means a caption like:

“Winged Victory Mercy Cosplay. The character is standing on a rocky terrain with a waterfall in the background, surrounded by lush greenery. white and blue robe adorned with gold trim and intricate patterns. They have large, white, feathered wings attached to their back. In their right hand, they are holding a pistol. The overall scene is serene and natural, with sunlight filtering through the trees, creating a peaceful atmosphere. In the bottom left is a "Shappi" watermark.”

Turned into:

“winged victory mercy with pistol and wings, wig”
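
The reduction step can be approximated with a small keyword filter. The following Python sketch is a hypothetical illustration, not the procedure actually used for this dataset; the names OUTFITS, ITEMS and shorten_caption, as well as the output format, are assumptions.

```python
import re

# Outfit names, most specific first so "mercy" only matches as a fallback.
OUTFITS = ["winged victory mercy", "witch mercy", "dr. ziegler mercy", "mercy"]

# Keyword found in the long caption -> tag kept in the short caption.
ITEMS = {"pistol": "pistol", "staff": "staff", "halo": "halo",
         "wings": "wings", "wig": "wig", "hair": "wig"}


def shorten_caption(long_caption):
    """Keep only the outfit name and the known item tags of a long caption."""
    text = long_caption.lower()
    outfit = next((o for o in OUTFITS if o in text), "mercy")
    found = []
    for keyword, tag in ITEMS.items():
        # Whole-word match, so "wig" does not fire inside longer words.
        if re.search(r"\b" + re.escape(keyword) + r"\b", text) and tag not in found:
            found.append(tag)
    return ", ".join([outfit] + found)


caption = ("Winged Victory Mercy Cosplay. They have large, white, feathered wings "
           "attached to their back. In their right hand, they are holding a pistol.")
print(shorten_caption(caption))
```

A filter like this can only keep what the long caption already mentions. The “wig” in the short caption above does not appear in the long caption, so tags of that kind would still have to be added by hand.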

*Sidenote: I always labelled the original (long) captions with “outfit”, “cosplay” or similar, so the main variant does worse when you just write “Mercy” without any added text.

I then trained a LoRA on each of the two datasets with the ai-toolkit by ostris, using exactly the same settings.

They were:

Alpha, Dim: 16
Total Steps: 9000
Caption dropout: 0.05
Resolutions: 512, 768, 1024
Batch Size: 1
Noise scheduler: flowmatch
Learning rate: 1.7e-4
Linear timesteps
Quantized (with gradient checkpointing)

(Each model took roughly 6 hours on an RTX 4090)
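
For a sense of scale, the settings above can be turned into a quick back-of-the-envelope calculation. In this Python sketch the dictionary key names are illustrative only and do not follow ai-toolkit's actual config schema; the estimate also ignores how the trainer handles the three resolutions.

```python
# The settings listed above, collected in one place.
# Key names are illustrative only and do NOT follow ai-toolkit's config schema.
settings = {
    "network_dim": 16,
    "network_alpha": 16,
    "total_steps": 9000,
    "caption_dropout": 0.05,
    "resolutions": [512, 768, 1024],
    "batch_size": 1,
    "noise_scheduler": "flowmatch",
    "learning_rate": 1.7e-4,
}

DATASET_SIZE = 178    # images in the dataset
TRAINING_HOURS = 6    # rough wall-clock time per model on the RTX 4090

# Average number of times each image is sampled (ignores resolution bucketing).
samples_seen = settings["total_steps"] * settings["batch_size"]
passes_per_image = samples_seen / DATASET_SIZE

# Average wall-clock time per optimisation step.
seconds_per_step = TRAINING_HOURS * 3600 / settings["total_steps"]

print(f"{passes_per_image:.1f} passes per image, {seconds_per_step:.1f} s per step")
```

Under these assumptions each image is seen roughly 50 times, at about 2.4 seconds per step.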

After training, the safetensors keys were converted to be compatible with Kohya so that the models could then be resized to rank 16 again (shaving off a little storage space while losing almost no detail).

And now onto my observations:

Both LoRAs work well enough, and captioning with just a few words is much easier.

However, I personally like the average result of the LoRA trained on long captions more. The images are more “cinematic”, but that is personal preference.

In addition, in my opinion the LoRA with more detailed captions offers finer control, although it sometimes needs more text to get results. It also reproduces fewer watermarks from the source dataset.

One disadvantage it has (if you want to call it that) is that it sometimes likes to generate drawings or less realistic-looking images if you do not specify cosplay or outfit. In some cases it is then difficult to get a realistic image, as you cannot just add a negative prompt in Flux.

I also feel that fine details, such as the specific hair shape, are done better by the model with more detailed captions.

So in conclusion:

I think Pyro has valuable findings, but I do not agree with them entirely. Short captions ease the work, but they are less flexible in LoRAs that cover multiple nuances. I would suggest using short captions to save time, especially on smaller, less complex datasets.

I myself will continue using longer captions for more complex things. I do think that you really need very good captions then. If you do not want to spend the effort to write great captions, use short ones instead.

However, I am very sceptical of other points Pyro mentions. I do not believe their statement that you can ~“talk to your LLM” during captioning.

I congratulate you for reaching the end!

I would be happy to answer questions if you have any. However, you will probably have to wait about a month: I will be busy for the next few weeks and will probably not be logging onto CivitAI.