Belle Delphine - Flux.dev v1.0 (ID: 760989)

This is a LoRA of the internet celebrity Belle Delphine for Flux.dev.

Trigger word: “Belle Delphine”.

Suggested LoRA weight: 0.6 – 1.1

The model is trained at 512, 768 and 1024 resolutions.

As with most Flux LoRAs, the model is quite flexible. However, this specific LoRA is also overtrained (which was partially mitigated after the fact, more on that in the training section), resulting in weaker prompt following, especially for text (with multiple tries it still works, as can be seen in the example images).

Quite a few additional tags were captioned, but the training did not retain them. Two triggers that might still make a small difference are: braces and snapchat.

Images were exclusively generated in ComfyUI.
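The LoRA should work anywhere Flux LoRAs load, not just in ComfyUI. As a rough, untested sketch of an equivalent setup with diffusers (the file name is taken from the download info below; passing the LoRA weight via joint_attention_kwargs follows diffusers' current Flux LoRA convention and may change between versions):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("DI_Belle_Delphine_flux.d.safetensors")

image = pipe(
    "photo of Belle Delphine smiling at the camera",  # include the trigger word
    joint_attention_kwargs={"scale": 0.8},  # LoRA weight, suggested range 0.6-1.1
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("belle.png")
```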

Training

As always, I will add a little bit about the training.

I followed Flux with interest and originally planned to wait until the software tooling had matured a bit, but by now I have seen enough results from other people that I figured it was time.

Flux has shown that it captures a person's likeness well even with a low image count, so it was very likely someone would produce a very good Flux LoRA of Belle Delphine before I did anything (which, as expected, was the case). However, I always use the Belle Delphine dataset for testing new things for myself, as there is so much data of her available. So I decided: screw it, and trained this model on the same large dataset I used for the LoRA version for the Pony checkpoint.

This dataset already had both booru tags as captions and natural-language captions generated by a VLM. However, I felt the quality of the natural language was too low (it was also configured to produce shorter prompts), so I decided to recaption all the images in the dataset. I used InternVL2-8B for this and generated more diverse captions (both longer and shorter ones). For the training I used those new natural-language captions exclusively.
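As a minimal sketch, a recaptioning pass with InternVL2-8B through transformers could look like the following. This uses single-tile preprocessing only (the official model card additionally does dynamic tiling), and the prompts are illustrative rather than the ones actually used:

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/InternVL2-8B"
# InternVL2 ships its own modeling code, hence trust_remote_code=True.
model = AutoModel.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

# Single 448x448 tile with ImageNet normalization; the model card's helper
# additionally performs dynamic tiling for higher-resolution inputs.
preprocess = transforms.Compose([
    transforms.Resize((448, 448)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

def caption(image_path: str, long: bool = True) -> str:
    pixel_values = preprocess(
        Image.open(image_path).convert("RGB")
    ).unsqueeze(0).to(torch.bfloat16).cuda()
    question = ("<image>\nDescribe this image in detail." if long
                else "<image>\nDescribe this image in one short sentence.")
    return model.chat(tokenizer, pixel_values, question,
                      dict(max_new_tokens=512, do_sample=False))
```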

Then I had to decide which trainer to use. I planned on using kohya, but after reading through several GitHub issues it seemed like Flux support wasn't quite there yet, so I went with ostris's ai-toolkit instead. Of course, this meant missing out on some nice features like masked training (even though I had masks for the entire dataset). I feared this might make watermarks more likely to bleed through, so I took special care to label them explicitly in the captions, hoping the model would then not generate them unless prompted (which at least semi-worked).
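The watermark labeling itself is just caption editing; a trivial sketch of the idea, with a hypothetical file layout and watermark list:

```python
import json
from pathlib import Path

# Hypothetical layout: one .txt caption next to each image, plus a list of
# image stems known to contain watermarks (from a manual pass or a detector).
watermarked = set(json.loads(Path("watermarked.json").read_text()))

for caption_file in Path("dataset").glob("*.txt"):
    if caption_file.stem in watermarked:
        text = caption_file.read_text().strip()
        # Name the watermark explicitly so the model ties it to the caption
        # and (hopefully) omits it when the prompt doesn't ask for one.
        if "watermark" not in text.lower():
            caption_file.write_text(text + " A visible watermark overlays the image.")
```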

People also recommend low step counts, but they typically use low image counts like 15. I, on the other hand, have orders of magnitude more data than that, so I simply settled on 30000 steps (without any scientific justification). And since I didn't know how quickly it would overfit or degrade, I saved a checkpoint every 600 steps, leaving me with 50 versions of the LoRA.

And my fears turned out to be true:

At 30000 steps the model is mostly worse than at 10000 steps, while at the same time not having learned some concepts that were tagged. So the conclusion is probably a smaller, more refined dataset, with additional LoRAs for extra concepts? I will see when I have time again, and then either add a much higher step count to see if the result changes, or try training multiple smaller datasets and combining the resulting models.

To salvage the overcooked model at least a little, I decided to merge different versions of it, specifically: 10200 steps – 50%; 6000 steps – 8%; 18000 steps – 20%; 30000 steps – 22%.
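A merge like this boils down to a weighted sum of the checkpoints' tensors. A sketch of the idea with safetensors (file names are hypothetical; note that averaging the LoRA down/up matrices separately only approximates averaging the full weight deltas, but it is the common approach):

```python
import torch
from safetensors.torch import load_file, save_file

# Checkpoints and merge weights from this post (0.50 + 0.08 + 0.20 + 0.22 = 1.0).
parts = [
    ("lora_step_10200.safetensors", 0.50),
    ("lora_step_06000.safetensors", 0.08),
    ("lora_step_18000.safetensors", 0.20),
    ("lora_step_30000.safetensors", 0.22),
]

merged: dict[str, torch.Tensor] = {}
for path, weight in parts:
    for key, tensor in load_file(path).items():
        contrib = tensor.to(torch.float32) * weight
        merged[key] = merged[key] + contrib if key in merged else contrib

# Cast back to fp16 so the output stays comparable in size to the inputs.
save_file({k: v.to(torch.float16) for k, v in merged.items()},
          "lora_merged.safetensors")
```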

To my dismay, the kohya scripts did not support the ai-toolkit-trained LoRA out of the box, so I had to manually adjust the script to make the merging work. I also planned on reducing the file size at least slightly, as I trained with rank 16 (the default), and once again had to modify some code for that. I resized from rank 16 to rank 16, meaning it should keep the same quality while shaving off roughly 60MB.
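For reference, the usual resize approach (and roughly what the kohya resize script does) is a per-module SVD of the reconstructed weight delta; with the same rank in and out, quality is preserved, and the size savings presumably come from re-saving the factors at a lower precision (an assumption; the post doesn't spell it out). A minimal sketch for one down/up pair:

```python
import torch

def svd_resize(down: torch.Tensor, up: torch.Tensor, new_rank: int):
    """Resize one LoRA module: rebuild the weight delta, keep its best
    rank-`new_rank` approximation, and split it back into two factors."""
    delta = up.to(torch.float32) @ down.to(torch.float32)      # (out, in)
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    U, S, Vh = U[:, :new_rank], S[:new_rank], Vh[:new_rank, :]
    new_up = (U * S.sqrt()).to(torch.float16)                  # (out, new_rank)
    new_down = (S.sqrt()[:, None] * Vh).to(torch.float16)      # (new_rank, in)
    return new_down, new_up
```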

Overall, I am only semi-happy with the results, but since I had already spent the time on it, I figured I would share the model once again.

Training was done on a 4090 in about 24 hours.

Other relevant training settings (collected into a config sketch after the list):

  • Alpha + Dim = 16

  • Caption Dropout at 5%

  • Trained at 512, 768, 1024 resolutions

  • Batch size 1

  • Noise Scheduler flowmatch

  • Enabled linear timesteps

  • Optimizer adamw8bit

  • Learning rate 1.69e-4

  • Trained on the quantized model
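A minimal sketch of the above, shaped like ai-toolkit's YAML config but written as a Python dict for consistency with the other snippets here (field names follow ai-toolkit's example configs and may differ between versions; the dataset path is a placeholder):

```python
config = {
    "network": {"type": "lora", "linear": 16, "linear_alpha": 16},
    "datasets": [{
        "folder_path": "dataset",
        "caption_dropout_rate": 0.05,
        "resolution": [512, 768, 1024],
    }],
    "train": {
        "batch_size": 1,
        "steps": 30000,
        "noise_scheduler": "flowmatch",
        "linear_timesteps": True,
        "optimizer": "adamw8bit",
        "lr": 1.69e-4,
    },
    "model": {
        "name_or_path": "black-forest-labs/FLUX.1-dev",
        "is_flux": True,
        "quantize": True,   # "trained on the quantized model"
    },
    "save": {"save_every": 600},  # 50 checkpoints over the 30000-step run
}
```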

Disclaimer

I want to highlight again that this model is non-commercial, and you should only post images on CivitAI which follow the Content Rules.

Users are solely responsible for the content they generate using this LoRA. It is the user’s responsibility to ensure that their usage of this model adheres to all applicable local, state, national and international laws. I do not endorse any user-generated content and expressly disclaim any and all liability in connection with user generations.

Description:

Trigger words: Belle Delphine

Name: DI_Belle_Delphine_flux.d.safetensors

Size (KB): 100384

Type: Model

Pickle scan result: Success

Pickle scan message: No Pickle imports

Virus scan result: Success
