SoteDiffusion Wuerstchen3 version pre-alpha1 (ID: 405877)

New version is out: https://civitai.com/models/628865/sotediffusion-v2

Anime finetune of Würstchen V3.

This release is sponsored by fal.ai/grants

Trained on 6M images for 3 epochs using 8x A100 80G GPUs.

This model can be used through the Fal.AI API.

For more details: https://fal.ai/models/fal-ai/stable-cascade/sote-diffusion

Please refer to Hugging Face for the SD.Next UI, Diffusers or UNet models:
https://huggingface.co/Disty0/sotediffusion-wuerstchen3
The CivitAI page has only the ComfyUI checkpoint models.

Inference Parameters:

Download the Main model (8.14 GB file):

https://civitai.com/api/download/models/563950?type=Model&format=SafeTensor&size=pruned&fp=fp16

Download the Decoder model (4.24 GB file):

https://civitai.com/api/download/models/563892?type=Model&format=SafeTensor&size=pruned&fp=fp16

Positives:

newest, extremely aesthetic, best quality,

Negatives:

very displeasing, worst quality, monochrome, realistic, oldest, loli,

Main:

Sampler: DDPM or DPMPP 2M with SGM Uniform
CFG: 7
Steps: 30 or 40

Decoder:

Sampler: Euler a Karras
CFG: 1 or 1.2
Steps: 10

Compression: 42 (or 32 to 64)

Resolution: 1024x1536, 2048x1152.

Anything works as long as it's a multiple of 128.
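A minimal sketch of how an arbitrary target size could be snapped to a multiple of 128 before generation. The function names are hypothetical and not part of any UI. The latent-size helper is an assumption: it floor-divides each side by the compression value, and a given UI may round differently. It is only meant to illustrate that a lower compression value gives a larger latent.

```python
def snap_to_multiple(value: int, multiple: int = 128) -> int:
    """Round to the nearest multiple (ties round up), never below one multiple."""
    snapped = (value + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


def snap_resolution(width: int, height: int) -> tuple[int, int]:
    """Snap both sides of a requested size to multiples of 128."""
    return snap_to_multiple(width), snap_to_multiple(height)


def stage_c_latent_size(width: int, height: int, compression: int = 42) -> tuple[int, int]:
    """ASSUMPTION: latent side = pixel side floor-divided by the compression value."""
    return width // compression, height // compression


print(snap_resolution(1920, 1080))      # (1920, 1024)
print(stage_c_latent_size(1024, 1536))  # (24, 36)
```

The two recommended resolutions, 1024x1536 and 2048x1152, pass through `snap_resolution` unchanged.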

Training:

Software used: Kohya SD-Scripts (Stable Cascade branch).
https://github.com/kohya-ss/sd-scripts/tree/stable-cascade

GPU used: 8x Nvidia A100 80GB
GPU hours: 220

Base

parameters | value

amp | bf16
weights | fp32
save weights | fp16
resolution | 1024x1024
effective batch size | 128
unet learning rate | 1e-5
te learning rate | 4e-6
optimizer | Adafactor
images | 6M
epochs | 3

Final

parameters | value

amp | bf16
weights | fp32
save weights | fp16
resolution | 1024x1024
effective batch size | 128
unet learning rate | 4e-6
te learning rate | none
optimizer | Adafactor
images | 120K
epochs | 16
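As a rough worked example, the two tables above imply the following optimizer step counts. This is a sketch only: it uses the rounded image counts from the tables, assumes the last partial batch of each epoch is kept, and ignores any resolution bucketing the trainer may apply. The function name is hypothetical.

```python
import math


def optimizer_steps(images: int, effective_batch_size: int, epochs: int) -> int:
    """Approximate optimizer steps: batches per epoch (rounded up) times epochs."""
    steps_per_epoch = math.ceil(images / effective_batch_size)
    return steps_per_epoch * epochs


# Values taken from the Base and Final tables (rounded figures).
base_steps = optimizer_steps(images=6_000_000, effective_batch_size=128, epochs=3)
final_steps = optimizer_steps(images=120_000, effective_batch_size=128, epochs=16)

print(base_steps)   # 140625
print(final_steps)  # 15008
```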

Dataset:

GPU used for captioning: 1x Intel ARC A770 16GB
GPU hours: 350

Model used for captioning: SmilingWolf/wd-swinv2-tagger-v3

Model used for text: llava-hf/llava-1.5-7b-hf

Command:

python /mnt/DataSSD/AI/Apps/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py \
  --model_dir "/mnt/DataSSD/AI/models/wd14_tagger_model" \
  --repo_id "SmilingWolf/wd-swinv2-tagger-v3" \
  --recursive \
  --remove_underscore \
  --use_rating_tags \
  --character_tags_first \
  --character_tag_expand \
  --append_tags \
  --onnx \
  --caption_separator ", " \
  --general_threshold 0.35 \
  --character_threshold 0.50 \
  --batch_size 4 \
  --caption_extension ".txt" \
  ./

dataset name | total images

newest | 1.85M
recent | 1.38M
mid | 993K
early | 566K
oldest | 160K
pixiv | 344K
visual novel cg | 231K
anime wallpaper | 105K
Total: 5,628,499 images

Note:

Smallest image size is 1280x600 (768,000 pixels).
Deduplicated based on image similarity using czkawka-cli.
Around 120K very high quality images were intentionally duplicated 5 times, bringing the total image count to 6.2M.

Tags:

Tag Format:

The model is trained with random tag order, but for reference, this is the order used in the dataset:

aesthetic tags, quality tags, date tags, custom tags, rating tags, character, series, rest of the tags
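A small sketch of assembling a caption in the dataset order listed above. The group keys and the `build_caption` helper are hypothetical names chosen for illustration. The ", " separator matches the caption separator passed to the tagger command. Since training used random tag order, this ordering is a convenience and not a requirement for prompts.

```python
# Dataset tag order as listed above; the key names are illustrative only.
TAG_ORDER = [
    "aesthetic",
    "quality",
    "date",
    "custom",
    "rating",
    "character",
    "series",
    "general",
]


def build_caption(groups: dict[str, list[str]]) -> str:
    """Join tag groups in dataset order, skipping any group that is missing."""
    tags: list[str] = []
    for key in TAG_ORDER:
        tags.extend(groups.get(key, []))
    return ", ".join(tags)


caption = build_caption(
    {
        "general": ["1girl", "solo"],
        "quality": ["best quality"],
        "aesthetic": ["extremely aesthetic"],
        "date": ["newest"],
    }
)
print(caption)  # extremely aesthetic, best quality, newest, 1girl, solo
```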

Date:

newest : 2022 to 2024
recent : 2019 to 2021
mid : 2015 to 2018
early : 2011 to 2014
oldest : 2005 to 2010
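The year ranges above can be expressed as a simple lookup. This is a sketch with a hypothetical helper name. Years outside the listed ranges return None because the source does not say how they are tagged.

```python
from typing import Optional

# (tag, first year, last year), both ends inclusive, as listed above.
DATE_TAGS = [
    ("newest", 2022, 2024),
    ("recent", 2019, 2021),
    ("mid", 2015, 2018),
    ("early", 2011, 2014),
    ("oldest", 2005, 2010),
]


def date_tag(year: int) -> Optional[str]:
    """Return the date tag for a year, or None if the year is not covered."""
    for tag, first, last in DATE_TAGS:
        if first <= year <= last:
            return tag
    return None


print(date_tag(2023))  # newest
print(date_tag(2012))  # early
```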

Aesthetic Tags:

Model used: shadowlilac/aesthetic-shadow-2

score > 0.90 : extremely aesthetic
score > 0.80 : very aesthetic
score > 0.70 : aesthetic
score > 0.50 : slightly aesthetic
score > 0.40 : not displeasing
score > 0.30 : not aesthetic
score > 0.25 : slightly displeasing
score > 0.10 : displeasing
rest of them : very displeasing
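The thresholds above map to tags roughly as follows. This is a sketch with a hypothetical helper name. It assumes the comparisons are strict (score > threshold), as written in the list, and that the score comes from the aesthetic model as a float between 0 and 1.

```python
# (threshold, tag) pairs in descending order, copied from the list above.
AESTHETIC_THRESHOLDS = [
    (0.90, "extremely aesthetic"),
    (0.80, "very aesthetic"),
    (0.70, "aesthetic"),
    (0.50, "slightly aesthetic"),
    (0.40, "not displeasing"),
    (0.30, "not aesthetic"),
    (0.25, "slightly displeasing"),
    (0.10, "displeasing"),
]


def aesthetic_tag(score: float) -> str:
    """Return the first tag whose threshold the score strictly exceeds."""
    for threshold, tag in AESTHETIC_THRESHOLDS:
        if score > threshold:
            return tag
    return "very displeasing"


print(aesthetic_tag(0.95))  # extremely aesthetic
print(aesthetic_tag(0.45))  # not displeasing
```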

Quality Tags:

Model used: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth

score > 0.980 : best quality
score > 0.900 : high quality
score > 0.750 : great quality
score > 0.500 : medium quality
score > 0.250 : normal quality
score > 0.125 : bad quality
score > 0.025 : low quality
rest of them : worst quality
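The quality buckets can also be looked up with a binary search over the sorted thresholds. This is a sketch using the standard-library `bisect` module. The helper name is hypothetical, and the comparison is assumed to be strict, as written above. `bisect_left` counts how many thresholds are strictly below the score, and that count is the index of the matching tag.

```python
from bisect import bisect_left

# Thresholds in ascending order; QUALITY_TAGS[i] applies when exactly i thresholds
# are strictly below the score.
QUALITY_THRESHOLDS = [0.025, 0.125, 0.250, 0.500, 0.750, 0.900, 0.980]
QUALITY_TAGS = [
    "worst quality",
    "low quality",
    "bad quality",
    "normal quality",
    "medium quality",
    "great quality",
    "high quality",
    "best quality",
]


def quality_tag(score: float) -> str:
    """Map a quality score to its tag using strict greater-than thresholds."""
    return QUALITY_TAGS[bisect_left(QUALITY_THRESHOLDS, score)]


print(quality_tag(0.99))  # best quality
print(quality_tag(0.30))  # normal quality
```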

Rating Tags:

general
sensitive
nsfw
explicit nsfw

Custom Tags:

image boards: date,
text: The text says "text",
characters: character, series
pixiv: art by Display_Name,
visual novel cg: Full_VN_Name (short_3_letter_name), visual novel cg,
anime wallpaper: date, anime wallpaper,

License

SoteDiffusion models fall under the Fair AI Public License 1.0-SD, which is compatible with the Stable Diffusion models’ license. Key points:

1. Modification Sharing: If you modify SoteDiffusion models, you must share both your changes and the original license.
2. Source Code Accessibility: If your modified version is network-accessible, provide a way (like a download link) for others to get the source code. This applies to derived models too.
3. Distribution Terms: Any distribution must be under this license or another with similar rules.
4. Compliance: Non-compliance must be fixed within 30 days to avoid license termination, emphasizing transparency and adherence to open-source values.