Anime Diffusion

Introduction

This model aims to break away from the monotonous style of previous anime SD models and offers some suggestions that may be useful for further fine-tuning.

Usage Details

You can download two different models:

AnimeDiffusion[Base] (pruned model, fp16): A fine-tuned model without any model merging. It generates images of lower quality but with richer diversity.
AnimeDiffusion[Merge] (full model, fp16): A model that merges AnimeDiffusion[Base] with Aurora. It generates higher-quality images but with less diversity.

This model accepts Danbooru tags as prompts, for example:

Prompt: masterpiece, best quality, 1girl, solo, ...
Negative Prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
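Danbooru stores tags with underscores (e.g. long_hair), while the example prompt above uses comma-separated tags with spaces. The following minimal Python sketch shows one way to assemble such a prompt; build_prompt and QUALITY_TAGS are hypothetical names, and putting the quality tags first simply mirrors the example above.

```python
# Hypothetical helper: turns Danbooru-style tags into a comma-separated prompt.
QUALITY_TAGS = ("masterpiece", "best quality")


def build_prompt(tags, quality_tags=QUALITY_TAGS):
    """Join quality tags and content tags into one prompt string.

    Underscores (as stored on Danbooru, e.g. "long_hair") become spaces,
    and duplicate tags are dropped while keeping their first position.
    """
    seen = set()
    parts = []
    for tag in list(quality_tags) + list(tags):
        cleaned = tag.strip().replace("_", " ")
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            parts.append(cleaned)
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_prompt(["1girl", "solo", "long_hair", "solo"]))
    # prints: masterpiece, best quality, 1girl, solo, long hair
```

A few tags (emoticons such as ^_^) contain meaningful underscores, so a real tool would need to special-case them.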

Training Details

AnimeDiffusion[Base] is a fine-tuned model trained on 430K-730K Danbooru2022 images. Starting from Baka-Diffusion[General], I first trained it on 730K images for 5 epochs, then filtered the images with Cafe-aesthetic to keep the relatively high-quality ones (430K) and trained for another 3 epochs.
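The two-stage schedule translates into the following amount of training, shown as a small Python calculation. It uses the rounded image counts from the text and assumes that every epoch passes over each image exactly once (no repeats or resampling), which the text does not state explicitly.

```python
# Two-stage schedule described above; counts are the rounded figures from the text.
STAGES = [
    # (stage, number of images, epochs)
    ("stage 1: all images", 730_000, 5),
    ("stage 2: Cafe-aesthetic filtered", 430_000, 3),
]


def samples_seen(stages):
    """Total image samples processed, assuming one pass over each image per epoch."""
    return sum(images * epochs for _, images, epochs in stages)


if __name__ == "__main__":
    for name, images, epochs in STAGES:
        print(f"{name}: {images * epochs:,} samples")
    print(f"total: {samples_seen(STAGES):,} samples")  # total: 4,940,000 samples
```

Under this assumption, roughly three quarters of the image samples come from the unfiltered first stage.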

Here are some important training hyperparameters:

noise_offset = 0.05: This hyperparameter may help the model generate darker or brighter images (e.g. a white background). A value of 0.05 was also used for training Stable Diffusion XL.
caption_dropout_rate = 0.2: Stable Diffusion is a text-to-image model that relies on Classifier-Free Guidance (CFG) for conditional generation. With CFG, a single diffusion model is trained both unconditionally and conditionally, so it is capable of both unconditional and conditional generation. CFG combines the two predictions as follows:

noise_pred = noise_pred_uncond + CFG_Scale * (noise_pred_cond - noise_pred_uncond)

where noise_pred_uncond is the noise predicted with the null prompt (i.e. unconditionally), and noise_pred_cond is the noise predicted with a specific prompt (i.e. conditionally). Caption dropout is what provides the unconditional training signal: with probability caption_dropout_rate, the caption is replaced by the null prompt. According to the CFG paper, caption_dropout_rate = 0.2 or 0.1 is recommended. However, most fine-tuning practices never mention this training strategy. Due to a number of issues, I am not currently running further experiments on this, and I would welcome active discussion on this point.

[Figure: unconditional generation]
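Both hyperparameters can be illustrated with a minimal, dependency-free Python sketch that works on nested lists instead of tensors; all function names are hypothetical. The noise-offset formulation shown (one extra Gaussian value per latent channel, scaled by noise_offset) is the common one in community trainers such as kohya-ss sd-scripts, whose options appear to share these hyperparameter names; the text does not say which trainer was used, so treat this as an assumption rather than the exact training code.

```python
import random


def offset_noise(channels, height, width, noise_offset=0.05, rng=random):
    """Gaussian noise for one latent of shape (C, H, W), with a noise offset.

    One extra Gaussian value per channel, scaled by noise_offset, is added to
    every pixel of that channel. This shifts the channel mean, so the model can
    learn to change overall brightness (very dark or very bright images).
    """
    noise = []
    for _ in range(channels):
        shift = noise_offset * rng.gauss(0.0, 1.0)  # shared by the whole channel
        noise.append([[rng.gauss(0.0, 1.0) + shift for _ in range(width)]
                      for _ in range(height)])
    return noise


def drop_caption(caption, caption_dropout_rate=0.2, rng=random):
    """Training side: with probability caption_dropout_rate, replace the caption
    by the null (empty) prompt so the model also learns the unconditional case."""
    return "" if rng.random() < caption_dropout_rate else caption


def cfg_combine(noise_pred_uncond, noise_pred_cond, cfg_scale):
    """Sampling side: classifier-free guidance, element-wise over flat lists.

    noise_pred = noise_pred_uncond + cfg_scale * (noise_pred_cond - noise_pred_uncond)
    """
    return [u + cfg_scale * (c - u)
            for u, c in zip(noise_pred_uncond, noise_pred_cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(len(offset_noise(4, 8, 8, rng=rng)))  # 4 channels
    print(drop_caption("1girl, solo", rng=rng))
    print(cfg_combine([0.0, 1.0], [1.0, 1.0], 7.5))  # [7.5, 1.0]
```

Setting cfg_scale = 1 returns the conditional prediction unchanged and cfg_scale = 0 returns the unconditional one; larger values push the result further in the direction suggested by the prompt, which only works well if the model has learned a meaningful unconditional prediction.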

Model-merge Guidance

Understanding how model merging works helps us improve specific aspects of a model's performance.

SD contains a U-Net built from convolutional layers and Transformer blocks. The U-Net can be viewed as an autoencoder: it repeatedly downsamples its input to reduce the resolution (64x64 -> 32x32 -> 16x16 -> 8x8) and obtain abstract semantic information about the image, then upsamples (8x8 -> 16x16 -> 32x32 -> 64x64) to recover the image from this abstract semantic information. (The 64x64 input is the latent of a 512x512 image, since SD's VAE downsamples by a factor of 8.)

Overall, the blocks closer to the input and output of the U-Net (high resolution) are responsible for the high-frequency details of the image, while the blocks in the middle (low resolution) are responsible for its low-frequency semantic information. When we want to change the details of the image (e.g. texture or edges), we should prioritize changing the high-resolution blocks. When we want to change the overall image (e.g. characters or poses), we should prioritize changing the low-resolution blocks.
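To make "high-resolution" and "low-resolution" blocks concrete, the following Python sketch maps block names to the latent resolution they work at. It assumes an SD 1.x U-Net generating 512x512 images (64x64 latents) and the IN00-IN11 / M00 / OUT00-OUT11 naming used by block-weight merge tools such as SuperMerger; the 32x32 cutoff in the hypothetical is_high_res helper is an arbitrary choice.

```python
# Assumption: SD 1.x U-Net at 512x512 (64x64 latent), blocks named
# IN00-IN11 / M00 / OUT00-OUT11 as in block-weight merge tools.
LATENT_SIZE = 64


def block_resolution(name, latent_size=LATENT_SIZE):
    """Approximate side length of the feature map a U-Net block works on.

    Each group of three blocks shares one resolution level; the blocks that
    contain a down- or upsampling layer sit on the boundary between two levels.
    """
    if name == "M00":
        return latent_size // 8  # bottleneck
    kind, idx = name[:-2], int(name[-2:])
    if kind == "IN":
        level = idx // 3  # IN00-02 -> 0, IN03-05 -> 1, IN06-08 -> 2, IN09-11 -> 3
    elif kind == "OUT":
        level = 3 - idx // 3  # mirror image of the input side
    else:
        raise ValueError(f"unknown block name: {name}")
    return latent_size // (2 ** level)


def is_high_res(name, latent_size=LATENT_SIZE):
    """True for blocks at 32x32 or above (details), False for 16x16 and below (semantics)."""
    return block_resolution(name, latent_size) >= latent_size // 2


if __name__ == "__main__":
    for name in ["IN01", "IN05", "IN10", "M00", "OUT02", "OUT10"]:
        print(name, block_resolution(name), "high" if is_high_res(name) else "low")
```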

AnimeDiffusion[Merge] is a merged model in which the high-resolution blocks are replaced with those of Aurora to improve image quality.

high-res_replace: 0,0,1,1,0.67,0.67,0.33,0.33,0,0,0,0,0,0,0,0,0,0,0,0.33,0.33,0.67,0.67,1,1,0
low-res_replace : 0,0,0,0,0,0,0,0.33,0.33,0.67,0.67,1,1,1,1,1,0.67,0.67,0.33,0.33,0,0,0,0,0,0
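The two weight strings above can be applied with a block-weighted sum. Below is a minimal Python sketch, not the SuperMerger implementation, with hypothetical function names. It assumes that (1) the 26 values are ordered BASE, IN00-IN11, M00, OUT00-OUT11, as in SuperMerger's block-weight mode; (2) each value is the share taken from the second model (Aurora), i.e. merged = (1 - w) * A + w * B, so 1 means full replacement; and (3) checkpoint keys follow the SD 1.x layout under model.diffusion_model.input_blocks / middle_block / output_blocks, with every other key (text encoder, VAE, time embedding) falling back to BASE. Plain floats stand in for tensors; the arithmetic is the same element-wise.

```python
# Hypothetical helpers; block order and key layout are assumptions (see above).
BLOCK_NAMES = (["BASE"] + [f"IN{i:02d}" for i in range(12)] + ["M00"]
               + [f"OUT{i:02d}" for i in range(12)])

HIGH_RES_REPLACE = "0,0,1,1,0.67,0.67,0.33,0.33,0,0,0,0,0,0,0,0,0,0,0,0.33,0.33,0.67,0.67,1,1,0"
LOW_RES_REPLACE = "0,0,0,0,0,0,0,0.33,0.33,0.67,0.67,1,1,1,1,1,0.67,0.67,0.33,0.33,0,0,0,0,0,0"


def parse_weights(text):
    """Turn a comma-separated weight string into {block name: weight}."""
    values = [float(v) for v in text.split(",")]
    if len(values) != len(BLOCK_NAMES):
        raise ValueError(f"expected {len(BLOCK_NAMES)} weights, got {len(values)}")
    return dict(zip(BLOCK_NAMES, values))


def block_of(key):
    """Map an SD 1.x checkpoint key to its block name (assumed key layout)."""
    prefix = "model.diffusion_model."
    if key.startswith(prefix + "input_blocks."):
        return f"IN{int(key.split('.')[3]):02d}"
    if key.startswith(prefix + "middle_block."):
        return "M00"
    if key.startswith(prefix + "output_blocks."):
        return f"OUT{int(key.split('.')[3]):02d}"
    return "BASE"


def merge(state_a, state_b, weights):
    """Per-block weighted sum: (1 - w) * A + w * B, with w looked up per key."""
    merged = {}
    for key, a in state_a.items():
        w = weights[block_of(key)]
        merged[key] = (1 - w) * a + w * state_b[key]
    return merged
```

Under these assumptions, high-res_replace takes IN01/IN02 and OUT09/OUT10 entirely from Aurora, blends the neighbouring blocks at 0.67 and 0.33, and leaves the middle of the U-Net as AnimeDiffusion[Base].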

Acknowledgements

[Danbooru2022 Dataset]: A dataset containing 4M+ images.
[Baka-Diffusion]: The base model used for fine-tuning.
[Aurora]: The model used for model-merge.
[Cafe-aesthetic]: Used to further filter out low-quality and non-waifu images.
[SuperMerger]: Very useful for model-merge.


Files

Name: animediffusion_v10.safetensors
Size (KB): 2082642
Type: Model
Pickle scan result: Success
Pickle scan info: No Pickle imports
Virus scan result: Success

Name: WD-Vae.safetensors
Size (KB): 163413
Type: VAE
Pickle scan result: Success
Pickle scan info: No Pickle imports
Virus scan result: Success


AnimeDiffusion version v1.0 (ID: 276248)
