CherryCat, version CherryCat_v0.11 (ID: 33095)


For learning purposes only. This is a detailed record of the training process that has produced fairly good results so far, in the hope of sharing some training experience.

DO NOT POST YOUR NSFW WORKS HERE.

This model is clearly over-baked and does not respond well to prompts such as hair color (probably caused by the mis-tagged training data I fed it); I'm still working on it.

Notes on usage:

  1. Add "hair over one eye" to your negative prompt. I fed in some Rem cosplay photos that were not tagged well, and they seem to be toxic to the results.

  2. A LoRA weight around 1.0 should work fine when this is the only LoRA applied.

  3. The new UniPC sampler works well for me at around 30 sampling steps.

  4. Try different shot types such as face close-up, portrait, and full-body, and different angles such as from the side, from behind, and from above; the model gives decent results for all of them.
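Put together, the notes above translate into a prompt pair like the following sketch. The `<lora:name:weight>` syntax is AUTOMATIC1111-webui style; the LoRA file name and the extra quality tags are illustrative assumptions, not part of this model card:

```python
# Sketch: assemble a prompt pair following the usage notes above.
# The <lora:name:weight> syntax is AUTOMATIC1111-webui style; the
# LoRA file name and extra tags here are illustrative.

def build_prompts(shot_type: str, angle: str, lora_weight: float = 1.0):
    """Return (positive, negative) prompt strings for one generation."""
    positive = ", ".join([
        f"<lora:CherryCat_v0.11:{lora_weight}>",  # note 2: weight around 1.0
        shot_type,                                 # note 4: vary shot types
        angle,                                     # note 4: vary angles
        "best quality",
    ])
    # Note 1: keep "hair over one eye" in the negative prompt.
    negative = ", ".join(["hair over one eye", "lowres", "bad anatomy"])
    return positive, negative

pos, neg = build_prompts("full body", "from side")
```

The sampler choice (UniPC, ~30 steps, per note 3) is set in the UI rather than in the prompt itself.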

What is the purpose of this model?

I've spent the last few days training LoRA models and found that most of my models could give a decent headshot or portrait output, but failed to keep the face structure when it came to a full-body photo.

Therefore, I set out to train a model capable of presenting both close-up and full-body photos.

The results are quite good: try photos with various face-to-frame proportions (face close-up | portrait | full-body), and the model gives good results for all of them.

Training Setup:

Using Akegarasu (秋葉)'s lora-scripts (based on Kohya's scripts).

The whole training process for this model was completed on AutoDL.com, using a system image also provided by Akegarasu.

The base SD model I used to train this model is simply ChilloutMax; it may also work with other SD base models.

Detailed scripts settings:

  1. network_dim=network_alpha=32:

    Higher network-dimension settings did not give better results for my dataset, so I picked the one with the smallest file size.

  2. resolution="768, 1024":

    An aspect ratio of 3:4 matches the ratio of headshot photos perfectly.

  3. batch_size=4:

    A larger batch size makes training faster but requires more GPU memory. I'm using an A40 GPU; its 24 GB of memory supports this resolution and batch_size setting.

  4. max_train_epoches=8:

    Since this model is obviously over-baked (even the result from epoch 2), I won't discuss the max_train_epoches setting here.

  5. noise_offset=0.05:

    According to the comments in the scripts, this may have some effect on the dynamic range of the output.

  6. clip_skip=2:

    Some have advised that it might be better to set clip_skip to 1 when working with realistic photos, but I found no noticeable difference here, so I kept the value at 2.

  7. The other parameters are the same as the defaults in Akegarasu's scripts.
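For reference, the settings above correspond to roughly the following variable block in the style of Akegarasu's train.sh. This is a sketch: the paths are placeholders, and any variable not listed keeps the script's default:

```shell
# Sketch of the relevant variables, lora-scripts train.sh style.
# Paths are placeholders; unlisted options keep the script's defaults.
pretrained_model="./sd-models/chilloutmax.safetensors"  # base SD model
train_data_dir="./train/cherrycat"         # training set
reg_data_dir="./train/cherrycat_reg"       # regularization set

resolution="768,1024"        # 3:4, matches the headshot aspect ratio
batch_size=4                 # fits in the GPU memory available here
max_train_epoches=8
network_dim=32
network_alpha=32
noise_offset="0.05"
clip_skip=2
```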

Training Data:

From what I've observed, the key to generating decent output images without corrupted faces is to feed the model high-quality photos and to use regularization.

The datasets can be divided into 3 tiers:

  1. Lowest-quality photos obtained from various platforms.

    These are mostly headshot photos of really low resolution, especially ones even smaller than the training resolution. If you're picky about the generated images, these photos should be considered toxic to the model.

  2. Some headshot photos and a few full-body photos that are not low-resolution.

    You can generate decent upper-body or close-up images from these data, but when it comes to a full-body photo, there is a high chance of a corrupted, twisted face. The lack of full-body photos, and of faces from full-body photos, is why the model behaves poorly.

  3. The ideal condition is to have dozens of high-resolution photos (something like 5k*7k or higher).

Regularization:

Regularization is quite helpful when training a model of a person. A simple explanation: it tells the model where to put the person's face.

There are plenty of ways to build the training and regularization sets; I'll just discuss the approach I took here.

For each photo I have in hand, I first crop out the whole area where the person is located and use this cropped photo as the regularization part. Then I zoom in so the head/face fills the training resolution (768*1024 in my case) and crop the headshot as the training part.
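The two-crop workflow described above can be sketched with Pillow. The box coordinates in the usage example are made up; in practice they come from manual cropping or a face/person detector:

```python
# Sketch of the two-crop workflow: one crop for regularization,
# one zoomed head crop resized to the training resolution.
from PIL import Image

TRAIN_W, TRAIN_H = 768, 1024  # training resolution

def make_pair(photo: Image.Image, person_box, head_box):
    """Return (regularization_img, training_img) from one source photo.

    Boxes are (left, upper, right, lower) tuples in pixel coordinates.
    """
    # Regularization part: the whole area where the person is located.
    reg_img = photo.crop(person_box)
    # Training part: zoom the head region up to the training resolution.
    train_img = photo.crop(head_box).resize((TRAIN_W, TRAIN_H))
    return reg_img, train_img
```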

It's not necessary for the training photos to match the regularization photos one-to-one: the scripts even allow the (num_repeats * num_photos) of the regularization and training parts to differ. The way I build datasets just makes them match naturally.
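The balance mentioned above is easy to check numerically: in Kohya-style scripts each dataset folder encodes its repeat count in the name (`<num_repeats>_<name>`), and the effective size of a set is num_repeats * num_photos. A small sketch; the folder names and photo counts below are hypothetical:

```python
# Sketch: compare effective sizes of training vs regularization sets.
# Kohya-style folders encode repeats as "<num_repeats>_<name>";
# the counts below are hypothetical.

def effective_size(num_repeats: int, num_photos: int) -> int:
    """Images seen per epoch from one dataset folder."""
    return num_repeats * num_photos

train_size = effective_size(num_repeats=8, num_photos=40)  # e.g. "8_cherrycat"
reg_size = effective_size(num_repeats=1, num_photos=320)   # e.g. "1_reg"

# The scripts do not require these to match, but here they happen to.
balanced = (train_size == reg_size)
```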

Tagging:

As for the tagger, I used wd14-vit-v2 to tag both the regularization and training sets, with a threshold of 0.35.
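The threshold works as a simple confidence cut-off on the tagger's per-tag scores. A minimal sketch; the score dictionary is made up for illustration, since a real wd14-style tagger emits hundreds of (tag, confidence) pairs per image:

```python
# Sketch: keep wd14-style tags whose confidence meets the threshold.
# The score dict in the usage example is made up for illustration.

THRESHOLD = 0.35  # the threshold used in this write-up

def filter_tags(scores: dict[str, float], threshold: float = THRESHOLD) -> list[str]:
    """Return tags at or above the confidence threshold, best first."""
    kept = [(tag, s) for tag, s in scores.items() if s >= threshold]
    return [tag for tag, _ in sorted(kept, key=lambda kv: -kv[1])]

tags = filter_tags({"1girl": 0.99, "portrait": 0.62, "hat": 0.12})
```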

TODO:

  1. I'm currently training a LoRA model to learn posture and clothing-set concepts, but the results are poor for even slightly complicated postures (such as squatting down and wrapping the hands around the legs).

  2. Try a smaller training set.

Description:

I just changed the photos on the main page, since the old ones were not displayed properly.

The photos were selected to cover different head-to-frame proportions (full-body, portrait, face close-up) and camera angles (front, side, behind, above).

Trigger words:

Name: CherryCat_cloud_v1-000002.safetensors

Size (KB): 36986

Type: Model

Pickle scan result: Success

Pickle scan message: No Pickle imports

Virus scan result: Success

CherryCat

