EasyAnimateSDXL v1.0 (ID: 564773)

Included are two Vid2Vid workflows for easily creating SDXL animations. One is an implementation of FLATTEN (https://github.com/yrcong/flatten https://github.com/logtd/ComfyUI-FLATTEN), which is a training-free approach to consistent video animation; the other is an AnimateDiff workflow that uses HotshotXL for consistent motion. Also included is the ComfyUI custom node for FLATTEN: logtd's SDXL branch is deprecated and no longer maintained, so I have ported over a few commits. You'll need to move this custom node folder into your custom nodes folder for this to work.

The workflows have a note located above the source group that describes how they work, but I will also include that information here. In this explanation I assume a base level of understanding of ComfyUI for txt2img/img2img purposes; if you don't already have that, this is a bit of a jump into the deep end. All of my animations have been created using these workflows. These workflows should work with any SDXL model and any SDXL LoRA (and probably with any SD1.5 model or LoRA, but I don't use SD1.5, so YMMV).

You will need the following ControlNets (these go in your controlnets folder under models):

Diffusers Canny Controlnet for SDXL: https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0

Diffusers ZoeDepth Controlnet for SDXL: https://huggingface.co/diffusers/controlnet-zoe-depth-sdxl-1.0

For AnimateDiff:

(Optional) HotshotXL for AnimateDiff: https://huggingface.co/hotshotco/Hotshot-XL/blob/main/hsxl_temporal_layers.safetensors

For detectors used in masks:

https://huggingface.co/Bingsu/adetailer/tree/main

If anyone has suggestions on how to improve these workflows, please let me know; I would love to hear your input!

This is a Vid2Vid workflow that utilizes ControlNet and FLATTEN with AnimateDiff's Evolved Sampling to create coherent animations. Unlike many of the other Vid2Vid workflows on Civitai, this one has more of a "from scratch" feel rather than a "filter" feel to it.

SYS REQS

I run this workflow on a PC with the below specs:

AMD Ryzen 9 5950X

64 GB DDR4 3600

3090Ti (24GB VRAM)

Often this workflow consumes 23+ GB of VRAM and 30+ GB of RAM, so it will probably run into many issues on lower-end machines with less memory. This can be remedied by reducing the Meta Batch Manager batch size, but doing so results in videos with more "cuts" where the context switches and a new batch begins processing. These cuts can cause significant drift in features that are not present in the source video, e.g. clothing/hair/etc. Bigger batch size -> fewer cuts. The "holy grail" would be to not use the Meta Batch Manager at all, resulting in a video with no seams because it maintains one sliding context window the entire time. I think next-gen graphics cards with 36 GB of VRAM will not need the batch manager, as VRAM usage in that scenario seems to be ~26 GB. Someone smarter than me could most likely implement VRAM optimizations that would eliminate this issue entirely, but alas, I am not that smart haha, so we are stuck with videos that have context switches. On that note, if you are feeding in <20 frames, you can bypass the batch manager and get a video with no cuts, but following my rule of thumb of using 10fps input, that constrains you to 2-second videos.
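
As a rough illustration of the trade-off, here is a small Python sketch that estimates how many cuts a given batch size produces. It is not part of the workflow; the function name and the example numbers are made up for illustration:

```python
# Rough illustration only: estimate_cuts and these numbers are not workflow
# parameters, just a way to see how batch size relates to the number of cuts.
import math

def estimate_cuts(total_frames: int, meta_batch_size: int) -> int:
    """Each batch after the first starts a new context, i.e. one visible cut."""
    if meta_batch_size <= 0 or total_frames <= meta_batch_size:
        return 0  # everything fits in one batch -> seamless video
    return math.ceil(total_frames / meta_batch_size) - 1

# e.g. a 12-second clip sampled at 10fps = 120 frames
print(estimate_cuts(120, 16))  # 7 cuts
print(estimate_cuts(120, 40))  # 2 cuts
print(estimate_cuts(18, 40))   # 0 cuts (short clips can bypass the batch manager)
```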

This workflow takes a long time to run, ~45 seconds per frame. You can get better performance out of an AnimateDiff workflow using HotshotXL (roughly a 2x speed increase), but I find that it leaves some artifacts in the output (blur/noise/lower quality). Using FLATTEN, I've found that the output is almost identical to what I get from a simple txt2img/img2img workflow, which is why I take this approach.

***USING THIS WORKFLOW***

This workflow has several sections and is not intended to all be run at the same time. These can be viewed as "stages"; I enable/disable stages by right-clicking groups and selecting "Set Group Nodes to Always" or "Set Group Nodes to Never":

Testing

This is the first stage, which I use to find the right slice of the video that I want to feed into the Vid2Vid stage and then tune the prompt/masking/canny thresholds for the input. The first part is very simple: it just loads the video frame by frame and then recombines it so you can see the result. You can dial in the slice of video you'd like to animate using the skip frames and frame cap settings. It then selects one frame and feeds that into a flow identical to the Vid2Vid process, but without FLATTEN. I recommend selecting every nth frame to get your video input to 10fps, which can then be interpolated back to the original video frame rate; I've had good results following that process. I HIGHLY recommend using this stage and tuning your params on a single image (or a select few from the batch if the pose/angle changes), since running the full Vid2Vid flow takes so long. This is a key part of my process so that I don't waste time running the Vid2Vid. I don't often tune the canny/depth ControlNet weights; almost all of my videos are done with depth at 0.4 and canny at 0.3. You WILL want to tune the canny thresholds for every video that you do - these can vary quite a bit depending on the lighting and colors present in the video. There's not really a great rule of thumb for what values are best (and the algorithmic approaches I've tried for finding these values yielded poor results); the main one is that your high threshold should usually be 2x your low value. I set these mainly on intuition and not any actual metrics :)
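
If you want to eyeball canny thresholds on a single extracted frame outside of ComfyUI, a quick OpenCV sketch like the one below can help; the file paths and starting values are just examples, and the only rule taken from above is that the high threshold is roughly 2x the low one:

```python
# Quick Canny preview on one extracted frame, outside ComfyUI.
# "test_frame.png" and the threshold values are placeholders to tune per video.
import cv2

frame = cv2.imread("test_frame.png", cv2.IMREAD_GRAYSCALE)

low = 80          # varies a lot with the lighting/colors in the video
high = low * 2    # rule of thumb: high is roughly 2x the low value

edges = cv2.Canny(frame, low, high)
cv2.imwrite("canny_preview.png", edges)
```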

Vid2Vid

This stage uses the SOURCE, FLATTEN Vid2Vid, and Video Previews groups. All three of these groups must be enabled for this to work properly. There is an interaction between the Meta Batch Manager in Source and the Video Combine nodes in Video Previews that batches the images coming out of the Load Video node and runs the FLATTEN Vid2Vid in a loop. The core Vid2Vid process itself is done by combining the FLATTEN model with Canny and ZoeDepth ControlNets, plus optional seg masking. A context length of 8 with an overlap of 4 is used for the sliding context window, but this could be increased on higher-end systems. I haven't really messed around with changing the injection steps at all; I just keep them at 8. Depending on the input video, sometimes masking is needed to clean up the background. I often apply the mask to the canny ControlNet and, in rare cases, the depth ControlNet as well. There are convenient routing nodes below the ControlNets that allow the mask to be connected. By default, there is no masking.
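
For intuition, here is a small sketch of how a context length of 8 with an overlap of 4 tiles the frames. This only mimics the idea of the sliding window; it is not AnimateDiff Evolved's actual scheduling code:

```python
# Illustration of how a context length of 8 with overlap 4 tiles the frames.
def context_windows(num_frames: int, length: int = 8, overlap: int = 4):
    stride = length - overlap
    windows = []
    start = 0
    while start < num_frames:
        end = min(start + length, num_frames)
        windows.append(list(range(start, end)))
        if end == num_frames:
            break
        start += stride
    return windows

for w in context_windows(20):
    print(w)
# prints [0..7], [4..11], [8..15], [12..19]; each window shares 4 frames with the previous one
```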

Detailing

Okay, your video is generated and the frames/video have been output to your destination folder. With this FLATTEN workflow, the video usually looks pretty good at this point, and I find that I have not been running the detailing stage as often. That said, you can run the detailing to perform a second pass on the frames. The detailing has 3 steps: detail w/ person mask -> latent upscale -> detail w/ face mask. You can tune the denoise weight to change how strong an effect the detailing has; usually 0.35 - 0.5 works well. By default, the detailer uses ControlNets based on the output from the Vid2Vid stage. However, since we have the source video output as well and the frames are aligned, we can easily base the ControlNets on the source video instead (just wire the 2nd Load Video node to the source video, i.e. the copy that is output alongside your animation, and plug those frames into the ControlNets). MOST IMPORTANTLY, this stage is run with a different method than normal - instead of queuing the prompt once, you need to queue a batch in ComfyUI that aligns with the number of frames. There is probably a smarter way to do this, but I find ComfyUI's apprehension towards loops quite infuriating, so I did not dig any further once I found a way that works.
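
If you prefer scripting the repeated queuing over clicking in the UI, one possible approach (my assumption, not something this workflow ships with) is to save the detailing workflow in API format (dev mode's "Save (API Format)") and POST it to ComfyUI's /prompt endpoint once per frame. The folder, port, and filename below are placeholders:

```python
# Placeholder sketch: "detail_workflow_api.json" is the detailing workflow
# exported in API format, and the output folder/port are examples only.
import glob
import json
import urllib.request

frames = glob.glob("output/my_animation/*.png")   # frames from the Vid2Vid stage

with open("detail_workflow_api.json") as f:
    workflow = json.load(f)

# Queue the workflow once per frame, same effect as setting a batch count.
for _ in range(len(frames)):
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```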

Interpolation

This stage is very simple: just feed your full output path into the Load Frames node and set the frame rate/multiplier to your desired level of interpolation. I usually interpolate back to the source fps (so 30fps input -> select every 3rd frame for 10fps -> multiply by 3 -> 30fps interpolated output), but there's definitely some room to tune this based on your video.
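
The fps round trip is just this arithmetic (illustrative numbers only):

```python
# Sanity-check of the fps round trip described above.
source_fps = 30
select_every_nth = 3                        # Load Video: select every 3rd frame
work_fps = source_fps / select_every_nth    # 10 fps fed into Vid2Vid
multiplier = select_every_nth               # interpolation multiplier
final_fps = work_fps * multiplier           # back to 30 fps
print(work_fps, final_fps)                  # 10.0 30.0
```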

Name: easyanimatesdxl_v10.zip

Size (KB): 140

Type: Archive

Pickle scan result: Success

Pickle scan message: No Pickle imports

Virus scan result: Success
