Flux Dev Q5_K_M GGUF quantization (a nice balance of speed and quality in under 9 gigabytes), Version v1.0 (ID: 795785)

NOTE: Ignore the model format listed! This is not an NF4 ONNX model; it is a Q5_K_M GGUF model.

This is flux_dev quantized to Q5_K_M GGUF format. It should provide a significant quality boost over 4-bit quantizations while being a lot smaller than the 8-bit version, and because it is a relatively small GGUF, load times should be significantly better than with FP8 as well. This model is ideal for mid-sized graphics cards. In my tests (without any memory optimizations such as offloading T5 onto the CPU) it fits comfortably in 16GB of VRAM, and it may work with as little as 8GB (if you have under 16GB of VRAM, please test it and leave a comment about whether it works for you).
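For a rough sense of where the "under 9 gigabytes" figure sits relative to the 4-bit and 8-bit options, here is a minimal back-of-the-envelope sketch. The parameter count (about 12 billion) and the average bits-per-weight values are assumptions for illustration, not numbers taken from this page; real files also carry metadata and mix tensor types, so actual sizes will differ somewhat.

```python
# Rough weight-size estimate: parameters * bits-per-weight / 8 bytes.
# ASSUMPTIONS (not from this page): ~12 billion parameters for Flux Dev,
# and nominal average bits-per-weight figures for each format.
PARAMS = 12e9

BITS_PER_WEIGHT = {
    "Q4_K_S": 4.5,  # assumed nominal value
    "Q5_K_M": 5.5,  # assumed nominal value
    "FP8": 8.0,
    "Q8_0": 8.5,    # assumed nominal value
}


def estimate_size_gb(params: float, bits_per_weight: float) -> float:
    """Return the approximate weight size in gigabytes (10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{estimate_size_gb(PARAMS, bpw):.2f} GB")
```

Under these assumptions the Q5_K_M estimate lands a little above 8 GB, which is consistent with the size quoted in the title. VRAM use during generation will be higher than the file size, since text encoders, the VAE, and activations also need memory.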

UPDATE: Per this comment, this quant will work on systems with 8GB of VRAM (thanks to @VolatileSupernova for testing and responding!)

Tested and working in ComfyUI on my RTX 3050 with 8GB VRAM, using ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF for CLIP-L and t5-v1_1-xxl-encoder-Q4_K_M for T5. I usually use the Q4_K_S model, which gives me images at 6.4 seconds per iteration at 896x1152 resolution. This model, with the same settings and only the model changed, gives me them in 7.5 seconds, not a big change at all! Unfortunately, it does mean I can't use any LoRAs with your K_M model since it just barely fits in my VRAM, but I'd rather have the higher quality than use LoRAs!

EDIT: I can actually use LoRAs smaller than 20MB without issue!
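To put the reported timings in perspective, the sketch below works out the relative slowdown, assuming the 7.5-second figure is also measured per iteration like the 6.4-second one. The step count used for the total-time comparison is a hypothetical value chosen for illustration; it is not stated in the comment.

```python
def relative_slowdown(baseline_s: float, new_s: float) -> float:
    """Fractional increase in time per iteration relative to the baseline."""
    return (new_s - baseline_s) / baseline_s


def total_seconds(sec_per_iteration: float, steps: int) -> float:
    """Sampling time for one image, ignoring model load and VAE decode."""
    return sec_per_iteration * steps


Q4_K_S_SEC_PER_IT = 6.4  # reported in the comment above
Q5_K_M_SEC_PER_IT = 7.5  # reported in the comment above (assumed per iteration)
STEPS = 20               # hypothetical step count, not from the comment

slowdown = relative_slowdown(Q4_K_S_SEC_PER_IT, Q5_K_M_SEC_PER_IT)
print(f"Slowdown: {slowdown:.1%}")
print(f"Q4_K_S: {total_seconds(Q4_K_S_SEC_PER_IT, STEPS):.0f} s per image")
print(f"Q5_K_M: {total_seconds(Q5_K_M_SEC_PER_IT, STEPS):.0f} s per image")
```

On those numbers the Q5_K_M quant is roughly 17% slower per iteration on this 8GB card, which matches the commenter's impression that the difference is small.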

Apart from being quantized, this is an unmodified version of Flux Dev that has not been finetuned in any way. It should get along just fine with any LoRAs that work with the full-size or FP8 versions of the model.