
Original project: https://huggingface.co/Djrango/Qwen2vl-Flux
Qwen2vl-Flux is a state-of-the-art multimodal image generation model that enhances FLUX with Qwen2VL's vision-language understanding capabilities. This model excels at generating high-quality images based on both text prompts and visual references, offering superior multimodal understanding and control.
-
ComfyUI does not currently support the CLIP+LLM portion of this model, and no nodes are available to load it.
-
This release is only for reviewing and testing the fine-tuned part of the Flux model.
-
Set CFG to 1 on the KSampler.
-
Rendered a 512px image in 150 s on an 8 GB GPU, using 10 steps and the bf16 model.
-
This model will be available in 3 formats, each named after the folder it should go in:
-
diffusion_models - Diffusers format. This is just the merged safetensors file from the HuggingFace page.
-
checkpoints - Converted to Flux Transformers format, with a prefix added for stable_diffusion compatibility. Does not include CLIP or VAE.
-
unet - I will provide the q4_0 and q8 variants. Leave a comment if you'd like to see any other quants.
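
As a rough illustration of the "named after the folder" convention above, here is a minimal Python sketch that guesses the destination folder from a file name. It assumes the standard `models/` subfolder layout of a ComfyUI install and that the folder name appears somewhere in the file name (as it does in `qwen2vlFlux_unetQ40.gguf`). The helper names `target_folder` and `destination` and the `ComfyUI` root path are hypothetical, not part of ComfyUI itself.

```python
from pathlib import Path

# Folder names taken from the list above.
FOLDERS = ("diffusion_models", "checkpoints", "unet")


def _normalize(text: str) -> str:
    # Ignore case, underscores, and hyphens so "unetQ40" still matches "unet".
    return text.replace("_", "").replace("-", "").lower()


def target_folder(filename: str) -> str:
    """Guess which models/ subfolder a downloaded file belongs in (assumption:
    the folder name is embedded in the file name)."""
    name = _normalize(Path(filename).stem)
    for folder in FOLDERS:
        if _normalize(folder) in name:
            return folder
    raise ValueError(f"Cannot infer a target folder from {filename!r}")


def destination(comfy_root: str, filename: str) -> Path:
    """Build the full destination path under a (hypothetical) ComfyUI root."""
    return Path(comfy_root) / "models" / target_folder(filename) / Path(filename).name


if __name__ == "__main__":
    print(destination("ComfyUI", "qwen2vlFlux_unetQ40.gguf"))
```

If a file name does not contain any of the three folder names, the sketch raises an error instead of guessing.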
-
Description:
-
This version goes in the unet folder
-
Requires the UNET Loader GGUF node in ComfyUI
-
Has been quantized from bf16 to q4_0
Trained words:
Name: qwen2vlFlux_unetQ40.gguf
Size (KB): 6763226
Type: Model
Pickle scan result: Success
Pickle scan message: No Pickle imports
Virus scan result: Success
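
For a quick sense of scale, the listed size can be converted to GiB. This is a small sketch that assumes the page's "KB" means 1024 bytes; if it means 1000 bytes, the result is slightly smaller. The function name `kb_to_gib` is made up for this example.

```python
SIZE_KB = 6763226  # size of qwen2vlFlux_unetQ40.gguf as listed above


def kb_to_gib(size_kb: int) -> float:
    # Assumption: 1 KB = 1024 bytes, so 1 GiB = 1024 * 1024 KB.
    return size_kb / (1024 * 1024)


if __name__ == "__main__":
    print(f"{kb_to_gib(SIZE_KB):.2f} GiB")
```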