
Original Project found here: https://huggingface.co/Djrango/Qwen2vl-Flux
Qwen2vl-Flux is a state-of-the-art multimodal image generation model that enhances FLUX with Qwen2VL's vision-language understanding capabilities. This model excels at generating high-quality images based on both text prompts and visual references, offering superior multimodal understanding and control.
-
ComfyUI does not currently support the CLIP+LLM portion of this model, and there are no nodes available to load it
-
This release is just for reviewing/testing the fine-tuned part of the Flux model
-
CFG set to 1 on KSampler
-
Rendered a 512px image in 150s on an 8GB GPU with 10 steps, using the bf16 model
-
This model will be available in 3 formats, each named after the folder it should be placed in
-
diffusion_models - This one is in diffusers format; it is just the merged safetensors file from the HuggingFace page
-
checkpoints - This one has been converted to Flux Transformers format, with the prefix adjusted for stable_diffusion compatibility. It does not include CLIP or VAE
-
unet - I will provide the q4_0 and q8 variants; leave a comment if you'd like to see any other quants
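-
Since each file is named after its target folder, the placement can be sketched in a few lines of Python. This is only an illustration: the `ComfyUI/models/` root and the helper name `target_folder` are assumptions, not part of this release, and the only file name confirmed here is `qwen2vlFlux_checkpoints.safetensors`.

```python
from pathlib import Path

# Folder names come from the format list above.
# The "ComfyUI/models" root is an assumption about a default install.
KNOWN_FOLDERS = ("diffusion_models", "checkpoints", "unet")


def target_folder(filename: str, comfy_root: str = "ComfyUI") -> Path:
    """Guess the models subfolder from the folder name embedded in the file name."""
    stem = Path(filename).stem
    for folder in KNOWN_FOLDERS:
        # Match the folder name as a "_"-separated part of the file name.
        if f"_{folder}" in stem:
            return Path(comfy_root) / "models" / folder
    raise ValueError(f"no known folder name found in {filename!r}")


if __name__ == "__main__":
    print(target_folder("qwen2vlFlux_checkpoints.safetensors"))
```

For the checkpoints file this resolves to `ComfyUI/models/checkpoints`. File names for the other two formats may differ from this pattern, so check the actual download name first.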
-
Description:
-
This version goes in the checkpoints folder
-
This version is used with the Load Checkpoint node
-
VAE and CLIP are not included; use the standard Flux setup
Trained words:
Name: qwen2vlFlux_checkpoints.safetensors
Size (KB): 23245040
Type: Model
Pickle scan result: Success
Pickle scan message: No Pickle imports
Virus scan result: Success