Workflow to generate image descriptions on Apple Silicon Mac, version v1.0 (ID: 1202130)

About

This workflow uses multiple image-to-text tools and an LLM to produce the final image descriptions for a batch of images in a folder, and writes out the corresponding .txt files.

This is especially helpful when captioning or describing NSFW images for LoRA training or fine-tuning, which is why the following VLMs and taggers were chosen:

  • Florence2, WD1.4 tagger

  • JoyCaption alpha 2

  • huihui-ai/Qwen2-VL-7B-Instruct-abliterated

The LLM step, which composes the final image description, runs through the Ollama node. I would say it is one of the easiest ways to use a local LLM.

You will get amazing results by using an uncensored large model such as huihui-ai/Llama-3.3-70B-Instruct-abliterated.

(Abliterated or uncensored models should be used for both Qwen2-VL and the LLM to achieve the best results, NSFW or not.)
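
To make the final composition step more concrete, here is a minimal sketch of how the outputs of the captioning tools could be merged into one request for a local Ollama server. This is not the code of the ComfyUI-Ollama node: the helper names and the prompt wording are hypothetical, and the server address is assumed to be Ollama's default. The request body uses the `model`, `prompt` and `stream` fields of Ollama's `/api/generate` endpoint.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default Ollama address


def build_prompt(captions: dict[str, str]) -> str:
    """Merge the per-tool outputs into a single instruction for the LLM."""
    # The instruction wording is a placeholder, not the workflow's actual prompt.
    lines = ["Combine the following notes into one accurate image description.", ""]
    for tool, text in captions.items():
        lines.append(f"[{tool}] {text.strip()}")
    return "\n".join(lines)


def build_request(model: str, captions: dict[str, str]) -> dict:
    """Build the JSON body for a non-streaming /api/generate call."""
    return {"model": model, "prompt": build_prompt(captions), "stream": False}


def describe(model: str, captions: dict[str, str]) -> str:
    """Send the request to a running Ollama server and return its reply."""
    body = json.dumps(build_request(model, captions)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `describe` requires Ollama to be running with the chosen model already pulled; `build_request` can be inspected on its own to see exactly what would be sent.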

Installation

Except for the ComfyUI_Qwen2-VL-Instruct and Comfyui_JC2 nodes, install the missing nodes using the ComfyUI manager.

ComfyUI_Qwen2-VL-Instruct

You will need to use the Qwen2-VL-Instruct node from this fork for this workflow to work:

https://github.com/edwios/ComfyUI_Qwen2-VL-Instruct

This fork makes two major changes: it accepts image input in the same way as the other VLM tools, and it uses the Mac GPU (mps) with Python 3.12 and PyTorch versions up to and including 2.6.
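
For readers unfamiliar with what "uses the Mac GPU (mps)" means in PyTorch terms, the following is a minimal sketch of the usual device-selection pattern. It is an illustration only and not the fork's actual code; the function name `pick_device` is hypothetical, and it assumes a PyTorch version that provides `torch.backends.mps` (1.12 or later).

```python
def pick_device() -> str:
    """Return "mps" on Apple Silicon when PyTorch supports it, else "cuda" or "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is no accelerator to pick
    if torch.backends.mps.is_available():
        return "mps"  # Metal Performance Shaders backend (Apple GPU)
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"
```

A model and its inputs would then be moved with `.to(pick_device())` before inference.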

Comfyui_JC2

You may also want to use this ComfyUI_JC2 fork to utilise the Mac GPU for JoyCaption Alpha 2.

How to use

Everything you need to interact with this workflow is on the far left.

The simplest way to start is to enter the path to the directory containing the images. The results are written to the same directory, using the same file names as the images but with the .txt extension.
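
The naming rule above can be sketched in a few lines of Python. This only illustrates the image-to-caption file mapping and is not taken from the workflow; the function names and the set of image extensions are assumptions.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}  # assumed set of image types


def caption_path(image_path: Path) -> Path:
    """Return the .txt path that sits next to the image and shares its name."""
    return image_path.with_suffix(".txt")


def plan_outputs(folder: Path) -> dict[Path, Path]:
    """Map every image in the folder to the caption file it would produce."""
    return {
        p: caption_path(p)
        for p in sorted(folder.iterdir())
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    }
```

For example, `photos/cat01.png` maps to `photos/cat01.txt`, which is the image and caption pairing that LoRA training tools commonly expect.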

Optionally, you can do the following:

  • Change the VLM prompt to have Qwen2-VL focus on a specific aspect of the image or images

  • Change the LLM prompt, for example to get better reasoning or to have it write the descriptions in a SFW way (use at least a 70B instruct model for this). [No, this is NOT the same as using a 'safe' model.]

Credits

Credits go to everyone who contributed to making ComfyUI and all these nodes available to all of us.

Especially ComfyUI_Qwen2-VL-Instruct, ComfyUI_JC2, ComfyUI-WD14-Tagger, ComfyUI-Ollama, ComfyUI-Florence2 and Ollama, for making these amazing machine learning models available on mps, or at least for not forcing them into an NVIDIA-only solution.

Description:

V1.0

Trained words:

Name: workflowToGenerateImage_v10.zip

Size (KB): 1644

Type: Archive

Pickle scan result: Success

Pickle scan message: No Pickle imports

Virus scan result: Success

Original link: https://1111down.com/1174654.html (please credit the source when reposting)