
This is a vector database containing roughly 900k prompts from successful txt2img generations, taken from the Civitai database. The data has been processed to remove most prompts involving children or potentially toxic content.
The database is a Chroma database containing vectors for the cleaned prompt, the positive prompt, and the negative prompt. The cleaned prompt is a processed version of the positive prompt with extraneous punctuation and prompting artifacts removed.
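The exact cleaning steps are not documented here. As a rough illustration of what removing prompting artifacts can mean, the hypothetical clean_prompt function below strips LoRA tags such as <lora:name:0.8>, unwraps attention-weight syntax such as (word:1.2), drops leftover emphasis brackets, and tidies the comma-separated tags. It is a sketch under those assumptions, not the pipeline that was used to build the database.

```python
import re


def clean_prompt(prompt: str) -> str:
    """Hypothetical prompt cleaner, for illustration only (not the dataset's own pipeline)."""
    # Remove LoRA / embedding tags such as <lora:name:0.8>
    text = re.sub(r"<[^>]*>", " ", prompt)
    # Unwrap attention-weight syntax such as (best quality:1.2) -> best quality
    text = re.sub(r"\(([^():]+):\s*[\d.]+\)", r"\1", text)
    # Replace any remaining emphasis brackets ( ) [ ] { } with spaces
    text = re.sub(r"[()\[\]{}]", " ", text)
    # Split on commas, collapse internal whitespace, and drop empty tags
    tags = [" ".join(tag.split()) for tag in text.split(",")]
    return ", ".join(tag for tag in tags if tag)


print(clean_prompt("masterpiece, (best quality:1.2), ((cute kitten)), <lora:catStyle:0.7>,, soft light"))
```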
The metadata for each vector includes the base model, the NSFW level of the prompt (nsfw vs. None), and the imageId associated with the prompt.
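If the cleaned, positive, and negative prompt vectors for one image live in the same collection, a top-k search can return several hits that share an imageId. Below is a minimal sketch for keeping only the highest-ranked hit per image. It assumes the metadata key is literally "imageId" and that each result exposes a metadata dict, as LangChain Document objects do; the Hit class and the unique_by_image helper are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """Hypothetical stand-in for a LangChain Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_by_image(hits, key="imageId"):
    """Keep the first (highest-ranked) hit per image id; hits without an id are kept as-is."""
    seen = set()
    unique = []
    for hit in hits:
        image_id = hit.metadata.get(key)  # assumed metadata key name
        if image_id is not None:
            if image_id in seen:
                continue  # a better-ranked hit for this image was already kept
            seen.add(image_id)
        unique.append(hit)
    return unique
```

Because search results arrive ranked by similarity, keeping the first occurrence preserves the best match for each image.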
How to use:
- Download the resource and unzip it locally.
- Run the following code in a notebook:
```python
# In a notebook keep the leading "!"; from a terminal, run the pip command without it.
!pip install langchain_community chromadb sentence-transformers

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings

# Placeholder: set this to the folder you unzipped the resource into
LOCALVECTORDB = "path/to/unzipped/promptlyvectordb"

# Load the persisted vector store with all-MiniLM-L12-v2 embeddings
# (the same model that was used to build the database)
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L12-v2")
vectorstore = Chroma(embedding_function=embedding_function,
                     persist_directory=LOCALVECTORDB)

# Input a base prompt
basePrompt = 'A cute kitten'

# Retrieve the top k=5 most similar prompts, restricted to safe content
context = vectorstore.similarity_search("cleanedPrompt [TOPICKEY] " + basePrompt,
                                        filter={"nsfw": "safe"}, k=5)
```
Note: The database contains both SFW and NSFW text. To include NSFW text, set the filter to None.
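A small helper can make that toggle explicit. This is a sketch: build_filter is a hypothetical name, "safe" mirrors the filter value used in the snippet above, and "baseModel" is an assumed name for the base-model metadata key, so check the actual keys in your copy before relying on it. Chroma combines several conditions with $and, which needs at least two clauses, so a single condition is returned on its own.

```python
from typing import Optional


def build_filter(include_nsfw: bool = False, base_model: Optional[str] = None) -> Optional[dict]:
    """Build a Chroma metadata filter (hypothetical helper; key names are assumptions)."""
    clauses = []
    if not include_nsfw:
        clauses.append({"nsfw": "safe"})  # same value as the example query
    if base_model is not None:
        clauses.append({"baseModel": base_model})  # assumed metadata key name
    if not clauses:
        return None  # no filtering: SFW and NSFW prompts alike
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}  # Chroma's operator for combining conditions
```

For example, vectorstore.similarity_search(query, k=5, filter=build_filter(include_nsfw=True)) passes filter=None and therefore searches the whole database.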
Users can then use the retrieved prompts to build context for a large language model, or for other purposes.
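One way to do this is to paste the retrieved prompts into an LLM instruction as few-shot examples. This is a minimal sketch: build_llm_context and its instruction wording are hypothetical, and it relies only on each result having a page_content string, which LangChain Document objects provide. SimpleNamespace stands in for real search results in the demo.

```python
from types import SimpleNamespace


def build_llm_context(docs, base_prompt: str, max_examples: int = 5) -> str:
    """Format retrieved prompts as few-shot examples for an LLM (hypothetical template)."""
    examples = "\n".join(f"- {doc.page_content.strip()}" for doc in docs[:max_examples])
    return (
        "Here are example prompts from successful text-to-image generations:\n"
        f"{examples}\n\n"
        f"Write a detailed text-to-image prompt for: {base_prompt}"
    )


# Demo with stand-in results; in practice pass the list returned by similarity_search
demo_docs = [SimpleNamespace(page_content="a fluffy kitten, soft light"),
             SimpleNamespace(page_content="kitten in a basket, studio photo")]
print(build_llm_context(demo_docs, "A cute kitten"))
```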
Name: promptlyvectordb_v11.zip
Size (KB): 4777323
Type: Archive
Pickle scan result: Success
Pickle scan info: No Pickle imports
Virus scan result: Success