
NOT TO BE USED COMMERCIALLY!
Overview
When generating male portraits with different models, I have noticed varying results. SDXL tends to generate men with similar facial features, especially when using widely recognized ethnic descriptors such as "Latino" or "Indian." Flux.1 DEV, on the other hand, doesn't seem to understand age, and sometimes ethnicity: men in their early 20s look middle-aged, and more obscure descriptors don't alter the final outputs in a meaningful way.
Dall-E 3 stands out as having plenty of versatility when it comes to male faces, even when prompting for a specific ethnicity, and especially for specific ages. This is why I'm attempting to replicate this aesthetic by creating Dall-E 3 Male LoRA.
This project is entirely experimental as of writing this and driven by my curiosity regarding AI tools and training. It will also serve as a log for all of my training attempts and my learning process.
What is Dall-E 3 Male LoRA?
The Dall-E 3 Male LoRA is a Style LoRA which has been trained on 220 raw example images of male portraits generated using Dall-E 3. It aims to introduce some features which may be hard to get solely by prompting SDXL checkpoints.
This initial version of the LoRA turned out better than I expected, and didn't involve any convoluted workflows. When releasing future versions, my goal is for them to be trained on a wider variety of images, especially ones processed through my personal realism workflow to make details such as skin texture and color look crisper.
Comparisons and Examples
Let's compare two outputs produced by an SDXL Checkpoint and Dall-E 3. To do this, I will be using the popular trained checkpoint Juggernaut XL, and Dall-E 3 via Microsoft Designer.
Prompt: A young 24-year-old Indian man with short black hair and a small black beard is wearing a red shirt.
Negative Prompt (Juggernaut XL): ugly, boring, disfigured, necklace
Juggernaut XL:
Dall-E 3:
Aesthetics and beauty are deeply personal, oftentimes subjective qualities and experiences. Both images have ups and downs in my eyes, and there is no right or wrong answer when it comes to which one anyone prefers.
In this isolated example, Juggernaut XL produced a very realistic image. The Indian man looks like an average person wearing a red t-shirt whose photo has been taken against a blurry background, under direct sunlight. The normalcy of this output is one of its core qualities.
Dall-E 3 tends to offer a variety of faces and facial expressions, with the Indian man's face possessing softer features, bushier eyebrows, more saturated colors, and a neater appearance. These qualities are often associated with actors, singers, and so on, but it's important to consider that this look is also natural. The downside with Dall-E 3 is realism, with details such as skin texture looking airbrushed or too plastic.
Juggernaut XL doesn't appear to blend well with the Dall-E 3 Male LoRA, as the latter has been trained on Base SDXL. From my testing, I've discovered that a great contender is Hephaistos NextGen DPO. I haven't done extensive testing with different SDXL Checkpoints, so I am uncertain as to how the LoRA will behave with other ones. Feedback will be of great use.
Prompt: A young 24-year-old Indian man with short black hair and a small black beard is wearing a red shirt. <lora:Dall-E_3_Male_LoRA:1>
Negative Prompt: child, ugly, boring, disfigured, necklace
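For anyone scripting their prompts, here is a minimal sketch of how the LoRA tag in the prompt above can be appended programmatically. It assumes the `<lora:name:weight>` tag syntax used by A1111-style UIs such as ForgeUI; the helper name `with_lora` is hypothetical and only there for illustration.

```python
def with_lora(prompt: str, lora_name: str, weight: float = 1.0) -> str:
    """Append a <lora:name:weight> tag to a prompt (A1111/Forge-style syntax).

    `with_lora` is a hypothetical helper, not part of any UI or library.
    """
    if not lora_name:
        raise ValueError("lora_name must not be empty")
    # Format the weight compactly: 1.0 becomes "1", 0.8 stays "0.8".
    weight_text = f"{weight:g}"
    return f"{prompt.strip()} <lora:{lora_name}:{weight_text}>"


base_prompt = (
    "A young 24-year-old Indian man with short black hair "
    "and a small black beard is wearing a red shirt."
)
full_prompt = with_lora(base_prompt, "Dall-E_3_Male_LoRA", 1.0)
print(full_prompt)
```

Lowering the weight (for example `with_lora(base_prompt, "Dall-E_3_Male_LoRA", 0.8)`) is the usual way to soften a style LoRA's influence, though I haven't tested which value works best here.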
Hephaistos NextGen DPO + Dall-E 3 Male LoRA Alpha_v0.1:
As usual, using an adetailer (such as face_yolov8n_v2) can help refine the raw output if desired:
The results from this example showcase the soft features mentioned earlier, which the style LoRA has adopted. Interestingly, unlike the raw Dall-E 3 model, Dall-E 3 Male LoRA doesn't render images that look too airbrushed or plastic. In any case, these examples come from the first iteration of the LoRA and show the overall look of its very first results, which I will work on refining down the line.
Another, much faster option is the SDXL Lightning checkpoint DreamShaper XL Lightning DPM++ SDE. Due to the nature of how lightning finetunes work, you are limited to 3 CFG Scale settings on ForgeUI - 1, 1.5, and 2 (which is the recommended option). Using a lower CFG Scale may allow for more variety or less saturated colors, but CFG 1 has the drawback of disabling negative prompting, which is not ideal in most cases.
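To illustrate why CFG 1 disables negative prompting, here is a small sketch of the textbook classifier-free guidance formula, where the negative prompt takes the place of the unconditional prediction. The numbers are toy values I made up, and this is not ForgeUI's actual code, which may simply skip the negative pass at CFG 1.

```python
def cfg_combine(cond, neg, scale):
    """Textbook classifier-free guidance: neg + scale * (cond - neg), element-wise."""
    return [n + scale * (c - n) for c, n in zip(cond, neg)]


# Toy stand-ins for the model's noise predictions (made-up values).
cond = [0.5, -1.0, 2.0]    # prediction for the positive prompt
neg_a = [0.25, 0.0, 1.0]   # prediction for one negative prompt
neg_b = [-4.0, 8.0, 0.5]   # prediction for a very different negative prompt

# At scale 1 the negative term cancels out, so both results equal `cond`.
at_one_a = cfg_combine(cond, neg_a, 1.0)
at_one_b = cfg_combine(cond, neg_b, 1.0)

# At scale 2 the result is pushed away from the negative prediction,
# so the choice of negative prompt changes the output.
at_two_a = cfg_combine(cond, neg_a, 2.0)
at_two_b = cfg_combine(cond, neg_b, 2.0)

print(at_one_a == at_one_b)  # negative prompt has no effect at scale 1
print(at_two_a == at_two_b)  # negative prompt matters at scale 2
```

With a scale of 1, the formula reduces to the positive prediction alone, which is why whatever goes in the negative prompt gets ignored.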
DreamShaperXL Lightning DPM++ SDE + Dall-E 3 Male LoRA Alpha_v0.1:
With adetailer:
Potential Novelties in SDXL Generation
In my experience, SDXL Checkpoints often struggle to depict less represented cultures and religions, particularly their garments. With Dall-E 3 Male LoRA, one of my aims is to make it possible to also include Kippahs, which Orthodox Jewish men often wear as a sign of religious respect. Dall-E 3 itself appears to be capable of generating somewhat accurate Kippahs, but in order to ensure accuracy and appropriate handling, I believe that including real, public domain examples of Jewish men wearing a Kippah will not only facilitate this goal, but also improve the final outputs provided by the LoRA. I firmly believe that positive representation for Jewish culture and the Jewish population across the globe is a much-needed practice, which is essential for people with Jewish roots to feel seen, valid, and respected.
Conclusion
The field of artificial intelligence has become a landscape whose growth never ceases to accelerate. Learning about the tools and how they can be used as an extension to what we already possess may be merely a useful skill now, but it could be an important one in the future.
Applying specific styles to generated images is crucial, especially when a specific aesthetic is preferred for any non-harmful purpose. SDXL Checkpoints, as well as Flux Checkpoints, have an incredibly massive advantage by being open-source and free to use locally, but suffer from their respective limitations. LoRAs can help mitigate different issues or cater to different needs without going through the hassle of training a checkpoint, which is my goal for Dall-E 3 Male LoRA.
As a reminder, this model may NOT be used commercially under any circumstances. Proper attribution is required, and any commercial application is prohibited.
Description:
- Diversified the dataset, introducing 100+ new images
- Recreated many of the images using my personal workflow to make them look more realistic
During testing, the results appeared to be... not very good, and that's okay - it's a learning process. I decided to name this version EXP for Experimental.
Trained words:
Name: Dall-E_3_Male_LoRA_Alpha_v0.2.safetensors
Size (KB): 223103
Type: Model
Pickle scan result: Success
Pickle scan message: No Pickle imports
Virus scan result: Success