v9 Testing
Since we all seem to be having different quality issues with each iteration, I figured I'd start this thread when I saw there was a placeholder for v9. The consensus seems to be that v5 is still the quality baseline we are all comparing new versions to, so let's use this thread to showcase our testing.
In the spirit of readability and ease of replication, report the following in your response:
- Scheduler/sampler used: I have sometimes seen better results using combos other than what Phr00t recommends, so test the recommendations first, then any of your go-to combos to compare results
- v5 vs v9 comparison: Be detailed in your response about what you see that's different, pros/cons, etc
- any other loras you used to enhance the generation: I've had to use penis loras in the past because of the "vagina testicle" issue. I've also used some skin enhancer loras to get rid of the plasticky look. NOTE: Please include the weights of any loras you are adding so we can all try it ourselves
- prompt examples for realism: This one is important to me (and I'm sure to others) to ensure not just high-quality renders, but CONSISTENT CHARACTERS. This is an issue with Qwen as a whole, so any tips/tricks for consistency are welcome. Personally, I fuse multiple characters into an image and then change their pose. It has been documented that whatever image is in "image 1" gets the weight and the bias when creating the new merged image; the character in "image 2" is hardly ever consistent. I have to either do 50 generations to MAYBE get a single one where both characters' faces are accurate, or inpaint the second character to match the face, which is tedious and not scalable.
This is what makes a community and helps Phr00t iterate and get closer to the baseline we all want. I will be posting my results as soon as he releases it.
@mystical8218
That's exactly what I try, and it's what gave relatively good results. I used Qwen3 to create a prompt for me: https://huggingface.co/spaces/Qwen/Qwen3-VL-Demo.
Using v5.3
Using multi-image input:
- Image 1 = Identity reference (Person identity source)
- Image 2 = Clothing reference (Lingerie clothing source)
- Image 3 = Pose reference (Pose source)
**Strictly execute:**
1. Extract the **entire identity** from Image 1:
- Full facial structure, skin texture, skin tone, hair (color/length/style), body proportions, limb dimensions, and all unique physical features.
- Zero changes to these elements.
2. Apply ONLY the **lingerie design** from Image 2:
- Extract color, fabric texture, cut, and placement of lingerie.
- Adjust lingerie to **fit the exact body proportions** from Image 1 (no resizing/distortion of body).
3. Adopt the **full-body pose** from Image 3:
- Match joint angles, hand positioning, head tilt, and posture.
- Preserve all identity features from Image 1 *throughout* the pose transition.
**Critical constraints:**
- NO blending of faces/hair/skin from other images.
- NO alteration of body shape, size, or anatomy from Image 1.
- Lingerie must integrate seamlessly with lighting/shadows of Image 1.
- All micro-details (freckles, moles, scars) from Image 1 must remain visible.
- Background remains unchanged (use Image 1 background).
Output: A single image showing the person from Image 1, with lingerie from Image 2, in pose from Image 3 β with 100% identity fidelity to Image 1.
The blending part is the most urgent issue; without addressing it, the output keeps generating something halfway between image 1 and image 2.
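If you reuse a template like the one above often, it is easy to parameterize in a script. A quick sketch; the condensed wording and the `garment` field are mine, not a tested prompt:

```python
# Parameterize the three-image identity/clothing/pose template so the
# garment role can be swapped without retyping the whole prompt.
TEMPLATE = (
    "Using multi-image input: Image 1 = identity reference, "
    "Image 2 = {garment} reference, Image 3 = pose reference. "
    "Extract the entire identity from Image 1 with zero changes. "
    "Apply ONLY the {garment} design from Image 2, fitted to the body "
    "proportions from Image 1. Adopt the full-body pose from Image 3 "
    "while preserving all identity features from Image 1. "
    "NO blending of faces/hair/skin from other images. "
    "Background remains unchanged (use Image 1 background)."
)

def build_prompt(garment="lingerie"):
    """Fill the garment slot everywhere it appears in the template."""
    return TEMPLATE.format(garment=garment)
```

For example, `build_prompt("swimsuit")` yields the same constraints with the clothing role renamed.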
@Phil2Sat , that is not my use case. I have 2 different CHARACTERS in image 1 and image 2, two different people that I am merging into a single scene together. I don't have any issues with a single reference image and then adding clothes, changing poses, etc. It's adding 2 characters together that breaks the consistency in the faces. That's a deal breaker for my generations. I'm still searching for something that works consistently; it takes so many generations to even get a single one that's close, and most times I have to inpaint the second character.
v9 is out boys!! Commence testing :) So excited because the new version incorporates skin refinement loras!! As long as the grid line issue is gone, it's go time :) Will report back my findings.
IMPORTANT: Since @Phr00t has integrated the "Smartphone" lora, you will need to add "amateur photo" at the beginning of your prompt or that lora will not assist with the render. Try it both ways and see which you like better. I am testing scheduler/sampler combinations now, but here are my first observations:
- "grid effect" is completely gone using this as is out of the box
- consistency of characters is MUCH BETTER, I have yet to try with multiple characters fused in a scene, but a single reference and then adding another person with a prompt works great so far (more testing needed)
- the "vagina testicles" issue seems to still be there in some generations, but I think I might have an answer for this without having to use my "penislora".
Initial results are much better than any other checkpoint to date. Please be aware that Qwen, by default, is VERY particular about how you design your prompt and describe details. I will provide some prompt call-outs that work in my tests; please do the same when you are reporting yours. Back in a few hrs to provide my test results. Really excited though, @Phr00t , great job on this iteration man!!
Would appreciate it if anyone doing the comparisons can post picture samples in this thread
Testing setup:
- Phr00t's default workflow he provided in the repo
- Using v9 exclusively to test with (I will not have time to compare it to v5)
- Using a single seed that is fixed for direct comparisons of sampler/scheduler
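For reference, that fixed-seed sweep is easy to script against a running ComfyUI instance. A sketch, assuming an API-format workflow export; the KSampler node id "3", the sampler names, and the default port are assumptions you must adjust for your own export:

```python
# Fixed-seed sampler/scheduler sweep for a ComfyUI API-format workflow.
# Every combo gets the same seed so outputs are directly comparable.
import copy
import itertools
import json
import urllib.request

SAMPLERS = ["euler", "res_2s"]      # names as your sampler packs register them
SCHEDULERS = ["beta", "simple", "bong_tangent"]
FIXED_SEED = 123456789              # one seed for every combo

def make_variants(workflow, ksampler_id, samplers, schedulers, seed):
    """Return one workflow copy per sampler/scheduler combo, seed pinned."""
    variants = []
    for sampler, scheduler in itertools.product(samplers, schedulers):
        wf = copy.deepcopy(workflow)
        inputs = wf[ksampler_id]["inputs"]
        inputs["sampler_name"] = sampler
        inputs["scheduler"] = scheduler
        inputs["seed"] = seed
        variants.append(((sampler, scheduler), wf))
    return variants

def queue_all(variants, host="http://127.0.0.1:8188"):
    """POST each variant workflow to ComfyUI's /prompt endpoint."""
    for (sampler, scheduler), wf in variants:
        body = json.dumps({"prompt": wf}).encode()
        req = urllib.request.Request(host + "/prompt", data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req):
            pass
        print(f"queued {sampler}/{scheduler}")
```

Load your exported workflow JSON, call `make_variants(...)`, then `queue_all(...)` while the ComfyUI server is running.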
Results:
- I tested a dozen sampler/scheduler combinations, including what Phr00t recommends, and I have narrowed it down to 3 that are worthy of generations: euler/beta, res2s/simple, res2s/bong_tangent
- euler/beta is always my baseline, and that generation beat out everything I tested BEFORE I got to the res2s generations. The realism and clarity of the final two I tested were unmatched. The winner: res2s/bong_tangent
- 8-step generation gives better clarity than 6 steps (but it's close). If you are using your generations for WAN videos, you won't see a difference in the videos you generate, as WAN will do its own smoothing of the realism you generated here with Qwen. But the more detailed the picture you feed WAN, the better the clarity of your characters will be in the video, for longer and more consistently.
- I tested both with and without the "amateur photo" prompt to see if that changed things. It does, but not for the better. I will mention it in my observations below, but the recommendation is: don't use it and don't prompt for it.
- The top 2 sampler/scheduler combinations (res2s/simple and res2s/bong_tangent) are similar except for one small difference: bong_tangent makes body hair on the legs look more like cracked glass, but I can overlook that for the incredible skin/feature realism.
Observations:
- "Vagina testicles" is still an issue (I am convinced this is just a Qwen issue when there is a female and a male character in the same scene). Even when prompting to try to minimize it, it happens. There are 2 solutions: either prompt against it and randomize your seeds, or use the PENISLORA I use at 0.3 strength. Using the lora will darken the colors a bit and you will lose a bit of overall realism, but you will gain an incredible amount of detail in the penis AND it seems to completely resolve the vagina on the testicles. It's all about what you are comfortable with. I feed my image generations into WAN to animate them, so I would rather have the correct anatomy and sacrifice a little detail, since my video generations are at a lower res than the 1024 image I am creating anyway.
- Prompting to engage the Smartphone lora Phr00t integrated does not drastically change the image realism or quality. It will smooth out some of the textures a bit, but there is not a lot of noticeable difference, so I would recommend NOT using it and not prompting for it
- Qwen is VASTLY different across seeds. You WILL need many generations before getting what you want. Let me repeat: this is NOT a fault of Phr00t, it is a limitation of Qwen. It is VERY particular about prompt adherence and needs a LOT of detail in the prompt to get close to what you are looking for. It is not as forgiving as prompting with WAN; you have to be very verbose. Qwen likes that.
- There are no more grid lines (thank goodness) if you use this as-is out of the box. If you start adding more loras, you will see those grid lines come back. Any loras you add need to be run at LOW STRENGTH, or else you will overpower the checkpoint weights Phr00t has set and see significant degradation in clarity.
- There is a use case for euler/beta. If you want your subject characters to look a little younger without sacrificing too much detail, this combo tends to "tighten" the skin and make more mature characters seem younger. Only really affects older characters.
Things to add to your prompt for more realism:
- At the end of your prompt "Natural skin texture and detailed facial features, natural lighting. Ultra-sharp details. 8K resolution." This seems to help refine somewhat even if you aren't actually rendering in 8k.
- Beginning of your prompt: "A photorealistic portrait of"
- If using the penis lora below, be sure to add this description the first time you mention "penis" in the prompt: " with smooth perfectly rounded testicles". This helps get rid of the vagina on most seed iterations.
One final note: if your generations here will NOT be used for WAN videos, then I would recommend upscaling them to add the extra detail you require. There are refiners you can feed these images into when upscaling that also add realism, which removes the Qwen plasticky effect. It's an extra step, yes, but you can just load your images from a folder and run them all through a refiner, so it's not really a big deal to do after all your generations are done.
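As a concrete example of that batch step, here is a minimal folder pass. The 2x Lanczos resize is only a stand-in for whatever upscaler/refiner you actually use, and `refine_folder` is a made-up helper name:

```python
# Batch-process a folder of finished generations: upscale each image and
# write the result to an output folder. Swap the resize call for a real
# upscaler/refiner in practice.
from pathlib import Path

from PIL import Image

def refine_folder(src_dir, dst_dir, scale=2):
    """Upscale every PNG/JPG in src_dir into dst_dir; return count processed."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).glob("*")):
        if path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue
        img = Image.open(path)
        out = img.resize((img.width * scale, img.height * scale),
                         Image.LANCZOS)  # stand-in for a real refiner pass
        out.save(dst / path.name)
        count += 1
    return count
```

Point it at your ComfyUI output folder once all generations are done and let it run unattended.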
I hope this helps others who haven't had time to test. Looking forward to hearing other people's results. I have a few more tests to run with some other skin modifiers to see if I can crank up the realism without inducing the grid effect. Will report back if I find anything better.
Resources:
- Penislora: https://civitai.com/models/1476909?modelVersionId=2292091
- res2s nodes: Install RES4LYF nodes from within ComfyUI Manager
As Mystical and others have said, correct prompting is a key factor with Qwen. I am experimenting with extra nodes in my workflow, utilising Qwen3-VL for prompt generation. It's still a work in progress..
At the moment I just cut and paste for generation or editing, but once I get it sorted, I could feed the prompt directly into the workflow (hopefully..). Or I could be wasting my time..
A couple of advantages: various models are available, including an "abliterated" model (some editing in the config file is required), plus a drop-down menu for various scenarios.
This might be helpful for someone and perhaps they could advance the workflow further..
link for Comfyui-Qwen3-VL if anyone's interested. https://github.com/1038lab/ComfyUI-QwenVL
Many thanks to Phr00t for his work (I do wonder if he gets any sleep). I have also donated, as this is good work.
v9 crops source images at lower resolutions again; v8 did not have this issue.
I'm looking forward to trying v9 later today. I have been highly impressed with Phr00t's work on this. In my own app I have a kind of battle-test arena that allows me to blind-test multiple combinations of samplers and schedulers to work out what the top 3 are, or at least get a general idea of what they likely are. It usually takes up to 30 minutes to complete, with me visually judging "is A better or is B better, are they a tie," etc. I'll share my findings later today; hopefully it'll be helpful to others here.
EDIT: So, in my testing I found that in the battle benchmark the following were the top 3 for me.
1st/2nd (tie) - DDIM / Beta
1st/2nd (tie) - Euler Ancestral / Simple
3rd - DDIM / Normal
I used 8 steps with denoise 1.0, output resolution 1344x1344, model is the NSFW v9 variant.
First and second place were tied. I found subject's consistency, particularly when changing the scene as well as the subject's attire, much better in version 9. I don't notice the grid either. The battle benchmark took longer than usual, but for the prompt and image I used I found that those were the top 3 sampler/scheduler combos above. I didn't test the full range but I did test the following of all possible combinations:
Samplers:
- lcm
- euler
- euler_ancestral
- ddim
Schedulers:
- normal
- simple
- beta
- sgm_uniform
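For completeness, the grid that was actually benchmarked is just the cross product of those two lists:

```python
# Enumerate every sampler/scheduler pairing tested in the battle benchmark.
import itertools

samplers = ["lcm", "euler", "euler_ancestral", "ddim"]
schedulers = ["normal", "simple", "beta", "sgm_uniform"]

combos = [f"{sam}/{sch}" for sam, sch in itertools.product(samplers, schedulers)]
print(len(combos))  # 16 pairs to benchmark
```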
The prompt I used for this particular benchmark battle was:
Replace the woman's swimsuit with a scarlet rio bikini of high-leg nylon-spandex blend, triangle top tied in halter style with metallic cherry accents on straps, transform the scene to a bright modern bathroom in a house on a sunny day using the reference for identity and body proportions, adjust pose and natural facial expression to suit the situation in a full body shot. Professional digital photography, natural skin texture, detailed facial features, natural lighting, ultra-sharp details, 8K resolution.
I hope this information is helpful.
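The blind pairwise judging described above reduces to a simple win/tie tally. A minimal sketch; the `rank_combos` helper and the sample votes are illustrative, not the actual benchmark data:

```python
# Tally blind pairwise battles between sampler/scheduler combos.
# Each vote is (a, b, result) where result is "a", "b", or "tie".
from collections import Counter

def rank_combos(votes):
    """Wins score 1, ties score 0.5 for both sides; returns sorted ranking."""
    score = Counter()
    for a, b, result in votes:
        score[a] += 0.0  # ensure both participants appear in the tally
        score[b] += 0.0
        if result == "a":
            score[a] += 1.0
        elif result == "b":
            score[b] += 1.0
        else:  # tie
            score[a] += 0.5
            score[b] += 0.5
    return score.most_common()

# Illustrative votes only (mirrors a first/second-place tie):
votes = [
    ("ddim/beta", "euler_ancestral/simple", "tie"),
    ("ddim/beta", "ddim/normal", "a"),
    ("euler_ancestral/simple", "ddim/normal", "a"),
]
ranking = rank_combos(votes)
```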
So, first initial v9.0 NSFW tests:
Basic tasks work.
- Consistency is okay.
- Cropping/zooming is promptable. If cropping occurs, try lowering "target_size" by about 10%. Sometimes it also helps to add an OpenPose/DWPose image at the same resolution as the output image.
- Output resolutions under ~1mpx often ignore prompts or give weird results, so test at ~1mpx (but that's also true of the base 2509).
- Converting anime to realism with the A2R lora added at weight=1 works pretty well.
Convert to realistic.
Lock the angle and position.
Natural skin texture and detailed facial features, natural lighting. Ultra-sharp details. 8K resolution.
First step: get a realistic anime girl.
Next step: change the face and keep the body.
Will edit as I try further...
@Phil2Sat , I am having the damnedest time trying to control the camera properly with Qwen. I have 2 characters that I merge into a shot, and when I prompt for "The camera is from the side" it either ignores that or only moves the camera slightly to the side. Can you recommend a good prompt to better control the view of the shot?
did you tried https://huggingface.co/dx8152/Qwen-Edit-2509-Multiple-angles ???
Actually I'm uploading the v9 ggufs, so further testing tomorrow.
@TheNecr0mancer
https://huggingface.co/Phr00t/Qwen-Image-Edit-Rapid-AIO/discussions/101#690cacabafd53ec5d5caba8d
I didn't need the node with v8, but I will go back to using this.
Any idea how to keep it from color-shifting the image and re-rendering the areas you don't want modified?
Qwen edit smooths things out way too much.
The node also completely alters my character's face.
Depending on resolution, you:
- can set zoom with target_size
- speed up inference a lot if you go low res
For an 832x1216 input image:
- target size 1024: ~65 s/it for me
- target size 896: ~45 s/it
- target size 416: ~23 s/it
But if you go too low, e.g. the person comes out small in the resulting image. Sometimes it's possible to prompt against that:
"do not zoom in or crop"
or
"center and maximize"
or
"lock angle and position of..."
By the way, "lock" is a magic "I want it consistent" trigger:
lock character personality
lock body build
lock pose
and so on
Qwen likes the "lock (personality/body build/pose/clothes/angle and position)" style.
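These "lock" triggers are easy to template in a script. A throwaway sketch; the `lock_prefix` helper and the capitalized phrasing are mine, not something Qwen requires:

```python
# Build a prompt prefix from the "lock ..." consistency triggers above.
LOCKABLE = ["personality", "body build", "pose", "clothes", "angle and position"]

def lock_prefix(*aspects):
    """Return e.g. "Lock pose. Lock clothes." for the chosen aspects."""
    unknown = [a for a in aspects if a not in LOCKABLE]
    if unknown:
        raise ValueError(f"unrecognized aspects: {unknown}")
    return " ".join(f"Lock {a}." for a in aspects)

prompt = lock_prefix("pose", "angle and position") + " Convert to realistic."
```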
Lowering target size for faster inference completely obliterates identity retention
Somehow... I still reckon 5.3 yields the best results for my workflow: euler/beta, 6 steps.
I tried v8 and v9... realistic skin is a big concern in my workflow, and in my opinion neither of these versions beats v5.3 with a Samsung phone lora at the moment.
In my testing (v5.3, v7.1, v9.0), I found that v7.1 and v9.0 have major issues with face consistency, but v9.0 is the best with artistic style.
I will keep using both v5.3 and v9.0 (depending on what I am editing).
Some pattern issues still exist in the output, but they're barely noticeable.
Somehow... I still reckon 5.3 yields the best results for my workflow: euler/beta, 6 steps.
I tried v8 and v9... realistic skin is a big concern in my workflow, and in my opinion neither of these versions beats v5.3 with a Samsung phone lora at the moment.
Link to the Samsung phone lora?
EDIT: I also just retested 5.3 and it fixes every issue I've had: colors match the input, character face retention is flawless, and so on.
I recommend trying Anime2Real (even if that's not what you're doing!) and Consistency V2 LoRA, both at 0.5 for v9. For v5 too, but with the Consistency LoRA at only 0.25.
The multi-angle lora is good after the fact, but I don't want to add yet another lora to the stack and start moving back toward the grid issue. I suppose if I have to, I could run images through that workflow, but I'd rather find the right prompt to get the angle right from the first generation than iterate on it. I will try "lock" as Phil mentioned.
@Phil2sat , how would you prompt for a camera fixed "from the side" using lock? My issue is that with multiple characters, Qwen tries to keep them both in frame, which is good and what I want, but it locks the camera at about 45 degrees to the side when I need a full 90 degrees to get a real side view. That's my struggle.
did you tried https://huggingface.co/dx8152/Qwen-Edit-2509-Multiple-angles ???
did tried? LOL
Did you triED trolling me? For my bad english? Looking at your other comments, shame is on you...
https://www.reddit.com/r/StableDiffusion/comments/1nxcqro/prompts_for_camera_control_in_qwen_edit_2509/
That's what I would try, but I'm actually making a ComfyUI node for https://unicomai.github.io/LeMiCa/ , so I'm developing, and I think I've got something.
