v9 Testing
Since we all seem to be having different quality issues with each iteration, I figured I'd start this thread when I saw there was a placeholder for v9. The consensus seems to be that v5 is still the quality baseline we are all comparing new versions to, so let's use this thread to showcase our testing.
In the spirit of readability and ease of replication, report the following in your response:
- Scheduler/sampler used: I have sometimes seen better results using combos other than what Phr00t recommends, so test the recommendations first, then any of your go-to combos to compare results
- v5 vs v9 comparison: Be detailed in your response about what you see that's different, pros/cons, etc
- any other loras you used to enhance the generation: I've had to use penis loras in the past because of the "vagina testicle" issue. I've also used some skin enhancer loras to get rid of the plasticky look. NOTE: Please include the weights of any loras you are adding so we can all try it ourselves
- prompt examples for realism: This one is important to me (and I'm sure to others) to ensure not just high-quality renders, but CONSISTENT CHARACTERS. This is an issue with Qwen as a whole, so any tips/tricks for consistency are welcome. Personally, I fuse multiple characters into an image and then change their pose. It has been documented that whatever image is in "image 1" gets the weight and the bias when creating the new merged image; the character in "image 2" is hardly ever consistent. I have to either do 50 generations to MAYBE get a single one where both characters' faces are accurate, or inpaint the second character to match the face, which is tedious and not scalable.
This is what makes a community and helps Phr00t iterate and get closer to the baseline we all want. I will be posting my results as soon as he releases it.
@mystical8218
That's exactly what I try, and it's what gave relatively good results. I used Qwen3 to create a prompt for me: https://huggingface.co/spaces/Qwen/Qwen3-VL-Demo.
Using v5.3
Using multi-image input:
- Image 1 = Identity reference (Person identity source)
- Image 2 = Clothing reference (Lingerie clothing source)
- Image 3 = Pose reference (Pose source)
**Strictly execute:**
1. Extract the **entire identity** from Image 1:
- Full facial structure, skin texture, skin tone, hair (color/length/style), body proportions, limb dimensions, and all unique physical features.
- Zero changes to these elements.
2. Apply ONLY the **lingerie design** from Image 2:
- Extract color, fabric texture, cut, and placement of lingerie.
- Adjust lingerie to **fit the exact body proportions** from Image 1 (no resizing/distortion of body).
3. Adopt the **full-body pose** from Image 3:
- Match joint angles, hand positioning, head tilt, and posture.
- Preserve all identity features from Image 1 *throughout* the pose transition.
**Critical constraints:**
- NO blending of faces/hair/skin from other images.
- NO alteration of body shape, size, or anatomy from Image 1.
- Lingerie must integrate seamlessly with lighting/shadows of Image 1.
- All micro-details (freckles, moles, scars) from Image 1 must remain visible.
- Background remains unchanged (use Image 1 background).
Output: A single image showing the person from Image 1, with lingerie from Image 2, in pose from Image 3 β with 100% identity fidelity to Image 1.
The blending part is the most urgent issue; without addressing it, the output keeps generating something halfway between image 1 and image 2.
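If you reuse a template like the one above often, it is easy to parameterize in a script. A quick sketch; the condensed wording and the `garment` field are mine, not a tested prompt:

```python
# Parameterize the three-image identity/clothing/pose template so the
# garment role can be swapped without retyping the whole prompt.
TEMPLATE = (
    "Using multi-image input: Image 1 = identity reference, "
    "Image 2 = {garment} reference, Image 3 = pose reference. "
    "Extract the entire identity from Image 1 with zero changes. "
    "Apply ONLY the {garment} design from Image 2, fitted to the body "
    "proportions from Image 1. Adopt the full-body pose from Image 3 "
    "while preserving all identity features from Image 1. "
    "NO blending of faces/hair/skin from other images. "
    "Background remains unchanged (use Image 1 background)."
)

def build_prompt(garment="lingerie"):
    """Fill the garment slot everywhere it appears in the template."""
    return TEMPLATE.format(garment=garment)
```

For example, `build_prompt("swimsuit")` yields the same constraints with the clothing role renamed.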
@Phil2Sat , that is not my use case. I have 2 different CHARACTERS in image 1 and image 2, two different people that I am merging into a single scene together. I don't have any issues with a single reference image and then adding clothes, changing poses, etc. It's adding 2 characters together that breaks the consistency in the faces. That's a deal breaker for my generations. I'm still searching for something that works consistently; it takes so many generations to even get a single one that's close, and most times I have to inpaint the second character.
v9 is out boys!! Commence testing :) So excited because the new version incorporates skin refinement loras!! As long as the grid line issue is gone, it's go time :) Will report back my findings.
IMPORTANT: Since @Phr00t has integrated the "Smartphone" lora, you will need to add "amateur photo" at the beginning of your prompt or that lora will not assist with the render. Try it both ways and see which you like better. I am testing scheduler/sampler combinations now, but here are my first observations:
- "grid effect" is completely gone using this as is out of the box
- consistency of characters is MUCH BETTER, I have yet to try with multiple characters fused in a scene, but a single reference and then adding another person with a prompt works great so far (more testing needed)
- the "vagina testicles" issue seems to still be there in some generations, but I think I might have an answer for this without having to use my "penislora".
Initial results are much better than any other checkpoint to date. Please be aware that Qwen, by default, is VERY particular about how you design your prompt and describe details. I will provide some prompt call-outs that work in my tests; please do the same when you are reporting yours. Back in a few hrs to provide my test results. Really excited though, @Phr00t , great job on this iteration man!!
Would appreciate it if anyone doing the comparisons can post picture samples in this thread
Testing setup:
- Phr00t's default workflow he provided in the repo
- Using v9 exclusively to test with (I will not have time to compare it to v5)
- Using a single seed that is fixed for direct comparisons of sampler/scheduler
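For reference, that fixed-seed sweep is easy to script against a running ComfyUI instance. A sketch, assuming an API-format workflow export; the KSampler node id "3", the sampler names, and the default port are assumptions you must adjust for your own export:

```python
# Fixed-seed sampler/scheduler sweep for a ComfyUI API-format workflow.
# Every combo gets the same seed so outputs are directly comparable.
import copy
import itertools
import json
import urllib.request

SAMPLERS = ["euler", "res_2s"]      # names as your sampler packs register them
SCHEDULERS = ["beta", "simple", "bong_tangent"]
FIXED_SEED = 123456789              # one seed for every combo

def make_variants(workflow, ksampler_id, samplers, schedulers, seed):
    """Return one workflow copy per sampler/scheduler combo, seed pinned."""
    variants = []
    for sampler, scheduler in itertools.product(samplers, schedulers):
        wf = copy.deepcopy(workflow)
        inputs = wf[ksampler_id]["inputs"]
        inputs["sampler_name"] = sampler
        inputs["scheduler"] = scheduler
        inputs["seed"] = seed
        variants.append(((sampler, scheduler), wf))
    return variants

def queue_all(variants, host="http://127.0.0.1:8188"):
    """POST each variant workflow to ComfyUI's /prompt endpoint."""
    for (sampler, scheduler), wf in variants:
        body = json.dumps({"prompt": wf}).encode()
        req = urllib.request.Request(host + "/prompt", data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req):
            pass
        print(f"queued {sampler}/{scheduler}")
```

Load your exported workflow JSON, call `make_variants(...)`, then `queue_all(...)` while the ComfyUI server is running.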
Results:
- I tested a dozen sampler/scheduler combinations, including what Phr00t recommends, and I have narrowed it down to 3 that are worthy of generations: euler/beta, res2s/simple, res2s/bong_tangent
- euler/beta is always my baseline, and that generation beat out everything I tested BEFORE I got to the res2s generations. The realism and clarity of the final two I tested were unmatched. The winner: res2s/bong_tangent
- 8-step generation gives better clarity than 6 steps (but it's close). If you are using your generations for WAN videos, you won't see a difference in the videos you generate, as WAN will do its own smoothing of the realism you generated here with Qwen. But the more detailed the picture you feed WAN, the better the clarity of your characters will be in the video, for longer and more consistently.
- I tested both with and without the "amateur photo" prompt to see if that changed things. It does, but not for the better. I will mention it in my observations below, but the recommendation is: don't use it and don't prompt for it.
- The top 2 sampler/scheduler combinations (res2s/simple and res2s/bong_tangent) are similar except for one small difference: bong_tangent makes body hair on the legs look more like cracked glass, but I can overlook that for the incredible skin/feature realism.
Observations:
- "Vagina testicles" is still an issue (I am convinced this is just a Qwen issue when there is a female and a male character in the same scene). Even when prompting to try to minimize it, it happens. There are 2 solutions: either prompt against it and randomize your seeds, or use the PENISLORA I use at 0.3 strength. Using the lora will darken the colors a bit and you will lose a bit of overall realism, but you will gain an incredible amount of detail in the penis AND it seems to completely resolve the vagina on the testicles. It's all about what you are comfortable with. I feed my image generations into WAN to animate them, so I would rather have the correct anatomy and sacrifice a little detail, since my video generations are at a lower res than the 1024 image I am creating anyway.
- Prompting to engage the Smartphone lora Phr00t integrated does not drastically change the image realism or quality. It will smooth out some of the textures a bit, but there is not a lot of noticeable difference, so I would recommend NOT using it and not prompting for it
- Qwen is VASTLY different across seeds. You WILL need many generations before getting what you want. Let me repeat: this is NOT a fault of Phr00t, it is a limitation of Qwen. It is VERY particular about prompt adherence and needs a LOT of detail in the prompt to get close to what you are looking for. It is not as forgiving as prompting with WAN; you have to be very verbose. Qwen likes that.
- There are no more grid lines (thank goodness) if you use this as-is out of the box. If you start adding more loras, you will see those grid lines come back. Any loras you add need to be run at LOW STRENGTH, or else you will overpower the checkpoint weights Phr00t has set and see significant degradation in clarity.
- There is a use case for euler/beta. If you want your subject characters to look a little younger without sacrificing too much detail, this combo tends to "tighten" the skin and make more mature characters seem younger. Only really affects older characters.
Things to add to your prompt for more realism:
- At the end of your prompt "Natural skin texture and detailed facial features, natural lighting. Ultra-sharp details. 8K resolution." This seems to help refine somewhat even if you aren't actually rendering in 8k.
- Beginning of your prompt: "A photorealistic portrait of"
- If using the penis lora below, be sure to add this description the first time you mention "penis" in the prompt: " with smooth perfectly rounded testicles". This helps get rid of the vagina on most seed iterations.
One final note: if your generations here will NOT be used for WAN videos, then I would recommend upscaling them to add the extra detail you require. There are refiners you can feed these images into when upscaling that also add realism, which removes the Qwen plasticky effect. It's an extra step, yes, but you can just load your images from a folder and run them all through a refiner, so it's not really a big deal to do after all your generations are done.
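As a concrete example of that batch step, here is a minimal folder pass. The 2x Lanczos resize is only a stand-in for whatever upscaler/refiner you actually use, and `refine_folder` is a made-up helper name:

```python
# Batch-process a folder of finished generations: upscale each image and
# write the result to an output folder. Swap the resize call for a real
# upscaler/refiner in practice.
from pathlib import Path

from PIL import Image

def refine_folder(src_dir, dst_dir, scale=2):
    """Upscale every PNG/JPG in src_dir into dst_dir; return count processed."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).glob("*")):
        if path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue
        img = Image.open(path)
        out = img.resize((img.width * scale, img.height * scale),
                         Image.LANCZOS)  # stand-in for a real refiner pass
        out.save(dst / path.name)
        count += 1
    return count
```

Point it at your ComfyUI output folder once all generations are done and let it run unattended.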
I hope this helps others who haven't had time to test. Looking forward to hearing other people's results. I have a few more tests to run with some other skin modifiers to see if I can crank up the realism without inducing the grid effect. Will report back if I find anything better.
Resources:
- Penislora: https://civitai.com/models/1476909?modelVersionId=2292091
- res2s nodes: Install RES4LYF nodes from within ComfyUI Manager
As Mystical and others have said, correct prompting is a key factor with Qwen. I am experimenting with extra nodes in my workflow, utilising Qwen3-VL for prompt generation. It's still a work in progress..
At the moment I just cut and paste for generation or editing, but once I get it sorted, I could feed the prompt directly into the workflow (hopefully..). Or I could be wasting my time..
A couple of advantages: various models are available, including an "abliterated" model (some editing in the config file is required), plus a drop-down menu for various scenarios.
This might be helpful for someone and perhaps they could advance the workflow further..
link for Comfyui-Qwen3-VL if anyone's interested. https://github.com/1038lab/ComfyUI-QwenVL
Many thanks to Phr00t for his work (I do wonder if he gets any sleep). I have also donated, as this is good work.
v9 crops source images at lower resolutions again; v8 did not have this issue.
I'm looking forward to trying v9 later today. I have been highly impressed with Phr00t's work on this. In my own app I have a kind of battle-test arena that allows me to blind-test multiple combinations of samplers and schedulers to work out what the top 3 are, or at least get a general idea of what they likely are. It usually takes up to 30 minutes to complete, with me visually judging "is A better or is B better, are they a tie," etc. I'll share my findings later today; hopefully it'll be helpful to others here.
EDIT: So, in my testing I found that in the battle benchmark the following were the top 3 for me.
1st/2nd (tie) - DDIM / Beta
1st/2nd (tie) - Euler Ancestral / Simple
3rd - DDIM / Normal
I used 8 steps with denoise 1.0, output resolution 1344x1344, model is the NSFW v9 variant.
First and second place were tied. I found subject's consistency, particularly when changing the scene as well as the subject's attire, much better in version 9. I don't notice the grid either. The battle benchmark took longer than usual, but for the prompt and image I used I found that those were the top 3 sampler/scheduler combos above. I didn't test the full range but I did test the following of all possible combinations:
Samplers:
- lcm
- euler
- euler_ancestral
- ddim
Schedulers:
- normal
- simple
- beta
- sgm_uniform
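For completeness, the grid that was actually benchmarked is just the cross product of those two lists:

```python
# Enumerate every sampler/scheduler pairing tested in the battle benchmark.
import itertools

samplers = ["lcm", "euler", "euler_ancestral", "ddim"]
schedulers = ["normal", "simple", "beta", "sgm_uniform"]

combos = [f"{sam}/{sch}" for sam, sch in itertools.product(samplers, schedulers)]
print(len(combos))  # 16 pairs to benchmark
```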
The prompt I used for this particular benchmark battle was:
Replace the woman's swimsuit with a scarlet rio bikini of high-leg nylon-spandex blend, triangle top tied in halter style with metallic cherry accents on straps, transform the scene to a bright modern bathroom in a house on a sunny day using the reference for identity and body proportions, adjust pose and natural facial expression to suit the situation in a full body shot. Professional digital photography, natural skin texture, detailed facial features, natural lighting, ultra-sharp details, 8K resolution.
I hope this information is helpful.
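The blind pairwise judging described above reduces to a simple win/tie tally. A minimal sketch; the `rank_combos` helper and the sample votes are illustrative, not the actual benchmark data:

```python
# Tally blind pairwise battles between sampler/scheduler combos.
# Each vote is (a, b, result) where result is "a", "b", or "tie".
from collections import Counter

def rank_combos(votes):
    """Wins score 1, ties score 0.5 for both sides; returns sorted ranking."""
    score = Counter()
    for a, b, result in votes:
        score[a] += 0.0  # ensure both participants appear in the tally
        score[b] += 0.0
        if result == "a":
            score[a] += 1.0
        elif result == "b":
            score[b] += 1.0
        else:  # tie
            score[a] += 0.5
            score[b] += 0.5
    return score.most_common()

# Illustrative votes only (mirrors a first/second-place tie):
votes = [
    ("ddim/beta", "euler_ancestral/simple", "tie"),
    ("ddim/beta", "ddim/normal", "a"),
    ("euler_ancestral/simple", "ddim/normal", "a"),
]
ranking = rank_combos(votes)
```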
So, first initial v9.0 NSFW tests:
Basic tasks work.
- Consistency is okay.
- Cropping/zooming is promptable. If cropping occurs, try lowering "target_size" by about 10%. Sometimes it also helps to add an OpenPose/DWPose image at the same resolution as the output image.
- Output resolutions under ~1mpx often ignore prompts or give weird results, so test at ~1mpx (but that's also true of the base 2509).
- Converting anime to realism with the A2R lora added at weight=1 works pretty well.
Convert to realistic.
Lock the angle and position.
Natural skin texture and detailed facial features, natural lighting. Ultra-sharp details. 8K resolution.
First step: get a realistic anime girl.
Next step: change the face and keep the body.
Will edit as I try further...
@Phil2Sat , I am having the damnedest time trying to control the camera properly with Qwen. I have 2 characters that I merge into a shot, and when I prompt for "The camera is from the side" it either ignores that or only moves the camera slightly to the side. Can you recommend a good prompt to better control the view of the shot?
did you tried https://huggingface.co/dx8152/Qwen-Edit-2509-Multiple-angles ???
Actually I'm uploading the v9 ggufs, so further testing tomorrow.
@TheNecr0mancer
https://huggingface.co/Phr00t/Qwen-Image-Edit-Rapid-AIO/discussions/101#690cacabafd53ec5d5caba8d
I didn't need the node with v8, but I will go back to using this.
Any idea how to keep it from color-shifting the image and re-rendering the areas you don't want modified?
Qwen edit smooths things out way too much.
The node also completely alters my character's face.
Depending on resolution, you:
- can set zoom with target_size
- speed up inference a lot if you go low res
For an 832x1216 input image:
- target size 1024: ~65 s/it for me
- target size 896: ~45 s/it
- target size 416: ~23 s/it
But if you go too low, e.g. the person comes out small in the resulting image. Sometimes it's possible to prompt against that:
"do not zoom in or crop"
or
"center and maximize"
or
"lock angle and position of..."
By the way, "lock" is a magic "I want it consistent" trigger:
lock character personality
lock body build
lock pose
and so on
Qwen likes the "lock (personality/body build/pose/clothes/angle and position)" style.
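These "lock" triggers are easy to template in a script. A throwaway sketch; the `lock_prefix` helper and the capitalized phrasing are mine, not something Qwen requires:

```python
# Build a prompt prefix from the "lock ..." consistency triggers above.
LOCKABLE = ["personality", "body build", "pose", "clothes", "angle and position"]

def lock_prefix(*aspects):
    """Return e.g. "Lock pose. Lock clothes." for the chosen aspects."""
    unknown = [a for a in aspects if a not in LOCKABLE]
    if unknown:
        raise ValueError(f"unrecognized aspects: {unknown}")
    return " ".join(f"Lock {a}." for a in aspects)

prompt = lock_prefix("pose", "angle and position") + " Convert to realistic."
```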
Lowering target size for faster inference completely obliterates identity retention
Somehow... I still reckon 5.3 yields the best results for my workflow: euler/beta, 6 steps.
I tried v8 and v9... realistic skin is a big concern in my workflow, and in my opinion neither of these versions beats v5.3 with a Samsung phone lora at the moment.
In my testing (v5.3, v7.1, v9.0), I found that v7.1 and v9.0 have major issues with face consistency, but v9.0 is the best with artistic style.
I will keep using both v5.3 and v9.0 (depending on what I am editing).
Some pattern issues still exist in the output, but they're barely noticeable.
Somehow... I still reckon 5.3 yields the best results for my workflow: euler/beta, 6 steps.
I tried v8 and v9... realistic skin is a big concern in my workflow, and in my opinion neither of these versions beats v5.3 with a Samsung phone lora at the moment.
Link to the Samsung phone lora?
EDIT: I also just retested 5.3 and it fixes every issue I've had: colors match the input, character face retention is flawless, and so on.
I recommend trying Anime2Real (even if that's not what you're doing!) and Consistency V2 LoRA, both at 0.5 for v9. For v5 too, but with the Consistency LoRA at only 0.25.
The multi-angle lora is good after the fact, but I don't want to add yet another lora to the stack and start moving back toward the grid issue. I suppose if I have to, I could run images through that workflow, but I'd rather find the right prompt to get the angle right from the first generation than iterate on it. I will try "lock" as Phil mentioned.
@Phil2sat , how would you prompt for a camera fixed "from the side" using lock? My issue is that with multiple characters, Qwen tries to keep them both in frame, which is good and what I want, but it locks the camera at about 45 degrees to the side when I need a full 90 degrees to get a real side view. That's my struggle.
did you tried https://huggingface.co/dx8152/Qwen-Edit-2509-Multiple-angles ???
did tried? LOL
Did you triED trolling me? For my bad english? Looking at your other comments, shame is on you...
https://www.reddit.com/r/StableDiffusion/comments/1nxcqro/prompts_for_camera_control_in_qwen_edit_2509/
That's what I would try, but I'm actually making a ComfyUI node for https://unicomai.github.io/LeMiCa/ , so I'm developing, and I think I've got something.
