EliGen: Entity-Level Controlled Image Generation

Introduction

We propose EliGen, a novel approach that leverages fine-grained entity-level information to enable precise and controllable text-to-image generation. EliGen excels at entity-level controlled image generation and image inpainting, though its applicability is not limited to these tasks. It can also be integrated with existing community models such as IP-Adapter and In-Context LoRA.

Methodology

Figure: regional attention

We introduce a regional attention mechanism within the DiT framework to process the conditions of each entity. Through this mechanism, the local prompt associated with each entity semantically influences only its designated region. To further strengthen the layout control capabilities of EliGen, we construct an entity-annotated dataset and fine-tune the model using LoRA.

  1. Regional Attention: The regional attention mechanism, shown in the figure above, can be easily applied to other text-to-image models. Its core principle is to transform the positional information of each entity into an attention mask, so that the mechanism affects only the designated regions.

  2. Dataset with Entity Annotation: To construct a dedicated entity control dataset, we start by randomly selecting captions from DiffusionDB and generating the corresponding source images using Flux. Next, we employ Qwen2-VL 72B, recognized for its advanced grounding capabilities among MLLMs, to randomly identify entities within each image. These entities are annotated with local prompts and bounding boxes for precise localization, forming the foundation of our training dataset.

  3. Training: We use LoRA (Low-Rank Adaptation) and DeepSpeed to fine-tune the model with regional attention on the entity-annotated dataset, enabling EliGen to achieve effective entity-level control.
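The mask construction described in item 1 can be sketched as follows. This is a minimal illustration, not EliGen's actual implementation: the `Entity` record, the helper names, the token layout (entity prompt tokens first, then image patches), the fixed `tokens_per_prompt`, and the omission of a global prompt are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x0, y0, x1, y1) in patch-grid coordinates; x1 and y1 are exclusive.
Box = Tuple[int, int, int, int]


@dataclass
class Entity:
    """Hypothetical annotation record: one local prompt plus its bounding box."""
    prompt: str
    box: Box


def box_to_patch_mask(box: Box, grid_h: int, grid_w: int) -> List[bool]:
    """Flatten a bounding box into a row-major per-patch boolean mask."""
    x0, y0, x1, y1 = box
    return [
        (y0 <= r < y1) and (x0 <= c < x1)
        for r in range(grid_h)
        for c in range(grid_w)
    ]


def build_regional_attention_mask(
    entities: List[Entity], tokens_per_prompt: int, grid_h: int, grid_w: int
) -> List[List[bool]]:
    """Return a square mask where mask[q][k] is True if query q may attend to key k.

    Assumed token order: [entity 0 prompt, ..., entity N-1 prompt, image patches].
    """
    n_img = grid_h * grid_w
    n_txt = len(entities) * tokens_per_prompt
    total = n_txt + n_img
    mask = [[False] * total for _ in range(total)]

    # Image patches attend to each other freely.
    for q in range(n_txt, total):
        for k in range(n_txt, total):
            mask[q][k] = True

    for i, entity in enumerate(entities):
        start = i * tokens_per_prompt
        prompt_ids = range(start, start + tokens_per_prompt)
        region = box_to_patch_mask(entity.box, grid_h, grid_w)

        # Tokens of the same local prompt attend to each other,
        # but never to tokens of another entity's prompt.
        for q in prompt_ids:
            for k in prompt_ids:
                mask[q][k] = True

        # A local prompt and the patches inside its box attend to each other.
        for patch, inside in enumerate(region):
            if inside:
                for t in prompt_ids:
                    mask[t][n_txt + patch] = True
                    mask[n_txt + patch][t] = True
    return mask
```

In a real attention layer a boolean mask like this would typically be converted to additive form (0 where attention is allowed, a large negative value elsewhere) before the softmax.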

Usage

This model was trained with DiffSynth-Studio, and we recommend using it for generation as well. Install it from source:

git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .

  1. Entity-Level Controlled Image Generation
    EliGen achieves effective entity-level control results. See entity_control.py for usage.
  2. Image Inpainting
    To apply EliGen to image inpainting tasks, we propose an inpainting fusion pipeline that preserves non-inpainted areas while enabling precise, entity-level modifications within inpainted regions.
    See entity_inpaint.py for usage.
  3. Styled Entity Control
    EliGen can be seamlessly integrated with existing community models. We provide an example of integrating it with IP-Adapter. See entity_control_ipadapter.py for usage.
  4. Entity Transfer
    We provide an example of integrating EliGen with In-Context LoRA, achieving interesting entity transfer results. See entity_transfer.py for usage.
  5. Play with EliGen using UI
    Download the EliGen checkpoint from ModelScope to models/lora/entity_control and run the following command to launch the interactive UI:
    python apps/gradio/entity_level_control.py
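The fusion idea from item 2 (keep the non-inpainted area, replace only the masked region) can be illustrated with a per-pixel blend. This is a simplified sketch under assumptions: `fuse_inpainting` is a hypothetical helper, images are plain nested lists of pixel values, and the actual pipeline in entity_inpaint.py may perform the fusion differently, for example on latents during denoising.

```python
from typing import List, Sequence, TypeVar

Pixel = TypeVar("Pixel")


def fuse_inpainting(
    original: Sequence[Sequence[Pixel]],
    generated: Sequence[Sequence[Pixel]],
    inpaint_mask: Sequence[Sequence[int]],
) -> List[List[Pixel]]:
    """Take `generated` where the mask is non-zero and `original` elsewhere."""
    if not (len(original) == len(generated) == len(inpaint_mask)):
        raise ValueError("original, generated and mask must have the same height")

    fused: List[List[Pixel]] = []
    for o_row, g_row, m_row in zip(original, generated, inpaint_mask):
        if not (len(o_row) == len(g_row) == len(m_row)):
            raise ValueError("original, generated and mask must have the same width")
        # Pixels outside the inpainting mask are copied from the original unchanged.
        fused.append([g if m else o for o, g, m in zip(o_row, g_row, m_row)])
    return fused
```

With this formulation, everything outside the mask is guaranteed to match the input image exactly, which is the property the inpainting mode is described as preserving.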
    

Examples

Entity-Level Controlled Image Generation

  1. Generating images with continuously changing entity positions.
  2. Image generation results for complex combinations of entities, demonstrating the strong generalization capability of EliGen. See example_1 through example_6 in entity_control.py for the generation prompts.

Entity Conditions | Generated Image
eligen_example_1_mask_0 | eligen_example_1_0
eligen_example_2_mask_0 | eligen_example_2_0
eligen_example_3_mask_27 | eligen_example_3_27
eligen_example_4_mask_21 | eligen_example_4_21
eligen_example_5_mask_0 | eligen_example_5_0
eligen_example_6_mask_8 | eligen_example_6_8
  3. Demonstration of the robustness of EliGen. The following examples were generated with the same prompt and entity conditions but different random seeds. See example_7 in entity_control.py for the generation prompt.

Entity Conditions | Generated Image
eligen_example_7_mask_5 | eligen_example_7_5
eligen_example_7_mask_5 | eligen_example_7_6
eligen_example_7_mask_5 | eligen_example_7_7
eligen_example_7_mask_5 | eligen_example_7_8

Image Inpainting

Demonstration of the inpainting mode in EliGen. See entity_inpaint.py for generation prompts.

Inpainting Input | Inpainting Output
inpaint_i1 | inpaint_o1
inpaint_i2 | inpaint_o2

Styled Entity Control

Demonstration of styled entity control results using EliGen and IP-Adapter. See entity_control_ipadapter.py for generation prompts.

Style Reference | Entity Control Variation 1 | Entity Control Variation 2 | Entity Control Variation 3
ip_ref | ip_1 | ip_2 | ip_3

We also provide a demo of styled entity control results using EliGen with a specific style LoRA. See ./styled_entity_control.py for details. Below is the visualization of EliGen combined with the Lego DreamBooth LoRA.

image_1_base | result1 | result2 | result3
image_1_base | result1 | result2 | result3

Entity Transfer

Demonstration of the entity transfer results using EliGen and In-Context LoRA. See entity_transfer.py for generation prompts.

Entity to Transfer | Transfer Target Image | Transfer Example 1 | Transfer Example 2
ic_logo | ic_target | ic_1 | ic_2

Citation

If you find our work helpful, please consider citing us:

@article{zhang2025eligen,
  title={Eligen: Entity-level controlled image generation with regional attention},
  author={Zhang, Hong and Duan, Zhongjie and Wang, Xingjun and Chen, Yingda and Zhang, Yu},
  journal={arXiv preprint arXiv:2501.01097},
  year={2025}
}