Qwen-Image-Layered is a model developed by Alibaba's Qwen team that can decompose an image into multiple RGBA layers. This layered representation unlocks inherent editability: each layer can be independently manipulated without affecting other content.

Key Features:
  • Inherent Editability: Each layer can be independently manipulated without affecting other content
  • High-Fidelity Elementary Operations: Supports resizing, repositioning, and recoloring with physical isolation of semantic components
  • Variable-Layer Decomposition: Not limited to a fixed number of layers - decompose into 3, 4, 8, or more layers as needed
  • Recursive Decomposition: Any layer can be further decomposed, enabling infinite decomposition depth
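
To illustrate why RGBA layers can be edited independently, here is a minimal sketch of how a stack of layers could be flattened back into one image with the standard "over" operator. It works on a single pixel with straight (non-premultiplied) alpha in the range 0 to 1, and it assumes the layers are ordered back to front; the function names are hypothetical and not part of the model or of ComfyUI.

```python
from functools import reduce

# One straight-alpha (non-premultiplied) RGBA pixel, each channel in [0, 1].
Pixel = tuple[float, float, float, float]

TRANSPARENT: Pixel = (0.0, 0.0, 0.0, 0.0)


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Porter-Duff 'over': place `top` on `bottom` for straight-alpha pixels."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return TRANSPARENT

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite(layers: list[Pixel]) -> Pixel:
    """Flatten one pixel position from layers ordered back to front (assumed)."""
    return reduce(lambda acc, layer: over(layer, acc), layers, TRANSPARENT)


# An opaque blue background with a half-transparent red layer on top.
background = (0.0, 0.0, 1.0, 1.0)
foreground = (1.0, 0.0, 0.0, 0.5)
flattened = composite([background, foreground])

# Recoloring only the foreground layer leaves the background layer untouched.
recolored = composite([background, (0.0, 1.0, 0.0, 0.5)])
```

Because each layer keeps its own color and alpha, an edit to one layer only changes that layer's contribution when the stack is composited again.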

Qwen-Image-Layered workflow

Download JSON Workflow File

Run on ComfyUI Cloud

Make sure your ComfyUI is updated. Workflows in this guide can be found in the Workflow Templates. If you can't find them in the templates, your ComfyUI may be outdated (updates for the Desktop version lag somewhat behind).

If nodes are missing when loading a workflow, possible reasons:
  1. You are not using the latest ComfyUI version (Nightly version)
  2. Some nodes failed to import at startup
Models are grouped by type: text_encoders, diffusion_models, vae.

Model Storage Location
📂 ComfyUI/
├── 📂 models/
│   ├── 📂 text_encoders/
│   │      └── qwen_2.5_vl_7b_fp8_scaled.safetensors
│   ├── 📂 diffusion_models/
│   │      └── qwen_image_layered_bf16.safetensors
│   └── 📂 vae/
│          └── qwen_image_layered_vae.safetensors
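
If a workflow fails to load a model, it can help to confirm that the files are where the tree above expects them. The following is a minimal sketch using only the Python standard library; the helper name `missing_models` is hypothetical, and the root directory you pass in is whatever your own ComfyUI install location happens to be.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_MODELS = [
    "models/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors",
    "models/diffusion_models/qwen_image_layered_bf16.safetensors",
    "models/vae/qwen_image_layered_vae.safetensors",
]


def missing_models(comfyui_root: str) -> list[str]:
    """Return the expected model files that are not present under comfyui_root."""
    root = Path(comfyui_root)
    return [rel for rel in EXPECTED_MODELS if not (root / rel).is_file()]
```

For example, `missing_models("ComfyUI")` returns an empty list when all three files are in place, and otherwise lists the relative paths that still need to be downloaded.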

FP8 version

By default, the workflow uses the bf16 model, which requires a lot of VRAM. For lower VRAM usage, you can use the fp8 version instead. After downloading it, update the Load Diffusion Model node inside the subgraph to use it.

Workflow settings

Sampler settings

This model is slow. The original sampling settings are steps: 50 and CFG: 4.0; using them will at least double the generation time compared with the settings in the workflow.
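
As a rough way to reason about this cost, sampling time scales approximately with the number of model evaluations: one per step, and typically two per step when classifier-free guidance is active (CFG other than 1.0). The sketch below compares the original settings with a lighter preset. The lighter preset's values are hypothetical and chosen only for illustration; they are not taken from this guide.

```python
def model_evals(steps: int, cfg: float) -> int:
    """Approximate number of diffusion model forward passes for one sample."""
    # With CFG != 1.0, samplers usually run a conditional and an
    # unconditional pass at every step.
    passes_per_step = 1 if cfg == 1.0 else 2
    return steps * passes_per_step


ORIGINAL = {"steps": 50, "cfg": 4.0}           # values stated in this guide
HYPOTHETICAL_FAST = {"steps": 20, "cfg": 2.5}  # assumption, for illustration only

ratio = model_evals(**ORIGINAL) / model_evals(**HYPOTHETICAL_FAST)
print(f"original settings need about {ratio:.1f}x the model evaluations")
```

This only counts forward passes; real timings also depend on resolution, hardware, and the number of layers requested.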

Input size

For input size, 640px is recommended. Use 1024px for high-resolution output.
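
If you prepare images outside ComfyUI, the target dimensions can be computed ahead of time. This is a minimal sketch that assumes the recommended size refers to the longer side of the image and that the aspect ratio should be preserved; the guide does not state either point explicitly, and the function name is hypothetical.

```python
def fit_longest_side(width: int, height: int, target: int = 640) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals `target`, keeping aspect ratio."""
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height, and target must be positive")
    scale = target / max(width, height)
    # Round to whole pixels and never let a side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


recommended = fit_longest_side(1920, 1080)           # default 640px target
high_res = fit_longest_side(1000, 2000, target=1024)  # 1024px for high-resolution output
```

The resulting pair can be passed to whichever resize step you use before loading the image into the workflow.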

Prompt (optional)

The text prompt is intended to describe the overall content of the input image, including elements that may be partially occluded (e.g., you may specify the text hidden behind a foreground object). It is not designed to control the semantic content of individual layers explicitly.