Cost-Aware Routing for Efficient Text-To-Image Generation

arXiv 2025

Qinchan Li, Tandon School of Engineering, New York University

Kenneth Chen, Tandon School of Engineering, New York University

Changyue Su, Tandon School of Engineering, New York University

Wittawat Jitkrittum, Google Research

Qi Sun, Tandon School of Engineering, New York University

Patsorn Sangkloy, Tandon School of Engineering, New York University

Two input prompts that require different numbers of denoising steps to ensure quality. As shown in (c), prompt (a) requires only a small number of denoising steps to reach a high CLIPScore. By contrast, the more complex prompt (b) requires over 100 steps to reach similar quality. The key idea behind our proposed CATImage is to allocate an appropriate amount of computation to each prompt, so that the overall computational cost is reduced while the quality remains the same.

Abstract

Diffusion models are well known for their ability to generate a high-fidelity image for an input prompt through an iterative denoising process. Unfortunately, the high fidelity also comes at a high computational cost due to the inherently sequential generative process. In this work, we seek to optimally balance quality and computational cost, and propose a framework that allows the amount of computation to vary for each prompt, depending on its complexity. Each prompt is automatically routed to the most appropriate text-to-image generation function, which may correspond to a distinct number of denoising steps of a diffusion model, or to a disparate, independent text-to-image model. Unlike uniform cost reduction techniques (e.g., distillation, model quantization), our approach achieves the optimal trade-off by learning to reserve expensive choices (e.g., 100+ denoising steps) for only a few complex prompts, and to employ more economical choices (e.g., a small distilled model) for less sophisticated prompts. We empirically demonstrate on COCO and DiffusionDB that by learning to route to nine already-trained text-to-image models, our approach is able to deliver an average quality that is higher than that achievable by any of these models alone.
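To make the routing idea concrete, the following is a minimal sketch of one plausible decision rule, not the implementation used in the paper. It assumes a per-prompt quality estimate is already available for each candidate generator (in practice this would come from a learned predictor) and picks the candidate that maximizes estimated quality minus a cost penalty. The function name `route`, the trade-off weight `lam`, the candidate names, and all numbers below are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def route(predicted_quality: Sequence[float],
          costs: Sequence[float],
          lam: float) -> int:
    """Return the index of the candidate with the best quality/cost trade-off.

    Assumed rule (illustrative only): score = predicted quality - lam * cost.
    A larger lam favors cheaper candidates; lam = 0 ignores cost entirely.
    """
    if len(predicted_quality) != len(costs):
        raise ValueError("need one quality estimate and one cost per candidate")
    if not costs:
        raise ValueError("need at least one candidate")
    scores = [q - lam * c for q, c in zip(predicted_quality, costs)]
    # max over indices returns the first index on ties, i.e. the earliest candidate.
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical candidates, ordered from cheapest to most expensive.
candidates = ["distilled-model", "diffusion-25-steps", "diffusion-100-steps"]
costs = [1.0, 6.0, 25.0]  # made-up relative compute cost per image

# Made-up quality estimates (e.g. a CLIPScore-like value) for two prompts.
simple_prompt_quality = [0.30, 0.305, 0.31]   # quality saturates early
complex_prompt_quality = [0.20, 0.26, 0.32]   # quality keeps improving with compute

lam = 0.002
simple_choice = candidates[route(simple_prompt_quality, costs, lam)]
complex_choice = candidates[route(complex_prompt_quality, costs, lam)]
print(simple_choice, complex_choice)
```

Under these made-up numbers, the simple prompt is sent to the cheapest candidate because extra compute buys almost no quality, while the complex prompt is sent to the most expensive one. Sweeping `lam` traces out different quality/cost operating points.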

Links

paper
