Poster
The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation
Yi Yao · Chan-Feng Hsu · Jhe-Hao Lin · Hongxia Xie · Terence Lin · Yi-Ning Huang · Hong-Han Shuai · Wen-Huang Cheng
# 193
Despite recent advances, text-to-image generation still struggles with complex, imaginative text prompts. Because their training sets offer limited exposure to diverse and complex data, text-to-image models often fail to comprehend the semantics of these difficult prompts and generate irrelevant images. This work explores how diffusion models can process and generate images from prompts that require artistic creativity or specialized knowledge. Recognizing the absence of a dedicated evaluation framework for such tasks, we introduce a new benchmark, the Realistic-Fantasy Benchmark (RFBench), which blends scenarios from both realistic and fantastical realms. For reality and fantasy scene generation, we propose a training-free approach, Realistic-Fantasy Network (RFNet), that integrates diffusion models with LLMs. On RFBench, extensive human evaluations coupled with GPT-based compositional assessments show that our approach outperforms other state-of-the-art methods.