
No.24-20 Controllable Visual Synthesis via Structural Representation

Dec 13, 2024 · 1 min read

End-to-end neural approaches have revolutionized visual generation, producing stunning outputs from natural language prompts. However, precise control remains challenging in direct text-to-scene generation, as natural language alone lacks the precision needed to specify complex visual relationships. This talk explores compositional visual representations that bridge pre-trained language and visual generative models, using programs for scene structure, words for semantic abstraction, and neural embeddings to capture visual identity. Our results show how such representations enable precise scene control while building on the capabilities of modern generative models, suggesting a scalable path toward controllable visual synthesis.
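To make the idea of a compositional scene representation concrete, here is a minimal sketch of what pairing programmatic structure, semantic words, and embedding slots might look like. All class and field names (`Entity`, `SceneProgram`, `describe`) are illustrative assumptions, not the speaker's actual system:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a "scene program" combines explicit spatial
# structure (object placements), semantic words, and an embedding slot
# for visual identity. Names here are illustrative only.

@dataclass
class Entity:
    word: str        # semantic abstraction, e.g. "chair"
    position: tuple  # explicit spatial structure (x, y)
    embedding: list = field(default_factory=list)  # identity vector slot

@dataclass
class SceneProgram:
    entities: list

    def describe(self) -> str:
        # Render the structured scene back into a text prompt that a
        # pre-trained generative model could consume.
        parts = [f"{e.word} at {e.position}" for e in self.entities]
        return "a scene with " + ", ".join(parts)

scene = SceneProgram([
    Entity("chair", (0, 0)),
    Entity("table", (1, 0)),
])
print(scene.describe())
# prints: a scene with chair at (0, 0), table at (1, 0)
```

The point of such a structure is that spatial relations live in an explicit, editable program rather than being implied by free-form text, while the embedding field leaves room for a generative model to fill in appearance.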

Speaker Bio

Yunzhi Zhang is a PhD student at Stanford University, advised by Jiajun Wu. Her current research interest lies in learning structural representation and generative models from visual data. She is supported by the Stanford Interdisciplinary Graduate Fellowship.
