AI models excel at creating content, but typically render it with static, predefined
interfaces. Specifically, the output of LLMs is often a markdown “wall of text”.
Generative UI is a long-standing promise in which the model generates not just the
content but the interface itself. Until now, Generative UI was not possible in a
robust fashion. We demonstrate that when properly prompted and equipped with the
right set of tools, a modern LLM can robustly produce high-quality custom UIs for
virtually any prompt. When generation speed is ignored, humans overwhelmingly prefer
the results generated by our implementation over the standard LLM markdown output.
Although these results are worse than those crafted by human experts, they are at
least comparable in 44% of cases. We show that this ability for robust Generative UI
is emergent, with substantial improvements over previous models.
We also create and release PAGEN, a novel dataset of expert-crafted results to aid in
evaluating Generative UI implementations, as well as the results of our system for
future comparisons.