As a full-stack builder, mostly focused on backend work and architecture in my day-to-day, I've been using Claude and v0 to generate React components with Tailwind for a while now. They're great for the easy stuff, but when it comes to complex, customized components, the endless “supervised feedback” cycles start to drag. Time killer.
When this tool popped up a few days ago, I thought, “Why not put LLMs’ self-feedback loops to the test and see if we can kill off those manual feedback cycles entirely?” The goal: boost productivity by letting the LLM handle everything—even the feedback—no humans needed.
For this experiment, I grabbed a random design from Dribbble (props to Outcrowd for the design) and tested two approaches: top-down, generating the whole page at once, and bottom-up, generating it component by component.
Why this design? It’s got a mix of components and custom styles—perfect for testing the limits of LLM creativity beyond the usual “dashboard builder” task.
I kept the prompts consistent across both approaches, following these steps:

- The initial prompt was kept super simple, with no extra instructions.
- The self-feedback prompt picked out the visual differences between the render and the original design and gave detailed instructions for improvement. The LLM then iterated on its own feedback until it hit the pre-set threshold for acceptable differences (a rough sketch of this loop follows below).

No human feedback was involved; just a few pre-set prompts did all the heavy lifting.
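To make that concrete, here's a minimal sketch of what such a loop could look like. It's an illustration rather than my exact harness: generateComponent, renderToScreenshot, and diffAgainstTarget are hypothetical stand-ins for the LLM call, a headless-browser render, and the visual-comparison step.

```typescript
// Hypothetical stubs: wire these up to your LLM API, a headless-browser
// renderer, and an image-diff / LLM-based visual comparison of your choice.
declare function generateComponent(prompt: string): Promise<string>;
declare function renderToScreenshot(code: string): Promise<Uint8Array>;
declare function diffAgainstTarget(
  render: Uint8Array,
  target: Uint8Array,
): Promise<{ score: number; critique: string }>;

const DIFF_THRESHOLD = 0.1; // pre-set cut-off for "acceptable differences"
const MAX_ITERATIONS = 5;   // safety stop in case the threshold is never hit

async function selfFeedbackLoop(
  targetDesign: Uint8Array, // screenshot of the reference design
  initialPrompt: string,    // the deliberately simple initial prompt
): Promise<string> {
  let code = await generateComponent(initialPrompt);

  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const render = await renderToScreenshot(code);
    const { score, critique } = await diffAgainstTarget(render, targetDesign);

    if (score <= DIFF_THRESHOLD) break; // close enough: stop iterating

    // Feed the model its own critique back; no human in the loop.
    code = await generateComponent(
      `Here is the current component:\n${code}\n\n` +
        `Improve it based on this feedback:\n${critique}`,
    );
  }
  return code;
}
```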
The first generation was okay, but as expected, a lot of fine details were off. The visual representation wasn't quite right for the custom components.
After the first round of self-feedback, there were small improvements, but by the second or third iteration the changes became negligible. The LLM seemed to have hit its ceiling, and the design wasn't getting meaningfully closer to the original.
Conclusion
The top-down approach hit diminishing returns pretty quickly. It became clear that tackling a page holistically wasn’t going to cut it for more complex, custom components. Time to pivot to the bottom-up approach.
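Conceptually, the bottom-up pass just runs the same self-feedback loop on each chunk of the design in isolation and then composes the results. Again a hedged sketch, reusing the hypothetical helpers from above; cropTarget and the chunk shape are illustrative, not my actual tooling.

```typescript
// Hypothetical helper: crop the reference screenshot down to one chunk's region.
declare function cropTarget(
  target: Uint8Array,
  region: { x: number; y: number; width: number; height: number },
): Uint8Array;

interface DesignChunk {
  name: string;   // e.g. "AgeRangeSlider" (purely illustrative)
  region: { x: number; y: number; width: number; height: number };
  prompt: string; // chunk-specific initial prompt
}

async function bottomUp(
  targetDesign: Uint8Array,
  chunks: DesignChunk[],
): Promise<string> {
  const components: string[] = [];
  for (const chunk of chunks) {
    // Each component iterates against only its own slice of the design,
    // so the self-feedback stays focused on local details.
    const crop = cropTarget(targetDesign, chunk.region);
    components.push(await selfFeedbackLoop(crop, chunk.prompt));
  }
  // Final pass: ask the model to assemble the vetted pieces into one page.
  return generateComponent(
    "Compose these React components into a single page matching the original layout:\n\n" +
      components.join("\n\n"),
  );
}
```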
Breaking the design into smaller chunks helped significantly. Fun fact: the LLM thought the new version of the Age Range component was closer to the original design, even though it looked a bit off. Honestly, I can see the logic—it does make sense from a visualization perspective. 😂
At that point the auto-iteration had run its course on the predefined prompts, so I jumped in with just a tiny bit of manual tweaking to fine-tune things and get closer to the original design.
(Sure, I could’ve gone through all the nitty-gritty—colors, shading, gradients, you name it—to get it as perfect as possible. But you get the idea.)
And here’s what it looked like right after the initial generation, just for comparison:
This experiment was a fun way to explore LLMs and self-feedback loops in real-world frontend scenarios. If this is the future of development, we're looking at a more automated, time-efficient process. Manual prompt engineering will continue to play a role, but we’re getting smarter about how we approach it.
Keep an eye on RL (reinforcement learning) with auto-optimization; it could be the next big wave. And as they say, “What gets measured, gets improved.” ✨