
Debugging the Hype: Practical AI for Frontend Engineers

20th February 2025

Filed Under:

  • Blog
  • Studio Notes

Photo by Dorin Seremet on Unsplash

Generative AI is rapidly changing the software development landscape. I’ve been exploring how these tools impact my work, both personally and professionally. Here’s what I’ve learned from my journey so far, from solo experiments to team-wide adoption.

I’ve used generative tools as part of my coding since 2019, starting with TabNine. I began experimenting with the current generation of Large Language Models (LLMs) for coding in my side projects in 2022, test-driving tools like Cursor and Windsurf, and I currently use Gemini Advanced. At work, we’ve been using GitHub Copilot as part of our software production process since April 2023. I’ve also conducted exploratory projects to understand the true potential of LLM tools in software engineering, separating hype from reality. I have not worked on integrating LLM tech into our product.

This Requires a Fundamental Reassessment of Your Pull Request Review Skills

Reviewing code is a crucial part of supporting a software team. If your role includes this, be aware: reviewing AI-generated code requires a different approach. Years of experience have taught me to spot bugs based on common human errors: incorrect syntax, spelling mistakes, or a general ‘code smell’. Inconsistencies in variable names, for example, often indicated that the author had lost the thread of the feature.

LLM-generated code lacks these telltale signs. It appears convincing, but it can fail unexpectedly. You might find variables that lead nowhere or are re-used inappropriately (just to avoid linter errors), or types that, while technically correct, don’t accurately describe the data flow or prevent bugs.
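To make that concrete, here is a hypothetical sketch (the function and type names are mine, not from any real codebase) of the kind of code that sails through review: the types are technically correct, but they don’t describe the data flow, so swapped arguments compile cleanly. Branded types are one defence:

```typescript
// Both IDs are plain strings, so this LLM-style signature type-checks
// even when a caller swaps the arguments.
function getPermalink(userId: string, postId: string): string {
  return `/users/${userId}/posts/${postId}`;
}

// Compiles fine, silently wrong – only a human reviewer will catch it:
console.log(getPermalink("post-42", "user-7")); // "/users/post-42/posts/user-7"

// Branded types turn the same mistake into a compile-time error:
type UserId = string & { readonly __brand: "UserId" };
type PostId = string & { readonly __brand: "PostId" };

function getPermalinkSafe(userId: UserId, postId: PostId): string {
  return `/users/${userId}/posts/${postId}`;
}

const uid = "user-7" as UserId;
const pid = "post-42" as PostId;
console.log(getPermalinkSafe(uid, pid)); // "/users/user-7/posts/post-42"
// getPermalinkSafe(pid, uid) would now fail to compile.
```

The reviewer’s job shifts from spotting sloppy syntax to checking that the types actually constrain the data flow, because the generated code will always *look* tidy.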

The prompt is the thing; the prompt is king.

It can be easy to input basic prompts and get basic results. I would not go so far as to describe writing prompts as ‘prompt engineering’, but there is a method and a process for improving them. If you and your team are adopting LLM tools, you need to be open and collaborative about improving the practice as a team. You will need to invest some time in learning these techniques, and as ever, the hype is strong on social media, so it can be hard to separate the signal from the noise. There are a lot of good resources available for learning how to prompt: Anthropic has outstanding documentation detailing techniques that apply across all of the tools, and there is excellent community documentation for GitHub Copilot.
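The common thread in those resources is structure: give the model a role, concrete context, explicit constraints, and an output format, rather than a one-line request. As a sketch (the field and function names here are my own invention, not any tool’s API), the shape looks something like this:

```typescript
// A minimal, hypothetical prompt template capturing the structure that
// most prompting guides recommend. Names are illustrative only.
interface PromptSpec {
  role: string;          // who the model should act as
  task: string;          // what you actually want done
  context: string[];     // relevant code, types, or requirements
  constraints: string[]; // things the output must respect
  outputFormat: string;  // e.g. "a single TypeScript file, no prose"
}

function buildPrompt(spec: PromptSpec): string {
  return [
    `You are ${spec.role}.`,
    `Task: ${spec.task}`,
    "Context:",
    ...spec.context.map((c) => `- ${c}`),
    "Constraints:",
    ...spec.constraints.map((c) => `- ${c}`),
    `Respond with: ${spec.outputFormat}`,
  ].join("\n");
}

console.log(
  buildPrompt({
    role: "a senior frontend engineer",
    task: "write an accessible date picker component",
    context: ["React 18", "TypeScript strict mode"],
    constraints: ["no external dependencies", "keyboard navigable"],
    outputFormat: "a single TSX file, no prose",
  }),
);
```

The value isn’t the template itself; it’s that the whole team converges on one explicit structure, so prompt improvements can be shared and reviewed like any other practice.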

Filling in the Gaps with Generative AI

At Mixcloud, we hold ‘hack weeks’ a couple of times a year, pausing regular work to experiment with new technologies. Last year, I explored how LLMs and other machine learning technologies could be used to maximum effect. I worked solo, aiming to see what kind of impact a frontend-specialised tech lead could have by using generative AI to fill in other skill gaps.

I started by using Miro to brainstorm user stories. The LLM didn’t produce anything groundbreaking, but it provided an adequate starting point.

A screenshot of the whiteboarding tool Miro. In the left sidebar, “Create with AI” is open with the prompt “Create user stories for moderating comments in a live streaming app”.
On the right are post-it notes that read:
- As a moderator, I want to approve or reject comments so that inappropriate content is not displayed.
- As a moderator, I want to mute users temporarily so that disruptive participants can be controlled.
- As a moderator, I want to ban users permanently so that repeat offenders are removed from the community.
- As a moderator, I want to flag comments for review so that questionable content can be assessed later.
- As a moderator, I want to edit comments for clarity or appropriateness so that the conversation remains respectful.
- As a moderator, I want to view a history of user comments so that I can make informed decisions about moderation.
- As a moderator, I want to receive alerts for flagged words or phrases so that I can respond quickly to potential issues.
- As a moderator, I want to set up auto-moderation rules so that common issues are handled automatically.
- As a moderator, I want to communicate with other moderators in real-time so that we can coordinate our efforts.
- As a moderator, I want to provide feedback to users on why their comments were moderated so that they understand the rules.
Using Miro’s “Create with AI” for user story mapping

For design, I experimented with several solutions, and Galileo stood out as the best. The results weren’t inspirational. Honestly, while I’m no designer, I could have achieved a similar result in half a day using Figma. A notebook sketch would probably have sufficed, although the more polished output from Galileo did feel more substantial – which can be both an advantage and a disadvantage. Where it was genuinely useful was in consuming the documentation and quickly producing a wireframe: I did not have to read the documentation myself and then summarise its capabilities for a designer. The output wasn’t earth-shattering, but it was good enough to get moving on an MVP.

A screenshot of Galileo AI. At the top is my avatar with the prompt “a user interface for a comment moderation rules to detect and filter toxic comments with the Perspective API”.
Below are two wireframe mockups. The first has the heading “Moderate comments” and the following plus/minus controls:
Toxicity
This setting determines the threshold for toxic comments
Insult
This setting determines the threshold for insult comments
Profanity
This setting determines the threshold for profanity
Threat
This setting determines the threshold for threatening comments
Half way down the mobile mockup there is a Heading that reads "Comments that would be flagged"
I hate you!
Toxicity: 0.9
You are so stupid
I will kill you
Threat: 0.9
There is a "Save Changes" button at the bottom of the screen.
The Second mockup has a header that reads "Toxicity Detection"
Below that are sliders controls with the following labels
Adjust thresholds
Personal attacks
Insult
Profanity
Threats
Sample comments
at the bottom of the screen is a link that reads "Show more samples" and two buttons reading "Skip" & "Accept"
GalileoAI generating wireframes

I understand that a designer’s role is far more complex than simply producing wireframes. The best designers I’ve worked with deeply understand their product’s user personas and the competitive landscape. They make design decisions rooted in empathy for their users. Whether a tool like Gemini, with access to a Google Workspace’s user research, could surpass Galileo in the future remains an open question. This tool performed better than the least skilled designers I’ve worked with, as it operated within the technological constraints, but the final product lacked inspiration.

Limitations: Technology Selection and the Frontend

While my hack week project showed some promise, I’ve also encountered significant limitations when using LLMs for broader development tasks. One of the most glaring is that out of the box, LLMs are poor at technology selection for the frontend.

It’s a truism that design by committee results in the best that the least skilled member of the committee can imagine. In many ways, using LLM tools for technology selection is a direct route to failure. They tend to suggest only the most popular technologies. You could argue that frontend technology selection has always been a popularity contest, to some extent. LLMs can provide a general overview. I’ve found Gemini exceptionally helpful for generating reports and summarising available technologies when combined with in-depth research. However, they fail at the fundamental process of analysing requirements and constraints to recommend the best technologies for a given problem.
If you’re interested in the reports that Gemini can generate, here is one it produced when researching datepickers in React.

A screenshot of Gemini 
The heading reads "Accessible React Datepicker Selection"
The text is reproduced below: 
Researching 5 websites...

Ask Gemini (G)
Gemini can make mistakes, including about people, so double-check it. Your privacy and Gemini are protected by Google's Privacy Policy.

In researching your report, Gemini searches the web and browses content from relevant websites on your behalf.


The best React Native date picker libraries.... blog.logrocket.com/best-react-native-date-...
Most Awesome React Datepicker Example.... themeselection.com/react-datepicker
Choosing the Best React Date Pickers for... dhiwise.com/post/choosing-the-best-react-...
Top 12+ React Datepickers to Use in 2024.... flatlogic.com/blog/top-12-react-datepickers-...
Exploring the top React date pickers - Log.... blog.logrocket.com/top-react-date-pickers
Gemini Advanced performing deep research looking for a react date picker

Architecture is Not a Democracy

Frontend development moves quickly. This rapid pace can be challenging, especially for those new to the specialty. However, when choosing an architecture, the most popular choice isn’t always the right one. The old adage was, “You can’t get fired for choosing IBM,” and for a few years, it felt like, “You can’t get fired for choosing React.” That’s not strictly true anymore. To reiterate, whenever I asked LLMs architectural questions, they consistently suggested the most popular technologies, not necessarily the best ones for my specific requirements. If you ask an LLM an architectural question, be prepared to be recommended Next.js, no matter the scale of the problem you’re trying to solve. In frontend development, the best technologies available today are often relatively new (having stabilised only in the last year or two), meaning the model’s training data significantly influences the results.

Gherkin Specifications: Unlocking LLM Potential

If you use Gherkin scenarios – specification statements based on “Given, When, Then” – you can create a precise specification for the LLM to work from.

GIVEN the user has not enabled any monetisations
WHEN the viewer is viewing their own profile
THEN the profile should show an upsell linking to the monetisation settings

GIVEN the user has not enabled any monetisations
WHEN the viewer is not the owner
THEN the profile should not render any monetisation controls

GIVEN the user has enabled monetisation
WHEN the viewer is the owner
THEN the profile should show the monetisation controls with the buttons disabled

GIVEN the user has enabled monetisation
WHEN the viewer is not the owner
THEN the profile should render the monetisation controls

GIVEN the user has enabled monetisation
WHEN the viewer is premium
THEN the profile should render the monetisation controls with the option to pay in tokens

When building a component, starting with Gherkin acceptance criteria allows tools like Copilot, Claude, and Gemini to easily generate unit tests, including effective setup and teardown.
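As a sketch of why this works, the five scenarios above translate almost mechanically into a pure function and its tests. The type and function names here are my own invention for illustration, not Mixcloud’s actual code:

```typescript
// Hypothetical view-state helper encoding the five Gherkin scenarios.
interface ProfileView {
  monetisationEnabled: boolean;
  viewerIsOwner: boolean;
  viewerIsPremium: boolean;
}

type MonetisationUI =
  | { kind: "hidden" }
  | { kind: "upsell" }
  | { kind: "controls"; buttonsDisabled: boolean; payInTokens: boolean };

function monetisationUI(view: ProfileView): MonetisationUI {
  if (!view.monetisationEnabled) {
    // GIVEN the user has not enabled any monetisations
    return view.viewerIsOwner ? { kind: "upsell" } : { kind: "hidden" };
  }
  if (view.viewerIsOwner) {
    // The owner sees the controls, with the buttons disabled
    return { kind: "controls", buttonsDisabled: true, payInTokens: false };
  }
  // Other viewers see the controls; premium viewers can pay in tokens
  return {
    kind: "controls",
    buttonsDisabled: false,
    payInTokens: view.viewerIsPremium,
  };
}

// One scenario, one assertion – exactly the shape an LLM can generate:
const upsell = monetisationUI({
  monetisationEnabled: false,
  viewerIsOwner: true,
  viewerIsPremium: false,
});
console.log(upsell.kind); // "upsell"
```

Because each GIVEN/WHEN/THEN maps to one input/output pair, the model has very little room to hallucinate behaviour: the specification pins down every branch before any code is written.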

Accessibility Shortcomings: A Critical Issue

LLMs struggle with accessibility, and some of the code they produce is actively harmful in this regard. This is a significant shortcoming that I hope model-producing companies will address. I understand that every engineering specialty likely has its own “LLMs can’t do X” (e.g., “LLMs can’t handle infrastructure scaling,” or “LLMs can’t structure microservices maintainably” – and, let’s be honest, that last one is a challenge for many humans, too!).
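To make one of those harms concrete, a pattern I see LLMs emit repeatedly is the clickable `<div>`: no role, no keyboard support. Here is a toy review heuristic for spotting it – the names and rules are mine, illustrative only, not a real linter’s API:

```typescript
// A toy heuristic for one anti-pattern LLMs often produce: a click
// handler on a non-interactive element. Illustrative names only.
interface ElementSketch {
  tag: string;
  hasClickHandler?: boolean;
  role?: string;
  tabIndex?: number;
}

const NATIVELY_INTERACTIVE = new Set(["button", "a", "input", "select", "textarea"]);

function a11yWarnings(el: ElementSketch): string[] {
  const warnings: string[] = [];
  if (el.hasClickHandler && !NATIVELY_INTERACTIVE.has(el.tag)) {
    if (el.role !== "button") {
      warnings.push(`<${el.tag}> with a click handler needs role="button" – or better, use <button>`);
    }
    if (el.tabIndex === undefined) {
      warnings.push(`<${el.tag}> with a click handler is unreachable by keyboard; add tabindex and key handlers`);
    }
  }
  return warnings;
}

console.log(a11yWarnings({ tag: "div", hasClickHandler: true }).length);    // 2
console.log(a11yWarnings({ tag: "button", hasClickHandler: true }).length); // 0
```

Real tools such as eslint-plugin-jsx-a11y and axe-core do this properly; the point is that a reviewer has to actually run them, because LLM output that fails every one of these checks still *looks* perfectly plausible.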

The Unsustainable Energy Cost of LLMs

Beyond code quality and architectural decisions, there’s a larger, often overlooked issue: the environmental impact. I’m deeply concerned about the significant energy cost incurred each time I query an LLM. Every interaction with these tools reinforces the unsustainable energy demands of our industry.

What, then, is the Future of Software Engineering?

Some believe we’re entering an era of “disposable code,” where code is rewritten from scratch for every change, dramatically reducing software costs. It’s conceivable that an LLM’s context window could eventually encompass all of a business’s domain knowledge, with engineers maintaining a vast catalogue of specifications.

There’s some truth to this, and we are entering a transitional phase of significantly increased development velocity. However, experienced practitioners will still be crucial. We’re entering an “arms race” where efficiency will become paramount, and traditional scale moats may erode. It’s becoming essential for all software engineers to adopt LLM-enhanced workflows.

While I recognise that LLMs are not neutral knowledge mediators, they are a helpful catalyst for software engineering teams to increase their impact and a vital part of any scaling team. In light of that, my most pressing concern is their energy consumption. We need clean energy-powered models, and the model companies must prioritise efficiency. Hopefully, the open-sourcing of models like DeepSeek will encourage a shift away from the “muscle car” approach to model training towards smaller, more refined applications.
