Generative AI is rapidly changing the software development landscape. I’ve been exploring how these tools impact my work, both personally and professionally. Here’s what I’ve learned from my journey so far, from solo experiments to team-wide adoption.
I’ve used generative tools as part of my coding since 2019, starting with TabNine. I began experimenting with the current generation of Large Language Models (LLMs) for coding in my side projects in 2022, test-driving tools like Cursor and Windsurf, and I currently use Gemini Advanced. At work, we’ve been using GitHub Copilot as part of our software production process since April 2023. I’ve also conducted exploratory projects to understand the true potential of LLM tools in software engineering, separating hype from reality. I have not worked on integrating LLM tech into our product.
This Requires a Fundamental Reassessment of Your Pull Request Review Skills
Reviewing code is a crucial part of supporting a software team. If your role includes this, be aware: reviewing AI-generated code requires a different approach. Years of experience have taught me to spot bugs based on common human errors: incorrect syntax, spelling mistakes, or a general ‘code smell’. Inconsistencies in variable names, for example, often indicated that the author had lost the thread of the feature.
LLM-generated code lacks these telltale signs. It looks convincing, yet it can fail unexpectedly. You might find variables that lead nowhere or are re-used inappropriately (just to avoid linter errors), or types that, while technically correct, don’t accurately describe the data flow or prevent bugs.
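As a contrived TypeScript illustration of the kind of pattern I now look out for (the names and types here are invented for the example, not taken from a real codebase):

```tsx
// Hypothetical example: generated code that compiles cleanly and looks
// plausible, but the types don't protect us.

// Technically valid, but `status: string` doesn't describe the real data
// flow; a union ('pending' | 'done' | 'failed') would have caught mistakes.
interface UploadResult {
  status: string;
  errorMessage?: string;
}

function describeUpload(result: UploadResult): string {
  // `rest` exists only to preserve a destructuring pattern copied from
  // elsewhere; it leads nowhere and quietly dodges lint warnings.
  const { status, errorMessage, ...rest } = result;
  void rest;

  if (status === 'failed') {
    return errorMessage ?? 'Upload failed';
  }
  // Every other string, including typos like 'faled', is treated as success.
  return 'Upload complete';
}

console.log(describeUpload({ status: 'pending' })); // "Upload complete"
```

Nothing here trips a reviewer’s usual alarms, which is exactly the problem: the bug only surfaces when you trace what the types actually allow.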
The prompt is the thing; the prompt is king.
It can be easy to input basic prompts and get basic results. I would not go so far as to describe writing prompts as ‘prompt engineering’, but there is a method and a process for improving prompts. If you and your team are adopting LLM tools, you need to be open and collaborative about improving the practice together. You will need to invest some time to learn these techniques, and, as ever, the hype is strong on socials, so it can be hard to see the wood for the trees. There are plenty of good resources available for learning how to prompt: Anthropic has outstanding documentation detailing techniques that apply to all of the tools, and there is excellent community documentation for GitHub Copilot.
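As a rough sketch of what “a method and a process” means in practice, here is the anatomy of an improved prompt: a role, concrete context, an explicit task, and constraints. The component and libraries named here are made up for the example, and this is not any particular tool’s API.

```typescript
// Basic prompt, basic result.
const basicPrompt = 'Write a React component that shows a relative time.';

// The same request with role, context, task, and constraints spelled out.
const improvedPrompt = `
You are a senior frontend engineer working in a React + TypeScript codebase.

Context:
- We use React Testing Library for unit tests and date-fns for dates.

Task:
- Build a RelativeTimestamp component that renders text like "5 minutes ago"
  and re-renders once per minute.

Constraints:
- No new dependencies.
- Include unit tests covering the re-render behaviour.
`;

console.log(basicPrompt.length, improvedPrompt.length);
```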
Filling in the Gaps with Generative AI
At Mixcloud, we hold ‘hack weeks’ a couple of times a year, pausing regular work to experiment with new technologies. Last year, I explored how LLMs and other machine learning technologies could be used to maximum effect. I worked solo, aiming to see what kind of impact a frontend-specialised tech lead could have by using generative AI to fill in other skill gaps.
I started by using Miro to brainstorm user stories. The LLM didn’t produce anything groundbreaking, but it provided an adequate starting point.

For design, I experimented with several solutions, and Galileo stood out as the best. The results weren’t inspirational. Honestly, while I’m no designer, I could have achieved a similar result in half a day using Figma. A notebook sketch probably would have sufficed, although the more polished output from Galileo did feel more substantial – which can be both an advantage and a disadvantage. Where it was genuinely useful was in consuming the documentation and quickly turning it into a wireframe; I didn’t have to read the documentation myself and summarise its capabilities for a designer. The output wasn’t earth-shattering, but it was good enough to get moving on an MVP.

I understand that a designer’s role is far more complex than simply producing wireframes. The best designers I’ve worked with deeply understand their product’s user personas and the competitive landscape. They make design decisions rooted in empathy for their users. Whether a tool like Gemini, with access to a Google Workspace’s user research, could surpass Galileo in the future remains an open question. This tool performed better than the least skilled designers I’ve worked with, as it operated within the technological constraints, but the final product lacked inspiration.
Limitations: Technology Selection and the Frontend
While my hack week project showed some promise, I’ve also encountered significant limitations when using LLMs for broader development tasks. One of the most glaring is that out of the box, LLMs are poor at technology selection for the frontend.
It’s a truism that design by committee results in the best that the least skilled member of the committee can imagine. In many ways, using LLM tools for technology selection is a direct route to failure. They tend to suggest only the most popular technologies. You could argue that frontend technology selection has always been a popularity contest, to some extent. LLMs can provide a general overview. I’ve found Gemini exceptionally helpful for generating reports and summarising available technologies when combined with in-depth research. However, they fail at the fundamental process of analysing requirements and constraints to recommend the best technologies for a given problem.
If you’re interested in the reports that Gemini can generate, here is one it produced when researching datepickers in React.

Architecture is Not a Democracy
Frontend development moves quickly. This rapid pace can be challenging, especially for those new to the specialty. However, when choosing an architecture, the most popular choice isn’t always the right one. The old adage was, “You can’t get fired for choosing IBM,” and for a few years, it felt like, “You can’t get fired for choosing React.” That’s not strictly true anymore. To reiterate, whenever I asked LLMs architectural questions, they consistently suggested the most popular technologies, not necessarily the best ones for my specific requirements. If you ask an LLM an architectural question, be prepared to be recommended Next.js, no matter the scale of the problem you’re trying to solve. In frontend development, the best technologies available today are often relatively new (having stabilised only in the last year or two), meaning the model’s training data significantly influences the results.
Gherkin Specifications: Unlocking LLM Potential
If you use Gherkin scenarios – specification statements based on “Given, When, Then” – you can create a precise specification for the LLM to work from.
```gherkin
GIVEN the user has not enabled any monetisations
WHEN the viewer is viewing their own profile
THEN the profile should show an upsell linking to the monetisation settings

GIVEN the user has not enabled any monetisations
WHEN the viewer is not the owner
THEN the profile should not render any monetisation controls

GIVEN the user has enabled monetisation
WHEN the viewer is the owner
THEN the profile should show the monetisation controls with the buttons disabled

GIVEN the user has enabled monetisation
WHEN the viewer is not the owner
THEN the profile should render the monetisation controls

GIVEN the user has enabled monetisation
WHEN the viewer is premium
THEN the profile should render the monetisation controls with the option to pay in tokens
```
When building a component, starting with Gherkin acceptance criteria allows tools like Copilot, Claude, and Gemini to easily generate unit tests, including effective setup and teardown.
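Here is a sketch of what that generated test scaffolding might look like for the first two scenarios above. The `<Profile />` component and its props are invented for illustration, and a minimal stub is included so the example is self-contained.

```tsx
import React from 'react';
import { render, screen } from '@testing-library/react';

type ProfileProps = { monetisationEnabled: boolean; viewerIsOwner: boolean };

// Stand-in for the real component under test.
function Profile({ monetisationEnabled, viewerIsOwner }: ProfileProps) {
  if (!monetisationEnabled) {
    return viewerIsOwner ? (
      <a href="/settings/monetisation">Enable monetisation</a>
    ) : null;
  }
  return <div data-testid="monetisation-controls" />;
}

describe('Profile monetisation controls', () => {
  it('shows an upsell when the owner has no monetisation enabled', () => {
    // GIVEN the user has not enabled any monetisations
    // WHEN the viewer is viewing their own profile
    render(<Profile monetisationEnabled={false} viewerIsOwner={true} />);
    // THEN the profile should show an upsell linking to the settings
    expect(screen.getByRole('link', { name: /monetisation/i })).toBeTruthy();
  });

  it('renders no monetisation controls for other viewers', () => {
    // GIVEN the user has not enabled any monetisations
    // WHEN the viewer is not the owner
    render(<Profile monetisationEnabled={false} viewerIsOwner={false} />);
    // THEN the profile should not render any monetisation controls
    expect(screen.queryByTestId('monetisation-controls')).toBeNull();
  });
});
```

Because each scenario maps cleanly to one test case, the setup and teardown the tools produce tends to be focused rather than speculative.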
Accessibility Shortcomings: A Critical Issue
LLMs struggle with accessibility, and some of the code they produce is actively harmful in this regard. This is a significant shortcoming that I hope model-producing companies will address. I understand that every engineering specialty likely has its own “LLMs can’t do X” (e.g., “LLMs can’t handle infrastructure scaling,” or “LLMs can’t structure microservices maintainably” – and, let’s be honest, that last one is a challenge for many humans, too!).
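As a contrived illustration of the kind of harm I mean (the component names are invented for the example): generated suggestions often reach for a clickable `<div>`, which keyboard and screen-reader users cannot operate, when a real `<button>` costs almost nothing extra.

```tsx
import React from 'react';

// The inaccessible pattern: not focusable, no role, no accessible name.
export function DeleteButtonGenerated({ onDelete }: { onDelete: () => void }) {
  return <div className="btn btn-danger" onClick={onDelete}>🗑️</div>;
}

// The accessible version: a real <button> with a text label for
// assistive technology.
export function DeleteButtonAccessible({ onDelete }: { onDelete: () => void }) {
  return (
    <button type="button" className="btn btn-danger" onClick={onDelete}>
      <span aria-hidden="true">🗑️</span>
      <span className="visually-hidden">Delete upload</span>
    </button>
  );
}
```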
The Unsustainable Energy Cost of LLMs
Beyond code quality and architectural decisions, there’s a larger, often overlooked issue: the environmental impact. I’m deeply concerned about the significant energy cost incurred each time I query an LLM. Every interaction with these tools reinforces the unsustainable energy demands of our industry.
What, then, is the Future of Software Engineering?
Some believe we’re entering an era of “disposable code,” where code is rewritten from scratch for every change, dramatically reducing software costs. It’s conceivable that an LLM’s context window could eventually encompass all of a business’s domain knowledge, with engineers maintaining a vast catalogue of specifications.
There’s some truth to this, and we are entering a transitional phase of significantly increased development velocity. However, experienced practitioners will still be crucial. We’re entering an “arms race” where efficiency will become paramount, and traditional scale moats may erode. It’s becoming essential for all software engineers to adopt LLM-enhanced workflows.
While I recognise that LLMs are not neutral knowledge mediators, they are a helpful catalyst for software engineering teams to increase their impact and a vital part of any scaling team. In light of that, my most pressing concern is their energy consumption. We need clean energy-powered models, and the model companies must prioritise efficiency. Hopefully, the open-sourcing of models like DeepSeek will encourage a shift away from the “muscle car” approach to model training towards smaller, more refined applications.