Over the past year, Designstripe has made extensive use of AI tools, including Cursor, Copilot, and Cline, to help speed up our development process. These tools have proven to be extremely valuable for prototyping and new feature development, especially in the early stages of implementation.
Our experience shows that AI-generated code typically delivers 50%-75% of the required functionality before a developer needs to step in. Sometimes this significantly speeds up development; in other cases the code needs a complete rewrite. Recently, we have focused on optimizing our approach to consistently achieve 75%-85% completion rates with minimal rework.
Common challenges with LLM-generated code
Through our experiments, we identified several recurring challenges in AI-assisted development. First, the generated code is often inconsistent with our established coding patterns: although our codebase uses signals for state management in React, LLMs default to the useState hook. Second, we encountered “phantom” interfaces and functions, where LLMs would invent components that didn’t exist in our codebase. Finally, we faced over-generation, where the tool produced entire files, tests included, that took considerable time to review and modify.
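To make the pattern mismatch concrete, here is a minimal sketch of the kind of divergence we see. The components and the choice of signals library are illustrative assumptions (the sketch uses @preact/signals-react), not code from our repository:

```typescript
// Illustrative only: assumes @preact/signals-react as the signals library;
// the components are hypothetical examples, not Designstripe code.
import { useState } from "react";
import { useSignal } from "@preact/signals-react";

// What an LLM typically generates by default: useState-based local state.
export function CounterWithUseState() {
  const [count, setCount] = useState(0);
  return <button onClick={() => setCount(count + 1)}>Count: {count}</button>;
}

// What our conventions actually call for: signal-based state.
export function CounterWithSignal() {
  const count = useSignal(0);
  return <button onClick={() => count.value++}>Count: {count.value}</button>;
}
```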
A structured approach to better code generation
Based on our experience, we developed a three-step approach, which we applied in a recent API integration project:
- Detailed analysis: Before engaging the LLM, we conducted a thorough review of the existing codebase and created a detailed implementation plan covering all required code changes for Cline to review and analyze.
- Context setting: We gave Cline detailed prompts covering our coding standards (such as using signals instead of useState), our codebase structure, and our preferred libraries and patterns. For example, we specified TypeScript configuration preferences and React component patterns.
- Incremental implementation: Rather than generating everything at once, we broke the implementation into smaller steps and validated each change before continuing, reviewing component updates, API integration code, and type definitions individually (see the sketch after this list).
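As an example of the granularity we aim for in the incremental step, the sketch below shows a single typed API call generated and reviewed in isolation. The endpoint, Project type, and fetchProjects helper are hypothetical stand-ins rather than our actual integration code:

```typescript
// Illustrative sketch of one "small step": a single typed API call reviewed
// on its own before moving on. The endpoint, Project type, and fetchProjects
// helper are hypothetical, not part of Designstripe's actual API.
interface Project {
  id: string;
  name: string;
  updatedAt: string;
}

export async function fetchProjects(baseUrl: string): Promise<Project[]> {
  const response = await fetch(`${baseUrl}/projects`);
  if (!response.ok) {
    throw new Error(`Failed to fetch projects: ${response.status}`);
  }
  return (await response.json()) as Project[];
}
```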
Results and insights
This structured approach produced mixed but promising results. The initial steps were very successful, yielding code that closely matched our standards and existing patterns. For example, the LLM correctly implemented our signal-based state management and maintained consistent typing patterns.
However, we encountered limitations when implementing the more complex API integrations. After the first three steps, the LLM began to lose context, reverting to producing complete files rather than sustaining our incremental approach.
Lessons learned and next steps
Our experiments revealed several key insights. First, upfront planning and context setting significantly improve the quality of code generation: the more specific our requirements and standards, the better the results.
We also learned that smaller, more precise prompts tend to produce better results than attempts to generate a large amount of code at once. This approach helps maintain context and reduces the likelihood of hallucinated components.
Going forward, we plan to:
- Develop a structured prompt template that includes codebase conventions, preferred patterns, and common pitfalls to avoid (see the sketch after this list)
- Break complex features into smaller, focused prompts (limiting each prompt to a single component or feature)
- Build a library of common coding patterns, drawn from examples in our codebase, to improve context setting
- Implement a validation checklist for each generated snippet
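One way such a structured prompt template might be assembled, shown purely as a sketch: the PromptTemplate shape and buildPrompt helper below are hypothetical and simply encode the ingredients named above (conventions, preferred patterns, pitfalls).

```typescript
// Purely illustrative sketch of a structured prompt template; the
// buildPrompt helper and its inputs are hypothetical, not an existing tool.
interface PromptTemplate {
  task: string;                // one component or feature per prompt
  conventions: string[];       // codebase conventions, e.g. "use signals, not useState"
  preferredPatterns: string[]; // short pattern examples pulled from the codebase
  pitfalls: string[];          // known failure modes, e.g. "do not invent interfaces"
}

export function buildPrompt(t: PromptTemplate): string {
  return [
    `Task: ${t.task}`,
    `Conventions:\n${t.conventions.map((c) => `- ${c}`).join("\n")}`,
    `Preferred patterns:\n${t.preferredPatterns.join("\n\n")}`,
    `Avoid:\n${t.pitfalls.map((p) => `- ${p}`).join("\n")}`,
  ].join("\n\n");
}
```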
Conclusion
While we have not yet consistently reached our 75%-85% completion target, these experiments show that the success of LLM code generation depends more on how we structure the interaction than on the capabilities of the tool itself. By continuing to refine our approach and building on these experiences, we are steadily moving toward a more efficient and reliable AI-assisted development process.