Generating Code with Prompts: The Fundamentals Still Matter
Comments to code, SQL from schemas, and the verification habit that separates working code from plausible code
Code generation is the LLM use case most of us touch every day, and the one we prompt most carelessly. Agents like Claude Code and Copilot hide the prompting from you, but when you are building a product feature that generates code (a SQL builder, a config generator, an internal script writer), you are back to writing the prompts yourself, and the fundamentals decide whether the output runs.
The Prompt Engineering Guide's code generation chapter collects the core patterns. The examples there date back to gpt-3.5-turbo, and models have improved enormously since, but I keep seeing production incidents in 2026 that trace back to ignoring exactly the lessons those old examples teach. So here they are, updated with how I apply them now.
Set the contract in the system prompt
The guide's examples all start with a system message like this:
```text
You are a helpful code assistant that can teach a junior developer how to
code. Your language of choice is Python. Don't explain the code, just
generate the code block itself.
```

Two things in there do real work. "Your language of choice is Python" pins the language so a vague request does not come back in JavaScript. "Don't explain the code, just generate the code block itself" pins the output shape, which matters when a program, not a human, consumes the response. If you are parsing the output, say exactly what you want back: one fenced code block, no prose. In 2026 you would often go further and use structured outputs to guarantee the shape (I covered that in structured outputs and function calling), but a clear contract in the system prompt is still the first line of defence.
The funny side effect of a strong contract: the guide shows the same assistant refusing to write SQL because it was told it is a Python assistant. Constraints bind. Write them knowing they will be followed more literally than you expect.
Comments to code, and the missing import
The comment-block pattern is the simplest generation prompt there is. You write what you want as comments and let the model fill in the rest:
"""
1. Create a list of movies
2. Create a list of ratings for these movies
3. Combine them to make a json object of 10 movies with their ratings.
"""The model produces a perfectly plausible solution that builds the lists, zips them into a dict, and calls json.dumps. And in the guide's own example, it forgets import json. The code looks right, reads right, and crashes on line one.
That tiny failure is the whole lesson. Generated code fails in boring, mechanical ways: missing imports, hallucinated method names, calls written against the wrong version of an API. Modern models miss imports less often, but they still regularly mix up library versions (writing Pydantic v1 syntax in a v2 project is the classic). The fix is not a better prompt. It is a verification step: run the code, lint it, or at minimum import-check it before it goes anywhere. If you want the model itself to do more of the verification, the PAL pattern takes this to its logical end by making code execution part of the reasoning loop.
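A verification step can be very small. The sketch below is one way to do it, not a standard tool: `check_generated_code` is a hypothetical helper that first syntax-checks a snippet with the built-in `compile()`, then runs it in a fresh interpreter via `subprocess.run`. The fresh process matters, because a missing `import json` would be silently masked if you `exec` the snippet inside a process that already imported `json`. Note that a subprocess is not a sandbox; only run generated code this way if you would be comfortable running it yourself, otherwise use a container.

```python
import subprocess
import sys


def check_generated_code(source: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Syntax-check, then execute, a generated Python snippet in a fresh interpreter.

    Returns (ok, message). Not a sandbox: the snippet runs with your permissions.
    """
    # Step 1: compile() catches syntax errors without executing anything.
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError as exc:
        return False, f"SyntaxError: {exc}"

    # Step 2: actually run it. Missing imports only surface at runtime, as NameError.
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"

    if result.returncode != 0:
        stderr_lines = result.stderr.strip().splitlines()
        # The last traceback line names the failure, e.g. "NameError: name 'json' is not defined".
        return False, stderr_lines[-1] if stderr_lines else f"exit code {result.returncode}"
    return True, "ok"
```

Fed the movie snippet without its import, this returns `False` with a message starting `NameError` (newer Python versions append a "Did you forget to import" hint to that line).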
The SQL workflow: generate the test rig too
The most useful idea in the chapter is not a prompt, it is a workflow. Say you want a query generated from a schema description:
"""
Table departments, columns = [DepartmentId, DepartmentName]
Table students, columns = [DepartmentId, StudentId, StudentName]
Create a MySQL query for all students in the Computer Science department
"""You get a clean join:
```sql
SELECT students.StudentId, students.StudentName
FROM students
INNER JOIN departments
ON students.DepartmentId = departments.DepartmentId
WHERE departments.DepartmentName = 'Computer Science';
```

But you have no database to test it against. So you use the model to build one. Same schema description, new instruction: "Create a valid database schema with the above tables and columns." Then a third prompt: "Given the database schema above, generate valid insert statements include 4 rows for each table." Load it all into a scratch database and run the original query. If the dummy data includes two computer science students, the query should return exactly those two rows, and now you have actually verified it.
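Here is a minimal sketch of that rig in Python. Two substitutions to flag: the guide targets MySQL, but this uses SQLite through Python's built-in `sqlite3` module as the scratch database, which works because the query uses only standard SQL; and the schema and insert statements are written by hand to stand in for what the second and third prompts would return, so the column types and the dummy names are assumptions.

```python
import sqlite3

# Stand-in for the second prompt's output: the schema (column types are assumptions).
SCHEMA = """
CREATE TABLE departments (
    DepartmentId   INTEGER PRIMARY KEY,
    DepartmentName TEXT NOT NULL
);
CREATE TABLE students (
    DepartmentId INTEGER REFERENCES departments(DepartmentId),
    StudentId    INTEGER PRIMARY KEY,
    StudentName  TEXT NOT NULL
);
"""

# Stand-in for the third prompt's output: 4 rows per table, exactly two CS students.
INSERTS = """
INSERT INTO departments VALUES (1, 'Computer Science'), (2, 'Mathematics'),
                               (3, 'Physics'), (4, 'History');
INSERT INTO students VALUES (1, 101, 'Ada'), (2, 102, 'Carl'),
                            (1, 103, 'Grace'), (3, 104, 'Marie');
"""

# The query under test, copied verbatim from the first prompt's output.
QUERY = """
SELECT students.StudentId, students.StudentName
FROM students
INNER JOIN departments
ON students.DepartmentId = departments.DepartmentId
WHERE departments.DepartmentName = 'Computer Science';
"""

conn = sqlite3.connect(":memory:")  # scratch database, discarded when closed
conn.executescript(SCHEMA)
conn.executescript(INSERTS)
rows = conn.execute(QUERY).fetchall()
conn.close()

# We planted exactly two CS students; anything else means the query is wrong.
# Sorting because the query has no ORDER BY, so row order is not guaranteed.
assert sorted(rows) == [(101, "Ada"), (103, "Grace")], rows
print(rows)
```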
I use this generate-the-rig pattern well beyond SQL: generate the function, then generate the pytest cases, then run them. The model writes both sides, and execution is the referee. It is not bulletproof (the model can make the same mistake on both sides, and matching errors pass), but in my experience it catches most mechanical failures for very little effort.
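A minimal sketch of the same loop for a plain function. The two strings are hypothetical model outputs (the `word_count` function and its tests are made up for illustration). Normally you would write them to files and run `pytest`; to keep the sketch dependency-free, `run_generated_tests` stands in for pytest by collecting every top-level `test_*` function and calling it. Like the earlier checker, `exec` is not a sandbox.

```python
# Hypothetical model output #1: the function under test.
GENERATED_FUNCTION = '''
def word_count(text):
    """Count whitespace-separated words in text."""
    return len(text.split())
'''

# Hypothetical model output #2: pytest-style tests written from the same spec.
GENERATED_TESTS = '''
def test_simple_sentence():
    assert word_count("the quick brown fox") == 4

def test_empty_string():
    assert word_count("") == 0

def test_extra_whitespace():
    assert word_count("  spaced   out  ") == 2
'''


def run_generated_tests(function_src: str, tests_src: str) -> dict[str, str]:
    """Run the generated tests against the generated function; report each outcome.

    A dependency-free stand-in for pytest: it calls every top-level test_* function.
    """
    namespace: dict = {}
    exec(function_src, namespace)  # define the function under test
    exec(tests_src, namespace)     # define the tests alongside it
    results = {}
    for name, obj in list(namespace.items()):
        if name.startswith("test_") and callable(obj):
            try:
                obj()
                results[name] = "passed"
            except AssertionError:
                results[name] = "FAILED"
            except Exception as exc:  # e.g. NameError from a hallucinated helper
                results[name] = f"ERROR: {type(exc).__name__}"
    return results


if __name__ == "__main__":
    for test, outcome in run_generated_tests(GENERATED_FUNCTION, GENERATED_TESTS).items():
        print(f"{test}: {outcome}")
```

Swap `text.split()` for `text.split(' ')` in the function string and the whitespace and empty-string tests fail, which is exactly the kind of mechanical slip this loop exists to catch.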
Give the schema, not the whole database. Including table and column names in the prompt grounds the query in reality. Most bad generated SQL comes from the model guessing at column names you never told it.
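You do not have to type the schema description by hand. A minimal sketch, assuming a SQLite database: `describe_schema` is a hypothetical helper that reads table names from `sqlite_master` and column names from `PRAGMA table_info`, then emits them in the same "Table X, columns = [...]" format as the prompt above. The model sees names, never rows.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Render table and column names (no row data) in the prompt format used above."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the table name as an identifier before splicing it into the PRAGMA.
        quoted = '"' + table.replace('"', '""') + '"'
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk), in column order.
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]
        lines.append(f"Table {table}, columns = [{', '.join(columns)}]")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.executescript(
        "CREATE TABLE departments (DepartmentId INTEGER, DepartmentName TEXT);"
        "CREATE TABLE students (DepartmentId INTEGER, StudentId INTEGER, StudentName TEXT);"
    )
    prompt = describe_schema(demo) + (
        "\nCreate a MySQL query for all students in the Computer Science department"
    )
    print(prompt)
```

Against MySQL itself you would read the same names from `information_schema.columns`; the principle is identical.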
Explanation is a first-class use
Prompting a model to explain code is underrated, and it is the same skill in reverse. Paste the query above and ask "Explain the above SQL statement" and you get a correct plain-English walkthrough of the join. Two practical notes. First, explanation quality degrades when the system prompt fights the request (the Python-only assistant grumbled before explaining SQL). Second, explanations are a good code review primitive: asking "explain what this does" and comparing the answer to what you intended catches logic drift that a lint pass never will.
What has changed since these examples were written
The chapter still lists "editing code" and "debugging code" as coming soon, which is a period piece in itself: those became the headline use cases. Three updates worth making explicit in 2026:
| Then (2023) | Now (2026) |
|---|---|
| Paste code into a chat window | Agents read the repo themselves and edit in place |
| Prompt for one function at a time | Prompt for a change across many files, with tests |
| Trust, then eyeball the output | Execution, tests, and linters in the loop by default |
What has not changed: the model optimizes for plausible, not correct. Every pattern above that survived three years survived because it adds a source of ground truth (a schema, a test rig, an execution result) to a system that has none of its own. If you take one habit from this post, take that one: never let generated code merge on vibes.
For the prompting fundamentals underneath all of this, the structure of a good instruction is in the anatomy of a good prompt, and when examples help versus hurt is in zero-shot vs few-shot prompting.

Folarin Akinloye is an AI Engineer based in London, UK. He builds production-ready agentic AI systems, multi-agent architectures, and sophisticated RAG implementations, and writes about the engineering decisions behind them.