Claude prompts: 10 copy-ready engineering templates

Good Claude prompts do not come from inspiration; they come from patterns and composition. For the same task, a beginner may write a long, inefficient prompt, while an experienced user can stabilize the output with one XML structure and one prefill.

These patterns are not guesswork. They come from Anthropic’s prompting documentation and from production failures that turned into engineering rules. This guide collects 10 high-frequency, high-value prompt templates, each with a full copy-ready example and working Claude API code through ClaudeAPI’s gateway endpoint.

Use XML tags to separate context from instructions

Claude recognizes XML tags more reliably than plain line breaks or fenced code blocks when the prompt contains multiple parts. If your prompt mixes reference material, task instructions, and output requirements in one long paragraph, splitting them with XML tags can improve accuracy.

Bad:

Can you look at the code below and tell me what it does,
then tell me whether it has performance issues,
and also the function above that I asked you about earlier,
you said changing it to async would be enough...

Can you look at the code below and tell me what it does,
then tell me whether it has performance issues,
and also the function above that I asked you about earlier,
you said changing it to async would be enough...

Good:

<context>
We are building a high-concurrency scraping service with about 500 requests per second.
</context>

<code>
def fetch_all(urls):
    results = []
    for url in urls:
        results.append(requests.get(url).text)
    return results
</code>

<task>
1. Explain what this code does in one sentence
2. Identify the most serious performance issue
3. Provide a rewrite that fits the concurrency level in the context
</task>

<output_format>
Answer using the task's 1/2/3 numbering. Keep each point under 5 lines.
</output_format>

<context>
We are building a high-concurrency scraping service with about 500 requests per second.
</context>

<code>
def fetch_all(urls):
    results = []
    for url in urls:
        results.append(requests.get(url).text)
    return results
</code>

<task>
1. Explain what this code does in one sentence
2. Identify the most serious performance issue
3. Provide a rewrite that fits the concurrency level in the context
</task>

<output_format>
Answer using the task's 1/2/3 numbering. Keep each point under 5 lines.
</output_format>

API call:

import anthropic

client = anthropic.Anthropic(
    api_key="sk-your-ClaudeAPI-key",
    base_url="https://gw.claudeapi.com"
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt_with_xml}]
)

import anthropic

client = anthropic.Anthropic(
    api_key="sk-your-ClaudeAPI-key",
    base_url="https://gw.claudeapi.com"
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt_with_xml}]
)

Use semantic English tag names such as <context>, <task>, <rules>, and <examples>. Avoid generic tags like <a> or <b>.

Few-shot prompting: 3 examples beat 300 words of rules

When you want Claude to learn a small task with a specific format, such as entity extraction, text classification, or JSON transformation, examples usually matter more than verbal descriptions.

<task>
From a product review, extract the following fields and return JSON:
- sentiment: positive / negative / neutral
- mentioned_features: array
- has_complaint: bool
</task>

<examples>
<example>
<input>The battery life is good, but charging is too slow; it takes 3 hours.</input>
<output>{"sentiment": "neutral", "mentioned_features": ["battery life", "charging speed"], "has_complaint": true}</output>
</example>

<example>
<input>The display looks excellent, with vivid colors.</input>
<output>{"sentiment": "positive", "mentioned_features": ["display"], "has_complaint": false}</output>
</example>

<example>
<input>Support never replied to my message. Bad experience.</input>
<output>{"sentiment": "negative", "mentioned_features": ["support"], "has_complaint": true}</output>
</example>
</examples>

<user_input>
The camera quality is impressive, but startup is a bit slow. Overall, I am satisfied.
</user_input>

<task>
From a product review, extract the following fields and return JSON:
- sentiment: positive / negative / neutral
- mentioned_features: array
- has_complaint: bool
</task>

<examples>
<example>
<input>The battery life is good, but charging is too slow; it takes 3 hours.</input>
<output>{"sentiment": "neutral", "mentioned_features": ["battery life", "charging speed"], "has_complaint": true}</output>
</example>

<example>
<input>The display looks excellent, with vivid colors.</input>
<output>{"sentiment": "positive", "mentioned_features": ["display"], "has_complaint": false}</output>
</example>

<example>
<input>Support never replied to my message. Bad experience.</input>
<output>{"sentiment": "negative", "mentioned_features": ["support"], "has_complaint": true}</output>
</example>
</examples>

<user_input>
The camera quality is impressive, but startup is a bit slow. Overall, I am satisfied.
</user_input>

Three examples are the sweet spot. One example usually does not teach the pattern; ten examples waste tokens and can scatter attention. Choose examples that cover the important boundaries: positive, negative, and mixed cases.

Chain-of-thought: make the model reason before answering

For complex reasoning tasks, including math, logic, and multi-step analysis, asking directly for the answer increases the error rate. A simple version is to add “Let’s think step by step” at the end of the prompt.

A more engineering-friendly version is to lock the reasoning structure with XML:

<question>
A car travels from A to B in 4 hours at an average speed of 60 km/h. On the return trip along the same route, rain lowers the average speed to 40 km/h. What is the average speed for the round trip?
</question>

<instructions>
Answer using the following structure:

<thinking>
- Step 1: List the known conditions
- Step 2: Calculate the one-way distance
- Step 3: Calculate the return-trip time
- Step 4: Use total distance / total time to calculate the average speed
</thinking>

<answer>
Final answer in km/h, rounded to 1 decimal place
</answer>
</instructions>

<question>
A car travels from A to B in 4 hours at an average speed of 60 km/h. On the return trip along the same route, rain lowers the average speed to 40 km/h. What is the average speed for the round trip?
</question>

<instructions>
Answer using the following structure:

<thinking>
- Step 1: List the known conditions
- Step 2: Calculate the one-way distance
- Step 3: Calculate the return-trip time
- Step 4: Use total distance / total time to calculate the average speed
</thinking>

<answer>
Final answer in km/h, rounded to 1 decimal place
</answer>
</instructions>

For a more advanced setup, use Claude 4.7’s Extended Thinking mode. The model reasons inside dedicated thinking tokens, which can keep the final answer cleaner.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": question}]
)

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": question}]
)

Prefill: start the assistant response to lock the format

Anthropic’s API lets you include a role: assistant message as the model’s starting point. Claude continues from that text. This capability is underused.

Scenario 1: Force JSON output

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Return JSON with 3 interesting Python libraries, each containing name and desc."},
        {"role": "assistant", "content": "{\n  \"libraries\": ["}  # Make the model continue from here
    ]
)
print("{\n  \"libraries\": [" + response.content[0].text)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Return JSON with 3 interesting Python libraries, each containing name and desc."},
        {"role": "assistant", "content": "{\n  \"libraries\": ["}  # Make the model continue from here
    ]
)
print("{\n  \"libraries\": [" + response.content[0].text)

With this pattern, the model starts after { "libraries": [ and does not add preambles such as “Sure, here is the JSON you asked for.”

Scenario 2: Keep role consistency

{"role": "assistant", "content": "<legal_advisor_response>\n"}

{"role": "assistant", "content": "<legal_advisor_response>\n"}

The model will continue in the tone implied by the tag.

Scenario 3: Reduce false refusals

Sometimes the model replies with “I cannot help with that…” even when the request is compliant. For a valid request that triggered an overly defensive response, a prefill such as “Sure, I can help you analyze that” can reduce false refusals.

Use this only for compliant requests.

System prompts: put role, constraints, and style at the top

The system field is the highest-priority instruction channel. Put global constraints in system, and put one-off tasks in user.

SYSTEM = """You are a senior Python backend engineer specializing in FastAPI and high-concurrency optimization.

Response rules:
1. Provide directly runnable code only, not pseudocode
2. Avoid multi-line docstrings and filler comments
3. For version-specific details, assume FastAPI 0.115+ and Python 3.12+
4. If the user's request is clearly wrong, point it out directly and provide the correct approach
5. If you are unsure, say "I am not sure" instead of inventing an answer
"""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=SYSTEM,
    messages=[{"role": "user", "content": user_query}]
)

SYSTEM = """You are a senior Python backend engineer specializing in FastAPI and high-concurrency optimization.

Response rules:
1. Provide directly runnable code only, not pseudocode
2. Avoid multi-line docstrings and filler comments
3. For version-specific details, assume FastAPI 0.115+ and Python 3.12+
4. If the user's request is clearly wrong, point it out directly and provide the correct approach
5. If you are unsure, say "I am not sure" instead of inventing an answer
"""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=SYSTEM,
    messages=[{"role": "user", "content": user_query}]
)

Keep the system prompt under about 5,000 words. If it grows beyond that, you are probably putting task-specific material in system when it belongs in user.

Control output length

Claude tends to write at length. These three controls work best in this order.

A. Specify a word or line limit

Answer in no more than 100 words.

Answer in no more than 100 words.

B. Provide a structure

Answer using the following structure. Each item must be 1 sentence:
- Conclusion:
- Main reason:
- Recommended approach:

Answer using the following structure. Each item must be 1 sentence:
- Conclusion:
- Main reason:
- Recommended approach:

C. Combine prefill and `max_tokens`

messages=[
    {"role": "user", "content": "Summarize the article above in one sentence."},
    {"role": "assistant", "content": "One-sentence summary:"}
],
max_tokens=80

messages=[
    {"role": "user", "content": "Summarize the article above in one sentence."},
    {"role": "assistant", "content": "One-sentence summary:"}
],
max_tokens=80

The structure tells Claude where to go, and max_tokens prevents the response from drifting.

Role prompting: use a concrete identity instead of an abstract instruction

“Explain this in simple terms” is abstract. “Assume you are teaching a high-school student who has never programmed before” is concrete. The second version usually works much better.

Abstract instruction	Role-based prompt
“Explain this in simple terms”	“Assume you are teaching a high-school student who has never programmed before”
“Make the answer more professional”	“You are a senior tax advisor with 15 years of experience, and the reader is the CFO of a mid-sized company”
“Be rigorous”	“You are a peer reviewer evaluating a paper submitted to a top-tier conference”
“Keep it short”	“You are replying on X/Twitter, and the post must stay under 280 characters”

Role definitions are most stable when placed in the system field.

Treat the model like a worker: define when it should refuse

For correctness-sensitive tasks, such as financial calculations, legal judgment, and medical advice, add explicit refusal conditions.

When answering the following question, if any of the conditions below apply, say "I cannot answer this accurately" and stop:

- Specific data required for the calculation is missing
- The answer depends on a specific legal provision and you are not sure about the latest version
- The request involves a specific medical diagnosis
- The answer depends on facts you cannot verify

When answering the following question, if any of the conditions below apply, say "I cannot answer this accurately" and stop:

- Specific data required for the calculation is missing
- The answer depends on a specific legal provision and you are not sure about the latest version
- The request involves a specific medical diagnosis
- The answer depends on facts you cannot verify

This is one of the most effective ways to reduce fabricated answers. Anthropic has trained Claude to acknowledge uncertainty, but that behavior triggers more reliably when the prompt explicitly encourages it.

Long-document prompting: put the document first and the question last

Anthropic’s prompting guidance reports that, in long-context scenarios, placing the document at the beginning of the message and the question at the end improves accuracy by about 30% compared with the reverse order.

prompt = f"""<document>
{long_document}
</document>

Answer based on the document above: {question}

If the document does not contain relevant information, say "The document does not mention this" and do not guess.
"""

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    messages=[{"role": "user", "content": prompt}]
)

prompt = f"""<document>
{long_document}
</document>

Answer based on the document above: {question}

If the document does not contain relevant information, say "The document does not mention this" and do not guess.
"""

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    messages=[{"role": "user", "content": prompt}]
)

If the document is very long, such as 10,000+ words, and you plan to ask multiple questions, use prompt caching.

messages=[
    {"role": "user", "content": [
        {"type": "text", "text": f"<document>\n{long_document}\n</document>",
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": f"Question: {question}"}
    ]}
]

messages=[
    {"role": "user", "content": [
        {"type": "text", "text": f"<document>\n{long_document}\n</document>",
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": f"Question: {question}"}
    ]}
]

This keeps the stable document portion cacheable while allowing the question to change.

Multishot style control: teach the model your team’s voice

PR descriptions, commit messages, and docs all vary by team. The most effective way to align with a team’s style is to provide historical examples.

<task>
Write a PR description from the git diff below, matching the team's style.
</task>

<style_examples>
<example>
## What
Add retry logic to the payment webhook handler.

## Why
~3% of webhook calls were failing silently due to upstream timeouts.

## How
Exponential backoff, max 5 retries, with DLQ for permanent failures.

## Test
- Unit: tests/payments/test_webhook_retry.py
- Manual: triggered 50 mock failures locally
</example>

<example>
## What
Switch user search from LIKE to Postgres tsvector.

## Why
P99 latency on /api/users/search was 1.8s; reduces to <100ms.

## How
Added GIN index on `users.search_vector`; migration in 0042.

## Test
- Unit: existing search tests
- Benchmark: scripts/bench/users_search.py before/after
</example>
</style_examples>

<diff>
{git_diff_content}
</diff>

<task>
Write a PR description from the git diff below, matching the team's style.
</task>

<style_examples>
<example>
## What
Add retry logic to the payment webhook handler.

## Why
~3% of webhook calls were failing silently due to upstream timeouts.

## How
Exponential backoff, max 5 retries, with DLQ for permanent failures.

## Test
- Unit: tests/payments/test_webhook_retry.py
- Manual: triggered 50 mock failures locally
</example>

<example>
## What
Switch user search from LIKE to Postgres tsvector.

## Why
P99 latency on /api/users/search was 1.8s; reduces to <100ms.

## How
Added GIN index on `users.search_vector`; migration in 0042.

## Test
- Unit: existing search tests
- Benchmark: scripts/bench/users_search.py before/after
</example>
</style_examples>

<diff>
{git_diff_content}
</diff>

The output will naturally align to the What / Why / How / Test structure.

A reusable Claude prompt template

In production, a stable prompt is usually a combination of 3 to 5 patterns from this guide.

SYSTEM = """You are <role>, <constraints>.

Response rules:
1. <rule 1>
2. <rule 2>
3. <rule 3>
"""

USER_PROMPT = f"""
<context>
{background_information}
</context>

<examples>
<example>
<input>...</input>
<output>...</output>
</example>
</examples>

<task>
{specific_task}
</task>

<output_format>
{format_requirements}
</output_format>
"""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=SYSTEM,
    messages=[
        {"role": "user", "content": USER_PROMPT},
        {"role": "assistant", "content": "<format-locking prefill>"}
    ]
)

SYSTEM = """You are <role>, <constraints>.

Response rules:
1. <rule 1>
2. <rule 2>
3. <rule 3>
"""

USER_PROMPT = f"""
<context>
{background_information}
</context>

<examples>
<example>
<input>...</input>
<output>...</output>
</example>
</examples>

<task>
{specific_task}
</task>

<output_format>
{format_requirements}
</output_format>
"""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=SYSTEM,
    messages=[
        {"role": "user", "content": USER_PROMPT},
        {"role": "assistant", "content": "<format-locking prefill>"}
    ]
)

This skeleton works for roughly 80% of tasks. For the remaining 20%, add scenario-specific extensions such as chain-of-thought prompting, prompt caching, or tool use.

Calling Claude through ClaudeAPI.com

All code above can run through ClaudeAPI’s gateway endpoint by setting base_url to https://gw.claudeapi.com. For OpenAI-compatible calls, use https://gw.claudeapi.com/v1.

Model selection:

Pattern	Recommended model	Notes
Few-shot prompting / simple classification	`claude-haiku-4-5-20251001`	Lowest cost and fastest
Most daily tasks	`claude-sonnet-4-6`	Best overall balance
Chain-of-thought / complex reasoning / Extended Thinking	`claude-opus-4-7`	Highest reasoning quality

The ClaudeAPI console provides per-request usage details, which helps debug why a prompt became long or expensive.

Summary

Prompt engineering is not magic. Memorize these 10 patterns, then combine XML structure, few-shot examples, and prefill to solve most conversation-quality problems.

The three lines worth remembering are:

Structure beats description: XML tags work better than paragraph breaks.
Examples beat rules: 3 examples can outperform 300 words of explanation.
Prefill beats cleanup: make the model start from the right position instead of repairing the output afterward.

Get started

Use ClaudeAPI when you need production access to Claude features such as Extended Thinking, Prompt Caching, and Tool Use, with usage visibility and enterprise invoicing available. Get an API key.

Claude prompts: 10 copy-ready engineering templates