
Claude API with Python: The Complete Beginner's Guide — From Setup to Streaming Output

A hands-on walkthrough for calling the Claude API in Python. Covers installation, your first API call, multi-turn conversations, and streaming output — with fully working code and real screenshots. No VPN or proxy required.

Dev Guides · claude api · python · Est. read: 10 min
Published 2026-04-05

Never used the Claude API before? This guide starts from scratch — environment setup, sending your first message, multi-turn conversations, and streaming output. Every step includes copy-paste-ready code that runs out of the box. No proxy or VPN needed.


What You’ll Learn

  • ✅ Installing and configuring the Anthropic Python SDK
  • ✅ Making your first API request — with real terminal screenshots
  • ✅ Setting a System Prompt to give Claude a custom persona
  • ✅ Building multi-turn conversations with context (3 live demo rounds)
  • ✅ Streaming output for a typewriter-style UX
  • ✅ FastAPI + SSE integration — full frontend & backend code included
  • ✅ Production-grade error handling and best practices

All code in this guide has been tested locally. Screenshots show actual terminal output. Everything is ready to copy and run.


Table of Contents

  1. Prerequisites
  2. Step 1: Install the Anthropic SDK
  3. Step 2: Send Your First Message
  4. Step 3: Set an AI Role with System Prompts
  5. Step 4: Build Multi-Turn Conversations
  6. Step 5: Streaming Output
  7. Step 6: FastAPI + SSE Streaming Endpoint
  8. Step 7: Production-Ready Error Handling
  9. Model Selection Guide
  10. FAQ
Prerequisites

Before you start, make sure you have Python ≥ 3.8 and pip available.

# Check your Python version
python --version

# Check pip is available
pip --version
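If you prefer to gate your setup scripts on the interpreter version programmatically, the same check can be done in Python itself (a small convenience sketch, not part of the SDK):

```python
import sys

def check_python_version(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum

if not check_python_version():
    raise SystemExit("Python 3.8+ is required for the anthropic SDK.")
print(f"OK: Python {sys.version_info.major}.{sys.version_info.minor}")
```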

You’ll also need a Claude API key. Sign up at ClaudeAPI.com — access works from anywhere without a VPN, and you’ll get free credits on sign-up to run your first request within 5 minutes.


Step 1: Install the Anthropic SDK

pip install anthropic

Verify the installation:

python -c "import anthropic; print(anthropic.__version__)"
# Example output (your version may differ): 0.40.0

**Slow download speeds?** Try an alternative PyPI mirror:

pip install anthropic -i https://pypi.tuna.tsinghua.edu.cn/simple

Step 2: Send Your First Message

Create a new file called hello_claude.py, and add the following code:

import os
import anthropic

# Clear any proxy environment variables to avoid SSL conflicts
os.environ['HTTP_PROXY']  = ''
os.environ['HTTPS_PROXY'] = ''
os.environ['ALL_PROXY']   = ''
os.environ['http_proxy']  = ''
os.environ['https_proxy'] = ''
os.environ['all_proxy']   = ''

client = anthropic.Anthropic(
    api_key="your-api-key-here",          # Replace with your API key
    base_url="https://api.claudeapi.com", # ClaudeAPI relay endpoint — no VPN required
    timeout=60.0,
)

message = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain recursion in one sentence."}
    ]
)

print(message.content[0].text)

Run it:

python hello_claude.py
python hello_claude.py

Heads up: max_tokens is required by the Claude API — one of the biggest differences from OpenAI. Omitting it triggers a validation error (400/422). 1024 is a safe default for most use cases.


Step 3: Set an AI Role with System Prompts

The system parameter controls Claude’s persona and behavior. Unlike OpenAI, Claude treats system as a top-level parameter — it’s not passed inside the messages array.

message = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=2048,
    system="You are a senior Python engineer. Keep your code clean and concise. Always respond directly — no filler — and you must include a runnable code example.",
    messages=[
        {"role": "user", "content": "How do I read a large file without running out of memory?"}
    ]
)
print(message.content[0].text)

System Prompt Writing Tips

| Technique | Example | Effect |
|---|---|---|
| Define a role | "You are a senior {domain} engineer" | More focused, domain-appropriate responses |
| Specify output format | "Output code only, no explanations" | Less filler, precise output |
| Constrain response length | "Be concise, 100 words max" | Controls token usage and cost |
| Lock the language/framework | "Use Python 3.10+ syntax only" | Avoids outdated patterns |
| Ban filler phrases | "No apologies, no 'certainly' or 'of course'" | Cuts the fluff, gets straight to the point |
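These techniques compose naturally. A small helper (illustrative only, not part of the SDK) can assemble a system prompt from them:

```python
def build_system_prompt(role, output_format=None, max_words=None,
                        language=None, banned_phrases=None):
    """Assemble a system prompt from the techniques in the table above.
    Every argument other than `role` is an optional constraint."""
    parts = [f"You are a senior {role} engineer."]
    if output_format:
        parts.append(output_format)
    if max_words:
        parts.append(f"Be concise, {max_words} words max.")
    if language:
        parts.append(f"Use {language} only.")
    if banned_phrases:
        quoted = ", ".join(f'"{p}"' for p in banned_phrases)
        parts.append(f"No apologies; never use {quoted}.")
    return " ".join(parts)

prompt = build_system_prompt(
    "Python",
    output_format="Output code only, no explanations.",
    max_words=100,
    language="Python 3.10+ syntax",
    banned_phrases=["certainly", "of course"],
)
print(prompt)
```

Pass the result as the top-level system parameter, exactly as in the example above.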

Step 4: Build Multi-Turn Conversations

The Claude API is stateless — there’s no built-in memory. For multi-turn conversations, you need to manually pass the full message history on every request. The only rule: user and assistant turns must strictly alternate.

history = []

def chat(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=2048,
        system="You are a professional coding assistant.",
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

print("=== Round 1 ===")
print(chat("Write me a function to calculate the Fibonacci sequence."))
print("\n=== Round 2 ===")
print(chat("Now update it to use memoization to avoid redundant calculations."))
print("\n=== Round 3 ===")
print(chat("Add input validation — negative numbers should raise an exception."))
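Because the API rejects histories that don't alternate strictly, a small guard (illustrative, not part of the SDK) can catch mistakes before a request is sent:

```python
def validate_history(messages):
    """Raise ValueError unless messages start with a 'user' turn and
    strictly alternate user/assistant roles."""
    if not messages:
        raise ValueError("history is empty")
    if messages[0]["role"] != "user":
        raise ValueError("history must start with a user message")
    for prev, cur in zip(messages, messages[1:]):
        if prev["role"] == cur["role"]:
            raise ValueError(f"two consecutive {cur['role']} messages")
    return True

# A well-formed two-turn history passes:
validate_history([
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
])
```

Calling this right before client.messages.create turns a confusing API error into a clear local exception.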

Managing History Length (Preventing Token Limit Errors)

Token usage grows linearly with conversation length. Add a simple cap to keep things under control:

MAX_HISTORY = 20  # Keep only the most recent 20 messages

def chat(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    trimmed = history[-MAX_HISTORY:]
    # After trimming, the history must still start with a user turn,
    # or the API rejects the request
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=2048,
        messages=trimmed,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
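One subtlety worth testing: a naive history[-N:] slice with an even N can leave the trimmed history starting with an assistant turn, which the API rejects. A standalone helper (names are illustrative) makes the fix easy to verify offline:

```python
def trim_history(history, max_messages=20):
    """Keep the most recent messages, then drop any leading assistant
    turns so the trimmed history still starts with a user message."""
    trimmed = history[-max_messages:]
    while trimmed and trimmed[0]["role"] != "user":
        trimmed.pop(0)
    return trimmed

# Build 21 alternating messages ending with a user turn
history = []
for i in range(10):
    history.append({"role": "user", "content": f"q{i}"})
    history.append({"role": "assistant", "content": f"a{i}"})
history.append({"role": "user", "content": "q10"})

# Slicing the last 20 would start on an assistant turn ("a0");
# the helper drops it so the result starts with "u1"
trimmed = trim_history(history, max_messages=20)
print(len(trimmed), trimmed[0]["role"])  # → 19 user
```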

Step 5: Streaming Output

By default, the API waits for the full response before returning anything. Streaming lets content render as it’s generated — giving users that familiar typewriter effect and a much snappier feel.

Basic Streaming

with client.messages.stream(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Implement bubble sort in Python and walk through each line with comments."}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print("\n\nStream complete.")

Streaming + Token Usage Tracking

with client.messages.stream(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Implement bubble sort in Python and walk through each line with comments."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()

print(f"\n\nToken usage — Input: {final.usage.input_tokens}, Output: {final.usage.output_tokens}")

Step 6: FastAPI + SSE Streaming Endpoint

Typewriter effect on the frontend, SSE push from the backend: this is the most common production pattern for real-time AI responses.

Install dependencies:

pip install fastapi uvicorn

Backend (save as main.py):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from fastapi.middleware.cors import CORSMiddleware
import anthropic

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"], allow_methods=["*"], allow_headers=["*"]
)

client = anthropic.Anthropic(
    api_key="your-api-key-here",
    base_url="https://api.claudeapi.com",
    timeout=60.0,
)

@app.get("/chat")
async def chat_stream(q: str, system: str = "You are a professional assistant."):
    def generate():
        with client.messages.stream(
            model="claude-haiku-4-5-20251001",
            max_tokens=2048,
            system=system,
            messages=[{"role": "user", "content": q}],
        ) as stream:
            for text in stream.text_stream:
                yield f"data: {text}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={"X-Accel-Buffering": "no"},
    )

# Start the server:
# uvicorn main:app --reload --port 8000

Frontend — consuming the stream (JavaScript):

const source = new EventSource(
    `/chat?q=${encodeURIComponent('Implement bubble sort in Python')}`
);
source.onmessage = (event) => {
    if (event.data === '[DONE]') { source.close(); return; }
    document.getElementById('output').textContent += event.data;
};
source.onerror = () => source.close();

Key config: X-Accel-Buffering: no prevents reverse proxies like Nginx from buffering SSE responses. Without it, chunks pile up and get sent all at once, killing the typewriter effect.
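One more caveat with the raw data: {text} format: model chunks can contain newlines, and a newline inside a chunk splits one SSE frame into several, mangling the text on the client. A common workaround (a sketch, not mandated by the SSE spec) is to JSON-encode each chunk on the server and JSON.parse(event.data) on the client:

```python
import json

def sse_frame(chunk: str) -> str:
    """Encode one text chunk as a single SSE data frame. JSON-encoding
    escapes embedded newlines so the chunk stays in one frame."""
    return f"data: {json.dumps(chunk)}\n\n"

frame = sse_frame("line one\nline two")
print(frame)
```

With this scheme, the yield line in the generator becomes yield sse_frame(text), and the frontend appends JSON.parse(event.data) instead of event.data.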


Step 7: Production-Ready Error Handling

Three things you must do before going live: environment variable management (no hardcoded keys), automatic retries (built into the SDK), and typed error handling.

Standard Production Setup

import anthropic, os

client = anthropic.Anthropic(
    api_key=os.environ["ANTHROPIC_API_KEY"],
    base_url=os.environ.get("ANTHROPIC_BASE_URL", "https://api.claudeapi.com"),
    max_retries=3,    # Auto-handles 429s and 5xx errors with exponential backoff
    timeout=60.0,
)

def chat(prompt: str, system: str = "") -> str:
    kwargs = {
        "model": "claude-haiku-4-5-20251001",
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": prompt}],
    }
    if system:
        kwargs["system"] = system
    try:
        response = client.messages.create(**kwargs)
        return response.content[0].text
    except anthropic.AuthenticationError:
        raise ValueError("Invalid API key — check your environment variables")
    except anthropic.RateLimitError as e:
        raise RuntimeError(f"Rate limit exceeded: {e}") from e
    except anthropic.BadRequestError as e:
        raise ValueError(f"Invalid request parameters: {e}") from e
    except anthropic.APIStatusError as e:
        raise RuntimeError(f"API error {e.status_code}: {e.message}") from e

.env file (add it to .gitignore; never commit it):

ANTHROPIC_API_KEY=your-api-key-here
ANTHROPIC_BASE_URL=https://api.claudeapi.com

Error Code Reference

| Status Code | Meaning | How to Handle |
|---|---|---|
| 401 | Invalid API key | Verify the key in your environment variables |
| 400/422 | Malformed request | Check that max_tokens is present and messages is well-formed |
| 403 | No access to this model | Confirm your account's permission tier |
| 429 | Rate limit exceeded | SDK retries automatically; or upgrade your quota plan |
| 529 | Service overloaded | Wait briefly; the SDK retries automatically |
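For context on what "retries automatically" means: the SDK backs off exponentially between attempts. The schedule below is an illustrative approximation with assumed base/cap values, not the SDK's exact internal algorithm:

```python
import random

def backoff_delays(max_retries=3, base=0.5, cap=8.0, seed=None):
    """Illustrative exponential backoff: base * 2**attempt seconds,
    capped at `cap`, with up to 25% random jitter added so that
    concurrent clients don't retry in lockstep."""
    rng = random.Random(seed)
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        delays.append(delay + rng.uniform(0, delay * 0.25))
    return delays

print(backoff_delays(3, seed=42))
```

If you layer your own retry loop on top of max_retries, make sure the two don't multiply into very long total wait times.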

Model Selection Guide

| Model ID | Best For | Speed | Cost |
|---|---|---|---|
| claude-haiku-4-5-20251001 | Classification, translation, summarization, Q&A, bulk generation | Fastest | Lowest |
| claude-sonnet-4-6 | Code generation, writing, analysis, everyday dev tasks | Fast | Medium |
| claude-opus-4-6 | Complex reasoning, hard tasks, long-document understanding | Slower | Highest |

Tip: Use claude-haiku-4-5-20251001 across the board during prototyping to keep costs low. Switch to Sonnet or Opus only after validating your use case.
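When comparing models, a rough cost estimate from the usage numbers returned in Step 5 helps. The prices below are placeholders, not real rates; substitute the current numbers from your provider's pricing page:

```python
def estimate_cost(input_tokens, output_tokens,
                  price_in_per_mtok, price_out_per_mtok):
    """Dollar cost of one request, given per-million-token prices."""
    return (input_tokens * price_in_per_mtok
            + output_tokens * price_out_per_mtok) / 1_000_000

# Hypothetical prices: $1/MTok input, $5/MTok output
cost = estimate_cost(10_000, 2_000, 1.0, 5.0)
print(f"${cost:.4f}")  # → $0.0200
```

Feed it final.usage.input_tokens and final.usage.output_tokens from a streamed response to track spend per request.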


FAQ

Q: Is max_tokens required?

Yes — unlike OpenAI, where it's optional, the Claude API returns a validation error (400/422) if you omit it. When in doubt, use 1024 for short tasks and 4096 for longer ones.

Q: Why can’t I put system inside the messages array?

That’s how the native Anthropic SDK is designed — system is a top-level parameter, separate from messages. If you’re using the OpenAI-compatible endpoint ( /v1/chat/completions ), you can keep the role: system syntax. ClaudeAPI.com supports both approaches.

Q: Token count keeps growing in multi-turn conversations — what should I do?

Cap your history length by keeping only the last N turns (see the MAX_HISTORY truncation pattern in Step 4). Alternatively, periodically ask Claude to summarize the conversation history, then replace the raw history with that summary.
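The summarize-and-replace strategy can be structured so the summarizer is injectable. Here a stub stands in for the real Claude call you would make with client.messages.create (all names are illustrative):

```python
def compact_history(history, summarize, keep_recent=4):
    """Replace all but the most recent turns with a single summary
    exchange. `summarize` is any callable mapping a message list to a
    summary string (in production, a Claude call)."""
    if len(history) <= keep_recent:
        return list(history)
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [
        {"role": "user", "content": "Summarize our conversation so far."},
        {"role": "assistant", "content": summarize(older)},
    ] + recent

def fake_summarizer(messages):
    # Stand-in for a real summarization request
    return f"(summary of {len(messages)} earlier messages)"

history = [{"role": "user", "content": "q1"},
           {"role": "assistant", "content": "a1"},
           {"role": "user", "content": "q2"},
           {"role": "assistant", "content": "a2"},
           {"role": "user", "content": "q3"},
           {"role": "assistant", "content": "a3"}]
compacted = compact_history(history, fake_summarizer, keep_recent=2)
print(len(compacted))  # → 4
```

Keeping an even keep_recent that starts on a user turn preserves the strict user/assistant alternation the API requires.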

Q: What if a streaming response gets interrupted mid-way?

Catch anthropic.APIConnectionError in your generator and decide whether to retry or return whatever partial content was already received. For production, implement reconnection logic with timeout handling.

Q: Can I use the OpenAI SDK to call Claude API directly?

Yes. ClaudeAPI.com is fully compatible with the OpenAI Chat format — just swap base_url and api_key, change model to claude-haiku-4-5-20251001, and the rest of your code stays untouched.



Summary

| Step | Key Takeaway |
|---|---|
| Install | pip install anthropic |
| Basic call | client.messages.create(); max_tokens is required |
| System role | Top-level system parameter; don't put it inside messages |
| Multi-turn chat | Manually maintain the history list; strictly alternate user/assistant turns |
| Streaming | client.messages.stream() for typewriter-style output |
| Web integration | FastAPI + SSE; full frontend and backend code included |
| Production hardening | Environment variables + max_retries + typed error handling |

Run all the code in this guide without any proxy setup — just point base_url to ClaudeAPI.com and you’re good to go:

client = anthropic.Anthropic(
    api_key="your-claudeapi-key",
    base_url="https://api.claudeapi.com",  # Change only this line — everything else stays the same
)

ClaudeAPI.com supports both the native Anthropic format and the OpenAI-compatible format. All Claude models are available, pay-as-you-go with no subscription required.

Get started at claudeapi.com and make your first API call in under 5 minutes.


Written and maintained by the ClaudeAPI.com team. Last updated: April 2026.
