Module 07 — Claude API & SDK

Claude API & SDK — Graduate to Code

The CLI is powerful. The API is limitless. This module teaches you when to switch, how to make your first API call, and how to use the features that aren't available in the CLI — tool use, streaming, prompt caching, and batch processing.

~50 min
🏗 Project: AI writing assistant app
🔑 Requires API key

🤔 When to Use the CLI vs the API

Claude Code CLI is your default tool. The API is a step up — more power, more control, more code. Know when the upgrade is worth it.

Use the CLI when…
  • You're working interactively in a project
  • You need to read/write files, run shell commands
  • You're building workflows with hooks and agents
  • You want a conversation, not a one-shot call
  • The task is exploratory — you'll iterate
  • You need MCP server / plugin integrations
Use the API when…
  • You're building a product that calls Claude
  • You need streaming responses in a web UI
  • You need tool use / function calling
  • You're processing thousands of items (batch)
  • You want prompt caching for long system prompts
  • You need fine-grained token / cost control

The mental model: CLI = you talking to Claude. API = your app talking to Claude. If you're shipping something users will interact with, the API is almost always the right choice.

Good news: Everything you've learned in Modules 01–06 still applies. The API doesn't replace your CLAUDE.md workflow or your brief-first approach — it just gives you a programmable interface to the same underlying model.

API Capabilities at a Glance

| Feature | What it does | CLI equivalent? |
| --- | --- | --- |
| messages.create | Single-turn or multi-turn conversation | ✓ Yes (basic) |
| Tool use | Claude calls your functions with structured JSON | ~ Partial (via hooks) |
| Streaming | Tokens arrive as they're generated — live UI updates | ✓ Default in CLI |
| Prompt caching | Cache repeated system prompts — up to 90% cost reduction | ✗ Not available |
| Batch API | Process 1000s of prompts async — 50% cost discount | ✗ Not available |
| Token counting | Count tokens before sending — budget control | ✗ Not available |
| Model selection | Choose Opus, Sonnet, or Haiku per call | ✓ /model command |
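Token counting deserves a quick illustration. A minimal sketch: the budgeting helper is pure Python, and the count_tokens call (which needs the anthropic package and an API key in your environment) only runs when the key is present. The 4000-token limit is an arbitrary example threshold.

```python
import os

def fits_budget(input_tokens: int, max_input: int = 4000) -> bool:
    # Pure budgeting rule: decide before sending whether the prompt is too big.
    return input_tokens <= max_input

if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    client = anthropic.Anthropic()
    # Server-side count: nothing is generated, so this costs no output tokens
    count = client.messages.count_tokens(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": "Draft a release announcement."}],
    )
    print(f"{count.input_tokens} input tokens, fits: {fits_budget(count.input_tokens)}")
```

The same helper works for any policy: truncate, reject, or route oversized prompts to a summarization step first.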

🔧 SDK Setup

The Anthropic SDK is available in Python and TypeScript/JavaScript. Pick the language that matches your project — the API surface is nearly identical in both.

terminal
# Install the SDK
pip install anthropic

# Set your API key (get it at console.anthropic.com)
export ANTHROPIC_API_KEY="sk-ant-..."

# Or add to .env file (use python-dotenv)
echo 'ANTHROPIC_API_KEY=sk-ant-...' >> .env
python — verify setup
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello in one sentence."}]
)

print(message.content[0].text)
Never hardcode your API key. Always read it from an environment variable or a .env file. Add .env to your .gitignore immediately — a leaked key gets invalidated and can rack up charges before you notice.

Which model to use?

  • claude-opus-4-7 — Most intelligent. Best for complex reasoning, nuanced writing, difficult code. Highest cost.
  • claude-sonnet-4-6 — Best balance. Excellent quality at moderate speed and cost. Default for most apps.
  • claude-haiku-4-5-20251001 — Fastest and cheapest. Great for classification, extraction, high-volume tasks.
Pattern: Use Haiku for cheap pre-processing (classify, filter, extract) and Sonnet for the final step (write, reason, synthesize). This combination cuts cost by 60–80% for pipeline-heavy apps.
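As a sketch of that pattern (the routing helper and model IDs follow the list above; the gated API calls are illustrative, not a production pipeline):

```python
import os

# Routing rule: cheap model for triage, stronger model only for the final step
def pick_model(task: str) -> str:
    cheap = {"classify", "filter", "extract", "tag"}
    return "claude-haiku-4-5-20251001" if task in cheap else "claude-sonnet-4-6"

if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    client = anthropic.Anthropic()
    reviews = ["Love it!", "App crashes on login", "Meh"]

    # Stage 1 — Haiku triages each review (cheap, high volume)
    complaints = []
    for r in reviews:
        verdict = client.messages.create(
            model=pick_model("classify"),
            max_tokens=4,
            messages=[{"role": "user", "content": f"Is this a complaint? yes/no:\n{r}"}],
        ).content[0].text.strip().lower()
        if verdict.startswith("yes"):
            complaints.append(r)

    # Stage 2 — Sonnet writes the synthesis (one call, quality matters)
    summary = client.messages.create(
        model=pick_model("synthesize"),
        max_tokens=300,
        messages=[{"role": "user",
                   "content": "Summarize these complaints:\n" + "\n".join(complaints)}],
    )
    print(summary.content[0].text)
```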

📡 Anatomy of an API Call

Every call to messages.create follows the same structure. Understanding each parameter lets you control exactly what Claude does.

  • model: Which Claude to use. Options: claude-sonnet-4-6, claude-opus-4-7, claude-haiku-4-5-20251001.
  • system: System prompt — Claude's persona and rules. Set tone, role, constraints. Cached with prompt caching for big cost savings.
  • messages: Conversation history. Array of {role: "user" | "assistant", content: "..."} objects. Build multi-turn by appending.
  • max_tokens: Max tokens in the response. Controls cost and length. 1024 is a safe default. Output stops if reached — not an error.
  • tools: Functions Claude can call (optional). Describe your functions in JSON schema. Claude returns structured calls for you to execute.

Multi-turn conversation pattern

python — multi-turn chat
import anthropic

client = anthropic.Anthropic()
history = []

def chat(user_message):
    history.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="You are a concise writing assistant. Keep replies under 3 sentences.",
        messages=history
    )
    assistant_msg = response.content[0].text
    history.append({"role": "assistant", "content": assistant_msg})
    return assistant_msg

# Start chatting
print(chat("Write a tagline for a SaaS invoicing app."))
print(chat("Make it shorter and punchier."))
print(chat("Now write 3 variations."))
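One caveat with this pattern: history grows every turn, and you pay for the whole list as input tokens on each call. A minimal trimming policy (the 20-message cap is an arbitrary example) drops the oldest turns in user/assistant pairs, so the list always starts with a user message as the Messages API requires:

```python
def trim_history(history: list, max_messages: int = 20) -> list:
    # Drop the oldest user/assistant pair until we're under the cap.
    # Removing in pairs keeps the list starting with a "user" message.
    while len(history) > max_messages:
        del history[:2]
    return history
```

Call it on `history` before each `client.messages.create` to keep per-turn cost bounded.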

🔨 Tool Use — Claude Calls Your Functions

Tool use (also called function calling) lets Claude trigger structured actions in your code. Instead of asking Claude to return JSON you parse manually, you define a function schema and Claude returns a precise, typed call you execute.

  • 📝 You define tools: JSON schema describing function name + parameters
  • 🤖 Claude decides: returns a tool_use block with arguments
  • ⚙️ You execute: run the function, get the result
  • 💬 Claude responds: uses the result to form the final answer
python — tool use example
import anthropic, json

client = anthropic.Anthropic()

# 1. Define the tool Claude can call
tools = [{
    "name": "get_word_count",
    "description": "Count words in a piece of text",
    "input_schema": {
        "type": "object",
        "properties": {
            "text": {"type": "string", "description": "Text to count"}
        },
        "required": ["text"]
    }
}]

# 2. Your actual function
def get_word_count(text):
    return {"count": len(text.split()), "characters": len(text)}

# 3. Send request — Claude may call the tool
messages = [{"role": "user",
             "content": "How many words are in: 'The quick brown fox jumps over the lazy dog'?"}]
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    tools=tools,
    messages=messages
)

# 4. Handle the tool call if Claude decided to use it
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    result = get_word_count(**tool_call.input)

    # 5. Append Claude's turn and the tool result, then ask it to answer
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": [{
        "type": "tool_result",
        "tool_use_id": tool_call.id,
        "content": json.dumps(result)
    }]})
    final = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        tools=tools,
        messages=messages
    )
    print(final.content[0].text)
Real-world tool use patterns: search the web, query a database, call an external API, write/read files, send a notification. Claude decides when to call the tool — you decide what the tool does.
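The example above handles exactly one round trip, but Claude may chain several tool calls in a row. Production code usually runs a loop until stop_reason is no longer "tool_use". A sketch (the dispatch registry is a hypothetical pattern, reusing get_word_count from above; the API loop runs only when a key is set):

```python
import json
import os

# Hypothetical registry: tool name -> local Python function
TOOL_IMPLS = {
    "get_word_count": lambda text: {"count": len(text.split()), "characters": len(text)},
}

def dispatch(name: str, args: dict) -> str:
    # Run the matching local function and serialize the result for Claude
    return json.dumps(TOOL_IMPLS[name](**args))

if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    client = anthropic.Anthropic()
    tools = [{
        "name": "get_word_count",
        "description": "Count words in a piece of text",
        "input_schema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    }]
    messages = [{"role": "user", "content": "How many words in 'hello brave new world'?"}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6", max_tokens=512, tools=tools, messages=messages)
        if response.stop_reason != "tool_use":
            break  # Claude produced a final text answer
        # Echo Claude's turn, then answer every tool call it made
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": b.id, "content": dispatch(b.name, b.input)}
            for b in response.content if b.type == "tool_use"
        ]})
    print(response.content[0].text)
```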

⚡ Streaming — Live Token-by-Token Output

Without streaming, your app waits silently until Claude finishes, then shows the full response at once. With streaming, tokens arrive as they're generated — your UI feels instant, even for long outputs.

WITHOUT STREAMING
  • User sees nothing for 3–10 seconds
  • Full response appears all at once
  • Feels slow even when it isn't
  • OK for backend pipelines, bad for UI
WITH STREAMING
  • First tokens appear in <1 second
  • Text builds in real time like Claude Code CLI
  • Feels fast and responsive
  • Essential for any user-facing chat UI
python — streaming
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short poem about shipping software."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # prints each chunk as it arrives

    # Still inside the with block: access the final complete message
    final_message = stream.get_final_message()

print(f"\n\nTotal tokens: {final_message.usage.input_tokens + final_message.usage.output_tokens}")
In a web app: pipe the stream through a Server-Sent Events (SSE) endpoint. Your frontend reads chunks with EventSource or fetch with a ReadableStream and appends tokens to a <div>. This is exactly how claude.ai's chat interface works.
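A sketch of the server side of that idea: a pure generator that wraps text chunks in SSE framing. (The "[DONE]" sentinel is a common convention, not part of the SSE spec.) You would feed it stream.text_stream and return it from your endpoint with Content-Type: text/event-stream.

```python
def to_sse(chunks):
    """Wrap text chunks as Server-Sent Events frames."""
    for chunk in chunks:
        # An SSE event is one or more "data:" lines ended by a blank line;
        # splitting on newlines keeps multi-line chunks valid SSE.
        for line in chunk.split("\n"):
            yield f"data: {line}\n"
        yield "\n"
    yield "data: [DONE]\n\n"  # conventional end-of-stream marker
```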

💾 Prompt Caching — Cut Costs by Up to 90%

If your app sends the same long system prompt (documentation, codebase context, instructions) with every request, you're paying to re-process that content every time. Prompt caching stores it at Anthropic — you pay 10% of normal input cost on cache hits.

Without caching — every request
  • System prompt (2000 tokens): $0.003
  • System prompt (2000 tokens): $0.003
  • System prompt (2000 tokens): $0.003
  • × 1000 requests = $3.00 just for the system prompt
With caching — cache hit after first
  • System prompt (2000 tokens), cache write: $0.00375
  • System prompt, cache hit: $0.0003
  • System prompt, cache hit: $0.0003
  • × 1000 requests = $0.30 (90% savings)
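The arithmetic generalizes. Using the same multipliers (cache writes cost 1.25× the base input price, cache reads 0.1×), a small helper estimates your savings, assuming every request after the first lands a cache hit:

```python
def cached_prompt_cost(base_cost: float, n_requests: int) -> float:
    # base_cost: normal input cost of the system prompt for one request.
    # First request writes the cache at 1.25x; the rest read it at 0.1x.
    if n_requests == 0:
        return 0.0
    return base_cost * 1.25 + (n_requests - 1) * base_cost * 0.10

# Figures from the comparison: $0.003 per 2000-token prompt, 1000 requests
print(cached_prompt_cost(0.003, 1000))  # ~0.30, vs 3.00 uncached
```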
python — prompt caching
import anthropic

client = anthropic.Anthropic()

# Large system prompt (e.g. documentation, codebase, instructions)
SYSTEM_PROMPT = """You are an expert technical writer...
[imagine 2000+ tokens of detailed instructions here]
"""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"}  # ← this is the only change
    }],
    messages=[{"role": "user", "content": "Summarize the authentication section."}]
)

# Check if cache was used
usage = response.usage
print(f"Cache write: {usage.cache_creation_input_tokens}")
print(f"Cache read:  {usage.cache_read_input_tokens}")
Cache TTL is 5 minutes. The cache expires if no request hits it for 5 minutes. For apps with low traffic or infrequent use, caching won't help much — the first request after each gap always pays the full write cost.

📦 Batch API — Process Thousands Async

Need to run Claude on 500 support tickets, 10,000 product descriptions, or a month of commit messages? Don't loop through them in real time. Use the Batch API: submit all requests at once, get results when they're ready, pay 50% less.

  • Cost: 50% discount vs standard API pricing — no other change required
  • Throughput: Up to 100,000 requests per batch
  • Latency: Results typically within 1 hour, up to 24 hours max
  • Use case: Offline processing, data pipelines, bulk generation — not real-time user interactions
python — batch API
import anthropic
import time

client = anthropic.Anthropic()

# Build a list of requests — each has a custom_id for tracking
tickets = ["Login fails on Safari", "Export button does nothing", "Billing page 500 error"]

requests = [
    {
        "custom_id": f"ticket-{i}",
        "params": {
            "model": "claude-haiku-4-5-20251001",
            "max_tokens": 128,
            "messages": [{
                "role": "user",
                "content": f"Classify this bug report as Critical/High/Medium/Low. Reply with one word.\n\n{ticket}"
            }]
        }
    }
    for i, ticket in enumerate(tickets)
]

# Submit — returns immediately with a batch ID
batch = client.messages.batches.create(requests=requests)
print(f"Batch submitted: {batch.id} — status: {batch.processing_status}")

# Poll until done (in production, use a webhook or cron job)
while batch.processing_status == "in_progress":
    time.sleep(10)
    batch = client.messages.batches.retrieve(batch.id)

# Retrieve results — check each one succeeded before reading the message
for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        severity = result.result.message.content[0].text
        print(f"{result.custom_id}: {severity}")
When to use Haiku for batch: Classification, tagging, extraction, summarization, sentiment — anything where you need a quick structured answer. Haiku + Batch = the cheapest possible way to run Claude at scale.

🎯 Challenge

Each task below targets a specific API concept. Complete them in order — each one builds on the last.

  • 1
    First call. Install the Anthropic SDK for your preferred language. Write a script that sends one message asking Claude to list 5 use cases for the API (not the CLI). Print the response. Verify you see output.
  • 2
    Multi-turn chat. Extend your script into a 3-turn conversation. First ask for a product name. Then ask Claude to write a one-line tagline for it. Then ask for 3 variations. Print each response as it comes back.
  • 3
    Stream it. Take your multi-turn script and make the final message (the 3 variations) stream to the terminal in real time. You should see tokens printing character-by-character, not all at once.
  • 4
    Add a tool. Define a format_as_list tool that takes an array of strings and returns them as a numbered markdown list. Ask Claude to generate 5 tagline ideas and call the tool to format them. Print the final formatted result.

🏗 Mini Project — AI Writing Assistant

Build a command-line AI writing assistant that uses streaming output, maintains conversation history, and has one tool — a save_draft function that writes the last Claude response to a file.

WHAT YOU'LL BUILD
  • 💬 Chat loop: multi-turn history
  • ⚡ Streaming: real-time output
  • 💾 save_draft tool: write to file
  • 💰 Token counter: cost per message
Step 1 — Create the project and install the SDK. Set up a folder, create a virtual environment, install anthropic, and store your API key.
Step 2 — Build the streaming chat loop. Write writer.py: a loop that reads user input, streams Claude's reply, and maintains conversation history.
Step 3 — Add the save_draft tool. Define a tool that lets Claude write the last response to a file when the user asks it to save.
Step 4 — Add prompt caching and update CLAUDE.md. Cache the system prompt to save on repeated calls, then document the project.