The CLI is powerful. The API is limitless. This module teaches you when to switch, how to make your first API call, and how to use the features that aren't available in the CLI — tool use, streaming, prompt caching, and batch processing.
⏱ ~50 min
🏗 Project: AI writing assistant app
🔑 Requires API key
🤔 When to Use the CLI vs the API
Claude Code CLI is your default tool. The API is a step up — more power, more control, more code. Know when the upgrade is worth it.
Use the CLI when…
You're working interactively in a project
You need to read/write files, run shell commands
You're building workflows with hooks and agents
You want a conversation, not a one-shot call
The task is exploratory — you'll iterate
You need MCP server / plugin integrations
Use the API when…
You're building a product that calls Claude
You need streaming responses in a web UI
You need tool use / function calling
You're processing thousands of items (batch)
You want prompt caching for long system prompts
You need fine-grained token / cost control
The mental model: CLI = you talking to Claude. API = your app talking to Claude. If you're shipping something users will interact with, the API is almost always the right choice.
Good news: Everything you've learned in Modules 01–06 still applies. The API doesn't replace your CLAUDE.md workflow or your brief-first approach — it just gives you a programmable interface to the same underlying model.
API Capabilities at a Glance
Feature
What it does
CLI equivalent?
messages.create
Single turn or multi-turn conversation
✓ Yes (basic)
Tool use
Claude calls your functions with structured JSON
~ Partial (via hooks)
Streaming
Tokens arrive as they're generated — live UI updates
✓ Default in CLI
Prompt caching
Cache repeated system prompts — up to 90% cost reduction
✗ Not available
Batch API
Process 1000s of prompts async — 50% cost discount
✗ Not available
Token counting
Count tokens before sending — budget control
✗ Not available
Model selection
Choose Opus, Sonnet, or Haiku per call
✓ /model command
🔧 SDK Setup
The Anthropic SDK is available in Python and TypeScript/JavaScript. Pick the language that matches your project — the API surface is nearly identical in both.
terminal
# Install the SDK
pip install anthropic
# Set your API key (get it at console.anthropic.com)
export ANTHROPIC_API_KEY="sk-ant-..."
# Or add to .env file (use python-dotenv)
echo 'ANTHROPIC_API_KEY=sk-ant-...' >> .env
python — verify setup
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello in one sentence."}]
)
print(message.content[0].text)
terminal
# Install the SDK
npm install @anthropic-ai/sdk
# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."
# Or use a .env file with dotenv
echo 'ANTHROPIC_API_KEY=sk-ant-...' >> .env
typescript — verify setup
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from env
const message = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 256,
  messages: [{ role: 'user', content: 'Say hello in one sentence.' }],
});
console.log(message.content[0].text);
Never hardcode your API key. Always read it from an environment variable or a .env file. Add .env to your .gitignore immediately — a leaked key gets invalidated and can rack up charges before you notice.
Which model to use?
claude-opus-4-7 — Most intelligent. Best for complex reasoning, nuanced writing, difficult code. Highest cost.
claude-sonnet-4-6 — Best balance. Excellent quality at moderate speed and cost. Default for most apps.
claude-haiku-4-5-20251001 — Fastest and cheapest. Great for classification, extraction, high-volume tasks.
Pattern: Use Haiku for cheap pre-processing (classify, filter, extract) and Sonnet for the final step (write, reason, synthesize). This combination cuts cost by 60–80% for pipeline-heavy apps.
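To see why this routing pattern pays, here is a back-of-envelope estimator. The per-million-token prices below are illustrative placeholders, not current pricing — always check Anthropic's pricing page:

```python
def estimate_cost(input_tokens, output_tokens, price_in_per_mtok, price_out_per_mtok):
    """Dollar cost of one call, given per-million-token prices."""
    return (input_tokens * price_in_per_mtok + output_tokens * price_out_per_mtok) / 1_000_000

# Illustrative prices only (dollars per million input/output tokens)
SONNET = (3.00, 15.00)
HAIKU = (0.25, 1.25)

# Pipeline: classify 1000 items (500 tokens in / 50 out each), then one synthesis call
all_sonnet = 1000 * estimate_cost(500, 50, *SONNET) + estimate_cost(50_000, 2_000, *SONNET)
mixed = 1000 * estimate_cost(500, 50, *HAIKU) + estimate_cost(50_000, 2_000, *SONNET)
print(f"all-Sonnet: ${all_sonnet:.2f}  mixed: ${mixed:.2f}  savings: {1 - mixed / all_sonnet:.0%}")
```

With these placeholder prices the mixed pipeline comes out roughly 85% cheaper, because the classification step dominates the token count and Haiku handles it for a fraction of the price.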
📡 Anatomy of an API Call
Every call to messages.create follows the same structure. Understanding each parameter lets you control exactly what Claude does.
system
System instructions (optional)
Set tone, role, constraints. Cached with prompt caching for big cost savings.
messages
Conversation history
Array of {role: "user" | "assistant", content: "..."} objects. Build multi-turn by appending.
max_tokens
Max tokens in the response
Controls cost and length. 1024 is a safe default. Output stops if reached — not an error.
tools
Functions Claude can call (optional)
Describe your functions in JSON schema. Claude returns structured calls for you to execute.
Multi-turn conversation pattern
python — multi-turn chat
import anthropic

client = anthropic.Anthropic()
history = []

def chat(user_message):
    history.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="You are a concise writing assistant. Keep replies under 3 sentences.",
        messages=history
    )
    assistant_msg = response.content[0].text
    history.append({"role": "assistant", "content": assistant_msg})
    return assistant_msg

# Start chatting
print(chat("Write a tagline for a SaaS invoicing app."))
print(chat("Make it shorter and punchier."))
print(chat("Now write 3 variations."))
typescript — multi-turn chat
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();
const history: Anthropic.MessageParam[] = [];

async function chat(userMessage: string) {
  history.push({ role: 'user', content: userMessage });
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    system: 'You are a concise writing assistant. Keep replies under 3 sentences.',
    messages: history,
  });
  const assistantMsg = response.content[0].text;
  history.push({ role: 'assistant', content: assistantMsg });
  return assistantMsg;
}
// Start chatting
console.log(await chat('Write a tagline for a SaaS invoicing app.'));
console.log(await chat('Make it shorter and punchier.'));
console.log(await chat('Now write 3 variations.'));
🔨 Tool Use — Claude Calls Your Functions
Tool use (also called function calling) lets Claude trigger structured actions in your code. Instead of asking Claude to return JSON you parse manually, you define a function schema and Claude returns a precise, typed call you execute.
📝
You define tools
JSON schema describing function name + parameters
→
🤖
Claude decides
Returns tool_use block with arguments
→
⚙️
You execute
Run the function, get the result
→
✅
Claude responds
Uses result to form final answer
python — tool use example
import anthropic, json

client = anthropic.Anthropic()

# 1. Define the tool Claude can call
tools = [{
    "name": "get_word_count",
    "description": "Count words in a piece of text",
    "input_schema": {
        "type": "object",
        "properties": {
            "text": {"type": "string", "description": "Text to count"}
        },
        "required": ["text"]
    }
}]

# 2. Your actual function
def get_word_count(text):
    return {"count": len(text.split()), "characters": len(text)}

# 3. Send request — Claude may call the tool
question = "How many words are in: 'The quick brown fox jumps over the lazy dog'?"
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    tools=tools,
    messages=[{"role": "user", "content": question}]
)

# 4. Handle tool call if Claude decided to use it
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    result = get_word_count(**tool_call.input)

    # 5. Send result back so Claude can answer
    final = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        tools=tools,
        messages=[
            {"role": "user", "content": question},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": [{
                "type": "tool_result",
                "tool_use_id": tool_call.id,
                "content": json.dumps(result)
            }]}
        ]
    )
    print(final.content[0].text)
typescript — tool use example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// 1. Define the tool Claude can call
const tools: Anthropic.Tool[] = [{
  name: 'get_word_count',
  description: 'Count words in a piece of text',
  input_schema: {
    type: 'object',
    properties: {
      text: { type: 'string', description: 'Text to count' }
    },
    required: ['text']
  }
}];

// 2. Your actual function
function getWordCount(text: string) {
  return { count: text.split(' ').length, characters: text.length };
}

// 3. Send request
const response = await client.messages.create({
  model: 'claude-sonnet-4-6', max_tokens: 512, tools,
  messages: [{ role: 'user', content: 'How many words in: "The quick brown fox..."?' }],
});

// 4. Handle tool call
if (response.stop_reason === 'tool_use') {
  const toolCall = response.content.find(b => b.type === 'tool_use') as Anthropic.ToolUseBlock;
  const result = getWordCount((toolCall.input as any).text);

  // 5. Send result back
  const final = await client.messages.create({
    model: 'claude-sonnet-4-6', max_tokens: 256, tools,
    messages: [
      { role: 'user', content: 'How many words in: "The quick brown fox..."?' },
      { role: 'assistant', content: response.content },
      { role: 'user', content: [{ type: 'tool_result', tool_use_id: toolCall.id, content: JSON.stringify(result) }] }
    ],
  });
  console.log(final.content[0].text);
}
Real-world tool use patterns: search the web, query a database, call an external API, write/read files, send a notification. Claude decides when to call the tool — you decide what the tool does.
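Once you support more than one tool, steps 3–4 of the loop become a dispatch problem: look up which function the tool_use block names and call it with the block's arguments. A minimal local sketch — the block is simulated here as a plain dict, whereas the real SDK returns typed objects with `.name` and `.input` attributes:

```python
TOOLS = {}

def tool(fn):
    """Register a function so incoming tool_use blocks can be routed to it."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_word_count(text):
    return {"count": len(text.split()), "characters": len(text)}

def dispatch(tool_use_block):
    """Look up the named tool and execute it with the block's arguments."""
    fn = TOOLS[tool_use_block["name"]]
    return fn(**tool_use_block["input"])

# Simulated block, shaped like the API's tool_use content
block = {"type": "tool_use", "name": "get_word_count",
         "input": {"text": "The quick brown fox"}}
print(dispatch(block))  # {'count': 4, 'characters': 19}
```

The registry pattern keeps the API loop generic: adding a tool means writing one function and one schema, with no changes to the dispatch code.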
⚡ Streaming — Live Token-by-Token Output
Without streaming, your app waits silently until Claude finishes, then shows the full response at once. With streaming, tokens arrive as they're generated — your UI feels instant, even for long outputs.
WITHOUT STREAMING
User sees nothing for 3–10 seconds
Full response appears all at once
Feels slow even when it isn't
OK for backend pipelines, bad for UI
WITH STREAMING
First tokens appear in <1 second
Text builds in real time like Claude Code CLI
Feels fast and responsive
Essential for any user-facing chat UI
python — streaming
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short poem about shipping software."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # prints each chunk as it arrives

    # After the loop: access the final complete message
    final_message = stream.get_final_message()

print(f"\n\nTotal tokens: {final_message.usage.input_tokens + final_message.usage.output_tokens}")
typescript — streaming (server-sent events)
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const stream = client.messages.stream({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Write a short poem about shipping software.' }],
});

// print each text chunk as it arrives
stream.on('text', (text) => process.stdout.write(text));

const finalMessage = await stream.finalMessage();
console.log(`\nTotal tokens: ${finalMessage.usage.input_tokens + finalMessage.usage.output_tokens}`);
In a web app: pipe the stream through a Server-Sent Events (SSE) endpoint. Your frontend reads chunks with EventSource or fetch with a ReadableStream and appends tokens to a <div>. This is exactly how claude.ai's chat interface works.
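As a sketch of that pipe, here is one way to wrap stream chunks in the SSE wire format before sending them to the browser. The `{"text": ...}` payload shape and the closing `done` event are arbitrary choices for illustration, not part of the SSE spec or the Anthropic SDK:

```python
import json

def to_sse(chunks):
    """Format text chunks as Server-Sent Events frames, with a final
    'done' event so the frontend knows the stream is complete."""
    for chunk in chunks:
        yield f"data: {json.dumps({'text': chunk})}\n\n"
    yield "event: done\ndata: {}\n\n"

# In a real endpoint, `chunks` would be the SDK's text stream
frames = list(to_sse(["Ship", "ping ", "software"]))
print(frames[0], end="")  # data: {"text": "Ship"}
```

On the frontend, an `EventSource` fires one message per `data:` frame; appending each `text` field to a `<div>` reproduces the token-by-token effect.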
💾 Prompt Caching — Cut Costs by Up to 90%
If your app sends the same long system prompt (documentation, codebase context, instructions) with every request, you're paying to re-process that content every time. Prompt caching stores it at Anthropic — you pay 10% of normal input cost on cache hits.
Without caching — every request
System prompt (2000 tokens): $0.003
System prompt (2000 tokens): $0.003
System prompt (2000 tokens): $0.003
× 1000 requests = $3.00 just for system prompt
With caching — cache hit after first
System prompt (2000 tokens) — cache write: $0.00375
System prompt — cache hit: $0.0003
System prompt — cache hit: $0.0003
× 1000 requests = $0.30 (90% savings)
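You can verify the comparison's arithmetic yourself. Using the $0.003 base figure from the table, a 1.25× premium on cache writes, and 0.1× pricing on cache hits (the multipliers Anthropic documents for ephemeral caching):

```python
BASE = 0.003           # processing the 2000-token system prompt once, uncached
WRITE = BASE * 1.25    # cache writes cost 25% more than normal input
HIT = BASE * 0.10      # cache hits cost 10% of normal input
n = 1000               # requests

uncached = n * BASE                 # pay full price every time
cached = WRITE + (n - 1) * HIT      # pay once to write, then hits
print(f"uncached: ${uncached:.2f}  cached: ${cached:.2f}  savings: {1 - cached / uncached:.0%}")
```

At 1000 requests the write premium is noise; the savings converge on the 90% hit discount.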
python — prompt caching
import anthropic

client = anthropic.Anthropic()

# Large system prompt (e.g. documentation, codebase, instructions)
SYSTEM_PROMPT = """You are an expert technical writer...
[imagine 2000+ tokens of detailed instructions here]
"""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"}  # ← this is the only change
    }],
    messages=[{"role": "user", "content": "Summarize the authentication section."}]
)

# Check if cache was used
usage = response.usage
print(f"Cache write: {usage.cache_creation_input_tokens}")
print(f"Cache read: {usage.cache_read_input_tokens}")
Cache TTL is 5 minutes. The cache expires if no request hits it for 5 minutes. For apps with low traffic or infrequent use, caching won't help much — the first request after each gap always pays the full write cost.
📦 Batch API — Process Thousands Async
Need to run Claude on 500 support tickets, 10,000 product descriptions, or a month of commit messages? Don't loop through them in real time. Use the Batch API: submit all requests at once, get results when they're ready, pay 50% less.
Cost: 50% discount vs standard API pricing — no other change required
Throughput: Up to 100,000 requests per batch
Latency: Results typically within 1 hour, up to 24 hours max
Use case: Offline processing, data pipelines, bulk generation — not real-time user interactions
python — batch API
import anthropic
import time

client = anthropic.Anthropic()

# Build a list of requests — each has a custom_id for tracking
tickets = ["Login fails on Safari", "Export button does nothing", "Billing page 500 error"]

requests = [
    {
        "custom_id": f"ticket-{i}",
        "params": {
            "model": "claude-haiku-4-5-20251001",
            "max_tokens": 128,
            "messages": [{
                "role": "user",
                "content": f"Classify this bug report as Critical/High/Medium/Low. Reply with one word.\n\n{ticket}"
            }]
        }
    }
    for i, ticket in enumerate(tickets)
]

# Submit — returns immediately with a batch ID
batch = client.beta.messages.batches.create(requests=requests)
print(f"Batch submitted: {batch.id} — status: {batch.processing_status}")

# Poll until done (in production, use a webhook or cron job)
while batch.processing_status == "in_progress":
    time.sleep(10)
    batch = client.beta.messages.batches.retrieve(batch.id)

# Retrieve results
for result in client.beta.messages.batches.results(batch.id):
    severity = result.result.message.content[0].text
    print(f"{result.custom_id}: {severity}")
When to use Haiku for batch: Classification, tagging, extraction, summarization, sentiment — anything where you need a quick structured answer. Haiku + Batch = the cheapest possible way to run Claude at scale.
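One practical detail when consuming batch results: they are not guaranteed to come back in submission order, so join them back to your inputs via `custom_id`. A local sketch — `results` is simulated here as `(custom_id, text)` pairs, whereas the real SDK yields result objects:

```python
def match_results(tickets, results):
    """Re-join out-of-order batch results to the original inputs by custom_id."""
    by_id = {f"ticket-{i}": t for i, t in enumerate(tickets)}
    return [(by_id[cid], text) for cid, text in results]

tickets = ["Login fails on Safari", "Billing page 500 error"]
shuffled = [("ticket-1", "Critical"), ("ticket-0", "High")]
print(match_results(tickets, shuffled))
```

This is why each request needs a unique `custom_id`: it is the only link between a result and the input that produced it.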
⚡ Live Sandbox — Practice API Concepts (interactive)
Click a quick command to simulate an API scenario
Claude API & SDK sandbox
Simulates API calls and responses. Click a quick command to start.
api $
🎯 Challenge
Each task below targets a specific API concept. Complete them in order — each one builds on the last.
1
First call. Install the Anthropic SDK for your preferred language. Write a script that sends one message asking Claude to list 5 use cases for the API (not the CLI). Print the response. Verify you see output.
2
Multi-turn chat. Extend your script into a 3-turn conversation. First ask for a product name. Then ask Claude to write a one-line tagline for it. Then ask for 3 variations. Print each response as it comes back.
3
Stream it. Take your multi-turn script and make the final message (the 3 variations) stream to the terminal in real time. You should see tokens printing character-by-character, not all at once.
4
Add a tool. Define a format_as_list tool that takes an array of strings and returns them as a numbered markdown list. Ask Claude to generate 5 tagline ideas and call the tool to format them. Print the final formatted result.
🏗 Mini Project — AI Writing Assistant
Build a command-line AI writing assistant that uses streaming output, maintains conversation history, and has one tool — a save_draft function that writes the last Claude response to a file.
WHAT YOU'LL BUILD
💬
Chat loop
Multi-turn history
⚡
Streaming
Real-time output
💾
save_draft tool
Write to file
💰
Token counter
Cost per message
Step 1 — Create the project and install the SDK
Set up a folder, create a virtual environment, install anthropic, and store your API key.
What we're doing: Creating an isolated Python environment so this project's dependencies don't conflict with anything else on your machine. Then installing the SDK and securing your API key.
Terminal — macOS
mkdir ~/Projects/ai-writer && cd ~/Projects/ai-writer
python3 -m venv venv
source venv/bin/activate
pip install anthropic
echo 'ANTHROPIC_API_KEY=sk-ant-your-key-here' > .env
echo '.env' >> .gitignore
# Replace sk-ant-your-key-here with your real key from console.anthropic.com
Terminal — Windows (PowerShell)
mkdir $HOME\Projects\ai-writer; cd $HOME\Projects\ai-writer
python -m venv venv
venv\Scripts\Activate.ps1
pip install anthropic
'ANTHROPIC_API_KEY=sk-ant-your-key-here' | Out-File .env
'venv/' | Out-File .gitignore; '.env' | Add-Content .gitignore
# Replace with your real key from console.anthropic.com
Terminal — Linux
mkdir -p ~/Projects/ai-writer && cd ~/Projects/ai-writer
python3 -m venv venv
source venv/bin/activate
pip install anthropic
echo 'ANTHROPIC_API_KEY=sk-ant-your-key-here' > .env
echo '.env' >> .gitignore
# Replace with your real key from console.anthropic.com
What's a virtual environment? A self-contained Python install just for this project. When you run source venv/bin/activate, your terminal uses this isolated Python — packages you install here don't affect the rest of your system.
Get your API key: Go to console.anthropic.com → API Keys → Create Key. Copy the full key (starts with sk-ant-) and paste it into your .env file. You only see it once.
Step 2 — Build the streaming chat loop
Write writer.py — a loop that reads user input, streams Claude's reply, and maintains conversation history.
What we're doing: The core of the app — a loop that reads user input, sends it to Claude (with the full conversation history), streams the response live, and appends both sides to the history for context.
Ask Claude to write this file for you
Write writer.py — a Python CLI writing assistant using the Anthropic SDK.
Requirements:
- Read ANTHROPIC_API_KEY from .env using python-dotenv (install it too)
- System prompt: "You are a concise writing assistant. Help the user write, refine, and improve text."
- Main loop: read input from terminal, send to claude-sonnet-4-6 with full history, stream response token by token
- After each streamed response, print token usage (input + output) in dim text
- Type 'quit' to exit the loop
- Store history as a list and append both user and assistant messages after each turn
After Claude writes it, run it to test:
Terminal
pip install python-dotenv
python writer.py

> Write a one-paragraph intro for a blog post about prompt engineering.

# You should see Claude's response stream in character by character
Why stream? Without streaming, the script would hang silently for 2–5 seconds then dump the full text. With streaming, you see words appearing instantly — it's the same feeling as using Claude Code in the terminal.
Step 3 — Add the save_draft tool
Define a tool that lets Claude write the last response to a file when the user asks it to save.
What we're doing: Adding a tool so the user can say "save this as intro.md" and Claude will call save_draft with the right filename and content. Your code executes the actual file write.
Ask Claude to update writer.py
Update writer.py to add a save_draft tool:
- Tool name: save_draft
- Tool description: "Save text content to a file in the drafts/ folder"
- Parameters: filename (string), content (string)
- Implementation: create drafts/ if it doesn't exist, write content to drafts/filename
- Handle the tool_use stop_reason: execute the save, print a confirmation line, then continue the conversation
- Update the system prompt to say Claude can save drafts when the user asks
Test it by asking the assistant to write something, then saying "save this as draft.md":
Test in your running app

> Write a 3-sentence product description for a password manager app.
# Claude writes it...
> Save that as product-desc.md
✓ Saved to drafts/product-desc.md
How tool use works here: When you say "save that," Claude returns a tool_use block instead of text. Your Python code detects this, calls your save_draft() function, then sends the result back so Claude can confirm with natural language.
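For reference, here is one plausible shape for the function Claude will write behind the tool — the `drafts/` location and the returned dict match the prompt above, but the details of your generated version will vary:

```python
import os

def save_draft(filename, content, base_dir="drafts"):
    """Execute the save_draft tool: create the drafts/ folder if needed
    and write the content into it."""
    os.makedirs(base_dir, exist_ok=True)
    path = os.path.join(base_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return {"saved": path}  # sent back to Claude as the tool_result
```

Returning a small dict (rather than nothing) gives Claude something concrete to confirm in its natural-language reply.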
If Claude doesn't call the tool: Try being more explicit — "Use the save_draft tool to save this as draft.md." Sometimes Claude needs a nudge if the intent isn't clear.
Step 4 — Add prompt caching and update CLAUDE.md
Cache the system prompt to save on repeated calls, then document the project.
What we're doing: A one-line change that cuts the system prompt cost on every call after the first. Then documenting the project so future sessions start with full context.
Ask Claude to update writer.py
Update writer.py: change the system prompt to use prompt caching.
Instead of a plain string, pass system as a list:
[{"type": "text", "text": SYSTEM_PROMPT, "cache_control": {"type": "ephemeral"}}]
Print cache_read_input_tokens and cache_creation_input_tokens in the usage line.
After updating, run the app again and make a few requests. The second request onward should show cache_read_input_tokens > 0.
Ask Claude to create CLAUDE.md
Create CLAUDE.md for this project documenting:
- What writer.py does and how to run it
- The save_draft tool: how it works, where files go
- API key setup (reference .env, never hardcode)
- How to extend: adding new tools, changing the system prompt, swapping models
You now have a working AI app. A streaming multi-turn assistant with tool use and prompt caching — built entirely with Claude prompts. The same patterns scale to web servers, browser extensions, Slack bots, and production applications.