The CLI is powerful. The API is limitless. This module teaches you when to switch, how to make your first API call, and how to use the features that aren't available in the CLI — tool use, streaming, prompt caching, and batch processing.
⏱ ~50 min
🏗 Project: AI writing assistant app
🔑 Requires API key
🤔 When to Use the CLI vs the API
Claude Code CLI is your default tool. The API is a step up — more power, more control, more code. Know when the upgrade is worth it.
Use the CLI when…
You're working interactively in a project
You need to read/write files, run shell commands
You're building workflows with hooks and agents
You want a conversation, not a one-shot call
The task is exploratory — you'll iterate
You need MCP server / plugin integrations
Use the API when…
You're building a product that calls Claude
You need streaming responses in a web UI
You need tool use / function calling
You're processing thousands of items (batch)
You want prompt caching for long system prompts
You need fine-grained token / cost control
The mental model: CLI = you talking to Claude. API = your app talking to Claude. If you're shipping something users will interact with, the API is almost always the right choice.
Good news: Everything you've learned in Modules 01–06 still applies. The API doesn't replace your CLAUDE.md workflow or your brief-first approach — it just gives you a programmable interface to the same underlying model.
API Capabilities at a Glance
Feature
What it does
CLI equivalent?
messages.create
Single turn or multi-turn conversation
✓ Yes (basic)
Tool use
Claude calls your functions with structured JSON
~ Partial (via hooks)
Streaming
Tokens arrive as they're generated — live UI updates
✓ Default in CLI
Prompt caching
Cache repeated system prompts — up to 90% cost reduction
✗ Not available
Batch API
Process 1000s of prompts async — 50% cost discount
✗ Not available
Token counting
Count tokens before sending — budget control
✗ Not available
Model selection
Choose Opus, Sonnet, or Haiku per call
✓ /model command
🔧 SDK Setup
The Anthropic SDK is available in Python and TypeScript/JavaScript. Pick the language that matches your project — the API surface is nearly identical in both.
terminal
# Install the SDK
pip install anthropic
# Set your API key (get it at console.anthropic.com)
export ANTHROPIC_API_KEY="sk-ant-..."
# Or add to .env file (use python-dotenv)
echo 'ANTHROPIC_API_KEY=sk-ant-...' >> .env
python — verify setup
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello in one sentence."}]
)
print(message.content[0].text)
terminal
# Install the SDK
npm install @anthropic-ai/sdk
# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."
# Or use a .env file with dotenv
echo 'ANTHROPIC_API_KEY=sk-ant-...' >> .env
typescript — verify setup
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from env
const message = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 256,
  messages: [{ role: 'user', content: 'Say hello in one sentence.' }],
});
console.log(message.content[0].text);
Never hardcode your API key. Always read it from an environment variable or a .env file. Add .env to your .gitignore immediately — a leaked key gets invalidated and can rack up charges before you notice.
Which model to use?
claude-opus-4-7 — Most intelligent. Best for complex reasoning, nuanced writing, difficult code. Highest cost.
claude-sonnet-4-6 — Best balance. Excellent quality at moderate speed and cost. Default for most apps.
claude-haiku-4-5-20251001 — Fastest and cheapest. Great for classification, extraction, high-volume tasks.
Pattern: Use Haiku for cheap pre-processing (classify, filter, extract) and Sonnet for the final step (write, reason, synthesize). This combination cuts cost by 60–80% for pipeline-heavy apps.
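To see why this routing pattern pays, here is a back-of-envelope estimator. The per-million-token prices below are illustrative placeholders, not current pricing — always check Anthropic's pricing page:

```python
def estimate_cost(input_tokens, output_tokens, price_in_per_mtok, price_out_per_mtok):
    """Dollar cost of one call, given per-million-token prices."""
    return (input_tokens * price_in_per_mtok + output_tokens * price_out_per_mtok) / 1_000_000

# Illustrative prices only (dollars per million input/output tokens)
SONNET = (3.00, 15.00)
HAIKU = (0.25, 1.25)

# Pipeline: classify 1000 items (500 tokens in / 50 out each), then one synthesis call
all_sonnet = 1000 * estimate_cost(500, 50, *SONNET) + estimate_cost(50_000, 2_000, *SONNET)
mixed = 1000 * estimate_cost(500, 50, *HAIKU) + estimate_cost(50_000, 2_000, *SONNET)
print(f"all-Sonnet: ${all_sonnet:.2f}  mixed: ${mixed:.2f}  savings: {1 - mixed / all_sonnet:.0%}")
```

With these placeholder prices the mixed pipeline comes out roughly 85% cheaper, because the classification step dominates the token count and Haiku handles it for a fraction of the price.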
📡 Anatomy of an API Call
Every call to messages.create follows the same structure. Understanding each parameter lets you control exactly what Claude does.
system
System instructions (optional)
Set tone, role, constraints. Cached with prompt caching for big cost savings.
messages
Conversation history
Array of {role: "user" | "assistant", content: "..."} objects. Build multi-turn by appending.
max_tokens
Max tokens in the response
Controls cost and length. 1024 is a safe default. Output stops if reached — not an error.
tools
Functions Claude can call (optional)
Describe your functions in JSON schema. Claude returns structured calls for you to execute.
Multi-turn conversation pattern
python — multi-turn chat
import anthropic

client = anthropic.Anthropic()
history = []

def chat(user_message):
    history.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="You are a concise writing assistant. Keep replies under 3 sentences.",
        messages=history
    )
    assistant_msg = response.content[0].text
    history.append({"role": "assistant", "content": assistant_msg})
    return assistant_msg

# Start chatting
print(chat("Write a tagline for a SaaS invoicing app."))
print(chat("Make it shorter and punchier."))
print(chat("Now write 3 variations."))
typescript — multi-turn chat
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();
const history: Anthropic.MessageParam[] = [];

async function chat(userMessage: string) {
  history.push({ role: 'user', content: userMessage });
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    system: 'You are a concise writing assistant. Keep replies under 3 sentences.',
    messages: history,
  });
  const assistantMsg = response.content[0].text;
  history.push({ role: 'assistant', content: assistantMsg });
  return assistantMsg;
}
// Start chatting
console.log(await chat('Write a tagline for a SaaS invoicing app.'));
console.log(await chat('Make it shorter and punchier.'));
console.log(await chat('Now write 3 variations.'));
🔨 Tool Use — Claude Calls Your Functions
Tool use (also called function calling) lets Claude trigger structured actions in your code. Instead of asking Claude to return JSON you parse manually, you define a function schema and Claude returns a precise, typed call you execute.
📝
You define tools
JSON schema describing function name + parameters
→
🤖
Claude decides
Returns tool_use block with arguments
→
⚙️
You execute
Run the function, get the result
→
✅
Claude responds
Uses result to form final answer
python — tool use example
import anthropic, json

client = anthropic.Anthropic()

# 1. Define the tool Claude can call
tools = [{
    "name": "get_word_count",
    "description": "Count words in a piece of text",
    "input_schema": {
        "type": "object",
        "properties": {
            "text": {"type": "string", "description": "Text to count"}
        },
        "required": ["text"]
    }
}]

# 2. Your actual function
def get_word_count(text):
    return {"count": len(text.split()), "characters": len(text)}

# 3. Send request — Claude may call the tool
question = "How many words are in: 'The quick brown fox jumps over the lazy dog'?"
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    tools=tools,
    messages=[{"role": "user", "content": question}]
)

# 4. Handle tool call if Claude decided to use it
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    result = get_word_count(**tool_call.input)

    # 5. Send result back so Claude can answer
    final = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        tools=tools,
        messages=[
            {"role": "user", "content": question},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": [{
                "type": "tool_result",
                "tool_use_id": tool_call.id,
                "content": json.dumps(result)
            }]}
        ]
    )
    print(final.content[0].text)
typescript — tool use example
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// 1. Define the tool Claude can call
const tools: Anthropic.Tool[] = [{
  name: 'get_word_count',
  description: 'Count words in a piece of text',
  input_schema: {
    type: 'object',
    properties: {
      text: { type: 'string', description: 'Text to count' }
    },
    required: ['text']
  }
}];

// 2. Your actual function
function getWordCount(text: string) {
  return { count: text.split(' ').length, characters: text.length };
}

// 3. Send request
const response = await client.messages.create({
  model: 'claude-sonnet-4-6', max_tokens: 512, tools,
  messages: [{ role: 'user', content: 'How many words in: "The quick brown fox..."?' }],
});

// 4. Handle tool call
if (response.stop_reason === 'tool_use') {
  const toolCall = response.content.find(b => b.type === 'tool_use') as Anthropic.ToolUseBlock;
  const result = getWordCount((toolCall.input as any).text);

  // 5. Send result back
  const final = await client.messages.create({
    model: 'claude-sonnet-4-6', max_tokens: 256, tools,
    messages: [
      { role: 'user', content: 'How many words in: "The quick brown fox..."?' },
      { role: 'assistant', content: response.content },
      { role: 'user', content: [{ type: 'tool_result', tool_use_id: toolCall.id, content: JSON.stringify(result) }] }
    ],
  });
  console.log(final.content[0].text);
}
Real-world tool use patterns: search the web, query a database, call an external API, write/read files, send a notification. Claude decides when to call the tool — you decide what the tool does.
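Once you support more than one tool, steps 3–4 of the loop become a dispatch problem: look up which function the tool_use block names and call it with the block's arguments. A minimal local sketch — the block is simulated here as a plain dict, whereas the real SDK returns typed objects with `.name` and `.input` attributes:

```python
TOOLS = {}

def tool(fn):
    """Register a function so incoming tool_use blocks can be routed to it."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_word_count(text):
    return {"count": len(text.split()), "characters": len(text)}

def dispatch(tool_use_block):
    """Look up the named tool and execute it with the block's arguments."""
    fn = TOOLS[tool_use_block["name"]]
    return fn(**tool_use_block["input"])

# Simulated block, shaped like the API's tool_use content
block = {"type": "tool_use", "name": "get_word_count",
         "input": {"text": "The quick brown fox"}}
print(dispatch(block))  # {'count': 4, 'characters': 19}
```

The registry pattern keeps the API loop generic: adding a tool means writing one function and one schema, with no changes to the dispatch code.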
⚡ Streaming — Live Token-by-Token Output
Without streaming, your app waits silently until Claude finishes, then shows the full response at once. With streaming, tokens arrive as they're generated — your UI feels instant, even for long outputs.
WITHOUT STREAMING
User sees nothing for 3–10 seconds
Full response appears all at once
Feels slow even when it isn't
OK for backend pipelines, bad for UI
WITH STREAMING
First tokens appear in <1 second
Text builds in real time like Claude Code CLI
Feels fast and responsive
Essential for any user-facing chat UI
python — streaming
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short poem about shipping software."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # prints each chunk as it arrives

    # After the loop: access the final complete message
    final_message = stream.get_final_message()

print(f"\n\nTotal tokens: {final_message.usage.input_tokens + final_message.usage.output_tokens}")
typescript — streaming (server-sent events)
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const stream = client.messages.stream({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Write a short poem about shipping software.' }],
});

// print each text chunk as it arrives
stream.on('text', (text) => process.stdout.write(text));

const finalMessage = await stream.finalMessage();
console.log(`\nTotal tokens: ${finalMessage.usage.input_tokens + finalMessage.usage.output_tokens}`);
In a web app: pipe the stream through a Server-Sent Events (SSE) endpoint. Your frontend reads chunks with EventSource or fetch with a ReadableStream and appends tokens to a <div>. This is exactly how claude.ai's chat interface works.
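As a sketch of that pipe, here is one way to wrap stream chunks in the SSE wire format before sending them to the browser. The `{"text": ...}` payload shape and the closing `done` event are arbitrary choices for illustration, not part of the SSE spec or the Anthropic SDK:

```python
import json

def to_sse(chunks):
    """Format text chunks as Server-Sent Events frames, with a final
    'done' event so the frontend knows the stream is complete."""
    for chunk in chunks:
        yield f"data: {json.dumps({'text': chunk})}\n\n"
    yield "event: done\ndata: {}\n\n"

# In a real endpoint, `chunks` would be the SDK's text stream
frames = list(to_sse(["Ship", "ping ", "software"]))
print(frames[0], end="")  # data: {"text": "Ship"}
```

On the frontend, an `EventSource` fires one message per `data:` frame; appending each `text` field to a `<div>` reproduces the token-by-token effect.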
💾 Prompt Caching — Cut Costs by Up to 90%
If your app sends the same long system prompt (documentation, codebase context, instructions) with every request, you're paying to re-process that content every time. Prompt caching stores it at Anthropic — you pay 10% of normal input cost on cache hits.
Without caching — every request
System prompt (2000 tokens): $0.003
System prompt (2000 tokens): $0.003
System prompt (2000 tokens): $0.003
× 1000 requests = $3.00 just for system prompt
With caching — cache hit after first
System prompt (2000 tokens) — cache write: $0.00375
System prompt — cache hit: $0.0003
System prompt — cache hit: $0.0003
× 1000 requests = $0.30 (90% savings)
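You can verify the comparison's arithmetic yourself. Using the $0.003 base figure from the table, a 1.25× premium on cache writes, and 0.1× pricing on cache hits (the multipliers Anthropic documents for ephemeral caching):

```python
BASE = 0.003           # processing the 2000-token system prompt once, uncached
WRITE = BASE * 1.25    # cache writes cost 25% more than normal input
HIT = BASE * 0.10      # cache hits cost 10% of normal input
n = 1000               # requests

uncached = n * BASE                 # pay full price every time
cached = WRITE + (n - 1) * HIT      # pay once to write, then hits
print(f"uncached: ${uncached:.2f}  cached: ${cached:.2f}  savings: {1 - cached / uncached:.0%}")
```

At 1000 requests the write premium is noise; the savings converge on the 90% hit discount.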
python — prompt caching
import anthropic

client = anthropic.Anthropic()

# Large system prompt (e.g. documentation, codebase, instructions)
SYSTEM_PROMPT = """You are an expert technical writer...
[imagine 2000+ tokens of detailed instructions here]
"""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"}  # ← this is the only change
    }],
    messages=[{"role": "user", "content": "Summarize the authentication section."}]
)

# Check if cache was used
usage = response.usage
print(f"Cache write: {usage.cache_creation_input_tokens}")
print(f"Cache read: {usage.cache_read_input_tokens}")
Cache TTL is 5 minutes. The cache expires if no request hits it for 5 minutes. For apps with low traffic or infrequent use, caching won't help much — the first request after each gap always pays the full write cost.
📦 Batch API — Process Thousands Async
Need to run Claude on 500 support tickets, 10,000 product descriptions, or a month of commit messages? Don't loop through them in real time. Use the Batch API: submit all requests at once, get results when they're ready, pay 50% less.
Cost: 50% discount vs standard API pricing — no other change required
Throughput: Up to 100,000 requests per batch
Latency: Results typically within 1 hour, up to 24 hours max
Use case: Offline processing, data pipelines, bulk generation — not real-time user interactions
python — batch API
import anthropic
import time

client = anthropic.Anthropic()

# Build a list of requests — each has a custom_id for tracking
tickets = ["Login fails on Safari", "Export button does nothing", "Billing page 500 error"]

requests = [
    {
        "custom_id": f"ticket-{i}",
        "params": {
            "model": "claude-haiku-4-5-20251001",
            "max_tokens": 128,
            "messages": [{
                "role": "user",
                "content": f"Classify this bug report as Critical/High/Medium/Low. Reply with one word.\n\n{ticket}"
            }]
        }
    }
    for i, ticket in enumerate(tickets)
]

# Submit — returns immediately with a batch ID
batch = client.beta.messages.batches.create(requests=requests)
print(f"Batch submitted: {batch.id} — status: {batch.processing_status}")

# Poll until done (in production, use a webhook or cron job)
while batch.processing_status == "in_progress":
    time.sleep(10)
    batch = client.beta.messages.batches.retrieve(batch.id)

# Retrieve results
for result in client.beta.messages.batches.results(batch.id):
    severity = result.result.message.content[0].text
    print(f"{result.custom_id}: {severity}")
When to use Haiku for batch: Classification, tagging, extraction, summarization, sentiment — anything where you need a quick structured answer. Haiku + Batch = the cheapest possible way to run Claude at scale.
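One practical detail when consuming batch results: they are not guaranteed to come back in submission order, so join them back to your inputs via `custom_id`. A local sketch — `results` is simulated here as `(custom_id, text)` pairs, whereas the real SDK yields result objects:

```python
def match_results(tickets, results):
    """Re-join out-of-order batch results to the original inputs by custom_id."""
    by_id = {f"ticket-{i}": t for i, t in enumerate(tickets)}
    return [(by_id[cid], text) for cid, text in results]

tickets = ["Login fails on Safari", "Billing page 500 error"]
shuffled = [("ticket-1", "Critical"), ("ticket-0", "High")]
print(match_results(tickets, shuffled))
```

This is why each request needs a unique `custom_id`: it is the only link between a result and the input that produced it.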
⚡ Live Sandbox — Practice API Concepts (interactive)
Click a quick command to simulate an API scenario
Claude API & SDK sandbox
Simulates API calls and responses. Click a quick command to start.
api $
🎯 Challenge
Each task below targets a specific API concept. Complete them in order — each one builds on the last.
1
First call. Install the Anthropic SDK for your preferred language. Write a script that sends one message asking Claude to list 5 use cases for the API (not the CLI). Print the response. Verify you see output.
2
Multi-turn chat. Extend your script into a 3-turn conversation. First ask for a product name. Then ask Claude to write a one-line tagline for it. Then ask for 3 variations. Print each response as it comes back.
3
Stream it. Take your multi-turn script and make the final message (the 3 variations) stream to the terminal in real time. You should see tokens printing character-by-character, not all at once.
4
Add a tool. Define a format_as_list tool that takes an array of strings and returns them as a numbered markdown list. Ask Claude to generate 5 tagline ideas and call the tool to format them. Print the final formatted result.
🏗 Mini Project — AI Writing Assistant
Build a command-line AI writing assistant that uses streaming output, maintains conversation history, and has one tool — a save_draft function that writes the last Claude response to a file.
WHAT YOU'LL BUILD
💬
Chat loop
Multi-turn history
⚡
Streaming
Real-time output
💾
save_draft tool
Write to file
💰
Token counter
Cost per message
Step 1 — Create the project and install the SDK
Set up a folder, create a virtual environment, install anthropic, and store your API key.
What we're doing: Creating an isolated Python environment so this project's dependencies don't conflict with anything else on your machine. Then installing the SDK and securing your API key.
Terminal — macOS
mkdir ~/Projects/ai-writer && cd ~/Projects/ai-writer
python3 -m venv venv
source venv/bin/activate
pip install anthropic
echo 'ANTHROPIC_API_KEY=sk-ant-your-key-here' > .env
echo '.env' >> .gitignore
# Replace sk-ant-your-key-here with your real key from console.anthropic.com
Terminal — Windows (PowerShell)
mkdir $HOME\Projects\ai-writer; cd $HOME\Projects\ai-writer
python -m venv venv
venv\Scripts\Activate.ps1
pip install anthropic
'ANTHROPIC_API_KEY=sk-ant-your-key-here' | Out-File .env
'venv/' | Out-File .gitignore; '.env' | Add-Content .gitignore
# Replace with your real key from console.anthropic.com
Terminal — Linux
mkdir -p ~/Projects/ai-writer && cd ~/Projects/ai-writer
python3 -m venv venv
source venv/bin/activate
pip install anthropic
echo 'ANTHROPIC_API_KEY=sk-ant-your-key-here' > .env
echo '.env' >> .gitignore
# Replace with your real key from console.anthropic.com
What's a virtual environment? A self-contained Python install just for this project. When you run source venv/bin/activate, your terminal uses this isolated Python — packages you install here don't affect the rest of your system.
Get your API key: Go to console.anthropic.com → API Keys → Create Key. Copy the full key (starts with sk-ant-) and paste it into your .env file. You only see it once.
Step 2 — Build the streaming chat loop
Write writer.py — a loop that reads user input, streams Claude's reply, and maintains conversation history.
What we're doing: The core of the app — a loop that reads user input, sends it to Claude (with the full conversation history), streams the response live, and appends both sides to the history for context.
Ask Claude to write this file for you
Write writer.py — a Python CLI writing assistant using the Anthropic SDK.
Requirements:
- Read ANTHROPIC_API_KEY from .env using python-dotenv (install it too)
- System prompt: "You are a concise writing assistant. Help the user write, refine, and improve text."
- Main loop: read input from terminal, send to claude-sonnet-4-6 with full history, stream response token by token
- After each streamed response, print token usage (input + output) in dim text
- Type 'quit' to exit the loop
- Store history as a list and append both user and assistant messages after each turn
After Claude writes it, run it to test:
Terminal
pip install python-dotenv
python writer.py

> Write a one-paragraph intro for a blog post about prompt engineering.

# You should see Claude's response stream in character by character
Why stream? Without streaming, the script would hang silently for 2–5 seconds then dump the full text. With streaming, you see words appearing instantly — it's the same feeling as using Claude Code in the terminal.
Step 3 — Add the save_draft tool
Define a tool that lets Claude write the last response to a file when the user asks it to save.
What we're doing: Adding a tool so the user can say "save this as intro.md" and Claude will call save_draft with the right filename and content. Your code executes the actual file write.
Ask Claude to update writer.py
Update writer.py to add a save_draft tool:
- Tool name: save_draft
- Tool description: "Save text content to a file in the drafts/ folder"
- Parameters: filename (string), content (string)
- Implementation: create drafts/ if it doesn't exist, write content to drafts/filename
- Handle the tool_use stop_reason: execute the save, print a confirmation line, then continue the conversation
- Update the system prompt to say Claude can save drafts when the user asks
Test it by asking the assistant to write something, then saying "save this as draft.md":
Test in your running app

> Write a 3-sentence product description for a password manager app.
# Claude writes it...
> Save that as product-desc.md
✓ Saved to drafts/product-desc.md
How tool use works here: When you say "save that," Claude returns a tool_use block instead of text. Your Python code detects this, calls your save_draft() function, then sends the result back so Claude can confirm with natural language.
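For reference, here is one plausible shape for the function Claude will write behind the tool — the `drafts/` location and the returned dict match the prompt above, but the details of your generated version will vary:

```python
import os

def save_draft(filename, content, base_dir="drafts"):
    """Execute the save_draft tool: create the drafts/ folder if needed
    and write the content into it."""
    os.makedirs(base_dir, exist_ok=True)
    path = os.path.join(base_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return {"saved": path}  # sent back to Claude as the tool_result
```

Returning a small dict (rather than nothing) gives Claude something concrete to confirm in its natural-language reply.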
If Claude doesn't call the tool: Try being more explicit — "Use the save_draft tool to save this as draft.md." Sometimes Claude needs a nudge if the intent isn't clear.
Step 4 — Add prompt caching and update CLAUDE.md
Cache the system prompt to save on repeated calls, then document the project.
What we're doing: A one-line change that cuts the system prompt cost on every call after the first. Then documenting the project so future sessions start with full context.
Ask Claude to update writer.py
Update writer.py: change the system prompt to use prompt caching.
Instead of a plain string, pass system as a list:
[{"type": "text", "text": SYSTEM_PROMPT, "cache_control": {"type": "ephemeral"}}]
Print cache_read_input_tokens and cache_creation_input_tokens in the usage line.
After updating, run the app again and make a few requests. The second request onward should show cache_read_input_tokens > 0.
Ask Claude to create CLAUDE.md
Create CLAUDE.md for this project documenting:
- What writer.py does and how to run it
- The save_draft tool: how it works, where files go
- API key setup (reference .env, never hardcode)
- How to extend: adding new tools, changing the system prompt, swapping models
You now have a working AI app. A streaming multi-turn assistant with tool use and prompt caching — built entirely with Claude prompts. The same patterns scale to web servers, browser extensions, Slack bots, and production applications.