Single source

Content library

This is the only reading surface. Every domain is grouped here with the original study guide, practice questions, flashcards, and the official Anthropic source links that back it up.

5 domain packs 15 source files 26 study entries

Open practice assessment Jump to Domain 1

How to use this page

Read one domain section at a time, then move straight into assessment when you want to test yourself.

Study guide

The README for each domain is the main explanation layer.

Questions

The question file gives recall and decision practice for the same domain.

Flashcards

The flashcards are compact memory cues for review sessions.

Domain map

Everything is divided by domain, with the source material underneath each heading.

Domain 1 Domain 2 Domain 3 Domain 4 Domain 5

Domain 1: Core Concepts

Model choice, sampling, prompt structure, and safety basics.

Why it matters

Use this pack to understand the controls that show up in many exam questions.

Focus areas

Model selection
Temperature and sampling
Prompt roles
Token and context limits

Common mistakes

Mixing up tokens and words
Using the wrong model for the task
Putting rules in the user message

Quick recall

Opus for hardest reasoning
Sonnet as the default balance
Haiku for speed and cost

Practice assessment Test this domain in the exam-style flow. Back to top Jump back to the domain map.

Study guide

Domain 1: Core AI Concepts & Claude Fundamentals

What this is

This domain is the foundation layer: model choice, token limits, context windows, prompt structure, sampling controls, and the behavior rules that make Claude useful and safe.

Why it matters on the exam

Most questions in this domain are really asking, “Can you choose the right default and explain the tradeoff?” If you know what each control changes, the answer becomes obvious.

What to focus on

Model choice: quality, balance, or speed/cost.
Sampling: how much variation you want in the output.
Prompt roles: what belongs in system instructions versus user input.
Context: what Claude can actually see at one time.
Safety: how Claude should respond when a request is risky or unclear.

Key decisions and tradeoffs

Use Opus when reasoning quality matters most.
Use Sonnet when you want the best general-purpose default.
Use Haiku when speed and cost matter more than depth.
Use low temperature for stable, repeatable outputs.
Use higher temperature for brainstorming or exploration.
Use system messages for rules and tone, and user messages for the task itself.

Common mistakes

Treating tokens as if they were the same as words.
Picking a lightweight model for a task that needs careful reasoning.
Putting behavior rules in the wrong message role.
Assuming higher temperature always improves results.
Forgetting that uncertainty should be acknowledged, not hidden.

Scenario examples

Choosing a model

You need a contract summary that will be reviewed by a legal team. The safest study answer is the model with the strongest reasoning, not the cheapest one.

Choosing randomness

You are generating code or configuration and want the output to stay consistent. Lower temperature is the right answer because you want less variation.

Structuring a prompt

You want a strict JSON answer with a fixed format. Put the format rule in the system message, then ask the actual question in the user message.

Handling uncertainty

The model cannot verify a detail from the provided context. The correct behavior is to say so and ask for more information or cite what is available.

Quick recall

Tokens power pricing and context.
Context window is the total input and output budget.
Temperature controls randomness.
Top-K limits the candidate set by count.
Top-P limits the candidate set by probability mass.
System messages set behavior.
User messages set the task.
Constitutional AI emphasizes harmlessness, honesty, and helpfulness.

Compare notes

Token vs word: tokens are the unit the model uses; words are only an approximation.
Temperature vs Top-P: temperature changes randomness; Top-P changes how wide the candidate pool is.
System vs user: system sets rules, user asks for work.
Opus vs Sonnet vs Haiku: strongest reasoning vs balanced default vs fastest/cheapest.

Official references

Study move

For any exam-style question, ask:

What control or model is being tested?
What tradeoff is most important?
What would go wrong if I chose the wrong setting?
What is the simplest correct rule to remember?

Flashcards

Domain 1: Flashcards

Model Choice

Q: Best model for the hardest reasoning tasks?
A: Claude Opus.

Q: Best model for balanced everyday use?
A: Claude Sonnet.

Q: Best model for speed and cost?
A: Claude Haiku.

Q: When is Opus worth the cost?
A: When reasoning quality matters most.

Q: When is Haiku the right choice?
A: When volume and latency matter most.

Tokens and Context

Q: What is a token?
A: The unit Claude uses for text and pricing.

Q: Roughly how many words is one token?
A: About three quarters of a word.

Q: What is the context window?
A: The total input plus output budget.

Q: Why do tokens matter?
A: They control cost and how much the model can see.

Sampling

Q: What does low temperature do?
A: Makes output more deterministic.

Q: What does high temperature do?
A: Makes output more varied.

Q: What does Top-K control?
A: The number of candidate tokens.

Q: What does Top-P control?
A: The probability mass of candidate tokens.

Prompt Structure

Q: What does the system message do?
A: Sets behavior and rules.

Q: What does the user message do?
A: States the actual task.

Q: What does the assistant message do?
A: Holds previous model replies.

Q: Where should output format rules go?
A: In the system message.

Prompt Patterns

Q: What is few-shot prompting?
A: Teaching by example.

Q: What is role prompting?
A: Giving the model a persona.

Q: What is structured output prompting?
A: Asking for JSON or another strict format.

Q: What is chain-of-thought prompting?
A: Asking the model to reason step by step.

Claude Safety

Q: What are the three principles of Constitutional AI?
A: Harmlessness, Honesty, Helpfulness.

Q: What should Claude do when unsure?
A: Say so honestly.

Q: What does harmlessness mean?
A: Refuse harmful requests.

Q: What is hallucination?
A: Confidently wrong output.

Capabilities

Q: What lets Claude interact with external systems?
A: Tool use.

Q: Do newer Claude models support images?
A: Yes.

Q: Can Claude browse the web by itself?
A: No, not natively.

Q: Does Claude remember every session automatically?
A: No.

Fast Review

Q: Best default for most apps?
A: Sonnet.

Q: Best choice for creative exploration?
A: Higher temperature.

Q: Best choice for reproducible output?
A: Lower temperature.

Q: Best study mantra for uncertainty?
A: Verify instead of inventing.

Practice questions

Domain 1: Practice Questions

Core AI Concepts & Claude Fundamentals

Question 1

A legal team needs the strongest reasoning possible for a complex document review and cost is secondary. Which model is the best default choice?

A) Claude Haiku B) Claude Sonnet C) Claude Opus D) Any model with a higher temperature

Answer: C Explanation: Opus is the safest choice when quality and reasoning depth matter most.

Question 2

Your team wants repeatable code generation with minimal variation across runs. Which control should you lower first?

A) Top-P B) Temperature C) Context window D) Token count

Answer: B Explanation: Lower temperature reduces randomness and makes output more consistent.

Question 3

A prompt is failing because the input document and the expected output are both long. What should you think about first?

A) Tokenization B) Context window C) Temperature D) System prompt

Answer: B Explanation: The context window is the total token budget for input plus output.

Question 4

Which statement best describes a token?

A) It is exactly one word B) It is the unit Claude uses for pricing and context C) It is only used for authentication D) It is the same thing as a response message

Answer: B Explanation: Tokens are the basic unit used for billing and context management.

Question 5

You want Claude to obey strict behavior and formatting rules. Where should those rules live?

A) User message B) Assistant message C) System message D) Tool result

Answer: C Explanation: The system message sets the behavior, role, and constraints.

Question 6

You want Claude to imitate a response format by example. Which prompting pattern fits best?

A) Few-shot prompting B) Sampling C) Tokenization D) Temperature tuning

Answer: A Explanation: Few-shot prompting uses examples to guide style, structure, and output shape.

Question 7

Claude is unsure about a fact from the provided context. What is the safest exam answer?

A) Invent a plausible answer B) Act confident and continue C) Acknowledge uncertainty and verify D) Repeat the same claim more forcefully

Answer: C Explanation: Honesty means acknowledging uncertainty instead of fabricating details.

Question 8

What is the best description of Top-K sampling?

A) It chooses tokens until a probability threshold is reached B) It limits the candidate set to the K most likely tokens C) It removes all randomness D) It changes the system prompt

Answer: B Explanation: Top-K bounds the candidate pool by count, not probability mass.

Question 9

Which choice best fits a high-volume, cost-sensitive, near-real-time workload?

A) Opus B) Sonnet C) Haiku D) The highest temperature setting

Answer: C Explanation: Haiku is the fastest and most cost-effective option for lighter workloads.

Question 10

What is the main purpose of Constitutional AI in a study answer?

A) Make the model generate longer answers B) Improve pricing consistency C) Keep the model helpful, honest, and harmless D) Increase the context window

Answer: C Explanation: Constitutional AI is the safety and behavior framework behind Claude’s responses.

Domain 2: API & Integration

Request flow, streaming, tools, retries, caching, and secure integration.

Why it matters

This is the most applied part of the study site and the one most likely to benefit from repeated review.

Focus areas

Messages API basics
Streaming and tool use
Retry behavior
Prompt caching and rate limits

Common mistakes

Exposing keys in the browser
Retrying 429s immediately
Treating tool_use like a final answer

Quick recall

Keep calls server-side
Use backoff for rate limits
Streaming improves responsiveness

Practice assessment Test this domain in the exam-style flow. Back to top Jump back to the domain map.

Study guide

Domain 2: Claude API & Integration

What this is

This domain covers the request lifecycle: authentication, request shape, streaming, tool use, structured responses, retries, rate limits, and secure server-side integration.

Why it matters on the exam

Questions here usually ask what the application should do next. You need to know the API flow well enough to pick the safe, correct response under pressure.

What to focus on

The minimum request shape for a Messages API call.
How system instructions differ from conversation messages.
What streaming changes for the user experience.
What tool use means for your application logic.
Which failures should be retried versus fixed.
How to protect secrets and limit abuse.

Key decisions and tradeoffs

Use the Messages API for standard conversational interactions.
Keep the system prompt separate from the message history.
Use streaming when responsiveness matters.
Use tool use when Claude must trigger an external action.
Retry rate limits and transient failures with backoff.
Use prompt caching when repeated instructions or context are stable.
Keep API calls server-side so secrets never reach the browser.

Common mistakes

Forgetting required request fields.
Mixing up system instructions and user content.
Handling a tool request like a normal final answer.
Retrying a rate-limited call immediately.
Exposing keys in client-side code.
Assuming streaming changes the model rather than the delivery pattern.

Scenario examples

Streaming

A user is waiting on a long answer and you want the UI to feel alive. Streaming is the right answer because it improves perceived latency.

Tool use

Claude needs live data from a database or external service. The application should execute the tool, return the result, and let Claude continue.

Error handling

The API returns 429. The best exam answer is exponential backoff, not a retry loop that makes the problem worse.

Security

You are building a browser app. Keep the Claude call on the server and protect the key with environment variables or secret storage.

Quick recall

Base URL: https://api.anthropic.com
Required fields: model, max_tokens, messages
anthropic-version belongs in the request headers.
Streaming uses server-sent events.
Tool results flow back through the conversation.
429 means rate limit.
invalid_request_error usually means fix the request.

Compare notes

Streaming vs non-streaming: same model, different delivery.
System vs messages: rules versus conversation history.
Tool use vs plain completion: application action is required before the final answer.
Retryable vs non-retryable: 429 and many transient 5xx errors may be retried; bad requests should be corrected.

Official references

Study move

When you read the docs, extract:

the required request fields
the response events you must handle
the failures you should retry
the security rule that protects the app

Flashcards

Domain 2: Flashcards

Request Basics

Q: Base URL for the Claude API?
A: https://api.anthropic.com

Q: Required version header?
A: anthropic-version

Q: Required output limit field?
A: max_tokens

Q: Where does the system prompt live?
A: Top-level system

Q: Which message role is the user?
A: user

Auth and Security

Q: Where should the API key never go?
A: Client-side code.

Q: Safe place for secrets?
A: Environment variables or secret storage.

Q: What does a 401 usually mean?
A: Bad or missing auth.

Q: What does a 429 usually mean?
A: Rate limit hit.

Streaming

Q: What format does streaming use?
A: Server-sent events.

Q: Why use streaming?
A: Better time to first token.

Q: Common stream event?
A: message_start

Q: Streamed text arrives in what kind of event?
A: content_block_delta

Tool Use

Q: What does tool_use mean?
A: Claude wants the app to run a tool.

Q: What should the app do after tool_use?
A: Execute and return tool_result.

Q: Tool schema format?
A: JSON Schema.

Q: Tool definitions usually include what three things?
A: Name, description, input schema.

Errors and Retry

Q: What error should usually be retried with backoff?
A: 429.

Q: Retry pattern for transient failures?
A: Exponential backoff.

Q: What error usually means the request itself is broken?
A: invalid_request_error

Q: What should you do with 4xx errors besides 429?
A: Fix the request.

Caching and Cost

Q: Why use prompt caching?
A: Lower cost for repeated prompt material.

Q: What is usually more expensive, input or output tokens?
A: Output.

Q: When is prompt caching useful?
A: Stable instructions and examples.

Q: What should you optimize first if cost is high?
A: Prompt size, model choice, caching.

Memory Hooks

Q: Streaming helps with what user perception?
A: Responsiveness.

Q: Tool use helps with what?
A: Actions outside the model.

Q: Backoff helps with what?
A: Retry safety.

Q: Server-side calls help with what?
A: Secret protection.

Practice questions

Domain 2: Practice Questions

Claude API & Integration

Question 1

Your app must call Claude from a browser-based UI. What is the safest architecture choice?

A) Put the API key in client-side JavaScript B) Call Claude directly from the browser C) Keep the API call on the server D) Disable authentication

Answer: C Explanation: API keys should never be exposed in client-side code.

Question 2

Which request field is required for a Messages API call?

A) temperature B) system C) max_tokens D) top_p

Answer: C Explanation: max_tokens is required; the others are optional controls.

Question 3

Claude returns tool_use. What should your application do next?

A) Ignore it and wait B) Execute the requested tool and return the result C) Immediately retry the request D) Change the model

Answer: B Explanation: tool_use means the application must act before Claude can finish the answer.

Question 4

The API returns 429 during a traffic spike. What is the best next step?

A) Retry immediately in a loop B) Retry later with exponential backoff C) Delete the key D) Switch to a higher temperature

Answer: B Explanation: 429 is a rate-limit response; backoff is the correct retry pattern.

Question 5

Which location is correct for the system prompt?

A) Inside the messages array B) As a top-level system field C) Inside a tool result D) Inside the user message text

Answer: B Explanation: The system prompt is separate from the conversation array.

Question 6

You want the UI to show partial answer text as it arrives. Which API feature fits best?

A) Prompt caching B) Streaming C) Retry logic D) Batch processing

Answer: B Explanation: Streaming improves perceived latency and updates the UI progressively.

Question 7

Which control is most appropriate for reducing cost on repeated instructions and examples?

A) Prompt caching B) Higher temperature C) More retries D) Larger output limits

Answer: A Explanation: Prompt caching reduces the cost of repeated prompt material.

Question 8

Which error type usually means the request itself needs to be fixed rather than retried?

A) invalid_request_error B) rate_limit_error C) api_error D) Temporary network timeout

Answer: A Explanation: Invalid request errors point to bad parameters, malformed JSON, or missing required fields.

Question 9

Your assistant must answer in a strict machine-readable format. Where should that requirement live?

A) system B) messages[0] C) The final assistant response only D) The browser UI

Answer: A Explanation: System instructions are the best place for behavior and output-format rules.

Question 10

Which retry pattern is best for transient failures?

A) Immediate repeated requests B) Exponential backoff C) Hard fail every time D) Switch to a different user message

Answer: B Explanation: Exponential backoff reduces pressure on the service and gives transient issues time to clear.

Domain 3: Architecture & Design

RAG, agents, multi-agent coordination, context management, and orchestration.

Why it matters

This pack helps you choose between multiple viable designs instead of memorizing feature names.

Focus areas

RAG vs agents
Multi-agent coordination
Context control
Caching and batching

Common mistakes

Using an agent when a single turn works
Choosing complexity before the problem is clear

Quick recall

Use RAG for grounding
Use agents for multi-step action
Use batching for throughput

Practice assessment Test this domain in the exam-style flow. Back to top Jump back to the domain map.

Study guide

Domain 3: Architecture & Design Patterns

What this is

This domain is about system shape: RAG, agents, multi-agent coordination, context management, prompt caching, streaming, batching, and observability.

Why it matters on the exam

The exam often gives you a constraint and asks you to choose the best architecture. The skill here is not memorizing buzzwords; it is matching the pattern to the problem.

What to focus on

When retrieval is better than reasoning.
When a simple request flow is enough.
When an agent is justified.
When specialization should become multi-agent coordination.
How to manage long or growing context.
Where caching, streaming, batching, and observability fit.

Key decisions and tradeoffs

Use RAG when answers must stay grounded in external knowledge.
Use an agent pattern when the system must reason and act over multiple steps.
Use multi-agent coordination when specialization improves the outcome.
Use summarization or sliding windows when context is getting too large.
Use prompt caching when stable instructions or examples repeat often.
Use streaming for responsiveness.
Use batching for throughput.

Common mistakes

Choosing RAG when the real problem is orchestration.
Using an agent when a single turn would be simpler.
Treating multi-agent as automatically better.
Confusing prompt caching with response caching.
Ignoring the complexity cost of adding more moving parts.

Scenario examples

RAG versus an agent

You need answers from a private knowledge base. If the problem is grounding, choose RAG. If the problem includes deciding which tools to use, the better answer is an agent flow.

Context management

The conversation is getting too long to fit comfortably. Summarize older material, retain key facts, or retrieve only what is needed instead of dropping everything blindly.

Multi-agent coordination

A workflow needs one specialist for policy, one for code, and one for summarization. The key idea is specialization plus synthesis, not just “more agents.”

Scaling choice

Requests are independent and the bottleneck is throughput. Horizontal scaling or batching is the more relevant architectural decision.

Quick recall

RAG = retrieve and ground.
Agent = reason, act, observe.
Multi-agent = specialize, then coordinate.
Prompt caching = reuse stable prompt material.
Streaming = better perceived speed.
Batching = throughput over latency.
Observability = logs, metrics, traces.

Compare notes

RAG vs agent: grounded retrieval versus decision-making and action.
Single-agent vs multi-agent: simplicity versus specialization.
Streaming vs batch: responsiveness versus throughput.
Summarization vs sliding window: compress versus truncate.

Official references

Study move

For every pattern, ask:

What problem does it solve?
What does it cost?
What failure mode is it preventing?
What simpler pattern might the exam expect me to overlook?

Flashcards

Domain 3: Flashcards

Core Patterns

Q: What does RAG do?
A: Retrieves documents and grounds the answer.

Q: What does an agent do?
A: Reasons, acts, and observes over multiple steps.

Q: What does multi-agent architecture add?
A: Specialization plus coordination.

Q: What does context management protect?
A: The token budget and useful history.

Context Strategies

Q: Sliding window means what?
A: Keep the newest context only.

Q: Summarization means what?
A: Compress older context.

Q: Selective retention means what?
A: Keep important pieces, drop the rest.

Q: RAG for context means what?
A: Retrieve context on demand.

RAG Building Blocks

Q: Why chunk documents?
A: To make them retrievable.

Q: Semantic chunking does what?
A: Preserves natural meaning boundaries.

Q: Fixed-size chunking does what?
A: Splits by token length.

Q: What stores embeddings?
A: A vector database.

Agent Workflows

Q: ReAct means what?
A: Reason + Act loop.

Q: Plan-and-execute means what?
A: Plan first, then execute.

Q: Reflexion means what?
A: Learn from mistakes and adjust.

Q: Tool-use workflow means what?
A: Model chooses a tool, app runs it.

Scaling and Operations

Q: Horizontal scaling means what?
A: More instances.

Q: Streaming integration means what?
A: Show output as it arrives.

Q: Batch processing means what?
A: Trade latency for throughput.

Q: Prompt caching helps with what?
A: Repeated prompt cost.

Design Principles

Q: Separation of concerns means what?
A: Split logic into clear layers.

Q: Observability means what?
A: Logs, metrics, traces.

Q: Fail-safe design means what?
A: Graceful fallback.

Q: Testability means what?
A: Easy to verify behavior.

Memory Hooks

Q: RAG = ?
A: Retrieve, then ground.

Q: Agent = ?
A: Decide, act, observe.

Q: Multi-agent = ?
A: Specialize, coordinate, synthesize.

Q: Streaming = ?
A: Faster perceived response.

Practice questions

Domain 3: Practice Questions

Architecture & Design Patterns

Question 1

Your product needs answers grounded in a private document set that changes over time. Which pattern fits best?

A) RAG B) Static prompt templates C) Tokenization D) Model version pinning

Answer: A Explanation: RAG retrieves current source material and injects it into the prompt so responses stay grounded.

Question 2

Which problem is the best use case for an agent pattern?

A) Return a fixed summary from a short prompt B) Execute a single deterministic conversion C) Solve a multi-step task that requires decision-making and tool use D) Format a JSON response once

Answer: C Explanation: Agents are useful when the system must reason, act, and observe across multiple steps.

Question 3

When would multi-agent coordination be the better architecture?

A) When the work is simple and fully deterministic B) When specialized subproblems benefit from different expertise C) When you want fewer moving parts D) When you want to remove all orchestration

Answer: B Explanation: Multi-agent systems shine when specialization is useful and the coordinator can synthesize the output.

Question 4

Your context is getting too large to keep every past message. What is a reasonable strategy?

A) Ignore the issue B) Use summarization or selective retention C) Double the temperature D) Remove the system prompt

Answer: B Explanation: Context management strategies help retain useful information while staying inside token limits.

Question 5

What is the main purpose of chunking in a RAG pipeline?

A) To make the text look shorter B) To create manageable retrieval units C) To increase the model’s context window D) To remove the need for embeddings

Answer: B Explanation: Chunking helps convert large documents into pieces that can be embedded and retrieved effectively.

Question 6

Your product needs lower latency for a user-facing response stream. Which pattern is most directly helpful?

A) Streaming integration B) Sequential batch processing C) Hard-coded prompts D) Bigger context windows

Answer: A Explanation: Streaming improves perceived speed by showing output as it is generated.

Question 7

What is the biggest tradeoff of prompt caching?

A) It eliminates the need for prompts B) It adds some architecture complexity while reducing repeated prompt cost C) It only works for images D) It makes the model less safe

Answer: B Explanation: Prompt caching is valuable when repeated instructions are stable, but it still adds design and lifecycle considerations.

Question 8

You need a retrieval system that works with vector similarity search. What is the main role of the vector database?

A) Authenticate users B) Store embeddings for retrieval C) Generate the final answer D) Replace the model

Answer: B Explanation: Vector databases store embeddings and make similarity search efficient for RAG.

Question 9

Which principle best describes separating API logic, business logic, and data access logic?

A) Observability B) Separation of concerns C) Rate limiting D) Hallucination control

Answer: B Explanation: Separation of concerns keeps systems easier to maintain, test, and reason about.

Question 10

What is the most important question when choosing between two architecture patterns?

A) Which one has the longest name? B) Which one solves the problem with the fewest tradeoffs? C) Which one is newer? D) Which one has more pages in the docs?

Answer: B Explanation: Exam questions usually reward matching the pattern to the problem and the constraint.

Domain 4: Safety, Security & Responsible AI

Guardrails, moderation, validation, secure defaults, and incident handling.

Why it matters

High-stakes exam questions often reward the safer control path, not the flashiest one.

Focus areas

Refusal behavior
Secret protection
Validation and moderation
Human review

Common mistakes

Hardcoding secrets
Skipping escalation paths
Assuming the model can police itself

Quick recall

Refuse harmful requests
Validate input early
Sanitize output before use

Practice assessment Test this domain in the exam-style flow. Back to top Jump back to the domain map.

Study guide

Domain 4: Safety, Security & Responsible AI

What this is

This domain covers the controls around safe Claude applications: refusal behavior, key protection, moderation, validation, privacy, fairness, accountability, and incident response.

Why it matters on the exam

The exam often tests whether you can protect users and data while still keeping the system useful. The safest answer is usually the one that adds guardrails instead of hoping the model will behave.

What to focus on

How Claude should respond to harmful requests.
How to protect secrets and sensitive data.
How validation and moderation differ.
When human review is required.
What to do after an incident.

Key decisions and tradeoffs

Refuse harmful requests clearly and briefly.
Keep API keys server-side and out of the browser.
Validate inputs before they reach the model.
Sanitize outputs before display or downstream use.
Rate limit to prevent abuse and control cost.
Use human oversight for high-stakes or ambiguous decisions.
Be transparent about AI use and limitations.

Common mistakes

Hardcoding secrets or exposing them client-side.
Treating safety as a final filter instead of a design rule.
Forgetting that a refusal is a correct safety outcome.
Skipping audit logs and escalation paths.
Assuming the model can safely handle every high-stakes decision on its own.

Scenario examples

Handling harmful input

A user asks for instructions that would obviously cause harm. The correct response is to refuse, explain briefly, and offer a safer alternative.

Protecting secrets

You are building a browser app and are tempted to call the API directly. The secure choice is to keep the API call on the server and protect the key with secrets management.

Moderation and validation

User-generated text may contain jailbreak patterns or malformed payloads. Validate and screen before the model sees it, then sanitize anything you show back to users.

High-stakes workflows

A decision affects finance, health, or legal outcomes. The right architecture includes human review, logging, and a clear escalation path.

Quick recall

Harmlessness, honesty, helpfulness.
Never hardcode API keys.
Validate input and sanitize output.
Rate limit to prevent abuse.
Use human review for high-stakes decisions.
Audit logs support accountability.
Transparency helps users understand AI use.

Compare notes

Moderation vs validation: moderation is policy screening; validation is input hygiene.
Transparency vs explainability: transparency says AI is involved; explainability helps users understand the output.
Human oversight vs automation: oversight matters when mistakes are expensive.
Safety vs helpfulness: helpful outputs must still stay within boundaries.

Official references

Study move

For every safety scenario, ask:

Is this harmful, sensitive, or high stakes?
Should the model answer, refuse, or escalate?
What control protects the user or the system?
Which principle is the exam trying to test?

Flashcards

Domain 4: Flashcards

Safety Principles

Q: Three principles of Constitutional AI?
A: Harmlessness, Honesty, Helpfulness.

Q: Harmlessness means what?
A: Refuse harmful requests.

Q: Honesty means what?
A: Be truthful and admit uncertainty.

Q: Helpfulness means what?
A: Be useful while staying safe.

Secure Implementation

Q: Where should API keys live?
A: Server-side secrets or environment variables.

Q: What should never be hardcoded?
A: API keys.

Q: Input validation protects against what?
A: Malformed or malicious input.

Q: Output sanitization protects against what?
A: Unsafe or sensitive output.

Safety Operations

Q: What is red-teaming?
A: Adversarial safety testing.

Q: What is jailbreaking?
A: Trying to bypass safety rules.

Q: Why rate limit?
A: Prevent abuse and control cost.

Q: Why log incidents?
A: Accountability and debugging.

Privacy and Compliance

Q: PII means what?
A: Personally identifiable information.

Q: Data minimization means what?
A: Collect only what is needed.

Q: GDPR right to forget means what?
A: Delete data on request.

Q: NIST AI RMF is what?
A: A voluntary AI risk framework.

Fairness and Oversight

Q: Representation bias means what?
A: Underrepresented groups in training data.

Q: Human oversight matters when?
A: High-stakes decisions.

Q: Transparency means what?
A: Tell users AI is involved.

Q: Explainability means what?
A: Make decisions understandable.

Incident Response

Q: First step in an incident?
A: Contain the issue.

Q: After containment, what next?
A: Assess, communicate, remediate, review.

Q: Why keep audit logs?
A: Traceability and accountability.

Q: Why test safety boundaries?
A: Find failure modes before users do.

Practice questions

Domain 4: Practice Questions

Safety, Security & Responsible AI

Question 1

Which behavior best matches a safe Claude response to a clearly harmful request?

A) Give the full instructions B) Refuse briefly and redirect to a safer alternative C) Ignore safety and answer anyway D) Stall without explanation

Answer: B Explanation: Harmlessness means refusing harmful requests while still being helpful.

Question 2

Where should an API key be stored in a production app?

A) In the browser bundle B) In source code comments C) In environment variables or secret storage D) In the README file

Answer: C Explanation: API keys must stay server-side and protected by secret management.

Question 3

What is the purpose of input validation in an LLM app?

A) To increase creativity B) To reduce token counts only C) To catch malicious or malformed input before it reaches the model D) To remove the need for moderation

Answer: C Explanation: Validation is a first line of defense against harmful or malformed input.

Question 4

Which practice is most directly associated with finding jailbreak weaknesses?

A) Red-teaming B) Prompt caching C) Blue-green deployment D) Tokenization

Answer: A Explanation: Red-teaming is adversarial testing that looks for abuse and safety gaps.

Question 5

Your app is used in a high-stakes workflow. What is the best design addition?

A) No guardrails because the model is smart B) Human oversight and escalation paths C) Higher temperature D) More caching only

Answer: B Explanation: High-stakes use cases need review, accountability, and clear escalation.

Question 6

What is the main goal of output sanitization?

A) Make the answer longer B) Validate or clean model output before display or downstream use C) Improve tokenization D) Replace human review

Answer: B Explanation: Output sanitization helps prevent bad output from becoming a second security problem.

Question 7

What does transparency mean in an AI product?

A) Hide AI usage to reduce friction B) Tell users an AI is involved and explain limitations C) Give the model all possible permissions D) Remove all error messages

Answer: B Explanation: Transparency helps users understand what the system is and what it is not.

Question 8

Which bias occurs when a dataset underrepresents certain groups?

A) Deployment bias B) Historical bias C) Representation bias D) Sampling noise only

Answer: C Explanation: Representation bias is a mismatch in who is present in the data.

Question 9

What is the primary purpose of rate limiting in a Claude app?

A) Make prompts shorter B) Prevent abuse and control cost C) Increase context windows D) Replace logging

Answer: B Explanation: Rate limiting protects the system, the budget, and the users.

Question 10

Which step should happen first when an incident affects a live AI workflow?

A) Immediate containment B) Post-mortem C) Feature expansion D) Model fine-tuning

Answer: A Explanation: Contain the issue first, then assess, communicate, remediate, and review.

Domain 5: Implementation & Operations

Testing, deployment, monitoring, recovery, versioning, and cost control.

Why it matters

This is the production-readiness layer that keeps the system working after launch.

Focus areas

Testing layers
Rollout safety
Monitoring and alerts
Rollback and recovery

Common mistakes

Treating deployment as the end
Skipping staging
Ignoring latency and cost

Quick recall

Unit, integration, E2E
Canary and blue-green reduce risk
Pin versions when behavior matters

Practice assessment Test this domain in the exam-style flow. Back to top Jump back to the domain map.

Study guide

Domain 5: Implementation & Operations

What this is

This domain covers the lifecycle after the code works: testing, deployment, monitoring, debugging, optimization, versioning, rollback, and recovery.

Why it matters on the exam

The exam asks whether a system can ship and stay healthy. The strongest answer balances reliability, observability, rollout safety, cost control, and recovery.

What to focus on

Test in layers: unit, integration, then end-to-end.
Use staging before production.
Roll out risky changes gradually.
Monitor latency, errors, cost, and user impact.
Improve cost and speed with better prompts, caching, batching, or model choice.
Pin versions when behavior needs to stay stable.

Key decisions and tradeoffs

Start with isolated tests, then add integration and end-to-end coverage.
Use feature flags, canary, or blue-green when risk is meaningful.
Track latency, time to first token, errors, and token usage.
Optimize prompt and model choices before overengineering infrastructure.
Pin model and API versions when predictability matters.

Common mistakes

Treating deployment as the end of the work.
Confusing test layers.
Ignoring latency and cost metrics.
Shipping risky changes without rollback planning.
Failing to test restores or recovery steps.

Scenario examples

Testing strategy

You changed a tool wrapper and want confidence it still works. Unit tests check the wrapper, integration tests check the tool path, and end-to-end tests confirm the user flow.

Deployment safety

A prompt update could change user-visible behavior. The safe answer is staging, then gradual release, with rollback ready.

Monitoring

The app is correct but feels slow. Look at latency, time to first token, retry rate, token count, and whether caching or batching would help.

Recovery

An incident breaks a critical path. First contain impact, then assess scope, communicate, remediate, and review the lesson afterward.

Quick recall

Unit tests check isolated parts.
Integration tests check connected parts.
End-to-end tests check full workflows.
Staging mirrors production.
Canary and blue-green reduce rollout risk.
Observability includes logs, metrics, and alerts.
Cost control starts with model choice and prompt efficiency.

Compare notes

Unit vs integration vs E2E: isolated logic versus connected components versus full flow.
Staging vs production: rehearsal versus live traffic.
Canary vs blue-green: gradual exposure versus environment switch.
Optimization vs monitoring: improvement versus measurement.

Official references

Study move

For any operations question, ask:

What failed or might fail?
What should be measured first?
What is the safest rollout or recovery path?
Which control reduces risk, cost, or latency?

Flashcards

Domain 5: Flashcards

Testing

Q: Unit testing checks what?
A: One component in isolation.

Q: Integration testing checks what?
A: Multiple parts working together.

Q: End-to-end testing checks what?
A: A full user flow.

Q: Why use mocks?
A: To test without live API calls.

Deployment

Q: Staging is for what?
A: Production-like validation.

Q: Blue-green deployment gives what?
A: Instant rollback.

Q: Canary deployment gives what?
A: Gradual exposure.

Q: Feature flags give what?
A: Controlled rollout.

Monitoring

Q: Time to first token measures what?
A: How fast output starts.

Q: Logs, metrics, traces together are what?
A: Observability.

Q: Info logs are for what?
A: Normal operations.

Q: High error rates usually mean what?
A: Something is broken or unstable.

Cost and Performance

Q: Prompt optimization helps with what?
A: Lower token usage.

Q: Caching helps with what?
A: Speed and cost.

Q: Model choice helps with what?
A: Quality, latency, and cost tradeoffs.

Q: Load balancing helps with what?
A: Throughput and resilience.

Debugging and Recovery

Q: First debugging step?
A: Reproduce the issue.

Q: Why pin model versions?
A: Predictability.

Q: Why pin API versions?
A: Reduce surprises.

Q: Backup testing matters because?
A: A backup only matters if it restores.

Operations Mindset

Q: Production work ends when deployment finishes?
A: No.

Q: What should be monitored continuously?
A: Errors, latency, cost, usage.

Q: What should every incident end with?
A: A review or post-mortem.

Q: What should every release have?
A: A rollback path.

Practice questions

Domain 5: Practice Questions

Implementation & Operations

Question 1

You changed a utility that formats API output. Which test type should prove the formatter works alone?

A) End-to-end B) Integration C) Unit D) Load

Answer: C Explanation: Unit tests verify isolated behavior without depending on the full system.

Question 2

Your app needs confidence before shipping a prompt update. What is the safest deployment sequence?

A) Production first, then test B) Staging, verify, then production C) Skip staging and hope for the best D) Increase temperature

Answer: B Explanation: Staging is the safer rehearsal step before production traffic sees the change.

Question 3

What metric best describes how fast a user sees the first streamed token?

A) Total runtime B) Time to first token C) Cache hit ratio D) Error budget

Answer: B Explanation: Time to first token is a key user-experience metric for streaming systems.

Question 4

Which rollout strategy gives the easiest instant rollback?

A) Blue-green deployment B) Manual code copy C) Direct production overwrite D) Rebuild from scratch

Answer: A Explanation: Blue-green keeps two environments so traffic can switch back quickly if needed.

Question 5

You want to release a risky feature gradually. What tool is most appropriate?

A) Feature flags B) Tokenization C) Prompt caching D) Hardcoding

Answer: A Explanation: Feature flags let you control exposure and roll back without a full redeploy.

Question 6

Which monitoring signal most directly suggests the user experience is degrading even if answers are correct?

A) Time to first token B) Source code line count C) Model name length D) README size

Answer: A Explanation: Response latency and first-token timing are major user experience signals.

Question 7

Which choice is the best first step when an integration feels expensive?

A) Increase max_tokens B) Check model choice, prompt size, and caching opportunities C) Turn off logging D) Add more retries only

Answer: B Explanation: Cost optimization starts with usage patterns, not just infrastructure.

Question 8

What does “pinning” a model version help prevent?

A) Better prompts B) Unexpected behavior changes from automatic upgrades C) API key leaks D) Rate limiting

Answer: B Explanation: Version pinning makes behavior more predictable.

Question 9

What is the first step in debugging a production issue?

A) Reproduce it consistently B) Rewrite the docs C) Increase the temperature D) Switch to a new architecture

Answer: A Explanation: Reproduction is the foundation of useful debugging.

Question 10

Which practice belongs to disaster recovery planning?

A) Ignoring backups B) Testing restores regularly C) Removing monitoring D) Deleting runbooks

Answer: B Explanation: Recovery only matters if restores are tested and verified.