The README for each domain is the main explanation layer.
Single source
Content library
This is the only reading surface. Every domain is grouped here with the original study guide, practice questions, flashcards, and the official Anthropic source links that back it up.
How to use this page
Read one domain section at a time, then move straight into assessment when you want to test yourself.
The question file gives recall and decision practice for the same domain.
The flashcards are compact memory cues for review sessions.
Domain map
Everything is divided by domain, with the source material underneath each heading.
Domain 1: Core Concepts
Model choice, sampling, prompt structure, and safety basics.
Study guide
Domain 1: Core AI Concepts & Claude Fundamentals
What this is
This domain is the foundation layer: model choice, token limits, context windows, prompt structure, sampling controls, and the behavior rules that make Claude useful and safe.
Why it matters on the exam
Most questions in this domain are really asking, “Can you choose the right default and explain the tradeoff?” If you know what each control changes, the answer becomes obvious.
What to focus on
- Model choice: quality, balance, or speed/cost.
- Sampling: how much variation you want in the output.
- Prompt roles: what belongs in system instructions versus user input.
- Context: what Claude can actually see at one time.
- Safety: how Claude should respond when a request is risky or unclear.
Key decisions and tradeoffs
- Use Opus when reasoning quality matters most.
- Use Sonnet when you want the best general-purpose default.
- Use Haiku when speed and cost matter more than depth.
- Use low temperature for stable, repeatable outputs.
- Use higher temperature for brainstorming or exploration.
- Use system messages for rules and tone, and user messages for the task itself.
Common mistakes
- Treating tokens as if they were the same as words.
- Picking a lightweight model for a task that needs careful reasoning.
- Putting behavior rules in the wrong message role.
- Assuming higher temperature always improves results.
- Forgetting that uncertainty should be acknowledged, not hidden.
Scenario examples
Choosing a model
You need a contract summary that will be reviewed by a legal team. The safest study answer is the model with the strongest reasoning, not the cheapest one.
Choosing randomness
You are generating code or configuration and want the output to stay consistent. Lower temperature is the right answer because you want less variation.
Structuring a prompt
You want a strict JSON answer with a fixed format. Put the format rule in the system message, then ask the actual question in the user message.
Handling uncertainty
The model cannot verify a detail from the provided context. The correct behavior is to say so and ask for more information or cite what is available.
Quick recall
- Tokens power pricing and context.
- Context window is the total input and output budget.
- Temperature controls randomness.
- Top-K limits the candidate set by count.
- Top-P limits the candidate set by probability mass.
- System messages set behavior.
- User messages set the task.
- Constitutional AI emphasizes harmlessness, honesty, and helpfulness.
Compare notes
- Token vs word: tokens are the unit the model uses; words are only an approximation.
- Temperature vs Top-P: temperature changes randomness; Top-P changes how wide the candidate pool is.
- System vs user: system sets rules, user asks for work.
- Opus vs Sonnet vs Haiku: strongest reasoning vs balanced default vs fastest/cheapest.
Official references
Study move
For any exam-style question, ask:
- What control or model is being tested?
- What tradeoff is most important?
- What would go wrong if I chose the wrong setting?
- What is the simplest correct rule to remember?
Flashcards
Domain 1: Flashcards
Model Choice
Q: Best model for the hardest reasoning tasks?
A: Claude Opus.
Q: Best model for balanced everyday use?
A: Claude Sonnet.
Q: Best model for speed and cost?
A: Claude Haiku.
Q: When is Opus worth the cost?
A: When reasoning quality matters most.
Q: When is Haiku the right choice?
A: When volume and latency matter most.
Tokens and Context
Q: What is a token?
A: The unit Claude uses for text and pricing.
Q: Roughly how many words is one token?
A: About three quarters of a word.
Q: What is the context window?
A: The total input plus output budget.
Q: Why do tokens matter?
A: They control cost and how much the model can see.
Sampling
Q: What does low temperature do?
A: Makes output more deterministic.
Q: What does high temperature do?
A: Makes output more varied.
Q: What does Top-K control?
A: The number of candidate tokens.
Q: What does Top-P control?
A: The probability mass of candidate tokens.
Prompt Structure
Q: What does the system message do?
A: Sets behavior and rules.
Q: What does the user message do?
A: States the actual task.
Q: What does the assistant message do?
A: Holds previous model replies.
Q: Where should output format rules go?
A: In the system message.
Prompt Patterns
Q: What is few-shot prompting?
A: Teaching by example.
Q: What is role prompting?
A: Giving the model a persona.
Q: What is structured output prompting?
A: Asking for JSON or another strict format.
Q: What is chain-of-thought prompting?
A: Asking the model to reason step by step.
Claude Safety
Q: What are the three principles of Constitutional AI?
A: Harmlessness, Honesty, Helpfulness.
Q: What should Claude do when unsure?
A: Say so honestly.
Q: What does harmlessness mean?
A: Refuse harmful requests.
Q: What is hallucination?
A: Confidently wrong output.
Capabilities
Q: What lets Claude interact with external systems?
A: Tool use.
Q: Do newer Claude models support images?
A: Yes.
Q: Can Claude browse the web by itself?
A: No, not natively.
Q: Does Claude remember every session automatically?
A: No.
Fast Review
Q: Best default for most apps?
A: Sonnet.
Q: Best choice for creative exploration?
A: Higher temperature.
Q: Best choice for reproducible output?
A: Lower temperature.
Q: Best study mantra for uncertainty?
A: Verify instead of inventing.
Practice questions
Domain 1: Practice Questions
Core AI Concepts & Claude Fundamentals
Question 1
A legal team needs the strongest reasoning possible for a complex document review and cost is secondary. Which model is the best default choice?
A) Claude Haiku B) Claude Sonnet C) Claude Opus D) Any model with a higher temperature
Answer: C Explanation: Opus is the safest choice when quality and reasoning depth matter most.
Question 2
Your team wants repeatable code generation with minimal variation across runs. Which control should you lower first?
A) Top-P B) Temperature C) Context window D) Token count
Answer: B Explanation: Lower temperature reduces randomness and makes output more consistent.
Question 3
A prompt is failing because the input document and the expected output are both long. What should you think about first?
A) Tokenization B) Context window C) Temperature D) System prompt
Answer: B Explanation: The context window is the total token budget for input plus output.
Question 4
Which statement best describes a token?
A) It is exactly one word B) It is the unit Claude uses for pricing and context C) It is only used for authentication D) It is the same thing as a response message
Answer: B Explanation: Tokens are the basic unit used for billing and context management.
Question 5
You want Claude to obey strict behavior and formatting rules. Where should those rules live?
A) User message B) Assistant message C) System message D) Tool result
Answer: C Explanation: The system message sets the behavior, role, and constraints.
Question 6
You want Claude to imitate a response format by example. Which prompting pattern fits best?
A) Few-shot prompting B) Sampling C) Tokenization D) Temperature tuning
Answer: A Explanation: Few-shot prompting uses examples to guide style, structure, and output shape.
Question 7
Claude is unsure about a fact from the provided context. What is the safest exam answer?
A) Invent a plausible answer B) Act confident and continue C) Acknowledge uncertainty and verify D) Repeat the same claim more forcefully
Answer: C Explanation: Honesty means acknowledging uncertainty instead of fabricating details.
Question 8
What is the best description of Top-K sampling?
A) It chooses tokens until a probability threshold is reached B) It limits the candidate set to the K most likely tokens C) It removes all randomness D) It changes the system prompt
Answer: B Explanation: Top-K bounds the candidate pool by count, not probability mass.
Question 9
Which choice best fits a high-volume, cost-sensitive, near-real-time workload?
A) Opus B) Sonnet C) Haiku D) The highest temperature setting
Answer: C Explanation: Haiku is the fastest and most cost-effective option for lighter workloads.
Question 10
What is the main purpose of Constitutional AI in a study answer?
A) Make the model generate longer answers B) Improve pricing consistency C) Keep the model helpful, honest, and harmless D) Increase the context window
Answer: C Explanation: Constitutional AI is the safety and behavior framework behind Claude’s responses.
Domain 2: API & Integration
Request flow, streaming, tools, retries, caching, and secure integration.
Study guide
Domain 2: Claude API & Integration
What this is
This domain covers the request lifecycle: authentication, request shape, streaming, tool use, structured responses, retries, rate limits, and secure server-side integration.
Why it matters on the exam
Questions here usually ask what the application should do next. You need to know the API flow well enough to pick the safe, correct response under pressure.
What to focus on
- The minimum request shape for a Messages API call.
- How system instructions differ from conversation messages.
- What streaming changes for the user experience.
- What tool use means for your application logic.
- Which failures should be retried versus fixed.
- How to protect secrets and limit abuse.
Key decisions and tradeoffs
- Use the Messages API for standard conversational interactions.
- Keep the system prompt separate from the message history.
- Use streaming when responsiveness matters.
- Use tool use when Claude must trigger an external action.
- Retry rate limits and transient failures with backoff.
- Use prompt caching when repeated instructions or context are stable.
- Keep API calls server-side so secrets never reach the browser.
Common mistakes
- Forgetting required request fields.
- Mixing up system instructions and user content.
- Handling a tool request like a normal final answer.
- Retrying a rate-limited call immediately.
- Exposing keys in client-side code.
- Assuming streaming changes the model rather than the delivery pattern.
Scenario examples
Streaming
A user is waiting on a long answer and you want the UI to feel alive. Streaming is the right answer because it improves perceived latency.
Tool use
Claude needs live data from a database or external service. The application should execute the tool, return the result, and let Claude continue.
Error handling
The API returns 429. The best exam answer is exponential backoff, not a retry loop that makes the problem worse.
Security
You are building a browser app. Keep the Claude call on the server and protect the key with environment variables or secret storage.
Quick recall
- Base URL:
https://api.anthropic.com - Required fields:
model,max_tokens,messages anthropic-versionbelongs in the request headers.- Streaming uses server-sent events.
- Tool results flow back through the conversation.
- 429 means rate limit.
invalid_request_errorusually means fix the request.
Compare notes
- Streaming vs non-streaming: same model, different delivery.
- System vs messages: rules versus conversation history.
- Tool use vs plain completion: application action is required before the final answer.
- Retryable vs non-retryable: 429 and many transient 5xx errors may be retried; bad requests should be corrected.
Official references
Study move
When you read the docs, extract:
- the required request fields
- the response events you must handle
- the failures you should retry
- the security rule that protects the app
Flashcards
Domain 2: Flashcards
Request Basics
Q: Base URL for the Claude API?
A: https://api.anthropic.com
Q: Required version header?
A: anthropic-version
Q: Required output limit field?
A: max_tokens
Q: Where does the system prompt live?
A: Top-level system
Q: Which message role is the user?
A: user
Auth and Security
Q: Where should the API key never go?
A: Client-side code.
Q: Safe place for secrets?
A: Environment variables or secret storage.
Q: What does a 401 usually mean?
A: Bad or missing auth.
Q: What does a 429 usually mean?
A: Rate limit hit.
Streaming
Q: What format does streaming use?
A: Server-sent events.
Q: Why use streaming?
A: Better time to first token.
Q: Common stream event?
A: message_start
Q: Streamed text arrives in what kind of event?
A: content_block_delta
Tool Use
Q: What does tool_use mean?
A: Claude wants the app to run a tool.
Q: What should the app do after tool_use?
A: Execute and return tool_result.
Q: Tool schema format?
A: JSON Schema.
Q: Tool definitions usually include what three things?
A: Name, description, input schema.
Errors and Retry
Q: What error should usually be retried with backoff?
A: 429.
Q: Retry pattern for transient failures?
A: Exponential backoff.
Q: What error usually means the request itself is broken?
A: invalid_request_error
Q: What should you do with 4xx errors besides 429?
A: Fix the request.
Caching and Cost
Q: Why use prompt caching?
A: Lower cost for repeated prompt material.
Q: What is usually more expensive, input or output tokens?
A: Output.
Q: When is prompt caching useful?
A: Stable instructions and examples.
Q: What should you optimize first if cost is high?
A: Prompt size, model choice, caching.
Memory Hooks
Q: Streaming helps with what user perception?
A: Responsiveness.
Q: Tool use helps with what?
A: Actions outside the model.
Q: Backoff helps with what?
A: Retry safety.
Q: Server-side calls help with what?
A: Secret protection.
Practice questions
Domain 2: Practice Questions
Claude API & Integration
Question 1
Your app must call Claude from a browser-based UI. What is the safest architecture choice?
A) Put the API key in client-side JavaScript B) Call Claude directly from the browser C) Keep the API call on the server D) Disable authentication
Answer: C Explanation: API keys should never be exposed in client-side code.
Question 2
Which request field is required for a Messages API call?
A) temperature
B) system
C) max_tokens
D) top_p
Answer: C
Explanation: max_tokens is required; the others are optional controls.
Question 3
Claude returns tool_use. What should your application do next?
A) Ignore it and wait B) Execute the requested tool and return the result C) Immediately retry the request D) Change the model
Answer: B
Explanation: tool_use means the application must act before Claude can finish the answer.
Question 4
The API returns 429 during a traffic spike. What is the best next step?
A) Retry immediately in a loop B) Retry later with exponential backoff C) Delete the key D) Switch to a higher temperature
Answer: B Explanation: 429 is a rate-limit response; backoff is the correct retry pattern.
Question 5
Which location is correct for the system prompt?
A) Inside the messages array
B) As a top-level system field
C) Inside a tool result
D) Inside the user message text
Answer: B Explanation: The system prompt is separate from the conversation array.
Question 6
You want the UI to show partial answer text as it arrives. Which API feature fits best?
A) Prompt caching B) Streaming C) Retry logic D) Batch processing
Answer: B Explanation: Streaming improves perceived latency and updates the UI progressively.
Question 7
Which control is most appropriate for reducing cost on repeated instructions and examples?
A) Prompt caching B) Higher temperature C) More retries D) Larger output limits
Answer: A Explanation: Prompt caching reduces the cost of repeated prompt material.
Question 8
Which error type usually means the request itself needs to be fixed rather than retried?
A) invalid_request_error
B) rate_limit_error
C) api_error
D) Temporary network timeout
Answer: A Explanation: Invalid request errors point to bad parameters, malformed JSON, or missing required fields.
Question 9
Your assistant must answer in a strict machine-readable format. Where should that requirement live?
A) system
B) messages[0]
C) The final assistant response only
D) The browser UI
Answer: A Explanation: System instructions are the best place for behavior and output-format rules.
Question 10
Which retry pattern is best for transient failures?
A) Immediate repeated requests B) Exponential backoff C) Hard fail every time D) Switch to a different user message
Answer: B Explanation: Exponential backoff reduces pressure on the service and gives transient issues time to clear.
Domain 3: Architecture & Design
RAG, agents, multi-agent coordination, context management, and orchestration.
Study guide
Domain 3: Architecture & Design Patterns
What this is
This domain is about system shape: RAG, agents, multi-agent coordination, context management, prompt caching, streaming, batching, and observability.
Why it matters on the exam
The exam often gives you a constraint and asks you to choose the best architecture. The skill here is not memorizing buzzwords; it is matching the pattern to the problem.
What to focus on
- When retrieval is better than reasoning.
- When a simple request flow is enough.
- When an agent is justified.
- When specialization should become multi-agent coordination.
- How to manage long or growing context.
- Where caching, streaming, batching, and observability fit.
Key decisions and tradeoffs
- Use RAG when answers must stay grounded in external knowledge.
- Use an agent pattern when the system must reason and act over multiple steps.
- Use multi-agent coordination when specialization improves the outcome.
- Use summarization or sliding windows when context is getting too large.
- Use prompt caching when stable instructions or examples repeat often.
- Use streaming for responsiveness.
- Use batching for throughput.
Common mistakes
- Choosing RAG when the real problem is orchestration.
- Using an agent when a single turn would be simpler.
- Treating multi-agent as automatically better.
- Confusing prompt caching with response caching.
- Ignoring the complexity cost of adding more moving parts.
Scenario examples
RAG versus an agent
You need answers from a private knowledge base. If the problem is grounding, choose RAG. If the problem includes deciding which tools to use, the better answer is an agent flow.
Context management
The conversation is getting too long to fit comfortably. Summarize older material, retain key facts, or retrieve only what is needed instead of dropping everything blindly.
Multi-agent coordination
A workflow needs one specialist for policy, one for code, and one for summarization. The key idea is specialization plus synthesis, not just “more agents.”
Scaling choice
Requests are independent and the bottleneck is throughput. Horizontal scaling or batching is the more relevant architectural decision.
Quick recall
- RAG = retrieve and ground.
- Agent = reason, act, observe.
- Multi-agent = specialize, then coordinate.
- Prompt caching = reuse stable prompt material.
- Streaming = better perceived speed.
- Batching = throughput over latency.
- Observability = logs, metrics, traces.
Compare notes
- RAG vs agent: grounded retrieval versus decision-making and action.
- Single-agent vs multi-agent: simplicity versus specialization.
- Streaming vs batch: responsiveness versus throughput.
- Summarization vs sliding window: compress versus truncate.
Official references
Study move
For every pattern, ask:
- What problem does it solve?
- What does it cost?
- What failure mode is it preventing?
- What simpler pattern might the exam expect me to overlook?
Flashcards
Domain 3: Flashcards
Core Patterns
Q: What does RAG do?
A: Retrieves documents and grounds the answer.
Q: What does an agent do?
A: Reasons, acts, and observes over multiple steps.
Q: What does multi-agent architecture add?
A: Specialization plus coordination.
Q: What does context management protect?
A: The token budget and useful history.
Context Strategies
Q: Sliding window means what?
A: Keep the newest context only.
Q: Summarization means what?
A: Compress older context.
Q: Selective retention means what?
A: Keep important pieces, drop the rest.
Q: RAG for context means what?
A: Retrieve context on demand.
RAG Building Blocks
Q: Why chunk documents?
A: To make them retrievable.
Q: Semantic chunking does what?
A: Preserves natural meaning boundaries.
Q: Fixed-size chunking does what?
A: Splits by token length.
Q: What stores embeddings?
A: A vector database.
Agent Workflows
Q: ReAct means what?
A: Reason + Act loop.
Q: Plan-and-execute means what?
A: Plan first, then execute.
Q: Reflexion means what?
A: Learn from mistakes and adjust.
Q: Tool-use workflow means what?
A: Model chooses a tool, app runs it.
Scaling and Operations
Q: Horizontal scaling means what?
A: More instances.
Q: Streaming integration means what?
A: Show output as it arrives.
Q: Batch processing means what?
A: Trade latency for throughput.
Q: Prompt caching helps with what?
A: Repeated prompt cost.
Design Principles
Q: Separation of concerns means what?
A: Split logic into clear layers.
Q: Observability means what?
A: Logs, metrics, traces.
Q: Fail-safe design means what?
A: Graceful fallback.
Q: Testability means what?
A: Easy to verify behavior.
Memory Hooks
Q: RAG = ?
A: Retrieve, then ground.
Q: Agent = ?
A: Decide, act, observe.
Q: Multi-agent = ?
A: Specialize, coordinate, synthesize.
Q: Streaming = ?
A: Faster perceived response.
Practice questions
Domain 3: Practice Questions
Architecture & Design Patterns
Question 1
Your product needs answers grounded in a private document set that changes over time. Which pattern fits best?
A) RAG B) Static prompt templates C) Tokenization D) Model version pinning
Answer: A Explanation: RAG retrieves current source material and injects it into the prompt so responses stay grounded.
Question 2
Which problem is the best use case for an agent pattern?
A) Return a fixed summary from a short prompt B) Execute a single deterministic conversion C) Solve a multi-step task that requires decision-making and tool use D) Format a JSON response once
Answer: C Explanation: Agents are useful when the system must reason, act, and observe across multiple steps.
Question 3
When would multi-agent coordination be the better architecture?
A) When the work is simple and fully deterministic B) When specialized subproblems benefit from different expertise C) When you want fewer moving parts D) When you want to remove all orchestration
Answer: B Explanation: Multi-agent systems shine when specialization is useful and the coordinator can synthesize the output.
Question 4
Your context is getting too large to keep every past message. What is a reasonable strategy?
A) Ignore the issue B) Use summarization or selective retention C) Double the temperature D) Remove the system prompt
Answer: B Explanation: Context management strategies help retain useful information while staying inside token limits.
Question 5
What is the main purpose of chunking in a RAG pipeline?
A) To make the text look shorter B) To create manageable retrieval units C) To increase the model’s context window D) To remove the need for embeddings
Answer: B Explanation: Chunking helps convert large documents into pieces that can be embedded and retrieved effectively.
Question 6
Your product needs lower latency for a user-facing response stream. Which pattern is most directly helpful?
A) Streaming integration B) Sequential batch processing C) Hard-coded prompts D) Bigger context windows
Answer: A Explanation: Streaming improves perceived speed by showing output as it is generated.
Question 7
What is the biggest tradeoff of prompt caching?
A) It eliminates the need for prompts B) It adds some architecture complexity while reducing repeated prompt cost C) It only works for images D) It makes the model less safe
Answer: B Explanation: Prompt caching is valuable when repeated instructions are stable, but it still adds design and lifecycle considerations.
Question 8
You need a retrieval system that works with vector similarity search. What is the main role of the vector database?
A) Authenticate users B) Store embeddings for retrieval C) Generate the final answer D) Replace the model
Answer: B Explanation: Vector databases store embeddings and make similarity search efficient for RAG.
Question 9
Which principle best describes separating API logic, business logic, and data access logic?
A) Observability B) Separation of concerns C) Rate limiting D) Hallucination control
Answer: B Explanation: Separation of concerns keeps systems easier to maintain, test, and reason about.
Question 10
What is the most important question when choosing between two architecture patterns?
A) Which one has the longest name? B) Which one solves the problem with the fewest tradeoffs? C) Which one is newer? D) Which one has more pages in the docs?
Answer: B Explanation: Exam questions usually reward matching the pattern to the problem and the constraint.
Domain 4: Safety, Security & Responsible AI
Guardrails, moderation, validation, secure defaults, and incident handling.
Study guide
Domain 4: Safety, Security & Responsible AI
What this is
This domain covers the controls around safe Claude applications: refusal behavior, key protection, moderation, validation, privacy, fairness, accountability, and incident response.
Why it matters on the exam
The exam often tests whether you can protect users and data while still keeping the system useful. The safest answer is usually the one that adds guardrails instead of hoping the model will behave.
What to focus on
- How Claude should respond to harmful requests.
- How to protect secrets and sensitive data.
- How validation and moderation differ.
- When human review is required.
- What to do after an incident.
Key decisions and tradeoffs
- Refuse harmful requests clearly and briefly.
- Keep API keys server-side and out of the browser.
- Validate inputs before they reach the model.
- Sanitize outputs before display or downstream use.
- Rate limit to prevent abuse and control cost.
- Use human oversight for high-stakes or ambiguous decisions.
- Be transparent about AI use and limitations.
Common mistakes
- Hardcoding secrets or exposing them client-side.
- Treating safety as a final filter instead of a design rule.
- Forgetting that a refusal is a correct safety outcome.
- Skipping audit logs and escalation paths.
- Assuming the model can safely handle every high-stakes decision on its own.
Scenario examples
Handling harmful input
A user asks for instructions that would obviously cause harm. The correct response is to refuse, explain briefly, and offer a safer alternative.
Protecting secrets
You are building a browser app and are tempted to call the API directly. The secure choice is to keep the API call on the server and protect the key with secrets management.
Moderation and validation
User-generated text may contain jailbreak patterns or malformed payloads. Validate and screen before the model sees it, then sanitize anything you show back to users.
High-stakes workflows
A decision affects finance, health, or legal outcomes. The right architecture includes human review, logging, and a clear escalation path.
Quick recall
- Harmlessness, honesty, helpfulness.
- Never hardcode API keys.
- Validate input and sanitize output.
- Rate limit to prevent abuse.
- Use human review for high-stakes decisions.
- Audit logs support accountability.
- Transparency helps users understand AI use.
Compare notes
- Moderation vs validation: moderation is policy screening; validation is input hygiene.
- Transparency vs explainability: transparency says AI is involved; explainability helps users understand the output.
- Human oversight vs automation: oversight matters when mistakes are expensive.
- Safety vs helpfulness: helpful outputs must still stay within boundaries.
Official references
Study move
For every safety scenario, ask:
- Is this harmful, sensitive, or high stakes?
- Should the model answer, refuse, or escalate?
- What control protects the user or the system?
- Which principle is the exam trying to test?
Flashcards
Domain 4: Flashcards
Safety Principles
Q: Three principles of Constitutional AI?
A: Harmlessness, Honesty, Helpfulness.
Q: Harmlessness means what?
A: Refuse harmful requests.
Q: Honesty means what?
A: Be truthful and admit uncertainty.
Q: Helpfulness means what?
A: Be useful while staying safe.
Secure Implementation
Q: Where should API keys live?
A: Server-side secrets or environment variables.
Q: What should never be hardcoded?
A: API keys.
Q: Input validation protects against what?
A: Malformed or malicious input.
Q: Output sanitization protects against what?
A: Unsafe or sensitive output.
Safety Operations
Q: What is red-teaming?
A: Adversarial safety testing.
Q: What is jailbreaking?
A: Trying to bypass safety rules.
Q: Why rate limit?
A: Prevent abuse and control cost.
Q: Why log incidents?
A: Accountability and debugging.
Privacy and Compliance
Q: PII means what?
A: Personally identifiable information.
Q: Data minimization means what?
A: Collect only what is needed.
Q: GDPR right to forget means what?
A: Delete data on request.
Q: NIST AI RMF is what?
A: A voluntary AI risk framework.
Fairness and Oversight
Q: Representation bias means what?
A: Underrepresented groups in training data.
Q: Human oversight matters when?
A: High-stakes decisions.
Q: Transparency means what?
A: Tell users AI is involved.
Q: Explainability means what?
A: Make decisions understandable.
Incident Response
Q: First step in an incident?
A: Contain the issue.
Q: After containment, what next?
A: Assess, communicate, remediate, review.
Q: Why keep audit logs?
A: Traceability and accountability.
Q: Why test safety boundaries?
A: Find failure modes before users do.
Practice questions
Domain 4: Practice Questions
Safety, Security & Responsible AI
Question 1
Which behavior best matches a safe Claude response to a clearly harmful request?
A) Give the full instructions B) Refuse briefly and redirect to a safer alternative C) Ignore safety and answer anyway D) Stall without explanation
Answer: B Explanation: Harmlessness means refusing harmful requests while still being helpful.
Question 2
Where should an API key be stored in a production app?
A) In the browser bundle B) In source code comments C) In environment variables or secret storage D) In the README file
Answer: C Explanation: API keys must stay server-side and protected by secret management.
Question 3
What is the purpose of input validation in an LLM app?
A) To increase creativity B) To reduce token counts only C) To catch malicious or malformed input before it reaches the model D) To remove the need for moderation
Answer: C Explanation: Validation is a first line of defense against harmful or malformed input.
Question 4
Which practice is most directly associated with finding jailbreak weaknesses?
A) Red-teaming B) Prompt caching C) Blue-green deployment D) Tokenization
Answer: A Explanation: Red-teaming is adversarial testing that looks for abuse and safety gaps.
Question 5
Your app is used in a high-stakes workflow. What is the best design addition?
A) No guardrails because the model is smart B) Human oversight and escalation paths C) Higher temperature D) More caching only
Answer: B Explanation: High-stakes use cases need review, accountability, and clear escalation.
Question 6
What is the main goal of output sanitization?
A) Make the answer longer B) Validate or clean model output before display or downstream use C) Improve tokenization D) Replace human review
Answer: B Explanation: Output sanitization helps prevent bad output from becoming a second security problem.
Question 7
What does transparency mean in an AI product?
A) Hide AI usage to reduce friction B) Tell users an AI is involved and explain limitations C) Give the model all possible permissions D) Remove all error messages
Answer: B Explanation: Transparency helps users understand what the system is and what it is not.
Question 8
Which bias occurs when a dataset underrepresents certain groups?
A) Deployment bias B) Historical bias C) Representation bias D) Sampling noise only
Answer: C Explanation: Representation bias is a mismatch in who is present in the data.
Question 9
What is the primary purpose of rate limiting in a Claude app?
A) Make prompts shorter B) Prevent abuse and control cost C) Increase context windows D) Replace logging
Answer: B Explanation: Rate limiting protects the system, the budget, and the users.
Question 10
Which step should happen first when an incident affects a live AI workflow?
A) Immediate containment B) Post-mortem C) Feature expansion D) Model fine-tuning
Answer: A Explanation: Contain the issue first, then assess, communicate, remediate, and review.
Domain 5: Implementation & Operations
Testing, deployment, monitoring, recovery, versioning, and cost control.
Study guide
Domain 5: Implementation & Operations
What this is
This domain covers the lifecycle after the code works: testing, deployment, monitoring, debugging, optimization, versioning, rollback, and recovery.
Why it matters on the exam
The exam asks whether a system can ship and stay healthy. The strongest answer balances reliability, observability, rollout safety, cost control, and recovery.
What to focus on
- Test in layers: unit, integration, then end-to-end.
- Use staging before production.
- Roll out risky changes gradually.
- Monitor latency, errors, cost, and user impact.
- Improve cost and speed with better prompts, caching, batching, or model choice.
- Pin versions when behavior needs to stay stable.
Key decisions and tradeoffs
- Start with isolated tests, then add integration and end-to-end coverage.
- Use feature flags, canary, or blue-green when risk is meaningful.
- Track latency, time to first token, errors, and token usage.
- Optimize prompt and model choices before overengineering infrastructure.
- Pin model and API versions when predictability matters.
Common mistakes
- Treating deployment as the end of the work.
- Confusing test layers.
- Ignoring latency and cost metrics.
- Shipping risky changes without rollback planning.
- Failing to test restores or recovery steps.
Scenario examples
Testing strategy
You changed a tool wrapper and want confidence it still works. Unit tests check the wrapper, integration tests check the tool path, and end-to-end tests confirm the user flow.
Deployment safety
A prompt update could change user-visible behavior. The safe answer is staging, then gradual release, with rollback ready.
Monitoring
The app is correct but feels slow. Look at latency, time to first token, retry rate, token count, and whether caching or batching would help.
Recovery
An incident breaks a critical path. First contain impact, then assess scope, communicate, remediate, and review the lesson afterward.
Quick recall
- Unit tests check isolated parts.
- Integration tests check connected parts.
- End-to-end tests check full workflows.
- Staging mirrors production.
- Canary and blue-green reduce rollout risk.
- Observability includes logs, metrics, and alerts.
- Cost control starts with model choice and prompt efficiency.
Compare notes
- Unit vs integration vs E2E: isolated logic versus connected components versus full flow.
- Staging vs production: rehearsal versus live traffic.
- Canary vs blue-green: gradual exposure versus environment switch.
- Optimization vs monitoring: improvement versus measurement.
Official references
Study move
For any operations question, ask:
- What failed or might fail?
- What should be measured first?
- What is the safest rollout or recovery path?
- Which control reduces risk, cost, or latency?
Flashcards
Domain 5: Flashcards
Testing
Q: Unit testing checks what?
A: One component in isolation.
Q: Integration testing checks what?
A: Multiple parts working together.
Q: End-to-end testing checks what?
A: A full user flow.
Q: Why use mocks?
A: To test without live API calls.
Deployment
Q: Staging is for what?
A: Production-like validation.
Q: Blue-green deployment gives what?
A: Instant rollback.
Q: Canary deployment gives what?
A: Gradual exposure.
Q: Feature flags give what?
A: Controlled rollout.
Monitoring
Q: Time to first token measures what?
A: How fast output starts.
Q: Logs, metrics, traces together are what?
A: Observability.
Q: Info logs are for what?
A: Normal operations.
Q: High error rates usually mean what?
A: Something is broken or unstable.
Cost and Performance
Q: Prompt optimization helps with what?
A: Lower token usage.
Q: Caching helps with what?
A: Speed and cost.
Q: Model choice helps with what?
A: Quality, latency, and cost tradeoffs.
Q: Load balancing helps with what?
A: Throughput and resilience.
Debugging and Recovery
Q: First debugging step?
A: Reproduce the issue.
Q: Why pin model versions?
A: Predictability.
Q: Why pin API versions?
A: Reduce surprises.
Q: Backup testing matters because?
A: A backup only matters if it restores.
Operations Mindset
Q: Production work ends when deployment finishes?
A: No.
Q: What should be monitored continuously?
A: Errors, latency, cost, usage.
Q: What should every incident end with?
A: A review or post-mortem.
Q: What should every release have?
A: A rollback path.
Practice questions
Domain 5: Practice Questions
Implementation & Operations
Question 1
You changed a utility that formats API output. Which test type should prove the formatter works alone?
A) End-to-end B) Integration C) Unit D) Load
Answer: C Explanation: Unit tests verify isolated behavior without depending on the full system.
Question 2
Your app needs confidence before shipping a prompt update. What is the safest deployment sequence?
A) Production first, then test B) Staging, verify, then production C) Skip staging and hope for the best D) Increase temperature
Answer: B Explanation: Staging is the safer rehearsal step before production traffic sees the change.
Question 3
What metric best describes how fast a user sees the first streamed token?
A) Total runtime B) Time to first token C) Cache hit ratio D) Error budget
Answer: B Explanation: Time to first token is a key user-experience metric for streaming systems.
Question 4
Which rollout strategy gives the easiest instant rollback?
A) Blue-green deployment B) Manual code copy C) Direct production overwrite D) Rebuild from scratch
Answer: A Explanation: Blue-green keeps two environments so traffic can switch back quickly if needed.
Question 5
You want to release a risky feature gradually. What tool is most appropriate?
A) Feature flags B) Tokenization C) Prompt caching D) Hardcoding
Answer: A Explanation: Feature flags let you control exposure and roll back without a full redeploy.
Question 6
Which monitoring signal most directly suggests the user experience is degrading even if answers are correct?
A) Time to first token B) Source code line count C) Model name length D) README size
Answer: A Explanation: Response latency and first-token timing are major user experience signals.
Question 7
Which choice is the best first step when an integration feels expensive?
A) Increase max_tokens B) Check model choice, prompt size, and caching opportunities C) Turn off logging D) Add more retries only
Answer: B Explanation: Cost optimization starts with usage patterns, not just infrastructure.
Question 8
What does “pinning” a model version help prevent?
A) Better prompts B) Unexpected behavior changes from automatic upgrades C) API key leaks D) Rate limiting
Answer: B Explanation: Version pinning makes behavior more predictable.
Question 9
What is the first step in debugging a production issue?
A) Reproduce it consistently B) Rewrite the docs C) Increase the temperature D) Switch to a new architecture
Answer: A Explanation: Reproduction is the foundation of useful debugging.
Question 10
Which practice belongs to disaster recovery planning?
A) Ignoring backups B) Testing restores regularly C) Removing monitoring D) Deleting runbooks
Answer: B Explanation: Recovery only matters if restores are tested and verified.