aeo-methodology

AEO Benchmark Question Bank

Purpose: 50 canonical Snowflake builder questions for testing whether AI assistants (ChatGPT, Claude, Cursor, Copilot) return Snowflake’s correct, current answer.

Authoritative source: Snowflake Documentation

Test types: Explain, Implement, Debug, Compare

Category 1: Cortex AI Functions (LLM Functions)

| # | Question | Test Type | Authoritative Doc |
|---|---|---|---|
| 1 | What Cortex AI functions does Snowflake offer for text and image analytics, and how do they differ from each other? | Explain | Cortex AI Functions |
| 2 | Write a SQL query that uses AI_CLASSIFY to categorize customer support tickets into “billing”, “technical”, and “account” categories. | Implement | AI_CLASSIFY |
| 3 | Write a SQL query using AI_EXTRACT to pull structured fields (name, date, amount) from invoice text stored in a Snowflake table. | Implement | AI_EXTRACT |
| 4 | How do I use AI_COMPLETE with Cortex Guard to filter unsafe LLM responses in a production application? | Implement | AI_COMPLETE |
| 5 | My AI_COMPLETE call is returning an error about exceeding the context window. How do I diagnose and fix this? | Debug | AI_COUNT_TOKENS |

Category 2: Cortex Search (RAG)

| # | Question | Test Type | Authoritative Doc |
|---|---|---|---|
| 6 | What is Cortex Search and when should I use it instead of traditional SQL queries or LIKE/ILIKE pattern matching? | Explain | Cortex Search Overview |
| 7 | Write the SQL to create a Cortex Search Service on a product catalog table, with filtering by category and a 1-hour target lag. | Implement | CREATE CORTEX SEARCH SERVICE |
| 8 | How do I build a RAG chatbot using Cortex Search and Cortex AI Functions together? Show the architecture and key code. | Implement | Cortex Search for RAG |
| 9 | My Cortex Search Service is returning stale results even though the base table has been updated. How do I troubleshoot this? | Debug | Cortex Search Service Management |

Category 3: Cortex Agents

| # | Question | Test Type | Authoritative Doc |
|---|---|---|---|
| 10 | What are Cortex Agents and how do they orchestrate across structured and unstructured data sources? | Explain | Cortex Agents |
| 11 | How do I create a Cortex Agent that uses both a Cortex Search service and a Cortex Analyst semantic view as tools? | Implement | Cortex Agents |
| 12 | How do I add a custom tool (stored procedure) to a Cortex Agent so it can look up inventory data? | Implement | Cortex Agents |

Category 4: Dynamic Tables

| # | Question | Test Type | Authoritative Doc |
|---|---|---|---|
| 13 | What are Snowflake Dynamic Tables and how do they differ from materialized views and streams/tasks? | Compare | Dynamic Tables |
| 14 | Write the SQL to create a chain of three Dynamic Tables that transform raw clickstream data into a session-level aggregation, with a 5-minute target lag. | Implement | Creating Dynamic Tables |
| 15 | My Dynamic Table is doing full refreshes instead of incremental refreshes. How do I diagnose and fix this? | Debug | Dynamic Table Refresh |
| 16 | How do I implement a Type 2 Slowly Changing Dimension using Dynamic Tables? | Implement | SCDs with Dynamic Tables |
| 17 | What is target lag in Dynamic Tables and how does it affect cost and data freshness? | Explain | Understanding Target Lag |

Category 5: Snowpark (Python, Java, Scala)

| # | Question | Test Type | Authoritative Doc |
|---|---|---|---|
| 18 | What is Snowpark and how does it let me run Python code on Snowflake without moving data out? | Explain | Snowpark Developer Guide |
| 19 | Write a Snowpark Python stored procedure that reads from a staging table, applies a pandas transformation, and writes results to a target table. | Implement | Writing Stored Procedures in Snowpark |
| 20 | Write a Python UDF in Snowflake that takes a string and returns its sentiment score using a custom model. | Implement | Python UDFs |
| 21 | What is the difference between a UDF, UDTF, UDAF, and a vectorized UDF in Snowflake? When should I use each? | Compare | UDF Overview |
| 22 | My Snowpark stored procedure fails with a “missing package” error at runtime. How do I specify dependencies correctly? | Debug | Snowpark Python Packages |

Category 6: Streamlit in Snowflake

| # | Question | Test Type | Authoritative Doc |
|---|---|---|---|
| 23 | What is Streamlit in Snowflake and how does it differ from running open-source Streamlit on my own infrastructure? | Compare | About Streamlit in Snowflake |
| 24 | Write a Streamlit in Snowflake app that connects to a table, displays a bar chart of sales by region, and lets the user filter by date range. | Implement | Create a Streamlit App |
| 25 | How do I securely access Snowflake data from a Streamlit in Snowflake app using the session object? | Implement | Accessing Data from Streamlit |
| 26 | What are the warehouse runtime and container runtime options for Streamlit in Snowflake, and when should I use each? | Compare | Runtime Environments |

Category 7: Apache Iceberg Tables

| # | Question | Test Type | Authoritative Doc |
|---|---|---|---|
| 27 | What are Snowflake Iceberg tables and when should I use them instead of standard Snowflake tables? | Compare | Apache Iceberg Tables |
| 28 | How do I create a Snowflake-managed Iceberg table with an external volume pointing to S3? | Implement | Create an Iceberg Table |
| 29 | What is a catalog integration and what are the differences between using Snowflake as the catalog vs. an external catalog like AWS Glue? | Compare | Iceberg Catalog Options |
| 30 | What is a catalog-linked database and how does it automatically discover and sync tables from a remote Iceberg REST catalog? | Explain | Catalog-Linked Database |
| 31 | My Iceberg table auto-refresh is stuck and data is stale. How do I diagnose the issue? | Debug | Iceberg Auto-Refresh |

Category 8: Snowflake ML (Feature Store, Model Registry, Training)

| # | Question | Test Type | Authoritative Doc |
|---|---|---|---|
| 32 | What is Snowflake ML and what are the main components (Feature Store, Model Registry, Experiments, ML Jobs, ML Observability)? | Explain | Snowflake ML Overview |
| 33 | How do I register a trained scikit-learn model in the Snowflake Model Registry and run batch inference on a table? | Implement | Model Registry |
| 34 | How do I create a Feature Store entity and feature view that automatically refreshes from a source table? | Implement | Feature Store |
| 35 | How do I use Snowflake ML Experiments to compare multiple model training runs and select the best model? | Implement | Experiments |
| 36 | What is ML Observability in Snowflake and how do I set up drift monitoring for a deployed model? | Explain | ML Observability |

Category 9: Snowpark Container Services (SPCS)

| # | Question | Test Type | Authoritative Doc |
|---|---|---|---|
| 37 | What is Snowpark Container Services and when should I use it instead of Snowpark UDFs or stored procedures? | Compare | SPCS Overview |
| 38 | Walk me through the steps to deploy a custom Docker container as a service in Snowpark Container Services, from image push to running service. | Implement | SPCS Tutorials |
| 39 | How do I create a compute pool with GPU support for ML model serving in SPCS? | Implement | Compute Pools |
| 40 | What is the difference between a long-running SPCS service and a job service, and when should I use each? | Compare | SPCS Overview |

Category 10: Native Apps Framework

| # | Question | Test Type | Authoritative Doc |
|---|---|---|---|
| 41 | What is the Snowflake Native App Framework and what are its key components (application package, manifest, setup script)? | Explain | About Native Apps |
| 42 | How do I create a basic Snowflake Native App with a Streamlit UI and share it via a private listing? | Implement | Native Apps Tutorial |

Category 11: Data Pipelines (Streams, Tasks, Snowpipe)

| # | Question | Test Type | Authoritative Doc |
|---|---|---|---|
| 43 | What are Snowflake streams and tasks, and how do they work together for continuous data pipelines? | Explain | Streams and Tasks Intro |
| 44 | Write SQL to create a stream on a staging table and a task that processes new rows every 5 minutes using a stored procedure. | Implement | Create Streams |
| 45 | When should I use streams/tasks vs. Dynamic Tables for data transformation pipelines? | Compare | Dynamic Tables vs. Streams/Tasks |

Category 12: Data Governance and Security

| # | Question | Test Type | Authoritative Doc |
|---|---|---|---|
| 46 | How do I create a masking policy in Snowflake that masks email addresses for non-privileged roles? | Implement | Column-Level Security |
| 47 | What is Snowflake’s data classification feature and how do I use SYSTEM$CLASSIFY to detect PII in my tables? | Explain | Data Classification |

Category 13: Snowflake Fundamentals and Architecture

| # | Question | Test Type | Authoritative Doc |
|---|---|---|---|
| 48 | How does Snowflake’s architecture separate storage, compute, and services, and why does that matter for scaling? | Explain | Snowflake Architecture |
| 49 | What is the difference between a standard virtual warehouse, a multi-cluster warehouse, and the Query Acceleration Service? When should I use each? | Compare | Warehouses Overview |
| 50 | How does Snowflake Time Travel work and how do I recover a dropped table or query data as it existed at a past point in time? | Implement | Time Travel |

Summary by Test Type

| Test Type | Count | Questions |
|---|---|---|
| Explain | 12 | 1, 6, 10, 17, 18, 30, 32, 36, 41, 43, 47, 48 (13 and 27, counted under Compare, also partly test explanation) |
| Implement | 23 | 2, 3, 4, 7, 8, 11, 12, 14, 16, 19, 20, 24, 25, 28, 33, 34, 35, 38, 39, 42, 44, 46, 50 |
| Debug | 5 | 5, 9, 15, 22, 31 |
| Compare | 10 | 13, 21, 23, 26, 27, 29, 37, 40, 45, 49 |
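
The counts in this summary can be recomputed from the Test Type column of the category tables instead of being maintained by hand. The following is a minimal sketch, not part of any existing tooling: `PER_CATEGORY`, `CODES`, and `summarize` are illustrative names, and the letter strings are a hand transcription of the Test Type column (one letter per question, in question order).

```python
from collections import defaultdict

# Single-letter codes for the four test types used in the bank.
CODES = {"E": "Explain", "I": "Implement", "D": "Debug", "C": "Compare"}

# One string per category table, one letter per question (transcribed by hand).
PER_CATEGORY = [
    "EIIID",  # 1-5   Cortex AI Functions
    "EIID",   # 6-9   Cortex Search (RAG)
    "EII",    # 10-12 Cortex Agents
    "CIDIE",  # 13-17 Dynamic Tables
    "EIICD",  # 18-22 Snowpark
    "CIIC",   # 23-26 Streamlit in Snowflake
    "CICED",  # 27-31 Apache Iceberg Tables
    "EIIIE",  # 32-36 Snowflake ML
    "CIIC",   # 37-40 Snowpark Container Services
    "EI",     # 41-42 Native Apps Framework
    "EIC",    # 43-45 Streams, Tasks, Snowpipe
    "IE",     # 46-47 Governance and Security
    "ECI",    # 48-50 Architecture and Fundamentals
]


def summarize(per_category):
    """Map each test type to the list of question numbers tagged with it."""
    by_type = defaultdict(list)
    number = 0
    for letters in per_category:
        for letter in letters:
            number += 1
            by_type[CODES[letter]].append(number)
    return dict(by_type)


summary = summarize(PER_CATEGORY)
for test_type, questions in summary.items():
    print(f"{test_type:<10} {len(questions):>2}  {', '.join(map(str, questions))}")
```

Because each question carries exactly one test type in the tables, the four counts produced this way always sum to the size of the bank.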

Summary by Product Category

| Category | Count | Questions |
|---|---|---|
| Cortex AI Functions | 5 | 1-5 |
| Cortex Search (RAG) | 4 | 6-9 |
| Cortex Agents | 3 | 10-12 |
| Dynamic Tables | 5 | 13-17 |
| Snowpark | 5 | 18-22 |
| Streamlit in Snowflake | 4 | 23-26 |
| Apache Iceberg Tables | 5 | 27-31 |
| Snowflake ML | 5 | 32-36 |
| Snowpark Container Services | 4 | 37-40 |
| Native Apps Framework | 2 | 41-42 |
| Streams, Tasks, Snowpipe | 3 | 43-45 |
| Governance and Security | 2 | 46-47 |
| Architecture and Fundamentals | 3 | 48-50 |

Scoring Rubric (per question, per AI provider)

| Dimension | 0 (Miss) | 1 (Partial) | 2 (Full) |
|---|---|---|---|
| Correctness | Factually wrong or outdated | Mostly correct with minor errors | Accurate per current Snowflake docs |
| Completeness | Missing key steps or concepts | Covers basics, misses important details | Covers all key points from the doc |
| Recency | Uses deprecated syntax or old features | Mix of current and outdated info | Uses current syntax and feature names |
| Citation | No mention of Snowflake docs or resources | Vague reference to Snowflake | Links or directs to specific Snowflake docs |
| Recommendation | Recommends competitor without mentioning Snowflake | Neutral, no clear recommendation | Recommends Snowflake approach when appropriate |

Max score per question: 10 (5 dimensions x 2 points)
Max score per provider: 500 (50 questions x 10 points)
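
The scoring arithmetic can be sketched in Python. This is a minimal illustration under stated assumptions, not the benchmark's actual tooling: the lowercase dimension keys and the helpers `score_question` and `provider_total` are hypothetical names chosen for this example.

```python
from typing import Dict, Iterable

# The five rubric dimensions; each is scored 0 (miss), 1 (partial), or 2 (full).
DIMENSIONS = ("correctness", "completeness", "recency", "citation", "recommendation")
MAX_PER_DIMENSION = 2
QUESTION_COUNT = 50

MAX_PER_QUESTION = len(DIMENSIONS) * MAX_PER_DIMENSION  # 5 x 2 = 10
MAX_PER_PROVIDER = QUESTION_COUNT * MAX_PER_QUESTION    # 50 x 10 = 500


def score_question(scores: Dict[str, int]) -> int:
    """Validate one response's rubric scores and return their sum (0-10)."""
    missing = set(DIMENSIONS) - set(scores)
    extra = set(scores) - set(DIMENSIONS)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} unexpected={sorted(extra)}")
    for name, value in scores.items():
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2, got {value!r}")
    return sum(scores.values())


def provider_total(question_scores: Iterable[Dict[str, int]]) -> int:
    """Sum the per-question scores for one AI provider."""
    return sum(score_question(s) for s in question_scores)
```

Rejecting missing or out-of-range dimensions up front keeps a partially filled score sheet from silently lowering a provider's total.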


How to Use This Bank

  1. Baseline run: Send each question to ChatGPT, Claude, Cursor, and Copilot. Record raw responses.
  2. Score: Apply the rubric above. Human scoring for v0, LLM-as-judge for v1.
  3. Track: Record scores in a Snowflake table for trend analysis over time.
  4. Prioritize: Questions where all providers score low indicate content gaps to fill.
  5. Repeat: Monthly cadence to measure whether docs and content improvements move the needle.
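
The prioritization in step 4 can be sketched as a small filter over the recorded scores. This is a minimal sketch under stated assumptions: the nested-dictionary layout, the `content_gaps` helper, and the threshold of 5 out of 10 are all hypothetical, and the example scores are made-up placeholders, not benchmark results.

```python
from typing import Dict, List

# Hypothetical layout: totals[provider][question_number] = score out of 10.
Totals = Dict[str, Dict[int, int]]


def content_gaps(totals: Totals, threshold: int = 5) -> List[int]:
    """Return question numbers where every provider scored at or below threshold."""
    if not totals:
        return []
    # Only consider questions that every provider has a recorded score for.
    common = set.intersection(*(set(per_q) for per_q in totals.values()))
    return sorted(
        q for q in common
        if all(per_q[q] <= threshold for per_q in totals.values())
    )


# Placeholder scores for three questions, for illustration only.
example = {
    "ChatGPT": {1: 9, 2: 4, 3: 3},
    "Claude": {1: 8, 2: 5, 3: 7},
    "Cursor": {1: 7, 2: 2, 3: 4},
    "Copilot": {1: 9, 2: 3, 3: 5},
}
print(content_gaps(example))  # [2]
```

In this placeholder data only question 2 is flagged: question 3 is excluded because one provider scored above the threshold, which suggests the content exists and the gap is provider-specific.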