Executive summary of the AEO 2^4 factorial experiment. The four factors (domain, cite, agentic, selfcritique) give 16 runs, ranked below by Score %.

| Rank | Run | Configuration | Engine | Score % | MH % |
|---|---|---|---|---|---|
| 1 | 4 | cite, agentic | cortex-code | 93.8% | 91.5% |
| 2 | 7 | cite, agentic, selfcritique | cortex-code | 93.2% | 93.5% |
| 3 | 10 | domain, cite, agentic | cortex-code | 76.0% | 90.5% |
| 4 | 8 | domain, cite, agentic, selfcritique | cortex-code | 73.0% | 91.2% |
| 5 | 3 | agentic | cortex-code | 72.2% | 93.5% |
| 6 | 2 | domain | claude | 71.5% | 63.5% |
| 7 | 5 | domain, cite | claude | 71.5% | 54.5% |
| 8 | 16 | domain, agentic, selfcritique | cortex-code | 71.2% | 88.7% |
| 9 | 6 | cite | claude | 71.1% | 48.5% |
| 10 | 15 | agentic, selfcritique | cortex-code | 70.8% | 88.2% |
| 11 | 9 | domain, agentic | cortex-code | 70.4% | 89.8% |
| 12 | 14 | domain, cite, selfcritique | claude | 67.4% | 69.3% |
| 13 | 13 | cite, selfcritique | claude | 65.7% | 60.7% |
| 14 | 12 | domain, selfcritique | claude | 62.0% | 69.0% |
| 15 | 1 | (baseline) | claude | 60.9% | 68.5% |
| 16 | 11 | selfcritique | claude | 56.9% | 66.5% |
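
As a rough way to read the ranking above, the main effect of each factor in a 2^4 design can be estimated as the mean Score % of the eight runs with the factor on minus the mean of the eight runs with it off. The sketch below is illustrative only: the names `RUNS`, `FACTORS`, and `main_effect` are hypothetical, the scores are copied from the table, and the contrasts are unweighted, so they may not match the analysis in results-by-factor.md. In the table every agentic run uses the cortex-code engine and every non-agentic run uses claude, so the agentic contrast cannot be separated from the engine.

```python
from statistics import mean

# The four factors of the 2^4 design, as named in the Configuration column.
FACTORS = ("domain", "cite", "agentic", "selfcritique")

# Run number -> (active factors, Score %), copied from the ranking table.
RUNS = {
    1: ((), 60.9),
    2: (("domain",), 71.5),
    3: (("agentic",), 72.2),
    4: (("cite", "agentic"), 93.8),
    5: (("domain", "cite"), 71.5),
    6: (("cite",), 71.1),
    7: (("cite", "agentic", "selfcritique"), 93.2),
    8: (("domain", "cite", "agentic", "selfcritique"), 73.0),
    9: (("domain", "agentic"), 70.4),
    10: (("domain", "cite", "agentic"), 76.0),
    11: (("selfcritique",), 56.9),
    12: (("domain", "selfcritique"), 62.0),
    13: (("cite", "selfcritique"), 65.7),
    14: (("domain", "cite", "selfcritique"), 67.4),
    15: (("agentic", "selfcritique"), 70.8),
    16: (("domain", "agentic", "selfcritique"), 71.2),
}


def main_effect(factor, runs=RUNS):
    """Mean Score % with the factor on minus mean Score % with it off."""
    on = [score for active, score in runs.values() if factor in active]
    off = [score for active, score in runs.values() if factor not in active]
    return mean(on) - mean(off)


# Unweighted main-effect estimate for every factor, in percentage points.
effects = {factor: main_effect(factor) for factor in FACTORS}

if __name__ == "__main__":
    # Print factors from most positive to most negative effect.
    for factor, effect in sorted(effects.items(), key=lambda kv: -kv[1]):
        print(f"{factor:>12}: {effect:+.2f} points")
```

With the table's values, these contrasts come out at roughly +11.7 points for agentic, +9.5 for cite, -2.7 for domain, and -3.4 for selfcritique. Main effects alone hide interactions: Runs 4 and 7 (cite with agentic, without domain) score far above every other run.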

Best run (Run 4, 93.8% overall):

| Category | Score % |
|---|---|
| Snowpark Container Services | 100.0% |
| Streams, Tasks, Snowpipe | 100.0% |
| Snowpark | 99.3% |
| Native Apps Framework | 98.4% |
| Architecture and Fundamentals | 97.8% |
| Dynamic Tables | 96.7% |
| Apache Iceberg Tables | 96.7% |
| Governance and Security | 96.6% |
| Cortex Search (RAG) | 94.1% |
| Cortex AI Functions | 91.3% |
| Snowflake ML | 90.0% |
| Streamlit in Snowflake | 88.3% |
| Cortex Agents | 68.9% |

Worst run (Run 11, 56.9% overall):

| Category | Score % |
|---|---|
| Streams, Tasks, Snowpipe | 70.0% |
| Snowpark Container Services | 68.3% |
| Cortex Search (RAG) | 62.5% |
| Architecture and Fundamentals | 62.2% |
| Dynamic Tables | 62.0% |
| Governance and Security | 61.7% |
| Snowpark | 58.7% |
| Streamlit in Snowflake | 57.9% |
| Apache Iceberg Tables | 55.3% |
| Native Apps Framework | 55.0% |
| Cortex Agents | 47.2% |
| Snowflake ML | 46.0% |
| Cortex AI Functions | 42.0% |

5 hardest questions:

| Q# | Category | Test Type | Avg Score | Avg % |
|---|---|---|---|---|
| 11 | Cortex Agents | Implement | 4.7/10 | 46.6% |
| 12 | Cortex Agents | Implement | 5.0/10 | 49.8% |
| 35 | Snowflake ML | Implement | 5.9/10 | 58.5% |
| 5 | Cortex AI Functions | Debug | 6.0/10 | 60.0% |
| 21 | Snowpark | Compare | 6.0/10 | 60.2% |

5 easiest questions:

| Q# | Category | Test Type | Avg Score | Avg % |
|---|---|---|---|---|
| 45 | Streams, Tasks, Snowpipe | Compare | 8.1/10 | 81.3% |
| 8 | Cortex Search (RAG) | Implement | 8.2/10 | 81.7% |
| 17 | Dynamic Tables | Explain | 8.4/10 | 83.5% |
| 18 | Snowpark | Explain | 8.4/10 | 83.6% |
| 6 | Cortex Search (RAG) | Explain | 8.6/10 | 85.6% |

See results-by-factor.md for detailed factor analysis.