aeo-methodology

Results by Execution Engine

This section compares two execution engines: claude (a single CORTEX.COMPLETE API call) and cortex-code (a Cortex Code subagent with tool access).

Aggregate Comparison

| Engine | Runs | Avg Score % | Avg MH % |
|--------|-----:|------------:|---------:|
| claude | 8 | 65.9% | 62.6% |
| cortex-code | 8 | 77.6% | 90.9% |

All Runs by Engine

claude (CORTEX.COMPLETE)

| Run | Configuration | Score % | MH % |
|----:|---------------|--------:|-----:|
| 1 | baseline | 60.9% | 68.5% |
| 5 | domain-cite | 71.5% | 54.5% |
| 11 | selfcritique | 56.9% | 66.5% |
| 12 | domain-selfcritique | 62.0% | 69.0% |
| 13 | cite-selfcritique | 65.7% | 60.7% |
| 14 | domain-cite-selfcritique | 67.4% | 69.3% |

cortex-code (Agentic Subagent)

| Run | Configuration | Score % | MH % |
|----:|---------------|--------:|-----:|
| 3 | agentic | 72.2% | 93.5% |
| 4 | cite-agentic | 93.8% | 91.5% |
| 7 | cite-agentic-selfcritique | 93.2% | 93.5% |
| 8 | all4 | 73.0% | 91.2% |
| 9 | domain-agentic | 70.4% | 89.8% |
| 10 | domain-cite-agentic | 76.0% | 90.5% |
| 15 | agentic-selfcritique | 70.8% | 88.2% |
| 16 | domain-agentic-selfcritique | 71.2% | 88.7% |
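
The aggregate cortex-code figures can be cross-checked against the per-run rows. The sketch below assumes the aggregate is an unweighted mean of the eight per-run percentages, rounded to one decimal place; that assumption is not stated in the report. The names `cortex_code_runs` and `column_mean` are illustrative only.

```python
from statistics import mean

# (run, configuration, score %, MH %) copied from the cortex-code runs table.
cortex_code_runs = [
    (3, "agentic", 72.2, 93.5),
    (4, "cite-agentic", 93.8, 91.5),
    (7, "cite-agentic-selfcritique", 93.2, 93.5),
    (8, "all4", 73.0, 91.2),
    (9, "domain-agentic", 70.4, 89.8),
    (10, "domain-cite-agentic", 76.0, 90.5),
    (15, "agentic-selfcritique", 70.8, 88.2),
    (16, "domain-agentic-selfcritique", 71.2, 88.7),
]


def column_mean(runs, column):
    """Unweighted mean of one numeric column, rounded to one decimal place."""
    return round(mean(row[column] for row in runs), 1)


avg_score = column_mean(cortex_code_runs, 2)  # Score % column
avg_mh = column_mean(cortex_code_runs, 3)     # MH % column

# Run with the highest Score %, i.e. the "best" run under this reading.
best_run = max(cortex_code_runs, key=lambda row: row[2])

print(f"Avg Score %: {avg_score}, Avg MH %: {avg_mh}")
print(f"Best run: {best_run[0]} ({best_run[1]})")
```

Under this assumption the means come out to 77.6 and 90.9, matching the cortex-code row of the aggregate comparison, and the highest-scoring run is Run 4 (cite-agentic).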

Category Comparison: Best claude (Run 5) vs Best cortex-code (Run 4)

| Category | claude R5 | cortex-code R4 | Delta |
|----------|----------:|---------------:|------:|
| Cortex AI Functions | 80.7% | 91.3% | +10.6pp |
| Cortex Search (RAG) | 80.0% | 94.1% | +14.1pp |
| Cortex Agents | 71.1% | 68.9% | -2.2pp |
| Dynamic Tables | 91.3% | 96.7% | +5.4pp |
| Snowpark | 58.0% | 99.3% | +41.3pp |
| Streamlit in Snowflake | 74.2% | 88.3% | +14.1pp |
| Apache Iceberg Tables | 63.4% | 96.7% | +33.3pp |
| Snowflake ML | 83.3% | 90.0% | +6.7pp |
| Snowpark Container Services | 63.3% | 100.0% | +36.7pp |
| Native Apps Framework | 50.0% | 98.4% | +48.4pp |
| Streams, Tasks, Snowpipe | 68.9% | 100.0% | +31.1pp |
| Governance and Security | 78.3% | 96.6% | +18.3pp |
| Architecture and Fundamentals | 48.8% | 97.8% | +49.0pp |
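
The Delta column reads as a difference in percentage points (cortex-code R4 minus claude R5). A minimal sketch of that arithmetic follows, assuming the delta is a plain subtraction of the two displayed percentages rounded to one decimal place. The names `category_scores` and `deltas` are illustrative.

```python
# category -> (claude R5 %, cortex-code R4 %), copied from the table above.
category_scores = {
    "Cortex AI Functions": (80.7, 91.3),
    "Cortex Search (RAG)": (80.0, 94.1),
    "Cortex Agents": (71.1, 68.9),
    "Dynamic Tables": (91.3, 96.7),
    "Snowpark": (58.0, 99.3),
    "Streamlit in Snowflake": (74.2, 88.3),
    "Apache Iceberg Tables": (63.4, 96.7),
    "Snowflake ML": (83.3, 90.0),
    "Snowpark Container Services": (63.3, 100.0),
    "Native Apps Framework": (50.0, 98.4),
    "Streams, Tasks, Snowpipe": (68.9, 100.0),
    "Governance and Security": (78.3, 96.6),
    "Architecture and Fundamentals": (48.8, 97.8),
}

# Delta in percentage points: cortex-code minus claude, one decimal place.
deltas = {
    name: round(cortex_code - claude, 1)
    for name, (claude, cortex_code) in category_scores.items()
}

# Categories where the single-call engine scored higher than the subagent.
claude_ahead = [name for name, delta in deltas.items() if delta < 0]

# Category with the widest gap in favour of cortex-code.
largest_gain = max(deltas, key=deltas.get)

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}pp")
```

With the values in the table, Cortex Agents is the only category where claude is ahead, and Architecture and Fundamentals shows the largest gap in favour of cortex-code.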