
Results by Factor

Analysis of the four binary (ON/OFF) factors in the 2^4 full factorial design, which covers all 16 factor combinations in Runs 1 to 16.

Main Effects (Average Impact of Turning Each Factor ON)

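Computed as the average across the 8 runs where the factor is ON minus the average across the 8 runs where it is OFF, for both the overall score and the MH metric. All effects are in percentage points (pp).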

Factor          Avg Score Effect   Avg MH Effect
Domain Prompt   -2.7pp             +0.7pp
Citation        +9.5pp             -3.5pp
Agentic         +11.7pp            +28.3pp
Self-Critique   -3.4pp             +3.4pp
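
The Score column can be recomputed from the 16 run scores listed in the paired-comparison tables. The sketch below assumes the per-run ON/OFF codes implied by which runs are paired for each factor (this coding is reconstructed, not stated directly in the source); the names `RUNS`, `main_effect` and `check_full_factorial` are illustrative.

```python
# Minimal sketch: main effects of a 2^4 full factorial from run-level scores.
# Scores come from the paired-comparison tables. ON/OFF codes per run are an
# assumption reconstructed from the run pairings. 1 = ON, 0 = OFF.

FACTORS = ("Domain Prompt", "Citation", "Agentic", "Self-Critique")

# run number -> ((Domain Prompt, Citation, Agentic, Self-Critique), score %)
RUNS = {
    1: ((0, 0, 0, 0), 60.9),
    2: ((1, 0, 0, 0), 71.5),
    3: ((0, 0, 1, 0), 72.2),
    4: ((0, 1, 1, 0), 93.8),
    5: ((1, 1, 0, 0), 71.5),
    6: ((0, 1, 0, 0), 71.1),
    7: ((0, 1, 1, 1), 93.2),
    8: ((1, 1, 1, 1), 73.0),
    9: ((1, 0, 1, 0), 70.4),
    10: ((1, 1, 1, 0), 76.0),
    11: ((0, 0, 0, 1), 56.9),
    12: ((1, 0, 0, 1), 62.0),
    13: ((0, 1, 0, 1), 65.7),
    14: ((1, 1, 0, 1), 67.4),
    15: ((0, 0, 1, 1), 70.8),
    16: ((1, 0, 1, 1), 71.2),
}


def check_full_factorial(runs, k=len(FACTORS)):
    """Raise if the runs are not every ON/OFF combination exactly once."""
    codes = [c for c, _ in runs.values()]
    if not (len(codes) == 2 ** k == len(set(codes))):
        raise ValueError("runs do not form a 2^k full factorial")


def main_effect(runs, k):
    """Mean score over runs with factor k ON minus mean over runs with it OFF."""
    on = [score for codes, score in runs.values() if codes[k] == 1]
    off = [score for codes, score in runs.values() if codes[k] == 0]
    return sum(on) / len(on) - sum(off) / len(off)


if __name__ == "__main__":
    check_full_factorial(RUNS)
    for k, name in enumerate(FACTORS):
        print(f"{name:<14} {main_effect(RUNS, k):+.1f}pp")
```

Rounded to one decimal, the printed values match the Avg Score Effect column. The MH column cannot be rebuilt this way, because only paired MH differences (not per-run MH values) are listed.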

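Domain Prompt Effect (Paired Comparisons)

Each row pairs two runs that differ only in the Domain Prompt setting; Score Effect and MH Effect are the with-run value minus the without-run value. The other three factors use the same layout.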

Without Domain Prompt   With Domain Prompt   Score Effect   MH Effect
Run 1 (60.9%)           Run 2 (71.5%)        +10.6pp        -5.0pp
Run 6 (71.1%)           Run 5 (71.5%)        +0.4pp         +6.0pp
Run 11 (56.9%)          Run 12 (62.0%)       +5.1pp         +2.5pp
Run 13 (65.7%)          Run 14 (67.4%)       +1.7pp         +8.6pp
Run 3 (72.2%)           Run 9 (70.4%)        -1.8pp         -3.7pp
Run 4 (93.8%)           Run 10 (76.0%)       -17.8pp        -1.0pp
Run 15 (70.8%)          Run 16 (71.2%)       +0.4pp         +0.5pp
Run 7 (93.2%)           Run 8 (73.0%)        -20.2pp        -2.3pp
Average                                      -2.7pp         +0.7pp
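
The Average row is the mean of the eight paired differences. Because the eight pairs use each of the 16 runs exactly once, this is algebraically the same number as the ON-minus-OFF difference of means used in the main-effects table. A minimal Python check using the Domain Prompt rows above (the `Pair` type and function names are illustrative, not from the original analysis):

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """Two runs that differ only in the Domain Prompt setting."""
    run_off: int
    score_off: float
    run_on: int
    score_on: float
    mh_effect: float  # MH Effect column: already a with-minus-without difference


# Values copied from the Domain Prompt table.
DOMAIN_PROMPT_PAIRS = [
    Pair(1, 60.9, 2, 71.5, -5.0),
    Pair(6, 71.1, 5, 71.5, 6.0),
    Pair(11, 56.9, 12, 62.0, 2.5),
    Pair(13, 65.7, 14, 67.4, 8.6),
    Pair(3, 72.2, 9, 70.4, -3.7),
    Pair(4, 93.8, 10, 76.0, -1.0),
    Pair(15, 70.8, 16, 71.2, 0.5),
    Pair(7, 93.2, 8, 73.0, -2.3),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def paired_effect(pairs):
    """Average of the per-pair (with - without) differences: the Average row."""
    return mean(p.score_on - p.score_off for p in pairs)


def difference_of_means(pairs):
    """Mean of all ON runs minus mean of all OFF runs: the main-effect definition."""
    return mean(p.score_on for p in pairs) - mean(p.score_off for p in pairs)


if __name__ == "__main__":
    print(f"paired average:      {paired_effect(DOMAIN_PROMPT_PAIRS):+.2f}pp")
    print(f"difference of means: {difference_of_means(DOMAIN_PROMPT_PAIRS):+.2f}pp")
    print(f"MH paired average:   {mean(p.mh_effect for p in DOMAIN_PROMPT_PAIRS):+.2f}pp")
```

Swapping in the rows of any other factor's table gives that factor's Average row the same way.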

Citation Effect (Paired Comparisons)

Without Citation   With Citation   Score Effect   MH Effect
Run 1 (60.9%)      Run 6 (71.1%)   +10.2pp        -20.0pp
Run 2 (71.5%)      Run 5 (71.5%)   +0.0pp         -9.0pp
Run 11 (56.9%)     Run 13 (65.7%)  +8.8pp         -5.8pp
Run 12 (62.0%)     Run 14 (67.4%)  +5.4pp         +0.3pp
Run 3 (72.2%)      Run 4 (93.8%)   +21.6pp        -2.0pp
Run 9 (70.4%)      Run 10 (76.0%)  +5.6pp         +0.7pp
Run 15 (70.8%)     Run 7 (93.2%)   +22.4pp        +5.3pp
Run 16 (71.2%)     Run 8 (73.0%)   +1.8pp         +2.5pp
Average                            +9.5pp         -3.5pp

Agentic Effect (Paired Comparisons)

Without Agentic   With Agentic   Score Effect   MH Effect
Run 1 (60.9%)     Run 3 (72.2%)  +11.3pp        +25.0pp
Run 2 (71.5%)     Run 9 (70.4%)  -1.1pp         +26.3pp
Run 6 (71.1%)     Run 4 (93.8%)  +22.7pp        +43.0pp
Run 5 (71.5%)     Run 10 (76.0%) +4.5pp         +36.0pp
Run 11 (56.9%)    Run 15 (70.8%) +13.9pp        +21.7pp
Run 12 (62.0%)    Run 16 (71.2%) +9.2pp         +19.7pp
Run 13 (65.7%)    Run 7 (93.2%)  +27.5pp        +32.8pp
Run 14 (67.4%)    Run 8 (73.0%)  +5.6pp         +21.9pp
Average                          +11.7pp        +28.3pp

Self-Critique Effect (Paired Comparisons)

Without Self-Critique   With Self-Critique   Score Effect   MH Effect
Run 1 (60.9%)           Run 11 (56.9%)       -4.0pp         -2.0pp
Run 2 (71.5%)           Run 12 (62.0%)       -9.5pp         +5.5pp
Run 6 (71.1%)           Run 13 (65.7%)       -5.4pp         +12.2pp
Run 5 (71.5%)           Run 14 (67.4%)       -4.1pp         +14.8pp
Run 3 (72.2%)           Run 15 (70.8%)       -1.4pp         -5.3pp
Run 9 (70.4%)           Run 16 (71.2%)       +0.8pp         -1.1pp
Run 4 (93.8%)           Run 7 (93.2%)        -0.6pp         +2.0pp
Run 10 (76.0%)          Run 8 (73.0%)        -3.0pp         +0.7pp
Average                                      -3.4pp         +3.3pp

Two-Way Interaction Effects

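Interaction = effect of factor A when B is ON minus effect of A when B is OFF, where each conditional effect averages the four paired comparisons at that level of B. The definition is symmetric: swapping A and B gives the same value. A large positive value means the factors amplify each other; a large negative value means they interfere.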
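
Written out, with $\bar{y}_{A=a,B=b}$ the mean score of the four runs at that combination (the symbols $\bar{y}$, $\Delta$ and $I$ are notation introduced here, not from the original analysis), the definition and a worked Domain Prompt x Citation example are below. The Domain Prompt pairs with Citation ON are Runs (6, 5), (13, 14), (4, 10) and (7, 8); those with Citation OFF are Runs (1, 2), (11, 12), (3, 9) and (15, 16).

```latex
\begin{aligned}
\Delta_A(b) &= \bar{y}_{A=\mathrm{ON},\,B=b} - \bar{y}_{A=\mathrm{OFF},\,B=b} \\
I_{AB} &= \Delta_A(\mathrm{ON}) - \Delta_A(\mathrm{OFF})
        = \bar{y}_{\mathrm{ON},\mathrm{ON}} + \bar{y}_{\mathrm{OFF},\mathrm{OFF}}
          - \bar{y}_{\mathrm{ON},\mathrm{OFF}} - \bar{y}_{\mathrm{OFF},\mathrm{ON}} = I_{BA} \\[4pt]
\Delta_{DP}(\mathrm{ON})  &= \tfrac{1}{4}(0.4 + 1.7 - 17.8 - 20.2) = -8.975 \\
\Delta_{DP}(\mathrm{OFF}) &= \tfrac{1}{4}(10.6 + 5.1 - 1.8 + 0.4) = 3.575 \\
I_{DP \times C} &= -8.975 - 3.575 = -12.55
\end{aligned}
```

The result matches the -12.5pp in the table to within the rounding of the one-decimal inputs. Many design-of-experiments texts define the two-factor interaction as half of this difference, so the values here are twice as large as they would be under that convention.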

Interaction                     Score Effect   MH Effect
Domain Prompt x Citation        -12.5pp        +4.2pp
Domain Prompt x Agentic         -14.3pp        -4.7pp
Domain Prompt x Self-Critique   -1.1pp         +3.2pp
Citation x Agentic              +6.7pp         +10.2pp
Citation x Self-Critique        +0.2pp         +8.2pp
Agentic x Self-Critique         +4.7pp         -8.6pp
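
The Score Effect column can be recomputed from the 16 run scores by following the definition literally. This is a sketch under the assumption that each run's ON/OFF codes are the ones implied by the paired-comparison tables (reconstructed, not stated in the source); `conditional_effect` and `interaction` are hypothetical helper names.

```python
from itertools import combinations

FACTORS = ("Domain Prompt", "Citation", "Agentic", "Self-Critique")

# run number -> ((Domain Prompt, Citation, Agentic, Self-Critique), score %), 1 = ON.
# Codes are an assumption reconstructed from the run pairings.
RUNS = {
    1: ((0, 0, 0, 0), 60.9), 2: ((1, 0, 0, 0), 71.5),
    3: ((0, 0, 1, 0), 72.2), 4: ((0, 1, 1, 0), 93.8),
    5: ((1, 1, 0, 0), 71.5), 6: ((0, 1, 0, 0), 71.1),
    7: ((0, 1, 1, 1), 93.2), 8: ((1, 1, 1, 1), 73.0),
    9: ((1, 0, 1, 0), 70.4), 10: ((1, 1, 1, 0), 76.0),
    11: ((0, 0, 0, 1), 56.9), 12: ((1, 0, 0, 1), 62.0),
    13: ((0, 1, 0, 1), 65.7), 14: ((1, 1, 0, 1), 67.4),
    15: ((0, 0, 1, 1), 70.8), 16: ((1, 0, 1, 1), 71.2),
}


def conditional_effect(runs, a, b, b_level):
    """Effect of factor a (ON mean minus OFF mean) among runs where factor b == b_level."""
    on = [s for codes, s in runs.values() if codes[b] == b_level and codes[a] == 1]
    off = [s for codes, s in runs.values() if codes[b] == b_level and codes[a] == 0]
    return sum(on) / len(on) - sum(off) / len(off)


def interaction(runs, a, b):
    """Effect of a with b ON minus effect of a with b OFF (this section's definition).

    Under the halved convention common in design-of-experiments texts,
    the classical AB interaction would be this value divided by 2.
    """
    return conditional_effect(runs, a, b, 1) - conditional_effect(runs, a, b, 0)


if __name__ == "__main__":
    for a, b in combinations(range(len(FACTORS)), 2):
        print(f"{FACTORS[a]} x {FACTORS[b]}: {interaction(RUNS, a, b):+.2f}pp")
```

Computed from the one-decimal run scores, some values (for example -12.55 for Domain Prompt x Citation and +6.75 for Citation x Agentic) fall exactly on a rounding boundary, so they agree with the table to within 0.05pp. As with the main effects, the MH interactions cannot be recomputed without per-run MH values.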