Results by Factor
Analysis of the 4 binary factors in the 2^4 factorial design.
Main Effects (Average Impact of Turning Each Factor ON)
Each effect is the average across all runs where the factor is ON minus the average across all runs where it is OFF, in percentage points (pp).
| Factor | Avg Score Effect | Avg MH Effect |
|---|---|---|
| Domain Prompt | -2.7pp | +0.7pp |
| Citation | +9.5pp | -3.5pp |
| Agentic | +11.7pp | +28.3pp |
| Self-Critique | -3.4pp | +3.4pp |
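The main-effect calculation can be sketched in a few lines of Python. This is a minimal illustration, not the original analysis script: the per-run scores are copied from the paired-comparison tables in this section, the factor settings for each run are inferred from which runs those tables pair together, and the names `RUNS`, `FACTORS`, and `main_effect` are hypothetical. Only the score metric is covered, because per-run MH values are not listed here.

```python
from statistics import mean

# Factor order used in the settings tuples below (hypothetical names).
FACTORS = ("domain_prompt", "citation", "agentic", "self_critique")

# run number -> (score in %, factor settings as 0/1 flags).
# Scores come from the paired-comparison tables; the settings are an
# inference from which runs those tables pair, not a documented run list.
RUNS = {
    1: (60.9, (0, 0, 0, 0)),
    2: (71.5, (1, 0, 0, 0)),
    6: (71.1, (0, 1, 0, 0)),
    5: (71.5, (1, 1, 0, 0)),
    3: (72.2, (0, 0, 1, 0)),
    9: (70.4, (1, 0, 1, 0)),
    4: (93.8, (0, 1, 1, 0)),
    10: (76.0, (1, 1, 1, 0)),
    11: (56.9, (0, 0, 0, 1)),
    12: (62.0, (1, 0, 0, 1)),
    13: (65.7, (0, 1, 0, 1)),
    14: (67.4, (1, 1, 0, 1)),
    15: (70.8, (0, 0, 1, 1)),
    16: (71.2, (1, 0, 1, 1)),
    7: (93.2, (0, 1, 1, 1)),
    8: (73.0, (1, 1, 1, 1)),
}


def main_effect(factor: str) -> float:
    """Mean score with the factor ON minus mean score with it OFF."""
    i = FACTORS.index(factor)
    on = [score for score, levels in RUNS.values() if levels[i] == 1]
    off = [score for score, levels in RUNS.values() if levels[i] == 0]
    return mean(on) - mean(off)


for name in FACTORS:
    print(f"{name}: {main_effect(name):+.1f}pp")
```

Because the design is a balanced full factorial (eight runs ON and eight OFF for every factor), this difference of means equals the average of the eight paired differences reported for each factor.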
Domain Prompt Effect (Paired Comparisons)
| Without Domain Prompt | With Domain Prompt | Score Effect | MH Effect |
|---|---|---|---|
| Run 1 (60.9%) | Run 2 (71.5%) | +10.6pp | -5.0pp |
| Run 6 (71.1%) | Run 5 (71.5%) | +0.4pp | +6.0pp |
| Run 11 (56.9%) | Run 12 (62.0%) | +5.1pp | +2.5pp |
| Run 13 (65.7%) | Run 14 (67.4%) | +1.7pp | +8.6pp |
| Run 3 (72.2%) | Run 9 (70.4%) | -1.8pp | -3.7pp |
| Run 4 (93.8%) | Run 10 (76.0%) | -17.8pp | -1.0pp |
| Run 15 (70.8%) | Run 16 (71.2%) | +0.4pp | +0.5pp |
| Run 7 (93.2%) | Run 8 (73.0%) | -20.2pp | -2.3pp |
| Average | | -2.7pp | +0.7pp |
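The pairing used in these comparison tables (two runs that differ only in the factor under study) can be generated mechanically. The sketch below is an assumption-laden illustration: each run is keyed by the initials of the factors that are ON (`D` = Domain Prompt, `C` = Citation, `A` = Agentic, `S` = Self-Critique), the run-to-settings mapping is inferred from the paired tables rather than taken from a documented run list, and `RUNS`, `ORDER`, and `paired_deltas` are hypothetical names.

```python
ORDER = "DCAS"  # canonical order of factor initials inside each key

# ON-factor initials -> (run number, score in %). Inferred mapping.
RUNS = {
    "": (1, 60.9), "D": (2, 71.5), "C": (6, 71.1), "DC": (5, 71.5),
    "A": (3, 72.2), "DA": (9, 70.4), "CA": (4, 93.8), "DCA": (10, 76.0),
    "S": (11, 56.9), "DS": (12, 62.0), "CS": (13, 65.7), "DCS": (14, 67.4),
    "AS": (15, 70.8), "DAS": (16, 71.2), "CAS": (7, 93.2), "DCAS": (8, 73.0),
}


def paired_deltas(factor: str) -> list[tuple[int, int, float]]:
    """Return (run_off, run_on, score delta) for each pair differing only in `factor`."""
    rows = []
    for key, (run_off, score_off) in RUNS.items():
        if factor in key:
            continue  # start only from runs where the factor is OFF
        # The partner run has the same settings plus `factor` switched ON.
        partner = "".join(c for c in ORDER if c in key or c == factor)
        run_on, score_on = RUNS[partner]
        rows.append((run_off, run_on, round(score_on - score_off, 1)))
    return rows


for run_off, run_on, delta in paired_deltas("D"):
    print(f"Run {run_off} -> Run {run_on}: {delta:+.1f}pp")
```

Calling `paired_deltas("C")`, `paired_deltas("A")`, or `paired_deltas("S")` yields the pairs for the other three factors in the same way.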
Citation Effect (Paired Comparisons)
| Without Citation | With Citation | Score Effect | MH Effect |
|---|---|---|---|
| Run 1 (60.9%) | Run 6 (71.1%) | +10.2pp | -20.0pp |
| Run 2 (71.5%) | Run 5 (71.5%) | +0.0pp | -9.0pp |
| Run 11 (56.9%) | Run 13 (65.7%) | +8.8pp | -5.8pp |
| Run 12 (62.0%) | Run 14 (67.4%) | +5.4pp | +0.3pp |
| Run 3 (72.2%) | Run 4 (93.8%) | +21.6pp | -2.0pp |
| Run 9 (70.4%) | Run 10 (76.0%) | +5.6pp | +0.7pp |
| Run 15 (70.8%) | Run 7 (93.2%) | +22.4pp | +5.3pp |
| Run 16 (71.2%) | Run 8 (73.0%) | +1.8pp | +2.5pp |
| Average | | +9.5pp | -3.5pp |
Agentic Effect (Paired Comparisons)
| Without Agentic | With Agentic | Score Effect | MH Effect |
|---|---|---|---|
| Run 1 (60.9%) | Run 3 (72.2%) | +11.3pp | +25.0pp |
| Run 2 (71.5%) | Run 9 (70.4%) | -1.1pp | +26.3pp |
| Run 6 (71.1%) | Run 4 (93.8%) | +22.7pp | +43.0pp |
| Run 5 (71.5%) | Run 10 (76.0%) | +4.5pp | +36.0pp |
| Run 11 (56.9%) | Run 15 (70.8%) | +13.9pp | +21.7pp |
| Run 12 (62.0%) | Run 16 (71.2%) | +9.2pp | +19.7pp |
| Run 13 (65.7%) | Run 7 (93.2%) | +27.5pp | +32.8pp |
| Run 14 (67.4%) | Run 8 (73.0%) | +5.6pp | +21.9pp |
| Average | | +11.7pp | +28.3pp |
Self-Critique Effect (Paired Comparisons)
| Without Self-Critique | With Self-Critique | Score Effect | MH Effect |
|---|---|---|---|
| Run 1 (60.9%) | Run 11 (56.9%) | -4.0pp | -2.0pp |
| Run 2 (71.5%) | Run 12 (62.0%) | -9.5pp | +5.5pp |
| Run 6 (71.1%) | Run 13 (65.7%) | -5.4pp | +12.2pp |
| Run 5 (71.5%) | Run 14 (67.4%) | -4.1pp | +14.8pp |
| Run 3 (72.2%) | Run 15 (70.8%) | -1.4pp | -5.3pp |
| Run 9 (70.4%) | Run 16 (71.2%) | +0.8pp | -1.1pp |
| Run 4 (93.8%) | Run 7 (93.2%) | -0.6pp | +2.0pp |
| Run 10 (76.0%) | Run 8 (73.0%) | -3.0pp | +0.7pp |
| Average | | -3.4pp | +3.3pp |
Two-Way Interaction Effects
Interaction = effect of factor A when B=ON minus effect of A when B=OFF.
Large positive = factors amplify each other. Large negative = factors interfere.
| Interaction | Score Effect | MH Effect |
|---|---|---|
| Domain Prompt x Citation | -12.5pp | +4.2pp |
| Domain Prompt x Agentic | -14.3pp | -4.7pp |
| Domain Prompt x Self-Critique | -1.1pp | +3.2pp |
| Citation x Agentic | +6.7pp | +10.2pp |
| Citation x Self-Critique | +0.2pp | +8.2pp |
| Agentic x Self-Critique | +4.7pp | -8.6pp |
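The interaction definition above (effect of A when B is ON minus effect of A when B is OFF) can be checked against the per-run scores with a short sketch. As before, this is illustrative only: runs are keyed by the initials of their ON factors (`D`, `C`, `A`, `S`), that mapping is inferred from the paired-comparison tables, and `SCORE`, `ORDER`, `conditional_effect`, and `interaction` are hypothetical names. It covers the score metric only.

```python
from statistics import mean

ORDER = "DCAS"  # D=Domain Prompt, C=Citation, A=Agentic, S=Self-Critique

# ON-factor initials -> score in %. Inferred mapping, scores from the tables.
SCORE = {
    "": 60.9, "D": 71.5, "C": 71.1, "DC": 71.5,
    "A": 72.2, "DA": 70.4, "CA": 93.8, "DCA": 76.0,
    "S": 56.9, "DS": 62.0, "CS": 65.7, "DCS": 67.4,
    "AS": 70.8, "DAS": 71.2, "CAS": 93.2, "DCAS": 73.0,
}


def conditional_effect(factor: str, given: str, given_on: bool) -> float:
    """Average effect of `factor` over the pairs where `given` is held ON or OFF."""
    deltas = []
    for key, score_off in SCORE.items():
        if factor in key or (given in key) != given_on:
            continue
        # Same settings with `factor` switched ON.
        partner = "".join(c for c in ORDER if c in key or c == factor)
        deltas.append(SCORE[partner] - score_off)
    return mean(deltas)


def interaction(a: str, b: str) -> float:
    """Effect of `a` when `b` is ON minus effect of `a` when `b` is OFF."""
    return conditional_effect(a, b, True) - conditional_effect(a, b, False)


for a, b in [("D", "C"), ("D", "A"), ("D", "S"), ("C", "A"), ("C", "S"), ("A", "S")]:
    print(f"{a} x {b}: {interaction(a, b):+.2f}pp")
```

Under this definition the interaction is symmetric, so `interaction("C", "D")` gives the same value as `interaction("D", "C")`. The results agree with the Score Effect column to within rounding of the last digit.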