In other words, set a timer at, say, 10 minutes and run your rotation with an instant cast methodology. Stop at exactly 10 minutes. Then run your rotation again with a delay cast methodology for exactly 10 minutes. Look at logs to see the stats for #1 and #2 above. Do this over multiple 10-minute iterations with both methodologies to increase sample size. Post results and theory from study.
As I said, test dummy results are (at the very, very best) still very weak in terms of proof. I feel like I just keep making lists, but:
1) Sample Size. If I run Simcraft and have it do 1,000 iterations of 6-minute fights, I still see large variations in dps. That's the equivalent of 100 hours of testing. I usually go to 10k iterations (1,000 hours) to get consistent results. Even 10 hours on a test dummy isn't very significant compared to this.
2) Uncontrolled variables - if a mage runs up and hits the test dummy for 2 minutes near the end of a test and gives you a crit boost, do you throw out the results?
3) Human Error - how do you account for the fact that you will make inconsistent mistakes? What about the fact that your perception (including exact timers on cooldowns) may differ from attempt to attempt? As you go longer do you get tired and introduce additional error, and if so how do you correct for this?
4) Raid dps scaling - abilities scale differently with raid buffs and consumables. An unbuffed moonkin's damage profile will look very different when compared to a fully raid-buffed moonkin. This makes it very difficult to compare abilities.
If mathematical models show a benefit, I'd be much more inclined to believe those. Same with Simcraft, if someone can get it to simulate a delay rotation like that.
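
To put rough numbers on point 1, here is a minimal Python sketch of why the iteration count matters so much. It is not Simcraft and does not model any real rotation: the function names, the 10,000 average dps and the 500 dps fight-to-fight spread are all made-up assumptions, and each fight is simply drawn from a normal distribution. The only point is that the uncertainty in the measured average shrinks with the square root of the number of fights, so cutting the noise by 10x takes about 100x the testing time.

```python
import math
import random
import statistics


def hours_of_testing(iterations, fight_minutes):
    """Real-time hours needed to play this many fights by hand."""
    return iterations * fight_minutes / 60


def simulate_fight(mean_dps, spread, rng):
    """One hypothetical fight; normally distributed dps is an assumption."""
    return rng.gauss(mean_dps, spread)


def standard_error(samples):
    """Estimated uncertainty of the average dps across the sampled fights."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def run_trial(iterations, rng, mean_dps=10000.0, spread=500.0):
    """Sample `iterations` fights using the made-up mean and spread."""
    return [simulate_fight(mean_dps, spread, rng) for _ in range(iterations)]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so reruns give the same numbers
    for iterations in (10, 100, 1000, 10000):
        samples = run_trial(iterations, rng)
        print(
            f"{iterations:>6} fights = {hours_of_testing(iterations, 6):>7.1f} h, "
            f"mean dps {statistics.mean(samples):9.1f}, "
            f"standard error {standard_error(samples):6.1f}"
        )
```

If the difference between instant cast and delay cast is smaller than the standard error at your sample size, the test can't tell the two apart, which is why a handful of 10-minute dummy sessions doesn't settle much.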