Stock responses about statistical significance for reviewing machine learning papers
So many ML papers contain tables like this:
Method | Score(↑) |
---|---|
Baseline 1 | 49.9% |
Baseline 2 | 49.8% |
Baseline 3 | 50.0% |
Our super fancy SOTA method | 50.1% |
and then say, "Results on the benchmark show that our method is state-of-the-art for task X."
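To see why a 0.1-point gap like the one in the table is hard to interpret on its own, here is a rough sketch of the sampling noise on a single accuracy number. The test-set size `N_TEST` is a made-up assumption (the table does not give one), and the helper `accuracy_standard_error` is a hypothetical name; it uses the simple binomial standard error, which treats test examples as independent.

```python
import math


def accuracy_standard_error(accuracy: float, n_examples: int) -> float:
    """Binomial standard error of an accuracy measured on n_examples independent test items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_examples)


# Scores copied from the table above.
scores = {
    "Baseline 1": 0.499,
    "Baseline 2": 0.498,
    "Baseline 3": 0.500,
    "Our super fancy SOTA method": 0.501,
}

# ASSUMPTION: hypothetical test-set size, chosen only for illustration.
N_TEST = 10_000

for name, acc in scores.items():
    se = accuracy_standard_error(acc, N_TEST)
    # 1.96 * SE is the half-width of an approximate 95% normal interval.
    print(f"{name}: {acc:.1%} +/- {1.96 * se:.1%}")
```

Under this assumed test-set size, each score carries roughly a one-point 95% interval, about ten times the gap between the best baseline and the proposed method. This is only a back-of-the-envelope check: when all methods are scored on the same test examples, a paired comparison would be the more appropriate tool, and variance across random seeds is a separate source of noise that this sketch ignores.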