
Stop saying “statistically significant”

Key points emphasized in Nature (2019)

"Rather, and in line with many others over the decades, we are calling for a stop to the use of P values in the conventional, dichotomous way — to decide whether a result refutes or supports a scientific hypothesis."

Stop sorting results into “significant” and “not significant”
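
The following is a minimal Python sketch of why a hard cutoff misleads. It uses two hypothetical studies with almost identical estimates and standard errors; the numbers are invented for illustration. It assumes a normal approximation for the estimator and uses only `statistics.NormalDist` from the standard library. `two_sided_p` is an illustrative helper, not something taken from the cited papers.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def two_sided_p(estimate, se, null=0.0):
    """Two-sided P value for the test hypothesis theta = null (normal approximation)."""
    z = abs(estimate - null) / se
    return 2 * (1 - STD_NORMAL.cdf(z))


# Two hypothetical studies: nearly the same estimate, identical standard error.
studies = {"A": (0.50, 0.25), "B": (0.48, 0.25)}

for name, (est, se) in studies.items():
    p = two_sided_p(est, se)
    # The conventional, dichotomous reading criticized in the quote above.
    label = "significant" if p < 0.05 else "not significant"
    print(f"study {name}: estimate={est:.2f}, P={p:.3f} -> {label}")
```

Study A lands just under 0.05 and study B just over it. The dichotomous labels therefore disagree, even though the two results carry nearly the same information.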

Stop saying “confidence interval”

We must learn to embrace uncertainty. One practical way to do so is to rename confidence intervals as ‘compatibility intervals’ and interpret them in a way that avoids overconfidence. Specifically, we recommend that authors describe the practical implications of all values inside the interval, especially the observed effect (or point estimate) and the limits. In doing so, they should remember that all the values between the interval’s limits are reasonably compatible with the data, given the statistical assumptions used to compute the interval. Therefore, singling out one particular value (such as the null value) in the interval as ‘shown’ makes no sense.
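
Under a normal approximation, a 95% compatibility interval is the familiar estimate ± 1.96 × SE; only the name and the interpretation change. The sketch below uses a hypothetical estimate of 0.48 with standard error 0.25. `compatibility_interval` is an illustrative helper, not a library function. The sketch reports the point estimate together with both limits, and it shows that the null value 0 is just one of many values inside the interval.

```python
from statistics import NormalDist


def compatibility_interval(estimate, se, level=0.95):
    """Wald-type interval estimate +/- z * SE, assuming an approximately normal estimator."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return estimate - z * se, estimate + z * se


est, se = 0.48, 0.25  # hypothetical effect estimate and standard error
lo, hi = compatibility_interval(est, se)

# Report the point estimate and both limits rather than a significance label.
print(f"point estimate {est:.2f}, 95% compatibility interval {lo:.2f} to {hi:.2f}")

# 0 (no effect) and 2 * est sit at mirror-image positions around the estimate,
# so under these assumptions neither is singled out as "shown" by the data.
for value in (0.0, 2 * est):
    print(f"value {value:.2f} inside interval: {lo <= value <= hi}")
```

With these numbers the interval runs from about -0.01 to 0.97. Both 0 and 0.96 (an effect twice the estimate) lie inside it, equally far from the point estimate.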

Compatibility intervals

Four points for interpreting CIs

  1. First, just because the interval gives the values most compatible with the data, given the assumptions, it doesn’t mean values outside it are incompatible; they are just less compatible. In fact, values just outside the interval do not differ substantively from those just inside the interval. It is thus wrong to claim that an interval shows all possible values.
    • In short: values outside the interval are less compatible with the data, not incompatible. Values just outside differ little from values just inside, so the interval does not contain all possible values.
  2. Second, not all values inside are equally compatible with the data, given the assumptions. The point estimate is the most compatible, and values near it are more compatible than those near the limits. This is why we urge authors to discuss the point estimate, even when they have a large P value or a wide interval, as well as discussing the limits of that interval.
    • In short: compatibility is graded within the interval. It is highest at the point estimate and lower toward the limits, so authors should discuss the point estimate as well as the limits, even when P is large or the interval is wide.
  3. Third, like the 0.05 threshold from which it came, the default 95% used to compute intervals is itself an arbitrary convention. It is based on the false idea that there is a 95% chance that the computed interval itself contains the true value, coupled with the vague feeling that this is a basis for a confident decision. A different level can be justified, depending on the application.
    • In short: like the 0.05 threshold, the default 95% level is an arbitrary convention resting on a misreading of what the interval means. Other levels may be justified depending on the application.
  4. Last, and most important of all, be humble: compatibility assessments hinge on the correctness of the statistical assumptions used to compute the interval. In practice, these assumptions are at best subject to considerable uncertainty. Make these assumptions as clear as possible and test the ones you can, for example by plotting your data and by fitting alternative models, and then reporting all results.
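    • In short: be humble. Compatibility assessments depend on the statistical assumptions being correct, and those assumptions are themselves quite uncertain. State them clearly, check the ones you can (for example by plotting the data and fitting alternative models), and report all results.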
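
Points 1 to 3 can be made concrete with a P-value function (also called a compatibility curve). This is the kind of function named in the title of Amrhein & Greenland (2022). For each candidate effect θ, it gives the two-sided P value for the test hypothesis "true effect = θ". This minimal sketch assumes a normal approximation and uses a hypothetical estimate of 0.48 with standard error 0.25. The helper names are illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_value_function(theta, estimate, se):
    """Two-sided P value for the test hypothesis 'true effect = theta' (normal approximation)."""
    z = abs(estimate - theta) / se
    return 2 * (1 - STD_NORMAL.cdf(z))


def interval(estimate, se, level):
    """Set of theta with P >= 1 - level; under this approximation it is estimate +/- z * SE."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


est, se = 0.48, 0.25  # hypothetical study

# Point 2: the point estimate is the most compatible value (P = 1).
print(f"P at the point estimate: {p_value_function(est, est, se):.3f}")

# Point 1: values just inside and just outside the upper 95% limit differ only slightly.
_, upper95 = interval(est, se, 0.95)
for theta in (upper95 - 0.01, upper95, upper95 + 0.01):
    print(f"theta={theta:.3f}: P={p_value_function(theta, est, se):.4f}")

# Point 3: 95% is one convention; other levels give nested intervals from the same curve.
for level in (0.80, 0.90, 0.95, 0.99):
    lo, hi = interval(est, se, level)
    print(f"{level:.0%} interval: ({lo:.2f}, {hi:.2f})")
```

P equals 1 at the point estimate and falls off smoothly on either side. At the 95% limit it is about 0.05, and moving 0.01 inside or outside changes it only to roughly 0.055 or 0.045. The 80%, 90%, 95% and 99% intervals are horizontal slices of the same curve at P = 0.20, 0.10, 0.05 and 0.01.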

Objections to retiring statistical significance

The objection we hear most against retiring statistical significance is that it is needed to make yes-or-no decisions. But for the choices often required in regulatory, policy and business environments, decisions based on the costs, benefits and likelihoods of all potential consequences always beat those made based solely on statistical significance.
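
The alternative the authors describe can be sketched as an expected-value comparison. In the sketch below, every scenario, probability and payoff is hypothetical. In practice the probabilities might come from interval estimates, prior evidence or expert judgement, and the payoffs from cost and benefit assessments. A fuller decision analysis with a realistic utility function would follow the same pattern.

```python
import math

# Hypothetical scenarios for the true effect of an intervention (illustrative numbers only).
# "prob" is the analyst's probability for the scenario; the other entries are net payoffs
# (benefit minus cost, arbitrary units) of each action if that scenario is true.
SCENARIOS = {
    "harmful":        {"prob": 0.05, "adopt": -200.0, "do_not_adopt": 0.0},
    "no real effect": {"prob": 0.25, "adopt": -50.0,  "do_not_adopt": 0.0},
    "modest benefit": {"prob": 0.50, "adopt": 80.0,   "do_not_adopt": 0.0},
    "large benefit":  {"prob": 0.20, "adopt": 300.0,  "do_not_adopt": 0.0},
}
ACTIONS = ("adopt", "do_not_adopt")


def expected_value(action):
    """Probability-weighted average payoff of an action over all scenarios."""
    return sum(s["prob"] * s[action] for s in SCENARIOS.values())


# Sanity check: the scenario probabilities must sum to 1.
assert math.isclose(sum(s["prob"] for s in SCENARIOS.values()), 1.0)

for action in ACTIONS:
    print(f"{action}: expected payoff {expected_value(action):.1f}")

best = max(ACTIONS, key=expected_value)
print(f"choose: {best}")
```

Here adopting has a positive expected payoff even though a harmful outcome is possible. Changing the payoffs, for example by making harm catastrophic, can reverse the choice without any change in the data. A significance test alone does not capture this.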

Teaching hypothesis testing remains important

1) ASA Statement on Statistical Significance and P-Values. DOI; Japanese translation
2) The American Statistician, 73(sup1) (2019). Link
3) Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a World Beyond “p<0.05”. The American Statistician, 73(sup1), 1-19. DOI
4) Hurlbert, S. H., Levine, R. A., & Utts, J. (2019). Coup de Grâce for a Tough Old Bull: “Statistically Significant” Expires. The American Statistician, 73(sup1), 352-357. DOI
5), 7) Nature, 567, 305-307 (2019). DOI
6) Amrhein, V., & Greenland, S. (2022). Discuss practical importance of results based on interval estimates and p-value functions, not only on point estimates and null p-values. Journal of Information Technology, 37(3), 316-320. DOI
8) Rothman et al. (2008). Modern Epidemiology.
9) Sato, T. (佐藤 俊哉). ASA声明と疫学研究におけるP値 [The ASA statement and P values in epidemiological research]. 計量生物学 (Japanese Journal of Biometrics), 38(2), 109-115, 2017-2018. J-Stage