Significance Tests for Event Studies
Event studies are concerned with the question of whether abnormal returns on an event date or, more generally, during a window around an event date (called the event window) are unusually large (in magnitude). To answer this question one carries out a formal hypothesis test where the null hypothesis specifies that the expected value of a certain random variable is zero; if the null hypothesis is rejected, one concludes that the event had an ‘impact’. It is customary in the literature to use two-sided tests, which specify as alternative hypothesis that the expected value is different from zero (as opposed to larger, or smaller, than zero). We follow this convention.
If there is only one instance under study, the random variable is the abnormal return on the event day itself (AR) or, more generally, the cumulative abnormal return during the event window (CAR). If there are multiple instances under study, the respective quantities are averaged across instances. Thus, the random variable is the average abnormal return on the respective event day (AAR) or the average cumulative abnormal return during the respective event window, which can alternatively be expressed as the cumulative average abnormal return (CAAR).
In terms of terminology, by an instance we mean a given event for a given firm. In the case of multiple instances, there are two possibilities: (i) a given event (type), such as inclusion in an index or a merger, for multiple firms or (ii) multiple repetitions of a given event (type) for a given firm. An example of the first possibility would be studying the effect of being included in the S&P 500 index for multiple firms; an example of the second possibility would be studying the effect of mergers for a given firm. In terms of the statistical methodology, both possibilities are handled in the same way.
For the computation of the abnormal return of firm $i$ on day $t$, denoted by $AR_{i,t}$, we refer the user to the introduction. In case one considers more than one instance, let $N$ denote the number of instances considered and define
$$AAR_t = \frac{1}{N} \sum_{i=1}^{N} AR_{i,t}$$
$$CAR_i = \sum_{t=T_1+1}^{T_2} AR_{i,t}$$
$$CAAR = \frac{1}{N} \sum_{i=1}^{N} CAR_i$$
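These averaging and cumulation steps can be sketched in a few lines (a minimal sketch using NumPy; the array names and shapes are our own choices, not part of the methodology):

```python
import numpy as np

# Abnormal returns for N = 3 firms over a 2-day event window (rows = firms)
ar_event = np.array([[0.01,  0.02],
                     [0.00, -0.01],
                     [0.03,  0.01]])

aar = ar_event.mean(axis=0)   # AAR_t: average across firms, per day
car = ar_event.sum(axis=1)    # CAR_i: sum across days, per firm
caar = car.mean()             # CAAR: average of the CAR_i
```

Note that averaging the $CAR_i$ across firms and cumulating the $AAR_t$ across days give the same number, which is why CAAR can be expressed either way.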
The literature on event-study hypothesis testing covers a wide range of tests. Generally, significance tests can be classified into parametric and nonparametric tests. Parametric tests (at least in the field of event studies) assume that the individual firms' abnormal returns are normally distributed, whereas nonparametric tests do not rely on any such assumption. Applied researchers typically carry out both parametric and nonparametric tests to verify that the research findings are not driven by non-normal returns or outliers, which tend to affect the results of parametric tests but not the results of nonparametric tests; for example, see Schipper and Smith (1983).
Table 1 lists the various tests according to the null hypothesis for which they can be used. Table 2 lists them by name and presents strengths and weaknesses compiled from Kolari and Pynnönen (2011).
Null Hypothesis | Parametric Tests | Nonparametric Tests | Setting
---|---|---|---
$H_0: E(AR) = 0$ | T Test | Permutation Test | Single Instance
$H_0: E(AAR) = 0$ | Cross-Sectional Test, Time-Series Standard Deviation Test, Patell Test, Adjusted Patell Test, Standardized Cross-Sectional Test, Adjusted Standardized Cross-Sectional Test, and Skewness Corrected Test | Generalized Sign Test, Generalized Rank T Test, Generalized Rank Z Test, and Wilcoxon Test | Multiple Instances
$H_0: E(CAR) = 0$ | T Test | Permutation Test | Single Instance
$H_0: E(CAAR) = 0$ | Cross-Sectional Test, Time-Series Standard Deviation Test, Patell Test, Adjusted Patell Test, Standardized Cross-Sectional Test, Adjusted Standardized Cross-Sectional Test, and Skewness Corrected Test | Generalized Sign Test, Generalized Rank T Test, and Generalized Rank Z Test | Multiple Instances
# | Name | Key Reference | EST Abbreviation | Strengths and Weaknesses
---|---|---|---|---
1 | T Test | | |
2 | Cross-Sectional Test | | CSect T |
3 | Time-Series Standard Deviation Test | | CDA T |
4 | Patell Test | Patell (1976) | Patell Z |
5 | Adjusted Patell Test | Kolari and Pynnönen (2010) | Adjusted Patell Z |
6 | Standardized Cross-Sectional Test | Boehmer, Musumeci and Poulsen (1991) | StdCSect Z |
7 | Adjusted Standardized Cross-Sectional Test | Kolari and Pynnönen (2010) | Adjusted StdCSect Z |
8 | Skewness Corrected Test | Hall (1992) | Skewness-Corrected T |
9 | Jackknife Test | Giaccotto and Sfiridis (1996) | Jackknife T |
10 | Corrado Rank Test | Corrado and Zivney (1992) | Rank Z |
11 | Generalized Rank T Test | Kolari and Pynnönen (2011) | Generalized Rank T |
12 | Generalized Rank Z Test | Kolari and Pynnönen (2011) | Generalized Rank Z |
13 | Sign Test | Cowan (1992) | Sign Z |
14 | Cowan Generalized Sign Test | Cowan (1992) | Generalized Sign Z |
15 | Wilcoxon Signed-Rank Test | Wilcoxon (1945) | Wilcoxon |
16 | Permutation Test | Nguyen and Wolf (2023) | Permutation |
In describing the formulas for the test statistics and their approximate distributions under the null, which are used to compute p-values, we follow the order of Table 2.
Some Preliminaries
The estimation window is given by $\{T_0, \dots, T_1\}$ and thus has length $L_1 = T_1 - T_0 + 1$. The event window is given by $\{T_1 + 1, \dots, T_2\}$ and thus has length $L_2 = T_2 - T_1$. This convention implies that the estimation window ends immediately before the event window. We will stick to this convention for simplicity in all the formulas below, but note that our methodology also allows for an arbitrary gap between the two windows, as specified by the user.
If the event window has length one (that is, contains a single day only), we use the convention $T_1 + 1 = 0 = T_2$. More generally, it always holds that $T_1 + 1 \le 0 \le T_2$.
If multiple instances are considered, $N$ denotes the number of instances.
For any given firm $i$, $S_{AR_i}$ denotes the sample standard deviation of the abnormal returns during the estimation window, which is given as the square root of the corresponding sample variance
$$S^2_{AR_i} = \frac{1}{M_i - K} \sum_{t=T_0}^{T_1} AR_{i,t}^2$$
Here, $M_i$ denotes the number of non-missing returns during the estimation window; for example, $M_i = T_1 - T_0 + 1$ in case of no missing observations. Furthermore, $K$ denotes the degrees of freedom (given by the number of free parameters) in the benchmark model that was used to compute the abnormal returns; for example, $K = 1$ for the constant-expected-return model, $K = 2$ for the market model, and $K = 4$ for the Fama-French three-factor model (which also contains a constant in addition to the three stochastic factors).
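For instance, this variance estimator can be computed as follows (a minimal sketch; the function name and the use of NaN to mark missing days are our own conventions):

```python
import numpy as np

def ar_variance(ar_est, K):
    """S^2_AR_i: sample variance of firm i's estimation-window abnormal
    returns; NaN marks a missing day, K is the benchmark-model df."""
    ar = ar_est[~np.isnan(ar_est)]
    M = ar.size                     # M_i: number of non-missing returns
    return np.sum(ar**2) / (M - K)  # no demeaning: ARs are model residuals

# Example: M_i = 3 non-missing returns under the market model (K = 2)
s2 = ar_variance(np.array([0.01, -0.01, 0.02, np.nan]), K=2)
```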
Finally, $N(0,1)$ denotes the standard normal distribution and $t_k$ denotes the $t$-distribution with $k$ degrees of freedom.
Parametric Tests
[1] T Test
[1.1] Null hypothesis of interest: $H_0: E(AR_{i,0}) = 0$
Test statistic:
$$t = \frac{AR_{i,0}}{S_{AR_i}}$$
Approximate null distribution: $t \overset{\cdot}{\sim} t_{M_i - K}$
[1.2] Null hypothesis of interest: $H_0: E(CAR_i) = 0$
Test statistic:
$$t = \frac{CAR_i}{S_{CAR_i}} \quad \text{with} \quad S^2_{CAR_i} = L_2 \, S^2_{AR_i}$$
Approximate null distribution: $t \overset{\cdot}{\sim} t_{M_i - K}$
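In code, the test of [1.2] might look as follows (a sketch assuming no missing data; [1.1] is the special case of a one-day event window):

```python
import numpy as np
from scipy import stats

def t_test_car(ar_est, ar_event, K):
    """Single-instance t test for H0: E(CAR_i) = 0."""
    M = ar_est.size                            # M_i, assuming no missing data
    s2_ar = np.sum(ar_est**2) / (M - K)        # S^2_AR_i
    L2 = ar_event.size
    t = ar_event.sum() / np.sqrt(L2 * s2_ar)   # CAR_i / S_CAR_i
    p = 2 * stats.t.sf(abs(t), df=M - K)       # two-sided p-value
    return t, p

rng = np.random.default_rng(42)
t, p = t_test_car(rng.normal(0.0, 0.01, 120), np.array([0.03, 0.02]), K=2)
```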
[2] Cross-Sectional Test (Abbr.: CSect T)
[2.1] Null hypothesis of interest: $H_0: E(AAR_0) = 0$
Test statistic:
$$t = \sqrt{N} \, \frac{AAR_0}{S_{AAR,0}} \quad \text{with} \quad S^2_{AAR,0} = \frac{1}{N-1} \sum_{i=1}^{N} \left( AR_{i,0} - AAR_0 \right)^2$$
Approximate null distribution: $t \overset{\cdot}{\sim} t_{N-1}$
[2.2] Null hypothesis of interest: $H_0: E(CAAR) = 0$
Test statistic:
$$t = \sqrt{N} \, \frac{CAAR}{S_{CAAR}} \quad \text{with} \quad S^2_{CAAR} = \frac{1}{N-1} \sum_{i=1}^{N} \left( CAR_i - CAAR \right)^2$$
Approximate null distribution: $t \overset{\cdot}{\sim} t_{N-1}$
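A sketch of [2.2] (the function name is our own; pass event-day ARs instead of CARs to obtain the test of [2.1]):

```python
import numpy as np
from scipy import stats

def csect_t(car):
    """Cross-sectional t test for H0: E(CAAR) = 0."""
    car = np.asarray(car, dtype=float)
    N = car.size
    t = np.sqrt(N) * car.mean() / car.std(ddof=1)   # S_CAAR uses 1/(N-1)
    p = 2 * stats.t.sf(abs(t), df=N - 1)            # two-sided p-value
    return t, p

t, p = csect_t([0.02, 0.03, 0.01, 0.04, 0.02, 0.03])
```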
[3] Time-Series Standard Deviation or Crude Dependence Test (Abbr.: CDA T)
[3.1] Null hypothesis of interest: $H_0: E(AAR_0) = 0$
Test statistic:
$$t = \sqrt{N} \, \frac{AAR_0}{S_{AAR}} \quad \text{with} \quad S^2_{AAR} = \frac{1}{M-1} \sum_{t=T_0}^{T_1} \left( AAR_t - \frac{1}{M} \sum_{t=T_0}^{T_1} AAR_t \right)^2$$
where $M$ denotes the number of non-missing $AAR_t$ during the estimation window.
Approximate null distribution: $t \overset{\cdot}{\sim} t_{M-1}$
[3.2] Null hypothesis of interest: $H_0: E(CAAR) = 0$
Test statistic:
$$t = \sqrt{N} \, \frac{CAAR}{S_{CAAR}} \quad \text{with} \quad S^2_{CAAR} = \frac{1}{M-1} \sum_{t=T_0}^{T_1} \left( CAAR_t - \frac{1}{M} \sum_{t=T_0}^{T_1} CAAR_t \right)^2$$
where $M$ denotes the number of non-missing $CAAR_t$ during the estimation window.
Approximate null distribution: $t \overset{\cdot}{\sim} t_{M-1}$
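A sketch of [3.1], following the formula as given (assuming no missing days; here the standard deviation is estimated from the time series of estimation-window AARs):

```python
import numpy as np
from scipy import stats

def cda_t(ar_est, ar0):
    """Time-series standard deviation (CDA) test for H0: E(AAR_0) = 0.
    ar_est: (N, L1) estimation-window ARs; ar0: (N,) event-day ARs."""
    aar_est = ar_est.mean(axis=0)            # AAR_t over the estimation window
    s_aar = aar_est.std(ddof=1)              # S_AAR
    N = ar0.size
    t = np.sqrt(N) * ar0.mean() / s_aar
    p = 2 * stats.t.sf(abs(t), df=aar_est.size - 1)
    return t, p

rng = np.random.default_rng(7)
t, p = cda_t(rng.normal(0, 0.01, (20, 120)), rng.normal(0.05, 0.01, 20))
```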
[4] Patell or Standardized Residual Test (Abbr.: Patell Z)
[4.1] Null hypothesis of interest: $H_0: E(AAR_0) = 0$
Test statistic:
$$z = \frac{ASAR_0}{S_{ASAR}}$$
The underlying idea is to standardize each $AR_{i,t}$ by the so-called forecast-error-corrected standard deviation before calculating the test statistic; for example, for the market model,
$$SAR_{i,0} = \frac{AR_{i,0}}{S_{AR_{i,0}}} \quad \text{with} \quad S^2_{AR_{i,0}} = S^2_{AR_i} \left( 1 + \frac{1}{M_i} + \frac{(R_{m,0} - \bar{R}_m)^2}{\sum_{t=T_0}^{T_1} (R_{m,t} - \bar{R}_m)^2} \right) \quad \text{and} \quad \bar{R}_m = \frac{1}{L_1} \sum_{t=T_0}^{T_1} R_{m,t}$$
where $R_{m,t}$ denotes the market return on day $t$. (The standardization is analogous for any other day $t$ in the event window.)
Then compute
$$ASAR_0 = \sum_{i=1}^{N} SAR_{i,0}$$
Under the null, this statistic has expectation zero and variance
$$S^2_{ASAR} = \sum_{i=1}^{N} \frac{M_i - 2}{M_i - 4}$$
Approximate null distribution: $z \overset{\cdot}{\sim} N(0,1)$
[4.2] Null hypothesis of interest: $H_0: E(CAAR) = 0$
Test statistic:
$$z = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \frac{CSAR_i}{S_{CSAR_i}}$$
where $CSAR_i$ denotes the cumulative standardized abnormal return of firm $i$:
$$CSAR_i = \sum_{t=T_1+1}^{T_2} SAR_{i,t}$$
which under the null has expectation zero and variance
$$S^2_{CSAR_i} = L_2 \, \frac{M_i - 2}{M_i - 4}$$
Approximate null distribution: $z \overset{\cdot}{\sim} N(0,1)$
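Given standardized abnormal returns that have already been forecast-error corrected, the CAAR version of the Patell test reduces to a few lines (a sketch; computing the $SAR_{i,t}$ themselves requires the market-model quantities above, and the array names are our own):

```python
import numpy as np
from scipy import stats

def patell_z_caar(sar_event, M):
    """Patell z for H0: E(CAAR) = 0.
    sar_event: (N, L2) forecast-error-corrected SAR_{i,t} over the event
    window; M: (N,) numbers of non-missing estimation-window returns."""
    L2 = sar_event.shape[1]
    csar = sar_event.sum(axis=1)               # CSAR_i
    s_csar = np.sqrt(L2 * (M - 2) / (M - 4))   # null std. dev. of CSAR_i
    z = np.sum(csar / s_csar) / np.sqrt(sar_event.shape[0])
    p = 2 * stats.norm.sf(abs(z))
    return z, p

z, p = patell_z_caar(np.full((4, 2), 1.0), M=np.full(4, 100.0))
```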
[5] Kolari and Pynnönen Adjusted Patell or Standardized Residual Test (Abbr.: Adjusted Patell Z)
[5.1] Null hypothesis of interest: $H_0: E(AAR_0) = 0$
Test statistic:
$$z_{adj} = z \cdot \sqrt{\frac{1 - \bar{r}}{1 + (N-1)\,\bar{r}}}$$
where $z$ is defined as in [4.1] and $\bar{r}$ denotes the average of the (pairwise) sample cross-correlations of the estimation-period abnormal returns.
Approximate null distribution: $z_{adj} \overset{\cdot}{\sim} N(0,1)$
[5.2] Null hypothesis of interest: $H_0: E(CAAR) = 0$
Test statistic:
$$z_{adj} = z \cdot \sqrt{\frac{1 - \bar{r}}{1 + (N-1)\,\bar{r}}}$$
where $z$ is defined as in [4.2] and $\bar{r}$ denotes the average of the (pairwise) sample cross-correlations of the estimation-period abnormal returns.
Approximate null distribution: $z_{adj} \overset{\cdot}{\sim} N(0,1)$
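The adjustment factor itself is easy to compute from the estimation-window abnormal returns (a sketch; `np.corrcoef` treats rows as variables, and the function name is our own):

```python
import numpy as np

def kp_adjust(z, ar_est):
    """Scale a Patell-type z statistic by the Kolari-Pynnönen correction
    for cross-correlation; ar_est is the (N, L1) matrix of estimation ARs."""
    corr = np.corrcoef(ar_est)                      # N x N sample correlations
    N = corr.shape[0]
    r_bar = corr[np.triu_indices(N, k=1)].mean()    # average pairwise r
    return z * np.sqrt((1 - r_bar) / (1 + (N - 1) * r_bar))
```

With $\bar r = 0$ the statistic is unchanged; with perfectly correlated firms it shrinks to zero, reflecting that $N$ correlated firms carry less information than $N$ independent ones.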
[6] Standardized Cross-Sectional or BMP Test (Abbr.: StdCSect T)
[6.1] Null hypothesis of interest: $H_0: E(AAR_0) = 0$
Test statistic:
$$t = \frac{ASAR_0}{\sqrt{N}\, S_{ASAR,0}} \quad \text{with} \quad S^2_{ASAR,0} = \frac{1}{N-1} \sum_{i=1}^{N} \left( SAR_{i,0} - \frac{1}{N} \sum_{i=1}^{N} SAR_{i,0} \right)^2$$
with $SAR_{i,0}$ and $ASAR_0$ defined as in [4.1].
Approximate null distribution: $t \overset{\cdot}{\sim} t_{N-1}$
[6.2] Null hypothesis of interest: $H_0: E(CAAR) = 0$
Test statistic:
$$t = \sqrt{N} \, \frac{\overline{SCAR}}{S_{\overline{SCAR}}}$$
where
$$\overline{SCAR} = \frac{1}{N} \sum_{i=1}^{N} SCAR_i \quad \text{and} \quad S^2_{\overline{SCAR}} = \frac{1}{N-1} \sum_{i=1}^{N} \left( SCAR_i - \overline{SCAR} \right)^2$$
These statistics are based on
$$SCAR_i = \frac{CAR_i}{S_{CAR_i}}$$
where $S_{CAR_i}$ denotes the forecast-error-corrected standard deviation; for example, for the market model,
$$S^2_{CAR_i} = S^2_{AR_i} \left( L_2 + \frac{L_2}{M_i} + \frac{\sum_{t=T_1+1}^{T_2} (R_{m,t} - \bar{R}_m)^2}{\sum_{t=T_0}^{T_1} (R_{m,t} - \bar{R}_m)^2} \right)$$
Approximate null distribution: $t \overset{\cdot}{\sim} t_{N-1}$
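A sketch of [6.2] for the market model, assuming no missing data (array and function names are our own choices):

```python
import numpy as np
from scipy import stats

def bmp_t_caar(ar_est, rm_est, rm_event, car, K=2):
    """BMP test for H0: E(CAAR) = 0 under the market model.
    ar_est: (N, L1) estimation ARs; rm_est/rm_event: market returns in the
    estimation and event windows; car: (N,) event-window CARs."""
    N, M = ar_est.shape                           # M_i = L1 (no missing data)
    L2 = rm_event.size
    s2_ar = np.sum(ar_est**2, axis=1) / (M - K)   # S^2_AR_i
    rm_bar = rm_est.mean()
    fec = (L2 + L2 / M                            # forecast-error correction
           + np.sum((rm_event - rm_bar)**2) / np.sum((rm_est - rm_bar)**2))
    scar = car / np.sqrt(s2_ar * fec)             # SCAR_i
    t = np.sqrt(N) * scar.mean() / scar.std(ddof=1)
    p = 2 * stats.t.sf(abs(t), df=N - 1)
    return t, p

rng = np.random.default_rng(1)
t, p = bmp_t_caar(rng.normal(0, 0.01, (15, 100)), rng.normal(0, 0.01, 100),
                  rng.normal(0, 0.01, 3), rng.normal(0.05, 0.005, 15))
```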
[7] Kolari and Pynnönen Adjusted Standardized Cross-Sectional or BMP Test (Abbr.: Adjusted StdCSect T)
[7.1] Null hypothesis of interest: $H_0: E(AAR_0) = 0$
Test statistic:
$$t_{adj} = t \cdot \sqrt{\frac{1 - \bar{r}}{1 + (N-1)\,\bar{r}}}$$
where $t$ is defined as in [6.1] and $\bar{r}$ denotes the average of the (pairwise) sample cross-correlations of the estimation-period abnormal returns.
Approximate null distribution: $t_{adj} \overset{\cdot}{\sim} t_{N-1}$
[7.2] Null hypothesis of interest: $H_0: E(CAAR) = 0$
Test statistic:
$$t_{adj} = t \cdot \sqrt{\frac{1 - \bar{r}}{1 + (N-1)\,\bar{r}}}$$
where $t$ is defined as in [6.2] and $\bar{r}$ denotes the average of the (pairwise) sample cross-correlations of the estimation-period abnormal returns.
Approximate null distribution: $t_{adj} \overset{\cdot}{\sim} t_{N-1}$
[8] Skewness-Corrected Test (Abbr.: Skewness-Corrected T)
[8.1] Null hypothesis of interest: $H_0: E(AAR_0) = 0$
Test statistic:
$$t = \sqrt{N} \left( S + \frac{1}{3} \gamma S^2 + \frac{1}{27} \gamma^2 S^3 + \frac{1}{6N} \gamma \right)$$
As far as the ingredients are concerned, first recall the cross-sectional sample variance
$$S^2_{AAR,0} = \frac{1}{N-1} \sum_{i=1}^{N} \left( AR_{i,0} - AAR_0 \right)^2$$
Next, the corresponding sample skewness is given by
$$\gamma = \frac{N}{(N-2)(N-1)} \sum_{i=1}^{N} \frac{\left( AR_{i,0} - AAR_0 \right)^3}{S^3_{AAR,0}}$$
Finally, let
$$S = \frac{AAR_0}{S_{AAR,0}}$$
Approximate null distribution: $t \overset{\cdot}{\sim} t_{N-1}$
[8.2] Null hypothesis of interest: $H_0: E(CAAR) = 0$
Test statistic:
$$t = \sqrt{N} \left( S + \frac{1}{3} \gamma S^2 + \frac{1}{27} \gamma^2 S^3 + \frac{1}{6N} \gamma \right)$$
As far as the ingredients are concerned, first recall the cross-sectional sample variance
$$S^2_{CAAR} = \frac{1}{N-1} \sum_{i=1}^{N} \left( CAR_i - CAAR \right)^2$$
Next, the corresponding sample skewness is given by
$$\gamma = \frac{N}{(N-2)(N-1)} \sum_{i=1}^{N} \frac{\left( CAR_i - CAAR \right)^3}{S^3_{CAAR}}$$
Finally, let
$$S = \frac{CAAR}{S_{CAAR}}$$
Approximate null distribution: $t \overset{\cdot}{\sim} t_{N-1}$
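A sketch of [8.2] (the function name is our own). For a perfectly symmetric sample, $\gamma = 0$ and the statistic reduces to the plain cross-sectional t statistic of [2.2]:

```python
import numpy as np
from scipy import stats

def skew_corrected_t(car):
    """Skewness-corrected t test for H0: E(CAAR) = 0."""
    car = np.asarray(car, dtype=float)
    N = car.size
    caar, s = car.mean(), car.std(ddof=1)
    gamma = N / ((N - 2) * (N - 1)) * np.sum((car - caar)**3) / s**3
    S = caar / s
    t = np.sqrt(N) * (S + gamma * S**2 / 3
                      + gamma**2 * S**3 / 27 + gamma / (6 * N))
    p = 2 * stats.t.sf(abs(t), df=N - 1)
    return t, p

# Symmetric sample: gamma = 0, so t equals sqrt(N) * CAAR / S_CAAR
t, p = skew_corrected_t([0.01, 0.02, 0.03, 0.04, 0.05])
```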
[9] Jackknife Test (Abbr.: Jackknife T)
This test will be added in a future version.
Nonparametric Tests
[10] Corrado Rank Test (Abbr.: Rank Z)
[10.1] Null hypothesis of interest: $H_0: E(AAR_0) = 0$
Test statistic:
$$z = \frac{\bar{K}_0 - 0.5}{S_{\bar{K}}}$$
Start by computing, for any $i$, a vector of "scaled" ranks based on the combined sample $\{AR_{i,t}\}_{t=T_0}^{T_2}$:
$$K_{i,t} = \frac{\operatorname{rank}(AR_{i,t})}{1 + M_i + L_{2,i}}$$
where $L_{2,i}$ denotes the number of non-missing $AR_{i,t}$ during the event window.
Then, for any $t$, denote the number of non-missing $K_{i,t}$ by $N_t$ and define
$$\bar{K}_t = \frac{1}{N_t} \sum_{i=1}^{N} K_{i,t} \quad \text{and} \quad S^2_{\bar{K}} = \frac{1}{L_1 + L_2} \sum_{t=T_0}^{T_2} \left( \bar{K}_t - 0.5 \right)^2$$
Approximate null distribution: $z \overset{\cdot}{\sim} N(0,1)$
[10.2] Null hypothesis of interest: $H_0: E(CAAR) = 0$
Test statistic:
$$z = \sqrt{L_2} \left( \frac{\bar{K}_{T_1+1,T_2} - 0.5}{S_{\bar{K}}} \right) \quad \text{with} \quad \bar{K}_{T_1+1,T_2} = \frac{1}{L_2} \sum_{t=T_1+1}^{T_2} \bar{K}_t$$
Approximate null distribution: $z \overset{\cdot}{\sim} N(0,1)$
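A sketch of [10.1], assuming no missing data (so $M_i = L_1$ and $L_{2,i} = 1$ for every firm; the function name is our own):

```python
import numpy as np
from scipy import stats

def corrado_rank_z(ar_est, ar0):
    """Corrado-Zivney rank test for H0: E(AAR_0) = 0.
    ar_est: (N, L1) estimation ARs; ar0: (N,) event-day ARs."""
    combined = np.column_stack([ar_est, ar0])    # per-firm combined sample
    T = combined.shape[1]                        # L1 + 1 days in total
    ranks = np.array([stats.rankdata(row) for row in combined])
    K = ranks / (1 + T)                          # scaled ranks K_{i,t}
    K_bar = K.mean(axis=0)                       # \bar K_t, per day
    s_K = np.sqrt(np.mean((K_bar - 0.5)**2))     # S_{\bar K}
    z = (K_bar[-1] - 0.5) / s_K                  # event day is the last column
    p = 2 * stats.norm.sf(abs(z))
    return z, p

rng = np.random.default_rng(3)
z, p = corrado_rank_z(rng.normal(0, 0.01, (10, 50)), np.full(10, 0.09))
```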
[11] Generalized Rank T Test (Abbr.: Generalized Rank T)
[11.1] Null hypothesis of interest: $H_0: E(AAR_0) = 0$
Test statistic:
$$t = Z \cdot \sqrt{\frac{L_1 - 1}{L_1 - Z^2}} \quad \text{with} \quad Z = \frac{\bar{U}_{L_1+1}}{S_{\bar{U}}}$$
Arguably, this is the most complicated test statistic of them all, so it will take a while to describe its construction. For simplicity, we will assume no missing data anywhere.
For any $t$ during the estimation window, let $SAR_{i,t} = AR_{i,t} / S_{AR_i}$ and then compute $SAR_{i,0}$ as described in [4.1]. Next, use cross-sectional standardization to compute
$$SAR^*_{i,0} = \frac{SAR_{i,0}}{S_{SAR_0}} \quad \text{with} \quad S^2_{SAR_0} = \frac{1}{N-1} \sum_{i=1}^{N} \left( SAR_{i,0} - \overline{SAR}_0 \right)^2 \quad \text{and} \quad \overline{SAR}_0 = \frac{1}{N} \sum_{i=1}^{N} SAR_{i,0}$$
This gives, for any $i$, a time series of length $L_1 + 1$:
$$\{ GSAR_{i,1}, \dots, GSAR_{i,L_1}, GSAR_{i,L_1+1} \} = \{ SAR_{i,T_0}, \dots, SAR_{i,T_1}, SAR^*_{i,0} \}$$
Next, for any $i$, let
$$U_{i,t} = \frac{\operatorname{rank}(GSAR_{i,t})}{L_1 + 2} - 0.5$$
where the ranks are computed across $t \in \{1, \dots, L_1 + 1\}$.
Next, for any $t$, let
$$\bar{U}_t = \frac{1}{N} \sum_{i=1}^{N} U_{i,t}$$
and then let
$$S^2_{\bar{U}} = \frac{1}{L_1 + 1} \sum_{t=1}^{L_1+1} \bar{U}_t^2$$
noting that, necessarily, the average of the values $\{ \bar{U}_t \}_{t=1}^{L_1+1}$ is zero.
Approximate null distribution: $t \overset{\cdot}{\sim} t_{L_1 - 1}$
[11.2] Null hypothesis of interest: $H_0: E(CAAR) = 0$
Test statistic:
$$t = Z \cdot \sqrt{\frac{L_1 - 1}{L_1 - Z^2}} \quad \text{with} \quad Z = \frac{\bar{U}_{L_1+1}}{S_{\bar{U}}}$$
The construction parallels [11.1]; again, we assume no missing data anywhere for simplicity.
Compute $SCAR_i$ as described in [6.2] and use cross-sectional standardization to compute
$$SCAR^*_i = \frac{SCAR_i}{S_{SCAR}} \quad \text{with} \quad S^2_{SCAR} = \frac{1}{N-1} \sum_{i=1}^{N} \left( SCAR_i - \overline{SCAR} \right)^2 \quad \text{and} \quad \overline{SCAR} = \frac{1}{N} \sum_{i=1}^{N} SCAR_i$$
This gives, for any $i$, a time series of length $L_1 + 1$:
$$\{ GSAR_{i,1}, \dots, GSAR_{i,L_1}, GSAR_{i,L_1+1} \} = \{ SAR_{i,T_0}, \dots, SAR_{i,T_1}, SCAR^*_i \}$$
Next, for any $i$, let
$$U_{i,t} = \frac{\operatorname{rank}(GSAR_{i,t})}{L_1 + 2} - 0.5$$
where the ranks are computed across $t \in \{1, \dots, L_1 + 1\}$.
Next, for any $t$, let
$$\bar{U}_t = \frac{1}{N} \sum_{i=1}^{N} U_{i,t}$$
and then let
$$S^2_{\bar{U}} = \frac{1}{L_1 + 1} \sum_{t=1}^{L_1+1} \bar{U}_t^2$$
noting that, necessarily, the average of the values $\{ \bar{U}_t \}_{t=1}^{L_1+1}$ is zero.
Approximate null distribution: $t \overset{\cdot}{\sim} t_{L_1 - 1}$
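The rank machinery shared by [11.1] and [11.2] can be sketched once the GSAR matrix is in hand (a sketch assuming no missing data; the last column holds $SAR^*_{i,0}$ or $SCAR^*_i$, and the function name is our own):

```python
import numpy as np
from scipy import stats

def grank_t(gsar):
    """Generalized rank t test; gsar is the (N, L1+1) GSAR matrix whose
    last column is the re-standardized event-window quantity."""
    L1 = gsar.shape[1] - 1
    ranks = np.array([stats.rankdata(row) for row in gsar])
    U = ranks / (L1 + 2) - 0.5                  # demeaned scaled ranks U_{i,t}
    U_bar = U.mean(axis=0)                      # \bar U_t
    Z = U_bar[-1] / np.sqrt(np.mean(U_bar**2))  # \bar U_{L1+1} / S_{\bar U}
    t = Z * np.sqrt((L1 - 1) / (L1 - Z**2))
    p = 2 * stats.t.sf(abs(t), df=L1 - 1)
    return t, p

rng = np.random.default_rng(11)
t, p = grank_t(np.column_stack([rng.normal(size=(8, 60)), np.full(8, 10.0)]))
```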
[12] Generalized Rank Z Test (Abbr.: Generalized Rank Z)
[12.1] Null hypothesis of interest: $H_0: E(AAR_0) = 0$
Test statistic:
$$z = \frac{\bar{U}_{L_1+1}}{S_{\bar{U}_{L_1+1}}} \quad \text{with} \quad S^2_{\bar{U}_{L_1+1}} = \frac{L_1}{12\,N\,(L_1 + 2)}$$
where the ingredients are defined as in [11.1].
Approximate null distribution: $z \overset{\cdot}{\sim} N(0,1)$
[12.2] Null hypothesis of interest: $H_0: E(CAAR) = 0$
Test statistic:
$$z = \frac{\bar{U}_{L_1+1}}{S_{\bar{U}_{L_1+1}}} \quad \text{with} \quad S^2_{\bar{U}_{L_1+1}} = \frac{L_1}{12\,N\,(L_1 + 2)}$$
where the ingredients are defined as in [11.2].
Approximate null distribution: $z \overset{\cdot}{\sim} N(0,1)$
[13] Sign Test (Abbr.: Sign Z)
[13.1] Null hypothesis of interest: $H_0: E(AAR_0) = 0$
Test statistic:
$$z = \frac{w - N \cdot 0.5}{\sqrt{N \cdot 0.5 \cdot 0.5}}$$
where $w$ is the number of the $AR_{i,0}$ that are positive.
Approximate null distribution: $z \overset{\cdot}{\sim} N(0,1)$
[13.2] Null hypothesis of interest: $H_0: E(CAAR) = 0$
Test statistic:
$$z = \frac{w - N \cdot 0.5}{\sqrt{N \cdot 0.5 \cdot 0.5}}$$
where $w$ is the number of the $CAR_i$ during the event window that are positive.
Approximate null distribution: $z \overset{\cdot}{\sim} N(0,1)$
[14] Generalized Sign Test (Abbr.: Generalized Sign Z)
[14.1] Null hypothesis of interest: $H_0: E(AAR_0) = 0$
Test statistic:
$$z = \frac{w - N \hat{p}}{\sqrt{N \hat{p} (1 - \hat{p})}}$$
where $w$ is the number of the $AR_{i,0}$ that are positive and $\hat{p}$ is the fraction of the $AR_{i,t}$ during the estimation window (across both $i$ and $t$) that are positive.
Approximate null distribution: $z \overset{\cdot}{\sim} N(0,1)$
[14.2] Null hypothesis of interest: $H_0: E(CAAR) = 0$
Test statistic:
$$z = \frac{w - N \hat{p}}{\sqrt{N \hat{p} (1 - \hat{p})}}$$
where $w$ is the number of the $CAR_i$ during the event window that are positive and $\hat{p}$ is the fraction of the $AR_{i,t}$ during the estimation window (across both $i$ and $t$) that are positive.
Approximate null distribution: $z \overset{\cdot}{\sim} N(0,1)$
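A sketch of [14.2] (the function name is our own; when $\hat p = 0.5$ the statistic reduces to the plain sign test of [13]):

```python
import numpy as np
from scipy import stats

def generalized_sign_z(ar_est, car):
    """Generalized sign test for H0: E(CAAR) = 0.
    ar_est: (N, L1) estimation ARs; car: (N,) event-window CARs."""
    N = car.size
    p_hat = np.mean(ar_est > 0)    # fraction of positive estimation ARs
    w = np.sum(car > 0)            # number of positive CARs
    z = (w - N * p_hat) / np.sqrt(N * p_hat * (1 - p_hat))
    p = 2 * stats.norm.sf(abs(z))
    return z, p

ar_est = np.tile([0.01, -0.01], (4, 5))    # exactly half positive: p_hat = 0.5
z, p = generalized_sign_z(ar_est, np.array([0.02, 0.01, 0.03, -0.01]))
```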
[15] Wilcoxon Test (Abbr.: Wilcoxon)
[15.1] Null hypothesis of interest: $H_0: E(AAR_0) = 0$
The Wilcoxon test is a nonparametric test based on the ranks of the $AR_{i,0}$ across $i$. The exact distribution of the test statistic under the null, upon which we base the p-value, is nonstandard, and we refer the user to the original paper of Wilcoxon (1945) or any suitable textbook for the details.
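In practice this test is available off the shelf; for example, with SciPy (the data below are made up for illustration, and SciPy uses the exact null distribution for small samples without ties):

```python
import numpy as np
from scipy import stats

# Event-day abnormal returns across N = 8 firms (illustrative values)
ar0 = np.array([0.021, -0.003, 0.014, 0.007, 0.019, -0.001, 0.012, 0.009])

# Two-sided signed-rank test of symmetry about zero
stat, p = stats.wilcoxon(ar0, alternative='two-sided')
```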
[15.2] Null hypothesis of interest: $H_0: E(CAAR) = 0$
The Wilcoxon test is not available for this null hypothesis.
[16] Permutation Test (Abbr.: Permutation)
[16.1] Null hypothesis of interest: $H_0: E(AR_{i,0}) = 0$
The permutation test is a nonparametric test that computes the p-value in a data-dependent (or resampling-based) fashion. We refer the user to Nguyen and Wolf (2023) for the details.
[16.2] Null hypothesis of interest: $H_0: E(CAR_i) = 0$
The permutation test is a nonparametric test that computes the p-value in a data-dependent (or resampling-based) fashion. We refer the user to Nguyen and Wolf (2023) for the details.
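For intuition only, here is a generic single-instance randomization construction: under the null, the event-day AR is exchangeable with the estimation-window ARs, so a p-value can be taken as the rank of $|AR_{i,0}|$ in the combined sample. This is a textbook device to illustrate the idea, not necessarily the exact procedure of Nguyen and Wolf (2023), and the function name is our own:

```python
import numpy as np

def exchangeability_p(ar_est, ar0):
    """Illustrative permutation-style p-value for H0: E(AR_{i,0}) = 0:
    the fraction of days in the combined sample whose |AR| is at least
    as large as the event day's."""
    combined = np.abs(np.append(ar_est, ar0))
    return np.mean(combined >= abs(ar0))

# Event-day AR larger in magnitude than all 99 estimation-window ARs
p = exchangeability_p(np.arange(1, 100) / 1000.0, 0.1)
```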