Journal of Financial Economics

Simple formulas for standard errors that cluster by both firm and time

Abstract

When estimating finance panel regressions, it is common practice to adjust standard errors for correlation either across firms or across time. These procedures are valid only if the residuals are correlated either across time or across firms, but not across both. This paper shows that it is very easy to calculate standard errors that are robust to simultaneous correlation along two dimensions, such as firms and time. The covariance estimator is equal to the estimator that clusters by firm, plus the estimator that clusters by time, minus the usual heteroskedasticity-robust ordinary least squares (OLS) covariance matrix. Any statistical package with a clustering command can be used to easily calculate these standard errors.

Introduction

A typical finance panel data set contains observations on multiple firms across multiple time periods. Although OLS standard errors will be consistent as long as the regression residuals are uncorrelated across both firms and time periods, such independence is unlikely to hold in a finance panel. For example, market-wide shocks will induce correlation between firms at a moment in time, and persistent firm-specific shocks will induce correlation across time within a firm. Furthermore, persistent common shocks, like business cycles, can induce correlation between different firms in different years.

A number of techniques are available for adjusting standard errors for correlation along a single dimension. Fama and MacBeth (1973) propose a sequential time-series of cross-sections procedure that produces standard errors robust to correlation between firms at a moment in time. Huber (1967) and Rogers (1993) show how to compute "clustered" standard errors, which are robust either to correlation across firms at a moment in time or to correlation within a firm across time. None of these techniques correctly adjusts standard errors for simultaneous correlation across both firms and time. If one clusters by firm, observations may be correlated within each firm, but must be independent across firms. If one clusters by time, observations may be correlated within each time period, but correlation across time periods is ruled out.

This paper describes a method for computing standard errors that are robust to correlation along two dimensions. To make the discussion concrete, we call one dimension time and the other firm, but the results trivially generalize to any two-dimensional panel data setting. In addition, these standard errors are easy to compute. In the simplest case, we have firm and time effects, but no persistent common shocks. In this case, the variance estimate for an OLS estimator $\hat{\beta}$ is $\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{V}_{\mathrm{firm}} + \hat{V}_{\mathrm{time},0} - \hat{V}_{\mathrm{white},0}$, where $\hat{V}_{\mathrm{firm}}$ and $\hat{V}_{\mathrm{time},0}$ are the estimated variance matrices that cluster by firm and by time, respectively, and $\hat{V}_{\mathrm{white},0}$ is the usual heteroskedasticity-robust OLS variance matrix (White, 1980). Thus, any statistical package with a clustering command (e.g., Stata) can be used to easily calculate these standard errors. The paper also provides valid standard errors for the more complicated case which allows for persistent common shocks.
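For readers who want a concrete recipe, the following is a minimal Python sketch using statsmodels; the data frame `df` and its column names are hypothetical, and the sketch assumes the simple case without persistent common shocks.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical panel data: one row per firm-time observation with
# columns y, x1, x2, firm, time (names are assumptions for illustration).
df = pd.read_csv("panel.csv")
X = sm.add_constant(df[["x1", "x2"]])
y = df["y"]

model = sm.OLS(y, X)
# Covariance clustered by firm, clustered by time, and the White (HC0) covariance.
V_firm = model.fit(cov_type="cluster", cov_kwds={"groups": df["firm"]}).cov_params()
V_time = model.fit(cov_type="cluster", cov_kwds={"groups": df["time"]}).cov_params()
V_white = model.fit(cov_type="HC0").cov_params()

# Double-clustered covariance: cluster-by-firm + cluster-by-time - White.
V_both = V_firm + V_time - V_white
print(np.sqrt(np.diag(V_both)))
```

Packages apply different finite-sample corrections to clustered covariance matrices, so standard errors computed this way can differ slightly from hand-rolled calculations or from other software.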

This paper also discusses the pros and cons of double-clustered standard errors. I analyze the standard error formulas using the familiar trade-off between bias and variance. The various standard error formulas are estimates of true, unknown standard errors. The more robust formulas have less bias, but more estimation variance. The lower bias improves the performance of test statistics, but the increased variance can lead to size distortions. I use Jensen's inequality to show that, when sample sizes are small, the more robust standard errors lead us to find statistical significance even when it does not exist.
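A sketch of the Jensen's inequality argument: if $\hat{V}$ is an unbiased but noisy estimate of the true variance $V$, then, because the square root is concave, $E[\sqrt{\hat{V}}] \le \sqrt{E[\hat{V}]} = \sqrt{V}$, with strict inequality whenever $\hat{V}$ has positive sampling variance. The estimated standard error is therefore too small on average, t-statistics are inflated, and true null hypotheses are rejected too often in small samples.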

When is the bias reduction likely to be important? I argue that double clustering is likely to be most helpful in data sets with the following characteristics: the regression errors include significant time and firm components, the regressors themselves include significant firm and time components, and the number of firms and time periods is not too different. So, if the regressors vary by time but not by firm, then clustering by time may be good enough, and double clustering may not make a large difference. If there are far more firms than time periods, clustering by time eliminates most of the bias unless within-firm correlations are much larger than within-time period correlations.

I also point out special considerations related to persistent common shocks. Correcting for correlations between different firms in different time periods involves estimating autocovariances between residuals. As Hurwicz (1950) and many subsequent authors have shown, autocovariance estimates are biased downward. Thus, standard errors that correct for persistent common shocks will tend to be biased downward. Eliminating the bias requires a large number of time periods.

I use a Monte Carlo to evaluate how large sample sizes must be in practice. When I apply pure double clustering, and do not adjust for persistent common shocks, the standard errors are reliable in data sets with at least 25 firms observed over 25 time periods. When I correct for persistent common shocks, the number of time periods should be greater than 50.

This leads to reasonably simple advice for applied researchers. Double clustering is worth doing because it is an easy robustness check, and the standard error estimates are accurate in small samples. However, we should not expect it to make a big difference in all data sets, especially when there are far more firms than time periods. I do not make as strong a case for adjusting for persistent common shocks. The standard error formulas are a bit more complicated, and a larger number of time periods is needed for the estimates to be accurate.

Section snippets

Firm effects, time effects, and persistent common shocks

Consider the panel regression $y_{it} = x_{it}'\beta + \varepsilon_{it}$, where $y_{it}$ is the dependent variable, $\varepsilon_{it}$ is the error term, $x_{it}$ is the covariate vector, and $\beta$ is the coefficient vector. We have $i = 1, \ldots, N$ firms observed over $t = 1, \ldots, T$ time periods. More generally, index $i$ could refer to any unit of observation, such as an industry- or country-level observation, and $t$ could refer to any other unit. I write in terms of firms and time periods because it makes the discussion more concrete. The errors may be heteroskedastic, but

Standard error formulas

What is the variance of the OLS estimator? The estimator satisfies $\hat{\beta} - \beta = H^{-1} \sum_{i,t} u_{it}$, with $u_{it} = x_{it}\varepsilon_{it}$ and $H = \sum_{i,t} x_{it} x_{it}'$. In large samples, the estimator variance can be approximated by $H^{-1} G H^{-1}$, where $G = \mathrm{Var}\big[\sum_{i,t} u_{it}\big]$. The term $G$ may be written as $G = \sum_{i,j,t,k} E(u_{it} u_{jk}')$.

Under the error assumptions, we can simplify the formula as
$$G = G_{\mathrm{firm}} + G_{\mathrm{time},0} - G_{\mathrm{white},0} + \sum_{l=1}^{L} \big(G_{\mathrm{time},l} + G_{\mathrm{time},l}'\big) - \sum_{l=1}^{L} \big(G_{\mathrm{white},l} + G_{\mathrm{white},l}'\big),$$
with $G_{\mathrm{firm}} \equiv \sum_i E(c_i c_i')$, $G_{\mathrm{time},l} \equiv \sum_t E(s_t s_{t+l}')$, and $G_{\mathrm{white},l} \equiv \sum_{i,t} E(u_{it} u_{i,t+l}')$. $c_i = \sum_t u_{it}$ is the sum over all
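As an illustration, here is a minimal sketch of this sandwich calculation from OLS residuals, restricted to the simple case without persistent common shocks ($L = 0$); the array names `X`, `resid`, `firm`, and `time` are assumptions for illustration, not the paper's notation.

```python
import numpy as np

def double_clustered_cov(X, resid, firm, time):
    """Sandwich covariance H^{-1} (G_firm + G_time,0 - G_white,0) H^{-1}.

    X: (NT, K) regressor matrix, resid: (NT,) OLS residuals,
    firm, time: (NT,) group labels. No persistent-common-shock terms (L = 0).
    """
    U = X * resid[:, None]                     # u_it = x_it * eps_it, one row per observation
    H_inv = np.linalg.inv(X.T @ X)

    def cluster_sum(labels):
        # Sum of outer products of within-cluster sums, e.g. sum_i c_i c_i'.
        G = np.zeros((X.shape[1], X.shape[1]))
        for g in np.unique(labels):
            c = U[labels == g].sum(axis=0)
            G += np.outer(c, c)
        return G

    G_firm = cluster_sum(firm)                 # clusters by firm
    G_time0 = cluster_sum(time)                # clusters by time period
    G_white0 = U.T @ U                         # White (per-observation) term
    G = G_firm + G_time0 - G_white0
    return H_inv @ G @ H_inv

# Standard errors: np.sqrt(np.diag(double_clustered_cov(X, resid, firm, time)))
```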

When should we use robust standard errors?

Is there a downside to double-clustering the standard errors? Should we always adjust standard errors to handle persistent common shocks? In fact, it is not always best to use the "most robust" standard error formula. The various standard error formulas are estimates of true, unknown standard errors. In this section, I point out that the more robust standard error formulas tend to have less bias, but more variance. The lower bias improves the performance of test statistics. But the increased

Monte Carlo experiments

In this section, I use Monte Carlo simulations to investigate the small-sample performance of the robust standard errors. I simulate 5,000 draws from the panel regression $y_{it} = \beta_0 + \beta_1 x_{1,it} + \beta_2 x_{2,it} + \varepsilon_{it}$, with $\beta_0 = 0$ and $\beta_1 = \beta_2 = 1$. The simulation is repeated for various sample sizes and error dependencies. For each sample, I estimate the regression and carry out two-sided t-tests of the nulls that $\beta_1 = 1$ and $\beta_2 = 1$. Table 1 reports rejection frequencies for t-tests constructed from many different variance
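The snippet does not reproduce the paper's exact error and regressor specifications, so the sketch below uses an assumed data-generating process with firm and time components in both the regressors and the errors (the component weights, sample sizes, and number of draws are illustrative), reusing the `double_clustered_cov` function sketched above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 25, 25                      # firms and time periods (illustrative sizes)

def simulate_panel():
    # Regressors and errors both contain firm and time components (assumed weights).
    firm_fx, time_fx = rng.normal(size=(N, 1)), rng.normal(size=(1, T))
    x1 = firm_fx + time_fx + rng.normal(size=(N, T))
    x2 = rng.normal(size=(N, 1)) + rng.normal(size=(1, T)) + rng.normal(size=(N, T))
    eps = rng.normal(size=(N, 1)) + rng.normal(size=(1, T)) + rng.normal(size=(N, T))
    y = 0.0 + 1.0 * x1 + 1.0 * x2 + eps
    firm = np.repeat(np.arange(N), T)          # firm label for each row-major observation
    time = np.tile(np.arange(T), N)            # time label for each row-major observation
    X = np.column_stack([np.ones(N * T), x1.ravel(), x2.ravel()])
    return X, y.ravel(), firm, time

rejections = 0
for _ in range(1000):              # 5,000 draws in the paper; fewer here for speed
    X, y, firm, time = simulate_panel()
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    V = double_clustered_cov(X, resid, firm, time)
    t_stat = (beta[1] - 1.0) / np.sqrt(V[1, 1])
    rejections += abs(t_stat) > 1.96
print("Rejection frequency for beta_1 = 1:", rejections / 1000)
```

If the double-clustered standard errors are well behaved at this sample size, the reported rejection frequency should be close to the nominal 5% level.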

Application to modeling industry profitability

I demonstrate the standard errors with an application to modeling industry profitability. I consider the hypothesis that profits are higher in more concentrated industries, and measure concentration with a forward-looking variant of the Herfindahl-Hirschman Index (HHI; Hirschman, 1964). The HHI is a widely used measure of industry concentration. For example, the U.S. Department of Justice uses the index to help determine whether a merger is anticompetitive (see USDOJ and FTC, 1997).

The HHI is the sum of the squared market shares of the firms in an industry.
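For reference, a minimal sketch of the standard HHI computation from within-industry sales (the paper's measure is a forward-looking variant, which this sketch does not attempt to reproduce):

```python
import numpy as np

def hhi(sales):
    """Herfindahl-Hirschman Index from firms' sales within one industry."""
    shares = np.asarray(sales, dtype=float)
    shares = shares / shares.sum()          # market shares summing to one
    return np.sum(shares ** 2)              # from 1/n (equal shares) to 1 (monopoly)

# Example: four firms with sales 40, 30, 20, 10 -> HHI = 0.30
print(hhi([40, 30, 20, 10]))
```

Antitrust guidelines usually express market shares in percentage points, so the index reported by regulators runs up to 10,000 rather than to 1.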

Conclusion

This paper derives easy-to-compute formulas for standard errors that cluster by both firm and time. Both the statistical theory and the Monte Carlo results suggest that simultaneously clustering by firms and time leads to significantly more accurate inference in finance panels. Monte Carlo experiments suggest that, as long as we do not allow for persistent common shocks, clustering on both firm and time works adequately when we have at least 25 firms and time periods. However, allowing for

References

  • Cameron, C., Gelbach, J., Miller, D., 2006. Robust inference with multi-way clustering. NBER Technical Working Paper...
  • Cohen, R., et al., 2003. The value spread. Journal of Finance.
  • Fama, E., MacBeth, J., 1973. Risk, return, and equilibrium. Journal of Political Economy.
  • Fama, E., et al., 2000. Forecasting profitability and earnings. Journal of Business.
  • Hansen, L., et al., 1980. Forward exchange rates as optimal predictors of future spot rates: an econometric analysis. Journal of Political Economy.
  • Hirschman, A., 1964. The paternity of an index. The American Economic Review.
  • Huber, P., 1967. The behavior of maximum likelihood estimates under nonstandard conditions.
  • Hurwicz, L., 1950. Least-squares bias in time series.
  • Stambaugh, R., 1999. Predictive regressions. Journal of Financial Economics.
