Joint Statistical Meetings Abstracts
"Celebrating the International Year of Statistics"
August 3-8, 2013—Montréal, Québec, Canada
This page lists only Mathematica staff.
Variation in Quality by Hospital Characteristics: True or False?
David Jones, Sam Stalley, Alex Bohl, Frank Yoon, Eric Schone, Joseph Zickafoose, and Jessica Ross
Recent critiques of the AHRQ Quality Indicators (QIs) in the academic literature and popular press have called into question the validity of the indicators for comparing hospital quality. A key component of these critiques is the assertion that certain hospital types differ fundamentally in their mission, patient populations, and service delivery. These assertions are typically supported by limited evidence that the group means of the QIs (average risk- and reliability-adjusted point estimates) vary by hospital characteristics. We present a systematic review of the variation in the QIs by hospital characteristics. We add to the evidence base by incorporating the estimated variance of the risk- and reliability-adjusted rates when testing differences between hospital group means. We also compare the distributions of hospital performance categories reported on CMS's Hospital Compare website (an alternate way of comparing hospitals) to determine whether they vary by hospital group. Finally, we explore sources of the observed differences in a multivariate framework by estimating the degree to which various hospital characteristics drive the variation in hospital results.
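To illustrate the kind of test the abstract describes, here is a minimal sketch of a variance-weighted comparison of two hospital groups. The function and variable names are hypothetical, and the authors' actual estimator may differ; the point is that each hospital's adjusted rate enters with its estimated precision rather than as a raw group average.

    import numpy as np
    from scipy import stats

    def group_mean_test(rates_a, se_a, rates_b, se_b):
        """Compare two hospital groups on risk-adjusted rates, weighting
        each hospital by the inverse of its estimated variance."""
        w_a = 1.0 / np.asarray(se_a) ** 2
        w_b = 1.0 / np.asarray(se_b) ** 2
        mean_a = np.sum(w_a * np.asarray(rates_a)) / np.sum(w_a)
        mean_b = np.sum(w_b * np.asarray(rates_b)) / np.sum(w_b)
        # Variance of each inverse-variance-weighted group mean
        var_a = 1.0 / np.sum(w_a)
        var_b = 1.0 / np.sum(w_b)
        z = (mean_a - mean_b) / np.sqrt(var_a + var_b)
        p = 2 * stats.norm.sf(abs(z))  # two-sided p-value
        return mean_a - mean_b, z, p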
The Role of Hospital Characteristics in Setting Appropriate Yardsticks for Quality Measurement
Frank Yoon, Alex Bohl, David Jones, Dmitriy Poznyak, Jessica Ross, Eric Schone, Joseph Zickafoose, and Dejene Ayele
Hierarchical models in the AHRQ Quality Indicators (QIs) adjust for patient-level risk factors but not for potential variation in quality by hospital attributes, such as teaching status or bed size. This variation is typically modeled by hospital-level random effects, for example, by setting prior means in a Normal distribution that depend on hospital attributes. As an alternative, we evaluate the performance of hierarchical models that assume more flexible random-effect distributions than the Normal to better account for underlying variation in quality. Flexible distributions for the random effects might avoid the need to explicitly specify models with hospital attributes while addressing concerns about their use and interpretation in hospital profiling. We will discuss this advantage and demonstrate the application of these enhanced models in a nationally representative inpatient claims database.
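As a concrete illustration of the modeling choice (the notation is ours, not taken from the paper), the standard specification for patient j in hospital i places a Normal prior on the hospital effects, optionally with a mean that depends on hospital attributes z_i:

    \mathrm{logit}(p_{ij}) = x_{ij}^{\top}\beta + u_i, \qquad u_i \sim N(z_i^{\top}\gamma,\ \tau^2)

One flexible alternative of the kind the abstract alludes to is a finite Normal mixture on the random effects, which relaxes the shape assumption without requiring hospital attributes in the prior mean:

    u_i \sim \sum_{k=1}^{K} \pi_k\, N(\mu_k,\ \tau_k^2), \qquad \sum_{k=1}^{K} \pi_k = 1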
Hospital Peer Groups, Reliability, and Stabilization: Shrinking to the Right Mean
Alex Bohl, David Jones, Dmitriy Poznyak, Jessica Ross, Eric Schone, Frank Yoon, Joseph Zickafoose, and Sam Stalley
The AHRQ Quality Indicators (QIs) are reliability-adjusted or "smoothed" to the national mean to stabilize QI estimates for hospitals with small numbers of denominator cases and rare outcomes. Differences in hospital scope, size, and other characteristics suggest that smoothing to target means determined by hospital attributes, or "peer groups," may reduce bias when comparing hospitals on their estimated QIs. Current research suggests that incorporating peer-group targets into the risk-adjustment model through random effects is not feasible due to high-dimensional parameters and the computational limits of MCMC estimation. Two alternative approaches are to (1) smooth to a peer group's risk-adjusted mean in the current framework, using an empirical estimate of reliability based on the signal-to-noise ratio; or (2) add fixed effects for hospital characteristics to the risk-adjustment model. This study compares the performance of these alternative peer-group smoothing methods and discusses their conceptual implications. We will judge model performance based on changes in model efficiency and reduction in comparison bias.
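For intuition, a minimal sketch of signal-to-noise shrinkage toward a target mean (illustrative only; the AHRQ production software implements additional steps, and the names below are ours):

    def smooth_rate(observed_rate, noise_var, signal_var, target_mean):
        """Shrink a hospital's risk-adjusted rate toward a target mean.
        Reliability is the signal-to-noise weight: hospitals with noisy
        estimates (small denominators) are pulled harder toward the target."""
        reliability = signal_var / (signal_var + noise_var)
        return reliability * observed_rate + (1 - reliability) * target_mean

    # Smoothing to a peer-group mean rather than the national mean
    # changes only the target, not the shrinkage machinery
    print(smooth_rate(0.08, noise_var=0.004, signal_var=0.001, target_mean=0.05))

With reliability of 0.2 in this toy example, the smoothed rate lands at 0.056, most of the way back toward the target of 0.05.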
Alternative Weighting Schemes for the AHRQ QI Composites
Eric Schone, Alex Bohl, David Jones, Dmitriy Poznyak, Jessica Ross, Frank Yoon, and Joseph Zickafoose
In the current AHRQ QI composite scheme, composites are constructed as weighted averages of smoothed component rates, where components are smoothed to the national mean. The same composite weights are applied to all hospitals regardless of their characteristics. If the true performance of small hospitals differs from the national mean, the current compositing methodology may introduce bias when comparing hospital performance. As part of our research investigating the relationship of hospital characteristics to the performance of QI measures, we estimate bias in component values among groups of hospitals defined by different characteristics, including volume. We will carry our analyses through from the component to the composite level by comparing the reliability and total estimated bias of the composite across hospital groups. We will explore several options for combining QI component values and discuss the challenges and strengths of each approach. We will further compare the influence of these weighting schemes on the predictive validity of the AHRQ QI composites.
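A minimal sketch of the composite construction the abstract refers to (the weights and component values are invented; the official composite software applies additional standardization steps):

    import numpy as np

    # Smoothed component rates for one hospital and fixed national weights
    smoothed_components = np.array([0.041, 0.007, 0.129])
    weights = np.array([0.5, 0.2, 0.3])  # identical for every hospital today

    # Composite is the weight-normalized average of the smoothed components
    composite = np.dot(weights, smoothed_components) / weights.sum()
    print(composite)

Because the components were smoothed to the national mean, any hospital-group departure from that mean propagates directly into the composite, which is the source of the potential bias examined here.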
Impact on Weights and Sampling Errors of Using Hybrid Frame and Composite MOS
John Hall, Mark Denbaly (Economic Research Service-USDA), and Pheny Weidman (Economic Research Service-USDA)
The National Household Food Acquisition and Purchase Survey (FoodAPS) employed two strategies to efficiently oversample households receiving U.S. Supplemental Nutrition Assistance Program (SNAP) benefits and other low-income households: (1) composite measures of size (MOS) in a multi-stage sampling design, and (2) use of a hybrid sampling frame at the penultimate level of sampling. FoodAPS was fielded in 2012 and collected data from nearly 5,000 households. The sample used a three-stage design in which primary and secondary sampling units (PSUs and SSUs) were selected using composite MOS that reflected the projected prevalence of, and sampling rate for, each of four target groups: SNAP households and three income-defined strata of households not receiving SNAP. Within SSUs, the study employed a hybrid frame approach: addresses from SNAP administrative records were merged with addresses from a commercial address-based sampling (ABS) frame. The paper will review all phases of sampling: how the composite size measures were constructed and how the two frames were used within SSUs. In addition, the paper will evaluate the impacts of the approach on analysis weights and sampling errors.
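To show what a composite MOS looks like in practice, a small sketch (the group counts and relative rates are invented; FoodAPS's actual prevalence estimates and sampling factors differ):

    import numpy as np

    # Counts of households in each target group within one SSU:
    # SNAP plus three income-defined non-SNAP strata
    counts = np.array([220, 540, 410, 830])
    # Desired relative sampling rates for the four groups
    rates = np.array([4.0, 2.0, 1.5, 1.0])

    # Composite MOS: the expected workload if each group is sampled at its
    # target rate, so units rich in oversampled groups get proportionally
    # higher selection probabilities
    composite_mos = np.dot(counts, rates)
    print(composite_mos)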
Adjustments for Temporal Misclassification of Exposure Status in Surveys of Health Outcomes
Donsig Jang, Frank Yoon, Amang Sukasih, Amii Kress (Department of Veterans Affairs), Shannon K. Barth (Veterans Health Administration), Clare M. Mahan (Veterans Health Administration), Steven S. Coughlin (Veterans Health Administration), Erin K. Dursa (Veterans Health Administration), and Aaron Schneiderman (Department of Veterans Affairs)
In large, complex sample surveys, the administrative data used to construct the survey frame may disagree with self-reported information. In many cases, misclassification results from erroneous recordkeeping; in addition, when there is a delay between sampling and survey fielding, the values of sampling frame variables may change over time. We present a motivating example from the National Health Study for a New Generation of U.S. Veterans, in which deployment status is a primary sampling and analysis variable indicating whether a Veteran served in a combat theater in Operation Enduring Freedom or Operation Iraqi Freedom. About 11 percent of Veterans in the sample self-reported a deployment status that differed from the administrative records used in sampling. Generally, misclassification of sampling variables requires post-stratification adjustments to the survey weights so that the weighted respondent sample is representative of the target population. We characterize the nature of the misclassified deployment status and then discuss and implement an approach that uses updated administrative records to adjust the survey weights in the study.
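A minimal sketch of the kind of post-stratification adjustment described, ratio-adjusting base weights to control totals from updated administrative records (the data frame, weights, and totals are purely illustrative; the study's actual weighting involves more steps):

    import pandas as pd

    # Respondent file with base weights and corrected deployment status
    df = pd.DataFrame({
        "deployed": [1, 1, 0, 0, 1, 0],
        "base_weight": [120.0, 95.0, 110.0, 130.0, 80.0, 105.0],
    })
    # Population totals from updated administrative records (hypothetical)
    control_totals = {1: 400_000, 0: 250_000}

    # Ratio-adjust weights within each deployment stratum to hit the controls
    weighted_sums = df.groupby("deployed")["base_weight"].transform("sum")
    df["ps_weight"] = (df["base_weight"]
                       * df["deployed"].map(control_totals) / weighted_sums)
    print(df)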
Critique of a Modification to the Census-Recommended American Community Survey Variance Estimator
Eric Grau
Scheer and Levitan propose a modification to the Census-recommended American Community Survey variance estimator, designed to incorporate the additional error introduced by variables imputed from auxiliary data sets. This paper will assess and critique that approach, considering both the statistical properties of the estimation strategy and its usefulness for applied researchers.
Use of R-Indicators to Assess Survey Response Representativeness
Jared Coopersmith
It has been shown that response rates are inadequate for measuring response representativeness and nonresponse bias (Groves 2006; Groves and Peytcheva 2008). Further, Schouten et al. state that "subgroup response rates come closest [to supporting the data collection monitoring, targeting and prioritizing] but do not account for subgroup size, are univariate and are not available at the variable level" (2011, p. 1). Recent work has led to the development of "R-indicators," which are "designed to measure the degree to which the respondents to a survey resemble the complete sample" (Schouten et al. 2011, p. 232). We examine R-indicators and partial R-indicators for weekly data collection returns to assess the representativeness of the respondents for the 2008 and 2010 National Survey of Recent College Graduates (NSRCG), sponsored by NSF. We use sample frame data to estimate response propensities and to examine unconditional partial R-indicators. We show that the overall R-indicator is a better measure of representativeness than the response rate and that several frame variables contribute disproportionately to an overall lack of representativeness in the response.
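For readers unfamiliar with the measure, a bare-bones sketch of the sample-based R-indicator, R = 1 - 2*S(rho), where S is the (design-weighted) standard deviation of estimated response propensities; the frame variables, weights, and model choice here are placeholders:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def r_indicator(frame_vars, responded, weights):
        """Estimate response propensities from frame variables, then compute
        R = 1 - 2*S(rho). R near 1 means propensities are nearly uniform
        (representative response); lower R means stronger selectivity."""
        model = LogisticRegression(max_iter=1000).fit(frame_vars, responded)
        rho = model.predict_proba(frame_vars)[:, 1]
        mean_rho = np.average(rho, weights=weights)
        s = np.sqrt(np.average((rho - mean_rho) ** 2, weights=weights))
        return 1 - 2 * s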
Response Rates Revisited
Barbara Lepidus Carlson
Response rates are an important indicator of survey quality and the potential for nonresponse bias. Until the American Association for Public Opinion Research (AAPOR) developed a standard definition for response rates in 1998, the survey research community used different formulas or rules to calculate them. By having a set of industry standards, response rates became easier to interpret and to compare across surveys. While this was a major improvement, the response rates (essentially one formula with six variations) were overly simplistic in terms of how they dealt with eligibility rates for those with undetermined status. The AAPOR standards give some guidance on computing the eligibility rate and applying the response rate formulas to more complex samples. This paper provides additional guidance and examples for estimating the eligibility rate, implementing the response rate formulas in complex samples, and applying multiple eligibility rates when eligibility is nested. This paper also provides alternative but algebraically equivalent response rate formulas for one-, two-, and three-stage samples, some of which may be easier to interpret or implement than the AAPOR versions.
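As a reference point for the discussion, the familiar AAPOR RR3 formula, in which I = complete interviews, P = partials, R = refusals, NC = non-contacts, O = other eligible nonresponse, UH and UO = cases of unknown household occupancy and other unknown eligibility, and e is the estimated proportion of unknown-eligibility cases that are in fact eligible:

    \mathrm{RR3} = \frac{I}{(I + P) + (R + NC + O) + e\,(UH + UO)}

The estimation of e, and how this formula carries over to multi-stage samples with nested eligibility, is precisely where the paper offers additional guidance.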
Social Security Numbers in State Medicaid Records: Completeness and Quality
John Czajka and Shinu Verghese
The Medicaid records that states submit to the Centers for Medicare & Medicaid Services (CMS) through the Medicaid Statistical Information System (MSIS) do not contain names and addresses, but they do contain Social Security numbers (SSNs). Any attempt to link Medicaid records to other databases must rely almost exclusively on these SSNs. The effectiveness of these linkages and the validity of any research based on these linked data are directly dependent on the quality of the SSNs recorded in the MSIS files. This paper documents how often SSNs were present in the MSIS records that states submitted for the final quarter of federal fiscal year 2009 and uses Social Security Administration data enhanced by the Census Bureau to assess the validity of reported SSNs by age group and state of residence. Implications for linkage of Medicaid records to other databases are discussed.
Dealing with Negative Contributions in Protecting Tabular Data
Amang Sukasih and John Czajka
Some variables in magnitude tables may take negative values: net income, net production, capital expenditures, and so on. Negative values pose a computational challenge for the application of disclosure-limitation techniques to tabular data. A table cell may be classified differently (as sensitive or not) if some of the cell's contributions are negative rather than positive: a negative value reduces rather than increases the cell magnitude, which inflates the percentage contributions of the largest observations. There are three common strategies for dealing with variables with negative values: (1) taking absolute values of the negative contributions and applying the sensitivity rules to these positive values, (2) tightening disclosure limitation by increasing or decreasing the parameters of the sensitivity rules, and (3) transforming the variable into a positive variable and then applying the sensitivity rules as usual. This paper will discuss the application of sensitivity rules in statistical disclosure limitation to magnitude tables with negative values.
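To make the problem concrete, a sketch of the common p% sensitivity rule applied both naively and to absolute values per strategy (1). The rule itself is standard; the threshold and cell data are illustrative:

    def p_percent_sensitive(contributions, p=10):
        """p% rule: a cell is sensitive if the remainder after removing the
        two largest contributions could estimate the largest contribution
        to within p percent."""
        x = sorted(contributions, reverse=True)
        remainder = sum(x) - x[0] - x[1]
        return remainder < (p / 100) * x[0]

    cell = [900, 400, 150, -350]   # one large negative contribution
    print(p_percent_sensitive(cell))                     # True: flagged sensitive
    print(p_percent_sensitive([abs(v) for v in cell]))   # False under strategy (1)

The negative contribution shrinks the cell total, so the naive rule flags the cell as sensitive while the absolute-value version does not, which is exactly the classification instability the abstract describes.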
Measurement Error Properties in an Accelerometer Sample of Elementary School Children
Nicholas Beyler, Susanne James-Burdumy, Martha Bleeker, Jane Fortson, Max Benjamin, and Emily Evans
Measurement error modeling approaches have been used extensively in nutritional research to estimate distributions of usual dietary intake and to investigate the sources of bias and error in measurement instruments such as food frequency questionnaires and 24-hour dietary recalls. Similar procedures have recently been developed for studies of physical activity and energy expenditure, but applications usually focus on data from adult populations. In this paper, we review the existing measurement error model procedures that have been used to assess physical activity in adult populations and apply them to investigate the measurement error properties of accelerometer data from a sample of 4th- and 5th-grade children who wore accelerometers during the school day as part of the Randomized Experiment of Playworks study. Measurement error and variability in the child accelerometer data are examined and compared with findings from previous studies of adult populations.
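The workhorse in this literature is a classical measurement error model. In one common form (our notation, not necessarily the specification used in the paper), the device reading for child i on day j decomposes as:

    Y_{ij} = \beta_0 + \beta_1 T_i + \epsilon_{ij}, \qquad
    T_i \sim N(\mu_T, \sigma_T^2), \quad \epsilon_{ij} \sim N(0, \sigma_\epsilon^2)

where T_i is usual (long-run) activity, \beta_0 and \beta_1 capture systematic instrument bias, and \epsilon_{ij} absorbs day-to-day variation and device error. Comparing the estimated variance components for children against those published for adults is one way to frame the comparison the abstract proposes.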