
Joint Statistical Meetings Abstracts

"Statistics: Growing to Serve a Data-Dependent Society"

July 28-August 2, 2012—San Diego Convention Center/Hilton San Diego Bayfront—San Diego, CA

This page lists only Mathematica staff. For a full listing of all presenters, click here.

What Are Error Rates for Classifying Teacher and School Performance Using Value-Added Models?
Peter Schochet and Hanley Chiang

This article examines likely error rates for measuring teacher and school performance in the upper elementary grades using value-added models applied to student test score gain data. Using a realistic performance measurement scheme based on hypothesis testing, we develop error rate formulas for OLS and Empirical Bayes estimators. Empirical results suggest that value-added estimates are likely to be noisy given the amount of data typically available in practice. Type I and II error rates for comparing a teacher's performance to the average are likely to be about 25 percent with three years of data and 35 percent with one year of data. Corresponding overall false positive and false negative error rates are 10 and 20 percent, respectively. Lower error rates can be achieved if schools, rather than teachers, are the performance unit.
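The abstract's central point, that error rates shrink as more years of data are averaged, can be illustrated with a generic normal-theory calculation. This is not Schochet and Chiang's actual formula; the effect size and per-year noise SD below are hypothetical values chosen only for illustration.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def type_ii_error(true_effect, noise_sd_per_year, years, z_crit=1.96):
    """Probability of failing to flag a teacher whose true effect is
    `true_effect`, testing H0: effect = 0 at the two-sided 5 percent level.
    Averaging `years` of data shrinks the estimation-error SD by sqrt(years).
    All inputs are hypothetical, not the paper's estimates."""
    se = noise_sd_per_year / math.sqrt(years)
    shift = true_effect / se
    # The test fails to reject when the z-statistic lands inside (-z_crit, z_crit).
    return normal_cdf(z_crit - shift) - normal_cdf(-z_crit - shift)

# More years of data -> smaller standard error -> lower Type II error rate.
one_year = type_ii_error(0.2, 0.35, 1)
three_years = type_ii_error(0.2, 0.35, 3)
```

The same mechanism explains the paper's finding that school-level measures do better: aggregating many teachers' students shrinks the noise SD further.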

Finding an Optimal Composite Factor: Weighting Data from a Household Survey Using a Cell Overlap Design
John Hall, Barbara Lepidus Carlson, and Karen CyBulski

Two approaches currently used to sample the U.S. residential population for random digit dialing (RDD) surveys are the "cell only" and "cell overlap" designs. Each supplements an RDD landline sample with an RDD cell sample, and in both designs all households reachable by landline phone are retained in the landline sample. In a "cell only" design, however, interviewers screen the cell sample for households with no landline service (households with landline service are screened out), while in a "cell overlap" design, all households accessible by cell phone are retained in the cell sample. This paper reports on weight construction for a large national telephone survey (the Health Tracking Household Survey) that used a cell overlap design. In a cell overlap design, households with both landline and cell service have a chance of being selected from either frame. To address this multiplicity issue, we employed a composite weight for the dual-service group. The paper explains how the compositing factor was derived and investigates the impact of using different compositing factors on sampling error and potential bias.
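The compositing idea can be sketched in a few lines. This is a generic dual-frame compositing rule, not the Health Tracking Household Survey's actual factor: dual-service households could have been sampled from either frame, so their weights from the two frames are scaled by lambda and (1 - lambda) so the group is not counted twice, while single-service households keep their full base weight.

```python
def composite_weight(base_weight, frame, dual_service, lam):
    """Dual-frame compositing sketch (hypothetical, not the HTHS factor).

    base_weight:  design weight from the frame the case was sampled from
    frame:        "landline" or "cell"
    dual_service: True if the household is reachable on both frames
    lam:          compositing factor in [0, 1] applied to the landline frame
    """
    if not dual_service:
        # Single-service households can only come from one frame.
        return base_weight
    # Dual-service households: scale so the two frames together
    # represent the dual-service population exactly once.
    return base_weight * (lam if frame == "landline" else 1.0 - lam)
```

Choosing lambda trades off the sampling error contributed by each frame, which is the sensitivity the paper investigates.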

Asymptotic Variance Estimation and Comparison of Model-Assisted Regression Estimators in Sample Surveys
Sheng Wang and Jun Shao (University of Wisconsin-Madison)

Model-assisted regression estimators are popular in sample surveys for making use of auxiliary information to improve the Horvitz-Thompson estimator of a population total. In the presence of strata and unequal probability sampling, however, there are several ways to form model-assisted regression estimators: regression within each stratum or regression combining all strata, paired with a separate ratio adjustment for population size, a combined ratio adjustment, or no adjustment. In our paper, the asymptotic normality of these estimators is established under two different asymptotic settings. In both cases, we consider variance estimation by substitution or by the bootstrap, which is useful for large-sample inference. The relative efficiencies among the six resulting estimators are derived from their asymptotic properties under the two settings. Simulation results are presented to examine the finite-sample performance of the regression estimators and their variance estimators.
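For readers unfamiliar with the basic construction, here is a textbook sketch of one member of the family, a combined (across-strata) regression estimator of a total with a single auxiliary variable. It is an illustration of the general GREG idea, not any of the paper's six specific estimators.

```python
def greg_total(y, x, w, x_pop_total):
    """Model-assisted (GREG-style) estimator of the population total of y,
    using one auxiliary variable x with known population total x_pop_total.
    w are design weights (inverse inclusion probabilities).
    A generic textbook sketch, not the paper's estimators."""
    ht_y = sum(wi * yi for wi, yi in zip(w, y))   # Horvitz-Thompson total of y
    ht_x = sum(wi * xi for wi, xi in zip(w, x))   # Horvitz-Thompson total of x
    n_hat = sum(w)
    # Weighted least-squares slope of y on x (with intercept).
    xbar, ybar = ht_x / n_hat, ht_y / n_hat
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    # Calibrate the HT estimator to the known population x total.
    return ht_y + b * (x_pop_total - ht_x)
```

When y is exactly linear in x, the estimator reproduces the implied population total with zero error, which is the sense in which auxiliary information "assists" the design-based estimator.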

Effectiveness of a Composite Size Measure for Sampling Students with Disabilities
Frank Potter

Rare and hard-to-reach populations pose significant challenges to the design and implementation of cost-efficient sample surveys. To find and enumerate these populations, multistage surveys are often used to avoid constructing a sampling frame for the entire target population, and primary sampling units (PSUs) are selected with probability proportional to a size measure related to the population sizes in the PSUs. When multiple populations are of interest, composite size measures based on the population counts in the PSUs are used to achieve equal or nearly equal selection rates within each population. Such composite size measures were described by Folsom, Potter, and Williams (1987) and by Fahimi and Judkins (1991). The purpose of this paper is to identify the capabilities and limitations of a composite size measure for a survey of students with disabilities, in which some disabilities are prevalent and some are very rare. The paper provides guidance on when a composite size measure achieves the desired objectives and when it cannot.
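A common form of composite measure of size, in the spirit of the Folsom, Potter, and Williams (1987) construction though simplified here, sums each domain's PSU count weighted by the desired overall sampling rate for that domain. The domain names and rates below are hypothetical.

```python
def composite_mos(counts_by_domain, target_rates):
    """Composite measure of size for one PSU (simplified sketch).

    counts_by_domain: {domain: count of that population in this PSU}
    target_rates:     {domain: desired overall sampling rate for the domain}

    Selecting PSUs with probability proportional to this measure, then
    sampling within PSUs at rates inversely proportional to it, yields
    (near-)equal overall selection probabilities within each domain.
    """
    return sum(target_rates[d] * n for d, n in counts_by_domain.items())
```

Rare domains get high target rates, so PSUs containing even a few rare-domain students receive a meaningfully larger size measure, which is exactly the behavior the paper evaluates for very rare disabilities.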

Weighting in the Dark: What to Do in the Absence of Benchmarks
Barbara Lepidus Carlson and Jerry West

The 2009 Head Start Family and Child Experiences Survey (FACES) involved four stages of sampling: Head Start programs, centers, classrooms, and children. At the time of sampling, eligible children were those who were one or two years away from kindergarten and new to Head Start in fall 2009. These children were followed through their first year of Head Start and then, depending on their age, for one or two more years through kindergarten. Children who left Head Start after fall 2009 but did not go on to kindergarten were considered ineligible for followup. No published population counts existed for the study's baseline population, nor were there benchmarks for the Head Start retention and kindergarten transfer rates needed to define the study population at followup. This paper describes the steps we took, drawing on an earlier FACES cohort, to ensure that the baseline and followup weights, which adjusted for sampling and response patterns, appropriately reflected their respective target populations. We also show how different assumptions about eligibility among cases with undetermined status can substantially affect estimated totals and means.

Fixing the Sample and the Sample Weights When Problems Arise in the Frame: Experiences with a Survey of Physicians
Eric Grau

Probability samples are selected under the assumption that the sample frame closely resembles the target population. Occasionally, however, the sample frame has problems that are only discovered after the sample is selected. In this paper, we discuss a physician survey that was part of an evaluation of a Medicare pay-for-performance demonstration. Several problems with the sample frame were discovered after sample selection. In particular, the frame included duplicate physicians within and across practices. Moreover, for practices that participated in the demonstration, the frame identified all physicians in those practices as also participating in the demonstration, when this was not the case. This paper describes the process of calculating sampling weights that properly account for the original selected sample, the replacement sample, and the multiplicity adjustment for duplicates. We compare estimates obtained using the appropriately calculated weights (adjusted for nonresponse) with those obtained using a simple-minded weighting procedure.
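The multiplicity adjustment the abstract mentions can be stated simply. As a first-order sketch (not the paper's full weighting procedure), a physician listed k times on the frame has roughly k chances of selection, so the design weight is divided by the number of listings; this ignores joint-inclusion terms, which a production weighting system would handle more carefully.

```python
def multiplicity_adjusted_weight(base_weight, n_frame_listings):
    """First-order multiplicity adjustment for duplicate frame entries.
    A unit appearing k times on the frame has ~k times the selection
    probability, so its weight is divided by k. A simplified sketch,
    not the survey's actual adjustment."""
    if n_frame_listings < 1:
        raise ValueError("a sampled unit must appear on the frame at least once")
    return base_weight / n_frame_listings
```

Omitting this adjustment is one form of the "naive" weighting the paper compares against.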

Using an Experimental Evaluation of Charter Schools to Test Whether Nonexperimental Comparison Group Methods Can Replicate Experimental Impact Estimates
Natalya Verbitsky-Savitz, Kenneth Fortson, Emma Kopa, and Philip Gleason

Randomized controlled trials (RCTs) are widely considered the gold standard for evaluating the impact of a social program. When an RCT is not feasible, quasi-experimental designs (QEDs) are often used. A popular class of QEDs uses a nonrandomly selected comparison group to represent what would have happened to the treatment group had it not participated in the program. Under certain assumptions, QEDs can produce unbiased impact estimates; however, these assumptions are generally untestable in practice. We test the validity of four comparison group approaches (OLS regression modeling, exact matching, propensity score matching, and fixed effects modeling) by comparing QED impact estimates from these methods with an experimental benchmark. The analysis uses data from an experimental evaluation of charter schools and comparison data for other students in the same school districts. We find that the use of pre-intervention baseline data considerably reduces, but might not completely eliminate, bias. While the matching and regression-based estimates do not differ greatly, the matching estimators perform slightly better than estimators that rely on parametric assumptions.
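One of the four approaches the abstract names, propensity score matching, is often implemented as greedy nearest-neighbor matching with replacement. The sketch below shows that step generically, assuming propensity scores have already been estimated; it is not the paper's exact implementation.

```python
def nearest_neighbor_match(treated_scores, comparison_scores):
    """Greedy 1:1 nearest-neighbor matching on the propensity score,
    with replacement. Returns, for each treated unit, the index of the
    closest comparison unit. A generic sketch, not the paper's method."""
    matches = []
    for ps in treated_scores:
        # Pick the comparison unit whose score is closest to this treated unit's.
        j = min(range(len(comparison_scores)),
                key=lambda k: abs(comparison_scores[k] - ps))
        matches.append(j)
    return matches
```

The impact estimate is then the mean outcome difference between treated units and their matches; the paper's contribution is benchmarking such estimates against the experimental result.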

Response Surface Matching for Survey Nonresponse
Frank Yoon

Prognostic score-based methods have been proposed as a complement to propensity score-based methods for achieving covariate balance in observational studies. Likewise, standard adjustments for survey nonresponse, such as response propensity matching, can be supplemented by incorporating information from respondent outcomes. In this talk I discuss how methods for matching on response profiles can be extended and applied to adjustments for survey nonresponse, and I illustrate these ideas through simulation studies and applications to large-scale surveys.

Statistical Pipeline: Made In The U.S.A. in Honor of Nampeo McKenney and Nagambal Shah
Clemencia Cosentino, Guillermina Jasso (New York University), and Derrick Rollins (Iowa State University)

Census data show changing geographic distributions and greater diversity among minority and new-immigrant populations. The ASA, federal agencies, and higher education have responded with various initiatives to bring these populations into a competitive U.S. labor force, including STEM fields. Future growth among younger Americans will come primarily from racial/ethnic minorities and new immigrants. Yet the statistical profession and higher education continue to draw disproportionately from foreign student and labor force pipelines relative to "Made in the USA" ones. What investments in higher education and industry have succeeded in reversing this trend? What are the post-1964 successes of federal agencies, both civilian and military, and of other sectors, particularly in STEM occupations, in including historical racial/ethnic minorities and new immigrants?