Joint Statistical Meetings Abstracts

"Statistics: An All-Encompassing Discipline"

July 30-August 4, 2011—Miami Beach Convention Center—Miami Beach, FL

This page lists only Mathematica staff. For a full listing of all presenters, click here.

Imputation of Multiple-Response Items in SESTAT and Its Component Surveys
Nicholas Beyler, Donsig Jang, and Amang Sukasih

The Scientists and Engineers Statistical Data System (SESTAT) is a comprehensive, integrated system of information about the characteristics of scientists and engineers in the United States. Unweighted sequential hot-deck imputation is used to handle item nonresponse in each of the SESTAT component surveys. To maintain consistency across surveys and survey years, regression models are fit to determine control variables and the control variables are used to identify donors for hot-deck imputation. Many items in the SESTAT component surveys contain multiple, potentially correlated questions, yet the current imputation protocol calls for the responses to these questions to be imputed separately, without accounting for the correlation structure in the outcomes. To evaluate the validity of this approach, we consider an alternative method for imputing multi-question items, where control variables are determined using the Sequential Regression Multivariate Imputation (SRMI) method, which sequentially fits regression models to account for correlation among the outcome variables. SESTAT estimates based on the current protocol are compared to those based on this alternative approach. PDF of slides.
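The unweighted sequential hot-deck step described above can be sketched as follows. This is a minimal illustration, not the SESTAT protocol: the field names (`field`, `age`, `salary`), the choice of sort and class variables, and the rule of taking the most recently seen respondent in the same class as donor are all assumptions made for the example.

```python
def sequential_hot_deck(records, sort_keys, class_keys, item):
    """Impute missing `item` values from the most recent donor in the same class.

    records    : list of dicts; a missing value is represented by None
    sort_keys  : control variables used to order the file (hypothetical names)
    class_keys : control variables that define imputation classes
    item       : name of the variable to impute
    """
    ordered = sorted(records, key=lambda r: tuple(r[k] for k in sort_keys))
    last_donor = {}  # imputation class -> most recently seen reported value
    result = []
    for rec in ordered:
        cls = tuple(rec[k] for k in class_keys)
        new = dict(rec)
        if rec[item] is not None:
            last_donor[cls] = rec[item]   # respondent becomes the current donor
            new["imputed"] = False
        elif cls in last_donor:
            new[item] = last_donor[cls]   # copy the donor's value (no weights used)
            new["imputed"] = True
        else:
            new["imputed"] = False        # no donor seen yet; value stays missing
        result.append(new)
    return result


if __name__ == "__main__":
    example = [
        {"id": 1, "field": "eng", "age": 30, "salary": 70},
        {"id": 2, "field": "eng", "age": 32, "salary": None},
        {"id": 3, "field": "sci", "age": 28, "salary": 60},
        {"id": 4, "field": "sci", "age": 29, "salary": None},
    ]
    for row in sequential_hot_deck(example, ["field", "age"], ["field"], "salary"):
        print(row)
```

Imputing each question of a multi-question item with a separate call like this is what ignores the correlation among the responses; a sequential-regression approach would instead condition each question's donor selection on the other questions.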

Exploring Nonresponse Bias in a National Health Expenditures Survey of Institutions
Sameena Salvucci and Eric Grau

We examined potential nonresponse bias in a new multimode national survey of mental health and substance abuse treatment facilities and its association with the response rate. We used data from the 2010 SAMHSA Survey of Revenues and Expenses (SSR&E), which was linked to data from the census of substance abuse facilities in the 2010 National Survey of Substance Abuse Treatment Services (N-SSATS) and the census of mental health facilities in the 2010 National Mental Health Services Survey (N-MHSS). We compared a range of facility characteristics of respondents and nonrespondents to SSR&E using chi-squared statistics. We also examined the nature and strength of the association between response rates and a set of independent characteristics using a multivariate logistic regression model.

Using a Multimode Survey Design on a Panel Study of New Businesses
David DesRoches

Since 2005, Mathematica has conducted the Kauffman Firm Survey (KFS) for the Ewing Marion Kauffman Foundation. The baseline KFS survey used a multimode web/CATI design to recruit a panel of U.S. businesses that were all founded in the same calendar year (2004). This group of businesses (the KFS panel) has been contacted annually for follow-up data collection since 2006. The main goal of the KFS is to investigate how new businesses are structured and funded in their early years, and to measure the changes in business financing and productivity over this period. With the use of this multimode design, much of the data collection in the follow-up surveys has been migrated from CATI to the web, reducing the costs and respondent burden associated with extensive telephone follow-up efforts. This paper will explore the experiences of recruiting a panel of establishments through a multimode survey, as well as the technological improvements made over the course of the study. The paper will also examine the cost effects of increasing web survey data collection in the follow-up surveys.

An Argument for Teaching Metrology in Introductory Statistics Classes
Emily Casleton (Iowa State University), Amy Borgen, Ulrike Genschel (Iowa State University), and Alyson Wilson (Iowa State University)

Undergraduate students in introductory statistics courses often struggle with the concept of variability and with how statistics will translate to their lives beyond the classroom. This research uses metrology, the science of measurement, to increase understanding of these difficult concepts. Measurement quality and the inherent variability introduced through the measurement process are underemphasized topics in the statistics curriculum. To this end, materials and methods have been developed for use in introductory statistics courses. This material explains how to characterize sources of variability in a data set, an approach that is natural and accessible because the sources of variability, such as the device or the operator, are observable. Everyday examples of measurements, such as the amount of gasoline pumped into a car, are presented and the consequences of variability within those measurements are discussed. These materials were implemented in an introductory statistics course at Iowa State University. Students' subsequent understanding of variability and attitude toward the usefulness of statistics were analyzed in a comparative study.

Composite Size Measures in Surveys of Rare or Hard-to-Reach Populations
Frank Potter, Eric Grau, and John Hall

Rare and hard-to-reach populations pose significant challenges to the design and implementation of cost-efficient sample surveys. To find and enumerate these populations, multi-stage surveys are often used to avoid the construction of a sampling frame for the entire target population, and primary sampling units (PSUs) are selected with probability proportional to a size measure related to the population sizes in the PSUs. When multiple populations are of interest, composite size measures are used that are based on the population counts in the PSUs. Some composite size measures were described by Folsom, Potter, and Williams (1987) and by Fahimi and Judkins (1991). The purpose of this paper is to discuss the advantages and disadvantages of these methods for various study populations and when to use these algorithms. We will demonstrate these size measures in surveys of students with disabilities, of persons receiving unemployment insurance compensation, and of persons in households receiving Supplemental Nutrition Assistance Program (SNAP, formerly called the Food Stamps program) payments and low-income households not receiving SNAP payments.
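To make the idea of a composite size measure concrete, the sketch below assumes one commonly described form: each PSU's size is the sum over populations of the desired overall sampling rate times the PSU's count for that population. The specific measures compared in the paper may differ, and the PSU labels, counts, and target sample sizes here are invented for illustration.

```python
def composite_size_measures(psu_counts, target_sample):
    """Return (overall sampling rates, composite size per PSU).

    psu_counts    : {psu: {population: count in that PSU}}
    target_sample : {population: desired total sample size}
    Assumed form  : S_i = sum_d f_d * N_id, with f_d = n_d / N_d.
    """
    totals = {d: sum(c.get(d, 0) for c in psu_counts.values()) for d in target_sample}
    rates = {d: target_sample[d] / totals[d] for d in target_sample}
    sizes = {psu: sum(rates[d] * c.get(d, 0) for d in rates)
             for psu, c in psu_counts.items()}
    return rates, sizes


def pps_design(psu_counts, target_sample, n_psus):
    """PSU selection probabilities and within-PSU rates under PPS sampling."""
    rates, sizes = composite_size_measures(psu_counts, target_sample)
    total_size = sum(sizes.values())
    design = {}
    for psu, size in sizes.items():
        p_select = n_psus * size / total_size              # probability PSU is drawn
        within = {d: rates[d] / p_select for d in rates}   # keeps overall rate at f_d
        workload = sum(within[d] * psu_counts[psu].get(d, 0) for d in rates)
        design[psu] = {"p_select": p_select, "within": within, "workload": workload}
    return design


if __name__ == "__main__":
    counts = {  # hypothetical PSU-level counts for a rare and a common population
        "A": {"rare": 10, "common": 490},
        "B": {"rare": 40, "common": 460},
        "C": {"rare": 20, "common": 480},
        "D": {"rare": 30, "common": 470},
    }
    for psu, d in pps_design(counts, {"rare": 20, "common": 38}, n_psus=2).items():
        print(psu, d)
```

Under this assumed form, every population keeps its intended overall selection probability while the expected number of sampled units is the same in each selected PSU, which is the usual motivation for a composite measure. In practice, PSUs whose selection probability or within-PSU rate would exceed 1 need special handling that the sketch omits.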

Survey Quality Indicator Measures: Response Rates and Alternatives
Donsig Jang

The response rate is often used as an indicator of the quality of the survey response. However, it tells only one side of the survey story; the other side, the relationship between respondents and nonrespondents, is unknown. Researchers continue to seek tools to assess and compare the quality of the response to different surveys. For example, R-indicators, introduced by Schouten et al. (2009), measure how well a respondent set represents the sample or population from which it was drawn. This measure may be a better indicator of survey nonresponse bias than response rates for survey outcomes closely related to the auxiliary variables used to calculate the R-indicator. In this roundtable, we will lead a discussion of the use of alternatives to response rates in measuring survey quality.
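As a rough illustration of the R-indicator, the sketch below uses the form R = 1 - 2·S(ρ), where S(ρ) is the standard deviation of estimated response propensities. Two simplifications are assumptions of this example: propensities are estimated as unweighted response rates within cells of a single auxiliary variable (a stand-in for a fitted propensity model), and design weights are ignored. The cell labels are hypothetical.

```python
from collections import defaultdict
from statistics import stdev


def cell_propensities(sample):
    """Estimate one response propensity per sampled unit.

    sample: list of (cell, responded) pairs, where `cell` is the value of an
    auxiliary variable known for respondents and nonrespondents alike.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [respondents, sampled units]
    for cell, responded in sample:
        counts[cell][0] += int(responded)
        counts[cell][1] += 1
    return [counts[cell][0] / counts[cell][1] for cell, _ in sample]


def r_indicator(propensities):
    """R = 1 - 2 * S(rho); 1 means all units have equal estimated propensities."""
    return 1.0 - 2.0 * stdev(propensities)


if __name__ == "__main__":
    # Hypothetical sample: both designs below have a 50 percent response rate.
    balanced = [("young", True), ("young", False), ("old", True), ("old", False)] * 2
    skewed = ([("young", True)] + [("young", False)] * 3
              + [("old", True)] * 3 + [("old", False)])
    print(r_indicator(cell_propensities(balanced)))
    print(r_indicator(cell_propensities(skewed)))
```

The two hypothetical samples share the same response rate but differ in how evenly response is spread across cells, which is the distinction the R-indicator is meant to capture.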

NSCG Estimation Issues When Using ACS-Based Sampling Frame
John Finamore and Steve Cohen (National Science Foundation), David Hall and Julie Walker (U.S. Census Bureau), and Donsig Jang

The National Survey of College Graduates (NSCG) is the nation's only source of detailed statistics on the science and engineering labor force. Historically, the NSCG selected its sample once a decade from the decennial census long form respondents. In the 2010 survey cycle, the NSCG began using the American Community Survey (ACS) as its sampling frame. After numerous sample design options were proposed by the NSCG survey sponsor, the National Science Foundation (NSF), and reviewed by the Committee on National Statistics (CNSTAT), the NSF approved the use of a rotating panel design for the 2010 decade of the NSCG. This rotating panel design allows the NSCG to address certain deficiencies of the previous long-form-based design, including the undercoverage of key interest groups. However, along with numerous improvements, the use of the ACS as a sampling frame and the implementation of the rotating panel design also introduced new challenges. This document summarizes the rotating panel design planned for the 2010 decade of the NSCG and discusses results from two research tasks related to NSCG estimation.

Evaluating 2003 NSCG Dual-Frame Estimates for 2010 NSCG Planning Purposes
Donsig Jang and David Hall (U.S. Census Bureau)

The National Survey of College Graduates (NSCG) is the nation's leading source of detailed statistics on the science and engineering labor force. Beginning in the 2010 survey cycle, the NSCG will be constructed using multiple sampling frames. The NSCG had attempted a similar dual-frame approach in the 2003 NSCG survey cycle, but differing population estimates between the frames led the survey sponsor, the National Science Foundation (NSF), to abandon the dual-frame estimates for publication purposes in favor of the single-frame estimates. New research into the 2003 NSCG dual-frame design has presented an opportunity to reevaluate the 2003 estimates and the 2003 decision. This paper looks deeper into the 2003 NSCG dual-frame design issues, including potential causes of the differing estimates and comparisons of single-frame and dual-frame estimates for key estimates of interest. The goal of this research is to better understand the 2003 dual-frame estimates in preparation for the 2010 NSCG.

Using Tau-Argus and R-Statistical Package sdcTable to Conduct Secondary Cell Suppression for Linked Tables
Amang Sukasih, Donsig Jang, and David Edson

When a data cell in a table is suppressed by dropping its value based on a primary cell suppression rule, the value of that cell can still be determined if the table, subtable, or linked tables provide totals, marginal totals, or subtotals. Secondary cell suppression is therefore needed to avoid such disclosures. Two software packages are available to assist researchers with secondary cell suppression: Tau-Argus (Statistics Netherlands 2009) and the R statistical package sdcTable (Meindl 2010). But even with this software, there is no simple way to perform secondary suppression for linked tables, that is, tables presenting data on the same cells that share some categories of at least one explanatory variable. Computation may not be trivial and may still require manual reviews, especially when dealing with a large number of linked tables. With an eye toward finding the most straightforward and effective method of suppressing linked tables, we will explore the capabilities of the aforementioned software programs in performing linked-table suppression, identifying the strengths and limitations of each program and comparing the results.
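The disclosure problem described above can be seen in a toy one-row table. The sketch below is plain Python and does not use Tau-Argus or sdcTable; the threshold rule, the cell values, and the "suppress the smallest remaining cell" heuristic are assumptions chosen for illustration. Real tools solve an optimization problem across all rows, columns, and linked tables at once.

```python
def primary_suppress(cells, threshold):
    """Replace counts below `threshold` with None (a simple primary rule)."""
    return [None if v < threshold else v for v in cells]


def recover_single_gap(published, total):
    """If exactly one cell is suppressed, an intruder can recover it by subtraction.

    Returns the recovered value, or None when no single cell can be derived.
    """
    gaps = [i for i, v in enumerate(published) if v is None]
    if len(gaps) != 1:
        return None
    return total - sum(v for v in published if v is not None)


def secondary_suppress(cells, published):
    """Toy secondary rule: if one cell is suppressed, also hide the smallest other cell."""
    gaps = [i for i, v in enumerate(published) if v is None]
    visible = [i for i, v in enumerate(published) if v is not None]
    if len(gaps) != 1 or not visible:
        return list(published)
    extra = min(visible, key=lambda i: cells[i])
    protected = list(published)
    protected[extra] = None
    return protected


if __name__ == "__main__":
    cells, total = [12, 3, 25], 40          # hypothetical row with a published total
    published = primary_suppress(cells, threshold=5)
    print(published, recover_single_gap(published, total))
    protected = secondary_suppress(cells, published)
    print(protected, recover_single_gap(protected, total))
```

Even after the second cell is hidden, the sum of the two suppressed cells can still be derived from the total; only the individual values are protected. With linked tables, the same cell can also be exposed through a different table's margins, which is why the row-by-row reasoning in this sketch is not sufficient there.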

RDD Unplugged: Findings from a Household Survey Using a Cell Overlap Design
Barbara Lepidus Carlson and John Hall

Sampling the U.S. residential population using list-assisted random digit dialing (RDD) of landline telephone numbers has become problematic due to the increasing proportion of the population that is reachable only through cell phones. To address this coverage problem, round 6 of the Health Tracking Household Survey (HTHS6) employed an RDD dual-frame "cell overlap design": the sample was selected from landline and cell frames, and interviews were attempted with all contacted households. Other approaches sometimes used to address the coverage issue include address-based sampling and dual-frame RDD designs where the cell frame is screened for cell-only households. HTHS6 asked respondents in both the landline and cell sample frames a series of questions about telephone usage. This paper will discuss contact and cooperation rates, and the number of calls per complete, by sample frame. In addition, this paper will provide information about landline and cell telephone usage by sample type, and compare characteristics among the various telephone usage categories (cell-only, cell-mostly, landline-mostly, landline-only), including demographics, health status, and insurance coverage.