HRSA - U.S. Department of Health and Human Services, Health Resources and Services Administration

The Registered Nurse Population: Findings from the 2004 National Sample Survey of Registered Nurses

 

Appendix B. Survey Methodology

The eighth cycle of the National Sample Survey of Registered Nurses (NSSRN) followed the same basic sample design as its predecessors.  The sample design was originally developed by Westat, Inc. under a contract with the Division of Nursing, BHPr, HRSA in 1975-76 and can be best described as a systematic sample of alphabetic clusters of names in each State using a ‘nested alpha segment design’.  Prior to sampling, each State was ranked by the sampling rate such that the highest priority States were those with the highest sampling rate (for the most part, small States).  As a result, the alphabetic clusters of names for lower priority States are ‘nested’, or included, within those of higher priority States.  This means that a sample name selected in one State (such as California) will also have been selected in every State with a higher priority (in the case of California, this is all other States).

This design approach takes into account two key characteristics of the sampling frame. First, no single list of all individuals with licenses to practice as registered nurses in the United States exists, although lists of those who have licenses in any one State are available.  Second, a nurse may be licensed in more than one State. The advantage of the nested alpha-segment design is that one can determine the probabilities of selection and appropriate multiplicity-adjusted weights for nurses who are listed in more than one State.  In addition, the design permits the use of each sampled registered nurse’s data in the State estimates for each of her/his States of licensure.

This appendix provides a brief summary of the methodology of the NSSRN including the sampling frame, sample design and the statistical techniques used in summarizing the data.  It also includes a discussion of sampling errors, provides the standard errors for key variables in the study and presents a simplified methodology for estimating standard errors.

Sampling Frame

The target population for the eighth NSSRN included all registered nurses with an active license in the United States as of March 2004.  A sampling frame was required to select a probability sample of nurses from which valid inferences could be made to the target population.  The sampling frame for the eighth NSSRN consisted of all registered nurses who are currently eligible to practice as an RN in the U.S.  This sampling frame included RNs who have received a specialty license or have been certified by a State agency as an advanced practice nurse (APN), such as a nurse practitioner, certified nurse midwife, certified registered nurse anesthetist, or clinical nurse specialist.  It excluded licensed practical nurses (LPNs)/licensed vocational nurses (LVNs).

State Boards of Nursing in the 50 States and in the District of Columbia (hereafter also referred to as a State) provided files containing the name, address, and license number of every RN currently holding an active license in that State.  These files formed the basis of the sampling frame from which the RNs for each State were selected.  The licensure files provided by the States were submitted on diskette or compact disk (twenty States), or electronically as an attachment to an e-mail message (twenty-seven States). Three States sent the data via FTP and another provided the data on their website.  For this study, States were also asked to identify nurses for whom the State provided advanced practice nurse (APN) status.  In some cases, the State identified these nurses on the basic list provided.  However, some APNs were identified on separate lists and their APN status was appended to the information on the RN sampling frame. 

Each of the 51 State files was checked for consistency, names were standardized, and duplicates and ineligible records were removed from the State list to prepare the list for sampling. 

Sample Design

The NSSRN 2004, the eighth in the series, continued to oversample nurses in small States in order to better support HRSA’s National Center for Health Workforce Analysis’ State level supply and demand projections for registered nurses.  The basic design was enhanced by using sample design optimization methodology developed by Chromy [1] to determine the sample allocation to the States that would simultaneously satisfy variance constraints defined by the 51 States and the total U.S. 

In the original sample design, and in the 1988 redesign, the universe of RNs was sorted alphabetically by last name and approximately equal-sized clusters of RNs were constructed by partitioning the alphabetically ordered list into 250 alpha-segment clusters with equal (or nearly equal) numbers of RNs.  An alpha-segment was defined as all alphabetically adjacent names falling within pre-specified boundaries; that is, all names beginning with the name that defined the lower boundary, up to but not including the name that defined the upper boundary. 

From the frame of 250 equally divided alpha-segments, a total of 40 alpha-segments were randomly selected, representing a 16 percent sampling rate overall.  Registered nurses are selected in the sample based on their name, with an RN being included in the sample if the name of licensure falls into one of the alphabetic segments that are in sample for that State. 

Although each State had 40 sample segments, the sample size of each State differed depending on the State’s sampling rate.  While uniform sampling rates would have produced the best national estimates, the resulting sample sizes for the smallest States would have been inadequate to support State-level estimates.  Since both national and State-level estimates are required for the 2004 NSSRN, sampling rates were increased in the smaller States to obtain larger State-level sample sizes, as was done in prior surveys.  While this disproportionate sampling improved the precision of estimates in the smaller States, it also reduced the precision of national estimates due to unequal weighting effects.

To accommodate the differing State sampling rates, a planned variation in the size of the segments, i.e., “portions of alpha segments” was used.  Each of the 40 alpha-segments selected for sample was divided into ½-, ¼-, 1/8-, 1/16-, and 1/32- portions.  These fractions indicate the size of the alpha segment portion relative to the size of the basic alpha-segment. 

The sampling rate for a particular State was achieved using a combination of the alpha-segment portions. As a result, each State contains some sample (i.e., a portion) from each of the 40 alpha-segments, depending on the sampling rate for the State.  For example, selecting the entire 40 complete alpha segments on a State list is expected to constitute a 16 percent sampling rate (40 ÷ 250 = 0.16) in the State. This is because each alpha segment contained an expected 0.4% of the State’s RN names (40 X 0.4 percent = 16 percent).  Likewise, the sample for a State with an 8 percent sampling rate consisted of the 40 ½ portion selections.  Several sampling rates use a combination of portions for each alpha-segment in sample (rather than one fractional portion for all alpha-segments).  For example, a 5 percent sampling rate was achieved by first randomly dividing the 40 alpha-segments into two groups, the first containing 30 alpha-segments and the other containing 10; and by using the ¼ portions from the first group and the ½ portions from the second group (0.4 percent x [(30 x ¼) + (10 x ½)] = 5 percent). 
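The arithmetic behind these rates can be checked with a short sketch.  The Python below is illustrative only: the function name `state_sampling_rate` and its input format are assumptions made for this example and do not come from the survey documentation.  It takes the number of sampled alpha-segments that use each portion size and returns the expected State sampling rate, given that each full alpha-segment holds an expected 1/250 of the State's names.

```python
from fractions import Fraction

TOTAL_SEGMENTS = 250  # alpha-segments in the frame, per the design described above


def state_sampling_rate(portion_counts):
    """Expected sampling rate for a State.

    portion_counts maps a portion size (fraction of a full alpha-segment)
    to the number of sampled alpha-segments that use that portion.
    The input format is a hypothetical convenience for this sketch.
    """
    segment_share = Fraction(1, TOTAL_SEGMENTS)  # 0.4 percent of the State's names
    return sum(segment_share * Fraction(size) * count
               for size, count in portion_counts.items())


# The three cases worked through in the text.
full = state_sampling_rate({Fraction(1): 40})                          # 40 full segments
half = state_sampling_rate({Fraction(1, 2): 40})                       # 40 half portions
mixed = state_sampling_rate({Fraction(1, 4): 30, Fraction(1, 2): 10})  # 30 quarters + 10 halves

print(float(full), float(half), float(mixed))  # 0.16 0.08 0.05
```

Exact fractions are used so that the results match the percentages in the text without floating-point rounding.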

To identify and account for nurses appearing in more than one of the 51 State lists, the portions were constructed such that each portion was “nested” (or included) in the boundaries of the larger portion.  As a result, the alpha segment clusters from the States with lower sampling rates (typically larger States) were automatically included in the alpha segment clusters selected from the States with higher sampling rates (typically smaller States).  

As a result, an RN who was licensed under the same name in two States with identical sampling rates was selected (or not selected) for both States, since the alphabetic name boundaries defining the portions are the same for both States.  However, if the RN was licensed under the same name in two States that were sampled at different rates and was sampled in the State with the lower sampling rate, the RN was also included in the sample for the State with the higher sampling rate (as the alphabetic name boundaries defining the portions for the State with the lower sampling rate are nested within those of the State with the higher sampling rate).  This nesting property of the sample design maximizes the chances that the RN will be selected in all States in which they hold an active license.  A nurse who is licensed in two or more States under the same name will have a probability of selection corresponding to the State with the highest sampling rate.
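The nesting property can be illustrated with a simplified sketch.  The Python below is not the survey's selection program.  It assumes, for illustration, that a name's place in the alphabetically ordered frame is represented as a position between 0 and 1, that every portion of a sampled alpha-segment starts at that segment's lower boundary, and that a State uses a single portion size for all segments.  The segment start points and positions are hypothetical values.

```python
SEGMENT_WIDTH = 1 / 250  # each alpha-segment holds an expected 1/250 of names


def in_sample(position, segment_starts, portion):
    """Return True if a name at `position` (0 <= position < 1 in the
    alphabetically ordered frame) falls inside any sampled portion.

    Each portion covers [start, start + portion * SEGMENT_WIDTH), so a
    smaller portion is always contained in a larger one (assumed layout).
    """
    return any(start <= position < start + portion * SEGMENT_WIDTH
               for start in segment_starts)


# Hypothetical values for illustration only.
segment_starts = [0.100, 0.500]
high_rate_portion = 1.0   # e.g., a State sampled with full segments
low_rate_portion = 0.25   # e.g., a State sampled with quarter portions

for position in (0.1005, 0.1020, 0.3000):
    print(position,
          in_sample(position, segment_starts, high_rate_portion),
          in_sample(position, segment_starts, low_rate_portion))
```

Under these assumptions, any name selected with the smaller portion is also selected with the larger one, while the reverse need not hold; this is the behavior described above for a nurse licensed in two States with different sampling rates.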

Sample design optimization techniques developed by Chromy (1996) were used to determine how to allocate the sample of 54,000 RNs to the 51 State lists.  Each State’s allocated sample size was then converted to a sampling rate, and the rate was rounded to one of the admissible rates for the nesting design.  For example, the original rate for the State of Washington was 1.59 percent, and the closest admissible rate was 1.5 percent.  Rates were rounded down only if the change in sampling rate still left the State’s effective sample size at or above the 1996 NSSRN level. 

After determination of frame sizes and expected sampling rates, the States were assigned a priority order to properly determine selection probabilities for nurses appearing on more than one of the 51 State lists.  Traditionally, States were ordered by size, with larger States having lower sampling rates and smaller States having higher sampling rates.  However, as in the 2000 NSSRN, States were priority ordered based on their sampling rate.  As such, it is mostly, but not necessarily, the case that States with larger RN populations had lower sampling rates.

Essentially the same procedure was followed for sample selection for all States. Once a State provided a licensure file containing all appropriate names of individuals with active RN licenses and meeting all specifications, the required sample names in that file were selected.  Regardless of the way a State alphabetized and standardized the names in its files, the sample names were selected according to the standards established by the survey design.  That is, sample selections ignored blanks and punctuation in the last names (except a dash in hyphenated names) and ignored titles (e.g., “Sister”). 
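A minimal sketch of a name standardization of this kind is shown below.  It is not the procedure actually used in the survey.  The function name `sort_key`, the title list, and the assumption that names are already in unaccented ASCII letters are all illustrative choices; the text above names only “Sister” as an example of an ignored title.

```python
import re

# Illustrative title list; the source gives only "Sister" as an example.
TITLES = {"SISTER"}


def sort_key(last_name):
    """Build a comparison key for a last name.

    Drops titles, blanks, and punctuation, but keeps the dash in
    hyphenated names. Assumes plain ASCII letters (an assumption of
    this sketch, not a documented survey rule).
    """
    words = [w for w in last_name.upper().split()
             if w.strip(".,") not in TITLES]
    joined = "".join(words)                 # removes blanks between words
    return re.sub(r"[^A-Z-]", "", joined)   # removes punctuation except "-"


for name in ("O'Brien", "De La Cruz", "Smith-Jones", "Sister O'Malley"):
    print(name, "->", sort_key(name))
```

Applying one key function to every State file means that the same nurse's name falls in the same place in the alphabetical ordering regardless of how each State formatted its list.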

Registered nurses were selected in the sample on the basis of name, with an RN being included in the sample if the name of licensure fell within a specific alpha-segment portion as defined by the State sampling rate.  In other words, the sample for a given State consisted of all RN names falling into any one of the State’s pre-designated 40 alphabetic portions that corresponded to the State sampling rate (one portion from each of the complete 40 alpha-segments in sample). 

The pairs of names that defined the alpha-segment portion constituted the lower and upper boundaries corresponding to the sampling rate.  Thus, the membership of the alpha-segment portion was defined by all names, beginning with the lower boundary (i.e., the last name in alphabetical order of all the names included in that segment), up to but not including a name that defined the upper boundary.  This latter name fell into the next alpha-segment.  As was done in the NSSRN 2000, any deviations of more than 8 percent were candidates for either an increased or decreased rate. 

Because the survey is longitudinal in nature, a panel structure was constructed to allow for several of the sample alpha-segments to be systematically replaced each survey.  Under the original survey design, the 40 sample alpha-segments were arranged in alphabetical order and then partitioned into eight groups of five successive alpha-segments each.  One segment from each group was randomly assigned to each panel, so that each panel consisted of segments that spanned the entire alphabet.  For each successive survey, a new panel (consisting of eight new alpha-segments or 20 percent of the sample) was entered into the sample, replacing one of the five panels from the previous survey.  Under this scheme, a nurse who maintained an active license in the same State(s) could be retained in the sample for up to five surveys. 
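The panel assignment described above can be sketched as follows.  This is an illustrative Python reconstruction under stated assumptions, not the original survey procedure: the function name `assign_panels`, the use of integers to stand in for alpha-segments, and the random seed are hypothetical.

```python
import random


def assign_panels(sorted_segments, n_groups=8, n_panels=5, seed=None):
    """Partition alphabetically sorted segments into groups of successive
    segments, then randomly assign one segment per group to each panel.

    Assumes len(sorted_segments) == n_groups * n_panels (40 = 8 x 5).
    """
    rng = random.Random(seed)
    group_size = len(sorted_segments) // n_groups
    panels = [[] for _ in range(n_panels)]
    for g in range(n_groups):
        # Five successive alpha-segments form one group.
        group = sorted_segments[g * group_size:(g + 1) * group_size]
        rng.shuffle(group)  # random order decides which panel gets which segment
        for panel, segment in zip(panels, group):
            panel.append(segment)
    return panels


# Segments are labeled 0..39 in alphabetical order for illustration.
panels = assign_panels(list(range(40)), seed=0)
for number, panel in enumerate(panels, start=1):
    print("Panel", number, panel)
```

Each of the five resulting panels holds eight segments, one from each group, so every panel spans the alphabet, and replacing one panel replaces 20 percent of the sample segments.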

The planned NSSRN 2004 sample size was 54,000 cases, similar to that of the NSSRN 2000, and up from the 45,000 used in previous studies.  Planned sampling rates ranged from 1.125 percent in several of the largest States to 15 percent in Wyoming.  This translated into planned sample sizes ranging from 3,225 RNs in California to approximately 796 in Wyoming.  The initial round of sampling, however, yielded a much smaller sample than expected due to the variable size of the alpha-segments in each State.  Thus, a second round of sampling was done by increasing the sampling rates from 1 percent to 1.125 percent in the eleven largest States and “adding to” the sample selected in the first round, yielding a total of 56,917 sample cases.  After eliminating cross-State duplications, the expected sample size to be fielded was still approximately 54,000 cases. 

Table B-1 in Appendix B shows the sampling rates and sample sizes that were planned and actually obtained for the 51 States in the survey.  Differences between planned and actual sampling rates result from State-specific variation in the distribution of nurses’ names. States are priority ordered by sampling rate and size.

Because many nurses are licensed in more than one State, their names could be selected in the sample more than once.  In accordance with the sample design, we ensured that each sampled RN was retained in the outgoing sample file exactly once to avoid multiple questionnaires being sent to nurses.  If we identified an exact duplicate, the nurse in the lower priority State was coded as a duplicate of the sample member in the higher priority State.  For example, an Alaska record was coded as a duplicate to the sample record in Wyoming.  Following data collection, these expected duplicates were reviewed to ensure that the nurse reported a license in both of the States. 

Table B-1. State Sampling Rates and Sample Sizes (Priority Ordered)

| State | Priority Order | Frame Size | Planned Sampling Rate (%) | Actual Sampling Rate (%) [2] | Actual Sample Size |
|---|---|---|---|---|---|
| TOTAL | | 3,252,548 | | | 56,917 |
| Wyoming | 1 | 5,309 | 15.00% | 15.60% | 828 |
| Alaska | 2 | 7,389 | 13.00% | 11.88% | 878 |
| Vermont | 3 | 8,728 | 10.00% | 9.53% | 832 |
| District of Columbia | 4 | 17,104 | 10.00% | 9.71% | 1,661 |
| North Dakota | 5 | 8,139 | 9.00% | 9.74% | 793 |
| Delaware | 6 | 10,407 | 9.00% | 8.87% | 923 |
| Montana | 7 | 10,885 | 8.00% | 8.15% | 887 |
| South Dakota | 8 | 10,773 | 7.00% | 6.88% | 741 |
| Idaho | 9 | 12,769 | 7.00% | 6.75% | 862 |
| Hawaii | 10 | 13,548 | 7.00% | 7.44% | 1,008 |
| Nevada | 11 | 19,201 | 7.00% | 6.25% | 1,200 |
| Rhode Island | 12 | 17,203 | 5.50% | 5.37% | 923 |
| New Mexico | 13 | 17,544 | 5.00% | 4.98% | 874 |
| New Hampshire | 14 | 19,108 | 5.00% | 4.71% | 900 |
| Utah | 15 | 19,210 | 4.50% | 4.97% | 954 |
| Maine | 16 | 19,869 | 4.50% | 4.50% | 894 |
| Nebraska | 17 | 20,100 | 3.50% | 3.56% | 716 |
| Arkansas | 18 | 27,878 | 3.50% | 3.52% | 982 |
| West Virginia | 19 | 21,295 | 3.50% | 3.13% | 667 |
| Mississippi | 20 | 31,734 | 3.00% | 3.13% | 994 |
| Oklahoma | 21 | 32,185 | 3.00% | 2.93% | 944 |
| Kansas | 22 | 34,047 | 3.00% | 3.10% | 1,057 |
| Iowa | 23 | 40,312 | 2.50% | 2.31% | 933 |
| South Carolina | 24 | 38,265 | 2.50% | 2.47% | 944 |
| Oregon | 25 | 38,453 | 2.00% | 1.95% | 750 |
| Louisiana | 26 | 43,299 | 2.00% | 1.75% | 757 |
| Colorado | 27 | 48,586 | 2.00% | 2.14% | 1,042 |
| Connecticut | 28 | 52,364 | 2.00% | 1.96% | 1,025 |
| Alabama | 29 | 46,974 | 1.75% | 1.81% | 852 |
| Kentucky | 30 | 47,123 | 1.75% | 1.77% | 832 |
| Arizona | 31 | 51,482 | 1.75% | 1.72% | 887 |
| Maryland | 32 | 56,922 | 1.50% | 1.47% | 835 |
| Washington | 33 | 66,397 | 1.50% | 1.44% | 954 |
| Minnesota | 34 | 66,434 | 1.50% | 1.59% | 1,056 |
| Wisconsin | 35 | 63,865 | 1.25% | 1.24% | 793 |
| Tennessee | 36 | 65,827 | 1.25% | 1.29% | 849 |
| Indiana | 37 | 70,488 | 1.25% | 1.23% | 867 |
| Missouri | 38 | 74,508 | 1.25% | 1.28% | 953 |
| Georgia | 39 | 86,369 | 1.25% | 1.26% | 1,086 |
| Virginia | 40 | 85,705 | 1.25% | 1.21% | 1,036 |
| North Carolina | 41 | 96,877 | 1.125% | 1.146% | 1,110 |
| Massachusetts | 42 | 105,206 | 1.125% | 1.350% | 1,420 |
| New Jersey | 43 | 109,726 | 1.125% | 1.067% | 1,171 |
| Michigan | 44 | 117,360 | 1.125% | 1.161% | 1,363 |
| Ohio | 45 | 140,689 | 1.125% | 1.124% | 1,581 |
| Illinois | 46 | 154,572 | 1.125% | 1.124% | 1,738 |
| Texas | 47 | 176,652 | 1.125% | 1.066% | 1,883 |
| Pennsylvania | 48 | 191,628 | 1.125% | 1.037% | 1,988 |
| Florida | 49 | 201,113 | 1.125% | 1.086% | 2,184 |
| New York | 50 | 244,288 | 1.125% | 1.061% | 2,592 |
| California | 51 | 286,639 | 1.125% | 1.018% | 2,918 |

Weighting Procedures

The probability sample design of the survey permits the computation of unbiased estimates of characteristics of the RN population at the National and State level.  These estimates are based on weights that reflect the complex design and compensate for the potential risk of nonresponse bias to the extent feasible.  The weights that are assigned to each sample nurse may be interpreted as the number of nurses in the target population that the sample nurse represents. The sampling weight for an RN is the reciprocal of the nurse’s probability of selection in her/his priority State, adjusted to account for nonresponse and multiple licenses.

Before computing the weights, the original State frame sizes (shown above) were adjusted to account for duplicate licenses within States and ineligible licenses (i.e., frame errors) found in the sample.  Most within-State duplicates were identified at the time of initial list processing, but a few were identified after sample selection.  The ineligible licenses were identified in the process of reconciling the State and nurse reported licenses.  Some of the inconsistencies between the State reported data and the nurse reported data are due to the time period that elapsed between frame construction and data collection (a period during which changes and license expirations naturally occur).  Other differences are due to errors in either the State list or the nurse’s questionnaire.  Cases that could not be reconciled by Gallup were sent to the State Boards of Nursing for resolution.

In both cases, the frame total is computed by subtracting the estimated number of ineligible and duplicate licenses from the State’s original frame count.  The adjusted frame total used to compute the resulting weights for State i can be computed as:

$$N(i) = N_i - \hat{D}_i - \hat{E}_i$$

where:

$N_i$ = the total number of licenses on the State i list,

$\hat{D}_i$ = the estimated number of within-State duplicates in State i, and

$\hat{E}_i$ = the estimated number of frame errors in State i (e.g., licenses listed by the State that were not reported by a responding nurse).

Each responding nurse was assigned a weight corresponding to their unique ‘priority State’; that is, the State with the highest sampling rate in which he or she was licensed and selected into the sample.  In other words, the weight is reflective of the probability of selecting the sampled nurse in their “priority” State.  All nurses with the same priority State have an equal probability of being selected and, consequently, have equal initial sampling weights.  The sum of the weights for all nurse respondents assigned to a specific priority State will equal, approximately, the total number of active licenses on the list (at the time the sample was drawn) less the number of those licenses assigned to higher priority lists.

The weights were computed sequentially for each State A, B, etc., where A was the highest-priority State, and B the next-highest-priority State.  The weight for an RN sampled from the highest priority State, State A, was the ratio of the adjusted count of licenses in the sampling frame for State A to the number of eligible respondents licensed in State A.  For State B, and the remaining States, the numerator and denominator of this ratio were adjusted to account for State A and other higher-priority States.  To describe the basic method, the following terms are defined:

N(i) = total number of licenses for State i (adjusted for within-State duplicates and frame errors)

m(i) = number of eligible respondents for State i that did not have a license in a higher-priority State

n(i,j) = number of eligible respondents with a license in both State i and State j [note n(i,i) denotes the number of eligible respondents with a license only in State i]

W(i) = the adjusted weight for eligible respondents who were assigned to the higher priority State i

The weight for State A was computed as follows:

W(A) = N(A) / m(A).

For the State B weight, W(B), the numerator was the adjusted frame count of licenses for State B, N(B), after removing the estimated total count of State B nurses who were also licensed in State A (i.e., W(A) n(A,B)).  Similarly, the numerator of W(C) excluded State C nurses who were also licensed in either State A or State B (i.e., W(A) n(A,C) + W(B) n(B,C)).  That is, for the State B weight and the State C weight, the computations were:

W(B)  = [N(B) - W(A) n(A,B)] / m(B)

W(C)  = [N(C) - W(A) n(A,C) - W(B) n(B,C)] / m(C) .

In either case, the denominator was the number (m(B) or m(C)) of respondents in the State not licensed in a higher-priority State.

In general, the numerator of a State I weight, W(I), was the total adjusted frame count of RN licenses in State I after removing the estimated total count of State I nurses also licensed in higher-priority States.  The denominator, m(I), was the number of State I respondents not licensed in a higher-priority State.  This weighting scheme incorporated both a nonresponse adjustment that inflated the respondents’ data to account for those that did not respond to the survey and a duplication adjustment to account for duplication in the sampling frame across States.  These final analysis weights will serve to differentially weight responding nurses to reflect the level of disproportionality in the final respondent sample relative to the population.
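The sequential computation of W(A), W(B), W(C), and so on can be sketched in a few lines of Python.  This is a minimal illustration under stated assumptions, not the production weighting program.  The function name `sequential_weights` and the data layout (one set of license States per respondent) are hypothetical.  The sketch also interprets n(J,I) as the number of respondents whose priority State is J and who also hold a license in State I, which is one reading of the definitions above.

```python
from collections import Counter


def sequential_weights(priority_order, frame_counts, respondents):
    """Compute W(I) = [N(I) - sum over higher-priority J of W(J) n(J,I)] / m(I).

    priority_order: State names, highest priority first.
    frame_counts:   adjusted frame count N(I) for each State.
    respondents:    one set of license States per eligible respondent.
    Assumes every State has at least one respondent assigned to it.
    """
    rank = {state: r for r, state in enumerate(priority_order)}
    m = Counter()  # m[I]: respondents whose priority State is I
    n = Counter()  # n[(J, I)]: respondents with priority State J also licensed in I
    for licenses in respondents:
        top = min(licenses, key=rank.__getitem__)  # respondent's priority State
        m[top] += 1
        for other in licenses:
            if other != top:
                n[(top, other)] += 1

    weights = {}
    for i, state in enumerate(priority_order):
        # Estimated licenses in this State already covered by higher-priority States.
        overlap = sum(weights[h] * n[(h, state)] for h in priority_order[:i])
        weights[state] = (frame_counts[state] - overlap) / m[state]
    return weights


# Hypothetical toy data: 8 nurses licensed only in A, 2 in both A and B, 18 only in B.
respondents = [{"A"}] * 8 + [{"A", "B"}] * 2 + [{"B"}] * 18
weights = sequential_weights(["A", "B"], {"A": 100, "B": 200}, respondents)
print(weights)  # {'A': 10.0, 'B': 10.0}
```

In this toy example the weights of the respondents assigned to State B sum to 180, which is the State B frame count of 200 less the 20 licenses estimated to belong to nurses assigned to State A.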

Estimation Procedure

Final NSSRN estimates can be computed using the final set of sampling weights, Wk (for sample nurse k).  For example, an estimate of the total number of RNs working in a particular State is based on the following indicator variable, Xk:

$$X_k = \begin{cases} 1 & \text{if nurse } k \text{ worked in a particular State,} \\ 0 & \text{otherwise.} \end{cases}$$

The desired estimated total may then be written as

$$\hat{X} = \sum_k W_k X_k,$$

the sum being over all sample nurses.

Estimates of ratios and averages are obtained as the ratio of estimated totals.
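As a minimal sketch of these estimators, the Python below computes a weighted total and a ratio of two weighted totals.  The function names and the toy data are hypothetical and serve only to illustrate the calculation; variance estimation is not covered here.

```python
def estimated_total(weights, values):
    """Weighted total: the sum of W_k * X_k over all sample nurses."""
    return sum(w * x for w, x in zip(weights, values))


def estimated_ratio(weights, numerator_values, denominator_values):
    """Ratio estimate: one weighted total divided by another."""
    return (estimated_total(weights, numerator_values)
            / estimated_total(weights, denominator_values))


# Hypothetical data for three sample nurses.
weights = [10.0, 10.0, 20.0]        # final weights W_k
worked_in_state = [1, 0, 1]         # indicator X_k
hours_per_week = [40, 0, 30]        # an illustrative analysis variable

total_working = estimated_total(weights, worked_in_state)
mean_hours = estimated_ratio(weights, hours_per_week, worked_in_state)
print(total_working, mean_hours)
```

Here the average is computed as the estimated total hours divided by the estimated number of nurses working in the State, which is the ratio-of-totals form described above.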

Sampling and Nonsampling Errors

To the extent that samples are sufficiently large, relatively precise estimates of characteristics of the licensed RN population of the United States can be made because of the underlying probability structure of the sample data.  Such estimates are, sometimes, an imperfect approximation of the truth.  Several sources of error could cause sample estimates to differ from the corresponding true population value.  These sources of error are commonly classified into two major categories: sampling errors and nonsampling errors.

A probability sample such as the one used in this study is designed so that estimates of the magnitude of the sampling error can be computed from the sample data.  In addition, nonsystematic components of nonsampling error are also reflected in the sampling error estimates.

Nonsampling Errors

Some sources of error, such as unusable responses to vague or sensitive questions; no responses from some nurses; and errors in coding, scoring, and processing the data are, to a considerable extent, beyond the control of the sampling statistician.  They are called “nonsampling errors” and also occur in cases where there is a complete enumeration of a target population, such as the U.S. Census. The activities directed at reducing nonsampling errors to the lowest level feasible for this survey included careful planning, keeping nonresponse to the lowest feasible level, and careful coding and processing of the sample data.

If nonsampling errors are random, in the sense that they are independent and tend to be compensating from one respondent to another, then they do not cause bias in estimates of totals, percents, or averages.  Furthermore, the contribution from such nonsampling errors will automatically be included in the sampling errors that are estimated from the sample data.  However, correlations or relationships in cross-tabulations are often decreased by such errors, and sometimes substantially.  Thus, random errors that tend to be compensated for