
Test Score Precision

Precision is a measure of consistency or agreement between scores and concerns the degree to which errors of measurement affect test scores. Measurement errors do not usually refer to inconsistencies in the aptitudes or behaviors being assessed; rather, these errors are related to factors that prevent an individual from achieving a score identical to their true latent ability or score.

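Precision can be measured in several ways. In traditional test theory, precision is measured by a reliability coefficient, which is the ratio of true score variance to observed score variance (Lord & Novick, 1968):

$$\rho_{xx'} = \frac{\sigma_T^2}{\sigma_x^2} \qquad (1)$$

where $\sigma_T^2$ is the true score variance and $\sigma_x^2$ is the observed score variance for test x. Because true score variance can be computed as the difference between observed score variance and error variance, $\sigma_T^2 = \sigma_x^2 - \sigma_E^2$, classical reliability can be represented as:

$$\rho_{xx'} = \frac{\sigma_x^2 - \sigma_E^2}{\sigma_x^2} = 1 - \frac{\sigma_E^2}{\sigma_x^2} \qquad (2)$$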
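As a numerical illustration of the two forms of classical reliability, the following minimal sketch simulates observed scores as true score plus independent error. The score scale (mean 50, true score SD 10, error SD 5), the sample size, and the function name `classical_reliability` are all hypothetical choices made for this example, not values taken from the ASVAB.

```python
import numpy as np


def classical_reliability(observed_var: float, error_var: float) -> float:
    """Reliability as 1 - (error variance / observed score variance)."""
    if observed_var <= 0:
        raise ValueError("observed_var must be positive")
    return 1.0 - error_var / observed_var


# Hypothetical simulation: observed score = true score + independent error.
rng = np.random.default_rng(0)
true_scores = rng.normal(loc=50.0, scale=10.0, size=100_000)  # variance near 100
errors = rng.normal(loc=0.0, scale=5.0, size=100_000)         # variance near 25
observed = true_scores + errors

# Ratio of true score variance to observed score variance.
ratio_form = true_scores.var() / observed.var()

# Equivalent form based on observed and error variance.
difference_form = classical_reliability(observed.var(), errors.var())
```

Under these assumed variances the population reliability is 100 / 125 = 0.8. The two sample estimates land close to that value but differ slightly from each other, because the sample covariance between true scores and errors is not exactly zero.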

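Item Response Theory (IRT) provides a means of estimating reliability that operates on the item characteristics and the individual pattern of responses given by examinees to items within a test. The IRT analogue to classical reliability is called marginal reliability, and operates on the variance of the theta scores and the average of the expected error variance (Sireci, Thissen, & Wainer, 1991):

$$\bar{\rho} = \frac{\sigma_\theta^2 - \bar{\sigma}_e^2}{\sigma_\theta^2} \qquad (3)$$

where $\sigma_\theta^2$ is the variance of the theta scores and $\bar{\sigma}_e^2$ is the average of the expected error variance. If it can be safely assumed that theta is distributed N(0,1), then $\sigma_\theta^2 = 1$ and marginal reliability can be measured as:

$$\bar{\rho} = 1 - \bar{\sigma}_e^2 \qquad (4)$$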
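The marginal reliability computation can be sketched in a few lines. The function name `marginal_reliability` and the four posterior standard deviations below are hypothetical; the default `theta_variance=1.0` corresponds to the N(0,1) assumption, and passing another value gives the general form.

```python
import numpy as np


def marginal_reliability(error_variances, theta_variance: float = 1.0) -> float:
    """(theta variance - mean error variance) / theta variance.

    With theta_variance=1.0 this reduces to 1 - mean error variance.
    """
    error_variances = np.asarray(error_variances, dtype=float)
    if theta_variance <= 0:
        raise ValueError("theta_variance must be positive")
    mean_error_variance = error_variances.mean()
    return (theta_variance - mean_error_variance) / theta_variance


# Hypothetical posterior standard deviations for four examinees.
psd = np.array([0.30, 0.35, 0.25, 0.40])
rel_unit_variance = marginal_reliability(psd ** 2)
rel_general = marginal_reliability(psd ** 2, theta_variance=2.0)
```

For these made-up values the mean squared PSD is 0.10875, so the reliability under the N(0,1) assumption is roughly 0.89.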

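When sample sizes are large, the average of the expected error variance can be computed by averaging the variance of the estimated posterior distributions across individuals. In the reliabilities reported below, the posterior standard deviation (PSD) for individual i was estimated using the methodology given in Bock and Mislevy (1982):

$$\mathrm{PSD}_i = \left[ \frac{\sum_{k=1}^{q} (X_k - \hat{\theta}_i)^2 \, L_i(X_k) \, W(X_k)}{\sum_{k=1}^{q} L_i(X_k) \, W(X_k)} \right]^{1/2} \qquad (5)$$

where $X_k$ is one of q quadrature points, $W(X_k)$ is the weight of the population distribution at that point, $L_i(X_k)$ is the likelihood of individual i's response pattern at that point, and $\hat{\theta}_i$ is the expected a posteriori (EAP) ability estimate:

$$\hat{\theta}_i = \frac{\sum_{k=1}^{q} X_k \, L_i(X_k) \, W(X_k)}{\sum_{k=1}^{q} L_i(X_k) \, W(X_k)} \qquad (6)$$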

ASVAB Reliabilities
For each ASVAB subtest, Equation 6 was used to compute EAP ability estimates for applicants who completed the test during the 2009 fiscal year (FY2009; October 1, 2008 to September 30, 2009). Equation 5 was then used to compute PSDs (using the EAP ability estimates and assuming an N(0,1) population distribution). The squared PSDs were then averaged over applicants, and the average was substituted into Equation 4 to compute subtest reliability.
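The three steps described above (EAP estimate, PSD, then reliability from the average squared PSD) can be sketched as follows. This is an illustration under several assumptions that are not taken from the source: a three-parameter logistic item model with a 1.7 scaling constant, one fixed set of simulated items for every examinee (an adaptive test would give each examinee different items), 61 equally spaced quadrature points on [-4, 4], and hypothetical names such as `three_pl` and `eap_and_psd`.

```python
import numpy as np


def three_pl(theta, a, b, c):
    """Assumed 3PL model: probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))


def eap_and_psd(responses, a, b, c, n_nodes=61, bound=4.0):
    """EAP estimate and posterior SD by quadrature under an N(0,1) prior."""
    nodes = np.linspace(-bound, bound, n_nodes)
    weights = np.exp(-0.5 * nodes ** 2)
    weights /= weights.sum()                      # discretized N(0,1) prior
    p = three_pl(nodes[:, None], a, b, c)         # shape (nodes, items)
    # Log-likelihood of the whole response pattern at each quadrature point.
    loglik = (responses * np.log(p) + (1 - responses) * np.log1p(-p)).sum(axis=1)
    posterior = np.exp(loglik - loglik.max()) * weights
    posterior /= posterior.sum()
    eap = (nodes * posterior).sum()
    psd = np.sqrt((((nodes - eap) ** 2) * posterior).sum())
    return eap, psd


# Hypothetical item parameters and simulated examinees.
rng = np.random.default_rng(1)
n_items, n_people = 20, 500
a = rng.uniform(0.8, 2.0, n_items)                # discriminations
b = rng.normal(0.0, 1.0, n_items)                 # difficulties
c = np.full(n_items, 0.2)                         # guessing parameters
theta = rng.normal(size=n_people)                 # true abilities, N(0,1)
prob = three_pl(theta[:, None], a, b, c)          # shape (people, items)
data = (rng.random(prob.shape) < prob).astype(int)

estimates = np.array([eap_and_psd(row, a, b, c) for row in data])
eap, psd = estimates[:, 0], estimates[:, 1]

# Average the squared PSDs over examinees, then apply the N(0,1) form.
reliability = 1.0 - np.mean(psd ** 2)
```

The likelihood is accumulated on the log scale and shifted by its maximum before exponentiating, which avoids underflow when response patterns are long.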

For AFQT scores, reliability was computed using the methodology for computing composite reliabilities reported in Gulliksen (1987; pp. 346-347, Equation 74).
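To give a sense of how a composite reliability can be obtained from component reliabilities, the sketch below uses a standard textbook form for a weighted sum of components with uncorrelated measurement errors. It is not claimed to reproduce Gulliksen's Equation 74 verbatim, and the function name, weights, covariance matrix, and reliabilities are hypothetical.

```python
import numpy as np


def composite_reliability(weights, covariance, reliabilities) -> float:
    """Reliability of a weighted composite, assuming uncorrelated errors.

    weights       : weight applied to each component score
    covariance    : observed covariance matrix of the component scores
    reliabilities : reliability of each component score
    """
    w = np.asarray(weights, dtype=float)
    cov = np.asarray(covariance, dtype=float)
    rel = np.asarray(reliabilities, dtype=float)
    composite_var = w @ cov @ w
    # Error variance of each component is its variance times (1 - reliability).
    error_var = np.sum(w ** 2 * np.diag(cov) * (1.0 - rel))
    return 1.0 - error_var / composite_var


# Hypothetical example: two unit-variance components correlated 0.5,
# each with reliability 0.8, combined with equal weights.
example = composite_reliability(
    weights=[1.0, 1.0],
    covariance=[[1.0, 0.5], [0.5, 1.0]],
    reliabilities=[0.8, 0.8],
)
```

In this made-up case the composite variance is 3 and the composite error variance is 0.4, so the composite reliability is about 0.87, higher than either component alone.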

Reliability estimates were computed over all FY2009 applicants, and by gender (Male, Female), ethnic group (Hispanic, Non-Hispanic), and race (American-Indian/Alaska Native, Asian, Black/African-American, Native Hawaiian/other Pacific Islander, White/Caucasian).

The sample sizes used to compute the reliability estimates across subtests and AFQT scores are given in the table below.

Table 1. Sample sizes used to compute the reliability estimates

The estimated reliabilities for AFQT scores and the subtests that comprise AFQT scores are reported in the table below.

Table 2. Estimated reliabilities for AFQT scores and the subtests that comprise AFQT scores

The estimated reliabilities for the remaining ASVAB subtests are given in the table below. Note that AI and SI are administered as separate subtests in CAT-ASVAB but as a single combined subtest (AS) in P&P-ASVAB. In both versions, scores are reported only for the combined AS subtest.

Table 3. Estimated reliabilities for the remaining ASVAB subtests

ASVAB Standard Errors of Measurement
The standard error of measurement (SEM) provides an alternate way of summarizing the amount of error or inconsistency in test scores. It is computed as:

$$\mathrm{SEM} = \sigma_x \sqrt{1 - \rho_{xx'}} \qquad (7)$$

where $\sigma_x$ is the observed score standard deviation for test x and $\rho_{xx'}$ is the reliability of test x. If the measurement error is normally distributed and the reported scores are unbiased, then the true scores for approximately 68% of the applicants would fall in the interval created by adding and subtracting one SEM from their reported score.
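The SEM computation and the one-SEM interval can be sketched as follows. The function names and the example values (a standard deviation of 10, a reliability of 0.91, and a reported score of 50) are hypothetical and are not ASVAB results.

```python
import math


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """SEM = observed score SD times the square root of (1 - reliability)."""
    if score_sd < 0:
        raise ValueError("score_sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return score_sd * math.sqrt(1.0 - reliability)


def one_sem_band(reported_score: float, score_sd: float, reliability: float):
    """Interval formed by subtracting and adding one SEM to a reported score."""
    sem = standard_error_of_measurement(score_sd, reliability)
    return reported_score - sem, reported_score + sem


# Hypothetical values for illustration only.
sem = standard_error_of_measurement(score_sd=10.0, reliability=0.91)
lower, upper = one_sem_band(reported_score=50.0, score_sd=10.0, reliability=0.91)
```

With these assumed values the SEM is about 3, so the one-SEM interval around a reported score of 50 runs from about 47 to about 53.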

The SEM of each ASVAB subtest and AFQT score was computed over all FY2009 applicants, and by gender (Male, Female), ethnic group (Hispanic, Non-Hispanic), and race (American-Indian/Alaska Native, Asian, Black/African-American, Native Hawaiian/other Pacific Islander, White/Caucasian). The sample sizes are the same as those reported in Table 1.

The SEMs for AFQT scores and the subtests that comprise AFQT scores are reported in the table below.


Table 4. SEMs for AFQT scores and the subtests that comprise AFQT scores

The SEMs for the remaining ASVAB subtests are given in the table below. Note that the SEM computations for AI and SI are based on the observed standard deviation of the AS score, since separate scores are not reported for AI and SI.

Table 5. SEMs for the remaining ASVAB subtests

 

 

 
