Signal Detection Analysis
Purpose
This skill encodes expert methodological knowledge for applying Signal Detection Theory (SDT) to behavioral and cognitive science data. SDT separates an observer's perceptual sensitivity from their decision criterion -- a distinction that raw accuracy conflates. A competent programmer without cognitive science training would typically compute percent correct, missing the critical insight that two observers with identical accuracy can differ drastically in their ability to detect signals vs. their willingness to say "yes."
When to Use SDT (Not Simple Accuracy)
Use SDT whenever:
- Stimuli belong to two classes (signal vs. noise, old vs. new, present vs. absent) and the observer makes a binary classification
- You need to distinguish how well someone can discriminate (sensitivity) from how willing they are to respond in a particular way (bias/criterion)
- Response bias may differ across conditions, groups, or time points, making raw accuracy misleading
- You want a measure that is independent of base rates and payoff structures
Do not use standard SDT when:
- There are more than two stimulus classes (use multi-class extensions or confusion matrices)
- Responses are continuous rather than categorical (use regression-based approaches)
- The task has no noise distribution (e.g., simple threshold detection with catch trials absent)
Research Planning Protocol
Before executing the domain-specific steps below, you MUST:
- State the research question — What sensitivity or bias question is this SDT analysis addressing?
- Justify the method choice — Why SDT (not simple accuracy, logistic regression, etc.)? What alternatives were considered?
- Declare expected outcomes — Do you expect sensitivity differences, bias differences, or both?
- Note assumptions and limitations — What does SDT assume (e.g., Gaussian distributions, equal variance)? Where could it mislead?
- Present the plan to the user and WAIT for confirmation before proceeding.
For detailed methodology guidance, see the research-literacy skill.
⚠️ Verification Notice
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
Core Concepts
The 2x2 Response Matrix
Every SDT analysis begins with classifying each trial into one of four categories:
| | Signal Present | Signal Absent |
|---|---|---|
| "Yes" Response | Hit (H) | False Alarm (FA) |
| "No" Response | Miss (M) | Correct Rejection (CR) |
From these four cells, compute two rates:
- Hit Rate: H / (H + M) = proportion of signal trials correctly identified
- False Alarm Rate: FA / (FA + CR) = proportion of noise trials incorrectly called "signal"
Sensitivity: d' (d-prime)
d' measures the distance between the signal and noise distributions in standard deviation units, assuming equal-variance Gaussian distributions (Green & Swets, 1966, Ch. 1):
d' = z(Hit Rate) - z(False Alarm Rate)
where z() is the inverse of the standard normal CDF (the z-transform).
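The rate and d' computations above can be sketched in a few lines of Python (function names are illustrative); the stdlib `NormalDist` supplies the inverse normal CDF:

```python
from statistics import NormalDist

def z(p: float) -> float:
    """Inverse standard normal CDF (the z-transform)."""
    return NormalDist().inv_cdf(p)

def d_prime(hits: int, misses: int, fas: int, crs: int) -> float:
    """d' = z(hit rate) - z(false-alarm rate), from raw 2x2 counts.

    Assumes both rates are strictly between 0 and 1; see
    'Handling Extreme Hit/False Alarm Rates' for corrections.
    """
    hit_rate = hits / (hits + misses)
    fa_rate = fas / (fas + crs)
    return z(hit_rate) - z(fa_rate)

# 40/50 hits and 10/50 false alarms -> d' of about 1.68
print(d_prime(hits=40, misses=10, fas=10, crs=40))
```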
What d' values mean in practice (Macmillan & Creelman, 2005; Table 1.1):
| d' Value | Interpretation | Unbiased Yes/No % Correct | Practical Meaning |
|---|---|---|---|
| 0 | Chance performance | 50% | No discrimination ability |
| 0.5 | Low sensitivity | ~60% | Barely above chance |
| 1.0 | Moderate sensitivity | ~69% | Often used as threshold (Green & Swets, 1966, Ch. 4) |
| 2.0 | Good sensitivity | ~84% | Reliable discrimination |
| 2.5 | High sensitivity | ~90% | Strong discrimination |
| 3.0+ | Near-ceiling | >93% | Approaching perfect; check for floor/ceiling issues |
The typical experimental range avoiding floor/ceiling effects is d' = 0.5 to 2.5 (Macmillan & Creelman, 2005).
Bias Measures: c, beta, and c'
SDT provides three related (but not interchangeable) bias measures. The choice matters when d' varies across conditions.
Criterion location c (Macmillan & Creelman, 2005, Ch. 2):
c = -0.5 x [z(Hit Rate) + z(False Alarm Rate)]
- c = 0: unbiased (optimal for equal base rates and symmetric payoffs)
- c > 0: conservative (tendency to say "no" / fewer false alarms, fewer hits)
- c < 0: liberal (tendency to say "yes" / more hits, more false alarms)
Likelihood ratio beta (Green & Swets, 1966, Ch. 1):
ln(beta) = d' x c
- beta = 1: unbiased
- beta > 1: conservative
- beta < 1: liberal
Relative criterion c' (Macmillan & Creelman, 2005, Ch. 2):
c' = c / d'
Normalizes criterion placement by sensitivity; useful when comparing bias across conditions with different d' values.
Which bias measure to use (Macmillan & Creelman, 2005, Ch. 2):
- Use c as the default -- it is statistically independent of d', defined when d' = 0, and symmetric around chance
- Use beta when testing whether observers approximate an optimal likelihood-ratio decision rule (e.g., recognition memory; Stretch & Wixted, 1998)
- Use c' when you need to compare bias across conditions where d' changes substantially
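All three measures follow directly from the z-transformed rates; a minimal sketch (illustrative names, rates assumed already corrected for extremes):

```python
from statistics import NormalDist
import math

def bias_measures(hit_rate: float, fa_rate: float) -> dict:
    """Criterion c, likelihood ratio beta, and relative criterion c'."""
    zH = NormalDist().inv_cdf(hit_rate)
    zF = NormalDist().inv_cdf(fa_rate)
    d = zH - zF                        # d' as a by-product
    c = -0.5 * (zH + zF)               # criterion location
    beta = math.exp(d * c)             # since ln(beta) = d' x c
    c_prime = c / d if d != 0 else float("nan")  # undefined at d' = 0
    return {"d_prime": d, "c": c, "beta": beta, "c_prime": c_prime}

# A liberal observer: many hits, but also many false alarms.
print(bias_measures(0.9, 0.4))  # c < 0 and beta < 1
```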
Decision Logic: Choosing a Sensitivity Measure
Is the task a single-interval (yes/no) design?
|
+-- YES --> Are assumptions of equal-variance Gaussian distributions met?
| |
| +-- YES --> Use d' = z(H) - z(FA) (Green & Swets, 1966)
| |
| +-- NO, distributions have unequal variance
| | --> Use da with estimated variance ratio
| | (Macmillan & Creelman, 2005, Ch. 3)
| |
| +-- NO, distributions are non-Gaussian or unknown
| --> Use Az (area under the ROC curve)
| (Swets, 1986; Macmillan & Creelman, 2005, Ch. 3)
|
+-- NO --> Is it a two-interval forced choice (2AFC/2IFC)?
|
+-- YES --> d'(2AFC) = z(proportion correct) x sqrt(2)
| (Green & Swets, 1966, Ch. 6; Macmillan & Creelman, 2005, Ch. 5)
|
+-- NO --> Is it same-different or ABX?
|
+-- YES --> Use paradigm-specific formulas
| (see references/sdt-formulas.md)
|
+-- NO --> Is it a rating-scale (confidence) design?
|
+-- YES --> Construct ROC from rating data;
use Az or fit parametric model
(Macmillan & Creelman, 2005, Ch. 3)
When to Use Az Instead of d'
Use the area under the ROC curve (Az) when:
- You have rating-scale data (multiple confidence levels) and can construct a full ROC
- The equal-variance assumption is violated (common in recognition memory, where the zROC slope is typically ~0.80 rather than 1.0; Mickes, Wixted, & Wais, 2007; Ratcliff, Sheu, & Gronlund, 1992)
- You want a distribution-free sensitivity measure that does not assume Gaussian internal distributions (Swets, 1986)
AUC benchmarks (Swets, Dawes, & Monahan, 2000):
| AUC Range | Interpretation |
|---|---|
| 0.50 | Chance (no discrimination) |
| 0.70 - 0.80 | Fair diagnostic accuracy |
| 0.80 - 0.90 | Good diagnostic accuracy |
| 0.90 - 1.00 | Excellent diagnostic accuracy |
Common Paradigms
Yes/No Detection
The canonical SDT paradigm. On each trial, either a signal or noise is presented; the observer responds "yes" (signal present) or "no" (signal absent). Yields H and FA rates directly.
- d' = z(H) - z(FA)
- Bias c = -0.5 x [z(H) + z(FA)]
Two-Alternative Forced Choice (2AFC / 2IFC)
Two intervals are presented (one signal, one noise); the observer selects the signal interval. Only proportion correct is measured; there is no independent FA rate, so the standard yes/no bias measures (c, beta) cannot be computed.
- d'(2AFC) = z(proportion correct) x sqrt(2) (Green & Swets, 1966, Ch. 6)
- d'(2AFC) = d'(yes/no) x sqrt(2) (Macmillan & Creelman, 2005, Ch. 5)
Critical domain pitfall: A task where the observer chooses between two labels (e.g., "left" or "right") on a single stimulus is not a 2AFC -- it is a yes/no task in disguise (Macmillan & Creelman, 2005). True 2AFC requires two temporal or spatial intervals.
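The conversions above amount to two one-line helpers (a sketch assuming a true two-interval design; names illustrative):

```python
from statistics import NormalDist
import math

def d_prime_2afc(prop_correct: float) -> float:
    """d' from 2AFC proportion correct: z(pc) * sqrt(2)."""
    return NormalDist().inv_cdf(prop_correct) * math.sqrt(2)

def d_prime_2afc_to_yes_no(d_2afc: float) -> float:
    """Convert a 2AFC d' to its yes/no equivalent."""
    return d_2afc / math.sqrt(2)

# Chance (pc = 0.5) maps to d' = 0 in either convention.
print(d_prime_2afc(0.76))  # roughly 1.0 on the yes/no-comparable scale? No:
# this is the 2AFC-scale d'; divide by sqrt(2) for the yes/no equivalent.
```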
Rating Scale (Confidence Ratings)
Observers make a detection judgment plus a confidence rating (e.g., 1-6 scale from "sure noise" to "sure signal"). Each confidence boundary yields a separate (H, FA) pair, constructing a multi-point ROC.
- Fit with parametric (Gaussian) or nonparametric methods
- Compute Az from the fitted ROC
- The zROC slope estimates the variance ratio of the two distributions
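The cumulation step can be sketched as follows (a hedged illustration: the rating-order convention and function names are assumptions, and the trapezoidal area slightly underestimates a smooth-curve Az):

```python
def roc_points(signal_counts, noise_counts):
    """Cumulative (FA, H) pairs, sweeping the criterion from the
    'sure signal' end of the rating scale toward 'sure noise'.

    Both arguments are counts per rating category, ordered from
    most-confident-'signal' to most-confident-'noise'.
    """
    n_sig, n_noise = sum(signal_counts), sum(noise_counts)
    points, cum_s, cum_n = [(0.0, 0.0)], 0, 0
    for s, n in zip(signal_counts, noise_counts):
        cum_s += s
        cum_n += n
        points.append((cum_n / n_noise, cum_s / n_sig))
    return points  # ends at (1.0, 1.0)

def trapezoidal_auc(points):
    """Area under the empirical ROC via the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

With a 6-point scale this yields five interior ROC points; fitting a Gaussian model to the same points gives Az and the zROC slope.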
Same-Different
Two stimuli are presented; the observer judges "same" or "different." Two observer models exist (Macmillan & Creelman, 2005, Ch. 6):
- Independent observations model: observer compares each stimulus to an internal criterion
- Differencing model: observer computes the difference between the two percepts
These yield different d' formulas; see references/sdt-formulas.md.
ABX
Stimulus A is presented, then B, then X (which matches either A or B); the observer identifies which one X matches. Sensitivity depends on the assumed observer strategy (Macmillan & Creelman, 2005, Ch. 6). Related discrimination designs (AX same-different, oddity) use different formulas. See references/sdt-formulas.md.
Handling Extreme Hit/False Alarm Rates
When H = 1.0 or FA = 0.0, z-scores become infinite and d' is undefined. This is a common computational pitfall that requires correction.
Correction Methods
1. The 1/(2N) rule (Macmillan & Kaplan, 1985):
- Replace 0 with 0.5/N
- Replace 1 with (N - 0.5)/N
- Where N = number of signal trials (for H) or noise trials (for FA)
- Applied only to extreme values
2. The log-linear rule (Hautus, 1995) -- recommended:
- Add 0.5 to every cell in the 2x2 matrix (hits, misses, FA, CR) before computing rates
- Applied to all cells, regardless of whether extremes are present
- Produces estimates that are less biased overall; the residual bias is a slight, consistent underestimation of true d' (Hautus, 1995)
Which to use: The log-linear rule is preferred because it produces less biased d' estimates and avoids the asymmetric bias of the 1/(2N) rule, which can either over- or underestimate d' (Hautus, 1995). Apply the log-linear correction routinely, not just when extremes occur, for consistency across participants and conditions.
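Both rules can be sketched as small helpers (illustrative names; the four arguments are the cells of the 2x2 matrix):

```python
def rates_loglinear(hits, misses, fas, crs):
    """Log-linear rule (Hautus, 1995): add 0.5 to every cell,
    whether or not any rate is extreme."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (fas + 0.5) / (fas + crs + 1.0)
    return hit_rate, fa_rate

def rates_half_trial(hits, misses, fas, crs):
    """1/(2N) rule (Macmillan & Kaplan, 1985): clamp only extreme
    rates to 0.5/N and (N - 0.5)/N."""
    n_sig, n_noise = hits + misses, fas + crs
    hit_rate = min(max(hits / n_sig, 0.5 / n_sig), (n_sig - 0.5) / n_sig)
    fa_rate = min(max(fas / n_noise, 0.5 / n_noise), (n_noise - 0.5) / n_noise)
    return hit_rate, fa_rate

# A perfect hit rate (50/50) stays finite under either rule.
print(rates_loglinear(50, 0, 5, 45))   # (~0.990, ~0.108)
print(rates_half_trial(50, 0, 5, 45))  # (0.99, 0.1)
```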
The Unequal-Variance Problem
Why It Matters
Standard d' assumes signal and noise distributions have equal variance. In recognition memory, this assumption is routinely violated: zROC slopes are typically ~0.80 (not 1.0), indicating the old-item (target) distribution has a standard deviation ~25% larger than the new-item (lure) distribution (Ratcliff, Sheu, & Gronlund, 1992; Mickes, Wixted, & Wais, 2007).
Consequences of Ignoring It
If variances are unequal and you compute standard d', the measure is not criterion-free -- it will vary with criterion placement even if true sensitivity is constant (Macmillan & Creelman, 2005, Ch. 3).
What to Do
- Collect rating-scale data to construct a zROC
- If the zROC slope deviates from 1.0, use the unequal-variance model
- Compute da (the unequal-variance sensitivity measure); see
references/sdt-formulas.md
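One common parameterization of da uses the zROC slope s as the variance-ratio estimate (this sketch assumes corrected rates and an externally estimated slope; names illustrative):

```python
from statistics import NormalDist
import math

def d_a(hit_rate: float, fa_rate: float, slope: float) -> float:
    """Unequal-variance sensitivity:
    d_a = sqrt(2 / (1 + s^2)) * (z(H) - s * z(FA)),
    where s is the zROC slope. With s = 1 this reduces to d'.
    """
    zH = NormalDist().inv_cdf(hit_rate)
    zF = NormalDist().inv_cdf(fa_rate)
    return math.sqrt(2.0 / (1.0 + slope ** 2)) * (zH - slope * zF)

# Recognition-memory-like case with slope ~0.8.
print(d_a(0.8, 0.2, 0.8))
```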
Common Pitfalls
- Using percent correct instead of d': Percent correct confounds sensitivity and bias. Two observers with identical discrimination ability but different criteria will have different accuracy scores (Green & Swets, 1966, Ch. 1).
- Treating a single-stimulus forced choice as 2AFC: If only one stimulus is presented per trial and the observer picks a label, this is a yes/no design, not 2AFC. Using the 2AFC formula will yield incorrect d' values (Macmillan & Creelman, 2005).
- Ignoring extreme rate corrections: Computing d' without correcting H = 1 or FA = 0 produces infinite values. Always apply the log-linear correction (Hautus, 1995).
- Assuming equal variance in recognition memory: Recognition memory data almost always show unequal variance (zROC slope ~0.80). Standard d' is not criterion-free in this domain (Ratcliff, Sheu, & Gronlund, 1992).
- Interpreting c as "response bias" without checking: c measures where the criterion is placed relative to distributions, not why it is placed there. A shift in c can reflect rational adaptation to base rates, not irrational bias (Macmillan & Creelman, 2005, Ch. 2).
- Comparing d' across paradigms without conversion: d' values from yes/no and 2AFC designs are not directly comparable. d'(2AFC) = d'(yes/no) x sqrt(2). Failure to convert leads to erroneous sensitivity comparisons (Green & Swets, 1966, Ch. 6).
- Averaging d' across participants without caution: d' is nonlinearly related to H and FA rates. Averaging H and FA rates first, then computing d', gives different results than averaging individual d' values. The appropriate method depends on the research question (Macmillan & Creelman, 2005, Ch. 8).
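The averaging pitfall is easy to demonstrate numerically (a toy example; the rates are invented):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf

# Two observers with similar sensitivity but very different criteria.
rates = [(0.99, 0.50), (0.60, 0.01)]  # (hit rate, FA rate)

# Method 1: compute each observer's d', then average.
mean_of_d = sum(z(h) - z(f) for h, f in rates) / len(rates)

# Method 2: average the rates first, then compute a single d'.
mean_h = sum(h for h, _ in rates) / len(rates)
mean_f = sum(f for _, f in rates) / len(rates)
d_of_mean = z(mean_h) - z(mean_f)

print(mean_of_d, d_of_mean)  # ~2.45 vs ~1.48 -- far from equal
```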
Minimum Reporting Checklist
Based on Macmillan & Creelman (2005) and Stanislaw & Todorov (1999):
- Paradigm type (yes/no, 2AFC, rating, same-different, ABX)
- Number of signal and noise trials per condition
- Hit rate and false alarm rate (or full rating distribution)
- Correction method used for extreme proportions (log-linear or 1/2N)
- Sensitivity measure (d', da, Az) with justification for choice
- Bias measure (c, beta, c') with justification for choice
- Whether equal- or unequal-variance model was used (and estimated variance ratio if unequal)
- If rating data: ROC and/or zROC plot with slope reported
- Statistical tests on SDT measures (not on raw accuracy)
- Software and version used for computation
References
- Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.
- Hautus, M. J. (1995). Corrections for extreme proportions and their biasing effects on estimated values of d'. Behavior Research Methods, Instruments, & Computers, 27, 46-51.
- Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user's guide (2nd ed.). Mahwah, NJ: Erlbaum.
- Macmillan, N. A., & Kaplan, H. L. (1985). Detection theory analysis of group data. Psychological Bulletin, 98, 185-199.
- Maniscalco, B., & Lau, H. (2012). A signal detection theoretic approach for estimating metacognitive sensitivity from confidence ratings. Consciousness and Cognition, 21, 422-430.
- Mickes, L., Wixted, J. T., & Wais, P. E. (2007). A direct test of the unequal-variance signal detection model of recognition memory. Psychonomic Bulletin & Review, 14, 858-865.
- Ratcliff, R., Sheu, C. F., & Gronlund, S. D. (1992). Testing global memory models using ROC curves. Psychological Review, 99, 518-535.
- Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, & Computers, 31, 137-149.
- Stretch, V., & Wixted, J. T. (1998). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1379-1396.
- Swets, J. A. (1986). Indices of discrimination or diagnostic accuracy. Psychological Bulletin, 99, 100-117.
- Swets, J. A. (1988). Measuring the accuracy of diagnostic systems. Science, 240, 1285-1293.
- Swets, J. A., Dawes, R. M., & Monahan, J. (2000). Psychological science can improve diagnostic decisions. Psychological Science in the Public Interest, 1, 1-26.
See references/sdt-formulas.md for detailed mathematical formulas and lookup tables.
See references/application-guide.md for domain-specific applications.