Signal Detection Analysis
Purpose
This skill encodes expert methodological knowledge for applying Signal Detection Theory (SDT) to behavioral and cognitive science data. SDT separates an observer's perceptual sensitivity from their decision criterion -- a distinction that raw accuracy conflates. A competent programmer without cognitive science training would typically compute percent correct, missing the critical insight that two observers with identical accuracy can differ drastically in their ability to detect signals vs. their willingness to say "yes."
When to Use SDT (Not Simple Accuracy)
Use SDT whenever:
- Stimuli belong to two classes (signal vs. noise, old vs. new, present vs. absent) and the observer makes a binary classification
- You need to distinguish how well someone can discriminate (sensitivity) from how willing they are to respond in a particular way (bias/criterion)
- Response bias may differ across conditions, groups, or time points, making raw accuracy misleading
- You want a measure that is independent of base rates and payoff structures
Do not use standard SDT when:
- There are more than two stimulus classes (use multi-class extensions or confusion matrices)
- Responses are continuous rather than categorical (use regression-based approaches)
- The task has no noise distribution (e.g., simple threshold detection with catch trials absent)
Research Planning Protocol
Before executing the domain-specific steps below, you MUST:
- State the research question — What sensitivity or bias question is this SDT analysis addressing?
- Justify the method choice — Why SDT (not simple accuracy, logistic regression, etc.)? What alternatives were considered?
- Declare expected outcomes — Do you expect sensitivity differences, bias differences, or both?
- Note assumptions and limitations — What does SDT assume (e.g., Gaussian distributions, equal variance)? Where could it mislead?
- Present the plan to the user and WAIT for confirmation before proceeding.
For detailed methodology guidance, see the research-literacy skill.
⚠️ Verification Notice
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
Core Concepts
The 2x2 Response Matrix
Every SDT analysis begins with classifying each trial into one of four categories:
| | Signal Present | Signal Absent |
|---|---|---|
| "Yes" Response | Hit (H) | False Alarm (FA) |
| "No" Response | Miss (M) | Correct Rejection (CR) |
From these four cells, compute two rates:
- Hit Rate: H / (H + M) = proportion of signal trials correctly identified
- False Alarm Rate: FA / (FA + CR) = proportion of noise trials incorrectly called "signal"
Sensitivity: d' (d-prime)
d' measures the distance between the signal and noise distributions in standard deviation units, assuming equal-variance Gaussian distributions (Green & Swets, 1966, Ch. 1):
d' = z(Hit Rate) - z(False Alarm Rate)
where z() is the inverse of the standard normal CDF (the z-transform).
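The rate and d' computations above can be sketched in a few lines of Python (function names are illustrative); the stdlib `NormalDist` supplies the inverse normal CDF:

```python
from statistics import NormalDist

def z(p: float) -> float:
    """Inverse standard normal CDF (the z-transform)."""
    return NormalDist().inv_cdf(p)

def d_prime(hits: int, misses: int, fas: int, crs: int) -> float:
    """d' = z(hit rate) - z(false-alarm rate), from raw 2x2 counts.

    Assumes both rates are strictly between 0 and 1; see
    'Handling Extreme Hit/False Alarm Rates' for corrections.
    """
    hit_rate = hits / (hits + misses)
    fa_rate = fas / (fas + crs)
    return z(hit_rate) - z(fa_rate)

# 40/50 hits and 10/50 false alarms -> d' of about 1.68
print(d_prime(hits=40, misses=10, fas=10, crs=40))
```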
What d' values mean in practice (Macmillan & Creelman, 2005; Table 1.1):
| d' Value | Interpretation | Unbiased Yes/No % Correct | Practical Meaning |
|---|---|---|---|
| 0 | Chance performance | 50% | No discrimination ability |
| 0.5 | Low sensitivity | ~60% | Barely above chance |
| 1.0 | Moderate sensitivity | ~69% | Often used as threshold (Green & Swets, 1966, Ch. 4) |
| 2.0 | Good sensitivity | ~84% | Reliable discrimination |
| 2.5 | High sensitivity | ~90% | Strong discrimination |
| 3.0+ | Near-ceiling | >93% | Approaching perfect; check for floor/ceiling issues |
The typical experimental range avoiding floor/ceiling effects is d' = 0.5 to 2.5 (Macmillan & Creelman, 2005).
Bias Measures: c, beta, and c'
SDT provides three related (but not interchangeable) bias measures. The choice matters when d' varies across conditions.
Criterion location c (Macmillan & Creelman, 2005, Ch. 2):
c = -0.5 x [z(Hit Rate) + z(False Alarm Rate)]
- c = 0: unbiased (optimal for equal base rates and symmetric payoffs)
- c > 0: conservative (tendency to say "no" / fewer false alarms, fewer hits)
- c < 0: liberal (tendency to say "yes" / more hits, more false alarms)
Likelihood ratio beta (Green & Swets, 1966, Ch. 1):
ln(beta) = d' x c
- beta = 1: unbiased
- beta > 1: conservative
- beta < 1: liberal
Relative criterion c' (Macmillan & Creelman, 2005, Ch. 2):
c' = c / d'
Normalizes criterion placement by sensitivity; useful when comparing bias across conditions with different d' values.
Which bias measure to use (Macmillan & Creelman, 2005, Ch. 2):
- Use c as the default -- it is statistically independent of d', defined when d' = 0, and symmetric around chance
- Use beta when testing whether observers approximate an optimal likelihood-ratio decision rule (e.g., recognition memory; Stretch & Wixted, 1998)
- Use c' when you need to compare bias across conditions where d' changes substantially
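All three measures follow directly from the z-transformed rates; a minimal sketch (illustrative names, rates assumed already corrected for extremes):

```python
from statistics import NormalDist
import math

def bias_measures(hit_rate: float, fa_rate: float) -> dict:
    """Criterion c, likelihood ratio beta, and relative criterion c'."""
    zH = NormalDist().inv_cdf(hit_rate)
    zF = NormalDist().inv_cdf(fa_rate)
    d = zH - zF                        # d' as a by-product
    c = -0.5 * (zH + zF)               # criterion location
    beta = math.exp(d * c)             # since ln(beta) = d' x c
    c_prime = c / d if d != 0 else float("nan")  # undefined at d' = 0
    return {"d_prime": d, "c": c, "beta": beta, "c_prime": c_prime}

# A liberal observer: many hits, but also many false alarms.
print(bias_measures(0.9, 0.4))  # c < 0 and beta < 1
```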
Decision Logic: Choosing a Sensitivity Measure
Is the task a single-interval (yes/no) design?
|
+-- YES --> Are assumptions of equal-variance Gaussian distributions met?
| |
| +-- YES --> Use d' = z(H) - z(FA) (Green & Swets, 1966)
| |
| +-- NO, distributions have unequal variance
| | --> Use da with estimated variance ratio
| | (Macmillan & Creelman, 2005, Ch. 3)
| |
| +-- NO, distributions are non-Gaussian or unknown
| --> Use Az (area under the ROC curve)
| (Swets, 1986; Macmillan & Creelman, 2005, Ch. 3)
|
+-- NO --> Is it a two-interval forced choice (2AFC/2IFC)?
|
+-- YES --> d'(2AFC) = z(proportion correct) x sqrt(2)
| (Green & Swets, 1966, Ch. 6; Macmillan & Creelman, 2005, Ch. 5)
|
+-- NO --> Is it same-different or ABX?
|
+-- YES --> Use paradigm-specific formulas
| (see references/sdt-formulas.md)
|
+-- NO --> Is it a rating-scale (confidence) design?
|
+-- YES --> Construct ROC from rating data;
use Az or fit parametric model
(Macmillan & Creelman, 2005, Ch. 3)
When to Use Az Instead of d'
Use the area under the ROC curve (Az) when:
- You have rating-scale data (multiple confidence levels) and can construct a full ROC
- The equal-variance assumption is violated (common in recognition memory, where the zROC slope is typically ~0.80 rather than 1.0; Mickes, Wixted, & Wais, 2007; Ratcliff, Sheu, & Gronlund, 1992)
- You want a distribution-free sensitivity measure that does not assume Gaussian internal distributions (Swets, 1986)
AUC benchmarks (Swets, Dawes, & Monahan, 2000):
| AUC Range | Interpretation |
|---|---|
| 0.50 | Chance (no discrimination) |
| 0.70 - 0.80 | Fair diagnostic accuracy |
| 0.80 - 0.90 | Good diagnostic accuracy |
| 0.90 - 1.00 | Excellent diagnostic accuracy |
Common Paradigms
Yes/No Detection
The canonical SDT paradigm. On each trial, either a signal or noise is presented; the observer responds "yes" (signal present) or "no" (signal absent). Yields H and FA rates directly.
- d' = z(H) - z(FA)
- Bias c = -0.5 x [z(H) + z(FA)]
Two-Alternative Forced Choice (2AFC / 2IFC)
Two intervals are presented (one signal, one noise); the observer selects the signal interval. Only proportion correct is measured; there is no independent FA rate, so the standard yes/no bias measures (c, beta) cannot be computed.
- d'(2AFC) = z(proportion correct) x sqrt(2) (Green & Swets, 1966, Ch. 6)
- d'(2AFC) = d'(yes/no) x sqrt(2) (Macmillan & Creelman, 2005, Ch. 5)
Critical domain pitfall: A task where the observer chooses between two labels (e.g., "left" or "right") on a single stimulus is not a 2AFC -- it is a yes/no task in disguise (Macmillan & Creelman, 2005). True 2AFC requires two temporal or spatial intervals.
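The conversions above amount to two one-line helpers (a sketch assuming a true two-interval design; names illustrative):

```python
from statistics import NormalDist
import math

def d_prime_2afc(prop_correct: float) -> float:
    """d' from 2AFC proportion correct: z(pc) * sqrt(2)."""
    return NormalDist().inv_cdf(prop_correct) * math.sqrt(2)

def d_prime_2afc_to_yes_no(d_2afc: float) -> float:
    """Convert a 2AFC d' to its yes/no equivalent."""
    return d_2afc / math.sqrt(2)

# Chance (pc = 0.5) maps to d' = 0 in either convention.
print(d_prime_2afc(0.76))  # roughly 1.0 on the yes/no-comparable scale? No:
# this is the 2AFC-scale d'; divide by sqrt(2) for the yes/no equivalent.
```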
Rating Scale (Confidence Ratings)
Observers make a detection judgment plus a confidence rating (e.g., 1-6 scale from "sure noise" to "sure signal"). Each confidence boundary yields a separate (H, FA) pair, constructing a multi-point ROC.
- Fit with parametric (Gaussian) or nonparametric methods
- Compute Az from the fitted ROC
- The zROC slope estimates the variance ratio of the two distributions
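The cumulation step can be sketched as follows (a hedged illustration: the rating-order convention and function names are assumptions, and the trapezoidal area slightly underestimates a smooth-curve Az):

```python
def roc_points(signal_counts, noise_counts):
    """Cumulative (FA, H) pairs, sweeping the criterion from the
    'sure signal' end of the rating scale toward 'sure noise'.

    Both arguments are counts per rating category, ordered from
    most-confident-'signal' to most-confident-'noise'.
    """
    n_sig, n_noise = sum(signal_counts), sum(noise_counts)
    points, cum_s, cum_n = [(0.0, 0.0)], 0, 0
    for s, n in zip(signal_counts, noise_counts):
        cum_s += s
        cum_n += n
        points.append((cum_n / n_noise, cum_s / n_sig))
    return points  # ends at (1.0, 1.0)

def trapezoidal_auc(points):
    """Area under the empirical ROC via the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

With a 6-point scale this yields five interior ROC points; fitting a Gaussian model to the same points gives Az and the zROC slope.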
Same-Different
Two stimuli are presented; the observer judges "same" or "different." Two observer models exist (Macmillan & Creelman, 2005, Ch. 6):
- Independent observations model: observer compares each stimulus to an internal criterion
- Differencing model: observer computes the difference between the two percepts
These yield different d' formulas; see references/sdt-formulas.md.
ABX
Stimulus A is presented, then B, then X (which matches either A or B); the observer identifies which one X matches. Sensitivity depends on the assumed observer strategy (Macmillan & Creelman, 2005, Ch. 6). Related discrimination designs (AX same-different, oddity) use different formulas. See references/sdt-formulas.md.
Handling Extreme Hit/False Alarm Rates
When H = 1.0 or FA = 0.0, z-scores become infinite and d' is undefined. This is a common computational pitfall that requires correction.
Correction Methods
1. The 1/(2N) rule (Macmillan & Kaplan, 1985):
- Replace 0 with 0.5/N
- Replace 1 with (N - 0.5)/N
- Where N = number of signal trials (for H) or noise trials (for FA)
- Applied only to extreme values
2. The log-linear rule (Hautus, 1995) -- recommended:
- Add 0.5 to every cell in the 2x2 matrix (hits, misses, FA, CR) before computing rates
- Applied to all cells, regardless of whether extremes are present
- Produces estimates that are less biased overall; the residual bias is a slight, consistent underestimation of true d' (Hautus, 1995)
Which to use: The log-linear rule is preferred because it produces less biased d' estimates and avoids the asymmetric bias of the 1/(2N) rule, which can either over- or underestimate d' (Hautus, 1995). Apply the log-linear correction routinely, not just when extremes occur, for consistency across participants and conditions.
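Both rules can be sketched as small helpers (illustrative names; the four arguments are the cells of the 2x2 matrix):

```python
def rates_loglinear(hits, misses, fas, crs):
    """Log-linear rule (Hautus, 1995): add 0.5 to every cell,
    whether or not any rate is extreme."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (fas + 0.5) / (fas + crs + 1.0)
    return hit_rate, fa_rate

def rates_half_trial(hits, misses, fas, crs):
    """1/(2N) rule (Macmillan & Kaplan, 1985): clamp only extreme
    rates to 0.5/N and (N - 0.5)/N."""
    n_sig, n_noise = hits + misses, fas + crs
    hit_rate = min(max(hits / n_sig, 0.5 / n_sig), (n_sig - 0.5) / n_sig)
    fa_rate = min(max(fas / n_noise, 0.5 / n_noise), (n_noise - 0.5) / n_noise)
    return hit_rate, fa_rate

# A perfect hit rate (50/50) stays finite under either rule.
print(rates_loglinear(50, 0, 5, 45))   # (~0.990, ~0.108)
print(rates_half_trial(50, 0, 5, 45))  # (0.99, 0.1)
```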
The Unequal-Variance Problem
Why It Matters
Standard d' assumes signal and noise distributions have equal variance. In recognition memory, this assumption is routinely violated: zROC slopes are typically ~0.80 (not 1.0), indicating the old-item (target) distribution has a standard deviation ~25% larger than the new-item (lure) distribution (Ratcliff, Sheu, & Gronlund, 1992; Mickes, Wixted, & Wais, 2007).
Consequences of Ignoring It
If variances are unequal and you compute standard d', the measure is not criterion-free -- it will vary with criterion placement even if true sensitivity is constant (Macmillan & Creelman, 2005, Ch. 3).
What to Do
- Collect rating-scale data to construct a zROC
- If the zROC slope deviates from 1.0, use the unequal-variance model
- Compute da (the unequal-variance sensitivity measure); see
references/sdt-formulas.md
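One common parameterization of da uses the zROC slope s as the variance-ratio estimate (this sketch assumes corrected rates and an externally estimated slope; names illustrative):

```python
from statistics import NormalDist
import math

def d_a(hit_rate: float, fa_rate: float, slope: float) -> float:
    """Unequal-variance sensitivity:
    d_a = sqrt(2 / (1 + s^2)) * (z(H) - s * z(FA)),
    where s is the zROC slope. With s = 1 this reduces to d'.
    """
    zH = NormalDist().inv_cdf(hit_rate)
    zF = NormalDist().inv_cdf(fa_rate)
    return math.sqrt(2.0 / (1.0 + slope ** 2)) * (zH - slope * zF)

# Recognition-memory-like case with slope ~0.8.
print(d_a(0.8, 0.2, 0.8))
```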
Common Pitfalls
- Using percent correct instead of d': Percent correct confounds sensitivity and bias. Two observers with identical discrimination ability but different criteria will have different accuracy scores (Green & Swets, 1966, Ch. 1).
- Treating a single-stimulus forced choice as 2AFC: If only one stimulus is presented per trial and the observer picks a label, this is a yes/no design, not 2AFC. Using the 2AFC formula will yield incorrect d' values (Macmillan & Creelman, 2005).
- Ignoring extreme rate corrections: Computing d' without correcting H = 1 or FA = 0 produces infinite values. Always apply the log-linear correction (Hautus, 1995).
- Assuming equal variance in recognition memory: Recognition memory data almost always show unequal variance (zROC slope ~0.80). Standard d' is not criterion-free in this domain (Ratcliff, Sheu, & Gronlund, 1992).
- Interpreting c as "response bias" without checking: c measures where the criterion is placed relative to distributions, not why it is placed there. A shift in c can reflect rational adaptation to base rates, not irrational bias (Macmillan & Creelman, 2005, Ch. 2).
- Comparing d' across paradigms without conversion: d' values from yes/no and 2AFC designs are not directly comparable. d'(2AFC) = d'(yes/no) x sqrt(2). Failure to convert leads to erroneous sensitivity comparisons (Green & Swets, 1966, Ch. 6).
- Averaging d' across participants without caution: d' is nonlinearly related to H and FA rates. Averaging H and FA rates first, then computing d', gives different results than averaging individual d' values. The appropriate method depends on the research question (Macmillan & Creelman, 2005, Ch. 8).
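The averaging pitfall is easy to demonstrate numerically (a toy example; the rates are invented):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf

# Two observers with similar sensitivity but very different criteria.
rates = [(0.99, 0.50), (0.60, 0.01)]  # (hit rate, FA rate)

# Method 1: compute each observer's d', then average.
mean_of_d = sum(z(h) - z(f) for h, f in rates) / len(rates)

# Method 2: average the rates first, then compute a single d'.
mean_h = sum(h for h, _ in rates) / len(rates)
mean_f = sum(f for _, f in rates) / len(rates)
d_of_mean = z(mean_h) - z(mean_f)

print(mean_of_d, d_of_mean)  # ~2.45 vs ~1.48 -- far from equal
```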
Minimum Reporting Checklist
Based on Macmillan & Creelman (2005) and Stanislaw & Todorov (1999):
- Paradigm type (yes/no, 2AFC, rating, same-different, ABX)
- Number of signal and noise trials per condition
- Hit rate and false alarm rate (or full rating distribution)
- Correction method used for extreme proportions (log-linear or 1/2N)
- Sensitivity measure (d', da, Az) with justification for choice
- Bias measure (c, beta, c') with justification for choice
- Whether equal- or unequal-variance model was used (and estimated variance ratio if unequal)
- If rating data: ROC and/or zROC plot with slope reported
- Statistical tests on SDT measures (not on raw accuracy)
- Software and version used for computation
References
- Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.
- Hautus, M. J. (1995). Corrections for extreme proportions and their biasing effects on estimated values of d'. Behavior Research Methods, Instruments, & Computers, 27, 46-51.
- Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user's guide (2nd ed.). Mahwah, NJ: Erlbaum.
- Macmillan, N. A., & Kaplan, H. L. (1985). Detection theory analysis of group data. Psychological Bulletin, 98, 185-199.
- Maniscalco, B., & Lau, H. (2012). A signal detection theoretic approach for estimating metacognitive sensitivity from confidence ratings. Consciousness and Cognition, 21, 422-430.
- Mickes, L., Wixted, J. T., & Wais, P. E. (2007). A direct test of the unequal-variance signal detection model of recognition memory. Psychonomic Bulletin & Review, 14, 858-865.
- Ratcliff, R., Sheu, C. F., & Gronlund, S. D. (1992). Testing global memory models using ROC curves. Psychological Review, 99, 518-535.
- Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, & Computers, 31, 137-149.
- Stretch, V., & Wixted, J. T. (1998). On the difference between strength-based and frequency-based mirror effects in recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1379-1396.
- Swets, J. A. (1986). Indices of discrimination or diagnostic accuracy. Psychological Bulletin, 99, 100-117.
- Swets, J. A. (1988). Measuring the accuracy of diagnostic systems. Science, 240, 1285-1293.
- Swets, J. A., Dawes, R. M., & Monahan, J. (2000). Psychological science can improve diagnostic decisions. Psychological Science in the Public Interest, 1, 1-26.
See references/sdt-formulas.md for detailed mathematical formulas and lookup tables.
See references/application-guide.md for domain-specific applications.