Distribution Search
Overview
This skill provides structured guidance for finding probability distributions that satisfy specific statistical constraints. These problems typically involve constructing a discrete probability distribution over a large vocabulary that achieves target values for metrics like KL divergence, entropy, or other information-theoretic quantities.
Mathematical Analysis First
Before writing any code, perform thorough mathematical analysis to constrain the solution space:
Derive analytical relationships between the target metrics and distribution properties
-
For KL divergence from uniform: KL(P||U) = -H(P) + log(V) where H(P) is entropy and V is vocabulary size
-
For KL divergence to uniform: KL(U||P) = log(V) - (1/V) * Σ log(P(i))
Calculate fixed quantities from the problem specification
-
Vocabulary size V determines log(V)
-
Target values combined with log(V) constrain feasible entropy ranges
Identify implied constraints from simultaneous requirements
-
Multiple target metrics often severely restrict the feasible solution space
-
Determine if the problem is over-constrained before attempting optimization
Estimate the solution structure analytically
-
Determine the approximate entropy the distribution must have
-
Estimate how concentrated or spread the probability mass should be
Parameterization Strategy
High-dimensional optimization over all probability values is impractical. Use reduced parameterizations:
Recommended Parameterizations
Power law with uniform tail: k high-probability elements following p_i ∝ i^(-α), remaining elements share probability equally
-
Parameters: k (number of special elements), α (power law exponent), p_rest (probability for tail)
Two-group distribution: k elements with probability p_high, remaining elements with probability p_low
-
Parameters: k, p_high (p_low is determined by normalization)
Multi-tier distribution: Several groups of elements with distinct probability levels
- Parameters: group sizes and probability levels
Parameter Selection Guidance
-
Start with few parameters (2-3) and add flexibility only if needed
-
Derive parameter bounds from mathematical constraints rather than arbitrary choices
-
Calculate reasonable initial values from the analytical solution estimates
Implementation Approach
Modular Code Structure
Organize code into separate, tested functions:
- kl_forward(P, V): Compute KL(P||Uniform)
- kl_backward(P, V): Compute KL(Uniform||P)
- create_distribution(params, V): Generate distribution from parameterization
- objective(params): Optimization objective combining constraints
- verify_solution(P, V, targets): Independent verification
Computational Efficiency
-
Avoid full array operations when possible - use analytical formulas for symmetric distributions
-
For two-group distributions: compute entropy/KL directly from group sizes and probabilities
-
Estimate computational cost before running - set appropriate timeouts
Optimization Strategy
-
Coarse grid search over reduced parameter space to find promising regions
-
Local optimization from multiple starting points in promising regions
-
Refinement with tighter tolerances near solutions
Verification
Independent Verification
Always verify solutions with code independent from the optimization:
-
Check probability constraints: All values non-negative, sum to 1.0
-
Compute metrics directly: Use explicit formulas, not the optimization objective
-
Test against known cases: Verify computation on uniform distribution or other known solutions
Common Verification Bugs
-
Incorrect handling of log(0) - use appropriate thresholds (e.g., max(p, 1e-30))
-
Array indexing errors when elements are counted vs indexed
-
Forgetting to normalize after parameter adjustments
-
Multiplication errors when computing sums over groups
Common Pitfalls
Computational
-
Starting with full-dimensional optimization: Immediately recognize the need for reduced parameterization
-
Insufficient timeout estimation: Estimate iterations needed before running
-
Rewriting entire scripts: Use modular code to enable targeted fixes
Mathematical
-
Incomplete constraint analysis: Fully leverage mathematical relationships before coding
-
Arbitrary parameter bounds: Derive bounds from problem constraints
-
Poor initial values: Use analytical estimates for starting points
Verification
-
Trusting optimization output directly: Always verify with independent computation
-
Not testing verification code: Verify the verifier using known solutions first
-
Numerical precision issues: Handle small probabilities carefully
Problem-Solving Workflow
-
Analyze: Derive analytical relationships and estimate solution structure
-
Parameterize: Choose reduced parameterization matching expected structure
-
Implement: Write modular, tested code for each component
-
Search: Grid search followed by local optimization
-
Verify: Independent verification of candidate solutions
-
Refine: Adjust parameterization if tolerances not achieved
When Solutions Are Not Found
If optimization fails to find valid solutions:
-
Check if the problem is mathematically feasible given constraints
-
Verify the parameterization can represent valid solutions
-
Expand the parameterization to add flexibility
-
Check for bugs in objective function or constraint handling
-
Try different optimization algorithms or starting points