# Numerai Research

## Overview
This skill is a “meta-workflow” that sequences existing Numerai skills so research requests reliably produce: (1) runnable configs, (2) executed experiments, (3) a full written report + plots, and (4) a deployable pickle when requested.
## Workflow (always follow this order)
- Design the experiment (use numerai-experiment-design)
  - Follow the numerai-experiment-design skill to:
    - clarify the idea (or run quick scout interpretations if it is ambiguous)
    - choose the baseline + feature-set alignment (default: ender20 baseline)
    - create an experiment folder under `numerai/agents/experiments/<experiment_name>/`
    - write configs in `configs/`
    - run training via `PYTHONPATH=numerai python3 -m agents.code.modeling --config <config> --output-dir <experiment_dir>`
    - track metrics with BMC as primary (`bmc_mean`, `bmc_last_200_eras`)
    - iterate in rounds (typically 4–5 configs per round), and keep going until you hit a plateau (per the experiment-design skill)
    - scale winners (bigger feature set and/or full data) before finalizing the best model
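The two primary BMC tracking metrics above can be sketched in a few lines. This assumes per-era BMC scores are available as a plain list ordered oldest era first; the helper name `summarize_bmc` is illustrative, not part of the actual pipeline:

```python
def summarize_bmc(per_era_bmc):
    """Summarize per-era BMC scores, ordered oldest era first.

    Returns the two primary tracking metrics: the mean over all eras
    and the mean over the most recent 200 eras.
    """
    if not per_era_bmc:
        raise ValueError("need at least one era of BMC scores")
    recent = per_era_bmc[-200:]  # last 200 eras (or all of them, if fewer)
    return {
        "bmc_mean": sum(per_era_bmc) / len(per_era_bmc),
        "bmc_last_200_eras": sum(recent) / len(recent),
    }
```

Comparing `bmc_last_200_eras` against `bmc_mean` gives a quick read on whether a config's edge is holding up in recent eras or decaying.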
- Implement new model types if needed (use numerai-model-implementation)
  - Only if the idea requires new code (a new model wrapper, new fit/predict behavior, etc.):
    - Follow the numerai-model-implementation skill to add the model type and register it.
    - Add at least one smoke-test config and verify the pipeline runs.
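Adding a model type typically boils down to a class exposing the pipeline's fit/predict contract plus a registry entry. The numerai-model-implementation skill defines the real interface and registration mechanism, so everything below (the `MODEL_REGISTRY` dict, the `register_model` decorator, and the toy `EraMeanModel`) is a generic illustration only:

```python
MODEL_REGISTRY = {}

def register_model(name):
    """Decorator that adds a model class to the registry under `name`."""
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap

@register_model("era_mean_baseline")
class EraMeanModel:
    """Toy model: predicts the global mean target seen during fit."""

    def fit(self, features, targets):
        self.mean_ = sum(targets) / len(targets)
        return self

    def predict(self, features):
        return [self.mean_] * len(features)
```

A smoke-test config would then reference the registered name (here, `era_mean_baseline`) as its model type, so the pipeline can look it up and run end to end.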
- Report the research (use report-research)
  - After you have iterated through multiple rounds and stopped finding improvements (a plateau), and after any confirmatory scale runs:
    - Follow the report-research skill to:
      - write a full `experiment.md` (abstract + methods + results + decisions + next steps)
      - generate the standard `show_experiment` plot(s)
      - link plots and artifacts in the report
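For context, the standard report plot is typically a cumulative per-era performance curve. A minimal stand-in using matplotlib is sketched below; the real plotting lives in `show_experiment`, and the function name and output filename here are made up:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted runs
import matplotlib.pyplot as plt

def plot_cumulative_bmc(per_era_bmc, out_path="cumulative_bmc.png"):
    """Plot cumulative BMC over eras and save the figure to out_path."""
    cumulative, total = [], 0.0
    for score in per_era_bmc:
        total += score
        cumulative.append(total)
    fig, ax = plt.subplots()
    ax.plot(range(1, len(cumulative) + 1), cumulative)
    ax.set_xlabel("era")
    ax.set_ylabel("cumulative BMC")
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```

A steadily rising curve suggests consistent edge over the benchmark; long flat or declining stretches are exactly what the plateau check is meant to surface.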
- Package and upload (use numerai-model-upload)
  - If (and only if) the user wants deployment:
    - Follow the numerai-model-upload skill to create a Numerai-compatible pickle and upload it via the Numerai MCP.
    - Remember: only Classic (tournament 8) supports pickle uploads.
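The Classic pickle upload expects a serialized callable that takes live features and returns predictions. The exact signature and serialization library are defined by the numerai-model-upload skill, so treat this bare-bones version (plain `pickle`, a constant-prediction `predict`) as a shape illustration only:

```python
import pickle
import pandas as pd

def predict(live_features: pd.DataFrame) -> pd.DataFrame:
    """Return one prediction per live row, indexed like the input."""
    # Placeholder: a real deployable model would transform live_features here.
    return pd.DataFrame({"prediction": [0.5] * len(live_features)},
                        index=live_features.index)

# Serialize the callable so the tournament infrastructure can load and call it.
with open("predict.pkl", "wb") as f:
    pickle.dump(predict, f)
```

Before uploading, it is worth round-tripping the pickle locally (load it, call it on a small features frame) to catch serialization errors early.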
## Defaults (unless user specifies otherwise)
- Scout first on downsampled data; scale only winners.
- Run experiments in rounds (4–5 configs per round) and stop only after a plateau plus a confirmatory scale step.
- Benchmark reference: `v52_lgbm_ender20`.
- Always record corr + BMC metrics and include the standard plot in the report.