Software Developer Skill
Purpose
Implement production-quality bioinformatics software from technical specifications with comprehensive testing, documentation, and error handling.
When to Use This Skill
Use this skill when you need to:
-
Implement software from architecture specification
-
Write production-ready code (not exploratory analysis)
-
Create command-line tools or packages
-
Build reusable libraries
-
Ensure code quality through testing
Workflow Integration
Pattern: Receive Spec → Implement → Test → Document → Deliver
Systems Architect provides technical spec ↓ Software Developer implements ↓ (copilot reviews continuously) Biologist Commentator validates biological correctness ↓ Production-ready software
Archival Compliance
Before writing any output file:
-
Check if archival context was provided via handoff from an orchestrator
-
If yes: use the provided archival_context block directly
-
If archival_context is "skip": bypass all compliance checks
-
If no handoff context: check for .archive-metadata.yaml in the repo root following the archival compliance check pattern: a. Read the reference document: ~/.claude/skills/archive-workflow/references/archival-compliance-check.md
b. If file not found, use graceful degradation (log warning, proceed without archival check) c. Apply the 5-step pattern to all file creation operations
-
Before writing output, validate path against guidelines
-
On violation: if invoked standalone, present advisory options; if invoked via Task tool (sub-agent), apply archival guidelines silently
software-developer specific: Focus on code naming conventions (snake_case for .py) and directory structure (src/, tests/) validation.
Core Capabilities
- Implementation from Spec
-
Translate architecture into working code
-
Modular, reusable functions/classes
-
Follow coding standards (PEP 8)
-
Type hints for clarity
- Error Handling
-
Try/except with informative messages
-
Validate inputs
-
Graceful failure
-
Logging for debugging
- Testing
-
Unit tests (pytest)
-
Integration tests
-
Edge case coverage
80% code coverage goal
- Documentation
-
Docstrings (Google style)
-
README with usage examples
-
API reference
-
Troubleshooting guide
- CLI Interface
-
argparse or Click
-
Help messages
-
Progress bars for long operations
-
Sensible defaults
Standard Package Structure
Use assets/package_structure_template/ :
project_name/ ├── src/ │ ├── init.py │ ├── module1.py │ ├── module2.py │ └── cli.py ├── tests/ │ ├── test_module1.py │ ├── test_module2.py │ ├── fixtures/ │ └── test_data/ ├── docs/ │ ├── usage.md │ └── api.md ├── README.md ├── setup.py ├── pyproject.toml ├── requirements.txt ├── environment.yml └── .gitignore
Code Quality Standards
Docstring Format (Google Style)
def calculate_cpm(counts: pd.DataFrame) -> pd.DataFrame: """ Calculate counts per million (CPM) normalization.
Parameters
----------
counts : pd.DataFrame
Raw count matrix (genes × samples)
Returns
-------
pd.DataFrame
CPM-normalized counts
Raises
------
ValueError
If counts contain negative values
Examples
--------
>>> counts = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})
>>> cpm = calculate_cpm(counts)
>>> cpm['A'].sum() # Should be ~1,000,000
1000000.0
"""
if (counts < 0).any().any():
raise ValueError("Counts cannot be negative")
return (counts / counts.sum(axis=0)) * 1e6
Error Handling
✅ Good: Informative error messages
try: data = pd.read_csv(filepath) except FileNotFoundError: raise FileNotFoundError( f"Data file not found: {filepath}\n" f"Expected location: {Path(filepath).absolute()}" ) except pd.errors.EmptyDataError: raise ValueError( f"Data file is empty: {filepath}\n" f"Check that file was generated correctly" )
Logging
import logging
logging.basicConfig( level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' ) logger = logging.getLogger(name)
def process_samples(sample_list): logger.info(f"Processing {len(sample_list)} samples") for i, sample in enumerate(sample_list): logger.debug(f"Processing sample {i+1}/{len(sample_list)}: {sample}") # ... processing code ... logger.info("Processing complete")
Testing with pytest
tests/test_normalization.py
import pytest import pandas as pd import numpy as np from src.normalization import calculate_cpm
def test_cpm_sum_equals_million(): """Test that CPM normalization sums to ~1 million.""" counts = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]}) cpm = calculate_cpm(counts) assert np.allclose(cpm.sum(axis=0), 1e6)
def test_cpm_raises_on_negative(): """Test that negative counts raise ValueError.""" counts = pd.DataFrame({'A': [-10, 20], 'B': [30, 40]}) with pytest.raises(ValueError, match="negative"): calculate_cpm(counts)
def test_cpm_handles_zero_sum(): """Test behavior when column sums to zero.""" counts = pd.DataFrame({'A': [0, 0], 'B': [10, 20]}) # Should handle gracefully (decide behavior: NaN or raise)
CLI Template
See assets/cli_template.py :
#!/usr/bin/env python3 """ QC Pipeline CLI
Usage: qc_pipeline samples.csv --output results/ """
import click import logging from pathlib import Path
@click.command() @click.argument('sample_file', type=click.Path(exists=True)) @click.option('--output', '-o', default='results/', help='Output directory') @click.option('--threads', '-t', default=4, help='Number of threads') @click.option('--verbose', '-v', is_flag=True, help='Verbose logging') def main(sample_file, output, threads, verbose): """Run QC pipeline on samples."""
# Setup logging
level = logging.DEBUG if verbose else logging.INFO
logging.basicConfig(level=level)
logger = logging.getLogger(__name__)
# Validate inputs
output_dir = Path(output)
output_dir.mkdir(parents=True, exist_ok=True)
logger.info(f"Processing samples from {sample_file}")
logger.info(f"Output directory: {output_dir}")
logger.info(f"Using {threads} threads")
# Main logic
try:
# ... pipeline code ...
logger.info("Pipeline complete!")
except Exception as e:
logger.error(f"Pipeline failed: {e}")
raise
if name == 'main': main()
Testing Strategy
- Unit Tests
Test individual functions in isolation.
- Integration Tests
Test components working together.
- Regression Tests
Save expected outputs, compare to current.
- Edge Case Tests
-
Empty input
-
Single element
-
All zeros
-
Missing values
-
Very large input
Copilot Integration
During implementation:
-
Write code section
-
Copilot reviews immediately
-
Fix critical issues before proceeding
-
Iterate until approved
-
Move to next section
Quality Checklist
Before delivery:
-
All code passes tests (pytest)
-
80% test coverage
-
All public functions documented
-
Error messages are actionable
-
CLI help message clear
-
README with installation + usage
-
Example data/workflow provided
-
Copilot approved (no critical issues)
-
Biologist validated (biological correctness)
References
For detailed standards:
-
references/coding_standards.md
-
PEP 8, naming, function length
-
references/testing_patterns.md
-
pytest, fixtures, mocking
-
references/error_handling_guide.md
-
Exception hierarchy, logging
-
references/documentation_standards.md
-
Docstrings, README, API docs
Scripts
Available in scripts/ :
-
project_template_generator.py
-
Creates project structure
-
test_runner.py
-
Runs pytest with coverage
Success Criteria
Code is ready for production when:
-
Implements full specification
-
All tests pass
-
Coverage >80%
-
Documentation complete
-
CLI functional
-
Copilot approved
-
Biologist validated
-
Ready for deployment