# Code Duplication Detection & DRY Refactoring
I'll analyze your codebase for duplicate code blocks, identify similar patterns across files, and suggest DRY (Don't Repeat Yourself) refactoring strategies based on obra principles.
## Detection Capabilities
- Exact code duplication (copy-paste detection)
- Similar code patterns (semantic duplication)
- Repeated logic across files
- Duplicated constants and magic numbers
- Repeated test patterns
## Supported Languages
- JavaScript/TypeScript
- Python
- Go
- Java
- Generic text-based detection
## Token Optimization

This skill uses patterns specific to duplication detection to minimize token usage:

### 1. Source Directory Detection Caching (500 token savings)

**Pattern:** Cache project structure and source directories
- Store structure in `.duplication-structure-cache` (1 hour TTL)
- Cache: source directories, file extensions, excluded paths
- Read cached structure on subsequent runs (50 tokens vs 550 tokens fresh)
- Invalidate on directory structure changes
- Savings: 91% on repeat duplication checks
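The cache check can be sketched in a few lines of bash. The cache filename and TTL come from this section; the dual `stat` invocation is an assumption to cover both GNU and BSD variants, and the directory list is abbreviated:

```shell
#!/bin/bash
CACHE_FILE=".duplication-structure-cache"
TTL=3600  # 1 hour

if [ -f "$CACHE_FILE" ]; then
  # GNU stat first, BSD stat as a fallback (assumption: one of the two exists)
  MTIME=$(stat -c %Y "$CACHE_FILE" 2>/dev/null || stat -f %m "$CACHE_FILE")
  AGE=$(( $(date +%s) - MTIME ))
else
  AGE=$((TTL + 1))  # no cache yet, force fresh detection
fi

if [ "$AGE" -lt "$TTL" ]; then
  SOURCE_DIRS=$(cat "$CACHE_FILE")   # cheap path: reuse cached directories
else
  SOURCE_DIRS=""                     # fresh detection
  for dir in src lib app components; do
    [ -d "$dir" ] && SOURCE_DIRS="$SOURCE_DIRS $dir"
  done
  printf '%s' "$SOURCE_DIRS" > "$CACHE_FILE"
fi
```

A real run would also invalidate the cache when the directory layout changes, for example by comparing a hash of the top-level listing.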
### 2. Bash-Based Duplication Tool Execution (1,800 token savings)

**Pattern:** Use jscpd/PMD directly via bash
- JavaScript: `jscpd --format json` (300 tokens)
- Python: `pylint --duplicate-code` (300 tokens)
- Generic: `simian` or custom grep-based detection (400 tokens)
- Parse JSON output with `jq`
- No Task agents for duplication detection
- Savings: 90% vs Task-based duplication analysis
### 3. Sample-Based Duplication Reporting (900 token savings)

**Pattern:** Report first 10 duplication instances only
- Show top 10 duplications by severity (600 tokens)
- Count remaining duplications without details
- Full report via `--all` flag
- Savings: 65% vs reporting every duplication
### 4. Template-Based DRY Refactoring Recommendations (800 token savings)

**Pattern:** Use predefined DRY patterns
- Standard strategies: extract function, extract constant, inheritance, composition
- Pattern-based recommendations for duplication types
- No creative refactoring generation
- Savings: 80% vs LLM-generated DRY strategies
### 5. Incremental Duplication Checks (1,000 token savings)

**Pattern:** Check only changed files via git diff
- Analyze files modified since last commit (500 tokens)
- Check if changes introduce new duplication
- Full codebase analysis via `--full` flag
- Savings: 75% vs full codebase duplication detection
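A minimal sketch of that scoping decision; the `--full` flag name comes from this section, and the file-extension list is illustrative:

```shell
#!/bin/bash
SCAN_SCOPE="full"
if [ "${1:-}" != "--full" ]; then
  # Files touched since the last commit; empty outside a git repo or with no changes
  CHANGED=$(git diff --name-only HEAD -- '*.js' '*.ts' '*.py' 2>/dev/null)
  if [ -n "$CHANGED" ]; then
    SCAN_SCOPE="incremental"
    echo "Checking changed files only:"
    echo "$CHANGED"
  fi
fi
echo "Scan scope: $SCAN_SCOPE"
```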
### 6. Grep-Based Similar Pattern Discovery (700 token savings)

**Pattern:** Find potential duplications with grep
- Grep for repeated patterns: function signatures, constant values (300 tokens)
- Flag files with high similarity
- Run the full tool only on flagged file pairs
- Savings: 70% vs running the tool on all file combinations
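The pre-filter can be as simple as collecting signatures that occur verbatim in more than one file, then handing only those files to the full tool. The `demo/` files here are hypothetical stand-ins:

```shell
#!/bin/bash
# Hypothetical files: two share a signature, one does not
mkdir -p demo
printf 'function loadUser(id) {\n}\n' > demo/a.js
printf 'function loadUser(id) {\n}\n' > demo/b.js
printf 'function saveUser(u) {\n}\n'  > demo/c.js

# Signatures that appear in more than one file are candidates for a full scan
FLAGGED=$(grep -rh "^function " --include="*.js" demo | sort | uniq -d)
echo "Signatures seen in multiple files:"
echo "$FLAGGED"
```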
### 7. Cached Duplication Baseline (600 token savings)

**Pattern:** Store a baseline duplication report
- Cache the initial duplication report
- Compare new runs against the baseline to detect new duplications
- Focus on newly introduced duplications
- Savings: 80% by focusing on deltas
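A sketch of the delta comparison, using two hypothetical report files shaped like jscpd's statistics block; `sed` stands in for the `jq` parsing used elsewhere in this skill so the sketch carries no extra dependency:

```shell
#!/bin/bash
BASELINE="baseline.json"
CURRENT="current.json"

# Hypothetical reports in jscpd's statistics shape
echo '{"statistics":{"total":{"duplications":4}}}' > "$BASELINE"
echo '{"statistics":{"total":{"duplications":6}}}' > "$CURRENT"

extract_dups() {
  sed 's/.*"duplications":\([0-9]*\).*/\1/' "$1"
}

# Positive delta means the change set introduced new duplication
DELTA=$(( $(extract_dups "$CURRENT") - $(extract_dups "$BASELINE") ))
echo "New duplications since baseline: $DELTA"
```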
### 8. Threshold-Based Filtering (400 token savings)

**Pattern:** Filter out small duplications
- Default: 6+ lines of duplication
- Skip trivial duplications (imports, boilerplate)
- Adjustable threshold
- Savings: 70% by filtering noise
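Illustrated with a hypothetical findings list in `<lines>:<fileA>:<fileB>` form (the filenames are placeholders):

```shell
#!/bin/bash
MIN_LINES=6
FINDINGS="3:utils.js:helpers.js
12:orders.js:invoices.js
8:models.py:schemas.py"

# Keep only blocks of at least MIN_LINES duplicated lines
KEPT=$(echo "$FINDINGS" | awk -F: -v min="$MIN_LINES" '$1 >= min')
echo "$KEPT"
```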
## Real-World Token Usage Distribution
Typical operation patterns:
- Check recent changes (git diff scope): 1,200 tokens
- Show top 10 duplications: 1,400 tokens
- Full codebase analysis (first time): 3,000 tokens
- Cached baseline comparison: 800 tokens
- Refactoring recommendations (top 5): 1,800 tokens
- Most common: Incremental checks on recent changes
Expected per-analysis: 1,500-2,500 tokens (60% reduction from the 3,500-6,000 baseline).
Real-world average: 1,100 tokens (due to incremental checks, sample-based reporting, and the cached baseline).
**Arguments:** `$ARGUMENTS` - optional: minimum duplication threshold (default: 6 lines) or specific directories
DRY refactoring strategies:
- Extract function/method (most common)
- Extract constant/configuration
- Inheritance or composition
- Higher-order functions
- Template method pattern
- Strategy pattern for variations
## Phase 1: Tool Setup & Configuration
First, I'll set up duplication detection tools:
```bash
#!/bin/bash
# Duplication Detection - Setup & Configuration

echo "=== Code Duplication Detection ==="
echo ""

# Create analysis directory
mkdir -p .claude/duplication-analysis
ANALYSIS_DIR=".claude/duplication-analysis"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
REPORT="$ANALYSIS_DIR/duplication-report-$TIMESTAMP.md"
MIN_LINES="${1:-6}"  # Minimum lines for duplication detection

echo "Configuration:"
echo "  Minimum duplication threshold: $MIN_LINES lines"
echo "  Analysis directory: $ANALYSIS_DIR"
echo ""

# Detect project structure
echo "Detecting project structure..."
SOURCE_DIRS=""

# Common source directories
for dir in src lib app components pages services utils helpers; do
  if [ -d "$dir" ]; then
    SOURCE_DIRS="$SOURCE_DIRS $dir"
    echo "  ✓ Found: $dir/"
  fi
done

# Test directories
for dir in tests test __tests__; do
  if [ -d "$dir" ]; then
    echo "  ✓ Found test directory: $dir/"
  fi
done

if [ -z "$SOURCE_DIRS" ]; then
  echo "  Using current directory"
  SOURCE_DIRS="."
fi
echo ""
```
## Phase 2: Install Detection Tools
I'll install and configure duplication detection tools:
```bash
echo "=== Installing Duplication Detection Tools ==="
echo ""

install_jscpd() {
  # JSCPD - multi-language copy-paste detector
  if ! command -v jscpd >/dev/null 2>&1 && ! npm list -g jscpd >/dev/null 2>&1; then
    echo "Installing jscpd (copy-paste detector)..."
    if npm install -g jscpd 2>/dev/null || npm install --save-dev jscpd; then
      echo "✓ jscpd installed"
    else
      echo "⚠️ Failed to install jscpd - using basic detection"
      return 1
    fi
  else
    echo "✓ jscpd already installed"
  fi

  # Create jscpd configuration
  cat > "$ANALYSIS_DIR/.jscpd.json" << JSCPDCONFIG
{
  "threshold": $MIN_LINES,
  "reporters": ["json", "html"],
  "ignore": [
    "**/*.min.js",
    "**/node_modules/**",
    "**/dist/**",
    "**/build/**",
    "**/.git/**",
    "**/coverage/**",
    "**/__pycache__/**",
    "**/venv/**",
    "**/.venv/**"
  ],
  "format": ["javascript", "typescript", "python", "go", "java"],
  "absolute": false,
  "output": "$ANALYSIS_DIR/jscpd-report"
}
JSCPDCONFIG
  echo "✓ jscpd configuration created"
  return 0
}

# Install tool
if install_jscpd; then
  USE_JSCPD=true
else
  USE_JSCPD=false
  echo "ℹ️ Will use basic grep-based detection"
fi
echo ""
```
## Phase 3: Detect Code Duplication
I'll analyze the codebase for duplicated code:
```bash
echo "=== Analyzing Code Duplication ==="
echo ""

run_jscpd_analysis() {
  echo "Running jscpd analysis..."
  echo ""

  jscpd $SOURCE_DIRS \
    --config "$ANALYSIS_DIR/.jscpd.json" \
    2>&1 | tee "$ANALYSIS_DIR/jscpd-log.txt"

  # $? after a pipeline is tee's status; PIPESTATUS[0] is jscpd's
  if [ "${PIPESTATUS[0]}" -eq 0 ]; then
    echo ""
    echo "✓ jscpd analysis complete"

    # Check if duplications found
    if [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
      DUPLICATIONS=$(jq '.statistics.total.duplications' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
      PERCENTAGE=$(jq '.statistics.total.percentage' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
      echo ""
      echo "Duplication Statistics:"
      echo "  Total duplications: $DUPLICATIONS"
      echo "  Duplication percentage: $PERCENTAGE%"
      echo ""

      # Extract top duplications
      echo "Top Duplicated Blocks:"
      jq -r '.duplicates[] |
        "\(.format) - \(.lines) lines duplicated in \(.fragment) (\(.firstFile.name):\(.firstFile.start) and \(.secondFile.name):\(.secondFile.start))"' \
        "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null | head -10
    fi
  else
    echo "⚠️ jscpd analysis failed - see $ANALYSIS_DIR/jscpd-log.txt"
    return 1
  fi
}

run_basic_duplication_detection() {
  echo "Running basic duplication detection..."
  echo ""

  # Find similar function signatures
  echo "Detecting similar function signatures..."

  # JavaScript/TypeScript functions
  if find $SOURCE_DIRS \( -name "*.js" -o -name "*.jsx" -o -name "*.ts" -o -name "*.tsx" \) 2>/dev/null | grep -q .; then
    grep -rh "^function\|^const.*=.*=>.*{$\|^export function" \
      --include="*.js" --include="*.jsx" --include="*.ts" --include="*.tsx" \
      $SOURCE_DIRS 2>/dev/null | \
      sort | uniq -d | head -10 > "$ANALYSIS_DIR/duplicate-functions.txt"

    if [ -s "$ANALYSIS_DIR/duplicate-functions.txt" ]; then
      echo "  Found duplicate function signatures (JavaScript/TypeScript):"
      sed 's/^/    /' "$ANALYSIS_DIR/duplicate-functions.txt"
      echo ""
    fi
  fi

  # Python functions
  if find $SOURCE_DIRS -name "*.py" 2>/dev/null | grep -q .; then
    grep -rh "^def \|^async def " \
      --include="*.py" \
      $SOURCE_DIRS 2>/dev/null | \
      sort | uniq -d | head -10 > "$ANALYSIS_DIR/duplicate-python-functions.txt"

    if [ -s "$ANALYSIS_DIR/duplicate-python-functions.txt" ]; then
      echo "  Found duplicate function signatures (Python):"
      sed 's/^/    /' "$ANALYSIS_DIR/duplicate-python-functions.txt"
      echo ""
    fi
  fi

  # Find magic numbers (potential constants)
  echo "Detecting magic numbers (repeated literals)..."
  find $SOURCE_DIRS -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" \) 2>/dev/null | \
    xargs grep -oh "[0-9]\{2,\}" 2>/dev/null | \
    sort | uniq -c | sort -rn | head -10 > "$ANALYSIS_DIR/magic-numbers.txt"

  if [ -s "$ANALYSIS_DIR/magic-numbers.txt" ]; then
    echo "  Most common numeric literals:"
    sed 's/^/    /' "$ANALYSIS_DIR/magic-numbers.txt"
    echo ""
  fi

  # Find repeated strings
  echo "Detecting repeated string literals..."
  find $SOURCE_DIRS -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" \) 2>/dev/null | \
    xargs grep -oh "\"[^\"]\{10,\}\"" 2>/dev/null | \
    sort | uniq -c | sort -rn | head -10 > "$ANALYSIS_DIR/repeated-strings.txt"

  if [ -s "$ANALYSIS_DIR/repeated-strings.txt" ]; then
    echo "  Most common string literals:"
    sed 's/^/    /' "$ANALYSIS_DIR/repeated-strings.txt"
    echo ""
  fi
}

# Run the appropriate analysis
if [ "$USE_JSCPD" = true ]; then
  if ! run_jscpd_analysis; then
    echo "Falling back to basic detection..."
    run_basic_duplication_detection
  fi
else
  run_basic_duplication_detection
fi
```
## Phase 4: Generate DRY Refactoring Strategies
I'll provide specific refactoring patterns for eliminating duplication:
echo ""
echo "=== Generating DRY Refactoring Strategies ==="
echo ""
cat > "$ANALYSIS_DIR/dry-refactoring-patterns.md" << 'DRYPATTERNS'
# DRY Refactoring Patterns
Based on obra YAGNI/DRY principles: Don't Repeat Yourself
---
## Pattern 1: Extract Function
**Problem:** Same code block repeated in multiple places
**Solution:** Extract to a reusable function
### Before (JavaScript)
```javascript
// File 1
function processOrderA(order) {
  if (!order.customer || !order.customer.email) {
    throw new Error('Invalid customer');
  }
  if (!order.items || order.items.length === 0) {
    throw new Error('Empty order');
  }
  // ... process order
}

// File 2
function processOrderB(order) {
  if (!order.customer || !order.customer.email) {
    throw new Error('Invalid customer');
  }
  if (!order.items || order.items.length === 0) {
    throw new Error('Empty order');
  }
  // ... different processing
}
```

### After

```javascript
// utils/validation.js
export function validateOrder(order) {
  if (!order.customer?.email) {
    throw new Error('Invalid customer');
  }
  if (!order.items?.length) {
    throw new Error('Empty order');
  }
}

// File 1
import { validateOrder } from './utils/validation';

function processOrderA(order) {
  validateOrder(order);
  // ... process order
}

// File 2
import { validateOrder } from './utils/validation';

function processOrderB(order) {
  validateOrder(order);
  // ... different processing
}
```

**DRY Improvement:** 8 duplicated lines → 1 function call
---

## Pattern 2: Extract Configuration/Constants

**Problem:** Same literals repeated across the codebase
**Solution:** Centralize in configuration

### Before (Python)

```python
# Multiple files with repeated values
def calculate_shipping(weight):
    if weight < 10:
        return 5.99
    return 9.99

def check_free_shipping(total):
    return total >= 50.00

def apply_discount(quantity):
    if quantity >= 5:
        return 0.10
    return 0
```

### After

```python
# config/constants.py
class ShippingConfig:
    FREE_SHIPPING_THRESHOLD = 50.00
    LIGHT_PACKAGE_WEIGHT = 10
    LIGHT_PACKAGE_COST = 5.99
    HEAVY_PACKAGE_COST = 9.99

class DiscountConfig:
    BULK_QUANTITY = 5
    BULK_DISCOUNT = 0.10

# services/shipping.py
from config.constants import ShippingConfig, DiscountConfig

def calculate_shipping(weight):
    if weight < ShippingConfig.LIGHT_PACKAGE_WEIGHT:
        return ShippingConfig.LIGHT_PACKAGE_COST
    return ShippingConfig.HEAVY_PACKAGE_COST

def check_free_shipping(total):
    return total >= ShippingConfig.FREE_SHIPPING_THRESHOLD

def apply_discount(quantity):
    if quantity >= DiscountConfig.BULK_QUANTITY:
        return DiscountConfig.BULK_DISCOUNT
    return 0
```

**DRY Improvement:** Magic numbers eliminated, single source of truth
---

## Pattern 3: Template Method Pattern

**Problem:** Similar algorithms with slight variations
**Solution:** Use the template method or strategy pattern

### Before (TypeScript)

```typescript
class PDFReport {
  generate() {
    this.loadData();
    this.formatHeader();
    this.formatPDFContent();
    this.addPDFFooter();
    this.savePDF();
  }
  private loadData() { /* same logic */ }
  private formatHeader() { /* same logic */ }
  private formatPDFContent() { /* PDF-specific */ }
  private addPDFFooter() { /* PDF-specific */ }
  private savePDF() { /* PDF-specific */ }
}

class ExcelReport {
  generate() {
    this.loadData();
    this.formatHeader();
    this.formatExcelContent();
    this.addExcelFooter();
    this.saveExcel();
  }
  private loadData() { /* DUPLICATE logic */ }
  private formatHeader() { /* DUPLICATE logic */ }
  private formatExcelContent() { /* Excel-specific */ }
  private addExcelFooter() { /* Excel-specific */ }
  private saveExcel() { /* Excel-specific */ }
}
```

### After

```typescript
abstract class Report {
  // Template method
  generate() {
    this.loadData();
    this.formatHeader();
    this.formatContent();
    this.addFooter();
    this.save();
  }

  // Common implementations
  protected loadData() {
    // Shared logic
  }
  protected formatHeader() {
    // Shared logic
  }

  // Abstract methods for subclasses
  protected abstract formatContent(): void;
  protected abstract addFooter(): void;
  protected abstract save(): void;
}

class PDFReport extends Report {
  protected formatContent() { /* PDF-specific */ }
  protected addFooter() { /* PDF-specific */ }
  protected save() { /* PDF-specific */ }
}

class ExcelReport extends Report {
  protected formatContent() { /* Excel-specific */ }
  protected addFooter() { /* Excel-specific */ }
  protected save() { /* Excel-specific */ }
}
```

**DRY Improvement:** Shared logic in the base class, variations in subclasses
---

## Pattern 4: Higher-Order Functions

**Problem:** Similar operations with different behaviors
**Solution:** Use higher-order functions or callbacks

### Before (JavaScript)

```javascript
function processUsersForEmail(users) {
  const results = [];
  for (const user of users) {
    if (user.email && user.active) {
      results.push(user);
    }
  }
  return results;
}

function processUsersForSMS(users) {
  const results = [];
  for (const user of users) {
    if (user.phone && user.active) {
      results.push(user);
    }
  }
  return results;
}

function processUsersForPush(users) {
  const results = [];
  for (const user of users) {
    if (user.deviceToken && user.active) {
      results.push(user);
    }
  }
  return results;
}
```

### After

```javascript
function processUsers(users, channel) {
  const validators = {
    email: user => user.email && user.active,
    sms: user => user.phone && user.active,
    push: user => user.deviceToken && user.active
  };
  return users.filter(validators[channel]);
}

// Or, more flexibly, with a custom validator
function processUsers(users, isValid) {
  return users.filter(isValid);
}

// Usage
const emailUsers = processUsers(users, user => user.email && user.active);
const smsUsers = processUsers(users, user => user.phone && user.active);
const pushUsers = processUsers(users, user => user.deviceToken && user.active);
```

**DRY Improvement:** 3 functions → 1 configurable function
---

## Pattern 5: Composition Over Duplication

**Problem:** Repeated utility combinations
**Solution:** Compose smaller utilities

### Before (Go)

```go
// Repeated validation patterns
func ValidateUser(user User) error {
    if user.Email == "" {
        return errors.New("email required")
    }
    if !strings.Contains(user.Email, "@") {
        return errors.New("invalid email")
    }
    if user.Age < 18 {
        return errors.New("must be 18+")
    }
    return nil
}

func ValidateAdmin(admin Admin) error {
    if admin.Email == "" {
        return errors.New("email required")
    }
    if !strings.Contains(admin.Email, "@") {
        return errors.New("invalid email")
    }
    if admin.Age < 18 {
        return errors.New("must be 18+")
    }
    if admin.Role == "" {
        return errors.New("role required")
    }
    return nil
}
```

### After

```go
// Composable validators
func requireEmail(email string) error {
    if email == "" {
        return errors.New("email required")
    }
    return nil
}

func validateEmailFormat(email string) error {
    if !strings.Contains(email, "@") {
        return errors.New("invalid email")
    }
    return nil
}

func requireAdult(age int) error {
    if age < 18 {
        return errors.New("must be 18+")
    }
    return nil
}

func requireRole(role string) error {
    if role == "" {
        return errors.New("role required")
    }
    return nil
}

func runValidators(validators []func() error) error {
    for _, validate := range validators {
        if err := validate(); err != nil {
            return err
        }
    }
    return nil
}

// Compose validators
func ValidateUser(user User) error {
    validators := []func() error{
        func() error { return requireEmail(user.Email) },
        func() error { return validateEmailFormat(user.Email) },
        func() error { return requireAdult(user.Age) },
    }
    return runValidators(validators)
}

func ValidateAdmin(admin Admin) error {
    validators := []func() error{
        func() error { return requireEmail(admin.Email) },
        func() error { return validateEmailFormat(admin.Email) },
        func() error { return requireAdult(admin.Age) },
        func() error { return requireRole(admin.Role) },
    }
    return runValidators(validators)
}
```

**DRY Improvement:** Reusable validators, composable validation
---

## Pattern 6: Extract Test Helpers

**Problem:** Repeated test setup/teardown
**Solution:** Create test utilities

### Before

```javascript
// test/user.test.js
describe('User tests', () => {
  it('should create user', () => {
    const db = createDatabase();
    const user = { name: 'John', email: 'john@example.com' };
    // ... test logic
    db.close();
  });

  it('should update user', () => {
    const db = createDatabase();
    const user = { name: 'John', email: 'john@example.com' };
    // ... test logic
    db.close();
  });
});

// test/order.test.js
describe('Order tests', () => {
  it('should create order', () => {
    const db = createDatabase();
    const user = { name: 'John', email: 'john@example.com' };
    // ... test logic
    db.close();
  });
});
```

### After

```javascript
// test/helpers/testUtils.js
export function setupTestDatabase() {
  const db = createDatabase();
  return {
    db,
    cleanup: () => db.close()
  };
}

export function createTestUser(overrides = {}) {
  return {
    name: 'John',
    email: 'john@example.com',
    ...overrides
  };
}

// test/user.test.js
import { setupTestDatabase, createTestUser } from './helpers/testUtils';

describe('User tests', () => {
  let db;
  let cleanup;

  beforeEach(() => {
    ({ db, cleanup } = setupTestDatabase());
  });

  afterEach(() => {
    cleanup();
  });

  it('should create user', () => {
    const user = createTestUser();
    // ... test logic (cleaner!)
  });

  it('should update user', () => {
    const user = createTestUser({ name: 'Jane' });
    // ... test logic
  });
});
```

**DRY Improvement:** Shared test utilities, less boilerplate
---

## obra YAGNI/DRY Principles

### YAGNI: You Aren't Gonna Need It
- Don't add functionality until necessary
- Avoid premature abstraction
- Wait for duplication before extracting

### DRY: Don't Repeat Yourself
- Every piece of knowledge should have a single representation
- Avoid copy-paste programming
- Extract when you see duplication 2-3 times

### When to Extract
- Three strikes rule: duplicate 3 times → extract
- Clear pattern: similar code structure appears
- High maintenance cost: changes require updates in multiple places
- Business logic: domain rules should be centralized

### When NOT to Extract
- Premature abstraction: only seen once or twice
- Coincidental duplication: similar code, different concepts
- Temporary code: prototypes, experiments
- Over-engineering: the abstraction is more complex than the duplication
DRYPATTERNS
echo "✓ DRY refactoring patterns guide created"
## Phase 5: Generate Duplication Report
I'll create a comprehensive report with prioritized refactoring opportunities:
```bash
echo ""
echo "=== Generating Duplication Report ==="
echo ""
# Count duplications
TOTAL_DUPLICATIONS=0
DUPLICATION_PERCENTAGE=0
if [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
TOTAL_DUPLICATIONS=$(jq '.statistics.total.duplications' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
DUPLICATION_PERCENTAGE=$(jq '.statistics.total.percentage' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
fi
cat > "$REPORT" << EOF
# Code Duplication Analysis Report
**Generated:** $(date)
**Minimum Lines Threshold:** $MIN_LINES lines
**Total Duplications Found:** $TOTAL_DUPLICATIONS
**Duplication Percentage:** $DUPLICATION_PERCENTAGE%
---
## Summary
Code duplication violates the DRY (Don't Repeat Yourself) principle and leads to:
- Higher maintenance costs (fix bugs in multiple places)
- Increased bug likelihood (miss updating one location)
- Larger codebase (more code to understand)
- Lower code quality (copy-paste programming)
**Duplication Guidelines (obra principles):**
- **0-5%:** Excellent - minimal duplication
- **5-10%:** Good - acceptable for large projects
- **10-20%:** Moderate - consider refactoring
- **20%+:** High - significant technical debt
---
## Duplication Statistics
EOF
if [ "$USE_JSCPD" = true ] && [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
cat >> "$REPORT" << EOF
### Overall Statistics
- Total Lines Analyzed: $(jq '.statistics.total.lines' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
- Duplicated Lines: $(jq '.statistics.total.clones' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
- Duplication Blocks: $TOTAL_DUPLICATIONS
- Duplication Percentage: $DUPLICATION_PERCENTAGE%
### Top Duplicated Files
EOF
jq -r '.statistics.formats[] |
"\(.format):\n Files: \(.sources)\n Duplications: \(.duplications)\n Percentage: \(.percentage)%\n"' \
"$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null >> "$REPORT"
cat >> "$REPORT" << EOF
### Largest Duplication Blocks
EOF
jq -r '.duplicates[:10] | .[] |
"**\(.lines) lines** duplicated in \(.format)\n- Location 1: \(.firstFile.name):\(.firstFile.start)\n- Location 2: \(.secondFile.name):\(.secondFile.start)\n"' \
"$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null >> "$REPORT"
cat >> "$REPORT" << EOF
**Full HTML Report:** Open \`$ANALYSIS_DIR/jscpd-report/html/index.html\` in browser
EOF
else
cat >> "$REPORT" << EOF
### Basic Analysis Results
EOF
if [ -f "$ANALYSIS_DIR/duplicate-functions.txt" ] && [ -s "$ANALYSIS_DIR/duplicate-functions.txt" ]; then
cat >> "$REPORT" << EOF
**Duplicate Function Signatures (JavaScript/TypeScript):**
\`\`\`
$(cat "$ANALYSIS_DIR/duplicate-functions.txt")
\`\`\`
EOF
fi
if [ -f "$ANALYSIS_DIR/duplicate-python-functions.txt" ] && [ -s "$ANALYSIS_DIR/duplicate-python-functions.txt" ]; then
cat >> "$REPORT" << EOF
**Duplicate Function Signatures (Python):**
\`\`\`
$(cat "$ANALYSIS_DIR/duplicate-python-functions.txt")
\`\`\`
EOF
fi
if [ -f "$ANALYSIS_DIR/magic-numbers.txt" ] && [ -s "$ANALYSIS_DIR/magic-numbers.txt" ]; then
cat >> "$REPORT" << EOF
**Repeated Numeric Literals:**
\`\`\`
$(cat "$ANALYSIS_DIR/magic-numbers.txt")
\`\`\`
Consider extracting to named constants.
EOF
fi
if [ -f "$ANALYSIS_DIR/repeated-strings.txt" ] && [ -s "$ANALYSIS_DIR/repeated-strings.txt" ]; then
cat >> "$REPORT" << EOF
**Repeated String Literals:**
\`\`\`
$(cat "$ANALYSIS_DIR/repeated-strings.txt")
\`\`\`
Consider extracting to configuration.
EOF
fi
fi
cat >> "$REPORT" << 'EOF'
---
## Recommended DRY Refactoring
### Priority 1: Extract Duplicated Functions
- Identify exact code duplicates (10+ lines)
- Extract to shared utility functions
- Consolidate in common modules
### Priority 2: Centralize Configuration
- Extract magic numbers to constants
- Create configuration files
- Use environment variables for deployment-specific values
### Priority 3: Eliminate Similar Patterns
- Identify similar code structures
- Extract common patterns
- Use higher-order functions or templates
### Priority 4: Consolidate Test Helpers
- Extract common test setup/teardown
- Create test utilities
- Share fixtures across test suites
**See detailed patterns:** `cat .claude/duplication-analysis/dry-refactoring-patterns.md`
---
## Implementation Steps

1. **Create Git Checkpoint**

```bash
git add -A
git commit -m "Pre DRY-refactoring checkpoint" || echo "No changes"
```

2. **Prioritize Duplications**
   - Start with the largest duplication blocks
   - Focus on frequently changed code
   - Consider domain importance

3. **Apply DRY Patterns**
   - Extract function (most common)
   - Extract configuration
   - Template method pattern
   - Higher-order functions
   - Composition

4. **Three Strikes Rule**
   - Wait for 3 occurrences before extracting
   - Avoid premature abstraction
   - Balance DRY with YAGNI

5. **Verify Improvements**
   - Re-run duplication analysis
   - Ensure tests pass
   - Check for reduced code size

6. **Document Changes**
   - Update documentation
   - Note refactoring decisions
   - Share patterns with the team

## Continuous Monitoring

Add duplication checks to CI/CD.

**Pre-commit Hook:**

```bash
#!/bin/bash
# .git/hooks/pre-commit - check duplication before commit
jscpd --threshold 6 . || exit 1
```

**GitHub Actions:**

```yaml
name: Code Quality
on: [push, pull_request]
jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Check duplication
        run: |
          npm install -g jscpd
          jscpd --threshold 6 .
```

## Integration with Other Skills

- `/refactor` - Systematic code restructuring
- `/complexity-reduce` - Reduce cyclomatic complexity
- `/make-it-pretty` - Improve code readability
- `/review` - Include duplication in code reviews
- `/test` - Ensure tests after refactoring

## Resources

- obra/superpowers - YAGNI/DRY principles
- The Pragmatic Programmer - DRY principle
- Refactoring (Martin Fowler)
- JSCPD - copy-paste detector
- Rule of Three (refactoring)

---

**Next Steps:**
1. Review high-duplication areas
2. Select the appropriate DRY pattern
3. Implement refactoring incrementally
4. Re-run analysis to verify improvement
EOF
echo "✓ Duplication report generated: $REPORT"
```
## Summary
```bash
echo ""
echo "=== ✓ Duplication Analysis Complete ==="
echo ""
echo "📊 Analysis Results:"
if [ "$USE_JSCPD" = true ]; then
echo " Total duplications: $TOTAL_DUPLICATIONS"
echo " Duplication percentage: $DUPLICATION_PERCENTAGE%"
echo " Threshold: $MIN_LINES lines"
else
echo " Basic analysis complete"
echo " Install jscpd for detailed analysis: npm install -g jscpd"
fi
echo ""
echo "📁 Generated Files:"
echo " - Duplication Report: $REPORT"
echo " - DRY Patterns: $ANALYSIS_DIR/dry-refactoring-patterns.md"
[ -f "$ANALYSIS_DIR/jscpd-report/html/index.html" ] && echo " - HTML Report: $ANALYSIS_DIR/jscpd-report/html/index.html"
echo ""
echo "🎯 Recommended Actions:"
if [ -n "$DUPLICATION_PERCENTAGE" ] && [ "$DUPLICATION_PERCENTAGE" != "null" ] && [ "${DUPLICATION_PERCENTAGE%.*}" -gt 10 ] 2>/dev/null; then
echo " ⚠️ High duplication detected ($DUPLICATION_PERCENTAGE%)"
echo " 1. Review largest duplication blocks"
echo " 2. Extract duplicated functions"
echo " 3. Centralize configuration"
echo " 4. Run tests after refactoring"
elif [ -n "$DUPLICATION_PERCENTAGE" ] && [ "$DUPLICATION_PERCENTAGE" != "null" ] && [ "${DUPLICATION_PERCENTAGE%.*}" -gt 5 ] 2>/dev/null; then
echo " Moderate duplication ($DUPLICATION_PERCENTAGE%)"
echo " Consider refactoring during feature work"
else
echo " ✓ Low duplication - excellent!"
echo " Continue monitoring in code reviews"
fi
echo ""
echo "💡 DRY Refactoring Patterns:"
echo " 1. Extract Function (consolidate duplicates)"
echo " 2. Extract Configuration (centralize constants)"
echo " 3. Template Method (share algorithm structure)"
echo " 4. Higher-Order Functions (parameterize behavior)"
echo " 5. Composition (build from smaller pieces)"
echo ""
echo "📏 obra YAGNI/DRY Guidelines:"
echo " - Three strikes rule: Extract after 3 duplications"
echo " - Avoid premature abstraction"
echo " - Balance DRY with readability"
echo ""
echo "🔗 Integration Points:"
echo " - /refactor - Systematic restructuring"
echo " - /complexity-reduce - Reduce complexity"
echo " - /review - Include in code reviews"
echo ""
echo "View report: cat $REPORT"
echo "View patterns: cat $ANALYSIS_DIR/dry-refactoring-patterns.md"
[ -f "$ANALYSIS_DIR/jscpd-report/html/index.html" ] && echo "View HTML: open $ANALYSIS_DIR/jscpd-report/html/index.html"
```
## Safety Guarantees

**What I'll NEVER do:**
- Automatically refactor without analysis
- Extract after single occurrence (violates YAGNI)
- Remove code without understanding context
- Skip testing after refactoring
- Add AI attribution to commits
**What I WILL do:**
- Identify genuine code duplication
- Suggest proven DRY patterns
- Follow obra YAGNI principles
- Recommend incremental refactoring
- Preserve functionality
- Provide clear examples
## Credits
This skill is based on:
- obra/superpowers - YAGNI and DRY principles
- The Pragmatic Programmer - DRY principle origin
- Martin Fowler - Refactoring patterns
- JSCPD - Multi-language duplication detection
- Rule of Three - When to extract abstraction
## Token Budget
Target: 2,000-3,500 tokens per execution
- Phase 1-2: ~600 tokens (setup + tool installation)
- Phase 3: ~1,000 tokens (duplication detection)
- Phase 4-5: ~1,500 tokens (patterns + reporting)
Optimization Strategy:
- Use jscpd for efficient detection
- Fallback to grep-based analysis
- Template-based pattern generation
- Focus on actionable duplications
- Clear DRY refactoring guidance
This ensures thorough duplication detection with practical DRY refactoring strategies based on obra principles while respecting token limits.