# Code Duplication Detection & DRY Refactoring
I'll analyze your codebase for duplicate code blocks, identify similar patterns across files, and suggest DRY (Don't Repeat Yourself) refactoring strategies based on obra principles.
## Detection Capabilities
- Exact code duplication (copy-paste detection)
- Similar code patterns (semantic duplication)
- Repeated logic across files
- Duplicated constants and magic numbers
- Repeated test patterns
## Supported Languages
- JavaScript/TypeScript
- Python
- Go
- Java
- Generic text-based detection
## Token Optimization

This skill uses patterns specific to duplication detection to minimize token usage:

### 1. Source Directory Detection Caching (500 token savings)

**Pattern:** Cache project structure and source directories
- Store structure in `.duplication-structure-cache` (1 hour TTL)
- Cache: source directories, file extensions, excluded paths
- Read cached structure on subsequent runs (50 tokens vs 550 tokens fresh)
- Invalidate on directory structure changes
- Savings: 91% on repeat duplication checks
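The cache check can be sketched in a few lines of bash. The cache filename and TTL come from this section; the dual `stat` invocation is an assumption to cover both GNU and BSD variants, and the directory list is abbreviated:

```shell
#!/bin/bash
CACHE_FILE=".duplication-structure-cache"
TTL=3600  # 1 hour

if [ -f "$CACHE_FILE" ]; then
  # GNU stat first, BSD stat as a fallback (assumption: one of the two exists)
  MTIME=$(stat -c %Y "$CACHE_FILE" 2>/dev/null || stat -f %m "$CACHE_FILE")
  AGE=$(( $(date +%s) - MTIME ))
else
  AGE=$((TTL + 1))  # no cache yet, force fresh detection
fi

if [ "$AGE" -lt "$TTL" ]; then
  SOURCE_DIRS=$(cat "$CACHE_FILE")   # cheap path: reuse cached directories
else
  SOURCE_DIRS=""                     # fresh detection
  for dir in src lib app components; do
    [ -d "$dir" ] && SOURCE_DIRS="$SOURCE_DIRS $dir"
  done
  printf '%s' "$SOURCE_DIRS" > "$CACHE_FILE"
fi
```

A real run would also invalidate the cache when the directory layout changes, for example by comparing a hash of the top-level listing.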
### 2. Bash-Based Duplication Tool Execution (1,800 token savings)

**Pattern:** Use jscpd/PMD directly via bash
- JavaScript: `jscpd --format json` (300 tokens)
- Python: `pylint --duplicate-code` (300 tokens)
- Generic: `simian` or custom grep-based detection (400 tokens)
- Parse JSON output with `jq`
- No Task agents for duplication detection
- Savings: 90% vs Task-based duplication analysis
### 3. Sample-Based Duplication Reporting (900 token savings)

**Pattern:** Report first 10 duplication instances only
- Show top 10 duplications by severity (600 tokens)
- Count remaining duplications without details
- Full report via `--all` flag
- Savings: 65% vs reporting every duplication
### 4. Template-Based DRY Refactoring Recommendations (800 token savings)

**Pattern:** Use predefined DRY patterns
- Standard strategies: extract function, extract constant, inheritance, composition
- Pattern-based recommendations for duplication types
- No creative refactoring generation
- Savings: 80% vs LLM-generated DRY strategies
### 5. Incremental Duplication Checks (1,000 token savings)

**Pattern:** Check only changed files via git diff
- Analyze files modified since last commit (500 tokens)
- Check if changes introduce new duplication
- Full codebase analysis via `--full` flag
- Savings: 75% vs full codebase duplication detection
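A minimal sketch of that scoping decision; the `--full` flag name comes from this section, and the file-extension list is illustrative:

```shell
#!/bin/bash
SCAN_SCOPE="full"
if [ "${1:-}" != "--full" ]; then
  # Files touched since the last commit; empty outside a git repo or with no changes
  CHANGED=$(git diff --name-only HEAD -- '*.js' '*.ts' '*.py' 2>/dev/null)
  if [ -n "$CHANGED" ]; then
    SCAN_SCOPE="incremental"
    echo "Checking changed files only:"
    echo "$CHANGED"
  fi
fi
echo "Scan scope: $SCAN_SCOPE"
```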
### 6. Grep-Based Similar Pattern Discovery (700 token savings)

**Pattern:** Find potential duplications with grep
- Grep for repeated patterns: function signatures, constant values (300 tokens)
- Flag files with high similarity
- Run the full tool only on flagged file pairs
- Savings: 70% vs running the tool on all file combinations
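The pre-filter can be as simple as collecting signatures that occur verbatim in more than one file, then handing only those files to the full tool. The `demo/` files here are hypothetical stand-ins:

```shell
#!/bin/bash
# Hypothetical files: two share a signature, one does not
mkdir -p demo
printf 'function loadUser(id) {\n}\n' > demo/a.js
printf 'function loadUser(id) {\n}\n' > demo/b.js
printf 'function saveUser(u) {\n}\n'  > demo/c.js

# Signatures that appear in more than one file are candidates for a full scan
FLAGGED=$(grep -rh "^function " --include="*.js" demo | sort | uniq -d)
echo "Signatures seen in multiple files:"
echo "$FLAGGED"
```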
### 7. Cached Duplication Baseline (600 token savings)

**Pattern:** Store a baseline duplication report
- Cache the initial duplication report
- Compare new runs against the baseline to detect new duplications
- Focus on newly introduced duplications
- Savings: 80% by focusing on deltas
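A sketch of the delta comparison, using two hypothetical report files shaped like jscpd's statistics block; `sed` stands in for the `jq` parsing used elsewhere in this skill so the sketch carries no extra dependency:

```shell
#!/bin/bash
BASELINE="baseline.json"
CURRENT="current.json"

# Hypothetical reports in jscpd's statistics shape
echo '{"statistics":{"total":{"duplications":4}}}' > "$BASELINE"
echo '{"statistics":{"total":{"duplications":6}}}' > "$CURRENT"

extract_dups() {
  sed 's/.*"duplications":\([0-9]*\).*/\1/' "$1"
}

# Positive delta means the change set introduced new duplication
DELTA=$(( $(extract_dups "$CURRENT") - $(extract_dups "$BASELINE") ))
echo "New duplications since baseline: $DELTA"
```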
### 8. Threshold-Based Filtering (400 token savings)

**Pattern:** Filter out small duplications
- Default: 6+ lines of duplication
- Skip trivial duplications (imports, boilerplate)
- Adjustable threshold
- Savings: 70% by filtering noise
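Illustrated with a hypothetical findings list in `<lines>:<fileA>:<fileB>` form (the filenames are placeholders):

```shell
#!/bin/bash
MIN_LINES=6
FINDINGS="3:utils.js:helpers.js
12:orders.js:invoices.js
8:models.py:schemas.py"

# Keep only blocks of at least MIN_LINES duplicated lines
KEPT=$(echo "$FINDINGS" | awk -F: -v min="$MIN_LINES" '$1 >= min')
echo "$KEPT"
```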
## Real-World Token Usage Distribution
Typical operation patterns:
- Check recent changes (git diff scope): 1,200 tokens
- Show top 10 duplications: 1,400 tokens
- Full codebase analysis (first time): 3,000 tokens
- Cached baseline comparison: 800 tokens
- Refactoring recommendations (top 5): 1,800 tokens
- Most common: Incremental checks on recent changes
Expected per-analysis: 1,500-2,500 tokens (60% reduction from the 3,500-6,000 baseline).
Real-world average: 1,100 tokens (due to incremental checks, sample-based reporting, and the cached baseline).
**Arguments:** `$ARGUMENTS` - optional: minimum duplication threshold (default: 6 lines) or specific directories
DRY refactoring strategies:
- Extract function/method (most common)
- Extract constant/configuration
- Inheritance or composition
- Higher-order functions
- Template method pattern
- Strategy pattern for variations
## Phase 1: Tool Setup & Configuration
First, I'll set up duplication detection tools:
```bash
#!/bin/bash
# Duplication Detection - Setup & Configuration

echo "=== Code Duplication Detection ==="
echo ""

# Create analysis directory
mkdir -p .claude/duplication-analysis
ANALYSIS_DIR=".claude/duplication-analysis"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
REPORT="$ANALYSIS_DIR/duplication-report-$TIMESTAMP.md"
MIN_LINES="${1:-6}"  # Minimum lines for duplication detection

echo "Configuration:"
echo "  Minimum duplication threshold: $MIN_LINES lines"
echo "  Analysis directory: $ANALYSIS_DIR"
echo ""

# Detect project structure
echo "Detecting project structure..."
SOURCE_DIRS=""

# Common source directories
for dir in src lib app components pages services utils helpers; do
  if [ -d "$dir" ]; then
    SOURCE_DIRS="$SOURCE_DIRS $dir"
    echo "  ✓ Found: $dir/"
  fi
done

# Test directories
for dir in tests test __tests__; do
  if [ -d "$dir" ]; then
    echo "  ✓ Found test directory: $dir/"
  fi
done

if [ -z "$SOURCE_DIRS" ]; then
  echo "  Using current directory"
  SOURCE_DIRS="."
fi
echo ""
```
## Phase 2: Install Detection Tools
I'll install and configure duplication detection tools:
```bash
echo "=== Installing Duplication Detection Tools ==="
echo ""

install_jscpd() {
  # JSCPD - multi-language copy-paste detector
  if ! command -v jscpd >/dev/null 2>&1 && ! npm list -g jscpd >/dev/null 2>&1; then
    echo "Installing jscpd (copy-paste detector)..."
    if npm install -g jscpd 2>/dev/null || npm install --save-dev jscpd; then
      echo "✓ jscpd installed"
    else
      echo "⚠️ Failed to install jscpd - using basic detection"
      return 1
    fi
  else
    echo "✓ jscpd already installed"
  fi

  # Create jscpd configuration
  cat > "$ANALYSIS_DIR/.jscpd.json" << JSCPDCONFIG
{
  "threshold": $MIN_LINES,
  "reporters": ["json", "html"],
  "ignore": [
    "**/*.min.js",
    "**/node_modules/**",
    "**/dist/**",
    "**/build/**",
    "**/.git/**",
    "**/coverage/**",
    "**/__pycache__/**",
    "**/venv/**",
    "**/.venv/**"
  ],
  "format": ["javascript", "typescript", "python", "go", "java"],
  "absolute": false,
  "output": "$ANALYSIS_DIR/jscpd-report"
}
JSCPDCONFIG
  echo "✓ jscpd configuration created"
  return 0
}

# Install tool
if install_jscpd; then
  USE_JSCPD=true
else
  USE_JSCPD=false
  echo "ℹ️ Will use basic grep-based detection"
fi
echo ""
```
## Phase 3: Detect Code Duplication
I'll analyze the codebase for duplicated code:
```bash
echo "=== Analyzing Code Duplication ==="
echo ""

run_jscpd_analysis() {
  echo "Running jscpd analysis..."
  echo ""

  jscpd $SOURCE_DIRS \
    --config "$ANALYSIS_DIR/.jscpd.json" \
    2>&1 | tee "$ANALYSIS_DIR/jscpd-log.txt"

  # $? after a pipeline is tee's status; PIPESTATUS[0] is jscpd's
  if [ "${PIPESTATUS[0]}" -eq 0 ]; then
    echo ""
    echo "✓ jscpd analysis complete"

    # Check if duplications found
    if [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
      DUPLICATIONS=$(jq '.statistics.total.duplications' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
      PERCENTAGE=$(jq '.statistics.total.percentage' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
      echo ""
      echo "Duplication Statistics:"
      echo "  Total duplications: $DUPLICATIONS"
      echo "  Duplication percentage: $PERCENTAGE%"
      echo ""

      # Extract top duplications
      echo "Top Duplicated Blocks:"
      jq -r '.duplicates[] |
        "\(.format) - \(.lines) lines duplicated in \(.fragment) (\(.firstFile.name):\(.firstFile.start) and \(.secondFile.name):\(.secondFile.start))"' \
        "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null | head -10
    fi
  else
    echo "⚠️ jscpd analysis failed - see $ANALYSIS_DIR/jscpd-log.txt"
    return 1
  fi
}

run_basic_duplication_detection() {
  echo "Running basic duplication detection..."
  echo ""

  # Find similar function signatures
  echo "Detecting similar function signatures..."

  # JavaScript/TypeScript functions
  if find $SOURCE_DIRS \( -name "*.js" -o -name "*.jsx" -o -name "*.ts" -o -name "*.tsx" \) 2>/dev/null | grep -q .; then
    grep -rh "^function\|^const.*=.*=>.*{$\|^export function" \
      --include="*.js" --include="*.jsx" --include="*.ts" --include="*.tsx" \
      $SOURCE_DIRS 2>/dev/null | \
      sort | uniq -d | head -10 > "$ANALYSIS_DIR/duplicate-functions.txt"

    if [ -s "$ANALYSIS_DIR/duplicate-functions.txt" ]; then
      echo "  Found duplicate function signatures (JavaScript/TypeScript):"
      sed 's/^/    /' "$ANALYSIS_DIR/duplicate-functions.txt"
      echo ""
    fi
  fi

  # Python functions
  if find $SOURCE_DIRS -name "*.py" 2>/dev/null | grep -q .; then
    grep -rh "^def \|^async def " \
      --include="*.py" \
      $SOURCE_DIRS 2>/dev/null | \
      sort | uniq -d | head -10 > "$ANALYSIS_DIR/duplicate-python-functions.txt"

    if [ -s "$ANALYSIS_DIR/duplicate-python-functions.txt" ]; then
      echo "  Found duplicate function signatures (Python):"
      sed 's/^/    /' "$ANALYSIS_DIR/duplicate-python-functions.txt"
      echo ""
    fi
  fi

  # Find magic numbers (potential constants)
  echo "Detecting magic numbers (repeated literals)..."
  find $SOURCE_DIRS -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" \) 2>/dev/null | \
    xargs grep -oh "[0-9]\{2,\}" 2>/dev/null | \
    sort | uniq -c | sort -rn | head -10 > "$ANALYSIS_DIR/magic-numbers.txt"

  if [ -s "$ANALYSIS_DIR/magic-numbers.txt" ]; then
    echo "  Most common numeric literals:"
    sed 's/^/    /' "$ANALYSIS_DIR/magic-numbers.txt"
    echo ""
  fi

  # Find repeated strings
  echo "Detecting repeated string literals..."
  find $SOURCE_DIRS -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" \) 2>/dev/null | \
    xargs grep -oh "\"[^\"]\{10,\}\"" 2>/dev/null | \
    sort | uniq -c | sort -rn | head -10 > "$ANALYSIS_DIR/repeated-strings.txt"

  if [ -s "$ANALYSIS_DIR/repeated-strings.txt" ]; then
    echo "  Most common string literals:"
    sed 's/^/    /' "$ANALYSIS_DIR/repeated-strings.txt"
    echo ""
  fi
}

# Run the appropriate analysis
if [ "$USE_JSCPD" = true ]; then
  if ! run_jscpd_analysis; then
    echo "Falling back to basic detection..."
    run_basic_duplication_detection
  fi
else
  run_basic_duplication_detection
fi
```
## Phase 4: Generate DRY Refactoring Strategies
I'll provide specific refactoring patterns for eliminating duplication:
echo ""
echo "=== Generating DRY Refactoring Strategies ==="
echo ""
cat > "$ANALYSIS_DIR/dry-refactoring-patterns.md" << 'DRYPATTERNS'
# DRY Refactoring Patterns
Based on obra YAGNI/DRY principles: Don't Repeat Yourself
---
## Pattern 1: Extract Function
**Problem:** Same code block repeated in multiple places
**Solution:** Extract to a reusable function
### Before (JavaScript)
```javascript
// File 1
function processOrderA(order) {
  if (!order.customer || !order.customer.email) {
    throw new Error('Invalid customer');
  }
  if (!order.items || order.items.length === 0) {
    throw new Error('Empty order');
  }
  // ... process order
}

// File 2
function processOrderB(order) {
  if (!order.customer || !order.customer.email) {
    throw new Error('Invalid customer');
  }
  if (!order.items || order.items.length === 0) {
    throw new Error('Empty order');
  }
  // ... different processing
}
```

### After

```javascript
// utils/validation.js
export function validateOrder(order) {
  if (!order.customer?.email) {
    throw new Error('Invalid customer');
  }
  if (!order.items?.length) {
    throw new Error('Empty order');
  }
}

// File 1
import { validateOrder } from './utils/validation';

function processOrderA(order) {
  validateOrder(order);
  // ... process order
}

// File 2
import { validateOrder } from './utils/validation';

function processOrderB(order) {
  validateOrder(order);
  // ... different processing
}
```

**DRY Improvement:** 8 duplicated lines → 1 function call
---

## Pattern 2: Extract Configuration/Constants

**Problem:** Same literals repeated across the codebase
**Solution:** Centralize in configuration

### Before (Python)

```python
# Multiple files with repeated values
def calculate_shipping(weight):
    if weight < 10:
        return 5.99
    return 9.99

def check_free_shipping(total):
    return total >= 50.00

def apply_discount(quantity):
    if quantity >= 5:
        return 0.10
    return 0
```

### After

```python
# config/constants.py
class ShippingConfig:
    FREE_SHIPPING_THRESHOLD = 50.00
    LIGHT_PACKAGE_WEIGHT = 10
    LIGHT_PACKAGE_COST = 5.99
    HEAVY_PACKAGE_COST = 9.99

class DiscountConfig:
    BULK_QUANTITY = 5
    BULK_DISCOUNT = 0.10

# services/shipping.py
from config.constants import ShippingConfig, DiscountConfig

def calculate_shipping(weight):
    if weight < ShippingConfig.LIGHT_PACKAGE_WEIGHT:
        return ShippingConfig.LIGHT_PACKAGE_COST
    return ShippingConfig.HEAVY_PACKAGE_COST

def check_free_shipping(total):
    return total >= ShippingConfig.FREE_SHIPPING_THRESHOLD

def apply_discount(quantity):
    if quantity >= DiscountConfig.BULK_QUANTITY:
        return DiscountConfig.BULK_DISCOUNT
    return 0
```

**DRY Improvement:** Magic numbers eliminated, single source of truth
---

## Pattern 3: Template Method Pattern

**Problem:** Similar algorithms with slight variations
**Solution:** Use the template method or strategy pattern

### Before (TypeScript)

```typescript
class PDFReport {
  generate() {
    this.loadData();
    this.formatHeader();
    this.formatPDFContent();
    this.addPDFFooter();
    this.savePDF();
  }
  private loadData() { /* same logic */ }
  private formatHeader() { /* same logic */ }
  private formatPDFContent() { /* PDF-specific */ }
  private addPDFFooter() { /* PDF-specific */ }
  private savePDF() { /* PDF-specific */ }
}

class ExcelReport {
  generate() {
    this.loadData();
    this.formatHeader();
    this.formatExcelContent();
    this.addExcelFooter();
    this.saveExcel();
  }
  private loadData() { /* DUPLICATE logic */ }
  private formatHeader() { /* DUPLICATE logic */ }
  private formatExcelContent() { /* Excel-specific */ }
  private addExcelFooter() { /* Excel-specific */ }
  private saveExcel() { /* Excel-specific */ }
}
```

### After

```typescript
abstract class Report {
  // Template method
  generate() {
    this.loadData();
    this.formatHeader();
    this.formatContent();
    this.addFooter();
    this.save();
  }

  // Common implementations
  protected loadData() {
    // Shared logic
  }
  protected formatHeader() {
    // Shared logic
  }

  // Abstract methods for subclasses
  protected abstract formatContent(): void;
  protected abstract addFooter(): void;
  protected abstract save(): void;
}

class PDFReport extends Report {
  protected formatContent() { /* PDF-specific */ }
  protected addFooter() { /* PDF-specific */ }
  protected save() { /* PDF-specific */ }
}

class ExcelReport extends Report {
  protected formatContent() { /* Excel-specific */ }
  protected addFooter() { /* Excel-specific */ }
  protected save() { /* Excel-specific */ }
}
```

**DRY Improvement:** Shared logic in the base class, variations in subclasses
---

## Pattern 4: Higher-Order Functions

**Problem:** Similar operations with different behaviors
**Solution:** Use higher-order functions or callbacks

### Before (JavaScript)

```javascript
function processUsersForEmail(users) {
  const results = [];
  for (const user of users) {
    if (user.email && user.active) {
      results.push(user);
    }
  }
  return results;
}

function processUsersForSMS(users) {
  const results = [];
  for (const user of users) {
    if (user.phone && user.active) {
      results.push(user);
    }
  }
  return results;
}

function processUsersForPush(users) {
  const results = [];
  for (const user of users) {
    if (user.deviceToken && user.active) {
      results.push(user);
    }
  }
  return results;
}
```

### After

```javascript
function processUsers(users, channel) {
  const validators = {
    email: user => user.email && user.active,
    sms: user => user.phone && user.active,
    push: user => user.deviceToken && user.active
  };
  return users.filter(validators[channel]);
}

// Or, more flexibly, with a custom validator
function processUsers(users, isValid) {
  return users.filter(isValid);
}

// Usage
const emailUsers = processUsers(users, user => user.email && user.active);
const smsUsers = processUsers(users, user => user.phone && user.active);
const pushUsers = processUsers(users, user => user.deviceToken && user.active);
```

**DRY Improvement:** 3 functions → 1 configurable function
---

## Pattern 5: Composition Over Duplication

**Problem:** Repeated utility combinations
**Solution:** Compose smaller utilities

### Before (Go)

```go
// Repeated validation patterns
func ValidateUser(user User) error {
    if user.Email == "" {
        return errors.New("email required")
    }
    if !strings.Contains(user.Email, "@") {
        return errors.New("invalid email")
    }
    if user.Age < 18 {
        return errors.New("must be 18+")
    }
    return nil
}

func ValidateAdmin(admin Admin) error {
    if admin.Email == "" {
        return errors.New("email required")
    }
    if !strings.Contains(admin.Email, "@") {
        return errors.New("invalid email")
    }
    if admin.Age < 18 {
        return errors.New("must be 18+")
    }
    if admin.Role == "" {
        return errors.New("role required")
    }
    return nil
}
```

### After

```go
// Composable validators
func requireEmail(email string) error {
    if email == "" {
        return errors.New("email required")
    }
    return nil
}

func validateEmailFormat(email string) error {
    if !strings.Contains(email, "@") {
        return errors.New("invalid email")
    }
    return nil
}

func requireAdult(age int) error {
    if age < 18 {
        return errors.New("must be 18+")
    }
    return nil
}

func requireRole(role string) error {
    if role == "" {
        return errors.New("role required")
    }
    return nil
}

func runValidators(validators []func() error) error {
    for _, validate := range validators {
        if err := validate(); err != nil {
            return err
        }
    }
    return nil
}

// Compose validators
func ValidateUser(user User) error {
    validators := []func() error{
        func() error { return requireEmail(user.Email) },
        func() error { return validateEmailFormat(user.Email) },
        func() error { return requireAdult(user.Age) },
    }
    return runValidators(validators)
}

func ValidateAdmin(admin Admin) error {
    validators := []func() error{
        func() error { return requireEmail(admin.Email) },
        func() error { return validateEmailFormat(admin.Email) },
        func() error { return requireAdult(admin.Age) },
        func() error { return requireRole(admin.Role) },
    }
    return runValidators(validators)
}
```

**DRY Improvement:** Reusable validators, composable validation
---

## Pattern 6: Extract Test Helpers

**Problem:** Repeated test setup/teardown
**Solution:** Create test utilities

### Before

```javascript
// test/user.test.js
describe('User tests', () => {
  it('should create user', () => {
    const db = createDatabase();
    const user = { name: 'John', email: 'john@example.com' };
    // ... test logic
    db.close();
  });

  it('should update user', () => {
    const db = createDatabase();
    const user = { name: 'John', email: 'john@example.com' };
    // ... test logic
    db.close();
  });
});

// test/order.test.js
describe('Order tests', () => {
  it('should create order', () => {
    const db = createDatabase();
    const user = { name: 'John', email: 'john@example.com' };
    // ... test logic
    db.close();
  });
});
```

### After

```javascript
// test/helpers/testUtils.js
export function setupTestDatabase() {
  const db = createDatabase();
  return {
    db,
    cleanup: () => db.close()
  };
}

export function createTestUser(overrides = {}) {
  return {
    name: 'John',
    email: 'john@example.com',
    ...overrides
  };
}

// test/user.test.js
import { setupTestDatabase, createTestUser } from './helpers/testUtils';

describe('User tests', () => {
  let db;
  let cleanup;

  beforeEach(() => {
    ({ db, cleanup } = setupTestDatabase());
  });

  afterEach(() => {
    cleanup();
  });

  it('should create user', () => {
    const user = createTestUser();
    // ... test logic (cleaner!)
  });

  it('should update user', () => {
    const user = createTestUser({ name: 'Jane' });
    // ... test logic
  });
});
```

**DRY Improvement:** Shared test utilities, less boilerplate
---

## obra YAGNI/DRY Principles

### YAGNI: You Aren't Gonna Need It
- Don't add functionality until necessary
- Avoid premature abstraction
- Wait for duplication before extracting

### DRY: Don't Repeat Yourself
- Every piece of knowledge should have a single representation
- Avoid copy-paste programming
- Extract when you see duplication 2-3 times

### When to Extract
- Three strikes rule: duplicate 3 times → extract
- Clear pattern: similar code structure appears
- High maintenance cost: changes require updates in multiple places
- Business logic: domain rules should be centralized

### When NOT to Extract
- Premature abstraction: only seen once or twice
- Coincidental duplication: similar code, different concepts
- Temporary code: prototypes, experiments
- Over-engineering: the abstraction is more complex than the duplication
DRYPATTERNS
echo "✓ DRY refactoring patterns guide created"
## Phase 5: Generate Duplication Report
I'll create a comprehensive report with prioritized refactoring opportunities:
```bash
echo ""
echo "=== Generating Duplication Report ==="
echo ""
# Count duplications
TOTAL_DUPLICATIONS=0
DUPLICATION_PERCENTAGE=0
if [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
TOTAL_DUPLICATIONS=$(jq '.statistics.total.duplications' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
DUPLICATION_PERCENTAGE=$(jq '.statistics.total.percentage' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
fi
cat > "$REPORT" << EOF
# Code Duplication Analysis Report
**Generated:** $(date)
**Minimum Lines Threshold:** $MIN_LINES lines
**Total Duplications Found:** $TOTAL_DUPLICATIONS
**Duplication Percentage:** $DUPLICATION_PERCENTAGE%
---
## Summary
Code duplication violates the DRY (Don't Repeat Yourself) principle and leads to:
- Higher maintenance costs (fix bugs in multiple places)
- Increased bug likelihood (miss updating one location)
- Larger codebase (more code to understand)
- Lower code quality (copy-paste programming)
**Duplication Guidelines (obra principles):**
- **0-5%:** Excellent - minimal duplication
- **5-10%:** Good - acceptable for large projects
- **10-20%:** Moderate - consider refactoring
- **20%+:** High - significant technical debt
---
## Duplication Statistics
EOF
if [ "$USE_JSCPD" = true ] && [ -f "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" ]; then
cat >> "$REPORT" << EOF
### Overall Statistics
- Total Lines Analyzed: $(jq '.statistics.total.lines' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
- Duplicated Lines: $(jq '.statistics.total.clones' "$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null)
- Duplication Blocks: $TOTAL_DUPLICATIONS
- Duplication Percentage: $DUPLICATION_PERCENTAGE%
### Top Duplicated Files
EOF
jq -r '.statistics.formats[] |
"\(.format):\n Files: \(.sources)\n Duplications: \(.duplications)\n Percentage: \(.percentage)%\n"' \
"$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null >> "$REPORT"
cat >> "$REPORT" << EOF
### Largest Duplication Blocks
EOF
jq -r '.duplicates[:10] | .[] |
"**\(.lines) lines** duplicated in \(.format)\n- Location 1: \(.firstFile.name):\(.firstFile.start)\n- Location 2: \(.secondFile.name):\(.secondFile.start)\n"' \
"$ANALYSIS_DIR/jscpd-report/jscpd-report.json" 2>/dev/null >> "$REPORT"
cat >> "$REPORT" << EOF
**Full HTML Report:** Open \`$ANALYSIS_DIR/jscpd-report/html/index.html\` in browser
EOF
else
cat >> "$REPORT" << EOF
### Basic Analysis Results
EOF
if [ -f "$ANALYSIS_DIR/duplicate-functions.txt" ] && [ -s "$ANALYSIS_DIR/duplicate-functions.txt" ]; then
cat >> "$REPORT" << EOF
**Duplicate Function Signatures (JavaScript/TypeScript):**
\`\`\`
$(cat "$ANALYSIS_DIR/duplicate-functions.txt")
\`\`\`
EOF
fi
if [ -f "$ANALYSIS_DIR/duplicate-python-functions.txt" ] && [ -s "$ANALYSIS_DIR/duplicate-python-functions.txt" ]; then
cat >> "$REPORT" << EOF
**Duplicate Function Signatures (Python):**
\`\`\`
$(cat "$ANALYSIS_DIR/duplicate-python-functions.txt")
\`\`\`
EOF
fi
if [ -f "$ANALYSIS_DIR/magic-numbers.txt" ] && [ -s "$ANALYSIS_DIR/magic-numbers.txt" ]; then
cat >> "$REPORT" << EOF
**Repeated Numeric Literals:**
\`\`\`
$(cat "$ANALYSIS_DIR/magic-numbers.txt")
\`\`\`
Consider extracting to named constants.
EOF
fi
if [ -f "$ANALYSIS_DIR/repeated-strings.txt" ] && [ -s "$ANALYSIS_DIR/repeated-strings.txt" ]; then
cat >> "$REPORT" << EOF
**Repeated String Literals:**
\`\`\`
$(cat "$ANALYSIS_DIR/repeated-strings.txt")
\`\`\`
Consider extracting to configuration.
EOF
fi
fi
cat >> "$REPORT" << 'EOF'
---
## Recommended DRY Refactoring
### Priority 1: Extract Duplicated Functions
- Identify exact code duplicates (10+ lines)
- Extract to shared utility functions
- Consolidate in common modules
### Priority 2: Centralize Configuration
- Extract magic numbers to constants
- Create configuration files
- Use environment variables for deployment-specific values
### Priority 3: Eliminate Similar Patterns
- Identify similar code structures
- Extract common patterns
- Use higher-order functions or templates
### Priority 4: Consolidate Test Helpers
- Extract common test setup/teardown
- Create test utilities
- Share fixtures across test suites
**See detailed patterns:** `cat .claude/duplication-analysis/dry-refactoring-patterns.md`
---
## Implementation Steps

1. **Create Git Checkpoint**

```bash
git add -A
git commit -m "Pre DRY-refactoring checkpoint" || echo "No changes"
```

2. **Prioritize Duplications**
   - Start with the largest duplication blocks
   - Focus on frequently changed code
   - Consider domain importance

3. **Apply DRY Patterns**
   - Extract function (most common)
   - Extract configuration
   - Template method pattern
   - Higher-order functions
   - Composition

4. **Three Strikes Rule**
   - Wait for 3 occurrences before extracting
   - Avoid premature abstraction
   - Balance DRY with YAGNI

5. **Verify Improvements**
   - Re-run duplication analysis
   - Ensure tests pass
   - Check for reduced code size

6. **Document Changes**
   - Update documentation
   - Note refactoring decisions
   - Share patterns with the team

## Continuous Monitoring

Add duplication checks to CI/CD.

**Pre-commit Hook:**

```bash
#!/bin/bash
# .git/hooks/pre-commit - check duplication before commit
jscpd --threshold 6 . || exit 1
```

**GitHub Actions:**

```yaml
name: Code Quality
on: [push, pull_request]
jobs:
  duplication:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Check duplication
        run: |
          npm install -g jscpd
          jscpd --threshold 6 .
```

## Integration with Other Skills

- `/refactor` - Systematic code restructuring
- `/complexity-reduce` - Reduce cyclomatic complexity
- `/make-it-pretty` - Improve code readability
- `/review` - Include duplication in code reviews
- `/test` - Ensure tests after refactoring

## Resources

- obra/superpowers - YAGNI/DRY principles
- The Pragmatic Programmer - DRY principle
- Refactoring (Martin Fowler)
- JSCPD - copy-paste detector
- Rule of Three (refactoring)

---

**Next Steps:**
1. Review high-duplication areas
2. Select the appropriate DRY pattern
3. Implement refactoring incrementally
4. Re-run analysis to verify improvement
EOF
echo "✓ Duplication report generated: $REPORT"
```
## Summary
```bash
echo ""
echo "=== ✓ Duplication Analysis Complete ==="
echo ""
echo "📊 Analysis Results:"
if [ "$USE_JSCPD" = true ]; then
echo " Total duplications: $TOTAL_DUPLICATIONS"
echo " Duplication percentage: $DUPLICATION_PERCENTAGE%"
echo " Threshold: $MIN_LINES lines"
else
echo " Basic analysis complete"
echo " Install jscpd for detailed analysis: npm install -g jscpd"
fi
echo ""
echo "📁 Generated Files:"
echo " - Duplication Report: $REPORT"
echo " - DRY Patterns: $ANALYSIS_DIR/dry-refactoring-patterns.md"
[ -f "$ANALYSIS_DIR/jscpd-report/html/index.html" ] && echo " - HTML Report: $ANALYSIS_DIR/jscpd-report/html/index.html"
echo ""
echo "🎯 Recommended Actions:"
if [ -n "$DUPLICATION_PERCENTAGE" ] && [ "$DUPLICATION_PERCENTAGE" != "null" ] && [ "${DUPLICATION_PERCENTAGE%.*}" -gt 10 ] 2>/dev/null; then
echo " ⚠️ High duplication detected ($DUPLICATION_PERCENTAGE%)"
echo " 1. Review largest duplication blocks"
echo " 2. Extract duplicated functions"
echo " 3. Centralize configuration"
echo " 4. Run tests after refactoring"
elif [ -n "$DUPLICATION_PERCENTAGE" ] && [ "$DUPLICATION_PERCENTAGE" != "null" ] && [ "${DUPLICATION_PERCENTAGE%.*}" -gt 5 ] 2>/dev/null; then
echo " Moderate duplication ($DUPLICATION_PERCENTAGE%)"
echo " Consider refactoring during feature work"
else
echo " ✓ Low duplication - excellent!"
echo " Continue monitoring in code reviews"
fi
echo ""
echo "💡 DRY Refactoring Patterns:"
echo " 1. Extract Function (consolidate duplicates)"
echo " 2. Extract Configuration (centralize constants)"
echo " 3. Template Method (share algorithm structure)"
echo " 4. Higher-Order Functions (parameterize behavior)"
echo " 5. Composition (build from smaller pieces)"
echo ""
echo "📏 obra YAGNI/DRY Guidelines:"
echo " - Three strikes rule: Extract after 3 duplications"
echo " - Avoid premature abstraction"
echo " - Balance DRY with readability"
echo ""
echo "🔗 Integration Points:"
echo " - /refactor - Systematic restructuring"
echo " - /complexity-reduce - Reduce complexity"
echo " - /review - Include in code reviews"
echo ""
echo "View report: cat $REPORT"
echo "View patterns: cat $ANALYSIS_DIR/dry-refactoring-patterns.md"
[ -f "$ANALYSIS_DIR/jscpd-report/html/index.html" ] && echo "View HTML: open $ANALYSIS_DIR/jscpd-report/html/index.html"
```
## Safety Guarantees

**What I'll NEVER do:**
- Automatically refactor without analysis
- Extract after single occurrence (violates YAGNI)
- Remove code without understanding context
- Skip testing after refactoring
- Add AI attribution to commits
**What I WILL do:**
- Identify genuine code duplication
- Suggest proven DRY patterns
- Follow obra YAGNI principles
- Recommend incremental refactoring
- Preserve functionality
- Provide clear examples
## Credits
This skill is based on:
- obra/superpowers - YAGNI and DRY principles
- The Pragmatic Programmer - DRY principle origin
- Martin Fowler - Refactoring patterns
- JSCPD - Multi-language duplication detection
- Rule of Three - When to extract abstraction
## Token Budget
Target: 2,000-3,500 tokens per execution
- Phase 1-2: ~600 tokens (setup + tool installation)
- Phase 3: ~1,000 tokens (duplication detection)
- Phase 4-5: ~1,500 tokens (patterns + reporting)
Optimization Strategy:
- Use jscpd for efficient detection
- Fallback to grep-based analysis
- Template-based pattern generation
- Focus on actionable duplications
- Clear DRY refactoring guidance
This ensures thorough duplication detection with practical DRY refactoring strategies based on obra principles while respecting token limits.