Humanize: AI Pattern Detection and Removal

Remove AI-generated writing patterns from text. Produce natural, human-sounding output that preserves meaning.

This is not a generic rewriter. It targets specific, documented AI-writing patterns catalogued by Wikipedia's WikiProject AI Cleanup from thousands of observed instances.

Workflow

Five phases. Each phase has a clear input, transformation, and output. Do not skip phases.

Phase 1: Detection Scan

Read the input text. Load references/detection-patterns.md. Scan for two categories of signals:

A. Lexical patterns (the 24 catalogued AI-writing patterns):

Category	Patterns	Priority
Content inflation	Significance puffing, notability claims, superficial -ing analyses, promotional language, vague attributions, formulaic challenges sections	HIGH — loudest AI tells
Vocabulary	AI-frequency words, copula avoidance, filler phrases, excessive hedging	HIGH — statistically detectable
Structure	Rule of three, negative parallelisms, elegant variation, false ranges, inline-header lists	MEDIUM — structural fingerprints
Style	Em dash overuse, boldface overuse, title case headings, emoji decoration, curly quotes	MEDIUM — formatting tells
Communication	Chatbot artifacts, knowledge-cutoff disclaimers, sycophantic tone, generic conclusions	LOW — obvious, usually caught by author

B. Statistical regularity signals (see references/statistical-signals.md):

Signal	What to look for
Sentence length uniformity	Sentences clustering within a narrow word-count range
Low clause density variation	Every sentence has the same number of clauses
Flat information density	Every sentence carries roughly the same amount of detail
High-frequency phrase templates	Stock collocations and common bigrams/trigrams dominating the text
Excessive transition markers	Formal connectives appearing more than 8 times per 1,000 words
Structural symmetry	Paragraphs and sentences following balanced, mirror-like patterns
Uniform inter-sentence cohesion	Every sentence tightly follows the previous with no topic shifts or digressions
Generic function word usage	Connectors and prepositions used in textbook-standard distribution with no personal tendencies
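
Two of the signals above are easy to approximate mechanically. The sketch below is one possible way to measure sentence length uniformity and transition frequency. The sentence splitter is deliberately naive, and the `TRANSITIONS` set is a hypothetical sample, not the list from references/statistical-signals.md. Only the limit of 8 per 1,000 words comes from the table.

```python
import re
import statistics

# Hypothetical sample of formal connectives; the real list is assumed to
# live in references/statistical-signals.md.
TRANSITIONS = {"however", "moreover", "furthermore", "additionally",
               "consequently", "therefore"}
TRANSITION_LIMIT = 8  # per 1,000 words, taken from the table above


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_length_sd(text):
    """Population standard deviation of sentence lengths, in words."""
    lengths = [len(s.split()) for s in split_sentences(text)]
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0


def transitions_per_1000(text):
    """Occurrences of the listed connectives per 1,000 words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in TRANSITIONS)
    return 1000 * hits / len(words)


def excessive_transitions(text):
    """True when the connective rate exceeds the documented limit."""
    return transitions_per_1000(text) > TRANSITION_LIMIT
```

A low standard deviation suggests uniform sentence length. The other signals (clause density, information density, cohesion) need real linguistic analysis and are not covered here.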

Output a detection report using the template under Detection Only in the Output Format section.

Instance severity rating:

Severity	Criteria
HIGH	3+ patterns co-occurring in a single paragraph, or any paragraph saturated with AI vocabulary (5+ signal words)
MEDIUM	1-2 patterns in a paragraph, or a statistical signal present across 3+ consecutive sentences
LOW	Isolated single instance of any pattern, or a borderline statistical signal
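
The severity table can be read as a small decision function. The sketch below is one interpretation, not a definitive rule. The MEDIUM and LOW rows both cover a paragraph with one pattern, so the code assumes that a single isolated instance is LOW and that two or more instances of one or two patterns are MEDIUM. The function name and its parameters are hypothetical.

```python
def rate_paragraph(distinct_patterns, instances, signal_words=0, stat_run=0):
    """Rate one paragraph as 'HIGH', 'MEDIUM', 'LOW', or None when clean.

    distinct_patterns: number of different catalogued patterns found
    instances:         total pattern occurrences in the paragraph
    signal_words:      count of AI-vocabulary signal words
    stat_run:          consecutive sentences showing a statistical signal
    """
    # HIGH: 3+ co-occurring patterns, or 5+ signal words.
    if distinct_patterns >= 3 or signal_words >= 5:
        return "HIGH"
    # MEDIUM: a statistical signal across 3+ consecutive sentences, or
    # (assumption) repeated instances of one or two patterns.
    if stat_run >= 3 or (distinct_patterns >= 1 and instances >= 2):
        return "MEDIUM"
    # LOW: a single isolated instance, or a shorter (borderline) signal run.
    if instances == 1 or stat_run > 0:
        return "LOW"
    return None
```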

Phase 2: Structural Rewrite

Transform document structure to break AI-typical organization:

Convert uniform paragraph lengths to varied blocks
Merge or split sentences to break rhythmic uniformity
Reorder clauses where meaning permits
Convert formulaic list structures to narrative where appropriate
Remove tripartite constructions unless the content genuinely has three parts

Do not change factual content. Do not add information. Do not remove cited sources, data, or technical terms.

Phase 3: Vocabulary and Style Pass

Apply pattern-specific rewrites from the detection report:

Replace AI-frequency vocabulary with natural alternatives
Restore simple copulas (is/are/has) where the text uses elaborate substitutes
Remove filler phrases and excessive hedging
Cut promotional language and significance inflation
Replace vague attributions with specific ones (or remove if no source exists)
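
Two of the rewrites above lend themselves to a simple lookup. The sketch below counts AI-frequency words and restores plain copulas. The word list holds only the three examples that appear in this document's sample report, and the plural mapping "serve as" to "are" is an assumption. A blind substitution like this can misfire, so treat its output as a suggestion to review.

```python
import re

# The three words quoted in this document's sample detection report; the
# full list is assumed to live in references/detection-patterns.md.
SIGNAL_WORDS = {"delve", "intricate", "pivotal"}

COPULA_SUBSTITUTES = {
    "serves as": "is",   # pairing taken from the sample detection report
    "serve as": "are",   # assumed plural counterpart
}

_COPULA_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in COPULA_SUBSTITUTES) + r")\b",
    re.IGNORECASE,
)


def count_signal_words(text):
    """Count occurrences of the listed AI-frequency words."""
    return sum(1 for w in re.findall(r"[a-z]+", text.lower())
               if w in SIGNAL_WORDS)


def restore_copulas(text):
    """Replace elaborate copula substitutes with a plain 'is' or 'are'."""
    return _COPULA_RE.sub(
        lambda m: COPULA_SUBSTITUTES[m.group(1).lower()], text)
```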

Load the appropriate style profile from references/style-guide.md based on the target domain. Apply domain-specific voice calibration.

Phase 4: Entropy and Variation

Human writing has burstiness — irregular rhythm, varied sentence lengths, uneven information density. AI text is statistically smooth. This phase breaks that smoothness.

Load references/statistical-signals.md for target ranges. Apply:

Sentence length variance: mix short declarative with longer explanatory. Target visible variance across any 5-sentence window.
Clause density variation: alternate simple sentences (one clause) with compound/complex (2-3 clauses). Do not settle on a uniform clause count.
Information density variation: let some sentences carry heavy detail while others are light — a summary statement, a reaction, a pivot. Uniform density reads as generated.
Phrase template breaking: replace stock collocations with specific phrasings. "Play a role in" -> name the specific action. "In terms of" -> delete or restructure.
Inter-sentence cohesion variation: not every sentence should tightly follow the previous. Allow small topic expansions, brief asides, or contextual jumps that a thinking human would make.
Function word personalization: vary connector usage. Use "but" in one place, "still" in another, nothing in a third. Do not default to the same conjunction pattern throughout.
Paragraph length variance: mix single-sentence paragraphs with 4-5 sentence blocks.
Controlled imperfection: fragments at impact positions, parenthetical asides, concessive turns. Sparingly — seasoning, not structure.
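
The sentence length target can be spot-checked after the pass. The sketch below slides a 5-sentence window over the text and reports windows where the lengths barely vary. The `min_sd` threshold of 3 words is an assumption borrowed from the sample report in this document. The actual target ranges belong in references/statistical-signals.md.

```python
import re
import statistics


def flat_windows(text, window=5, min_sd=3.0):
    """Return start indexes of sentence windows with too little variance.

    min_sd is an assumed threshold, not a documented target range.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    flagged = []
    for start in range(len(lengths) - window + 1):
        chunk = lengths[start:start + window]
        if statistics.pstdev(chunk) < min_sd:
            flagged.append(start)
    return flagged
```

An empty result means every window shows at least the assumed variance. Texts shorter than the window are never flagged.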

Phase 5: Validation and Output

Two checks before delivering:

Semantic check: Compare rewrite against original. Every factual claim, data point, argument, and technical term in the original must be present in the rewrite. If anything was lost, restore it.

Self-audit: Ask internally: "What still sounds AI-generated about this text?" If residual patterns remain, fix them. One pass only — do not loop indefinitely.

Output the final text followed by a brief changes summary.
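
Part of the semantic check can be automated. The sketch below is a rough aid, not a complete check. It extracts numbers and URLs from the original and lists any that no longer appear in the rewrite. Names, dates written in words, arguments, and technical terms still need a manual comparison. The regular expression and function names are assumptions.

```python
import re

# Matches URLs first, then numbers such as 2023, 12.5 or 1,000.
DATA_TOKEN = re.compile(r"https?://\S+|\d[\d,.]*\d|\d")


def data_tokens(text):
    """Extract URLs and numbers, trimming trailing punctuation."""
    return [t.rstrip(".,;:)") for t in DATA_TOKEN.findall(text)]


def missing_data(original, rewrite):
    """List tokens found in the original but absent from the rewrite."""
    kept = set(data_tokens(rewrite))
    missing = []
    for token in data_tokens(original):
        if token not in kept and token not in missing:
            missing.append(token)
    return missing
```

Any token this returns points to a claim to restore from the original before delivery.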

Output Format

Full Rewrite / Targeted Fix / Style Shift

[Humanized text]

---
Changes: [2-4 bullet summary of what was changed and why]
Patterns detected: [list of pattern numbers/names found]
Domain: [detected or specified domain]

For short texts (under 100 words), skip the changes summary unless the user requests it.
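
The output layout can be assembled as below. This is a minimal sketch under two assumptions: the 100-word threshold is applied to the humanized text, and the whole trailer (changes, patterns, domain) is dropped for short texts. The function and parameter names are hypothetical.

```python
def format_output(humanized, changes, patterns, domain, force_summary=False):
    """Return the humanized text, with the summary trailer when warranted."""
    # Assumption: "short" is measured on the humanized text itself.
    if len(humanized.split()) < 100 and not force_summary:
        return humanized
    lines = [humanized, "", "---", "Changes:"]
    lines += [f"- {change}" for change in changes]
    lines.append("Patterns detected: " + ", ".join(patterns))
    lines.append(f"Domain: {domain}")
    return "\n".join(lines)
```

Setting `force_summary=True` covers the case where the user asks for the summary on a short text.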

Detection Only

## Detection Report

**Domain:** [detected or specified]
**Overall severity:** [HIGH / MEDIUM / LOW]
**Patterns found:** [count]

### Findings

| Location | Pattern | Severity | Evidence |
|----------|---------|----------|----------|
| Para 1 | #7 AI vocabulary | HIGH | "delve", "intricate", "pivotal" in same sentence |
| Para 2 | #8 Copula avoidance | MEDIUM | "serves as" instead of "is" |
| Para 1-4 | Sentence length uniformity | MEDIUM | All sentences 18-22 words, SD < 3 |
| ... | ... | ... | ... |

### Statistical Signals

| Signal | Status | Detail |
|--------|--------|--------|
| Sentence length variance | FLAG | SD ~3 words (human typical: 7-15) |
| Transition frequency | OK | 5 per 1,000 words |
| ... | ... | ... |

### Summary
[1-2 sentences: overall assessment and highest-priority patterns to fix first]

Reference Files

File	Purpose	Load When
`references/detection-patterns.md`	24 AI-writing patterns with examples	Always (Phase 1)
`references/statistical-signals.md`	12 statistical regularity signals with target ranges	Phase 1 (scan) and Phase 4 (targets)
`references/style-guide.md`	Domain-specific voice profiles and calibration rules	Phase 3 (match to domain)
`references/transformation-rules.md`	Structural rewrite strategies and entropy techniques	Phase 2 and Phase 4
`examples/academic.md`	Before/after pairs for academic writing	When domain is academic
`examples/blog.md`	Before/after pairs for blog/casual writing	When domain is blog or social
`examples/professional.md`	Before/after pairs for professional/business writing	When domain is professional

Domain Detection

If the user does not specify a domain, infer from:

Vocabulary density and jargon type
Citation patterns
Sentence complexity
Register (formal/informal markers)

Default to professional if ambiguous.

Supported domains: academic, technical, blog, social, professional, marketing
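
The fallback rule can be sketched as a small resolver. It assumes an upstream step has already turned the four cues above into per-domain confidence scores. That scoring step, the `scores` mapping, and the `margin` value are all hypothetical. Only the supported domain list and the professional default come from this document.

```python
SUPPORTED_DOMAINS = ("academic", "technical", "blog", "social",
                     "professional", "marketing")
DEFAULT_DOMAIN = "professional"


def resolve_domain(specified=None, scores=None, margin=0.1):
    """Pick a domain: user choice first, then a clear winner, else default.

    scores: hypothetical mapping of domain name to confidence (0 to 1).
    margin: assumed minimum lead for the top domain to count as unambiguous.
    """
    if specified:
        name = specified.strip().lower()
        if name in SUPPORTED_DOMAINS:
            return name
    if scores:
        ranked = sorted(
            ((score, name) for name, score in scores.items()
             if name in SUPPORTED_DOMAINS),
            reverse=True,
        )
        if ranked and (len(ranked) == 1
                       or ranked[0][0] - ranked[1][0] >= margin):
            return ranked[0][1]
    return DEFAULT_DOMAIN
```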

Behavioral Constraints

Never fabricate. Do not add facts, citations, quotes, statistics, or claims not in the original.
Never remove data. Numbers, dates, names, URLs, and cited sources must survive the rewrite.
Preserve argument structure. If the original makes points A, B, C in that order with that logic, the rewrite must preserve the logical flow.
Do not over-humanize. Some text is meant to be neutral and informational. A technical specification does not need personality. Match the appropriate register.
Respect code blocks and structured data. Do not humanize code, tables, JSON, YAML, or any structured/machine-readable content. Pass these through unchanged.
One pass through the pipeline. Do not run the 5-phase pipeline recursively. If the output still has tells after Phase 5, note them in the changes summary rather than looping.
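
The pass-through rule for code blocks can be enforced by splitting the input before any rewrite. The sketch below handles only fenced code blocks. Tables, JSON, and YAML outside a fence would need their own detection. The `rewrite` callable stands in for the prose pipeline and is hypothetical.

```python
import re

FENCE_MARK = "`" * 3  # three backticks, built this way to keep the listing clean
FENCED_BLOCK = re.compile(
    "(" + FENCE_MARK + ".*?" + FENCE_MARK + ")", re.DOTALL)


def humanize_prose_only(text, rewrite):
    """Apply rewrite() to prose segments and pass fenced blocks through."""
    parts = FENCED_BLOCK.split(text)
    # With one capturing group, odd indexes hold the fenced blocks.
    return "".join(
        part if index % 2 else rewrite(part)
        for index, part in enumerate(parts)
    )
```

An unterminated fence does not match, so the remaining text is treated as prose. A stricter version might refuse to rewrite anything after an unmatched opening fence.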

Scope Modes

Mode	Trigger	Behavior
Full rewrite	"humanize this", "rewrite naturally"	Run all 5 phases
Detection only	"check for AI patterns", "does this sound AI"	Run Phase 1 only, output detection report
Targeted fix	"fix the AI-sounding parts", "just clean up the obvious stuff"	Run Phase 1, then apply fixes only to HIGH-priority patterns
Style shift	"make this more casual/academic/professional"	Run Phases 3-4 with specified domain profile
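
The mode table maps naturally onto a lookup. The sketch below matches the trigger phrases by case-insensitive substring. That matching rule is an assumption, and a real request would likely need looser matching. Returning `None` for an unmatched request is also an assumption, since the table does not name a default mode.

```python
TRIGGERS = {
    "full_rewrite": ("humanize this", "rewrite naturally"),
    "detection_only": ("check for ai patterns", "does this sound ai"),
    "targeted_fix": ("fix the ai-sounding parts",
                     "just clean up the obvious stuff"),
    "style_shift": ("make this more",),
}

PHASES = {
    "full_rewrite": (1, 2, 3, 4, 5),
    "detection_only": (1,),
    "targeted_fix": (1,),   # followed by fixes to HIGH-priority patterns only
    "style_shift": (3, 4),
}


def select_mode(request):
    """Return the first mode whose trigger phrase appears in the request."""
    lowered = request.lower()
    for mode, phrases in TRIGGERS.items():
        if any(phrase in lowered for phrase in phrases):
            return mode
    return None
```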

Error Handling

Problem	Cause	Resolution
Input under 20 words	Insufficient signal for pattern detection	Report: "Text too short for reliable pattern detection." Apply vocabulary fixes only (Phase 3) if obvious patterns are present. Skip statistical signal analysis.
Input is entirely code/structured data	No prose to humanize	Report: "Input is structured data — no humanization applicable." Return input unchanged.
Mixed human + AI text	Partial AI generation or human-edited AI output	Run Phase 1 on full text. Flag only paragraphs/sections with detected patterns. Apply Phases 2-4 selectively to flagged sections. Leave clean sections untouched.
Domain ambiguous after detection	Input mixes registers (e.g., academic citations in a blog post)	Default to professional. Note the ambiguity in the output: "Domain defaulted to professional — specify if another profile is preferred."
Semantic drift detected in Phase 5	Rewrite altered meaning during structural/vocabulary changes	Restore the drifted factual claim from the original. Do not re-run the full pipeline. Note the restoration in the changes summary.
Input contains fabricated citations	Original text has hallucinated sources	Not detectable: this skill humanizes style and does not check factual accuracy. Pass through unchanged. If the user asks about accuracy, point them to the Limitations section.
All patterns are LOW severity	Text is mostly human-written with minor tells	In targeted fix mode, report findings but recommend no changes. In full rewrite mode, apply light-touch fixes only — do not over-edit clean text.
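
The first two rows of the table can run as a preflight check before Phase 1. The sketch below covers only JSON for the structured-data case. Detecting YAML or source code would need additional heuristics that this document does not specify. The status names and the function are hypothetical.

```python
import json

MIN_WORDS = 20
TOO_SHORT_MSG = "Text too short for reliable pattern detection."


def is_json(text):
    """True when the text parses as a JSON object or array."""
    try:
        parsed = json.loads(text)
    except ValueError:
        return False
    return isinstance(parsed, (dict, list))


def preflight(text):
    """Return (status, phases_to_run) for the raw input."""
    stripped = text.strip()
    if is_json(stripped):
        return ("structured", ())      # return the input unchanged
    if len(stripped.split()) < MIN_WORDS:
        return ("too_short", (3,))     # vocabulary fixes only, if warranted
    return ("ok", (1, 2, 3, 4, 5))
```

On a `too_short` status the caller would report `TOO_SHORT_MSG` and skip statistical analysis, as the table describes.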

Integration Point

Other writing skills can import references/detection-patterns.md as a pattern library for their own anti-pattern sweeps. The detection patterns are the shared asset; the pipeline is this skill's domain.

Limitations

Cannot verify factual accuracy of the original text. Garbage in, humanized garbage out.
Effectiveness depends on input length. Very short texts (under 20 words) have insufficient signal for pattern detection.
Style profiles are guidelines, not voice cloning. The output will sound natural but will not match a specific author's voice without additional calibration.
Does not interact with external AI-detection APIs. Assessment is heuristic, not benchmark-verified.
