Test Data Management
<default_to_action> When creating or managing test data:
- NEVER use production PII directly
- GENERATE synthetic data with faker libraries
- ANONYMIZE production data if used (mask, hash)
- ISOLATE test data (transactions, per-test cleanup)
- SCALE with batch generation (10k+ records/sec)
Quick Data Strategy:
- Unit tests: Minimal data (just enough)
- Integration: Realistic data (full complexity)
- Performance: Volume data (10k+ records)
Critical Success Factors:
- 40% of test failures stem from inadequate data
- GDPR fines up to €20M or 4% of annual global turnover for PII violations
- Never store production PII in test environments </default_to_action>
Quick Reference Card
When to Use
- Creating test datasets
- Handling sensitive data
- Performance testing with volume
- GDPR/CCPA compliance
Data Strategies
| Type | When | Size |
|------|------|------|
| Minimal | Unit tests | 1-10 records |
| Realistic | Integration | 100-1000 records |
| Volume | Performance | 10k+ records |
| Edge cases | Boundary testing | Targeted |
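The "Edge cases" strategy can be made concrete with a small helper that derives boundary values from a numeric constraint. This is a minimal sketch; `boundaryValues` is a hypothetical name introduced for illustration and assumes an inclusive integer range.

```typescript
// Hypothetical helper: derive boundary-test values for an inclusive integer range.
function boundaryValues(min: number, max: number): number[] {
  if (min > max) {
    throw new Error("min must not exceed max");
  }
  // Just outside, on, and just inside each boundary.
  const candidates = [min - 1, min, min + 1, max - 1, max, max + 1];
  // Drop duplicates (narrow ranges overlap) and sort ascending.
  const unique = candidates.filter((v, i) => candidates.indexOf(v) === i);
  return unique.sort((a, b) => a - b);
}

// Example: an age field constrained to 18..90
const ageEdgeCases = boundaryValues(18, 90);
```

Values just outside the range (17 and 91 here) are meant for negative tests that expect validation to reject them.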
Privacy Techniques
| Technique | Use Case |
|-----------|----------|
| Synthetic | Generate fake data (preferred) |
| Masking | j***@example.com |
| Hashing | Irreversible pseudonymization |
| Tokenization | Reversible with key |
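To show how tokenization differs from masking and hashing, here is a minimal sketch of a vault-based variant, where the lookup table plays the role of the key. `TokenVault` and the `tok_N` token format are assumptions made for illustration; a real deployment would keep the vault in a secured store.

```typescript
// Hypothetical in-memory token vault: tokens are reversible only for
// whoever holds the vault, unlike masked or hashed values.
class TokenVault {
  private valueByToken = new Map<string, string>();
  private tokenByValue = new Map<string, string>();
  private counter = 0;

  // Return a stable token: the same input always maps to the same token.
  tokenize(value: string): string {
    const existing = this.tokenByValue.get(value);
    if (existing !== undefined) {
      return existing;
    }
    this.counter += 1;
    const token = `tok_${this.counter}`;
    this.tokenByValue.set(value, token);
    this.valueByToken.set(token, value);
    return token;
  }

  // Recover the original value; returns undefined for unknown tokens.
  detokenize(token: string): string | undefined {
    return this.valueByToken.get(token);
  }
}
```

Because tokens are stable, a value tokenized in two tables still joins correctly, which keeps relationships intact.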
Synthetic Data Generation
import { faker } from '@faker-js/faker';

// Seed for reproducibility
faker.seed(123);
// Pin the reference date as well, otherwise relative helpers such as
// faker.date.past() depend on the current time (the date is an arbitrary example)
faker.setDefaultRefDate('2024-01-01T00:00:00.000Z');

function generateUser() {
  return {
    id: faker.string.uuid(),
    email: faker.internet.email(),
    firstName: faker.person.firstName(),
    lastName: faker.person.lastName(),
    phone: faker.phone.number(),
    address: {
      street: faker.location.streetAddress(),
      city: faker.location.city(),
      zip: faker.location.zipCode()
    },
    createdAt: faker.date.past()
  };
}

// Generate 1000 users
const users = Array.from({ length: 1000 }, generateUser);
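Why seeding gives reproducible data can be illustrated without any library. The sketch below uses a small linear congruential generator as a stand-in; it is not the algorithm faker uses, and `createSeededRandom`, `generateNames`, and the name list are hypothetical.

```typescript
// Stand-in PRNG (linear congruential generator, Numerical Recipes constants).
// This is NOT faker's algorithm; it only illustrates the effect of seeding.
function createSeededRandom(seed: number): () => number {
  let state = seed >>> 0; // normalise to an unsigned 32-bit integer
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // value in [0, 1)
  };
}

const FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret", "Alan"];

// Generate `count` names; the same seed always yields the same list.
function generateNames(seed: number, count: number): string[] {
  const random = createSeededRandom(seed);
  const names: string[] = [];
  for (let i = 0; i < count; i++) {
    names.push(FIRST_NAMES[Math.floor(random() * FIRST_NAMES.length)]);
  }
  return names;
}
```

Two runs with the same seed produce identical fixtures, so a failing test can be replayed against exactly the data that triggered it.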
Test Data Builder Pattern
import { faker } from '@faker-js/faker';

// Assumed shape of User for this example
interface User {
  id: string;
  email: string;
  role: 'admin' | 'customer';
  permissions: string[];
}

class UserBuilder {
  private user: Partial<User> = {};

  asAdmin() {
    this.user.role = 'admin';
    this.user.permissions = ['read', 'write', 'delete'];
    return this;
  }

  asCustomer() {
    this.user.role = 'customer';
    this.user.permissions = ['read'];
    return this;
  }

  withEmail(email: string) {
    this.user.email = email;
    return this;
  }

  // Fill in defaults for anything the test did not set explicitly
  build(): User {
    return {
      id: this.user.id ?? faker.string.uuid(),
      email: this.user.email ?? faker.internet.email(),
      role: this.user.role ?? 'customer',
      permissions: this.user.permissions ?? ['read']
    };
  }
}

// Usage
const admin = new UserBuilder().asAdmin().withEmail('admin@test.com').build();
const customer = new UserBuilder().asCustomer().build();
Data Anonymization
// Masking
function maskEmail(email) {
  const [user, domain] = email.split('@');
  return `${user[0]}***@${domain}`;
}
// john@example.com → j***@example.com

function maskCreditCard(cc) {
  return `****-****-****-${cc.slice(-4)}`;
}
// 4242424242424242 → ****-****-****-4242
// Anonymize production data (prodUsers: rows loaded from a production snapshot)
const anonymizedUsers = prodUsers.map(user => ({
  id: user.id,                          // Keep ID for relationships
  email: `user-${user.id}@example.com`, // Fake email
  firstName: faker.person.firstName(),  // Generated
  phone: null,                          // Remove PII
  createdAt: user.createdAt             // Keep non-PII
}));
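After anonymizing, it is worth checking that no PII value survived unchanged. The following is a minimal sketch of such a check; `findLeakedFields` and the `Row` type are hypothetical, and rows are assumed to be paired by position.

```typescript
type Row = Record<string, unknown>;

// Hypothetical check: list PII fields whose value survived anonymization unchanged.
// Rows are paired by position; null/undefined originals are ignored.
function findLeakedFields(original: Row[], anonymized: Row[], piiFields: string[]): string[] {
  const leaks: string[] = [];
  const rowCount = Math.min(original.length, anonymized.length);
  for (let i = 0; i < rowCount; i++) {
    for (const field of piiFields) {
      const before = original[i][field];
      if (before !== null && before !== undefined && anonymized[i][field] === before) {
        leaks.push(`row ${i}: ${field}`);
      }
    }
  }
  return leaks;
}
```

An empty result only shows that the listed fields changed; it does not prove the data is free of PII in other columns or in free-text fields.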
Database Transaction Isolation
// Best practice: use transactions for cleanup
beforeEach(async () => {
  await db.beginTransaction();
});

afterEach(async () => {
  await db.rollbackTransaction(); // Auto cleanup!
});

test('user registration', async () => {
  const user = await userService.register({ email: 'test@example.com' });
  expect(user.id).toBeDefined();
  // Automatic rollback after test - no cleanup needed
});
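The reason rollback gives every test a clean slate can be shown with a toy store. This sketch is an assumption-laden illustration, not how a real database implements transactions; `InMemoryUserStore` is a hypothetical class that snapshots its rows on begin and restores them on rollback.

```typescript
// Hypothetical in-memory store showing why rollback isolates tests.
class InMemoryUserStore {
  private rows: string[] = [];
  private snapshot: string[] | null = null;

  beginTransaction(): void {
    if (this.snapshot !== null) {
      throw new Error("transaction already open");
    }
    this.snapshot = this.rows.slice(); // copy the current state
  }

  insert(email: string): void {
    this.rows.push(email);
  }

  rollbackTransaction(): void {
    if (this.snapshot === null) {
      throw new Error("no open transaction");
    }
    this.rows = this.snapshot; // discard everything written since begin
    this.snapshot = null;
  }

  count(): number {
    return this.rows.length;
  }
}
```

Whatever a test inserts between begin and rollback disappears, so the next test starts from the same baseline regardless of execution order.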
Volume Data Generation
// Generate 10,000 users efficiently
async function generateLargeDataset(count = 10000) {
  const batchSize = 1000;
  const batches = Math.ceil(count / batchSize);

  for (let i = 0; i < batches; i++) {
    const offset = i * batchSize;
    // The final batch may be smaller when count is not a multiple of batchSize
    const size = Math.min(batchSize, count - offset);
    const users = Array.from({ length: size }, (_, index) => ({
      id: offset + index,
      email: `user${offset + index}@example.com`,
      firstName: faker.person.firstName()
    }));
    await db.users.insertMany(users); // Batch insert
    console.log(`Batch ${i + 1}/${batches}`);
  }
}
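The batching arithmetic is easy to get wrong for the last batch, so it can help to isolate it in a pure function that is testable without a database. A minimal sketch, with `planBatches` and `Batch` as hypothetical names:

```typescript
// Hypothetical helper: split `count` records into offset/size batches.
interface Batch {
  offset: number;
  size: number;
}

function planBatches(count: number, batchSize: number): Batch[] {
  if (batchSize <= 0) {
    throw new Error("batchSize must be positive");
  }
  const batches: Batch[] = [];
  for (let offset = 0; offset < count; offset += batchSize) {
    // The final batch shrinks to whatever is left.
    batches.push({ offset, size: Math.min(batchSize, count - offset) });
  }
  return batches;
}
```

For 2,500 records with a batch size of 1,000, this plans three batches of 1,000, 1,000, and 500 records, so the total inserted always equals the requested count.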
Agent-Driven Data Generation
// High-speed generation with constraints
await Task("Generate Test Data", {
  schema: 'ecommerce',
  count: { users: 10000, products: 500, orders: 5000 },
  preserveReferentialIntegrity: true,
  constraints: {
    age: { min: 18, max: 90 },
    roles: ['customer', 'admin']
  }
}, "qe-test-data-architect");

// GDPR-compliant anonymization
await Task("Anonymize Production Data", {
  source: 'production-snapshot',
  piiFields: ['email', 'phone', 'ssn'],
  method: 'pseudonymization',
  retainStructure: true
}, "qe-test-data-architect");
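What `preserveReferentialIntegrity` is asking for can be illustrated in plain code: every generated order must point at a user that exists. The sketch below is an assumption about the intent, not a description of how the agent works internally; all type and function names are hypothetical.

```typescript
interface SketchUser {
  id: number;
  role: "customer" | "admin";
}

interface SketchOrder {
  id: number;
  userId: number;
}

// Generate orders whose userId always refers to an existing user.
function generateOrders(users: SketchUser[], orderCount: number): SketchOrder[] {
  if (users.length === 0 && orderCount > 0) {
    throw new Error("cannot create orders without users");
  }
  const orders: SketchOrder[] = [];
  for (let i = 0; i < orderCount; i++) {
    // Round-robin assignment keeps the sketch deterministic.
    orders.push({ id: i + 1, userId: users[i % users.length].id });
  }
  return orders;
}

// Return orders that point at a missing user (should be empty for valid data).
function findOrphanOrders(users: SketchUser[], orders: SketchOrder[]): SketchOrder[] {
  const knownIds = new Set(users.map((u) => u.id));
  return orders.filter((o) => !knownIds.has(o.userId));
}
```

Running `findOrphanOrders` over a generated dataset is a cheap sanity check before loading it into a database with foreign-key constraints.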
Agent Coordination Hints
Memory Namespace
aqe/test-data-management/
├── schemas/*        - Data schemas
├── generators/*     - Generator configs
├── anonymization/*  - PII handling rules
└── fixtures/*       - Reusable fixtures
Fleet Coordination
const dataFleet = await FleetManager.coordinate({
  strategy: 'test-data-generation',
  agents: [
    'qe-test-data-architect', // Generate data
    'qe-test-executor',       // Execute with data
    'qe-security-scanner'     // Validate no PII exposure
  ],
  topology: 'sequential'
});
Related Skills
- database-testing - Schema and integrity testing
- compliance-testing - GDPR/CCPA compliance
- performance-testing - Volume data for perf tests
Remember
Test data is infrastructure, not an afterthought. 40% of test failures are caused by inadequate test data. Poor data = poor tests.
Never use production PII directly. GDPR fines can reach €20M or 4% of annual global turnover, whichever is higher. Always use synthetic data or properly anonymized production snapshots.
With Agents: qe-test-data-architect generates 10k+ records/sec with realistic patterns, relationships, and constraints. Agents ensure GDPR/CCPA compliance automatically and eliminate test data bottlenecks.