OCI Vault and Secrets Management - Expert Knowledge

🏗️ Use OCI Landing Zone Terraform Modules

Don't reinvent the wheel. Use oracle-terraform-modules/landing-zone for Vault setup.

Landing Zone solves:

❌ Bad Practice #1: Generic compartments (Landing Zone creates Security compartment for Vault)
❌ Bad Practice #7: No security services (Landing Zone integrates Cloud Guard monitoring)
❌ Bad Practice #10: No audit logging (Landing Zone enables Vault audit logs)

This skill provides: Vault operations, secret management patterns, and troubleshooting for vaults deployed WITHIN a Landing Zone.

⚠️ OCI CLI/API Knowledge Gap

You don't know OCI CLI commands or OCI API structure.

Your training data has limited and outdated knowledge of:

OCI CLI syntax and parameters (updates monthly)
OCI API endpoints and request/response formats
Vault service CLI operations (oci vault secret, oci kms)
Secret encoding formats and retrieval patterns
Latest Vault/KMS features and cross-region replication

When OCI operations are needed:

Use exact CLI commands from this skill's references
Do NOT guess OCI Vault CLI syntax
Do NOT assume AWS Secrets Manager patterns work in OCI
Load reference files for detailed Vault API documentation

What you DO know:

General secrets management principles
Encryption and key management concepts
Secret rotation patterns

This skill bridges the gap by providing current OCI-specific Vault patterns and gotchas.

You are an OCI Vault expert. This skill provides knowledge Claude lacks: anti-patterns, IAM permission gotchas, cost optimization, security vulnerabilities, and OCI-specific operational knowledge.

NEVER Do This

❌ NEVER log secret contents (even in debug/error messages)

# WRONG - secret ends up in log aggregation, retained for years
logger.debug(f"Retrieved secret: {secret_value}")
logger.error(f"Failed to parse secret: {secret_value}")

# RIGHT - log metadata only
logger.debug(f"Retrieved secret OCID: {secret_ocid[:20]}...")
logger.error(f"Failed to parse secret (type: {type(secret_value)})")

❌ NEVER set temp key file permissions AFTER writing content

# WRONG - world-readable during write (security window)
with open('/tmp/key.pem', 'w') as f:
    f.write(private_key)
os.chmod('/tmp/key.pem', 0o600)  # Too late!

# RIGHT - secure BEFORE writing
fd = os.open('/tmp/key.pem', os.O_CREAT | os.O_WRONLY, 0o600)
with os.fdopen(fd, 'w') as f:
    f.write(private_key)

❌ NEVER use overly broad IAM policies

BAD:  "Allow any-user to read secret-family in tenancy"
BAD:  "Allow group Developers to manage secret-family in tenancy"
GOOD: "Allow dynamic-group app-prod to read secret-family in compartment AppSecrets
       where target.secret.name = 'db-*'"

❌ NEVER retrieve secrets without caching

Cost: $0.03 per 10,000 requests (first 10k/month free)
Without cache: 1000 req/hr × 24 × 30 = 720k/month = $2.16/month
With 60min cache: 1000 req/hr → 24 calls/day = 720/month = FREE
Savings: 98% cost reduction

❌ NEVER use PLAIN content type (deprecated)

Always use BASE64 encoding for secrets
PLAIN is legacy and may not work in future

❌ NEVER hardcode Vault OCIDs in code

# WRONG - not portable, leaked in repos
VAULT_SECRET_OCID = "ocid1.vaultsecret.oc1.iad.xxxxx"

# RIGHT - configuration
VAULT_SECRET_OCID = os.environ['VAULT_SECRET_OCID']

IAM Permission Gotcha (Critical)

Secret retrieval requires BOTH permissions:

"Allow dynamic-group X to read secret-family in compartment Y"
"Allow dynamic-group X to use keys in compartment Y"

Why both needed:

read secret-family → allows listing and reading secret metadata
use keys → allows decryption of secret content (secrets encrypted with master key)

Without use keys: Get confusing 403 error: "User not authorized to perform this operation"

Common mistake: Forgetting use keys permission, spending hours debugging "authorization failed"

Vault Hierarchy (Often Confused)

Vault (container)
 └─ Master Encryption Key (for encryption/decryption)
     └─ Secret (encrypted data)
         └─ Secret Versions (rotation over time)

Commands use different services:

Vault operations: oci kms management vault ...
Key operations: oci kms management key ... --endpoint <vault-endpoint>
Secret operations: oci vault secret ... (NOT kms!)

Common mistake: oci vault-secret create (no such command) vs oci vault secret create (correct)

Secret Retrieval Error Decision Tree

Secret retrieval fails?
│
├─ 401 Unauthorized
│  ├─ On OCI compute? → Check dynamic group membership
│  ├─ Local dev? → Check ~/.oci/config, verify API key uploaded
│  └─ After rotation? → Cache still has old credentials (wait for TTL)
│
├─ 403 Forbidden
│  ├─ Have "read secret-family" permission? → Add if missing
│  └─ Have "use keys" permission? → THIS IS USUALLY THE ISSUE
│
├─ 404 Not Found
│  ├─ Wrong secret OCID? → Verify environment variable
│  ├─ Wrong compartment? → Secrets client must use secret's compartment
│  └─ Secret deleted? → Check vault for secret status
│
└─ 500 Internal Server Error
   └─ Vault service issue → Retry with exponential backoff (rate limit)

Cost Optimization

Vault API Pricing: $0.03 per 10,000 requests (10k/month free)

Calculation Examples:

Without caching (retrieve on every API call):

1000 API calls/hour
24 hours × 30 days = 720,000 Vault requests/month
(720,000 / 10,000) × $0.03 = $2.16/month

With 60-minute cache TTL:

1000 API calls/hour → 1 Vault request/hour
24 hours × 30 days = 720 Vault requests/month
Under 10k free tier = $0/month (FREE)
Savings: 98%

Cache TTL Selection:

Security Requirements	Cache TTL	Reasoning
High (rotate daily)	5-15 minutes	Frequent refresh, still 90%+ savings
Standard (rotate monthly)	30-60 minutes	Balance security and cost
Dev/Test	No cache	Always fresh for development

Rule: Cache TTL must be less than secret rotation window

Secret Rotation (Zero-Downtime)

WRONG (causes downtime):

# Don't delete and recreate - breaks running apps
oci vault secret delete --secret-id <secret-ocid>
oci vault secret create ...  # New OCID, apps break

RIGHT (zero-downtime):

# Create new VERSION of existing secret
oci vault secret update-base64 \
  --secret-id <secret-ocid> \
  --secret-content-content "$(echo -n 'new-value' | base64)"

# Secret OCID stays same, apps automatically get new version
# Old version kept as "previous" for rollback

Key points:

Secret OCID doesn't change (apps continue working)
Vault serves latest version by default
Previous versions retained for rollback
Applications pick up new version on next cache refresh (no restart needed)

Instance Principal Authentication

Production compute instances should use instance principals:

# 1. Create dynamic group
oci iam dynamic-group create \
  --name "app-instances" \
  --matching-rule "instance.compartment.id = '<compartment-ocid>'"

# 2. Grant Vault access
# "Allow dynamic-group app-instances to read secret-family in compartment Secrets"
# "Allow dynamic-group app-instances to use keys in compartment Secrets"

# 3. Application code (no credentials needed)
signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()
secrets_client = oci.secrets.SecretsClient(config={}, signer=signer)

Benefits:

No credentials to manage or rotate
No secrets stored on compute instances
Automatic token refresh
Audit trail shows which instance accessed what

Audit Logging

Enable Vault access logging:

# Create log group
oci logging log-group create \
  --compartment-id <ocid> \
  --display-name "vault-audit-logs"

# Enable read access logging
oci logging log create \
  --log-group-id <log-group-ocid> \
  --display-name "secret-read-audit" \
  --log-type SERVICE \
  --configuration '{
    "source": {
      "sourceType": "OCISERVICE",
      "service": "vaults",
      "resource": "<vault-ocid>",
      "category": "read"
    }
  }'

What gets logged:

Who: User OCID or instance principal identity
What: Secret OCID accessed
When: Timestamp (UTC)
Where: Source IP address
Result: Success (200) or failure (403, 404, etc.)

Monitoring alerts (recommended):

>10 failed access attempts in 5 minutes (unauthorized access)
Access to secrets not assigned to requesting instance
Access from unexpected IP ranges

OCI-Specific Gotchas

Vault Management Endpoint is Required for Key Operations:

# Find your vault's endpoint
oci kms management vault get --vault-id <vault-ocid> \
  --query 'data."management-endpoint"' --raw-output

# Use in key commands
oci kms management key create ... \
  --endpoint https://xxxxx-management.kms.us-ashburn-1.oraclecloud.com

Regional Vault Availability:

Not all OCI regions have Vault service
Check region availability before designing architecture
Cross-region secret access adds latency (10-50ms)

Secret Bundle Base64 Decoding:

# Secret content is base64-encoded
secret_bundle = secrets_client.get_secret_bundle(secret_ocid)
encoded = secret_bundle.data.secret_bundle_content.content
decoded = base64.b64decode(encoded).decode('utf-8')  # Don't forget decode()

Progressive Loading References

OCI Vault Reference (Official Oracle Documentation)

WHEN TO LOAD oci-vault-reference.md:

Need comprehensive Vault and KMS API documentation
Understanding key management and encryption options
Implementing HSM-backed key protection
Need official Oracle guidance on Vault architecture
Setting up cross-region secret replication

Do NOT load for:

Quick secret retrieval examples (covered in this skill)
Permission debugging (decision trees above)
Secret rotation patterns (covered above)

When to Use This Skill

Storing credentials in Vault: secret organization, IAM setup
Secret retrieval failures: 403 errors, permission debugging
Cost optimization: caching strategy, API call reduction
Secret rotation: zero-downtime updates, version management
Security: temp file handling, logging anti-patterns, audit setup
Production: instance principal configuration, monitoring alerts

secrets-management

Safety Notice

Copy this and send it to your AI assistant to learn