AI Data Validation¶

Overview¶

AI Data Validation uses machine learning to detect data quality issues in your records automatically. It identifies typos, format errors, duplicate entries, and semantic inconsistencies, and then either suggests fixes or applies them automatically.

Key Features¶

Real-time Validation: Records are validated as they're created or updated
Batch Scanning: Scan all existing records to find issues
Auto-Correct Mode: Automatically fix high-confidence issues
Configurable Per Structure: Enable/disable and configure for each structure independently

Issue Types¶

Type	Description	Examples
Format	Rule-based validation for standard formats	Invalid email, malformed URL, incorrect date format
Typo	AI-powered spelling and typo detection	"Jonh" → "John", "recieve" → "receive"
Duplicate	Potential duplicate records detected	Two customers with same email but different names
Semantic	Logical inconsistencies in data	End date before start date, negative quantities
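To make the Format and Semantic rows concrete, here is a minimal sketch of what rule-based checks of this kind can look like. It is illustrative only and is not the product's actual rule set; the field names (`email`, `website`, `startDate`, `endDate`, `quantity`) and the `checkRecord` helper are hypothetical.

```javascript
// Illustrative rule-based checks; not the product's actual rules.
// All field names below are hypothetical.

// Deliberately simple email pattern: something@something.tld
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function isValidUrl(value) {
  try {
    // The URL constructor throws a TypeError on input it cannot parse.
    const url = new URL(value);
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    return false;
  }
}

function checkRecord(record) {
  const issues = [];

  // Format checks
  if (record.email !== undefined && !EMAIL_PATTERN.test(record.email)) {
    issues.push({ field: 'email', issueType: 'format', reason: 'Invalid email' });
  }
  if (record.website !== undefined && !isValidUrl(record.website)) {
    issues.push({ field: 'website', issueType: 'format', reason: 'Malformed URL' });
  }

  // Semantic checks
  if (record.startDate && record.endDate &&
      new Date(record.endDate) < new Date(record.startDate)) {
    issues.push({ field: 'endDate', issueType: 'semantic', reason: 'End date before start date' });
  }
  if (typeof record.quantity === 'number' && record.quantity < 0) {
    issues.push({ field: 'quantity', issueType: 'semantic', reason: 'Negative quantity' });
  }

  return issues;
}
```

Typo and Duplicate detection are described as AI-powered and cannot be reduced to fixed rules like these.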

Configuration¶

AI Validation is configured per structure in the Console UI under the AI Validation tab.

Validation Mode¶

Advisory: Issues are flagged for manual review. No automatic changes are made.
Auto-Correct: High-confidence issues are automatically fixed. You set the confidence threshold (70%-100%).

Auto-Correct Threshold¶

In auto-correct mode, only suggestions whose confidence exceeds your threshold are applied automatically. Lower-confidence suggestions are still flagged for manual review.

Recommended thresholds:
- 90%+ for production data (conservative)
- 80%+ for development/testing (balanced)
- 70%+ for initial data cleanup (aggressive)
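The effect of the threshold can be sketched as a simple partition of suggestions. This is a sketch under stated assumptions, not the service's implementation: the `partitionByThreshold` helper and the `confidencePercent` field are hypothetical, and the comparison is assumed to be "at or above" (the "90%+" reading), so check the exact boundary behavior before relying on it.

```javascript
// Hypothetical helper: splits suggestions into those auto-correct would apply
// and those left for manual review. Assumes confidence is a 0-100 percentage
// and that a suggestion exactly at the threshold is applied.
function partitionByThreshold(suggestions, thresholdPercent) {
  if (thresholdPercent < 70 || thresholdPercent > 100) {
    throw new RangeError('Threshold must be between 70 and 100');
  }
  const autoApply = [];
  const manualReview = [];
  for (const suggestion of suggestions) {
    if (suggestion.confidencePercent >= thresholdPercent) {
      autoApply.push(suggestion);
    } else {
      manualReview.push(suggestion);
    }
  }
  return { autoApply, manualReview };
}
```

With a threshold of 90, a suggestion at 95% would be applied automatically while one at 85% would stay in the review queue.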

Using AI Validation¶

Enable Validation¶

Navigate to your structure in the Console
Click the AI Validation tab
Toggle Enable AI Validation
Select which issue types to detect
Choose your validation mode
Click Save Configuration

Run a Batch Scan¶

To scan all existing records:

Go to the AI Validation tab
Click Start Scan
Progress is shown in real time
You can navigate away; the scan continues in the background

Review Suggestions¶

Pending suggestions appear in the Validation Suggestions table:

Accept: Apply the suggested fix to the record
Reject: Mark the suggestion as not applicable
Bulk Actions: Select multiple suggestions to accept/reject at once

Suggestion Details¶

Each suggestion shows:
- Record: Link to the affected record
- Field: The field with the issue
- Original Value: Current value in the record
- Suggested Value: Recommended correction
- Confidence: How confident the AI is (0-100%)
- Reason: Explanation of why this was flagged (hover to view)
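As a rough illustration, the details above could be represented as an object like the one below. The property names (`recordId`, `field`, `originalValue`, `suggestedValue`, `confidence`, `reason`) and the `describeSuggestion` helper are assumptions made for this sketch; they mirror the columns in the table, not a documented SDK type.

```javascript
// Hypothetical suggestion shape mirroring the columns listed above.
const exampleSuggestion = {
  recordId: 'rec_123',
  field: 'firstName',
  originalValue: 'Jonh',
  suggestedValue: 'John',
  confidence: 92, // assumed to be a 0-100 percentage, as displayed in the table
  reason: 'Likely misspelling of a common name',
};

// Formats a suggestion as a single line, e.g. for logs or review exports.
function describeSuggestion(s) {
  return `${s.recordId}.${s.field}: "${s.originalValue}" -> "${s.suggestedValue}" ` +
    `(${s.confidence}% confidence): ${s.reason}`;
}
```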

Real-time Notifications¶

When validation is enabled, you receive notifications for: - New validation suggestions created - Batch scan completion with summary

Subscribe to validation events in the Notification Settings.

SDK Usage¶

The SDK provides methods to work with validation:

// Trigger a batch validation scan
const batch = await centrali.validation.triggerScan('orders');
console.log('Scan started:', batch.data.batchId);

// List pending suggestions
const suggestions = await centrali.validation.listSuggestions({
  status: 'pending',
  issueType: 'typo'
});

// Accept a suggestion (applies the fix)
await centrali.validation.accept('suggestion-id');

// Reject a suggestion
await centrali.validation.reject('suggestion-id');

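// Bulk accept high-confidence suggestions
// (this filter treats `confidence` as a 0-1 fraction; if your SDK returns a
// 0-100 percentage, as the Console displays, compare against 95 instead)
const highConfidence = suggestions.data.filter(s => s.confidence >= 0.95);
await centrali.validation.bulkAccept(highConfidence.map(s => s.id));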

// Get validation summary
const summary = await centrali.validation.getSummary();
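The list-then-bulk-accept pattern above can be wrapped in a small helper. This is a minimal sketch: `acceptHighConfidence` is a hypothetical name, the `validation` argument is assumed to expose the `listSuggestions` and `bulkAccept` methods shown above (for example `centrali.validation`), and `confidence` is assumed to be a 0-1 fraction as in the snippet above.

```javascript
// Hypothetical helper around the SDK calls shown above.
// `validation` is any object with listSuggestions() and bulkAccept().
async function acceptHighConfidence(validation, minConfidence = 0.95) {
  const suggestions = await validation.listSuggestions({ status: 'pending' });

  const ids = suggestions.data
    .filter((s) => s.confidence >= minConfidence)
    .map((s) => s.id);

  // Skip the bulk call entirely when nothing qualifies.
  if (ids.length > 0) {
    await validation.bulkAccept(ids);
  }
  return ids;
}
```

Returning the accepted IDs makes it straightforward to log or audit what was changed.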

Realtime Events¶

Subscribe to validation events via SSE:

const subscription = centrali.realtime.subscribe({
  structures: ['orders'],
  events: ['validation_suggestion_created', 'validation_batch_completed'],
  onEvent: (event) => {
    if (event.event === 'validation_suggestion_created') {
      console.log('New issue:', event.data.field, event.data.issueType);
    }
  }
});
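When handling both event types, a small dispatcher keeps the `onEvent` callback readable. The sketch below is one way to do it; `createValidationEventHandler` is a hypothetical helper, and the payload of `validation_batch_completed` is not specified here, so it is logged as-is rather than destructured.

```javascript
// Hypothetical helper: routes each event to a handler keyed by event name.
function createValidationEventHandler(handlers) {
  return function dispatch(event) {
    if (!Object.prototype.hasOwnProperty.call(handlers, event.event)) {
      return false; // no handler registered for this event type
    }
    handlers[event.event](event.data);
    return true;
  };
}

const onEvent = createValidationEventHandler({
  validation_suggestion_created: (data) => {
    console.log('New issue:', data.field, data.issueType);
  },
  validation_batch_completed: (data) => {
    // Payload shape is an unknown here, so log it unchanged.
    console.log('Batch scan completed:', data);
  },
});
```

The resulting `onEvent` function can be passed as the `onEvent` option of `centrali.realtime.subscribe`.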

Best Practices¶

Start with Advisory Mode¶

Begin with advisory mode to understand what issues exist in your data before enabling auto-correct.

Use High Thresholds for Production¶

Set auto-correct threshold to 90%+ for production data to minimize false positives.

Regular Batch Scans¶

Run batch scans periodically (weekly/monthly) to catch issues that may have been missed during real-time validation.
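A periodic scan can be scripted with the `triggerScan` method from the SDK Usage section. The sketch below assumes a long-running Node.js process; `scheduleScans` is a hypothetical helper, and `validation` is assumed to be an object exposing `triggerScan` (for example `centrali.validation`).

```javascript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical helper: triggers a batch scan on a fixed interval.
// Returns a function that cancels the schedule.
function scheduleScans(validation, structure, intervalMs = WEEK_MS) {
  const timer = setInterval(async () => {
    try {
      const batch = await validation.triggerScan(structure);
      console.log('Scan started:', batch.data.batchId);
    } catch (err) {
      // For example, another scan may already be running in the workspace.
      console.error('Scan could not be started:', err);
    }
  }, intervalMs);

  return () => clearInterval(timer);
}
```

Node.js timers accept delays of at most 2,147,483,647 ms (about 24.8 days), so a monthly cadence needs an external scheduler such as cron rather than `setInterval`.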

Review Low-Confidence Suggestions¶

Low-confidence suggestions often reveal edge cases or unusual but valid data. Review these carefully.

Limits¶

Limit	Value
Batch scan size	All records in structure
Concurrent scans per workspace	1
Suggestion retention	90 days
Max suggestions per batch	10,000

Related Documentation¶

Anomaly Insights - AI-powered anomaly detection
Schema Discovery - Automatic schema evolution
Structures & Records - Data schemas and entries