AI Data Validation

Overview

AI Data Validation uses machine learning to detect data quality issues in your records automatically. It identifies typos, format errors, duplicate entries, and semantic inconsistencies, then suggests fixes or applies them automatically.

Key Features

  • Real-time Validation: Records are validated as they're created or updated
  • Batch Scanning: Scan all existing records to find issues
  • Auto-Correct Mode: Automatically fix high-confidence issues
  • Configurable Per Structure: Enable/disable and configure for each structure independently

Issue Types

| Type | Description | Examples |
|------|-------------|----------|
| Format | Rule-based validation for standard formats | Invalid email, malformed URL, incorrect date format |
| Typo | AI-powered spelling and typo detection | "Jonh" → "John", "recieve" → "receive" |
| Duplicate | Potential duplicate records detected | Two customers with same email but different names |
| Semantic | Logical inconsistencies in data | End date before start date, negative quantities |
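To make the Format and Semantic categories concrete, here is a minimal sketch of the kind of rule-based checks they describe. The `checkRecord` helper, the `Issue` type, and the record field names are hypothetical and are not part of the SDK; the service's actual rules are not documented here.

```typescript
// Hypothetical sketch: these names are illustrative and not part of the SDK.
type Issue = { field: string; issueType: 'format' | 'semantic'; reason: string };

// Deliberately simple email pattern; production validators are stricter.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function checkRecord(record: {
  email: string;
  startDate: string; // ISO 8601 date, e.g. "2024-03-01"
  endDate: string;   // ISO 8601 date
  quantity: number;
}): Issue[] {
  const issues: Issue[] = [];

  // Format: the value does not match a standard pattern.
  if (!EMAIL_PATTERN.test(record.email)) {
    issues.push({ field: 'email', issueType: 'format', reason: 'Invalid email' });
  }

  // Semantic: each value is well formed, but the combination is illogical.
  if (Date.parse(record.endDate) < Date.parse(record.startDate)) {
    issues.push({ field: 'endDate', issueType: 'semantic', reason: 'End date before start date' });
  }
  if (record.quantity < 0) {
    issues.push({ field: 'quantity', issueType: 'semantic', reason: 'Negative quantity' });
  }
  return issues;
}
```

Typo and Duplicate detection are described as AI-powered or similarity-based, so they do not reduce to simple rules like these.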

Configuration

AI Validation is configured per structure in the Console UI under the AI Validation tab.

Validation Mode

  • Advisory: Issues are flagged for manual review. No automatic changes are made.
  • Auto-Correct: High-confidence issues are automatically fixed. You set the confidence threshold (70%-100%).

Auto-Correct Threshold

In auto-correct mode, only suggestions with confidence above your threshold are applied automatically. Lower-confidence suggestions are still flagged for manual review.

Recommended thresholds:

  • 90%+ for production data (conservative)
  • 80%+ for development/testing (balanced)
  • 70%+ for initial data cleanup (aggressive)
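The effect of the threshold can be sketched as a simple routing step. This is an illustration only, not the service's implementation: `routeSuggestions` and `ScoredSuggestion` are hypothetical names, confidence is expressed as a percentage, and a suggestion exactly at the threshold is assumed to be applied (matching the "90%+" wording).

```typescript
// Hypothetical sketch of threshold routing; not the service's actual implementation.
interface ScoredSuggestion {
  id: string;
  confidence: number; // percentage, 0-100
}

function routeSuggestions<T extends ScoredSuggestion>(
  suggestions: T[],
  thresholdPercent: number
): { autoApply: T[]; manualReview: T[] } {
  // The configurable range stated in the docs is 70%-100%.
  if (thresholdPercent < 70 || thresholdPercent > 100) {
    throw new RangeError('Threshold must be between 70 and 100');
  }
  const autoApply: T[] = [];
  const manualReview: T[] = [];
  for (const s of suggestions) {
    // Assumption: a suggestion exactly at the threshold is auto-applied.
    (s.confidence >= thresholdPercent ? autoApply : manualReview).push(s);
  }
  return { autoApply, manualReview };
}
```

With a threshold of 90, suggestions at 95% and 90% would be applied, while one at 72% would stay in the review queue.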

Using AI Validation

Enable Validation

  1. Navigate to your structure in the Console
  2. Click the AI Validation tab
  3. Toggle Enable AI Validation
  4. Select which issue types to detect
  5. Choose your validation mode
  6. Click Save Configuration

Run a Batch Scan

To scan all existing records:

  1. Go to the AI Validation tab
  2. Click Start Scan
  3. Watch the progress indicator, which updates in real time
  4. Navigate away if you need to; the scan continues in the background
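The same flow can be driven from code. The sketch below combines the `triggerScan` and `realtime.subscribe` calls shown in the SDK Usage section to start a scan and wait for it to finish. The `ScanClient` interface is a minimal assumed shape covering only those two calls, `scanAndWait` is a hypothetical helper, and the payload of the completion event is treated as opaque because its fields are not documented here.

```typescript
// Assumed minimal client shape, limited to the calls shown under SDK Usage.
interface ValidationEvent {
  event: string;
  data: Record<string, unknown>;
}
interface ScanClient {
  validation: {
    triggerScan(structure: string): Promise<{ data: { batchId: string } }>;
  };
  realtime: {
    subscribe(options: {
      structures: string[];
      events: string[];
      onEvent: (event: ValidationEvent) => void;
    }): unknown;
  };
}

// Hypothetical helper: starts a scan and resolves once its completion event arrives.
function scanAndWait(
  client: ScanClient,
  structure: string
): Promise<{ batchId: string; summary: Record<string, unknown> }> {
  return new Promise((resolve, reject) => {
    let batchId: string | undefined;
    let completed: Record<string, unknown> | undefined;

    const finish = (): void => {
      const id = batchId;
      const payload = completed;
      if (id !== undefined && payload !== undefined) {
        resolve({ batchId: id, summary: payload });
      }
    };

    // Subscribe before triggering so a fast scan's completion event is not missed.
    client.realtime.subscribe({
      structures: [structure],
      events: ['validation_batch_completed'],
      onEvent: (event) => {
        if (event.event === 'validation_batch_completed') {
          completed = event.data;
          finish();
        }
      },
    });

    client.validation
      .triggerScan(structure)
      .then((batch) => {
        batchId = batch.data.batchId;
        finish();
      })
      .catch(reject);
  });
}
```

The helper does not match the completion event to a specific batch ID; that is only reasonable because the documented limit is one concurrent scan per workspace.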

Review Suggestions

Pending suggestions appear in the Validation Suggestions table:

  • Accept: Apply the suggested fix to the record
  • Reject: Mark the suggestion as not applicable
  • Bulk Actions: Select multiple suggestions to accept/reject at once

Suggestion Details

Each suggestion shows:

  • Record: Link to the affected record
  • Field: The field with the issue
  • Original Value: Current value in the record
  • Suggested Value: Recommended correction
  • Confidence: How confident the AI is (0-100%)
  • Reason: Explanation of why this was flagged (hover to view)
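When working with suggestions in code, it can help to model these columns as a type. The interface below is a sketch, not the SDK's published type: `id`, `field`, `issueType`, and `confidence` appear in the SDK examples, while `recordId`, `originalValue`, `suggestedValue`, and `reason` are hypothetical names chosen to mirror the table columns. Confidence is assumed to be a 0-1 fraction, as the SDK example's `0.95` comparison suggests.

```typescript
// Sketch of a suggestion shape; property names marked "hypothetical" are assumptions.
interface ValidationSuggestion {
  id: string;
  recordId: string;       // hypothetical: identifies the affected record
  field: string;
  issueType: 'format' | 'typo' | 'duplicate' | 'semantic';
  originalValue: string;  // hypothetical
  suggestedValue: string; // hypothetical
  confidence: number;     // assumed to be a fraction between 0 and 1
  reason: string;         // hypothetical
}

// Renders one suggestion as a single review line, with confidence as a percentage.
function describeSuggestion(s: ValidationSuggestion): string {
  const percent = Math.round(s.confidence * 100);
  return `${s.field}: "${s.originalValue}" -> "${s.suggestedValue}" (${s.issueType}, ${percent}% confidence)`;
}
```

A one-line summary like this is convenient for logging suggestions before accepting or rejecting them in bulk.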

Real-time Notifications

When validation is enabled, you receive notifications for:

  • New validation suggestions created
  • Batch scan completion with summary

Subscribe to validation events in the Notification Settings.

SDK Usage

The SDK exposes validation methods for triggering scans, listing suggestions, accepting or rejecting them, and fetching a summary:

// Trigger a batch validation scan
const batch = await centrali.validation.triggerScan('orders');
console.log('Scan started:', batch.data.batchId);

// List pending suggestions
const suggestions = await centrali.validation.listSuggestions({
  status: 'pending',
  issueType: 'typo'
});

// Accept a suggestion (applies the fix)
await centrali.validation.accept('suggestion-id');

// Reject a suggestion
await centrali.validation.reject('suggestion-id');

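// Bulk accept high-confidence suggestions
// (this filter treats confidence as a 0-1 fraction, so 0.95 means 95%)
const highConfidence = suggestions.data.filter(s => s.confidence >= 0.95);
if (highConfidence.length > 0) {
  await centrali.validation.bulkAccept(highConfidence.map(s => s.id));
}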

// Get validation summary
const summary = await centrali.validation.getSummary();

Realtime Events

Subscribe to validation events via server-sent events (SSE):

const subscription = centrali.realtime.subscribe({
  structures: ['orders'],
  events: ['validation_suggestion_created', 'validation_batch_completed'],
  onEvent: (event) => {
    if (event.event === 'validation_suggestion_created') {
      console.log('New issue:', event.data.field, event.data.issueType);
    }
  }
});
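The example above handles only one of the two subscribed events. A small dispatcher keeps the `onEvent` callback readable as more event types are handled. This is a sketch: `createValidationHandler` and its callback names are hypothetical, and event payloads are passed through untyped because their full shape is not documented here.

```typescript
// Hypothetical dispatcher for the two validation event names used above.
type RealtimeEvent = { event: string; data: Record<string, unknown> };

function createValidationHandler(callbacks: {
  onSuggestion: (data: Record<string, unknown>) => void;
  onBatchCompleted: (data: Record<string, unknown>) => void;
}): (event: RealtimeEvent) => void {
  return (event) => {
    switch (event.event) {
      case 'validation_suggestion_created':
        callbacks.onSuggestion(event.data);
        break;
      case 'validation_batch_completed':
        callbacks.onBatchCompleted(event.data);
        break;
      default:
        // Ignore event types this handler does not know about.
        break;
    }
  };
}
```

The returned function can be passed as the `onEvent` option of `centrali.realtime.subscribe`.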

Best Practices

Start with Advisory Mode

Begin with advisory mode to understand what issues exist in your data before enabling auto-correct.

Use High Thresholds for Production

Set auto-correct threshold to 90%+ for production data to minimize false positives.

Regular Batch Scans

Run batch scans periodically (weekly/monthly) to catch issues that may have been missed during real-time validation.

Review Low-Confidence Suggestions

Low-confidence suggestions often reveal edge cases or unusual but valid data. Review these carefully.

Limits

| Limit | Value |
|-------|-------|
| Batch scan size | All records in structure |
| Concurrent scans per workspace | 1 |
| Suggestion retention | 90 days |
| Max suggestions per batch | 10,000 |
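Because only one scan can run per workspace at a time, scripts that scan several structures need to run them one after another. The sketch below serializes scans with a promise chain. `createScanQueue` is a hypothetical helper, and `runScan` is assumed to be a function you supply that resolves only once the scan has actually finished (not merely been triggered).

```typescript
// Hypothetical helper: runs scans strictly one at a time, in the order requested.
function createScanQueue(
  runScan: (structure: string) => Promise<void>
): (structure: string) => Promise<void> {
  // The tail always points at the most recently queued scan.
  let tail: Promise<void> = Promise.resolve();

  return (structure: string): Promise<void> => {
    const next = tail.then(() => runScan(structure));
    // Swallow failures on the chain so one failed scan does not block later ones;
    // the caller still sees the failure through the returned promise.
    tail = next.catch(() => undefined);
    return next;
  };
}
```

Each call returns a promise for that specific scan, so callers can still await or handle errors per structure while the queue enforces the ordering.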