
Schema Discovery

Overview

Schema Discovery helps you understand and organize the data flowing into your collections. When you ingest data from webhooks, APIs, or other sources, your records may contain fields that aren't yet defined in your collection's schema. Schema Discovery scans your data, detects these fields, infers their types, and creates suggestions that you can review and accept to build your schema over time.

Why It Matters

Without defined properties, your collection's Records tab shows only system fields (ID, Created), and all your actual data fields stay hidden inside the raw JSON. Once you promote fields via Schema Discovery, they appear as named columns in the records table, making your data visible, filterable, and sortable at a glance.

Querying works regardless of schema. You can filter and sort by any field in the JSONB data column using the API or record filter builder, whether or not that field is a defined property. Schema Discovery is about visibility and organization, not query capability.

Schema Discovery Modes

Every collection has a schema discovery mode that controls how incoming records are handled relative to defined properties.

Strict (Default)

Records must conform to the defined schema. Unknown fields are rejected.

Best for: Production collections with a known, stable data shape.

// Schema defines: name, email
// ✅ Accepted
{ "name": "Jane", "email": "jane@example.com" }

// ❌ Rejected — "phone" is not in the schema
{ "name": "Jane", "email": "jane@example.com", "phone": "555-1234" }

Behavior:

  • All fields are validated against the schema (type, required, unique constraints, references)
  • Unknown fields cause the request to fail with a validation error
  • No schema discovery or evolution occurs
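The strict-mode behavior can be pictured with a minimal sketch. This is an illustration only, not the service's implementation: the `SCHEMA` layout, the `validate_strict` name, and the choice to treat both fields as required are assumptions made for the example, and unique and reference checks are left out.

```python
from typing import Any

# Hypothetical in-memory schema: field name -> (expected Python type, required?).
# Marking both fields as required is an assumption for this example.
SCHEMA: dict[str, tuple[type, bool]] = {
    "name": (str, True),
    "email": (str, True),
}


def validate_strict(record: dict[str, Any], schema: dict[str, tuple[type, bool]]) -> list[str]:
    """Return validation errors; an empty list means the record is accepted."""
    errors = []
    # Strict mode: any field that is not a defined property is an error.
    for field in record:
        if field not in schema:
            errors.append(f"unknown field: {field}")
    # Known fields are checked for presence and type.
    for field, (expected, required) in schema.items():
        if field not in record:
            if required:
                errors.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type for field: {field}")
    return errors


accepted = validate_strict({"name": "Jane", "email": "jane@example.com"}, SCHEMA)
rejected = validate_strict(
    {"name": "Jane", "email": "jane@example.com", "phone": "555-1234"}, SCHEMA
)
```

Here `accepted` is empty, while `rejected` reports `phone` as an unknown field, mirroring the two records shown above.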

Schemaless

Records are accepted with any fields. No validation is performed.

Best for: Webhook ingestion, event streams, or any collection where you don't know the data shape upfront.

// No schema defined — everything is accepted
{ "event": "order.created", "amount": 99.99, "customer": "cust_abc" }

Behavior:

  • All data is stored as-is, subject to size limits: at most 100 keys per record, 3 levels of nesting, and 1,000-character strings
  • Records are buffered for schema discovery
  • No automatic inference: you trigger discovery manually with the Scan Existing Records or Trigger Inference buttons
  • Accepted suggestions become defined properties (columns visible in the records table)
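To check a payload against the schemaless size limits before sending it, a client-side sketch along these lines may help. How the service counts keys and nesting levels is not specified here, so the following are assumptions: the key limit is applied to each object, the top-level record counts as level 1, and arrays do not add a level. The `check_limits` helper is hypothetical.

```python
from typing import Any

MAX_KEYS = 100            # limit from the list above
MAX_DEPTH = 3             # limit from the list above
MAX_STRING_LENGTH = 1000  # limit from the list above


def check_limits(value: Any, depth: int = 1) -> list[str]:
    """Collect limit violations for a record (assumed interpretation of the limits)."""
    problems = []
    if isinstance(value, dict):
        # Assumption: the top-level record is level 1.
        if depth > MAX_DEPTH:
            return [f"nesting deeper than {MAX_DEPTH} levels"]
        # Assumption: the key limit applies to each object separately.
        if len(value) > MAX_KEYS:
            problems.append(f"more than {MAX_KEYS} keys in one object")
        for child in value.values():
            problems.extend(check_limits(child, depth + 1))
    elif isinstance(value, list):
        # Assumption: arrays do not count as a nesting level.
        for item in value:
            problems.extend(check_limits(item, depth))
    elif isinstance(value, str) and len(value) > MAX_STRING_LENGTH:
        problems.append(f"string longer than {MAX_STRING_LENGTH} characters")
    return problems


ok = check_limits({"event": "order.created", "amount": 99.99, "customer": "cust_abc"})
too_deep = check_limits({"a": {"b": {"c": {"d": 1}}}})
```

`ok` is an empty list; `too_deep` contains a single nesting violation because the fourth object exceeds the assumed three levels.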

Auto-Evolving

Records are validated against the existing schema, but unknown fields are accepted alongside known fields.

Best for: Collections with a defined schema that integrate with sources whose data shape changes over time (e.g., evolving APIs, third-party webhooks that add fields).

// Schema defines: name (string, required), email (string)
// ✅ Accepted — "phone" is stored as an unknown field
{ "name": "Jane", "email": "jane@example.com", "phone": "555-1234" }

// ❌ Rejected — "name" is required but missing
{ "email": "jane@example.com", "phone": "555-1234" }

Behavior:

  • Known fields (matching defined properties) are fully validated — type checks, required fields, unique constraints, references
  • Unknown fields are accepted and stored alongside known fields
  • New unknown fields are automatically detected per record — a suggestion is created for each new field without any manual action
  • Detection uses type inference (pattern matching), not AI — it's fast and free
  • Suggestions still require manual accept/reject before becoming properties
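The auto-evolving behavior can be sketched as follows. The function name, the schema layout, and the `suggested` set that tracks fields already seen are all hypothetical. Whether a rejected record still produces suggestions is not stated above; this sketch assumes it does not. Unique and reference checks are omitted.

```python
from typing import Any

# Hypothetical schema matching the example above: name (string, required), email (string).
SCHEMA: dict[str, tuple[type, bool]] = {"name": (str, True), "email": (str, False)}


def ingest_auto_evolving(
    record: dict[str, Any],
    schema: dict[str, tuple[type, bool]],
    suggested: set[str],
) -> tuple[list[str], list[str]]:
    """Validate known fields, accept unknown ones, and report newly seen fields.

    Returns (errors, new_fields). `suggested` holds field names that already
    have a suggestion, so each new field is reported only once.
    """
    errors = []
    for field, (expected, required) in schema.items():
        if field not in record:
            if required:
                errors.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type for field: {field}")
    if errors:
        # Assumption: a rejected record does not create suggestions.
        return errors, []
    new_fields = [f for f in record if f not in schema and f not in suggested]
    suggested.update(new_fields)
    return [], new_fields


seen: set[str] = set()
first = ingest_auto_evolving(
    {"name": "Jane", "email": "jane@example.com", "phone": "555-1234"}, SCHEMA, seen
)
second = ingest_auto_evolving({"name": "Ann", "phone": "555-0000"}, SCHEMA, seen)
```

The first call reports `phone` as a new field. The second call accepts the record but reports nothing new, since a suggestion for `phone` already exists.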

Mode Comparison

Feature | Strict | Schemaless | Auto-Evolving
Requires schema upfront | Yes | No | Yes
Validates known fields | Yes | No | Yes
Accepts unknown fields | No | Yes | Yes
Auto-detects new fields | No | No | Yes
Discovery trigger | N/A | Manual | Automatic
Detection method | N/A | Scan (free) or Inference (AI) | Type inference (free)

Discovering Fields

Scan Existing Records

Scans a sample of your records to find fields not in your schema. Uses pattern matching to infer types — no AI cost.

  1. Go to your collection → AI tab → Schema Discovery
  2. Click Scan Existing Records
  3. Suggestions appear in the table below

This is the recommended starting point. It's fast, free, and handles most cases.

Trigger Inference

Sends buffered records to the AI service for deeper analysis. Useful when simple type inference isn't enough — for example, detecting semantic types or suggesting field descriptions.

  1. Check the Buffer Count in the summary cards
  2. Click Trigger Inference
  3. The AI analyzes buffered records and creates suggestions

This uses your AI token budget. Use it when the scan doesn't give you what you need.

Reviewing Suggestions

Suggestions appear in the Schema Suggestions table with status filters:

Status | Meaning
Pending | Awaiting your review
Accepted | Added to the collection's schema
Rejected | Dismissed

For each suggestion you can see:

  • Field Name — the discovered field
  • Operation — typically "Add Field"
  • Suggested Type — the inferred type (string, number, boolean, datetime, object, array)
  • Confidence — how confident the detection is
  • Sample Values — actual values found in your records

Use Accept to promote a field to a defined property, or Reject to dismiss it. You can also use bulk accept/reject for multiple suggestions.
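As a mental model of the review step, the sketch below represents a suggestion as a small data class and applies a bulk accept/reject decision. The `Suggestion` class, its field names, and the `review` helper are hypothetical and only mirror the columns listed above. They are not the product's API.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    field_name: str
    suggested_type: str
    confidence: float
    sample_values: list
    operation: str = "Add Field"
    status: str = "pending"  # pending, accepted, or rejected


def review(
    suggestions: list[Suggestion],
    schema: dict[str, str],
    accept: set[str],
    reject: set[str],
) -> None:
    """Apply a bulk decision: accepted fields become defined properties."""
    for s in suggestions:
        if s.status != "pending":
            continue  # already reviewed
        if s.field_name in accept:
            schema[s.field_name] = s.suggested_type
            s.status = "accepted"
        elif s.field_name in reject:
            s.status = "rejected"


schema: dict[str, str] = {}
items = [
    Suggestion("amount", "number", 1.0, [2999]),
    Suggestion("currency", "string", 1.0, ["usd"]),
]
review(items, schema, accept={"amount"}, reject={"currency"})
```

After the call, `schema` contains only `amount`, and the two suggestions are marked accepted and rejected respectively.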

How Type Inference Works

The scan examines actual values across your records to determine the most appropriate type:

Sample Values | Inferred Type
"hello", "world" | String
42, 99.99, -5 | Number
true, false | Boolean
"2024-01-15", "2024-01-15T10:30:00Z" | DateTime
{"nested": "object"} | Object
[1, 2, 3] | Array
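The mapping in the table can be approximated with a short pattern-matching sketch. The exact rules the scan uses are not documented here, so the ISO 8601 regular expression, the majority-vote strategy, and the function names are assumptions. The second return value (the share of samples that agree) is one possible way to derive a confidence figure, not necessarily how the product computes it.

```python
import re
from collections import Counter
from typing import Any

# Assumed pattern: an ISO 8601 date, optionally followed by a time and a zone.
ISO_DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$"
)


def infer_value_type(value: Any) -> str:
    """Classify a single decoded JSON value."""
    # bool must be tested before int: in Python, True is also an int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "datetime" if ISO_DATETIME.match(value) else "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "unknown"


def infer_field_type(samples: list[Any]) -> tuple[str, float]:
    """Return the most common type among non-empty samples and the share that agree."""
    counts = Counter(infer_value_type(v) for v in samples)
    best, hits = counts.most_common(1)[0]
    return best, hits / len(samples)


date_type = infer_field_type(["2024-01-15", "2024-01-15T10:30:00Z"])
number_type = infer_field_type([42, 99.99, -5])
```

With these samples, both date strings classify as `datetime` and the three numeric values classify as `number`, each with full agreement.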

Configuration

Configure Schema Discovery per collection under AI tab → Schema Discovery Configuration.

  • Mode — Strict, Schemaless, or Auto-Evolving
  • Batch Size — How many records to buffer before they become available for inference (schemaless mode). Default: 10. Larger batches give more accurate type inference but take longer to accumulate.
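The role of the batch size can be illustrated with a small buffer sketch. The `DiscoveryBuffer` class is hypothetical. The default of 10 comes from the setting above and the cap of 1,000 from the Limits table, but what happens when the cap is reached is not documented, so dropping the oldest record is an assumption.

```python
from collections import deque


class DiscoveryBuffer:
    """Holds records until enough have accumulated for inference."""

    def __init__(self, batch_size: int = 10, max_buffered: int = 1000) -> None:
        self.batch_size = batch_size
        # Assumption: once the cap is reached, the oldest record is dropped.
        self.records: deque = deque(maxlen=max_buffered)

    def add(self, record: dict) -> None:
        self.records.append(record)

    def ready_for_inference(self) -> bool:
        # Records become available once a full batch has accumulated.
        return len(self.records) >= self.batch_size


buffer = DiscoveryBuffer()
for i in range(9):
    buffer.add({"n": i})
ready_after_nine = buffer.ready_for_inference()
buffer.add({"n": 9})
ready_after_ten = buffer.ready_for_inference()
```

With the default batch size, the buffer is not ready after nine records and becomes ready on the tenth.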

Example Workflow

Webhook Integration

You're receiving Stripe webhook events with an evolving payload.

1. Create a collection in schemaless mode — you don't know the full shape yet.

2. Send webhook data to Centrali. Records are accepted and stored as-is.

{
  "type": "charge.succeeded",
  "amount": 2999,
  "currency": "usd",
  "customer": "cus_abc123",
  "created": 1705312200
}

3. Run Schema Discovery. Go to the collection → AI tab → Schema Discovery → click Scan Existing Records.

4. Review and accept suggestions. The scan detects type (string), amount (number), currency (string), customer (string), created (number).

5. Your records table now shows named columns instead of raw JSON — filterable and sortable.

6. Optionally switch to auto-evolving mode. If Stripe adds new fields to their webhook payload in the future, auto-evolving mode will detect them automatically and create suggestions without you having to run another scan.
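Steps 3 and 4 of this workflow can be reproduced offline with a simplified sketch. The `scan` and `json_type` helpers are hypothetical stand-ins for the product's scan. They look only at top-level fields, take the type of the first value seen, and skip datetime detection for brevity.

```python
from typing import Any


def json_type(value: Any) -> str:
    """Classify a decoded JSON value (datetime detection omitted for brevity)."""
    if isinstance(value, bool):  # checked first because bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "unknown"


def scan(records: list[dict[str, Any]], defined: set[str]) -> dict[str, str]:
    """Map each undefined top-level field to the type of its first observed value."""
    found: dict[str, str] = {}
    for record in records:
        for name, value in record.items():
            if name not in defined and name not in found:
                found[name] = json_type(value)
    return found


event = {
    "type": "charge.succeeded",
    "amount": 2999,
    "currency": "usd",
    "customer": "cus_abc123",
    "created": 1705312200,
}
suggestions = scan([event], defined=set())
```

The result matches step 4. Note that `created` comes out as a number rather than a datetime, because the payload carries it as a Unix timestamp integer.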

Limits

Limit | Value
Max keys per record (schemaless) | 100
Max nesting depth (schemaless) | 3 levels
Max string length sampled | 1,000 characters
Max buffered records | 1,000