Schema Discovery¶

Overview¶

Schema Discovery helps you understand and organize the data flowing into your collections. When you ingest data from webhooks, APIs, or other sources, your records may contain fields that aren't yet defined in your collection's schema. Schema Discovery scans your data, detects these fields, infers their types, and creates suggestions that you can review and accept to build your schema over time.

Why It Matters¶

Without defined properties, your collection's Records tab only shows system fields (ID, Created). All your actual data fields are hidden inside the raw JSON. Once you promote fields via Schema Discovery, they appear as named columns in the records table — making your data visible, filterable, and sortable at a glance.

Querying works regardless of schema. You can filter and sort by any field in the JSONB data column using the API or record filter builder, whether or not that field is a defined property. Schema Discovery is about visibility and organization, not query capability.

Schema Discovery Modes¶

Every collection has a schema discovery mode that controls how incoming records are handled relative to defined properties.

Strict (Default)¶

Records must conform to the defined schema. Unknown fields are rejected.

Best for: Production collections with a known, stable data shape.

// Schema defines: name, email
// ✅ Accepted
{ "name": "Jane", "email": "jane@example.com" }

// ❌ Rejected — "phone" is not in the schema
{ "name": "Jane", "email": "jane@example.com", "phone": "555-1234" }

Behavior:

All fields are validated against the schema (type, required, unique constraints, references)
Unknown fields cause the request to fail with a validation error
No schema discovery or evolution occurs

Schemaless¶

Records are accepted with any fields. No validation is performed.

Best for: Webhook ingestion, event streams, or any collection where you don't know the data shape upfront.

// No schema defined — everything is accepted
{ "event": "order.created", "amount": 99.99, "customer": "cust_abc" }

Behavior:

All data is stored as-is (subject to size limits: a maximum of 100 keys, 3 levels of nesting, and 1,000-character strings)
Records are buffered for schema discovery
No automatic inference — you trigger discovery manually via the Scan or Inference buttons
Accepted suggestions become defined properties (columns visible in the records table)
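
The size limits above can be checked before sending a record. This is a minimal sketch and not the platform's own validator. It assumes that keys are counted per object, that the top-level record counts as nesting level 1, and that arrays do not add a level; `check_limits` is a hypothetical helper name.

```python
MAX_KEYS = 100            # max keys per record (from the limits in this guide)
MAX_DEPTH = 3             # max levels of nesting
MAX_STRING_LENGTH = 1000  # max characters per string


def check_limits(value, depth=1):
    """Return a list of limit violations for a record (empty if it fits).

    Assumptions: keys are counted per object, the top-level record is
    level 1, and arrays do not add a nesting level.
    """
    problems = []
    if isinstance(value, dict):
        if depth > MAX_DEPTH:
            return [f"nesting deeper than {MAX_DEPTH} levels"]
        if len(value) > MAX_KEYS:
            problems.append(f"object has {len(value)} keys (max {MAX_KEYS})")
        for child in value.values():
            problems.extend(check_limits(child, depth + 1))
    elif isinstance(value, list):
        for item in value:
            problems.extend(check_limits(item, depth))
    elif isinstance(value, str) and len(value) > MAX_STRING_LENGTH:
        problems.append(
            f"string of {len(value)} characters (max {MAX_STRING_LENGTH})"
        )
    return problems
```

Running this client-side is optional; it only helps spot oversized payloads before they reach the collection.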

Auto-Evolving¶

Records are validated against the existing schema, but unknown fields are accepted alongside known fields.

Best for: Collections with a defined schema that integrate with sources whose data shape changes over time (e.g., evolving APIs, third-party webhooks that add fields).

// Schema defines: name (string, required), email (string)
// ✅ Accepted — "phone" is stored as an unknown field
{ "name": "Jane", "email": "jane@example.com", "phone": "555-1234" }

// ❌ Rejected — "name" is required but missing
{ "email": "jane@example.com", "phone": "555-1234" }

Behavior:

Known fields (matching defined properties) are fully validated — type checks, required fields, unique constraints, references
Unknown fields are accepted and stored alongside known fields
New unknown fields are automatically detected per record — a suggestion is created for each new field without any manual action
Detection uses type inference (pattern matching), not AI — it's fast and free
Suggestions still require manual accept/reject before becoming properties

Mode Comparison¶

	Strict	Schemaless	Auto-Evolving
Requires schema upfront	Yes	No	Yes
Validates known fields	Yes	No	Yes
Accepts unknown fields	No	Yes	Yes
Auto-detects new fields	—	No	Yes
Discovery trigger	—	Manual	Automatic
Detection method	—	Scan (free) or Inference (AI)	Type inference (free)
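
The differences between the three modes can be illustrated with a small sketch. It is not the service's implementation: `handle_record` and the schema layout are hypothetical, and only type and required checks are modeled (unique constraints and references are left out).

```python
def handle_record(record, schema, mode):
    """Return (accepted, errors, unknown_fields) for one incoming record.

    `schema` maps a field name to {"type": <Python type>, "required": bool}.
    `mode` is "strict", "schemaless", or "auto-evolving".
    This layout is an assumption made for illustration only.
    """
    unknown = sorted(set(record) - set(schema))

    # Schemaless: no validation, every record is accepted.
    if mode == "schemaless":
        return True, [], unknown

    # Strict and auto-evolving both validate the known fields.
    errors = []
    for name, rule in schema.items():
        if name not in record:
            if rule.get("required"):
                errors.append(f"{name} is required")
        elif not isinstance(record[name], rule["type"]):
            errors.append(f"{name} must be {rule['type'].__name__}")

    # Only strict mode treats unknown fields as validation errors.
    if mode == "strict":
        errors.extend(f"{name} is not in the schema" for name in unknown)

    return (not errors), errors, unknown


schema = {"name": {"type": str, "required": True}, "email": {"type": str}}
record = {"name": "Jane", "email": "jane@example.com", "phone": "555-1234"}

print(handle_record(record, schema, "strict"))
print(handle_record(record, schema, "auto-evolving"))
```

In auto-evolving mode the returned `unknown_fields` list corresponds to the fields that would become suggestions; in strict mode the same fields show up as errors instead.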

Discovering Fields¶

Scan Existing Records¶

Scans a sample of your records to find fields not in your schema. Uses pattern matching to infer types — no AI cost.

1. Go to your collection → AI tab → Schema Discovery
2. Click Scan Existing Records
3. Suggestions appear in the table below

This is the recommended starting point. It's fast, free, and handles most cases.
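
Conceptually, a scan compares the fields found in a sample of records with the properties already defined. The sketch below shows that idea only; `scan_records`, the suggestion dictionary layout, and the cap of three sample values are assumptions, not the product's internals.

```python
def scan_records(records, defined_properties, max_samples=3):
    """Collect fields that are absent from the schema, with sample values.

    Returns one pending "Add Field" suggestion per undefined field, in the
    order the fields were first seen. The dictionary layout is hypothetical.
    """
    suggestions = {}
    for record in records:
        for name, value in record.items():
            if name in defined_properties:
                continue  # already a defined property, nothing to suggest
            entry = suggestions.setdefault(
                name,
                {"field": name, "operation": "Add Field",
                 "status": "pending", "samples": []},
            )
            # Keep a few distinct sample values for the reviewer.
            if value not in entry["samples"] and len(entry["samples"]) < max_samples:
                entry["samples"].append(value)
    return list(suggestions.values())


records = [
    {"name": "Jane", "phone": "555-1234"},
    {"name": "Bob", "phone": "555-9999", "age": 30},
]
for suggestion in scan_records(records, defined_properties={"name"}):
    print(suggestion["field"], suggestion["samples"])
```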

Trigger Inference¶

Sends buffered records to the AI service for deeper analysis. Useful when simple type inference isn't enough — for example, detecting semantic types or suggesting field descriptions.

1. Check the Buffer Count in the summary cards
2. Click Trigger Inference
3. The AI analyzes buffered records and creates suggestions

This uses your AI token budget. Use it when the scan doesn't give you what you need.

Reviewing Suggestions¶

Suggestions appear in the Schema Suggestions table with status filters:

Status	Meaning
Pending	Awaiting your review
Accepted	Added to the collection's schema
Rejected	Dismissed

For each suggestion you can see:

Field Name — the discovered field
Operation — typically "Add Field"
Suggested Type — the inferred type (string, number, boolean, datetime, object, array)
Confidence — how confident the detection is
Sample Values — actual values found in your records

Use Accept to promote a field to a defined property, or Reject to dismiss it. You can also use bulk accept/reject for multiple suggestions.
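
The review step can be pictured as a status change plus, on accept, a new entry in the schema. The following is a sketch under assumed data shapes; `review_suggestions` and the dictionaries it uses are hypothetical and do not describe a documented API.

```python
def review_suggestions(suggestions, schema, accept=(), reject=()):
    """Bulk accept or reject pending suggestions by field name.

    Accepted suggestions are promoted into `schema` as defined properties;
    rejected ones are only marked as dismissed. Anything not named in
    `accept` or `reject` stays pending.
    """
    for suggestion in suggestions:
        if suggestion["status"] != "pending":
            continue  # already reviewed
        if suggestion["field"] in accept:
            suggestion["status"] = "accepted"
            schema[suggestion["field"]] = {"type": suggestion["type"]}
        elif suggestion["field"] in reject:
            suggestion["status"] = "rejected"
    return schema


suggestions = [
    {"field": "amount", "type": "number", "status": "pending"},
    {"field": "currency", "type": "string", "status": "pending"},
    {"field": "debug_blob", "type": "object", "status": "pending"},
]
schema = review_suggestions(
    suggestions, {}, accept={"amount", "currency"}, reject={"debug_blob"}
)
print(schema)
```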

How Type Inference Works¶

The scan examines actual values across your records to determine the most appropriate type:

Sample Values	Inferred Type
`"hello"`, `"world"`	String
`42`, `99.99`, `-5`	Number
`true`, `false`	Boolean
`"2024-01-15"`, `"2024-01-15T10:30:00Z"`	DateTime
`{"nested": "object"}`	Object
`[1, 2, 3]`	Array
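
The mapping in the table can be approximated with a few pattern checks. This is an illustrative sketch, not the scan's actual rules: `infer_type` and `infer_field_type` are hypothetical names, the datetime check relies on Python's `datetime.fromisoformat` (whose accepted formats vary by Python version), and the "most common type plus share of matching samples" confidence is an assumption about how a confidence value could be derived.

```python
from collections import Counter
from datetime import datetime


def infer_type(value):
    """Map one JSON value to a type name from the table above."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, str):
        try:
            # Rewrite a trailing "Z" so older Python versions can parse it.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return "datetime"
        except ValueError:
            return "string"
    return "unknown"  # e.g. null values


def infer_field_type(samples):
    """Return (type, confidence) for a field from several sample values.

    Assumption: the most common per-value type wins, and confidence is the
    share of samples that agree with it.
    """
    if not samples:
        return "unknown", 0.0
    counts = Counter(infer_type(value) for value in samples)
    kind, hits = counts.most_common(1)[0]
    return kind, hits / len(samples)
```

For example, `infer_field_type([42, 99.99, -5])` yields `("number", 1.0)`, while a field whose samples mix numbers and strings would come back with a lower confidence.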

Configuration¶

Configure Schema Discovery per collection under AI tab → Schema Discovery Configuration.

Mode — Strict, Schemaless, or Auto-Evolving
Batch Size — How many records to buffer before they become available for inference (schemaless mode). Default: 10. Larger batches give more accurate type inference but take longer to accumulate.
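
To make the batch size setting concrete, here is a sketch of a buffer that only becomes available for inference once enough records have accumulated. `DiscoveryBuffer` is a hypothetical class; dropping the oldest records once the 1,000-record cap is reached is an assumption, since this guide does not say what happens at the cap.

```python
from collections import deque


class DiscoveryBuffer:
    """Holds records until a batch is large enough to be worth analyzing."""

    def __init__(self, batch_size=10, max_buffered=1000):
        self.batch_size = batch_size
        # Assumption: when the cap is reached, the oldest records are dropped.
        self.records = deque(maxlen=max_buffered)

    def add(self, record):
        self.records.append(record)

    def ready(self):
        """True once at least `batch_size` records are buffered."""
        return len(self.records) >= self.batch_size

    def take_batch(self):
        """Return all buffered records and empty the buffer."""
        batch = list(self.records)
        self.records.clear()
        return batch


buffer = DiscoveryBuffer(batch_size=3)
for i in range(3):
    buffer.add({"event": "order.created", "n": i})
print(buffer.ready())
```

A larger `batch_size` means each inference run sees more sample values per field, at the cost of waiting longer for the buffer to fill.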

Example Workflow¶

Webhook Integration¶

You're receiving Stripe webhook events with an evolving payload.

1. Create a collection in schemaless mode — you don't know the full shape yet.

2. Send webhook data to Centrali. Records are accepted and stored as-is.

{
  "type": "charge.succeeded",
  "amount": 2999,
  "currency": "usd",
  "customer": "cus_abc123",
  "created": 1705312200
}

3. Run Schema Discovery. Go to the collection → AI tab → Schema Discovery → click Scan Existing Records.

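4. Review and accept suggestions. The scan detects `type` (string), `amount` (number), `currency` (string), `customer` (string), and `created` (number). Because `created` is a Unix timestamp stored as a plain integer, it is inferred as a number rather than a datetime.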

5. Your records table now shows named columns instead of raw JSON — filterable and sortable.

6. Optionally switch to auto-evolving mode. If Stripe adds new fields to their webhook payload in the future, auto-evolving mode will detect them automatically and create suggestions without you having to run another scan.

Limits¶

Limit	Value
Max keys per record (schemaless)	100
Max nesting depth (schemaless)	3 levels
Max string length sampled	1,000 characters
Max buffered records	1,000

Related¶

AI Validation — Data quality validation rules
Anomaly Insights — Anomaly detection
Collections & Records — Collection schema and records
Querying Records — Filtering and sorting
