MongoDB is built from the ground up to handle JSON data. Its flexible schemas and dynamic queries make it easy to store and access non-relational data.
But with flexibility comes complexity. As your data models grow, how do you ensure data quality and consistency? This is where JSON schema validation comes in…
In this comprehensive guide, you’ll learn:
- What is JSON schema and why it matters for data integrity
- How to create JSON schemas for MongoDB document validation
- Using $jsonSchema to validate documents on insert, find, and update
- Building indexes and scaling schemas across shards
- JSON schema validation patterns for common use cases
- Implementation best practices for schema validation
- Managing schema changes over time
Let’s dive in!
What is JSON Schema?
JSON schema provides a standard format for defining the structure of JSON data. It includes keywords for detailing:
- What fields/properties exist
- The expected data type of each field (string, number, array, etc)
- Whether a field is required vs optional
- Complex data validation rules and relationships
Here's a simple product schema example:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Product",
  "type": "object",
  "properties": {
    "_id": {
      "bsonType": "objectId"
    },
    "name": {
      "type": "string"
    },
    "price": {
      "type": "number",
      "minimum": 0
    }
  }
}
This does a few key things:
- Defines the expected fields for a product document
- Sets data types (string, number, and the BSON-specific objectId via MongoDB's bsonType extension keyword, since objectId is not a standard JSON Schema type)
- Enforces price as a number >= 0
JSON schema allows declarative validation of JSON structures. This helps ensure integrity as data flows into and within MongoDB.
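To make "declarative validation" concrete, here is a minimal sketch in plain Node.js (no MongoDB required) of what a validator does with a schema shaped like the one above. The validate helper is purely illustrative, covering only required fields, types, and a numeric minimum:

```javascript
// Minimal sketch of declarative validation: a schema object drives
// the checks instead of hand-written if-statements per field.
const productSchema = {
  required: ["name", "price"],
  properties: {
    name: { type: "string" },
    price: { type: "number", minimum: 0 }
  }
};

function validate(schema, doc) {
  const errors = [];
  // Required fields must be present
  for (const field of schema.required) {
    if (!(field in doc)) errors.push(`missing required field: ${field}`);
  }
  // Declared properties are checked only when present
  for (const [field, rules] of Object.entries(schema.properties)) {
    if (!(field in doc)) continue;
    const value = doc[field];
    if (rules.type && typeof value !== rules.type) {
      errors.push(`${field}: expected ${rules.type}`);
    }
    if (rules.minimum !== undefined && value < rules.minimum) {
      errors.push(`${field}: below minimum ${rules.minimum}`);
    }
  }
  return errors;
}

console.log(validate(productSchema, { name: "T-Shirt", price: 10 })); // []
console.log(validate(productSchema, { name: "Mug", price: -1 }));     // one error: price below minimum
```

Real validators (and MongoDB itself) implement far more keywords, but the principle is the same: the schema is data, and one generic engine applies it everywhere.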
Why JSON Schema Matters for MongoDB
MongoDB is schema-flexible by design. The JSON documents within a collection can have entirely different fields and structures.
This flexibility enables rapid iteration on data models. But it also means bad data can easily sneak into your database as needs evolve.
Without structural checks, a single typo can corrupt datasets without notice. Similarly, different code bases can introduce invalid data types that break integrations.
JSON schema bridges the worlds of flexibility and integrity:
- Flexibility to adapt to changing business needs
- Integrity to prevent data quality issues
It also serves as self-documenting metadata for other engineers to understand expected data shapes.
Creating JSON Schemas for MongoDB
Before applying JSON schema validation logic, we need to model schemas that describe our collection structures.
Here are schema design best practices for MongoDB:
Match the Schema to Actual Documents
Rather than an aspirational schema, focus on validating real-world docs that exist in your database already. Analyze sample documents to derive common structures, types, and relationships.
This ensures a pragmatic schema that fits your actual data flows.
Use Validation Keywords
JSON Schema includes advanced validation keywords like:
- minimum/maximum for numeric boundaries
- minLength/maxLength for string lengths
- pattern for regex string matching
- format for common types like date, email
Sprinkle these in to further validate field values.
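For example, a hypothetical sku field could layer several of these keywords together (the field names and pattern here are illustrative; also note that MongoDB's $jsonSchema implementation does not support the format keyword, so format checks only apply when validating outside the database):

```json
"sku": {
  "type": "string",
  "minLength": 8,
  "maxLength": 12,
  "pattern": "^[A-Z]{3}-[0-9]+$"
},
"contactEmail": {
  "type": "string",
  "format": "email"
}
```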
Reference Sub-Schemas
For complex nested sub-documents, define reusable sub-schemas. Then reference them from your top-level schema to avoid duplication.
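In standard JSON Schema this is done with definitions and $ref (the address shape below is illustrative). Be aware that MongoDB's $jsonSchema does not support $ref, so referenced sub-schemas must be inlined before attaching the validator to a collection:

```json
{
  "definitions": {
    "address": {
      "type": "object",
      "properties": {
        "street": { "type": "string" },
        "city": { "type": "string" }
      },
      "required": ["street", "city"]
    }
  },
  "type": "object",
  "properties": {
    "shippingAddress": { "$ref": "#/definitions/address" },
    "billingAddress": { "$ref": "#/definitions/address" }
  }
}
```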
Support Flexible Model Evolution
Allow extra unspecified fields via "additionalProperties". This supports new undeclared properties as your models evolve.
Here is an example product schema demonstrating these best practices:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://example.com/product.schema.json",
  "title": "Product",
  "description": "A product sold in our online store",
  "type": "object",
  "properties": {
    "_id": {
      "description": "Unique product ID",
      "bsonType": "objectId"
    },
    "name": {
      "description": "Name of the product",
      "type": "string",
      "minLength": 2
    },
    "price": {
      "type": "number",
      "minimum": 0,
      "exclusiveMaximum": 1000
    },
    "stock": {
      "type": "integer"
    },
    "lastOrdered": {
      "type": "string",
      "format": "date-time"
    }
  },
  "required": ["_id", "name", "price"],
  "additionalProperties": true
}
This schema allows flexibility while upholding baseline quality for products. Note that in draft-07, exclusiveMaximum takes a number rather than a boolean, and that MongoDB's $jsonSchema implementation omits some standard keywords ($schema, $id, $ref, and format), so those parts only take effect when validating outside the database.
Now we can leverage $jsonSchema to enforce it on our database operations.
Using $jsonSchema for Insert Validation
A primary use case for JSON Schema is validating document structures during insertion into MongoDB. This prevents bad data from ever reaching the database.
1. Create a Collection with JSON Schema Validation
First, attach schema validation rules to your MongoDB collection on creation:
db.createCollection("products", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "price"],
      properties: {
        name: {
          bsonType: "string"
        },
        price: {
          bsonType: "number",
          minimum: 0
        }
      }
    }
  }
})
This sets up basic validation requiring:
- A name field with a string type
- A price field as a number >= 0
The JSON schema here acts as the validator.
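createCollection also accepts options that control how strictly the validator is applied. For instance, validationAction can be switched to "warn" to log violations instead of rejecting writes, which is useful while first rolling validation out:

```javascript
db.createCollection("products", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "price"],
      properties: {
        name: { bsonType: "string" },
        price: { bsonType: "number", minimum: 0 }
      }
    }
  },
  validationLevel: "strict", // apply to all inserts and updates (the default)
  validationAction: "warn"   // log violations instead of rejecting them
})
```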
2. Insert Documents
Now MongoDB will validate all inserts against the defined schema:
db.products.insertOne({
  _id: 1,
  name: "T-Shirt",
  price: 10
})
// Inserts successfully

db.products.insertOne({
  _id: 2
})
// Fails document validation

Any inserts that don't adhere to the JSON schema rules are rejected.
This keeps bad data out of the database.
3. Evolve JSON Schemas Over Time
A key benefit of JSON Schema is the ability to update validation rules as needs change:
db.runCommand({
  collMod: "products",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "price", "stock"],
      properties: {
        name: { bsonType: "string" },
        price: { bsonType: "number", minimum: 0 },
        stock: {
          bsonType: "int",
          minimum: 0
        }
      }
    }
  }
})
We've now updated the schema to also enforce a new stock field as an integer >= 0. Note that collMod replaces the entire validator, so the existing name and price rules must be restated alongside the new ones.
This allows your validation logic to gracefully evolve alongside changing data models.
Querying with JSON Schema
Along with inserts, we can also use $jsonSchema to filter query results by document structure.
This is useful for ensuring you only get docs that meet a target shape from MongoDB:
// Products collection
{ "_id": 1, "name": "A", "price": 10 }
{ "_id": 2, "name": "B" }

db.products.find({
  $jsonSchema: {
    required: ["name", "price"],
    properties: {
      price: { bsonType: "number" }
    }
  }
})
// Returns only the first doc
// The second lacks the required 'price' field
Here the JSON schema acts as a structural filter to exclude documents that don‘t match the declared shape.
You can even combine $jsonSchema with other query language features:
db.products.find(
  { $jsonSchema: { required: ["name"] } },
  { name: 1 }
).sort({ name: 1 })
// Finds docs with a 'name' field
// Projects to return only name (and _id)
// Sorts names alphabetically
This shows the flexibility of querying documents with schema validation.
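You can also invert the match with $nor to surface documents that violate a schema, which is a handy audit step before tightening a validator:

```javascript
// Find documents that do NOT satisfy the schema,
// e.g. legacy data inserted before validation existed
db.products.find({
  $nor: [{
    $jsonSchema: {
      required: ["name", "price"],
      properties: {
        price: { bsonType: "number" }
      }
    }
  }]
})
```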
Updating Documents with JSON Schema
Along with reads and writes, schema validation works on updates too.
You can ensure updated documents continue meeting schema requirements after modifications.
// Products collection
{
  "_id": 1,
  "name": "T-Shirt",
  "price": 10,
  "stock": 5
}

db.products.updateOne(
  { _id: 1 },
  { $unset: { stock: 1 } } // Attempts to remove stock
)
// Fails schema validation
// since 'stock' is required
Since the collection has JSON schema enabled, the invalid update is blocked.
This prevents corrupt data mutations.
JSON schema combined with updates prevents drifting out of compliance over time as documents get modified.
Building Indexes from JSON Schemas
For fast query performance, MongoDB indexes optimize data access.
And JSON schemas provide perfect metadata for deciding what to index.
Fields marked as required or carrying validation rules make good index candidates, since queries frequently filter and sort on these attributes.

For example, because the schema requires name, it is a natural field to index:

db.products.createIndex(
  { name: 1 },
  { name: "name_index" }
)

MongoDB does not build indexes from schemas automatically, but reviewing your schema's required and validated fields is a quick way to identify candidates and match indexes to the document shapes your queries rely on.
Scaling JSON Schema Across Shards
In distributed databases, data gets split across shards – separate physical servers.
This presents an integrity challenge:
How do you enforce schema rules when documents live in different places?
In a sharded cluster, a collection's validator is part of its metadata. The config servers store that metadata and propagate it to every shard, and each shard then enforces the JSON schema on the documents it holds. Validation behaves the same no matter where a document lands.
$jsonSchema thus scales validation across clusters while keeping integrity intact.
JSON Schema Patterns for Common Cases
Beyond the basics, some useful advanced validation patterns emerge:
Optional Validation
Check a field's value only when the field is present. This is JSON Schema's default behavior: any property left out of the required array is validated only if it appears in the document:

"stock": {
  "bsonType": "int",
  "minimum": 0
}

Documents without a stock field still pass; documents that include it must supply a non-negative integer.
String Enumeration
Allow only specific string values:
"department": {
  "enum": ["books", "electronics", "household"]
}
Array Length Limiting
Constrain array sizes:
"categories": {
  "bsonType": "array",
  "maxItems": 5
}
Conditional Logic
Validate one field based on another. Standard JSON Schema offers if/then for this, but MongoDB's $jsonSchema does not support those keywords; inside MongoDB the same rule can be expressed with anyOf. Here, either the item is not in stock, or a warehouse location must be present:

"anyOf": [
  { "properties": { "inStock": { "enum": [false] } } },
  {
    "required": ["warehouseLocation"],
    "properties": { "warehouseLocation": { "bsonType": "string" } }
  }
]
These demonstrate the flexibility of JSON Schema validation logic.
Implementation Best Practices
Now that we've explored JSON Schema capabilities, here are some best practices for effectively implementing validation:
Start with Required Fields
Begin by making key fields required along with their BSON data types. This establishes baseline document shape.
Add Validation Rules Incrementally
Gradually layer on additional keywords like string lengths or numeric ranges. This avoids initial overwhelm.
Use Conditionals Judiciously
While conditional nested rules are possible, balance flexibility with complexity.
Centralize Schema Definitions
Keep schemas in dedicated files or a database collection. Don't duplicate inline schema logic.
Implement in Code First
Define validation in application code before the database layer. This allows iterating without affecting production data.
Use Schema Versioning
Version schemas (v1, v2) to safely upgrade rules without breaking existing data.
Managing Schema Evolution
The fluid nature of JSON Schema poses a data evolution challenge over time:
How do you iteratively improve validation rules without breaking existing documents?
Here are 3 effective strategies for managed schema changes:
1. Application-Layer Checks
Initially implement new validation logic in application code only. This safely improves integrity without touching underlying document structures.
2. Moderate Validation Level
Set the collection's validationLevel to "moderate" so that new rules apply to inserts and to updates of already-valid documents, while existing non-conforming documents are left untouched. This avoids conflicts with legacy data.
3. Schema Versioning
Utilize unique schema version names (v1, v2) then update gradually after testing. Old schemas should continue working.
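One common implementation is the schema versioning pattern: store a schemaVersion field on each document and let the validator accept either shape while migration is underway. A sketch (the schemaVersion values and branches here are illustrative):

```javascript
db.runCommand({
  collMod: "products",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      anyOf: [
        {
          // v1 documents: stock not yet required
          properties: { schemaVersion: { enum: [1] } },
          required: ["name", "price"]
        },
        {
          // v2 documents: stock is mandatory
          properties: { schemaVersion: { enum: [2] } },
          required: ["name", "price", "stock"]
        }
      ]
    }
  }
})
```

Once every document has been migrated to v2, the v1 branch can be dropped from the validator.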
With some planning, teams can uphold integrity while supporting existing documents as validation needs grow.
Integration Testing Approaches
To prevent bad data from being introduced unexpectedly, JSON schemas should be tested just like application code.
Unit Testing
Unit test schema definitions with test documents to validate they pass/fail as expected. Use scripting with mock data.
End-to-End Testing
In CI pipelines, insert test documents then assert $jsonSchema compliance with matchers. Confirm validation functioning end-to-end.
Performance Testing
Ensure JSON schema rules don't introduce bottlenecks. Load test inserts and measure delays comparing with/without validation.
Canary Deployments
Release updated schemas slowly to a percentage of users first. Monitor for real-world data issues before full rollout.
Testing flows confirm your validation foundation upholds data quality standards at scale.
Common JSON Schema Design Antipatterns
While building JSON schemas, there are also some common missteps to avoid:
Overly Strict Validation
Don't over-constrain schemas early; leave room for change. Documents should fail validation only when they are genuinely unusable.
Duplicated Logic
Don't repeat the same validation rules at the app layer and database level. Keep business logic in one place.
Blocked Evolution
Avoid requiring fields that can't later be relaxed or accept null values. Overly rigid requirements prevent adapting to shifting needs.
Schema Churn
Balance improving validation coverage with change rate. Highly volatile schemas create maintenance burdens.
Keeping an eye out to prevent these antipatterns will streamline long-term schema success.
Conclusion
JSON Schema allows defining consistent structures upfront that flexibly adapt over time. This prevents downstream data headaches.
MongoDB's native support for validating documents against JSON Schema converges powerful paradigms:
- Flexibility to modify models
- Integrity to safeguard data quality
- Agility to ship faster
By embedding schema checks directly within the database itself, you get programmatic validation with minimal performance overhead.
The end result? More robust and resilient data stores.
Ready to validate? Start modeling expected JSON data shapes and integrate seamless integrity with MongoDB's document validation capabilities!


