A high‑performance JSON parser and encoder for MRuby, powered by simdjson. Learn more at https://simdjson.org
mruby-fast-json provides:
- Ultra‑fast JSON.parse using simdjson’s DOM parser
- Strict error reporting mapped to Ruby exception classes
- Full UTF‑8 validation on both parse and dump
- Optional symbolized keys (symbolize_names: true)
- JSON.dump with correct escaping and Unicode handling
- Round‑trip safety for all supported types
- Big integer support (uint64 → MRuby integer)
- Precise error classes for malformed JSON
This gem is designed to be a drop‑in replacement for JSON.parse and JSON.dump in MRuby environments where performance and correctness matter.
Backed by simdjson, parsing is extremely fast even for large documents.
JSON.parse('{"name":"Alice"}', symbolize_names: true)
# => { :name => "Alice" }
Invalid UTF‑8 sequences raise JSON::UTF8Error.
Quotes, backslashes, and all C0 control characters are escaped according to the JSON spec.
Numbers larger than INT64_MAX become MRuby integers, not floats.
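To illustrate why this matters, here is a small plain-Ruby sketch (it does not use the gem) of what would be lost if the largest unsigned 64-bit value were routed through a double instead of an integer. The constant names are illustrative only and are not defined by mruby-fast-json.

```ruby
# Illustrative constants; these names are not defined by mruby-fast-json.
INT64_MAX  = 2**63 - 1
UINT64_MAX = 2**64 - 1

# A double has a 53-bit significand, so UINT64_MAX is not representable
# and rounds to the nearest representable value, which is 2**64.
via_float = UINT64_MAX.to_f.to_i

puts UINT64_MAX               # 18446744073709551615
puts via_float                # 18446744073709551616
puts via_float == UINT64_MAX  # false
```

Keeping such values as integers avoids this silent off-by-one on round trips.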
Malformed JSON raises specific exceptions such as:
- JSON::TapeError
- JSON::StringError
- JSON::UnclosedStringError
- JSON::DepthError
- JSON::NumberError
- JSON::BigIntError
- JSON::UnescapedCharsError
- …and many more
obj = JSON.parse('{"name":"Alice","age":30}')
obj["name"] # => "Alice"
obj["age"] # => 30
obj = JSON.parse('{"name":"Alice"}', symbolize_names: true)
obj[:name] # => "Alice"
obj["name"] # => nil
obj = JSON.parse('{"user":{"id":1,"name":"Bob"}}')
obj["user"] # => { "id" => 1, "name" => "Bob" }
arr = JSON.parse('[true, null, 42, "hi"]')
# => [true, nil, 42, "hi"]
JSON.dump({ "x" => 1, "y" => "z" })
# => '{"x":1,"y":"z"}'
JSON.dump([true, nil, "text"])
# => '[true,null,"text"]'
obj = { "emoji" => "😀😃😄" }
json = JSON.dump(obj)
JSON.parse(json) # => same structure
A high‑performance, zero‑copy, streaming JSON interface for MRuby, powered by simdjson’s OnDemand parser.
The OnDemand API provides:
- Lazy parsing — fields are parsed only when accessed
- Zero‑copy string access when possible
- Fast field lookup (doc["key"], doc.at(index))
- JSON Pointer support (doc.at_pointer("/a/b/0"))
- Streaming iteration over arrays and objects
- Deterministic error handling mapped to Ruby exceptions
- Native deserialization into Ruby objects via native_ext_deserialize
This API is ideal for large JSON documents, streaming workloads, or performance‑critical environments.
json = '{"user":{"id":1,"name":"Alice"},"tags":[1,2,3]}'
doc = JSON.parse_lazy(json)
doc["user"]["name"] # => "Alice"
doc["tags"][1] # => 2
Unlike JSON.parse, this does not build a full Ruby object tree.
Values are parsed on demand, directly from the underlying buffer.
If the input string has enough capacity for simdjson’s padding, the parser uses it directly:
JSON.zero_copy_parsing = true
doc = JSON.parse_lazy(str)
If not, the string is resized and frozen, or a padded buffer is allocated.
A JSON::Document represents a lazily parsed JSON value.
It supports:
doc["name"] # => value or nil
doc.fetch("name") # => value or raises KeyError
doc.at(0) # => value or nil
doc.fetch(0) # => value or raises IndexError
You may use .at only once per array. To iterate over an array, use the iteration APIs below.
doc.at_pointer("/user/name") # => "Alice"
doc.at_path("$.user.id") # => 1
doc.at_path_with_wildcard("$.items[*].id")
# => [1, 2, 3]
Or with a block:
doc.at_path_with_wildcard("$.items[*].id") do |id|
  puts id
end
doc.array_each do |value|
  puts value
end
Or return an array:
doc.array_each
# => [ ... ]
doc.object_each do |key, value|
  puts "#{key} = #{value}"
end
All simdjson errors are mapped to Ruby exceptions:
- JSON::NoSuchFieldError
- JSON::OutOfBoundsError
- JSON::TapeError
- JSON::DepthError
- JSON::UTF8Error
- JSON::NumberError
- JSON::BigIntError
- JSON::UnescapedCharsError
- JSON::OndemandParserInUseError
- …and many more
Lookup misses (NO_SUCH_FIELD, INDEX_OUT_OF_BOUNDS, etc.) return nil for:
- doc["key"]
- doc.find_field
- doc.find_field_unordered
- doc.at
- doc.at_pointer
- doc.at_path
But raise for:
doc.fetch
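The same lenient-versus-strict split exists in Ruby's core Hash and Array, which may serve as a mental model. The sketch below uses only plain Ruby containers as an analogy; it does not exercise JSON::Document itself.

```ruby
# Plain Ruby containers, used only as an analogy for JSON::Document lookups.
record = { "name" => "Alice" }
items  = [10, 20, 30]

# Lenient lookups: a miss yields nil.
missing_key   = record["age"]
missing_index = items.at(5)

# Strict lookups: a miss raises, so the caller has to handle it explicitly.
key_error = begin
  record.fetch("age")
rescue KeyError => e
  e.class
end

index_error = begin
  items.fetch(5)
rescue IndexError => e
  e.class
end

p missing_key, missing_index, key_error, index_error
```

Choosing fetch makes a missing field a hard failure, while the bracket form lets optional fields be checked with a simple nil test.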
You can define a Ruby class with a schema:
class Foo
  attr_accessor :foo
  native_ext_deserialize :@foo, JSON::Type::String
end
Then deserialize directly from an OnDemand document:
doc = JSON.parse_lazy('{"foo":"hello"}')
foo = doc.into(Foo.new)
foo.foo # => "hello"
- Each class stores a hidden schema hash: :@ivar => JSON::Type::X
- The C++ layer iterates the schema and attempts to:
  - find the field
  - check the JSON type
  - convert the value
  - assign the ivar
- No fallback, no coercion, no guessing
- If at least one field matches → success
- If none match → JSON::IncorrectTypeError
- If simdjson reports an error → raised immediately
This is a deterministic, explicit, zero‑magic deserialization pipeline.
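The pipeline above can be approximated in plain Ruby to make the matching rules concrete. This is only a sketch under stated assumptions: it works on an ordinary Hash rather than a simdjson document, and sketch_into, SKETCH_TYPE_CHECKS, SketchIncorrectTypeError, and the symbol type tags are all hypothetical stand-ins, not part of the gem. It also assumes that a field with a mismatched type is skipped rather than raising on its own.

```ruby
# Hypothetical stand-in for JSON::IncorrectTypeError.
class SketchIncorrectTypeError < StandardError; end

# Hypothetical type tags mapped to strict predicates (no coercion).
SKETCH_TYPE_CHECKS = {
  string:  ->(v) { v.is_a?(String) },
  number:  ->(v) { v.is_a?(Numeric) },
  boolean: ->(v) { v == true || v == false },
  null:    ->(v) { v.nil? },
  array:   ->(v) { v.is_a?(Array) },
  object:  ->(v) { v.is_a?(Hash) }
}.freeze

# Walks the schema, assigning each ivar whose field exists with the right type.
def sketch_into(source, target, schema)
  matched = 0
  schema.each do |ivar, type|
    key = ivar.to_s.delete_prefix("@")
    next unless source.key?(key)                            # find the field
    value = source[key]
    next unless SKETCH_TYPE_CHECKS.fetch(type).call(value)  # check the type
    target.instance_variable_set(ivar, value)               # assign the ivar
    matched += 1
  end
  raise SketchIncorrectTypeError, "no schema field matched" if matched.zero?
  target
end

class SketchUser
  attr_reader :id, :name
end

schema = { :@id => :number, :@name => :string }
user = sketch_into({ "id" => 1, "name" => "Alice" }, SketchUser.new, schema)
puts user.id    # 1
puts user.name  # Alice
```

Because the check is a strict predicate, a string "1" is never turned into the number 1; the field is simply left unassigned.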
Supported Types
- JSON::Type::Array
- JSON::Type::Object
- JSON::Type::Number
- JSON::Type::String
- JSON::Type::Boolean
- JSON::Type::Null
- OnDemand parsing is streaming: fields are parsed only when accessed.
- Fields must be accessed in order; accessing them out of order raises an error. To start again from the beginning of the stream, call .rewind on the JSON::Document.
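As a loose analogy for the in-order rule, consider a forward-only cursor over key/value pairs. The SketchCursor class below is hypothetical and written in plain Ruby; it is not how JSON::Document is implemented, and the real error class may differ. It only illustrates why a field that lies behind the cursor cannot be reached without rewinding.

```ruby
# Loose analogy only: a forward-only cursor over key/value pairs.
class SketchCursor
  def initialize(pairs)
    @pairs = pairs
    @pos = 0
  end

  # Scans forward from the current position; never looks behind it.
  def find_field(key)
    while @pos < @pairs.length
      k, v = @pairs[@pos]
      @pos += 1
      return v if k == key
    end
    raise KeyError, "field #{key.inspect} not found ahead of the cursor"
  end

  # Resets the cursor to the start of the stream.
  def rewind
    @pos = 0
    self
  end
end

cursor = SketchCursor.new([["id", 1], ["name", "Alice"]])
puts cursor.find_field("name")       # Alice
begin
  cursor.find_field("id")            # "id" is now behind the cursor
rescue KeyError => e
  puts e.message
end
puts cursor.rewind.find_field("id")  # 1
```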
Use OnDemand when:
- You parse large JSON documents
- You only need a subset of fields
- You want maximum performance
- You want deterministic, schema‑driven deserialization
- You want to avoid building full Ruby objects
Use DOM (JSON.parse) when:
- You need a complete Ruby object tree
- You want to modify the parsed structure
- You prefer simplicity over performance
class User
  attr_accessor :id, :name
  native_ext_deserialize :@id, JSON::Type::Number
  native_ext_deserialize :@name, JSON::Type::String
end
doc = JSON.load_lazy("users.json")
users = []
doc.array_each do |user_doc|
  u = User.new
  users << user_doc.into(u)
end
This avoids building any intermediate Ruby hashes or arrays.
JSON.load_lazy loads a JSON file into a padded_string and returns a lazily‑parsed JSON::Document.
This is the most efficient way to process large JSON files in MRuby.
Unlike JSON.parse(File.read(...)), this API:
- avoids allocating a Ruby string for the entire file
- uses simdjson’s padded_string::load for optimal I/O
- parses lazily — fields are parsed only when accessed
- supports zero‑copy access to string values
- keeps the underlying buffer alive automatically
doc = JSON.load_lazy("data.json")
doc["user"]["name"] # parsed on demand
doc.array_each do |item|
  puts item["id"]
end
JSON.load_lazy(path) performs:
- Load the file into a simdjson::padded_string. This ensures correct padding and optimal memory layout.
- Wrap it in a Ruby JSON::PaddedString. This object owns the buffer and ensures lifetime safety.
- Create a JSON::PaddedStringView, a lightweight view into the padded buffer.
- Create a JSON::OndemandParser, if none is provided.
- Create a JSON::Document bound to the view and parser.
The result is a fully lazy, streaming JSON document.
doc = JSON.load_lazy("big.json")
doc.array_each do |record|
  puts record["id"]
end
This avoids building a giant Ruby array and keeps memory usage minimal.
You can reuse a parser across multiple files:
parser = JSON::OndemandParser.new
doc1 = JSON.load_lazy("file1.json", parser)
doc2 = JSON.load_lazy("file2.json", parser)
This reduces allocations and improves throughput.
All simdjson errors are mapped to Ruby exceptions.
Lookup misses return nil:
doc["missing"] # => nil
But strict methods raise:
doc.fetch("missing") # raises KeyError
Lazy documents can be deserialized directly into Ruby objects:
class User
  attr_reader :id, :name
  native_ext_deserialize :@id, JSON::Type::Number
  native_ext_deserialize :@name, JSON::Type::String
end
doc = JSON.load_lazy("user.json")
user = User.new
doc.into(user)
This avoids building intermediate Ruby hashes entirely.
Use it when:
- You’re loading large JSON files
- You want streaming access
- You want minimal memory overhead
- You want to deserialize directly into Ruby objects
- You want simdjson’s full performance without DOM overhead
Malformed JSON raises specific exceptions:
JSON.parse('{"a":1,}') # => JSON::OndemandParserError
JSON.parse('"unterminated') # => JSON::UnclosedStringError
JSON.parse('tru') # => JSON::TAtomError
JSON.parse('"\xC0"') # => JSON::StringError
JSON.parse('{"x":12.3.4}') # => JSON::NumberError
JSON.parse('') # => JSON::EmptyInputError
Invalid UTF‑8 inside strings:
JSON.parse("\"\xC0\xAF\"")
# => JSON::UTF8Error
Huge integers:
JSON.parse('{"x":' + '9' * 20000 + '}')
# => JSON::BigIntError
JSON.dump escapes strings according to the JSON spec:
- Printable ASCII → unchanged
- Quotes and backslashes → escaped
- Control chars → \b \f \n \r \t
- Other C0 controls → \u00XX
- Valid UTF‑8 → preserved
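The rules in this list can be sketched as a small plain-Ruby function. This is an illustration only, not the gem's C++ encoder: sketch_escape_json_string and SKETCH_SHORT_ESCAPES are hypothetical names, the input is assumed to be valid UTF‑8 already, and the choice of lowercase hex digits in \u00XX is an assumption.

```ruby
# Hypothetical table of the two-character escapes listed above.
SKETCH_SHORT_ESCAPES = {
  "\"" => "\\\"", "\\" => "\\\\",
  "\b" => "\\b", "\f" => "\\f", "\n" => "\\n", "\r" => "\\r", "\t" => "\\t"
}.freeze

# Returns str as a quoted JSON string literal. Assumes valid UTF-8 input.
def sketch_escape_json_string(str)
  out = "\"".dup
  str.each_char do |ch|
    if SKETCH_SHORT_ESCAPES.key?(ch)
      out << SKETCH_SHORT_ESCAPES[ch]    # quotes, backslashes, short escapes
    elsif ch.ord < 0x20
      out << format("\\u%04x", ch.ord)   # remaining C0 controls
    else
      out << ch                          # printable ASCII and other UTF-8
    end
  end
  out << "\""
end

puts sketch_escape_json_string("a\tb\u0001")  # "a\tb\u0001"
```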
Example:
JSON.dump("\"\bλ😀\n")
# => "\"\\\"\\bλ😀\\n\""
The test suite covers:
- Parsing primitives
- Symbolized keys
- Nested structures
- UTF‑8 correctness
- Error conditions
- Escaping rules
- Big integer handling
- Round‑trip stability
Run tests with:
rake test
You need at least a C++20 compatible compiler.
Apache-2.0