Expose byte offsets on XContentParser for zero-copy sub-structure extraction #142873

## Summary

`XContentParser.getTokenLocation()` returns an `XContentLocation(lineNumber, columnNumber)`. The underlying Jackson `JsonParser` provides byte offsets via `JsonLocation.getByteOffset()`, but `JsonXContentParser` discards this information when wrapping the Jackson location. Exposing byte offsets would enable zero-copy extraction of sub-structures (objects/arrays) from the source byte array.

## Motivation

The `JSON_EXTRACT` ES|QL function (#142375) extracts values from JSON strings. When the extracted value is an object or array, it currently uses `XContentBuilder.copyCurrentStructure(parser)` to serialize the sub-structure back to JSON — walking every token and rebuilding the string from scratch, even though the original bytes are already valid JSON in the source array.

With byte offsets, extraction becomes a direct array slice:

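```java
// Parser is positioned on START_OBJECT/START_ARRAY; `bytes` and `offset` describe the
// source BytesRef the parser was created over, and `builder` is the function's output builder.
long start = parser.getTokenLocation().byteOffset();  // offset of the opening '{' or '['
parser.skipChildren();                                 // now on the matching END_OBJECT/END_ARRAY
long end = parser.getCurrentLocation().byteOffset();   // offset just past the closing '}' or ']'
builder.appendBytesRef(new BytesRef(bytes, offset + (int) start, (int) (end - start)));
```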

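This removes the re-serialization step entirely: no token-by-token rebuild, no intermediate string allocation, and no JSON escaping for the sub-structure. The only remaining work is the tokenizer's scan inside `skipChildren()`.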
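A stdlib-only sketch of the offset arithmetic, to show why the slice is valid on its own. `StructureSlicer` and `endOfStructure` are hypothetical names standing in for `skipChildren()` followed by `getCurrentLocation().byteOffset()`. The sketch assumes well-formed UTF-8 JSON and does no validation (Jackson's real tokenizer does); it only tracks nesting depth and skips string literals so that brackets inside strings are ignored.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for parser.skipChildren() + getCurrentLocation().byteOffset(). */
final class StructureSlicer {

    /**
     * Returns the offset just past the '}' or ']' matching the opener at {@code start}.
     * Assumes well-formed JSON; string contents (including escaped quotes) are skipped
     * so brackets inside strings do not change the nesting depth.
     */
    static int endOfStructure(byte[] json, int start) {
        int depth = 0;
        boolean inString = false;
        for (int i = start; i < json.length; i++) {
            byte b = json[i];
            if (inString) {
                if (b == '\\') {
                    i++; // skip the escaped byte, e.g. \" or \\
                } else if (b == '"') {
                    inString = false;
                }
            } else if (b == '"') {
                inString = true;
            } else if (b == '{' || b == '[') {
                depth++;
            } else if (b == '}' || b == ']') {
                depth--;
                if (depth == 0) {
                    return i + 1; // exclusive end, matching "just past the last consumed byte"
                }
            }
        }
        throw new IllegalArgumentException("unterminated structure starting at " + start);
    }

    public static void main(String[] args) {
        byte[] json = "{\"a\":1,\"b\":{\"c\":\"x}\",\"d\":[1,2]}}".getBytes(StandardCharsets.UTF_8);
        int start = 11; // offset of the '{' that opens the value of "b"
        int end = endOfStructure(json, start);
        // The slice is the original bytes; nothing is re-serialized.
        System.out.println(new String(json, start, end - start, StandardCharsets.UTF_8));
        // prints {"c":"x}","d":[1,2]}
    }
}
```

The `}` inside the string `"x}"` is why a naive bracket counter is not enough; in the proposal Jackson already handles this and simply exposes the two offsets it tracks.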

## Proposed API Change

**`XContentLocation`** — add `byteOffset` with a backward-compatible constructor:

```java
public record XContentLocation(int lineNumber, int columnNumber, long byteOffset) {
    public XContentLocation(int lineNumber, int columnNumber) {
        this(lineNumber, columnNumber, -1L);
    }
}
```

**`XContentParser`** — add `getCurrentLocation()` (the other method already exists):

```java
XContentLocation getTokenLocation();    // already exists — starts populating byteOffset
XContentLocation getCurrentLocation();  // new — position just past the last consumed byte
```

## Implementation Impact

The `XContentParser` hierarchy has 19 implementations, but nearly all of the change is in `JsonXContentParser`; the rest is inherited, transparent, or trivial.

**Leaf implementations (5):**

| Class | Change needed | Notes |
|-------|--------------|-------|
| `JsonXContentParser` | Pass through `JsonLocation.getByteOffset()` instead of discarding it. Add `getCurrentLocation()` delegating to Jackson. | ~10 lines changed |
| `SmileXContentParser` | None — inherits from `JsonXContentParser` | |
| `CborXContentParser` | None — inherits from `JsonXContentParser` | |
| `YamlXContentParser` | None — inherits from `JsonXContentParser`. Jackson's YAML parser returns `-1` for byte offsets (only char offsets available). | |
| `MapXContentParser` | Return `-1` byte offset (no byte stream) | Trivial |

**Decorators (13):** All delegate through `FilterXContentParser.delegate()`, so 11 of the 13 need no changes. The two that override location methods:

| Class | Notes |
|-------|-------|
| `DotExpandingXContentParser` | Returns saved location for synthesized tokens — would carry `-1` byte offset for synthetic tokens, real offsets for original content |
| `CompletionFieldMapper.MultiFieldParser` | Returns fixed location — would carry `-1` byte offset |
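A minimal sketch of how byte offsets flow through the decorators, assuming a trimmed-down hypothetical `LocationAware` interface in place of the full `XContentParser`. `DelegatingParser` mirrors `FilterXContentParser` (pure forwarding), and `SyntheticTokenParser` mirrors the `DotExpandingXContentParser` case: for a token that does not exist in the source bytes it reports the saved line/column with a `-1` byte offset. Leaving `getCurrentLocation()` un-overridden in the synthetic case is an assumption of this sketch.

```java
/** Sketch of the proposed record (see "Proposed API Change"). */
record XContentLocation(int lineNumber, int columnNumber, long byteOffset) {}

/** Hypothetical stand-in for the two location methods on XContentParser. */
interface LocationAware {
    XContentLocation getTokenLocation();
    XContentLocation getCurrentLocation();
}

/** Mirrors FilterXContentParser: every call is forwarded, so real byte offsets pass through. */
class DelegatingParser implements LocationAware {
    private final LocationAware delegate;

    DelegatingParser(LocationAware delegate) {
        this.delegate = delegate;
    }

    @Override
    public XContentLocation getTokenLocation() {
        return delegate.getTokenLocation();
    }

    @Override
    public XContentLocation getCurrentLocation() {
        return delegate.getCurrentLocation();
    }
}

/** While emitting a synthesized token, report the saved line/column but no byte offset. */
class SyntheticTokenParser extends DelegatingParser {
    private final XContentLocation saved;
    boolean emittingSyntheticToken;

    SyntheticTokenParser(LocationAware delegate, XContentLocation saved) {
        super(delegate);
        this.saved = saved;
    }

    @Override
    public XContentLocation getTokenLocation() {
        if (emittingSyntheticToken) {
            // The token has no bytes in the source, so there is nothing to slice.
            return new XContentLocation(saved.lineNumber(), saved.columnNumber(), -1L);
        }
        return super.getTokenLocation();
    }
}
```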

**Test-only (1):** `ParameterizableYamlXContentParser` — delegates, transparent.

## Byte Slicing Feasibility by Format

Not all content types support raw byte slicing even with offsets available:

| Format | Byte offsets? | Slicing safe? | Why |
|--------|--------------|---------------|-----|
| JSON | Yes | **Yes** | Sliced sub-structure is valid standalone JSON |
| CBOR | Yes | **Yes** | Self-contained data items, no back-references |
| SMILE | Yes | **No** | Back-references for repeated field names/strings — sliced fragment may contain unresolvable references |
| YAML | No (`-1`) | **No** | Whitespace-sensitive grammar, anchor/alias system |

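Consumers must check the content type before slicing, and must also treat a `-1` offset (from `MapXContentParser` or a synthesized decorator token) as "no slice available". JSON and CBOR are safe; SMILE and YAML require the `XContentBuilder.copyCurrentStructure` fallback.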
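The check above fits in a small guard. This is a sketch, assuming a hypothetical local `Format` enum in place of the real content-type enum and a hypothetical `SliceGuard.canSlice` helper; when it returns `false`, the caller would take the existing `XContentBuilder.copyCurrentStructure(parser)` path.

```java
/** Hypothetical stand-in for the content-type enum; only the four formats in the table. */
enum Format { JSON, CBOR, SMILE, YAML }

final class SliceGuard {
    private SliceGuard() {}

    /** Per the feasibility table: only JSON and CBOR fragments stand on their own. */
    static boolean formatAllowsSlicing(Format format) {
        return switch (format) {
            case JSON, CBOR -> true;
            case SMILE -> false; // back-references may point at names/strings outside the slice
            case YAML -> false;  // no byte offsets (-1) and whitespace-sensitive syntax
        };
    }

    /**
     * True when the bytes in [start, end) can be copied verbatim. A -1 offset forces the
     * fallback even for JSON, e.g. for synthesized tokens from a decorator.
     */
    static boolean canSlice(Format format, long start, long end) {
        return formatAllowsSlicing(format) && start >= 0 && end > start;
    }
}
```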
