Skip to content

feat: #175 structured ast parsing#176

Merged
noahbald merged 17 commits intomainfrom
feat/175-rework-parsing-and-ast
Nov 19, 2025
Merged

feat: #175 structured ast parsing#176
noahbald merged 17 commits intomainfrom
feat/175-rework-parsing-and-ast

Conversation

@noahbald
Copy link
Owner

@noahbald noahbald commented Sep 20, 2025

This PR extends the existing AST functionality to parse and embed known element and attribute values upfront, rather than processing them ad-hoc.

  • Known element names, prefixes, and namespaces are captured in new enum ElementId
  • Known attribute names, prefixes, and namespaces are captured in an enums Attr and AttrId
  • Parsed element attributes are controlled by expected_attribute_groups and expected_attributes for each ElementId

The AST is now provided through concrete types instead of traits. You can still bring your own parser to create a document.

Motivation and Context

Performance

The fun part: parse; dont validate. Previously, treating names and values as strings we can only derive information through expensive operations like querying data from HashMap, regex, parsers, etc, which is wasted time.
Now that we do all parsing upfront, with Element, Attr, etc; the structure, data, and validity of these types are known ahead of time.

All in all, this bring the following changes to performance (with blobs.svg)

  • Default jobs: 5.11x faster (e.g. 10.10ms -> 1.98ms)
  • Parsing: 3.14x slower (e.g. 351.62µs -> 1.11ms)
  • Parsing + Default jobs: 3.38x faster (e.g. 10.45ms -> 3.09ms)

Minification

By using parsed values, each type can implement it's own minification, further improving the efficacy of OXVG compared to similar tools.

Correctness

Now that the document fully embeds type information, the following can now be achieved.

  • By using AttrId/ElementId we can be confident we're handling values of the correct namespace
  • By using parsing instead of regex, we can be confident the data we're handling is of the type we expect

Future work

Future linting and transforming work will be able to take advantage of these changes, as the expected values and capabilities of data is now embedded into the type system.

For example, the linter will be able to check which attributes were successfully parsed, rather than having to individually process all the values in the document.

For an GUI SVG editor, we'll be able to show users a set of allowed elements and attributes they can manipulate the document with.

How Has This Been Tested?

  • Tested against w3c and oxygen correctness
  • Unit tests

Types of changes

Fixes

Features

  • Added ElementId, AttrId, and Attr for storing structured element data
  • Added attribute data types to oxvg_ast::attribute::data::*
  • Added Atom type which can store various string types
  • Deprecating oxvg_collections in favour of oxvg_ast::element::category, oxvg_ast::attribute::group, etc which are assigned to each variant.
  • Moving error types to oxvg_*::error for each crate
  • Macros is_element, is_attribute, get_attribute etc
  • Element no longer hashable, as it had a high likelihood of collision
  • New type of Node, Node::Style, to hold parsed style content of a style element

Breaking changes

Due to changes in parsing

  • Arenas not directly accessible, and are wrapped inside Allocator instead

Due to changes in Attribute and Element structure

  • Replaced traits (Atom, Attr, Attributes, ClassList, Document, Element, Name, Node) with types.
  • Replaced oxvg_ast::implementations with oxvg_ast::parse
  • Removed "style" feature flag. Presentation attrs moved from style to attribute::data::presentation.
  • Visitor and related types are simplified

Due to changes with computed-styles

  • ComputedStyles no longer part of Info, and should be used ad-hoc instead.
  • ComputedStyles behind "selectors" feature flag, as it's functionality would only be non-obviously incomplete otherwise
  • Visitor::prepare uses Context instead of ContextFlags and Info
  • ContextFlags queries moved to Context
  • Visitor::use_style removed in favor of querying the document directly

Due to changes with how jobs use the updated features

  • convert_colors no longer supports SVGO options
  • Numbers can only be rounded to a maximum precision of 5 decimal places
  • convert_shape_to_path now avoids updating local_names referenced by stylesheets, to improve correctness
  • removed option for prefix-generator in prefix_ids

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Blockers/Todo

---
config:
    xyChart:
        width: 622
        height: 181

---
xychart horizontal
    title "Compile time (cargo clean; cargo build --profile release) (regress)"
    x-axis [Before, After]
    y-axis "Time (ms)" 0 --> 158000
    bar [133000,158000]
Loading
---
config:
    xyChart:
        width: 622
        height: 181

---
xychart horizontal
    title "Default Jobs Benchmark (cargo bench) (progress)"
    x-axis ["archlinux-logo-dark-scalable.518881f04ca9.svg", "banner.svg", "blobs-d.svg", "Wikipedia-logo-v2.svg", "Inkscape_About_Screen_Isometric_madness_HdG4la4.svg"]
    y-axis "Time (µs)" 0 -->  300000
    %% Before
    bar [3536, 673, 10160, 52090, 288660]
    %% After
    bar [830, 200, 1978, 12838, 55608]
Loading
---
config:
    xyChart:
        width: 622
        height: 181

---
xychart horizontal
    title "Parsing Benchmark (cargo bench) (regress)"
    x-axis ["archlinux-logo-dark-scalable.518881f04ca9.svg", "banner.svg", "blobs-d.svg", "Wikipedia-logo-v2.svg", "Inkscape_About_Screen_Isometric_madness_HdG4la4.svg"]
    y-axis "Time (µs)" 0 -->  25000
    %% After
    bar [253, 80, 906, 2838, 23386]
    %% Before
    bar [61, 27, 352, 815, 5301]
Loading

@noahbald noahbald self-assigned this Sep 20, 2025
@noahbald noahbald force-pushed the feat/175-rework-parsing-and-ast branch 2 times, most recently from eba2b59 to dbee301 Compare September 25, 2025 08:20
@noahbald noahbald force-pushed the feat/175-rework-parsing-and-ast branch from dbee301 to e225559 Compare October 1, 2025 05:58
@noahbald noahbald force-pushed the feat/175-rework-parsing-and-ast branch 3 times, most recently from e263a81 to b686ba5 Compare October 17, 2025 06:46
@noahbald noahbald force-pushed the feat/175-rework-parsing-and-ast branch 9 times, most recently from 8babe6c to 951bb31 Compare November 5, 2025 09:27
@noahbald noahbald force-pushed the feat/175-rework-parsing-and-ast branch 4 times, most recently from 0aee8eb to 2657e1b Compare November 13, 2025 08:41
@noahbald noahbald force-pushed the feat/175-rework-parsing-and-ast branch from 2657e1b to b990602 Compare November 13, 2025 21:08
@noahbald noahbald force-pushed the feat/175-rework-parsing-and-ast branch from b112e89 to 55c0714 Compare November 16, 2025 09:22
@noahbald noahbald marked this pull request as ready for review November 16, 2025 09:23
@noahbald noahbald force-pushed the feat/175-rework-parsing-and-ast branch from 786b58e to a719ab8 Compare November 16, 2025 11:21
Comment on lines +90 to +92
[patch.crates-io]
web_atoms = { git = "https://github.com/servo/html5ever.git", package = "web_atoms", rev = "a2ca649" }

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be remove on next xml5ever/markup5ever release

else {
let computed_styles = ComputedStyles::default()
.with_all(element, &context.query_has_stylesheet_result)
.map_err(JobsError::ComputedStylesError)?;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a generally benign error and doesn't need to halt jobs. We should update the handling of each job to ok and print warnings for some of the error types.

@@ -446,10 +445,10 @@ fn cleanup_ids() -> anyhow::Result<()> {
<g id="g001">
<circle id="circle001" fill="url(#gradient001)" cx="60" cy="60" r="50"/>
<rect fill="url('#gradient001')" x="0" y="0" width="500" height="100"/>
<tref href="#referencedText"/>
<tref xlink:href="#referencedText"/>
Copy link
Owner Author

@noahbald noahbald Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: href is unknown for tref, and so wouldn't be used as a reference. xlink:href should be used instead

let values = Allocator::new_values();
let mut arena = Allocator::new_arena();
let mut allocator = Allocator::new(&mut arena, &values);
let dom = parse(&xml, &mut allocator).unwrap();
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing API could be simpler without so much boilerplate. Should investigate better approaches

@noahbald noahbald force-pushed the feat/175-rework-parsing-and-ast branch from 278a614 to 9dc8cfc Compare November 18, 2025 12:03
@noahbald noahbald force-pushed the feat/175-rework-parsing-and-ast branch from 9dc8cfc to 04a7c18 Compare November 18, 2025 21:40
@noahbald noahbald merged commit f0d73bc into main Nov 19, 2025
30 of 34 checks passed
@noahbald noahbald deleted the feat/175-rework-parsing-and-ast branch November 23, 2025 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant