Add xml module by waruqi · Pull Request #7025 · xmake-io/xmake

waruqi · 2025-11-14T16:15:32Z

core.base.xml

The core.base.xml module provides a tiny DOM-style XML toolkit that works inside Xmake’s sandbox. It focuses on predictable data structures, JSON-like usability, and optional streaming so you can parse large XML documents without building the entire tree.

Node Structure

XML nodes are plain Lua tables. All constructors (xml.new, xml.text, xml.comment, etc.) return values shaped like:

{
    name     = "element-name" | nil, -- only for element nodes
    kind     = "element" | "text" | "comment" | "cdata" | "doctype" | "document",
    attrs    = { key = value, ... } or nil,
    text     = string or nil,
    children = { child1, child2, ... } or nil,
    prolog   = { comment/doctype nodes before root } or nil
}

Because these are regular tables, mutating them updates the DOM in place and the changes show up automatically when you call xml.encode or xml.savefile.
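As a rough illustration of the in-place mutation described above, the following plain-Lua sketch builds two nodes by hand in the documented table shape and edits them through different references. It does not call core.base.xml; the exact fields of a text node (`kind` plus `text`) are an assumption based on the structure listing.

```lua
-- Hand-built nodes following the documented table shape.
-- Assumption: a text node is {kind = "text", text = "..."}.
local item = {
    kind = "element",
    name = "item",
    attrs = {id = "foo"},
    children = {{kind = "text", text = "hello"}}
}
local root = {kind = "element", name = "root", children = {item}}

-- Tables are reference values, so an edit made through one reference
-- is visible through every other reference to the same node.
item.attrs.lang = "en"
root.children[1].children[1].text = "world"

print(root.children[1].attrs.lang) -- en
print(item.children[1].text)       -- world
```

No copy is made when a node is inserted into `children`, so a serializer that walks `root` afterwards sees both edits.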

Quick Start

import("core.base.xml")

local doc = assert(xml.decode([[
<?xml version="1.0"?>
<root id="1">
  <item id="foo">hello</item>
</root>
]]))

local item = assert(xml.find(doc, "//item[@id='foo']"))
item.attrs.lang = "en"             -- mutate attrs directly
item.children = {xml.text("world")} -- replace existing text node
table.insert(doc.children, xml.comment("generated by xmake"))

local pretty = assert(xml.encode(doc, {pretty = true}))
assert(xml.savefile("out.xml", doc, {pretty = true}))

Streaming Example

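import("core.base.xml")

-- read the whole file as a string; xml.scan then reports nodes one by one
local plist_text = io.readfile("Info.plist")

local found
xml.scan(plist_text, function(node)
    if node.name == "key" and xml.text_of(node) == "NSPrincipalClass" then
        found = node
        return false -- stop scanning early
    end
end)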

xml.scan walks nodes as they are completed; returning false stops the scan immediately. This is ideal for large files (e.g. Info.plist) when you only need a few entries.
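The callback contract (return false to stop) can be sketched in plain Lua without the module. The `scan_leaves` helper below is hypothetical and far simpler than xml.scan: it only recognizes leaf elements of the form `<name>text</name>` and reports each one as soon as its closing tag is seen.

```lua
-- Hypothetical mini-scanner, not the real xml.scan: it reports every
-- <name>text</name> leaf element and stops when the callback returns false.
local function scan_leaves(data, callback)
    local visited = 0
    -- %1 is a back-reference: the closing tag must repeat the opening name
    for name, text in data:gmatch("<([%w_]+)>([^<]*)</%1>") do
        visited = visited + 1
        if callback({kind = "element", name = name, text = text}) == false then
            break
        end
    end
    return visited
end

local plist_text = [[
<dict>
  <key>CFBundleName</key><string>demo</string>
  <key>NSPrincipalClass</key><string>NSApplication</string>
  <key>CFBundleVersion</key><string>1</string>
</dict>
]]

local found
local visited = scan_leaves(plist_text, function(node)
    if node.name == "key" and node.text == "NSPrincipalClass" then
        found = node
        return false -- stop: the remaining entries are never visited
    end
end)
```

Only an explicit `false` stops the walk; a callback that returns nothing keeps it going, which matches the behavior described for xml.scan.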

Options Summary

Option	Applies to	Description
`trim_text = true`	`xml.decode`, `xml.scan`	Strip leading/trailing spaces inside text nodes. Disabled by default to avoid data loss.
`keep_whitespace_nodes = true`	`xml.decode`, `xml.scan`	Preserve whitespace-only text nodes (by default they are discarded unless `trim_text` produced non-empty content).
`pretty = true` / `indent` / `indentchar`	`xml.encode`, `xml.savefile`	Enable formatting and control indentation.
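One way to read the two text options together is sketched below. `normalize_text` is a hypothetical helper written for this illustration, not a function of the module, and the exact interaction of the two flags in the real parser may differ.

```lua
-- Hypothetical helper: returns the text to store in a text node,
-- or nil when the node would be dropped.
local function normalize_text(text, opt)
    opt = opt or {}
    if opt.trim_text then
        -- strip leading and trailing whitespace only when asked to
        text = text:match("^%s*(.-)%s*$")
    end
    if not opt.keep_whitespace_nodes and not text:find("%S") then
        return nil -- whitespace-only content is discarded by default
    end
    return text
end

print(normalize_text("  hi  "))                     -- "  hi  " (unchanged)
print(normalize_text("  hi  ", {trim_text = true})) -- "hi"
print(normalize_text(" \n "))                       -- nil
```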

API Reference

`xml.new(opt)`

Create a custom node. opt may contain name, kind, attrs, children, and text. Usually you call the dedicated helpers below instead of xml.new directly.

Element/Text Helpers

local textnode    = xml.text("hello")
local empty       = xml.empty("br", {class = "line"})
local comment     = xml.comment("generated by xmake")
local cdata_node  = xml.cdata("if (value < 1) {...}")
local doctype     = xml.doctype('plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"')

All helpers return node tables that you can insert into children.

`xml.decode(data, opt)`

Parse an XML string into a node tree. Returns the single root element when there is exactly one element, or all top-level nodes when multiple elements exist. On failure returns nil, err.

Supports:

Comments, CDATA, DOCTYPE (stored in root.prolog when present).
Unquoted attributes such as <item flag=true path=/tmp/file>.
XPath-friendly structure (name, attrs, children).
trim_text and keep_whitespace_nodes options described above.

`xml.encode(node, opt)`

Serialize a node tree back into XML. Set {pretty = true, indent = 2} for multi-line output or pass a custom indentchar.

`xml.loadfile(path, opt)` / `xml.savefile(path, node, opt)`

Convenience wrappers that call io.readfile/io.writefile and reuse the decode/encode options.

`xml.text_of(node)`

Concatenate all direct text children and return the combined string. Useful for quickly reading <string>...</string> values.
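A minimal plain-Lua sketch of that behavior, assuming text nodes look like `{kind = "text", text = "..."}`. Whether the real xml.text_of also includes CDATA children is not stated here, so this version handles text nodes only.

```lua
-- Sketch only: concatenate the text of direct text children.
-- Nested elements are skipped, so their text is not included.
local function text_of(node)
    local parts = {}
    for _, child in ipairs(node.children or {}) do
        if child.kind == "text" then
            parts[#parts + 1] = child.text
        end
    end
    return table.concat(parts)
end

local node = {
    kind = "element",
    name = "string",
    children = {
        {kind = "text", text = "a"},
        {kind = "element", name = "b", children = {{kind = "text", text = "x"}}},
        {kind = "text", text = "c"}
    }
}
print(text_of(node)) -- ac
```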

`xml.find(node, path)`

XPath-like lookup supporting:

/ child axis, // descendant axis.
Wildcards (*) and node tests (text(), comment(), cdata(), doctype()).
Attribute predicates ([@id='foo'], [@enabled]), text predicates ([text()='value']), positional indexes ([2]).

Returns the first node that matches or nil if nothing is found.
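To make the "first match" semantics concrete, here is a plain-Lua sketch of a much smaller lookup than xml.find: it supports only `/`-separated child steps with element names or `*`. The helper names are hypothetical, and matching steps against the children of the start node (rather than the start node itself) is an assumption of this sketch.

```lua
-- Sketch of a tiny subset of path lookup: child steps only, names or "*".
local function find_steps(node, steps, index)
    if index > #steps then
        return node
    end
    for _, child in ipairs(node.children or {}) do
        if child.kind == "element"
            and (steps[index] == "*" or child.name == steps[index]) then
            -- try this branch; fall through to the next sibling on failure
            local hit = find_steps(child, steps, index + 1)
            if hit then
                return hit
            end
        end
    end
    return nil
end

local function find_child_path(node, path)
    local steps = {}
    for step in path:gmatch("[^/]+") do
        steps[#steps + 1] = step
    end
    return find_steps(node, steps, 1)
end

local root = {kind = "element", name = "root", children = {
    {kind = "element", name = "a", children = {}},
    {kind = "element", name = "a", children = {
        {kind = "element", name = "b", attrs = {id = "2"}}
    }}
}}

print(find_child_path(root, "a/b").attrs.id) -- 2
```

The recursion backtracks: the first `a` has no `b` child, so the search moves on to the second `a` instead of giving up.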

`xml.scan(data, callback, opt)`

Streaming parser. Calls callback(node) for each completed node; returning false stops the scan early. Accepts the same options as xml.decode (trim_text, keep_whitespace_nodes). Nodes produced by xml.scan share the same structure as xml.decode.

Attribute Parsing Notes

Both quoted and unquoted values are supported (a="1 2", b='foo', c=bare).
Attribute names may include colons, dashes, or underscores.
Entity references inside attribute values are decoded (&amp; → &).
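Entity decoding of this kind can be sketched with a single gsub pass. The `decode_entities` function below is an illustration covering only the five predefined XML entities; it is not the module's implementation, which may also handle numeric references.

```lua
-- Sketch: decode the five predefined XML entities in one pass.
local entities = {amp = "&", lt = "<", gt = ">", quot = '"', apos = "'"}

local function decode_entities(value)
    -- A nil result from the callback leaves the original match untouched,
    -- so unknown references pass through unchanged.
    return (value:gsub("&(%a+);", function(name)
        return entities[name]
    end))
end

print(decode_entities("a &amp; b &lt;c&gt;")) -- a & b <c>
print(decode_entities("&amp;lt;"))            -- &lt;
```

Doing the replacement in one pass avoids double decoding: `&amp;lt;` becomes the literal text `&lt;`, not `<`.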

Example: Parsing and Updating an Info.plist

import("core.base.xml")

local plist = assert(xml.loadfile("Info.plist"))
local dict = assert(xml.find(plist, "plist/dict"))
local version_key

for i = 1, #dict.children, 2 do
    local key = dict.children[i]
    local value = dict.children[i + 1]
    if key and value and xml.text_of(key) == "CFBundleShortVersionString" then
        version_key = value
        break
    end
end

if version_key then
    version_key.children = {xml.text("2.0")}
    assert(xml.savefile("Info.plist", plist, {pretty = true}))
end

This example demonstrates decoding, querying via DOM traversal, mutating nodes, and writing the file back with pretty formatting.
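The stride-2 loop above relies on keys and values alternating with nothing in between. A slightly more defensive pairing, sketched here in plain Lua on hand-built nodes, skips non-element children such as comments. The helper names and the comment node shape are assumptions made for this illustration.

```lua
-- Sketch: pair each <key> with the next element sibling, ignoring
-- comments and other non-element nodes that would break a fixed stride.
local function text_of(node)
    local parts = {}
    for _, child in ipairs(node.children or {}) do
        if child.kind == "text" then
            parts[#parts + 1] = child.text
        end
    end
    return table.concat(parts)
end

local function dict_values(dict)
    local result, pending_key = {}, nil
    for _, child in ipairs(dict.children or {}) do
        if child.kind == "element" then
            if child.name == "key" then
                pending_key = text_of(child)
            elseif pending_key then
                result[pending_key] = child
                pending_key = nil
            end
        end
    end
    return result
end

local function el(name, text)
    return {kind = "element", name = name, children = {{kind = "text", text = text}}}
end

local dict = {kind = "element", name = "dict", children = {
    el("key", "CFBundleName"), el("string", "demo"),
    {kind = "comment", text = "version follows"},
    el("key", "CFBundleShortVersionString"), el("string", "1.0")
}}

local values = dict_values(dict)
-- The map holds references to the original nodes, so this edits the tree.
values["CFBundleShortVersionString"].children = {{kind = "text", text = "2.0"}}
```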

gemini-code-assist · 2025-11-14T16:15:48Z

Summary of Changes

Hello @waruqi, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new xml module, significantly enhancing the project's capability to interact with XML data. The module offers a complete set of tools for parsing XML strings into a navigable Lua table representation and serializing these structures back into XML, with options for formatted output. It is designed to be robust, supporting various XML constructs, and comes with essential utilities for file operations and tree traversal, all validated by a thorough test suite.

Highlights

XML Module Introduction: A new xml module has been added, providing core functionalities for XML processing within the project.
Parsing and Serialization: The module supports decoding XML strings into a structured Lua table format and encoding these structures back into XML, including pretty-printing options.
Node Type Support: It handles various XML node types such as elements, text, comments, CDATA sections, and DOCTYPE declarations.
Utility Functions: Includes functions for loading/saving XML files, finding nodes by path, and extracting concatenated text from child nodes.
Comprehensive Testing: A dedicated test suite (tests/modules/xml/test.lua) has been added to ensure the correctness and reliability of all XML functionalities.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service. Gemini can make mistakes, so double-check its output and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a new XML module, providing functionality for decoding and encoding XML strings and files. The module defines a clear node structure and includes helper functions for common XML operations like finding nodes and extracting text. Unit tests have been added to cover basic decoding, encoding, and special node types. The sandbox wrapper correctly integrates the new module and handles errors by raising exceptions. Overall, the implementation is a good foundation for XML processing.

gemini-code-assist · 2025-11-14T16:17:23Z

xmake/core/base/xml.lua

+            table.insert(children, xml.cdata(value))
+            i = close + 3
+        elseif data:sub(lt + 1, lt + 8):upper() == "!DOCTYPE" then
+            local close = data:find(">", lt + 8)


The string.find function is used here without the plain argument (true). Since > is not a magic character in Lua patterns, the call already matches it literally, so this is a clarity and consistency point rather than a parsing bug. Passing true, as in data:find(">", lt + 8, true), states the literal intent explicitly and skips pattern interpretation. The same call shape appears in multiple places within the decode function.

local close = data:find(">", lt + 8, true)
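To see what the plain flag changes in practice, the sketch below compares both call forms on a made-up input string. For `>`, which has no special meaning in Lua patterns, the two calls return the same index; for a magic character such as `.`, they differ.

```lua
local data = "<!DOCTYPE plist>.<root/>"

-- ">" is an ordinary character in Lua patterns: both calls agree.
local pattern_gt = data:find(">", 1)
local plain_gt = data:find(">", 1, true)

-- "." is magic (it matches any character) unless plain is true.
local pattern_dot = data:find(".", 1)     -- matches the first character
local plain_dot = data:find(".", 1, true) -- matches the literal dot

print(pattern_gt, plain_gt)   -- 16   16
print(pattern_dot, plain_dot) -- 1    17
```

The plain flag therefore matters most for search strings that contain characters such as `.`, `-`, `[`, or `%`.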

gemini-code-assist · 2025-11-14T16:17:23Z

xmake/core/base/xml.lua

+            end
+            i = close + 2
+        elseif data:sub(lt + 1, lt + 1) == "!" then
+            local close = data:find(">", lt + 2)


Similar to the previous comment, the string.find function is used here without the plain argument (true). The > character is not magic in Lua patterns, so the match is already literal, but data:find(">", lt + 2, true) makes that intent explicit and keeps the calls consistent.

local close = data:find(">", lt + 2, true)

gemini-code-assist · 2025-11-14T16:17:23Z

xmake/core/base/xml.lua

+            end
+            i = close + 1
+        elseif data:sub(lt + 1, lt + 1) == "/" then
+            local close = data:find(">", lt + 1)


The string.find function is used here without the plain argument (true). The > character is not magic in Lua patterns, so the match is already literal, but data:find(">", lt + 1, true) makes that intent explicit and avoids pattern interpretation.

local close = data:find(">", lt + 1, true)

gemini-code-assist · 2025-11-14T16:17:24Z

xmake/core/base/xml.lua

+            end
+            i = close + 1
+        else
+            local close = data:find(">", lt + 1)


The string.find function is used here without the plain argument (true). The > character is not magic in Lua patterns, so the match is already literal, but data:find(">", lt + 1, true) makes that intent explicit and avoids pattern interpretation.

local close = data:find(">", lt + 1, true)

gemini-code-assist · 2025-11-14T16:17:24Z

xmake/core/base/xml.lua

+function xml._parse_attrs(attrstr)
+    local attrs
+    attrstr:gsub("([%w_:%-%.]+)%s*=%s*([\"'])(.-)%2", function(key, quote, value)
+        attrs = attrs or {}
+        attrs[key] = xml._decode_entities(value)
+    end)


The Lua pattern used for parsing attributes expects values to be enclosed in either double or single quotes. Well-formed XML always quotes attribute values, but lenient, HTML-style input may contain unquoted values, for example <element attr=value>, and this pattern silently skips them. If the parser is meant to tolerate such input, consider extending the pattern to accept unquoted attribute values as well.
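One possible way to accept both forms is a two-pass approach, sketched below in plain Lua. It reuses the quoted-value pattern from the diff for the first pass and adds a second, hypothetical pattern for bare values. This is an illustration only and not necessarily how the merged code parses attributes.

```lua
-- Sketch: pass 1 collects quoted values, pass 2 collects bare values
-- from whatever text is left over.
local function parse_attrs(attrstr)
    local attrs = {}
    -- Quoted values: the closing quote must equal the opening one (%2).
    local rest = attrstr:gsub("([%w_:%-%.]+)%s*=%s*([\"'])(.-)%2",
        function(key, _, value)
            attrs[key] = value
            return " " -- blank out the consumed attribute
        end)
    -- Bare values: run up to whitespace, a quote, or the end of the tag.
    for key, value in rest:gmatch("([%w_:%-%.]+)%s*=%s*([^%s\"'>]+)") do
        attrs[key] = value
    end
    return attrs
end

local attrs = parse_attrs([[a="1 2" b='foo' c=bare flag=true path=/tmp/file]])
print(attrs.a, attrs.b, attrs.c) -- 1 2   foo   bare
```

Handling quoted values first keeps an `x=y` sequence inside a quoted string from being mistaken for a separate attribute.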

gemini-code-assist · 2025-11-14T16:17:24Z

xmake/core/base/xml.lua

+    if opt.trim_text ~= false then
+        text = text:gsub("^%s+", ""):gsub("%s+$", "")


The trim_text option defaults to true (because opt.trim_text ~= false evaluates to true if opt.trim_text is nil). While trimming whitespace is often desired, it can lead to data loss if significant whitespace needs to be preserved, such as in xml:space="preserve" contexts. It would be more explicit and safer to make trim_text false by default and require users to opt-in for trimming, or provide a clear option to disable it when necessary.
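The difference between the two defaults comes down to how a missing option (nil) is tested. The sketch below contrasts the opt-out check from the diff with an opt-in check; the helper names are made up for illustration.

```lua
-- Opt-out: trimming happens unless the caller passes trim_text = false.
local function trims_opt_out(opt)
    return opt.trim_text ~= false
end

-- Opt-in: trimming happens only when the caller passes trim_text = true.
local function trims_opt_in(opt)
    return opt.trim_text == true
end

-- Same trimming expression as in the diff; the outer parentheses drop
-- the replacement count that gsub returns as a second value.
local function trim(text)
    return (text:gsub("^%s+", ""):gsub("%s+$", ""))
end

print(trims_opt_out({}), trims_opt_in({})) -- true   false
print(trim("  keep me  "))                 -- keep me
```

With the opt-out form, a caller who passes no options loses leading and trailing whitespace without asking for it, which is the data-loss risk raised in this comment.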

waruqi added this to the v3.0.5 milestone Nov 14, 2025

gemini-code-assist bot reviewed Nov 14, 2025


waruqi added 13 commits November 15, 2025 00:19

Commit	Message
e8aa6bd	add more tests
dc6a23d	add new apis
29fbfde	add xml module
7c7c82f	improve find in xml
d6b3bc5	improve xml module
d1dc6ff	add more comments
3d6be9e	fix test
cda90e4	add xpath support
0a21a3d	update tests
48159a7	update comment
9c8f2fe	use string apis
88b0fb7	improve to parse attrs
b250c84	update xml api

waruqi merged commit d5e9f0e into dev Nov 15, 2025
44 checks passed

waruqi deleted the xml branch November 15, 2025 16:16
