Skip to content

Add ??/debug PEG rule#1710

Merged
bakpakin merged 6 commits intojanet-lang:masterfrom
0xbec163:master
Feb 5, 2026
Merged

Add ??/debug PEG rule#1710
bakpakin merged 6 commits intojanet-lang:masterfrom
0xbec163:master

Conversation

@0xbec163
Copy link
Contributor

@0xbec163 0xbec163 commented Feb 4, 2026

This PR adds a REBOL-inspired ?? PEG rule with an alias of debug.

Sample usage and output

Running the following

(peg/match '(* (<- "a") (??) "bc") "abc")

displays the following to stdout:

??: at [bc]
stack [1]:
  [0]: "a"

@0xbec163 0xbec163 changed the title Add debug/?? PEG rule Add ??/debug PEG rule Feb 4, 2026
@bakpakin bakpakin requested a review from Copilot February 4, 2026 19:45
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a REBOL-inspired ?? PEG rule with an alias debug that outputs the current parser position and capture stack state to stdout during PEG matching, useful for debugging PEG patterns.

Changes:

  • Added RULE_DEBUG opcode to the PEG virtual machine
  • Implemented runtime debug output showing remaining text and current capture stack
  • Registered both ?? and debug as special PEG forms
  • Added comprehensive test coverage including edge cases

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/include/janet.h Added RULE_DEBUG enum entry to JanetPegOpcode
src/core/peg.c Implemented RULE_DEBUG runtime behavior, compiler spec, special form registration, and unmarshal support
test/suite-peg.janet Added test helper and 6 test cases covering various debug scenarios

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/core/peg.c Outdated
Comment on lines +1261 to +1263
Reserve r = reserve(b, 2);
(void) argv;
emit_1(r, RULE_DEBUG, 0);
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec_debug function has a critical mismatch with the unmarshal code. The reserve is for 2 words and emit_1 emits 2 words (opcode + 1 argument), but the unmarshal code at line 1662-1664 expects RULE_DEBUG to be only 1 word (just the opcode, with 0 argument words). This will cause incorrect bytecode parsing when unmarshaling PEG rules that contain debug statements. The fix should reserve only 1 word and emit just the opcode without any arguments.

Suggested change
Reserve r = reserve(b, 2);
(void) argv;
emit_1(r, RULE_DEBUG, 0);
Reserve r = reserve(b, 1);
(void) argv;
emit(r, RULE_DEBUG);

Copilot uses AI. Check for mistakes.
src/core/peg.c Outdated

case RULE_DEBUG: {
char buffer[32] = {0};
size_t len = (s->outer_text_end - text + 1);
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The length calculation includes an extra +1 that is incorrect. When text equals outer_text_end (at the end of input), the calculation gives 1 instead of 0, which would cause memcpy to copy 1 byte beyond the valid text range. The formula should be (s->outer_text_end - text) without the + 1. This is confirmed by the test case "?? end of text" which expects the output to show "[]" (empty) when at the end of text.

Suggested change
size_t len = (s->outer_text_end - text + 1);
size_t len = (size_t)(s->outer_text_end - text);

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/core/peg.c Outdated
Comment on lines +200 to +203
memcpy(buffer, text, (len > 31 ? 31 : len));
janet_printf("\n?? at [%s]\nstack [%d]:\n", buffer, s->captures->count);
for (int32_t i = 0; i < s->captures->count; i++) {
janet_printf(" [%d]: %M\n", i, s->captures->data[i]);
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buffer is printed using %s format specifier, which will display non-printable characters (like newlines, tabs, null bytes, control characters) literally. This could produce confusing debug output if the remaining text contains such characters. Consider escaping or sanitizing non-printable characters in the buffer before display, or using a length-limited print that can handle binary data more safely.

Suggested change
memcpy(buffer, text, (len > 31 ? 31 : len));
janet_printf("\n?? at [%s]\nstack [%d]:\n", buffer, s->captures->count);
for (int32_t i = 0; i < s->captures->count; i++) {
janet_printf(" [%d]: %M\n", i, s->captures->data[i]);
size_t copy_len = (len > 31 ? 31 : len);
memcpy(buffer, text, copy_len);
/* Sanitize non-printable characters for debug output */
for (size_t i = 0; i < copy_len; i++) {
if (buffer[i] < ' ' || buffer[i] > '~') {
buffer[i] = '.';
}
}
janet_printf("\n?? at [%s]\nstack [%d]:\n", buffer, s->captures->count);
for (int32_t i = 0; i < s->captures->count; i++) {
janet_printf(" [%d]: %M\n", i, s->captures->data[i]);

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sanitizing the output using '.' would confuse the issue more than just displaying the string. If the preference is to sanitize the output, I could change it to use "\xHH" for non-printable characters. This could additionally display the other supported escapes for clarity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine as is for now - a quick hack to "sanitize" the string though is using "%v" and wrapping the string pointer like janet_string(buffer, len), which will print the quoted string with escape characters.

Comment on lines +848 to +892
# Debug and ?? tests.
(defn test-stdout [name peg input expected-matches expected-stdout]
(def actual @"")
(with-dyns [:out actual]
(test name peg input expected-matches))
(assert (deep= (string actual) expected-stdout)))

(test-stdout "?? long form"
'(* (debug) "abc")
"abc"
@[]
"\n?? at [abc]\nstack [0]:\n")

(test-stdout "?? short form"
'(* (??) "abc")
"abc"
@[]
"\n?? at [abc]\nstack [0]:\n")

(test-stdout "?? end of text"
'(* "abc" (??))
"abc"
@[]
"\n?? at []\nstack [0]:\n")

(test-stdout "?? between rules"
'(* "a" (??) "bc")
"abc"
@[]
"\n?? at [bc]\nstack [0]:\n")

(test-stdout
"?? stack display, string"
'(* (<- "a") (??) "bc")
"abc"
@["a"]
(string/format "\n?? at [bc]\nstack [1]:\n [0]: %M\n" "a"))

(test-stdout
"?? stack display, multiple types"
'(* (<- "a") (number :d) (constant true) (constant {}) (constant @[]) (??) "bc")
"a1bc"
@["a" 1 true {} @[]]
(string/format "\n?? at [bc]\nstack [5]:\n [0]: %M\n [1]: %M\n [2]: %M\n [3]: %M\n [4]: %M\n" "a" 1 true {} @[]))

Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a marshal/unmarshal test for the new debug rule to ensure it serializes and deserializes correctly, similar to the other PEG rules tested around line 254-268 in this file. For example: (marshpeg '(* (??) "abc"))

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I did this locally and it nicely catches the issue

0xbec163 and others added 2 commits February 4, 2026 15:55
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@bakpakin bakpakin merged commit 416bba9 into janet-lang:master Feb 5, 2026
13 checks passed
@bakpakin
Copy link
Member

bakpakin commented Feb 5, 2026

Thanks @0xbec163 , this is a useful utility for debugging PEGs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants