Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a REBOL-inspired ?? PEG rule with an alias debug that outputs the current parser position and capture stack state to stdout during PEG matching, useful for debugging PEG patterns.
Changes:
- Added
RULE_DEBUGopcode to the PEG virtual machine - Implemented runtime debug output showing remaining text and current capture stack
- Registered both
??anddebugas special PEG forms - Added comprehensive test coverage including edge cases
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/include/janet.h | Added RULE_DEBUG enum entry to JanetPegOpcode |
| src/core/peg.c | Implemented RULE_DEBUG runtime behavior, compiler spec, special form registration, and unmarshal support |
| test/suite-peg.janet | Added test helper and 6 test cases covering various debug scenarios |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/core/peg.c
Outdated
| Reserve r = reserve(b, 2); | ||
| (void) argv; | ||
| emit_1(r, RULE_DEBUG, 0); |
There was a problem hiding this comment.
The spec_debug function has a critical mismatch with the unmarshal code. The reserve is for 2 words and emit_1 emits 2 words (opcode + 1 argument), but the unmarshal code at line 1662-1664 expects RULE_DEBUG to be only 1 word (just the opcode, with 0 argument words). This will cause incorrect bytecode parsing when unmarshaling PEG rules that contain debug statements. The fix should reserve only 1 word and emit just the opcode without any arguments.
| Reserve r = reserve(b, 2); | |
| (void) argv; | |
| emit_1(r, RULE_DEBUG, 0); | |
| Reserve r = reserve(b, 1); | |
| (void) argv; | |
| emit(r, RULE_DEBUG); |
src/core/peg.c
Outdated
|
|
||
| case RULE_DEBUG: { | ||
| char buffer[32] = {0}; | ||
| size_t len = (s->outer_text_end - text + 1); |
There was a problem hiding this comment.
The length calculation includes an extra +1 that is incorrect. When text equals outer_text_end (at the end of input), the calculation gives 1 instead of 0, which would cause memcpy to copy 1 byte beyond the valid text range. The formula should be (s->outer_text_end - text) without the + 1. This is confirmed by the test case "?? end of text" which expects the output to show "[]" (empty) when at the end of text.
| size_t len = (s->outer_text_end - text + 1); | |
| size_t len = (size_t)(s->outer_text_end - text); |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/core/peg.c
Outdated
| memcpy(buffer, text, (len > 31 ? 31 : len)); | ||
| janet_printf("\n?? at [%s]\nstack [%d]:\n", buffer, s->captures->count); | ||
| for (int32_t i = 0; i < s->captures->count; i++) { | ||
| janet_printf(" [%d]: %M\n", i, s->captures->data[i]); |
There was a problem hiding this comment.
The buffer is printed using %s format specifier, which will display non-printable characters (like newlines, tabs, null bytes, control characters) literally. This could produce confusing debug output if the remaining text contains such characters. Consider escaping or sanitizing non-printable characters in the buffer before display, or using a length-limited print that can handle binary data more safely.
| memcpy(buffer, text, (len > 31 ? 31 : len)); | |
| janet_printf("\n?? at [%s]\nstack [%d]:\n", buffer, s->captures->count); | |
| for (int32_t i = 0; i < s->captures->count; i++) { | |
| janet_printf(" [%d]: %M\n", i, s->captures->data[i]); | |
| size_t copy_len = (len > 31 ? 31 : len); | |
| memcpy(buffer, text, copy_len); | |
| /* Sanitize non-printable characters for debug output */ | |
| for (size_t i = 0; i < copy_len; i++) { | |
| if (buffer[i] < ' ' || buffer[i] > '~') { | |
| buffer[i] = '.'; | |
| } | |
| } | |
| janet_printf("\n?? at [%s]\nstack [%d]:\n", buffer, s->captures->count); | |
| for (int32_t i = 0; i < s->captures->count; i++) { | |
| janet_printf(" [%d]: %M\n", i, s->captures->data[i]); |
There was a problem hiding this comment.
I think sanitizing the output using '.' would confuse the issue more than just displaying the string. If the preference is to sanitize the output, I could change it to use "\xHH" for non-printable characters. This could additionally display the other supported escapes for clarity.
There was a problem hiding this comment.
I think this is fine as is for now - a quick hack to "sanitize" the string though is using "%v" and wrapping the string pointer like janet_string(buffer, len), which will print the quoted string with escape characters.
| # Debug and ?? tests. | ||
| (defn test-stdout [name peg input expected-matches expected-stdout] | ||
| (def actual @"") | ||
| (with-dyns [:out actual] | ||
| (test name peg input expected-matches)) | ||
| (assert (deep= (string actual) expected-stdout))) | ||
|
|
||
| (test-stdout "?? long form" | ||
| '(* (debug) "abc") | ||
| "abc" | ||
| @[] | ||
| "\n?? at [abc]\nstack [0]:\n") | ||
|
|
||
| (test-stdout "?? short form" | ||
| '(* (??) "abc") | ||
| "abc" | ||
| @[] | ||
| "\n?? at [abc]\nstack [0]:\n") | ||
|
|
||
| (test-stdout "?? end of text" | ||
| '(* "abc" (??)) | ||
| "abc" | ||
| @[] | ||
| "\n?? at []\nstack [0]:\n") | ||
|
|
||
| (test-stdout "?? between rules" | ||
| '(* "a" (??) "bc") | ||
| "abc" | ||
| @[] | ||
| "\n?? at [bc]\nstack [0]:\n") | ||
|
|
||
| (test-stdout | ||
| "?? stack display, string" | ||
| '(* (<- "a") (??) "bc") | ||
| "abc" | ||
| @["a"] | ||
| (string/format "\n?? at [bc]\nstack [1]:\n [0]: %M\n" "a")) | ||
|
|
||
| (test-stdout | ||
| "?? stack display, multiple types" | ||
| '(* (<- "a") (number :d) (constant true) (constant {}) (constant @[]) (??) "bc") | ||
| "a1bc" | ||
| @["a" 1 true {} @[]] | ||
| (string/format "\n?? at [bc]\nstack [5]:\n [0]: %M\n [1]: %M\n [2]: %M\n [3]: %M\n [4]: %M\n" "a" 1 true {} @[])) | ||
|
|
There was a problem hiding this comment.
Consider adding a marshal/unmarshal test for the new debug rule to ensure it serializes and deserializes correctly, similar to the other PEG rules tested around line 254-268 in this file. For example: (marshpeg '(* (??) "abc"))
There was a problem hiding this comment.
Yup, I did this locally and it nicely catches the issue
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
Thanks @0xbec163 , this is a useful utility for debugging PEGs |
This PR adds a REBOL-inspired
??PEG rule with an alias ofdebug.Sample usage and output
Running the following
displays the following to stdout: