Added an implementation independent test suite #1

Merged
emersion merged 1 commit from heuer/scfg:master into master 2026-02-01 00:13:11 +01:00
Contributor

An attempt to create a test suite for scfg to test scfg deserializers independently and to have a common base to interpret scfg.

The test suite was created for a not yet Open Source project and is not tested against other implementations.

An attempt to create a test suite for scfg to test scfg deserializers independently and to have a common base to interpret scfg. The test suite was created for a not yet Open Source project and is not tested against other implementations.
@ -0,0 +25,4 @@
* All whitespace is normalized to a single space character.
* All names and parameters must be enclosed in double quotes.
* All unnecessary escapes are removed from names and parameters.
* All directives of a block must be indented by 4 spaces per level of nesting.
First-time contributor

While it could be inferred by looking at the expected output (or at some failing tests), it might be a good idea to also note how to append the final newlines. In the case of your expected output, you always append a newline at the end of a directive (or the closing brace of its block). At the end of the most outer block, you add another newline.

While it could be inferred by looking at the expected output (or at some failing tests), it might be a good idea to also note how to append the final newlines. In the case of your expected output, you always append a newline at the end of a directive (or the closing brace of its block). At the end of the most outer block, you add another newline.
Author
Contributor

Thanks for the feedback.

I tried to address your points by 6787555b6a584ed3f2bc49c93917b2d9391e6c72
Let me know if the rules need more polishing.

Thanks for the feedback. I tried to address your points by 6787555b6a584ed3f2bc49c93917b2d9391e6c72 Let me know if the rules need more polishing.
emersion marked this conversation as resolved
@ -0,0 +1,3 @@
directive { # comment
First-time contributor

As per the grammar, comments can only appear at the start of a line. Or is there some semi-authoritative example that suggests something else?
For example, the Go parser also only accepts comments at the beginning of a line:

case quote == 0 && len(words) == 0 && sb.Len() == 0 && ch == '#':


The same goes for my Java parser.

As per the grammar, comments can only appear at the start of a line. Or is there some semi-authoritative example that suggests something else? For example, the Go parser also only accepts comments at the beginning of a line: https://codeberg.org/emersion/go-scfg/src/commit/3051521019c3e412c2711006eb833ca7304dbf66/reader.go#L130 The same goes for my Java parser.
Author
Contributor

Thanks, you're right. I'll move the test to the "invalid" directory :)

Thanks, you're right. I'll move the test to the "invalid" directory :)
emersion marked this conversation as resolved
@ -0,0 +1 @@
test { me }
First-time contributor

According to the grammar, this is actually a valid directive with the name "test" and the parameters "{", "me" and "}". I can't tell if this is intended though.

According to the grammar, this is actually a valid directive with the name "test" and the parameters "{", "me" and "}". I can't tell if this is intended though.
Author
Contributor

The test case was meant to include two errors:

  • missing newline after {
  • missing newline before }

How can the braces be interpreted as param? A directive-param is a "word" which is either an "atom", a "dquote-word", or a "squote-word". "dquote-word" and "squote-word" cannot be considered because of the missing quote characters.

What remains is "atom", which is either an ACHAR or an esc-pair. ACHAR excludes "{" and "}"

test \{ me \}

would be a directive named "test" with three params: "{", "me", "}"

The test case was meant to include two errors: * missing newline after { * missing newline before } How can the braces be interpreted as param? A directive-param is a "word" which is either an "atom", a "dquote-word", or a "squote-word". "dquote-word" and "squote-word" cannot be considered because of the missing quote characters. What remains is "atom", which is either an ACHAR or an esc-pair. ACHAR excludes "{" and "}" `test \{ me \}` would be a directive named "test" with three params: "{", "me", "}"
First-time contributor

Indeed, you're right. Not sure what I've read there... 😅 Sorry!

Indeed, you're right. Not sure what I've read there... 😅 Sorry!
emersion marked this conversation as resolved
Owner

Thanks for the patch, this definitely sounds like a good idea!

I've been wondering whether we should use a different format for the "expected" files, e.g. JSON. This would avoid having to encode scfg in a specific way (some encoder implementations may not always use double quotes), while still being easily readable from any implementation test suite. Or was the idea to test encoders as well?

Thanks for the patch, this definitely sounds like a good idea! I've been wondering whether we should use a different format for the "expected" files, e.g. JSON. This would avoid having to encode scfg in a specific way (some encoder implementations may not always use double quotes), while still being easily readable from any implementation test suite. Or was the idea to test encoders as well?
Author
Contributor

Thanks for the feedback. I'm open to suggestions regarding the output format. I also considered defining instructions to output JSON, but ultimately rejected the idea because I felt it would add another artificial hurdle to the implementation.

Even though the output is a valid scfg document, it's not intended as a best practice suggestion. The only requirement was a standard format that's easy to implement and compare (perhaps just for the test cases).

Furthermore, it's easier for implementers to write scfg than JSON.

When I think of a test case, I first write the scfg to be tested, then the expected output (with the transformation rules applied), and then run the implementation through the test suite to see if it delivers the expected result.

Thanks for the feedback. I'm open to suggestions regarding the output format. I also considered defining instructions to output JSON, but ultimately rejected the idea because I felt it would add another artificial hurdle to the implementation. Even though the output is a valid scfg document, it's not intended as a best practice suggestion. The only requirement was a standard format that's easy to implement and compare (perhaps just for the test cases). Furthermore, it's easier for implementers to write scfg than JSON. When I think of a test case, I first write the scfg to be tested, then the expected output (with the transformation rules applied), and then run the implementation through the test suite to see if it delivers the expected result.
emersion referenced this pull request from a commit 2026-01-22 11:49:52 +01:00
emersion left a comment
Owner

I've added support for this in libscfg: emersion/libscfg#1

This worked pretty well overall. comment_after_brace doesn't pass there (parses fine but should not), but it seems like a bug in the implementation.

I've added support for this in libscfg: https://codeberg.org/emersion/libscfg/pulls/1 This worked pretty well overall. `comment_after_brace` doesn't pass there (parses fine but should not), but it seems like a bug in the implementation.
@ -0,0 +23,4 @@
* All comments are removed.
* Extraneous whitespace (spaces, tabs, newlines) is removed
* All whitespace is normalized to a single space character.
* All names and parameters must be enclosed in double quotes.
Owner

In libscfg, because of this rule I've had to add a special "test mode" flag to the formatter. I generally dislike adding flags changing the behavior of an implementation under tests.

An alternative would be to define a canonical scfg format which uses double quotes only if a parameter contains special characters. But this feels a bit arbitrary, and the rules can grow quite complex if we want to reduce the clutter. For instance, one could decide to use single quotes if a parameter's special characters are only double quotes, or even count the number of special characters to reduce the number of escapes.

The current approach (unconditional double quotes) seems simplest.

In libscfg, because of this rule I've had to add a special "test mode" flag to the formatter. I generally dislike adding flags changing the behavior of an implementation under tests. An alternative would be to define a canonical scfg format which uses double quotes only if a parameter contains special characters. But this feels a bit arbitrary, and the rules can grow quite complex if we want to reduce the clutter. For instance, one could decide to use single quotes if a parameter's special characters are only double quotes, or even count the number of special characters to reduce the number of escapes. The current approach (unconditional double quotes) seems simplest.
Author
Contributor

My current implementation of scfg doesn't support serialization at all. The output is only implemented in my tests. As I said, I'm open to better suggestions (JSON, XML, …) for a canonical format; this was just the simplest one I could think of.

My current implementation of scfg doesn't support serialization at all. The output is only implemented in my tests. As I said, I'm open to better suggestions (JSON, XML, …) for a canonical format; this was just the simplest one I could think of.
Owner

Yeah, that makes sense. If I had to use JSON from libscfg, I would probably copy/paste the few lines of code required to encode JSON strings and hand-roll the encoder, resulting in code pretty similar to a scfg encoder. For languages with built-in JSON support, would be easier though.

Let's go with this approach - we can always convert the files if we decide we should switch to something else in the future.

Yeah, that makes sense. If I had to use JSON from libscfg, I would probably copy/paste the few lines of code required to encode JSON strings and hand-roll the encoder, resulting in code pretty similar to a scfg encoder. For languages with built-in JSON support, would be easier though. Let's go with this approach - we can always convert the files if we decide we should switch to something else in the future.
emersion marked this conversation as resolved
@ -0,0 +28,4 @@
* Opening braces `{` remain on the same line as their associated directive.
* Closing braces `}` are on their own line, indented to match the indentation
level of their opening directive
* All directives of a block must be indented by 4 spaces per level of nesting.
Owner

I don't want to rehash the decades-old argument about spaces vs tabs, but I personally use tabs everywhere. This repo's README also uses tabs.

I don't want to rehash the decades-old argument about spaces vs tabs, but I personally use tabs everywhere. This repo's README also uses tabs.
Author
Contributor

No problem, although I prefer spaces, I will change the indentation level to one tab.

No problem, although I prefer spaces, I will change the indentation level to one tab.
emersion marked this conversation as resolved
@ -0,0 +29,4 @@
* Closing braces `}` are on their own line, indented to match the indentation
level of their opening directive
* All directives of a block must be indented by 4 spaces per level of nesting.
* The output must end with a single trailing newline character (`\n`)
Owner

Test files in this directory end with two final newline characters: one for the final directive, and one extra. I'm not entirely following the intent here: why the extra newline? All directives and closing braces are followed by a newline, why add another one?

Test files in this directory end with two final newline characters: one for the final directive, and one extra. I'm not entirely following the intent here: why the extra newline? All directives and closing braces are followed by a newline, why add another one?
Author
Contributor

To be honest: The final line break arose purely out of laziness, without any technical reason. When I first had the idea to implement a general test suite for scfg, the initial test cases had a final but unnecessary line break.

I then stuck with it for all subsequent test cases.

I will correct this.

To be honest: The final line break arose purely out of laziness, without any technical reason. When I first had the idea to implement a general test suite for scfg, the initial test cases had a final but unnecessary line break. I then stuck with it for all subsequent test cases. I will correct this.
emersion marked this conversation as resolved
emersion referenced this pull request from a commit 2026-01-25 15:27:16 +01:00
emersion left a comment
Owner

LGTM, thanks!

Two more minor details:

  • Can you squash all of your commits together?
  • Can we drop the intermediate "scfg" directory under "tests"? The repository is already named "scfg" and there will never be tests intended for something else than scfg.
LGTM, thanks! Two more minor details: - Can you squash all of your commits together? - Can we drop the intermediate "scfg" directory under "tests"? The repository is already named "scfg" and there will never be tests intended for something else than scfg.
@ -0,0 +1 @@
"escape\t\r\n\nme"
First-time contributor

Maybe I missed something, but shouldn't this be escapetrnnme?

Maybe I missed something, but shouldn't this be `escapetrnnme`?
Author
Contributor

If you follow the grammar exactly, your result is correct. However, I thought the grammar meant that control characters should be encoded this way. What does libscfg say about the test?

If you follow the grammar exactly, your result is correct. However, I thought the grammar **meant** that control characters should be encoded this way. What does libscfg say about the test?
Owner

I agree the grammar is not enough to explain semantics. We should add prose in plain English to describe semantics.

The original intent of the spec is that \ escapes any following character, just like POSIX shell does. That is, \ is always dropped from the output.

libscfg test result:

scfg_format_file() mismatch, got:

"escapetrnnme"

want:

"escape\t\r\n\nme"

I agree the grammar is not enough to explain semantics. We should add prose in plain English to describe semantics. The original intent of the spec is that `\` escapes any following character, just like POSIX shell does. That is, `\` is always dropped from the output. libscfg test result: > scfg_format_file() mismatch, got: > ``` > "escapetrnnme" > > ``` > want: > ``` > "escape\t\r\n\nme" > > ```
Author
Contributor

Thanks for the feedback, I'll update the test case (and fix my implementation ;)).

To encode control characters like \t and \n someone has to use \\t \\n etc., right?

Thanks for the feedback, I'll update the test case (and fix my implementation ;)). To encode control characters like `\t` and `\n` someone has to use `\\t` `\\n` etc., right?
Owner

One could use single quotes to not interpret backslash: '\n' is equivalent to "\\n".

(This is a bit weird because it's halfway between POSIX shell and… something else: POSIX shell has \n interpretation for double quotes, but scfg doesn't.)

One could use single quotes to not interpret backslash: `'\n'` is equivalent to `"\\n"`. (This is a bit weird because it's halfway between POSIX shell and… something else: POSIX shell has `\n` interpretation for double quotes, but scfg doesn't.)
Author
Contributor

Thanks. It's fixed in 9082787b6e and I added two more test (sq_backslash and sq_backslash2)

Thanks. It's fixed in 9082787b6e and I added two more test (sq_backslash and sq_backslash2)
heuer marked this conversation as resolved
Author
Contributor

@emersion wrote in #1 (comment):

* Can you squash all of your commits together?
* Can we drop the intermediate "scfg" directory under "tests"? The repository is already named "scfg" and there will never be tests intended for something else than scfg.

Fixed.

If the file "escape_seq.scfg" needs to be corrected (see Hannes' comment), I can do that at short notice.

@emersion wrote in https://codeberg.org/emersion/scfg/pulls/1#issuecomment-10150569: > * Can you squash all of your commits together? > * Can we drop the intermediate "scfg" directory under "tests"? The repository is already named "scfg" and there will never be tests intended for something else than scfg. Fixed. If the file "escape_seq.scfg" needs to be corrected (see Hannes' comment), I can do that at short notice.
emersion merged commit 9082787b6e into master 2026-02-01 00:13:11 +01:00
Owner

Thank you!

Thank you!
emersion referenced this pull request from a commit 2026-02-01 18:22:16 +01:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
emersion/scfg!1
No description provided.