Added an implementation independent test suite #1

heuer · 2026-01-18T22:07:56+01:00

heuer commented

2026-01-18 22:07:56 +01:00

An attempt to create a test suite for scfg to test scfg deserializers independently and to have a common base to interpret scfg.

The test suite was created for a not yet Open Source project and is not tested against other implementations.

An attempt to create a test suite for scfg to test scfg deserializers independently and to have a common base to interpret scfg. The test suite was created for a not yet Open Source project and is not tested against other implementations.

hannesbraun reviewed

2026-01-19 18:49:59 +01:00

tests/scfg/README.md Outdated

					
				@ -0,0 +25,4 @@

				* All whitespace is normalized to a single space character.

				* All names and parameters must be enclosed in double quotes.

				* All unnecessary escapes are removed from names and parameters.

				* All directives of a block must be indented by 4 spaces per level of nesting.

While it could be inferred by looking at the expected output (or at some failing tests), it might be a good idea to also note how to append the final newlines. In the case of your expected output, you always append a newline at the end of a directive (or the closing brace of its block). At the end of the most outer block, you add another newline.

Thanks for the feedback.

I tried to address your points by 6787555b6a584ed3f2bc49c93917b2d9391e6c72
Let me know if the rules need more polishing.

Thanks for the feedback. I tried to address your points by 6787555b6a584ed3f2bc49c93917b2d9391e6c72 Let me know if the rules need more polishing.

emersion marked this conversation as resolved

tests/scfg/valid/comment_after_brace.scfg Outdated

					
				@ -0,0 +1,3 @@

				directive { # comment

As per the grammar, comments can only appear at the start of a line. Or is there some semi-authoritative example that suggests something else?
For example, the Go parser also only accepts comments at the beginning of a line:

case quote == 0 && len(words) == 0 && sb.Len() == 0 && ch == '#':

The same goes for my Java parser.

As per the grammar, comments can only appear at the start of a line. Or is there some semi-authoritative example that suggests something else? For example, the Go parser also only accepts comments at the beginning of a line: https://codeberg.org/emersion/go-scfg/src/commit/3051521019c3e412c2711006eb833ca7304dbf66/reader.go#L130 The same goes for my Java parser.

Thanks, you're right. I'll move the test to the "invalid" directory :)

emersion marked this conversation as resolved

hannesbraun reviewed

2026-01-19 19:19:28 +01:00

tests/scfg/invalid/block_without_newline.scfg Outdated

					
				@ -0,0 +1 @@

				test { me }

According to the grammar, this is actually a valid directive with the name "test" and the parameters "{", "me" and "}". I can't tell if this is intended though.

The test case was meant to include two errors:

missing newline after {
missing newline before }

How can the braces be interpreted as param? A directive-param is a "word" which is either an "atom", a "dquote-word", or a "squote-word". "dquote-word" and "squote-word" cannot be considered because of the missing quote characters.

What remains is "atom", which is either an ACHAR or an esc-pair. ACHAR excludes "{" and "}"

test \{ me \}

would be a directive named "test" with three params: "{", "me", "}"

The test case was meant to include two errors: * missing newline after { * missing newline before } How can the braces be interpreted as param? A directive-param is a "word" which is either an "atom", a "dquote-word", or a "squote-word". "dquote-word" and "squote-word" cannot be considered because of the missing quote characters. What remains is "atom", which is either an ACHAR or an esc-pair. ACHAR excludes "{" and "}" `test \{ me \}` would be a directive named "test" with three params: "{", "me", "}"

Indeed, you're right. Not sure what I've read there... 😅 Sorry!

emersion marked this conversation as resolved

emersion commented

2026-01-20 21:25:25 +01:00

Thanks for the patch, this definitely sounds like a good idea!

I've been wondering whether we should use a different format for the "expected" files, e.g. JSON. This would avoid having to encode scfg in a specific way (some encoder implementations may not always use double quotes), while still being easily readable from any implementation test suite. Or was the idea to test encoders as well?

Thanks for the patch, this definitely sounds like a good idea! I've been wondering whether we should use a different format for the "expected" files, e.g. JSON. This would avoid having to encode scfg in a specific way (some encoder implementations may not always use double quotes), while still being easily readable from any implementation test suite. Or was the idea to test encoders as well?

heuer commented

2026-01-20 21:59:15 +01:00

Thanks for the feedback. I'm open to suggestions regarding the output format. I also considered defining instructions to output JSON, but ultimately rejected the idea because I felt it would add another artificial hurdle to the implementation.

Even though the output is a valid scfg document, it's not intended as a best practice suggestion. The only requirement was a standard format that's easy to implement and compare (perhaps just for the test cases).

Furthermore, it's easier for implementers to write scfg than JSON.

When I think of a test case, I first write the scfg to be tested, then the expected output (with the transformation rules applied), and then run the implementation through the test suite to see if it delivers the expected result.

Thanks for the feedback. I'm open to suggestions regarding the output format. I also considered defining instructions to output JSON, but ultimately rejected the idea because I felt it would add another artificial hurdle to the implementation. Even though the output is a valid scfg document, it's not intended as a best practice suggestion. The only requirement was a standard format that's easy to implement and compare (perhaps just for the test cases). Furthermore, it's easier for implementers to write scfg than JSON. When I think of a test case, I first write the scfg to be tested, then the expected output (with the transformation rules applied), and then run the implementation through the test suite to see if it delivers the expected result.

emersion referenced this pull request from a commit

2026-01-22 11:49:52 +01:00

Add scfg spec tests

emersion referenced this pull request from emersion/libscfg

2026-01-22 11:49:59 +01:00

WIP: Add scfg spec tests #1

emersion reviewed

2026-01-22 12:04:02 +01:00

emersion left a comment

I've added support for this in libscfg: emersion/libscfg#1

This worked pretty well overall. comment_after_brace doesn't pass there (parses fine but should not), but it seems like a bug in the implementation.

I've added support for this in libscfg: https://codeberg.org/emersion/libscfg/pulls/1 This worked pretty well overall. `comment_after_brace` doesn't pass there (parses fine but should not), but it seems like a bug in the implementation.

tests/scfg/README.md Outdated

					
				@ -0,0 +23,4 @@

				* All comments are removed.

				* Extraneous whitespace (spaces, tabs, newlines) is removed

				* All whitespace is normalized to a single space character.

				* All names and parameters must be enclosed in double quotes.

In libscfg, because of this rule I've had to add a special "test mode" flag to the formatter. I generally dislike adding flags changing the behavior of an implementation under tests.

An alternative would be to define a canonical scfg format which uses double quotes only if a parameter contains special characters. But this feels a bit arbitrary, and the rules can grow quite complex if we want to reduce the clutter. For instance, one could decide to use single quotes if a parameter's special characters are only double quotes, or even count the number of special characters to reduce the number of escapes.

The current approach (unconditional double quotes) seems simplest.

In libscfg, because of this rule I've had to add a special "test mode" flag to the formatter. I generally dislike adding flags changing the behavior of an implementation under tests. An alternative would be to define a canonical scfg format which uses double quotes only if a parameter contains special characters. But this feels a bit arbitrary, and the rules can grow quite complex if we want to reduce the clutter. For instance, one could decide to use single quotes if a parameter's special characters are only double quotes, or even count the number of special characters to reduce the number of escapes. The current approach (unconditional double quotes) seems simplest.

My current implementation of scfg doesn't support serialization at all. The output is only implemented in my tests. As I said, I'm open to better suggestions (JSON, XML, …) for a canonical format; this was just the simplest one I could think of.

Yeah, that makes sense. If I had to use JSON from libscfg, I would probably copy/paste the few lines of code required to encode JSON strings and hand-roll the encoder, resulting in code pretty similar to a scfg encoder. For languages with built-in JSON support, would be easier though.

Let's go with this approach - we can always convert the files if we decide we should switch to something else in the future.

Yeah, that makes sense. If I had to use JSON from libscfg, I would probably copy/paste the few lines of code required to encode JSON strings and hand-roll the encoder, resulting in code pretty similar to a scfg encoder. For languages with built-in JSON support, would be easier though. Let's go with this approach - we can always convert the files if we decide we should switch to something else in the future.

emersion marked this conversation as resolved

tests/scfg/README.md Outdated

					
				@ -0,0 +28,4 @@

				* Opening braces `{` remain on the same line as their associated directive.

				* Closing braces `}` are on their own line, indented to match the indentation

				  level of their opening directive

				* All directives of a block must be indented by 4 spaces per level of nesting.

I don't want to rehash the decades-old argument about spaces vs tabs, but I personally use tabs everywhere. This repo's README also uses tabs.

No problem, although I prefer spaces, I will change the indentation level to one tab.

emersion marked this conversation as resolved

tests/scfg/README.md Outdated

					
				@ -0,0 +29,4 @@

				* Closing braces `}` are on their own line, indented to match the indentation

				  level of their opening directive

				* All directives of a block must be indented by 4 spaces per level of nesting.

				* The output must end with a single trailing newline character (`\n`)

Test files in this directory end with two final newline characters: one for the final directive, and one extra. I'm not entirely following the intent here: why the extra newline? All directives and closing braces are followed by a newline, why add another one?

To be honest: The final line break arose purely out of laziness, without any technical reason. When I first had the idea to implement a general test suite for scfg, the initial test cases had a final but unnecessary line break.

I then stuck with it for all subsequent test cases.

I will correct this.

To be honest: The final line break arose purely out of laziness, without any technical reason. When I first had the idea to implement a general test suite for scfg, the initial test cases had a final but unnecessary line break. I then stuck with it for all subsequent test cases. I will correct this.

emersion marked this conversation as resolved

emersion referenced this pull request from a commit

2026-01-25 15:27:16 +01:00

Add scfg spec tests

emersion reviewed

2026-01-25 17:41:53 +01:00

emersion left a comment

LGTM, thanks!

Two more minor details:

Can you squash all of your commits together?
Can we drop the intermediate "scfg" directory under "tests"? The repository is already named "scfg" and there will never be tests intended for something else than scfg.

LGTM, thanks! Two more minor details: - Can you squash all of your commits together? - Can we drop the intermediate "scfg" directory under "tests"? The repository is already named "scfg" and there will never be tests intended for something else than scfg.

hannesbraun reviewed

2026-01-26 14:23:32 +01:00

tests/scfg/expected/escape_seq.scfg Outdated

					
				@ -0,0 +1 @@

				"escape\t\r\n\nme"

Maybe I missed something, but shouldn't this be escapetrnnme?

Maybe I missed something, but shouldn't this be `escapetrnnme`?

If you follow the grammar exactly, your result is correct. However, I thought the grammar meant that control characters should be encoded this way. What does libscfg say about the test?

If you follow the grammar exactly, your result is correct. However, I thought the grammar **meant** that control characters should be encoded this way. What does libscfg say about the test?

I agree the grammar is not enough to explain semantics. We should add prose in plain English to describe semantics.

The original intent of the spec is that \ escapes any following character, just like POSIX shell does. That is, \ is always dropped from the output.

libscfg test result:

scfg_format_file() mismatch, got:
"escapetrnnme"
want:
"escape\t\r\n\nme"

I agree the grammar is not enough to explain semantics. We should add prose in plain English to describe semantics. The original intent of the spec is that `\` escapes any following character, just like POSIX shell does. That is, `\` is always dropped from the output. libscfg test result: > scfg_format_file() mismatch, got: > ``` > "escapetrnnme" > > ``` > want: > ``` > "escape\t\r\n\nme" > > ```

Thanks for the feedback, I'll update the test case (and fix my implementation ;)).

To encode control characters like \t and \n someone has to use \\t \\n etc., right?

Thanks for the feedback, I'll update the test case (and fix my implementation ;)). To encode control characters like `\t` and `\n` someone has to use `\\t` `\\n` etc., right?

One could use single quotes to not interpret backslash: '\n' is equivalent to "\\n".

(This is a bit weird because it's halfway between POSIX shell and… something else: POSIX shell has \n interpretation for double quotes, but scfg doesn't.)

One could use single quotes to not interpret backslash: `'\n'` is equivalent to `"\\n"`. (This is a bit weird because it's halfway between POSIX shell and… something else: POSIX shell has `\n` interpretation for double quotes, but scfg doesn't.)

Thanks. It's fixed in 9082787b6e and I added two more test (sq_backslash and sq_backslash2)

Thanks. It's fixed in 9082787b6e and I added two more test (sq_backslash and sq_backslash2)

heuer marked this conversation as resolved

heuer force-pushed master from 32da49032f to 7096cde400

2026-01-26 18:13:17 +01:00

Compare

heuer force-pushed master from 7096cde400 to 0d5910ebc9

2026-01-26 18:23:54 +01:00

Compare

heuer commented

2026-01-26 18:33:38 +01:00

@emersion wrote in #1 (comment):

* Can you squash all of your commits together?
* Can we drop the intermediate "scfg" directory under "tests"? The repository is already named "scfg" and there will never be tests intended for something else than scfg.

Fixed.

If the file "escape_seq.scfg" needs to be corrected (see Hannes' comment), I can do that at short notice.

@emersion wrote in https://codeberg.org/emersion/scfg/pulls/1#issuecomment-10150569: > * Can you squash all of your commits together? > * Can we drop the intermediate "scfg" directory under "tests"? The repository is already named "scfg" and there will never be tests intended for something else than scfg. Fixed. If the file "escape_seq.scfg" needs to be corrected (see Hannes' comment), I can do that at short notice.

heuer force-pushed master from 0d5910ebc9 to 9082787b6e

2026-01-30 19:31:24 +01:00

Compare

emersion merged commit 9082787b6e into master

2026-02-01 00:13:11 +01:00

emersion commented

2026-02-01 00:13:21 +01:00

Thank you!

emersion referenced this pull request from a commit

2026-02-01 18:22:16 +01:00

Add scfg spec tests

Sign in to join this conversation.