Skip to content

Implemented inpos and outpos in lou_checkyaml.c.#430

Closed
BueVest wants to merge 22 commits into
liblouis:masterfrom
BueVest:buevest_yaml_inpos_outpos
Closed

Implemented inpos and outpos in lou_checkyaml.c.#430
BueVest wants to merge 22 commits into
liblouis:masterfrom
BueVest:buevest_yaml_inpos_outpos

Conversation

@BueVest

@BueVest BueVest commented Oct 23, 2017

Copy link
Copy Markdown
Contributor

When approved, I will make multi-pass tests for inpos and outpos

… to brl_checks.c to distinguish between inpos, outpos and cursorpos.
Comment thread tools/lou_checkyaml.c Outdated
} else if (!strcmp(option_name, "cursorPos")) {
*typeform = read_typeforms(parser, wordLen);

// FIXME: Comparisons should be truly case insensitive, but apparently no standard function for that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally prefer that the option names are not case insensitive at all.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were already some inconsistancy between cursorPos (cap P) and inpos and outpos in other contexts (small p). You could say that cursor position is clearly two words, while inpos and outpos are not necesarily two words. Nevertheless, it is a potential source of confusion. So, I thought we should be lenient on cases. Either that or stick to lower case completely.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just call it "inputPos", "outputPos" and "cursorPos" for now. Hopefully we can abstract out inputPos and outputPos in the future anyway. And eventually when cursorPos is only a single number I'd like to rename it to simply "cursor" ("pos" is kinda redundant).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

@bertfrees bertfrees requested a review from egli October 23, 2017 11:10
@bertfrees

Copy link
Copy Markdown
Member

I had a quick look and it looked alright. @egli maybe you can have a look too? Ideally I'd like to have a better syntax, but right now I don't want to get in the way of Bue who wants to actually write some tests. That is the most important thing right now.

@egli

egli commented Oct 23, 2017

Copy link
Copy Markdown
Member

Having only looked at it very briefly I can say that it looks good. There are maybe a few things that I would have done differently and there is a buffer overrun lurking. It could probably go in and I can clean it up afterwards. But what I'm really curious is: How does a inpos YAML test look like?

@bertfrees

Copy link
Copy Markdown
Member

You'll see when Bue has made some :)

@egli egli left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good generally. You should be the maintainer :-)

Comment thread tools/lou_checkyaml.c
if ((const char *)event.data.scalar.value == tail)
error_at_line(EXIT_FAILURE, 0, file_name, event.start_mark.line + 1,
"No digits found in inpos position '%s'. Must be a number\n",
event.data.scalar.value);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a buffer overrun waiting to happen. Never read more than you allocated. Add a condition to the while loop (i has to be smaller than len).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was code that I took from the original read_cursorPos() function. I will have a look at it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, hehe. I guess at the time I hadn't been burned by the CVEs. Yes looking at it it looks like it has the same problems. Basically it should be smth along the line of

  while ((parse_error = yaml_parser_parse(parser, &event)) &&
	 (event.type == YAML_SCALAR_EVENT) &&
	 (i < len)) {

in all places

Comment thread tools/lou_checkyaml.c
if ((const char *)event.data.scalar.value == tail)
error_at_line(EXIT_FAILURE, 0, file_name, event.start_mark.line + 1,
"No digits found in outpos position '%s'. Must be a number\n",
event.data.scalar.value);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again: buffer overrun

Comment thread tools/lou_checkyaml.c
"Not a valid outpos position '%s'. Must be a number\n",
event.data.scalar.value);
if ((const char *)event.data.scalar.value == tail)
error_at_line(EXIT_FAILURE, 0, file_name, event.start_mark.line + 1,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code path indicates that chars other than digits where found in the value, so there could be digits but there definitely are also other non-digits characters. Maybe adapt the error message.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget what I said here. I think the error message is good enough

Comment thread tools/lou_checkyaml.c Outdated
*typeform = read_typeforms(parser, wordLen);

// FIXME: Comparisons should be truly case insensitive, but apparently no standard function for that.
} else if (!strcmp(option_name, "inPos") || !strcmp(option_name, "inpos")) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I like to API so relaxed, i.e. allowing multiple ways to specify the test ("inpos" and inPos"). Do we do that in other places too?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I don't think so. I agree that one way should be enough.

Comment thread tools/lou_checkyaml.c
if (xfail != check_cursor_pos(*table, word, cursorPos)) {
if (description) fprintf(stderr, "%s\n", description);
error_at_line(0, 0, file_name, event.start_mark.line + 1,
if (inPos || outPos || cursorPos) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there this if a or b or c, followed by if a? Couldn't this just be a series if if a else if b else if c?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bue just adapted to the existing else if (hyphenation) below. I think there should be an error message though when cursorPos is specified in hyphenation test mode.

@BueVest

BueVest commented Oct 23, 2017

Copy link
Copy Markdown
Contributor Author

Here is an example that I made to fail to check what it looks like:

  • ['altid', '⠁⠞⠙', {inPos: [0,1,2], outPos: [0,1,2,3,4]}]

This produces the following result:

Inpos failure:
Expected 1, received 0 in index 1
Expected 2, received 0 in index 2
lou_checkyaml:da-dk-g28.yaml:262: Failure
Outpos failure:
Expected 1, received 0 in index 1
Expected 2, received 0 in index 2
Expected 3, received 0 in index 3
Expected 4, received 0 in index 4
lou_checkyaml:da-dk-g28.yaml:262: Failure

Note that "altid" (always) is a word contraction, so it is atomic. All positions should be 0.

@bertfrees

Copy link
Copy Markdown
Member

I'd just merge, we can fix things afterwards if needed.

@egli

egli commented Oct 24, 2017

Copy link
Copy Markdown
Member

@bertfrees are you talking to me?

@bertfrees

Copy link
Copy Markdown
Member

@egli you're the only one here.

@BueVest

BueVest commented Oct 24, 2017 via email

Copy link
Copy Markdown
Contributor Author

@bertfrees

Copy link
Copy Markdown
Member

OK.

@BueVest

BueVest commented Oct 25, 2017

Copy link
Copy Markdown
Contributor Author

How does the code look now?

The following tests are the same as those given as examples in the doc. They can be copied into a file, e.g. inpos_outpos_simple.yaml, to illustrate how it works. However, they depend on en-us-ctb, so, perhaps they should not be merged into the project

table: [tables/unicode.dis, tables/en-us-g2.ctb]
tests:
  -
    - went
    - ⠺⠢⠞
    - inputPos: [0,1,3]
  -
    - went
    - ⠺⠢⠞
    - outputPos: [0,1,1,2]
  -
    - went
    - ⠺⠑⠝⠞
    - cursorPos: [0,1,2,3]

@bertfrees

Copy link
Copy Markdown
Member

Great. Looks very good!

The only thing I wouldn't have done are these checks:

if (val >= wrdlen)
    error_at_line(EXIT_FAILURE, 0, file_name, event.start_mark.line + 1,
        "Input position too great: %d.\n" +
        "In this test, input position cannot be greater than %d.", val, wrdlen-1);

They are not needed to make the code safe. It is true that these conditions need to be met for the test to pass, however there are more conditions (e.g. the array must be monotonically non-decreasing) and these conditions are automatically checked by the test itself.

But don't worry about this now. We can tweak things later.

@egli

egli commented Oct 26, 2017

Copy link
Copy Markdown
Member

@bertfrees well I think the test is usefull to give decent error messages.
@BueVest now the only problem is that because of that test (val >= wrdlen) existing tests fail, namely

lou_checkyaml:./yaml/en-GB-g2_harness.yaml:224: Cursor position too great: 9.
In this test, cursor position cannot be greater than 8.

lou_checkyaml:./yaml/hu-hu-g1_harness.yaml:1188: Cursor position too great: 17.
In this test, cursor position cannot be greater than 16.

Are these tests faulty or is there a problem with the (val >= wrdlen) test?

Comment thread doc/liblouis.texi
passed in using the ``flow style'' notation.
behavior as in the following example:
note that compbrlAtCursor is implicitly specified for all cursor positions. This makes this test suitable only for testing a single word, since the translation would otherwise vary according to the cursor position.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that brl_checks.c does indeed hard code the mode when invoking the cursor position tests. You seem to say that this is wrong. But should we not fix this behaviour instead of documenting it?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. But later.

@bertfrees

Copy link
Copy Markdown
Member

I think the test is useful to give decent error messages

Ok but then I'd say implement the other conditions (such as monotonically non-decreasing) too.

Anyway, let's not put any more effort in it now. This PR was meant to implement a quick solution, not a perfect one.

@BueVest

BueVest commented Oct 26, 2017 via email

Copy link
Copy Markdown
Contributor Author

@egli

egli commented Oct 26, 2017

Copy link
Copy Markdown
Member

Well, I'm happy to merge. But I need the test suite to pass. We can either

  • Remove the (val >= wrdlen) test or
  • look at the failing tests and fix them

@bertfrees

Copy link
Copy Markdown
Member

Just comment out those tests. They suck.

@bertfrees

Copy link
Copy Markdown
Member

Alternatively Bue could just base his new PR with the tests on this one. Then we have some time to fix the failing tests.

But honestly I would just wait with that until the cursorPos test feature is fixed (see #133 for what is wrong with it) and look at the original JSON tests for the expected output.

@BueVest

BueVest commented Oct 26, 2017 via email

Copy link
Copy Markdown
Contributor Author

@BueVest

BueVest commented Oct 26, 2017 via email

Copy link
Copy Markdown
Contributor Author

@BueVest

BueVest commented Oct 28, 2017 via email

Copy link
Copy Markdown
Contributor Author

@egli

egli commented Oct 28, 2017

Copy link
Copy Markdown
Member

Well, looking at the comments around those tests makes me believe that those tests do kinda suck. But nevertheless I think there is some problem with the (val >= wrdlen) test. Either the wrdlen isn't calculated properly or the something else is fishy. I hope to find some time on Monday to look at this and merge.

@BueVest

BueVest commented Nov 30, 2017 via email

Copy link
Copy Markdown
Contributor Author

BueVest and others added 16 commits December 4, 2017 13:19
…sts are commented out, because they will pass or fail at random.
…s or outputPos. Uncovers a fail in fr-bfu-g2_harness.yaml (marked with xfail).
Previously it was only being updated when inputPositions and
outputPositions are not both NULL, while in other functions it was
updated unconditionally.

Note that posMapping is not only used for computing inputPositions and
outputPositions, but also for updating typebuf after pass 0. However
for_updatePositions is not used in pass 0 which is why emphasis was
working fine.
@bertfrees bertfrees force-pushed the buevest_yaml_inpos_outpos branch from 222dd44 to c772eac Compare December 4, 2017 12:30
@bertfrees

Copy link
Copy Markdown
Member

@egli Travis doesn't build this branch anymore. It says "GitHub payload is missing a merge commit (mergeable_state: "unknown", merged: false)". What should I do?

@bertfrees

Copy link
Copy Markdown
Member

I pushed it to a new branch, now it's building.

@bertfrees

Copy link
Copy Markdown
Member

If it passes I'll merge, ok?

@egli

egli commented Dec 4, 2017

Copy link
Copy Markdown
Member

Sure feel free to merge, just be careful with merging liblouis.texi

@bertfrees

Copy link
Copy Markdown
Member

Ah yes there's one conflict. It looks like commit 420b50b was also included in #444, so I'll drop it from this PR.

@bertfrees

Copy link
Copy Markdown
Member

Merged in ac59139

@BueVest BueVest deleted the buevest_yaml_inpos_outpos branch September 1, 2025 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants