Implemented inpos and outpos in lou_checkyaml.c. by BueVest · Pull Request #430 · liblouis/liblouis

BueVest · 2017-10-23T10:17:01Z

When approved, I will make multi-pass tests for inpos and outpos

… to brl_checks.c to distinguish between inpos, outpos and cursorpos.

bertfrees · 2017-10-23T11:08:10Z

-		} else if (!strcmp(option_name, "cursorPos")) {
+			*typeform = read_typeforms(parser, wordLen);
+
+			// FIXME: Comparisons should be truly case insensitive, but apparently no standard function for that.


I would personally prefer that the option names are not case insensitive at all.

There was already some inconsistency between cursorPos (capital P) and inpos and outpos in other contexts (small p). You could say that cursor position is clearly two words, while inpos and outpos are not necessarily two words. Nevertheless, it is a potential source of confusion. So I thought we should be lenient on case. Either that or stick to lower case completely.

Let's just call it "inputPos", "outputPos" and "cursorPos" for now. Hopefully we can abstract out inputPos and outputPos in the future anyway. And eventually when cursorPos is only a single number I'd like to rename it to simply "cursor" ("pos" is kinda redundant).
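To illustrate the "one spelling only" approach, here is a minimal sketch of an exact, case-sensitive option lookup. The enum and the function name lookup_option are hypothetical and are not taken from lou_checkyaml.c; only the three option names come from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option kinds for this sketch. */
typedef enum { OPT_UNKNOWN, OPT_INPUT_POS, OPT_OUTPUT_POS, OPT_CURSOR_POS } OptionKind;

/* Return the kind for an exact, case-sensitive match, or OPT_UNKNOWN. */
OptionKind lookup_option(const char *name) {
    static const struct {
        const char *name;
        OptionKind kind;
    } options[] = {
        { "inputPos", OPT_INPUT_POS },
        { "outputPos", OPT_OUTPUT_POS },
        { "cursorPos", OPT_CURSOR_POS },
    };
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(options) / sizeof(options[0]); i++) {
        if (strcmp(name, options[i].name) == 0) return options[i].kind;
    }
    return OPT_UNKNOWN;
}
```

With a single table of accepted names, a misspelled option such as "inpos" is reported as unknown instead of being silently accepted.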

bertfrees · 2017-10-23T11:12:51Z

I had a quick look and it looked alright. @egli maybe you can have a look too? Ideally I'd like to have a better syntax, but right now I don't want to get in the way of Bue who wants to actually write some tests. That is the most important thing right now.

egli · 2017-10-23T12:24:12Z

Having only looked at it very briefly I can say that it looks good. There are maybe a few things that I would have done differently and there is a buffer overrun lurking. It could probably go in and I can clean it up afterwards. But what I'm really curious about is: what does an inpos YAML test look like?

bertfrees · 2017-10-23T12:41:19Z

You'll see when Bue has made some :)

egli

Looks good generally. You should be the maintainer :-)

egli · 2017-10-23T12:28:09Z

+		if ((const char *)event.data.scalar.value == tail)
+			error_at_line(EXIT_FAILURE, 0, file_name, event.start_mark.line + 1,
+					"No digits found in inpos position '%s'. Must be a number\n",
+					event.data.scalar.value);


This is a buffer overrun waiting to happen. Never read more than you allocated. Add a condition to the while loop (i has to be smaller than len).

It was code that I took from the original read_cursorPos() function. I will have a look at it.

Ah, hehe. I guess at the time I hadn't been burned by the CVEs. Yes, looking at it, it looks like it has the same problems. Basically it should be something along the lines of

while ((parse_error = yaml_parser_parse(parser, &event)) && (event.type == YAML_SCALAR_EVENT) && (i < len)) {

in all places
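The idea behind the extra (i < len) condition can be shown without libyaml. The following is only a sketch: read_positions and its array-of-strings input are hypothetical stand-ins for the stream of scalar events, chosen so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Store at most `len` parsed integers from `tokens` into `out`.
 * Returns the number of values stored. Excess tokens are left unread
 * instead of being written past the end of `out`. */
size_t read_positions(const char *const *tokens, size_t token_count, int *out, size_t len) {
    size_t i = 0;
    assert(tokens != NULL && out != NULL);
    /* Both bounds are checked before every write:
     * the input is exhausted, or the buffer is full. */
    while (i < token_count && i < len) {
        out[i] = (int)strtol(tokens[i], NULL, 10);
        i++;
    }
    return i;
}
```

A caller can compare the return value with the number of tokens to detect that the input was longer than the buffer and report an error.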

egli · 2017-10-23T12:29:10Z

+		if ((const char *)event.data.scalar.value == tail)
+			error_at_line(EXIT_FAILURE, 0, file_name, event.start_mark.line + 1,
+					"No digits found in outpos position '%s'. Must be a number\n",
+					event.data.scalar.value);


Again: buffer overrun

egli · 2017-10-23T12:38:44Z

+					"Not a valid outpos position '%s'. Must be a number\n",
+					event.data.scalar.value);
+		if ((const char *)event.data.scalar.value == tail)
+			error_at_line(EXIT_FAILURE, 0, file_name, event.start_mark.line + 1,


I think this code path indicates that chars other than digits were found in the value, so there could be digits but there definitely are also other non-digit characters. Maybe adapt the error message.

Forget what I said here. I think the error message is good enough
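For reference, the two error cases discussed here (no digits at all versus digits followed by other characters) can be told apart with the end pointer that strtol returns. This is a sketch only; PosStatus and parse_position are hypothetical names and do not appear in the PR.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical result codes for this sketch. */
typedef enum { POS_OK, POS_NO_DIGITS, POS_TRAILING_CHARS, POS_OUT_OF_RANGE } PosStatus;

/* Parse a non-negative decimal position from `text` into `*value`. */
PosStatus parse_position(const char *text, int *value) {
    char *tail;
    long parsed;
    assert(text != NULL && value != NULL);
    errno = 0;
    parsed = strtol(text, &tail, 10);
    if (tail == text) return POS_NO_DIGITS;       /* nothing was consumed */
    if (*tail != '\0') return POS_TRAILING_CHARS; /* e.g. "12x" */
    if (errno == ERANGE || parsed < 0 || parsed > INT_MAX) return POS_OUT_OF_RANGE;
    *value = (int)parsed;
    return POS_OK;
}
```

Note that strtol skips leading whitespace and accepts a sign, so " 5" parses as 5 and "-1" is rejected here only by the explicit range check.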

egli · 2017-10-23T12:42:48Z

+			*typeform = read_typeforms(parser, wordLen);
+
+			// FIXME: Comparisons should be truly case insensitive, but apparently no standard function for that.
+		} else if (!strcmp(option_name, "inPos") || !strcmp(option_name, "inpos")) {


Not sure I like the API being so relaxed, i.e. allowing multiple ways to specify the test ("inpos" and "inPos"). Do we do that in other places too?

No I don't think so. I agree that one way should be enough.

egli · 2017-10-23T12:44:35Z

-			if (xfail != check_cursor_pos(*table, word, cursorPos)) {
-				if (description) fprintf(stderr, "%s\n", description);
-				error_at_line(0, 0, file_name, event.start_mark.line + 1,
+		if (inPos || outPos || cursorPos) {


Why is there this "if a or b or c", followed by "if a"? Couldn't this just be a series of "if a, else if b, else if c"?

Bue just adapted to the existing else if (hyphenation) below. I think there should be an error message though when cursorPos is specified in hyphenation test mode.
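One way to read this suggestion is a single dispatch step that rejects the conflicting combination before any test runs. The sketch below is hypothetical: TestKind and select_test_kind are not in the PR, and the real code branches inline rather than through a helper.

```c
#include <assert.h>

/* Hypothetical test kinds for this sketch. */
typedef enum { TEST_TRANSLATION, TEST_POSITIONS, TEST_HYPHENATION, TEST_CONFLICT } TestKind;

/* has_positions: nonzero if inPos, outPos or cursorPos was given.
 * hyphenation:   nonzero if the file is in hyphenation test mode. */
TestKind select_test_kind(int has_positions, int hyphenation) {
    if (has_positions && hyphenation) return TEST_CONFLICT; /* report an error */
    else if (has_positions) return TEST_POSITIONS;
    else if (hyphenation) return TEST_HYPHENATION;
    return TEST_TRANSLATION;
}
```

The caller would print an error message for TEST_CONFLICT instead of silently ignoring the position options.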

BueVest · 2017-10-23T14:32:39Z

Here is an example that I made to fail to check what it looks like:

['altid', '⠁⠞⠙', {inPos: [0,1,2], outPos: [0,1,2,3,4]}]

This produces the following result:

Inpos failure:
Expected 1, received 0 in index 1
Expected 2, received 0 in index 2
lou_checkyaml:da-dk-g28.yaml:262: Failure
Outpos failure:
Expected 1, received 0 in index 1
Expected 2, received 0 in index 2
Expected 3, received 0 in index 3
Expected 4, received 0 in index 4
lou_checkyaml:da-dk-g28.yaml:262: Failure

Note that "altid" (always) is a word contraction, so it is atomic. All positions should be 0.
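The shape of that report can be reproduced with a small comparison loop. This is a sketch, not the code in brl_checks.c: report_position_mismatches is a hypothetical name, and only the message format is copied from the output above.

```c
#include <assert.h>
#include <stdio.h>

/* Compare two position arrays and print one line per mismatch to stderr.
 * Returns the number of mismatching indices. */
int report_position_mismatches(const char *label, const int *expected, const int *actual, int count) {
    int failures = 0;
    assert(label != NULL && expected != NULL && actual != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        if (expected[i] != actual[i]) {
            /* Print the heading once, before the first mismatch. */
            if (failures == 0) fprintf(stderr, "%s failure:\n", label);
            fprintf(stderr, "Expected %d, received %d in index %d\n", expected[i], actual[i], i);
            failures++;
        }
    }
    return failures;
}
```

For the "altid" example, comparing the expected inPos [0,1,2] with the received [0,0,0] yields two mismatches, at indices 1 and 2.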

bertfrees · 2017-10-24T08:09:14Z

I'd just merge, we can fix things afterwards if needed.

egli · 2017-10-24T08:28:54Z

@bertfrees are you talking to me?

bertfrees · 2017-10-24T08:36:44Z

@egli you're the only one here.

BueVest · 2017-10-24T16:54:21Z

Please, wait a sec. I am editing just now. I would also like to add to the documentation.

bertfrees · 2017-10-24T17:00:56Z

OK.

…ase versions.

BueVest · 2017-10-25T15:23:58Z

How does the code look now?

The following tests are the same as those given as examples in the doc. They can be copied into a file, e.g. inpos_outpos_simple.yaml, to illustrate how it works. However, they depend on en-us-g2.ctb, so perhaps they should not be merged into the project.

table: [tables/unicode.dis, tables/en-us-g2.ctb]
tests:
  -
    - went
    - ⠺⠢⠞
    - inputPos: [0,1,3]
  -
    - went
    - ⠺⠢⠞
    - outputPos: [0,1,1,2]
  -
    - went
    - ⠺⠑⠝⠞
    - cursorPos: [0,1,2,3]
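The first two tests describe the same mapping from opposite directions: inputPos has one entry per output cell, and outputPos has one entry per input character. Assuming inputPos is non-decreasing, one can be derived from the other as sketched below. This is an illustration only, with a hypothetical function name, and is not claimed to be how liblouis computes the arrays.

```c
#include <assert.h>
#include <stddef.h>

/* inputPos[j]:  index of the input character that output cell j came from.
 * outputPos[k]: index of the output cell that input character k ended up in.
 * Assumes inputPos is non-decreasing and outlen > 0. */
void output_from_input_positions(const int *inputPos, int outlen, int *outputPos, int inlen) {
    int j = 0;
    assert(inputPos != NULL && outputPos != NULL && outlen > 0 && inlen >= 0);
    for (int k = 0; k < inlen; k++) {
        /* Advance to the last output cell whose source is at or before k. */
        while (j + 1 < outlen && inputPos[j + 1] <= k) j++;
        outputPos[k] = j;
    }
}
```

For "went" with inputPos [0,1,3], the characters "e" and "n" both land in cell 1 (the "en" contraction), which gives outputPos [0,1,1,2] as in the second test.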

bertfrees · 2017-10-25T15:34:55Z

Great. Looks very good!

The only thing I wouldn't have done are these checks:

if (val >= wrdlen)
    error_at_line(EXIT_FAILURE, 0, file_name, event.start_mark.line + 1,
        "Input position too great: %d.\n"
        "In this test, input position cannot be greater than %d.", val, wrdlen-1);

They are not needed to make the code safe. It is true that these conditions need to be met for the test to pass, however there are more conditions (e.g. the array must be monotonically non-decreasing) and these conditions are automatically checked by the test itself.

But don't worry about this now. We can tweak things later.
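If the up-front checks were kept, the range check and the monotonicity condition mentioned above could live in one helper. The sketch below uses a hypothetical name, first_invalid_position, and is not part of the PR.

```c
#include <assert.h>
#include <stddef.h>

/* Check that every entry lies in [0, limit) and that the array is
 * monotonically non-decreasing.
 * Returns the index of the first offending entry, or -1 if all are valid. */
int first_invalid_position(const int *pos, int count, int limit) {
    assert(pos != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        if (pos[i] < 0 || pos[i] >= limit) return i; /* out of range */
        if (i > 0 && pos[i] < pos[i - 1]) return i;  /* decreasing */
    }
    return -1;
}
```

Returning the offending index lets the caller name the exact entry in its error message.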

egli · 2017-10-26T06:16:03Z

@bertfrees well I think the test is useful to give decent error messages.
@BueVest now the only problem is that because of that test (val >= wrdlen) existing tests fail, namely

lou_checkyaml:./yaml/en-GB-g2_harness.yaml:224: Cursor position too great: 9.
In this test, cursor position cannot be greater than 8.

lou_checkyaml:./yaml/hu-hu-g1_harness.yaml:1188: Cursor position too great: 17.
In this test, cursor position cannot be greater than 16.

Are these tests faulty or is there a problem with the (val >= wrdlen) test?

egli · 2017-10-26T06:27:19Z

-passed in using the ``flow style'' notation.
+behavior as in the following example:
+note that compbrlAtCursor is implicitly specified for all cursor positions. This makes this test suitable only for testing a single word, since the translation would otherwise vary according to the cursor position.



I see that brl_checks.c does indeed hard code the mode when invoking the cursor position tests. You seem to say that this is wrong. But should we not fix this behaviour instead of documenting it?

I agree. But later.

bertfrees · 2017-10-26T09:26:07Z

I think the test is useful to give decent error messages

Ok but then I'd say implement the other conditions (such as monotonically non-decreasing) too.

Anyway, let's not put any more effort in it now. This PR was meant to implement a quick solution, not a perfect one.

BueVest · 2017-10-26T09:37:19Z

Me too. But I didn’t know if it was intended or unintended. I also think that cursorPos should be one int, not a list, but, as Bert says, let’s do that later. I would now like to make the tests that Bert originally asked for to show expected and actual behavior for multi-pass. Perhaps that should be in another PR.

egli · 2017-10-26T09:40:39Z

Well, I'm happy to merge. But I need the test suite to pass. We can either

Remove the (val >= wrdlen) test or
look at the failing tests and fix them

bertfrees · 2017-10-26T09:46:10Z

Just comment out those tests. They suck.

bertfrees · 2017-10-26T09:51:31Z

Alternatively Bue could just base his new PR with the tests on this one. Then we have some time to fix the failing tests.

But honestly I would just wait with that until the cursorPos test feature is fixed (see #133 for what is wrong with it) and look at the original JSON tests for the expected output.

BueVest · 2017-10-26T18:03:44Z

Excuse me. I can see why the harness test fails, but why does the inpos test fail? I don’t think I changed anything about that one.

BueVest · 2017-10-26T21:06:30Z

Sorry, I actually did modify the inpos test, but it was on another branch, which I didn’t commit, so I forgot about it.

BueVest · 2017-10-28T09:23:52Z

Just comment out those tests. They suck.

Do you mean that? I can easily do it, but I wouldn’t dare do it by myself.

egli · 2017-10-28T18:42:47Z

Well, looking at the comments around those tests makes me believe that those tests do kinda suck. But nevertheless I think there is some problem with the (val >= wrdlen) test. Either the wrdlen isn't calculated properly or something else is fishy. I hope to find some time on Monday to look at this and merge.

BueVest · 2017-11-30T14:07:45Z

Yes, now the context rules would act like the rest of the multi-pass rules, i.e. move the cursor to the first position after the inserted content. This is in accordance with the manual, and it may be fine when you are inserting something, e.g. a Braille indicator, but it fails when you do a replacement like in the tests where “-” (@36) is replaced with “,-” (@6-36) depending on the pass. In these cases, when the input cursor is on a “-”, the output cursor will be on the next letter, e.g. “o”, because “-” is being replaced with “,-”. Admittedly, I could have avoided this in the test by not replacing “-” but simply inserting “,”, but in many real-life situations, you have to do a replacement, not just an insertion. Da-dk-g28.ctb contains quite a few examples of this.

Currently, the context rules act more like the other pass 1 rules, i.e. placing the cursor on the first cell of the replacement string. It would be a possibility to do that also with the other multi-pass rules. That may be fine for replacement rules, but then, in the case of just inserting a Braille indicator, the cursor would be placed on the indicator and not on the following cell like with the normal Braille indicators supported directly by Liblouis. I hope you can see the dilemma here.

This is why I landed on a solution where the table author can specify the cursor position relative to the start of the replacement string. This may of course be some work to implement. So, until we can do that, perhaps the best solution would be to have all multi-pass rules act like the normal pass 1 rules, i.e. for each rule, let the first character/cell of the “string to be replaced” correspond with the first character/cell of the “replacement string”. If the string to be replaced is empty ([]), the following position in the input should be used, which should take care of some of the rules to insert extra indicators.

Like I wrote, this may not create a perfect result in all cases, but the logic will be consistent through all passes, and later, we can compensate by letting the table authors specify another cursor position relationship for each rule. I hope this all makes sense. Anyhow, I think we should merge, like we agreed last week. Then I will add a new test file with extensive pass 1 inpos and outpos tests to keep it separate from the multi-pass issues. What do you think?

…sts are commented out, because they will pass or fail at random.

…s or outputPos. Uncovers a fail in fr-bfu-g2_harness.yaml (marked with xfail).

see liblouis#430 (comment)

Also renamed some variables.

Previously it was only being updated when inputPositions and outputPositions are not both NULL, while in other functions it was updated unconditionally. Note that posMapping is not only used for computing inputPositions and outputPositions, but also for updating typebuf after pass 0. However for_updatePositions is not used in pass 0 which is why emphasis was working fine.

fixes liblouis#445

bertfrees · 2017-12-04T12:45:01Z

@egli Travis doesn't build this branch anymore. It says "GitHub payload is missing a merge commit (mergeable_state: "unknown", merged: false)". What should I do?

bertfrees · 2017-12-04T12:48:31Z

I pushed it to a new branch, now it's building.

bertfrees · 2017-12-04T12:48:49Z

If it passes I'll merge, ok?

egli · 2017-12-04T13:02:20Z

Sure feel free to merge, just be careful with merging liblouis.texi

bertfrees · 2017-12-04T13:13:30Z

Ah yes there's one conflict. It looks like commit 420b50b was also included in #444, so I'll drop it from this PR.

see #430 (comment)

bertfrees · 2017-12-04T13:28:38Z

Merged in ac59139

Implemented inpos and outpos in lou_checkyaml.c. Added extra messages…

aa29a37

… to brl_checks.c to distinguish between inpos, outpos and cursorpos.

bertfrees reviewed Oct 23, 2017

bertfrees requested a review from egli October 23, 2017 11:10

egli reviewed Oct 23, 2017

BueVest added 3 commits October 24, 2017 19:02

renamed inPos to inputPos and outpos to outputPos and removed lower c…

ee7d29e

…ase versions.

added error checks.

cdf20a1

changed documentation.

420b50b

egli reviewed Oct 26, 2017

Removed check for values > max position.

adf8ea6

bertfrees mentioned this pull request Nov 20, 2017

Update output positions during multi pass (forward translation only) #133

Closed

egli assigned bertfrees Nov 27, 2017

bertfrees added the 4 - Done label Nov 27, 2017

BueVest and others added 16 commits December 4, 2017 13:19

added yaml tests to reflect the current behaviour. Note: to outpos te…

965b71d

…sts are commented out, because they will pass or fail at random.

Ensured actual test of translation when specifying cursorPos, inputPo…

965bca3

…s or outputPos. Uncovers a fail in fr-bfu-g2_harness.yaml (marked with xfail).

Fix a bug: inlen was being set to the real length AFTER the correct pass

36e5a3d

see liblouis#430 (comment)

Uncommented tests that previously changed inlen.

30e3686

The mode variable was never passed to check_full.

140fb61

Fix pass1Only mode

9473d1a

Add YAML test for pass1Only mode and remove C test

05ee0c1

Updated inpos_outpos.yaml to match new requirements

550fa10

Revert expected behavior of context rule

0163dfa

Add some more tests to inpos_outpos.yaml

30e702e

Remove C test for output positions

967eb83

Refactoring: combine some variables in structs

8f412b8

Also renamed some variables.

Issue warning when using pass1Only flag

526e02a

fixes liblouis#445

Simplify code

b77de46

Compute outputPositions from inputPositions

c772eac

bertfrees force-pushed the buevest_yaml_inpos_outpos branch from 222dd44 to c772eac Compare December 4, 2017 12:30

bertfrees added a commit that referenced this pull request Dec 4, 2017

Fix a bug: inlen was being set to the real length AFTER the correct pass

77f08a5

see #430 (comment)

bertfrees closed this Dec 4, 2017

BueVest deleted the buevest_yaml_inpos_outpos branch September 1, 2025 16:52
