Buevest german backtrans #842
|
Apparently, the Travis build fails with the following message. What could be going wrong? |
|
@BueVest I just restarted that particular test and now all is green. Might have been a fluke |
|
I think we should merge this PR now. It contains fully working back-translatable grade 0 and 1. Grade 2 will require some substantial development and a lot of work. So, I think it would be better to do that in another PR. Then we can start using g0 and g1 and gain some experience. |
|
@BueVest OK, we'll see what we can do. It is a bit late to tell us this, and there are a lot of changes to review. Tomorrow is release day. |
|
Had a quick look. Here are some comments:
|
|
You wrote:
* What's with the temporary test file?
Sorry, I thought I had already removed it. Didn’t I move it to the proper test directory?
* Why do we need separate "bd" tables? Couldn't both forward and backward translations be defined in the same tables?
BD is for “bidirectional” (feel free to rename). So, they are in fact for both forward and backward translation. Some aspects of the Braille created by the original (non-BD) tables are less suited for back-translation, e.g. the lack of capital letters, the unconditional removal of some spaces, the use of the not so detailed accented letters etc. This corresponds to the difference between the Danish 6 dots grade 1 and 2 tables and their literary counterparts. The literary tables generate beautiful Braille for reading, but it is less suited for back-translation than the Braille created by the other tables. Hope it makes sense.
* You have created quite a lot of duplication with the "bd" tables. Even if there is a need to have two sets of tables, could something be done about the duplication?
Yes, that would be nice indeed, but that would require a change of the file structure for the German tables. We could move the lines that can be shared into new files, which are then shared between the forward/literary tables and the bidirectional ones. However, if this change was to be made, I would rather make it in collaboration with the original table author, and not by myself. Currently, the chardefs file is shared, because I could make the necessary changes without affecting forward translation in the original tables.
* Why do we need the "bd" tests?
The BD tests test the bidirectional tables in both directions. The other tests test the original tables, and only in the forward direction. The files could in theory be merged, but since they test two separate sets of tables, I think they should be separate yaml files.
* Changes were made to de-chardefs6.cti, but this was not reflected in the tests.
The changes are primarily to the order of character definitions to ensure correct back-translation. They should not affect forward translation.
If there are any big problems with merging now, we can do it later. I just thought it would be a good idea to gain some broader user experience with the tables now. Especially since people are currently using the “forward only” tables for back-translation with screen readers, e.g. NVDA, and flagging back-translation problems as bugs rather than as requests for new features. Perhaps we should flag it as an error to use a forward table for back-translation? Just thinking…
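To illustrate the point about capital letters, here is a minimal sketch in plain Python. It is a toy only: the mappings, the function names and the `$` indicator are invented for illustration and have nothing to do with the real German tables or with the liblouis API. It just shows why output that drops case cannot be back-translated, while output that keeps an indicator can.

```python
# Toy illustration only: invented mappings, not liblouis tables or API.
CAPS_SIGN = "$"  # hypothetical stand-in for a capital-letter indicator


def forward_literary(text: str) -> str:
    """Drop case information, as a reading-oriented table might."""
    return text.lower()


def forward_bidi(text: str) -> str:
    """Keep case by prefixing each capital letter with the indicator."""
    return "".join(
        CAPS_SIGN + ch.lower() if ch.isupper() else ch for ch in text
    )


def backward_bidi(braille: str) -> str:
    """Invert forward_bidi by consuming the indicator again."""
    out = []
    capitalize_next = False
    for ch in braille:
        if ch == CAPS_SIGN:
            capitalize_next = True
        elif capitalize_next:
            out.append(ch.upper())
            capitalize_next = False
        else:
            out.append(ch)
    return "".join(out)


# Two different inputs collapse to the same literary output,
# so no back-translation can tell them apart.
print(forward_literary("Haus") == forward_literary("haus"))

# The variant with an indicator survives the round trip.
print(backward_bidi(forward_bidi("Haus und Hof")))
```

The sketch assumes the input never contains the `$` character itself; a real table would of course use a proper Braille indicator.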
|
Yes, I assumed it was like the Danish tables, but still: couldn't they be combined in the same table? Or would that be too confusing?
OK. I think it would be good to do this refactoring right from the start. This also gives us the opportunity to get some feedback from the original table author.
I think it might be easier to follow if the tests were grouped into the things that work the same in both versions of the tables and the things where the tables differ.
Hmm, are you sure? I thought I saw some actual differences?
Yes, it sure is nice to get user feedback as soon as possible; however, we shouldn't use time pressure as an excuse to do things in a sloppy way. If you had opened the PR a week ago, I'm sure we would have been able to fix all the issues. I do trust that you would fix the issues afterwards if we merged the PR now, but still...
Ideally, this is something that needs to be handled on the NVDA side. Adding this kind of metadata-based limitation to the library itself only works if tables can be selected only through metadata, but this is not the case at the moment. |
|
What needs to be done here? Concerning Bert's suggestions above:
Any suggestions are welcome. |
|
@BueVest What happened in Git? There are two seemingly identical branches that got merged. Now the diff in Github has become unreadable. I cleaned it up locally and also rebased onto master. If you want I can push it. |
Yes. Well, it was just a suggestion. I think it might help in understanding how the tables differ. But I don't know how feasible it actually is. It depends on how much and what kind of differences there are. And you will indeed have to run the common tests in both directions for both tables. Which brings us to one of my other questions: Do you really need the two tables? Couldn't the backward behavior of your "bidi" table be combined with the forward behavior of the original table? Is the forward part of the "bidi" table really important to have? It's just a naive question. Maybe the tables are so different that they are not compatible at all? I don't know. I need to understand better how exactly the tables differ. Let's assume for a moment the answer to the above question is yes (we do really need both directions of the new table). Couldn't the backward behaviors of the main and "bidi" tables be aligned? Some back-translation is always better than a back-translation that is not working at all, right? |
|
I think first and foremost we need a summary of things where the two tables differ. |
|
Yes, the forward part is indeed important. It creates Braille that can be back-translated, whereas the Braille from the original tables can’t be back-translated. There are some distinct differences of which I have already mentioned the marking of capital letters and the processing of accented letters.
If the caller could set a flag, which could then be checked by various lines in the tables, then we could probably merge the two sets (#ifdef someFlag/ #ifndef someflag / #endif), but I don’t see how this could currently be done.
The whole thing about the different flavours of Braille within the same Braille code and why some flavours are more back-translatable than others is all about the different way we use Braille, i.e. reading books (possibly on paper) vs. reading and writing documents with a screen reader or on a Braille note-taker. It is a discussion which is probably appropriate to most languages, except where they have deliberately changed the Braille code to make back-translation easier, e.g. UEB.
If you are interested, I will be happy to try to explain it to you in more detail, but it will probably be easier over Skype than through a PR.
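To make the flag idea a bit more concrete, here is a rough sketch in Python of how flag-guarded lines could be resolved before a table is compiled. Everything in it is hypothetical: the `#ifdef` / `#ifndef` / `#endif` markers, the flag name `backTranslatable` and the placeholder rule names are assumptions for illustration only. As said above, I don't see how this could currently be done in the tables themselves.

```python
# Hypothetical sketch: flag-guarded table lines resolved by a small filter.
# The markers and the flag name are assumptions, not an existing feature.

def select_rules(lines, flags):
    """Return the lines that are active for the given set of flags."""
    active = []
    stack = []  # one boolean per currently open #ifdef/#ifndef block
    for raw in lines:
        line = raw.strip()
        if line.startswith("#ifdef "):
            stack.append(line.split()[1] in flags)
        elif line.startswith("#ifndef "):
            stack.append(line.split()[1] not in flags)
        elif line == "#endif":
            stack.pop()
        elif line and all(stack):
            # Keep the line only if every enclosing condition holds.
            active.append(line)
    return active


# Placeholder rule names, not real table opcodes.
merged_table = [
    "shared-rule",
    "#ifdef backTranslatable",
    "rule-keeping-capitals",
    "#endif",
    "#ifndef backTranslatable",
    "rule-dropping-capitals",
    "#endif",
]

print(select_rules(merged_table, {"backTranslatable"}))
print(select_rules(merged_table, set()))
```

With the flag set, only the shared rule and the capital-keeping rule remain; without it, the shared rule and the capital-dropping rule remain. The caller would then pick the flavour of Braille by setting the flag instead of by choosing a different table file.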
|
|
So what about the second part of my question: Could the backward behavior of the main table be the same as that from the bidi table? Maybe I just don't know enough details, but from my vague understanding of it it sounds like both variants of the braille are not incompatible. One just contains more information than the other (like information of capitals) that make the back-translation better. If you want to do a Skype call to discuss this PR, that is fine for me. However I think we need some written explanation of the table anyway. It's interesting for me, but I'm not the only person who needs to know. It's also and primarily the original author of the table (Christian's colleague), and other German braille people, that need to understand. |
|
Regarding the tests, I gave it another thought: for dictionary tests it doesn't matter that there are a lot of files with possibly duplication. They are not meant as documentation. But it would be nice to have a YAML file that explains the differences between the main German braille code and the variant optimized for back-translation. |
|
So what about the second part of my question: Could the backward behavior of the main table be the same as that from the bidi table?
Maybe I just don't know enough details, but from my vague understanding of it it sounds like both variants of the braille are not incompatible. One just contains more information than the other (like information of capitals) that make the back-translation better.
Yes, you are right. A great part of the work is simply making the tables produce Braille which contains as much information as possible while still following the rules.
I am not quite sure, but the original tables could probably be made to perform the same back-translation as the BD tables. In fact, they would probably perform better now than before anyway, because I have changed the order of character definitions, which are common to the two sets of tables.
On the other hand, as far as I remember, the original tables have some weird work-around to get rid of unwanted capsigns. That might have to be re-worked quite a bit for proper back-translation within the same table.
I could describe the differences in forward translation between the two table sets within the bd tables themselves, provided that back-translation is simply not defined for the original tables. However, describing the whole philosophy of why you would want two sets of tables? That is somewhat harder to describe in a few words. I guess it is something which is perhaps obvious to Braille users, but perhaps not so much to many others. That is why I suggested a Skype call. If I am to describe it, I need to understand what it is that is so difficult to understand about it, so to speak.
Also, such an explanation should probably not be hidden away in a specific table, but rather be in the manual in the section about back-translation. It could be relevant for work on back-translation in any language that has advanced grade 2 or grade 3, and where the Braille code is not specifically made for automatic back-translation.
Hope it makes sense.
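As a small illustration of why the order of character definitions matters for back-translation, here is a toy sketch in Python. It assumes a "first definition wins" rule when several characters share one dot pattern; that rule, the characters and the dot patterns are assumptions made for the example and are not taken from the German chardefs file.

```python
# Toy sketch: the order of definitions decides which character a shared
# dot pattern maps back to. "First definition wins" is an assumption here.

def build_reverse_map(definitions):
    """Map each dot pattern to the first character defined for it."""
    reverse = {}
    for char, dots in definitions:
        reverse.setdefault(dots, char)  # later duplicates are ignored
    return reverse


# Invented example: two characters sharing one invented dot pattern.
accent_first = [("é", "15"), ("e", "15")]
plain_first = [("e", "15"), ("é", "15")]

print(build_reverse_map(accent_first)["15"])
print(build_reverse_map(plain_first)["15"])
```

Forward translation is the same for both orderings, since each character still maps to the same dots. Only the backward lookup changes, which matches the idea that reordering can help back-translation without affecting forward translation.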
|
Yes! That's exactly what I'm after.
It is not so difficult to understand at all. I'm just asking a lot of questions (sometimes deliberately naive) to make sure that we are doing it the best possible way. For example, I think it is a valid question whether we should try to handle both variants of the braille code within the main table (or both tables) when back-translating. If it is indeed doable, it would be a major improvement. You wouldn't need to figure out which table to select; you could just pick one and it would work. Anyway, I'm not asking you to describe the whole philosophy of the two sets of tables. I think it would indeed be a valuable addition, and useful for developing other tables, but for now all I'm asking for is a proper description of the behavior. |
|
By the way can I push the cleaned up branch that I have locally or not? I want to be able to look at the combined diff on Github, that is currently not possible. |
|
Yes, of course you can.
|
and make it clear in the comments that the table is unofficial and experimental.
which makes more sense as an abbreviation of "bidirectional".
bf4e0ed to
aa108a5
Still preliminary work. G0 is about ready for testing in a wider context. G1 should be relatively easy to add.