Missing words due to parse errors #73

@SimonTeixidor

I believe that every word following "Roller bearing" in the CIDE.R source file is missing from the resulting dictionary. See "Rut", "Ruta-baga", etc.

I ran a git bisect, and it appears that the breaking change was introduced in 3375fe6. I tried to read the changes introduced there but I haven't been able to figure the issue out yet.

There's a similar issue for words following "Stooge", such as "Sweet". Here the issue seems to be that the source data is missing a closing </p> tag for the "Stooge" entry. I guess this should be reported upstream to GCIDE, but perhaps we could make the parser more robust against things like that?
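To make the robustness idea concrete, here is a minimal sketch of one possible approach: treat the start of the next entry as an implicit end of the previous one, so a missing closing tag costs at most one malformed entry instead of everything after it. This is not the project's parser. It assumes that each entry in the source begins with `<p><ent>`, and `split_entries` is a hypothetical helper name.

```python
import re

# Assumption: every dictionary entry in the source starts with "<p><ent>".
ENTRY_START = re.compile(r"<p><ent>")


def split_entries(text):
    """Split source text into entry chunks without relying on </p>.

    Each chunk runs from one "<p><ent>" to the next one (or to the end
    of the text), so an entry with a missing closing </p> cannot swallow
    the entries that follow it.
    """
    starts = [m.start() for m in ENTRY_START.finditer(text)]
    entries = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        entries.append(text[start:end].strip())
    return entries


# "Stooge" has no closing </p>, yet "Sweet" is still recovered.
sample = "<p><ent>Stooge</ent> first definition\n<p><ent>Sweet</ent> second definition</p>"
entries = split_entries(sample)
```

Continuation paragraphs that do not start with `<ent>` would simply stay attached to the preceding entry under this scheme, which may or may not match how the real parser groups them.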

Given that I just stumbled upon some examples, I suspect that there are quite a few words missing. I wonder if we could come up with an automated way to verify that the resulting dictionary contains all words from the source files?
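A rough sketch of what such a check could look like, under the assumption that headwords in the source are wrapped in `<ent>...</ent>` and that the generated dictionary can be reduced to a plain list of headwords. The function names are hypothetical, and the project's actual output format may need its own extraction step.

```python
import re

# Assumption: headwords appear in the source as <ent>Headword</ent>.
ENT_RE = re.compile(r"<ent>(.*?)</ent>", re.DOTALL)


def source_headwords(source_text):
    """Collect every headword found in the raw source text."""
    return {word.strip() for word in ENT_RE.findall(source_text)}


def missing_headwords(source_text, dictionary_words):
    """Return the source headwords that are absent from the dictionary."""
    return sorted(source_headwords(source_text) - set(dictionary_words))


# Toy data mirroring the report: entries after "Roller bearing" were dropped.
source = (
    "<p><ent>Roller bearing</ent> ...</p>\n"
    "<p><ent>Rut</ent> ...</p>\n"
    "<p><ent>Ruta-baga</ent> ...</p>\n"
)
built = ["Roller bearing"]
missing = missing_headwords(source, built)
```

Run against each source file, a non-empty result would flag a regression like the one described above, and it could be wired into a test that fails when anything is missing.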
