Skip to content

Conversation

@acoulton
Copy link
Contributor

Previously, when generating snippets for undefined step definitions,
there was no guarantee that the pattern would actually match the step
text. In some cases (in particular with turnip) this proved to be an
issue when feature files contained complex or unexpected syntax:

  • The first run would apparently provide a valid snippet.
  • The second run would still not match this snippet, so would propose a
    new step with the same pattern.
  • The third run would fail due to the presence of duplicate patterns in
    the steps from the first and second run.

Instead, check that the generated snippet actually matches the step text
at the time of generating it. If not, show the user a warning with
information about either generating these as regex or manually defining
a step to match their features.

Fixes #1653

@acoulton acoulton force-pushed the bug-unsupported-pattern-generation branch from 5e97ae5 to 668abff Compare August 26, 2025 12:38
Comment on lines +9 to +11
// @todo: Add translations for other languages
'snippet_generation_failure_title' => 'Could not automatically generate snippets matching the following steps:',
'snippet_generation_failure_hint' => '(try using --snippets-type=regex, or manually define the step)',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're happy with the message in English I can generate some translations. I've made it two separate strings because these lines are rendered indented to match the style of the output above, and it feels like this indent should be the responsibility of the printer rather than the translation texts.

*
* @return false|array<int|string,string>
*/
public function matchPattern(string $pattern, string $stepText): false|array
Copy link
Contributor Author

@acoulton acoulton Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't 100% feel like this method belongs in a class called PatternTransformer but I felt it was an OK trade-off for being able to be certain that the ContextSnippetGenerator is using the exact same logic as the RepositorySearchEngine to check if the pattern matches the stepText.

PatternTransformer is a final class and used directly by RepositorySearchEngine and ContextSnippetGenerator so there's no BC issue with the new method.

$failures = $this->registry->getGenerationFailures();
count($failures) && $this->output->writeln('');

$this->printer->printSnippetGenerationFailures($failures);
Copy link
Contributor Author

@acoulton acoulton Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although there is a SnippetPrinter interface, which does not contain this new method, the SnippetsController depends directly on ConsoleSnippetPrinter so this is BC without any additional checks required.

The same is true for the new getGenerationFailures method called above - it's not in the SnippetRepository interface, but this class depends directly on ``SnippetRegistry` rather than the interface.

@carlos-granados
Copy link
Contributor

@acoulton I guess you would need to rebase and modify this PR once the PR to modify how we run the fixture files is merged. I would prefer to wait until this is done before I review this other PR, please let me know

@acoulton
Copy link
Contributor Author

Absolutely, I'll rebase and update & let you know when it's ready.

A step may contain literal punctuation that cannot be safely represented
in a turnip pattern - since there is no official strategy for escaping
the characters that turnip uses for placeholders etc.

Until now, we have generated snippets that do not actually match the
step text. This means these snippets are generated repeatedly and
ultimately then cause an exception on the next run due to the presence
of multiple snippets with the same pattern.
Previously, the RepositorySearchEngine asked the PatternTransformer
to convert each individual pattern to regex before itself calling
preg_match to attempt to match the pattern.

Instead, move the responsibility for the matching the pattern
(converting and testing it) to the PatternTransformer. This will make it
easier to reliably reuse the same logic to test that a step *actually*
matches the generated pattern when generating snippets.
Previously, when generating snippets for undefined step definitions,
there was no guarantee that the pattern would actually match the step
text. In some cases (in particular with turnip) this proved to be an
issue when feature files contained complex or unexpected syntax:

* The first run would apparently provide a valid snippet.
* The second run would still not match this snippet, so would propose a
  new step with the same pattern.
* The third run would fail due to the presence of duplicate patterns in
  the steps from the first and second run.

Instead, check that the generated snippet actually matches the step text
at the time of generating it. If not, show the user a warning with
information about either generating these as regex or manually defining
a step to match their features.
@acoulton acoulton force-pushed the bug-unsupported-pattern-generation branch from 668abff to a066b60 Compare August 29, 2025 07:38
@acoulton
Copy link
Contributor Author

@carlos-granados this is ready to review (the only outstanding thing for me is to generate translations of the messages if we're happy with the English versions)

Copy link
Contributor

@carlos-granados carlos-granados left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than my small comment, this looks all ready to me, so please generate the translations in other languages and I think we should be able to go ahead with this

Using ChatGPT with the prompts:

> Please generate translations for the snippet_generation_failure_title
  and snippet_generation_failure_hint messages in the attached i18n.php
  file in all the configured languages

> Insert them directly into the PHP file

> Can you combine them with the existing messages that were already
  defined in the file?

https://chatgpt.com/share/68c007c9-0ee0-8013-b0e8-e64601262518
@acoulton
Copy link
Contributor Author

acoulton commented Sep 9, 2025

@carlos-granados that's this updated now. The translations look plausible to me (in character sets / languages I'm vaguely familiar with) but interested to see what you make of the Spanish!

i18n.php Outdated
'snippet_proposal_use' => 'No olvides estas %count% declaraciones de use:',
'snippet_missing_title' => '<snippet_undefined>Las plantillas para los siguientes pasos en <snippet_keyword>%count%</snippet_keyword> no fueron generadas (revisa tu configuración):</snippet_undefined>',
'snippet_generation_failure_title' => 'No se pudieron generar automáticamente fragmentos que coincidan con los siguientes pasos:',
'snippet_generation_failure_hint' => '(intente usar --snippets-type=regex o defina el paso manualmente)',
Copy link
Contributor

@carlos-granados carlos-granados Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Spanish there are two "voices" for the second-person singular: "tú" which is more informal and "usted" which is more formal. In all other Spanish texts we are using the informal "tú" version but this text uses the formal "usted" version, so it should be changed to
"intenta usar --snippets-type=regex o define el paso manualmente"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - fixed now. Impressed it got that close!

I guess that's like the French tu and vous? Interestingly we appear to be using vous (-ez) forms for French - someone must have decided to be more polite / formal to francophones 😂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In French "vous" is much more used than "usted" in Spanish, in French "tu" is reserved to very close people while in Spanish "tú" is much more widely used. This was different a century ago, where in Spanish "usted" was still the most common usage but has changed a lot since the 60s-70s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's great to add to my extremely basic knowledge of Spanish! :)

Copy link
Contributor

@carlos-granados carlos-granados left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, the texts in all languages seem to be correct, I added a small comment for the Spanish version

Copy link
Contributor

@carlos-granados carlos-granados left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @acoulton everything looking great now

@acoulton acoulton merged commit f846e4c into Behat:master Sep 10, 2025
19 checks passed
@acoulton acoulton deleted the bug-unsupported-pattern-generation branch September 10, 2025 08:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Behat suggests a snippet that already exists

3 participants