Add smart fill support for language fields#28
Shantanugupta43 merged 3 commits into Shantanugupta43:main from
Shantanugupta43
left a comment
Hey, thanks for contributing. The PR needs a few changes; after that it will be ready to merge. Good work.
if (optionEntries.length > 0) {
  const matched = optionEntries.find(option =>
    variations.some(variation => {
      const normalizedVariation = normalizeCandidateValue(variation);
      return option.normalized === normalizedVariation ||
        option.normalized.includes(normalizedVariation) ||
        normalizedVariation.includes(option.normalized);
    })
Wrong languages suggested for short locale codes (e.g. "en")
When the browser locale is very short (like "en"), the current matching logic checks if the text appears anywhere inside language names.
Because "en" appears inside many words (like Bengali, French, Slovenian), the system sometimes suggests the wrong language as the top result.
Fix
Avoid using very short locale codes (2 characters) for substring matching, or require word-level matching instead. This prevents false matches and ensures that English users actually see English as the top suggestion.
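The word-level option could be sketched roughly as follows. This is only an illustration: the `normalizeCandidateValue` body and the two-argument shape of `matchesLanguageOption` are assumptions made for the example, not the PR's actual code.

```javascript
// Hypothetical normalizer (assumption): lowercases and collapses anything
// that is not a letter or digit into single spaces.
function normalizeCandidateValue(value) {
  return String(value)
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, ' ')
    .trim();
}

// Hypothetical matcher (assumption): compares one option label against one
// candidate, which may be a locale code ("en") or a language name ("English").
function matchesLanguageOption(optionLabel, candidate) {
  const option = normalizeCandidateValue(optionLabel);
  const variation = normalizeCandidateValue(candidate);
  if (!option || !variation) return false;
  if (option === variation) return true;

  // Values of 2 characters or fewer only match as whole words,
  // so "en" cannot match inside "bengali" or "french".
  if (variation.length <= 2 || option.length <= 2) {
    return option.split(' ').includes(variation) ||
      variation.split(' ').includes(option);
  }

  // Longer values keep the original substring behavior.
  return option.includes(variation) || variation.includes(option);
}

const options = ['Bengali', 'French', 'Slovenian', 'English (EN)'];
console.log(options.filter(label => matchesLanguageOption(label, 'en')));
```

With this guard, a bare code such as en only matches options that contain it as a separate word; matching a plain "English" option would rely on the display name being passed as an additional candidate.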
Updated this in 5cd32b2. Short locale codes like en no longer use substring matching, so they won't incorrectly match language names such as Bengali, French, or Slovenian.
if (/(website|portfolio|personal[_\s-]?site|homepage|url|link)/.test(combined)) return 'website';
if (/(years[_\s]?of[_\s]?exp|experience[_\s]?years|yoe)/.test(combined)) return 'experience_years';
if (/(skill|expertise|technology|tech[_\s]?stack|languages|tools)/.test(combined)) return 'skills';
if (/(preferred[_\s-]?language|spoken[_\s-]?language|languages?)/.test(combined)) return 'languages';
Fix overly broad language field detection in _classifyField
Problem
The regex used to detect language-related fields was too broad:
/(preferred[_\s-]?language|spoken[_\s-]?language|languages?)/
The languages? part matches any field containing the word "language", including unrelated fields such as:
coding_language
query_language
body_language
language_style
primary_language
These fields are common in developer tools, CMS editors, and technical forms. Because of the broad match, they were incorrectly classified as spoken-language inputs and routed to the language-picker autofill instead of normal AI suggestions.
Fix
Restrict matching to specific spoken-language patterns and ensure "language" only matches when used as a standalone field name.
For example, the updated regex could be:
/(preferred[_\s-]?language|spoken[_\s-]?language|^languages?$|native[_\s-]?language)/
Result
- Prevents incorrect classification of technical fields like coding_language
- Keeps smart autofill focused on actual spoken-language inputs
- Aligns _classifyField logic with the more precise keyword strategy already used in content-script
- Reduces false positives in developer tools, CMS platforms, and form builders
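To make the difference concrete, the two patterns quoted above can be run side by side against a few field names. This is a small sketch; the sample field names and the assumption that `combined` holds a single field identifier are illustrative, not taken from the codebase.

```javascript
// The original, broad pattern from _classifyField.
const broadPattern = /(preferred[_\s-]?language|spoken[_\s-]?language|languages?)/;

// The narrower pattern suggested in the review.
const narrowPattern = /(preferred[_\s-]?language|spoken[_\s-]?language|^languages?$|native[_\s-]?language)/;

// Sample field names (assumption: `combined` is a single identifier).
const fieldNames = [
  'coding_language',
  'query_language',
  'primary_language',
  'language',
  'languages',
  'preferred_language',
  'native-language',
  'spoken language',
];

const results = fieldNames.map(name => ({
  name,
  broad: broadPattern.test(name),
  narrow: narrowPattern.test(name),
}));

console.table(results);
```

One caveat: the `^languages?$` anchors only help if `combined` is a single field name. If it concatenates several attributes (name, id, label), a standalone "language" field would no longer match and the anchoring would need to be word-based instead.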
Updated this in 5cd32b2 as well. Spoken-language detection is now restricted to explicit spoken-language patterns or standalone language fields, so technical fields like coding_language, query_language, and primary_language no longer get classified as spoken-language inputs.
Updated this. Short locale codes like en no longer use substring matching, and spoken-language detection is now narrowed to avoid classifying technical language fields like coding_language and primary_language incorrectly.
Will review tomorrow
Hey, great work on Issue 1; that's fully resolved.
For Issue 2, the short locale code guard in matchesLanguageOption() only protects the content-script path.
There's still a gap in form-detector.js: when Intl.DisplayNames is unavailable or returns null, _detectLanguages() falls back to the raw locale string (e.g. "en", "fr") and passes it directly as a candidate, bypassing the 2-char protection entirely.
Fix needed in _detectLanguages(): after mapping, filter out any result that's still a raw 2-char code:
.map(locale => {
  const code = locale.split('-')[0];
  const displayName = displayNames?.of(code);
  if (!displayName || /^[a-z]{2}$/i.test(displayName)) return null;
  return displayName;
})
.filter(Boolean)
Small change, but without it the form-detector.js path is still vulnerable to the same bug Issue 2 was meant to fix.
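For context, here is a self-contained sketch of how that mapping step might sit inside a function. The names `detectLanguageNames` and `createDisplayNames` are hypothetical stand-ins; the real `_detectLanguages()` in form-detector.js may be structured differently.

```javascript
// Hypothetical helper (assumption): returns an Intl.DisplayNames instance,
// or null when the API is missing or construction fails.
function createDisplayNames() {
  try {
    return new Intl.DisplayNames(['en'], { type: 'language' });
  } catch {
    return null;
  }
}

// Hypothetical standalone version of the mapping step in _detectLanguages().
// `locales` is a list such as navigator.languages; `displayNames` may be null.
function detectLanguageNames(locales, displayNames) {
  const names = locales
    .map(locale => {
      const code = locale.split('-')[0];
      const displayName = displayNames?.of(code);
      // Drop missing names and names that are still a raw 2-letter code.
      if (!displayName || /^[a-z]{2}$/i.test(displayName)) return null;
      return displayName;
    })
    .filter(Boolean);

  // Deduplicate, since "en-US" and "en-GB" both map to the same name.
  return [...new Set(names)];
}

// No Intl.DisplayNames available: nothing is suggested instead of "en"/"fr".
console.log(detectLanguageNames(['en-US', 'fr'], null));

// Using the runtime's Intl.DisplayNames, when present.
console.log(detectLanguageNames(['en-US', 'en-GB'], createDisplayNames()));
```

Returning an empty list when no display name is available means the form-detector path simply offers no language suggestion, which is safer than offering a raw code that could substring-match the wrong option.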
After this I will merge your PR.
Updated this. _detectLanguages() now drops raw 2-letter locale-code fallbacks when Intl.DisplayNames is unavailable or returns no display name, so the form-detector path no longer suggests values like en or fr.
Shantanugupta43
left a comment
LGTM thanks @terminalchai
Summary
Adds support for more README-requested form field types and improves smart fill for language fields.
Changes
Validation
node --check on:
Notes
This follows the project README contribution direction around expanding form field coverage.