Add smart fill support for language fields #28

Merged
Shantanugupta43 merged 3 commits into Shantanugupta43:main from terminalchai:feature/add-language-form-fill
Apr 4, 2026

Conversation

@terminalchai
Contributor

Summary

Adds support for more README-requested form field types and improves smart fill for language fields.

Changes

  • classify languages, pronouns, and education separately instead of folding languages into skills
  • smart-fill language fields from browser language preferences
  • match browser languages against select and datalist options when available
  • keep form-field AI responses on the smart-fill UI path

Validation

  • ran node --check on:
    • src/content/content-script.js
    • src/services/form-detector.js
    • src/services/groq-service.js

Notes

This follows the project README contribution direction around expanding form field coverage.

Owner

@Shantanugupta43 Shantanugupta43 left a comment

Hey, thanks for contributing. The PR needs a few changes; after that it will be ready to merge. Good work.

Comment on lines +166 to +173
if (optionEntries.length > 0) {
  const matched = optionEntries.find(option =>
    variations.some(variation => {
      const normalizedVariation = normalizeCandidateValue(variation);
      return option.normalized === normalizedVariation ||
        option.normalized.includes(normalizedVariation) ||
        normalizedVariation.includes(option.normalized);
    })
Owner

Wrong languages suggested for short locale codes (e.g. "en")

When the browser locale is very short (like "en"), the current matching logic checks if the text appears anywhere inside language names.

Because "en" appears inside many words (like Bengali, French, Slovenian), the system sometimes suggests the wrong language as the top result.

Fix
Avoid substring matching for very short locale codes (2 characters), or require word-level matching instead. This prevents false matches and ensures English users actually see English as the top suggestion.
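
For reference, a minimal sketch of the guard described in this comment. The helper names (`normalize`, `matchesOption`) and the 2-character threshold constant are assumptions made for illustration; they are not taken from the PR's code. Short candidates only match exactly, and longer candidates must equal a whole word of the option text. Multi-word candidates are not handled in this sketch.

```javascript
// Hypothetical helper names; not taken from the PR's source.
const SHORT_CODE_MAX_LENGTH = 2;

function normalize(value) {
  return String(value).trim().toLowerCase();
}

// Returns true when `candidate` (a locale code or language name)
// should be treated as matching the text of a form option.
function matchesOption(optionText, candidate) {
  const option = normalize(optionText);
  const wanted = normalize(candidate);
  if (!option || !wanted) return false;

  // Exact matches are always safe.
  if (option === wanted) return true;

  // Short codes such as "en" only match exactly, never as substrings.
  if (wanted.length <= SHORT_CODE_MAX_LENGTH) return false;

  // Longer candidates must equal a whole word of the option text.
  const words = option.split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  return words.includes(wanted);
}

console.log(matchesOption('Bengali', 'en'));                          // false
console.log(matchesOption('en', 'EN'));                               // true
console.log(matchesOption('English (United States)', 'English'));     // true
```

Word-level comparison is stricter than `includes()`, so an option such as "Frenchman" would not match the candidate "French"; whether that trade-off fits the extension is a judgment call.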

Contributor Author

Updated this in 5cd32b2. Short locale codes like en no longer use substring matching, so they won't incorrectly match language names such as Bengali, French, or Slovenian.

Comment thread src/services/form-detector.js Outdated
if (/(website|portfolio|personal[_\s-]?site|homepage|url|link)/.test(combined)) return 'website';
if (/(years[_\s]?of[_\s]?exp|experience[_\s]?years|yoe)/.test(combined)) return 'experience_years';
if (/(skill|expertise|technology|tech[_\s]?stack|languages|tools)/.test(combined)) return 'skills';
if (/(preferred[_\s-]?language|spoken[_\s-]?language|languages?)/.test(combined)) return 'languages';
Owner

Fix overly broad language field detection in _classifyField

Problem

The regex used to detect language-related fields was too broad:

/(preferred[_\s-]?language|spoken[_\s-]?language|languages?)/

The languages? part matches any field containing the word "language", including unrelated fields such as:

coding_language
query_language
body_language
language_style
primary_language

These fields are common in developer tools, CMS editors, and technical forms. Because of the broad match, they were incorrectly classified as spoken-language inputs and routed to the language-picker autofill instead of normal AI suggestions.

Fix

Restrict matching to specific spoken-language patterns and ensure "language" only matches when used as a standalone field name.

For example updated regex could be:

/(preferred[_\s-]?language|spoken[_\s-]?language|^languages?$|native[_\s-]?language)/

Result

  • Prevents incorrect classification of technical fields like coding_language
  • Keeps smart autofill focused on actual spoken-language inputs
  • Aligns _classifyField logic with the more precise keyword strategy already used in content-script
  • Reduces false positives in developer tools, CMS platforms, and form builders
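
A quick way to check the suggested regex against the field names listed above. This is a sketch under one assumption: the pattern is tested against a single lowercased field name. The real `_classifyField` tests a `combined` string whose composition is not shown in this thread, and if that string concatenates several attributes, the `^languages?$` branch would only match when the whole string is exactly "language" or "languages". The helper name `looksLikeSpokenLanguageField` is hypothetical.

```javascript
// The regex suggested in the review comment, unchanged.
const SPOKEN_LANGUAGE_PATTERN =
  /(preferred[_\s-]?language|spoken[_\s-]?language|^languages?$|native[_\s-]?language)/;

// Hypothetical helper: assumes the input is one field name, not a
// concatenation of name, id, label, and placeholder.
function looksLikeSpokenLanguageField(fieldName) {
  return SPOKEN_LANGUAGE_PATTERN.test(String(fieldName).trim().toLowerCase());
}

const samples = [
  'language', 'languages', 'preferred_language', 'native-language',
  'coding_language', 'query_language', 'body_language',
  'language_style', 'primary_language',
];

for (const name of samples) {
  console.log(name, looksLikeSpokenLanguageField(name));
}
```

Under that assumption, the first four samples are classified as spoken-language fields and the remaining five are not.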

Contributor Author

Updated this in 5cd32b2 as well. Spoken-language detection is now restricted to explicit spoken-language patterns or standalone language fields, so technical fields like coding_language, query_language, and primary_language no longer get classified as spoken-language inputs.

@terminalchai
Contributor Author

Updated this. Short locale codes like en no longer use substring matching, and spoken-language detection is now narrowed to avoid classifying technical language fields like coding_language and primary_language incorrectly.

@Shantanugupta43
Owner

Shantanugupta43 commented Mar 30, 2026

Will review tomorrow

Comment thread src/services/form-detector.js Outdated
Owner

Hey, great work on Issue 1; that's fully resolved.
For Issue 2, the short locale code guard in matchesLanguageOption() only protects the content-script path.

There's still a gap in form-detector.js: when Intl.DisplayNames is unavailable or returns null, _detectLanguages() falls back to the raw locale string (e.g. "en", "fr") and passes it directly as a candidate, bypassing the 2-char protection entirely.

Fix needed in _detectLanguages(): after mapping, filter out any result that's still a raw 2-char code:

.map(locale => {
  const code = locale.split('-')[0];
  const displayName = displayNames?.of(code);
  if (!displayName || /^[a-z]{2}$/i.test(displayName)) return null;
  return displayName;
})
.filter(Boolean)

Small change, but without it the form-detector.js path is still vulnerable to the same bug Issue 2 was meant to fix.

After this I will merge your PR.
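
A self-contained sketch of the mapping step from this comment, for trying it outside the extension. The function names `localesToLanguageNames` and `createDisplayNames` are hypothetical, and the real `_detectLanguages()` body is not shown in this thread. The `try`/`catch` around `of()` is an addition here, since `Intl.DisplayNames.prototype.of` throws a `RangeError` for malformed codes.

```javascript
// Hypothetical standalone version of the mapping step; the real
// _detectLanguages() implementation is not shown in this thread.
function localesToLanguageNames(locales, displayNames) {
  return locales
    .map(locale => {
      const code = locale.split('-')[0];
      let displayName;
      try {
        // `displayNames` may be null when Intl.DisplayNames is unavailable.
        displayName = displayNames?.of(code);
      } catch {
        displayName = undefined; // malformed code
      }
      // Drop missing names and raw 2-letter codes echoed back as a fallback.
      if (!displayName || /^[a-z]{2}$/i.test(displayName)) return null;
      return displayName;
    })
    .filter(Boolean);
}

// Builds an English-language name lookup, or returns null when the
// runtime has no usable Intl.DisplayNames.
function createDisplayNames() {
  if (typeof Intl === 'undefined' || typeof Intl.DisplayNames !== 'function') {
    return null;
  }
  try {
    return new Intl.DisplayNames(['en'], { type: 'language' });
  } catch {
    return null;
  }
}

console.log(localesToLanguageNames(['en-US', 'fr'], createDisplayNames()));
console.log(localesToLanguageNames(['en-US', 'fr'], null)); // []
```

With no `Intl.DisplayNames` available, the result is an empty list rather than raw codes, so downstream matching never sees "en" or "fr" as a candidate.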

Contributor Author

Updated this. _detectLanguages() now drops raw 2-letter locale-code fallbacks when Intl.DisplayNames is unavailable or returns no display name, so the form-detector path no longer suggests values like en or fr.

Owner

@Shantanugupta43 Shantanugupta43 left a comment

LGTM thanks @terminalchai

@Shantanugupta43 Shantanugupta43 merged commit 20e7ecc into Shantanugupta43:main Apr 4, 2026