Skip to content

Locale & timezone faithfulness (non-English users + correct local time) #486

@jayzalowitz

Description

@jayzalowitz

Locale & timezone faithfulness: serve non-English users and correct local time

Context

The inbox-intelligence extractors lean on English by default — keyword classifiers,
commitment regexes ("I'll", "I will"), and an English-centric date library. A user who
writes and receives mail in another language falls through to near-empty results: no
commitments found, no deadlines, no security flags, a generic briefing. Separately,
relative deadlines ("by Friday", "end of day") can't resolve correctly because there
is no per-user timezone. This spec makes the pipeline faithful to who the user is:
their language and their clock. It unblocks the quality ceiling of specs 02/03/06 for
the non-English and non-UTC population.

Current State

Verified 2026-06-06.

  • packages/policy-prompts/prompts/briefing-prose/v1.md:30 — the prompt accepts
    {{language}}, but apps/worker/src/jobs/briefing-generator.ts:346 passes it
    hardcoded 'en', never from a user profile.
  • packages/db/src/schemas/schema.sql (users table, ~lines 8-16) has no
    language, locale, or timezone column
    (verified: grep for those names returns
    nothing on the users table).
  • packages/decision-engine/src/situation-interpreter.ts:107-192classifySituation
    matches English-only keywords (meeting, invoice, payment, deadline, etc.).
    No locale awareness anywhere in signal classification.
  • Planned specs lean English: spec 02's rule fallback regex, spec 06's security
    markers, and spec 03's suggested chrono-node are all English-first. The LLM
    strategies in those specs can handle other languages, but the deterministic
    fallbacks cannot — and nothing routes non-English users to the LLM path on purpose.

Proposed Change

  1. Add language and timezone to the user profile, populated from the connector
    identity where available (Google profiles expose locale and the calendar carries
    a timezone), with a safe default and a settings override.
  2. Wire the briefing language from the profile instead of the hardcoded 'en'.
  3. Make extraction locale-aware by routing, not by translating every rule. The
    honest, low-cost design: the LLM strategy is the multilingual path; the English
    regex/keyword fallbacks are explicitly one locale's fallback. When the content
    language is non-English AND only the deterministic fallback is available, the
    extractor MUST log that it is running degraded (English rules on non-English text)
    rather than silently returning empty — so coverage gaps are visible (feeds spec 13).
  4. Timezone feeds spec 03 so relative deadlines resolve against the user's clock,
    not UTC.

Synthetic illustration (no real data): a commitment "te lo envío mañana" or
"明日送ります" should be caught by the LLM commitment strategy; the English regex
fallback would miss it, and in that case the extractor records a degraded:locale
note instead of a false "no commitments."

Implementation Details

  1. Migration — add language STRING and timezone STRING to the users table
    (nullable; backfill null). A new packages/db/src/migrations/0NN-user-locale-tz.sql.
  2. Population — on connector auth / profile sync, capture Google locale
    language and the primary calendar's timezone → timezone. Fallbacks: language
    defaults to 'en', timezone defaults to UTC with a logged warning (never
    guess silently). A settings UI override is a thin add (out of scope to design here;
    the column + API field is enough).
  3. Briefing wiringbriefing-generator.ts:346 reads profile.language ?? 'en'.
  4. Locale-aware extraction contract — extend the extractor interface (specs
    02/03/06, all consuming spec 07's SignalText) with locale derived from
    SignalText (detected content language or the user's language). Strategy:
    • LLM strategy available → it handles the locale natively (it already does);
      pass locale so the prompt can be locale-appropriate.
    • Only deterministic fallback available AND locale ≠ supported-rule-locale
      run the fallback but emit a degraded: 'locale' marker on the result and log
      it. Do NOT silently return [] as if the content had no commitments/deadlines.
  5. Detection — use a lightweight language hint (the user's language, or a small
    script/charset heuristic on the body) to decide the route. Do not add a heavy
    language-detection dependency for v1; the user-profile language is the primary
    signal.
  6. Timezone to spec 03deadline-extractor reads timezone from the profile
    (replacing its UTC fallback) for end-of-day / weekday resolution.

Acceptance Criteria

  1. The users table has language and timezone columns; a new user populated from a
    non-English Google profile gets the right language and a non-UTC timezone.
  2. The briefing's {{language}} reflects the user's profile, not hardcoded 'en'
    (verified: a non-English profile yields a non-en value passed to the prompt).
  3. A non-English commitment/deadline/security fixture is correctly extracted via the
    LLM strategy (locale passed through).
  4. When only the deterministic fallback runs on non-English content, the result
    carries a degraded: 'locale' marker and a log line — it does NOT silently return
    empty.
  5. Relative deadlines resolve in the user's timezone (a "by Friday" fixture resolves
    differently for a UTC+9 vs UTC-8 profile).
  6. Missing timezone falls back to UTC WITH a logged warning (never a silent guess).
  7. English users are unaffected (no regression).
  8. Tests written and passing. No degradation of existing functionality.

Testing Plan

Layer What Count
Unit Profile population from connector locale/tz; defaults + warnings +3
Unit Briefing language sourced from profile, not hardcoded +2
Unit Extractor routing: LLM for non-English; fallback emits degraded +3
Unit Timezone resolution for relative deadlines across two zones +2
Integration Non-English fixture → commitments/deadlines extracted end-to-end +2
Migration Columns added; existing rows backfill null without error +1

Rollback Plan

Columns are additive/nullable. Briefing language read falls back to 'en' if the
column is null, so reverting is a no-op for existing users. The extraction routing is
flag-guarded (LOCALE_AWARE_EXTRACTION); off → current English-first behavior. No
destructive migration.

Effort Estimate

  • Migration + profile population from connectors: ~4h
  • Briefing language wiring: ~1h
  • Locale-aware routing + degraded marker + logging: ~4h
  • Timezone wiring into spec 03: ~2h
  • Tests: ~4h

Total: ~2 days.

Files Reference

File Change
packages/db/src/migrations/0NN-user-locale-tz.sql New: language, timezone columns
apps/worker/src/jobs/briefing-generator.ts:346 Read language from profile
packages/decision-engine/src/situation-interpreter.ts:107-192 Reference (English-keyword classifier)
extractors (specs 02/03/06) Accept locale; LLM-route non-English; mark degraded on fallback
connector auth / profile sync Capture locale + timezone

Out of Scope

  • Translating every keyword/regex set into N languages (LLM-first is the strategy;
    rule sets stay English as one locale's fallback).
  • A full settings UI for language/timezone (column + API field only here).
  • Heavy ML language detection (profile language is the primary route signal).
  • RTL / locale-specific UI layout (a separate design concern).

Related

  • Unblocks non-English quality for specs 02 (commitments), 03 (deadlines, also needs
    the timezone), 06 (security markers).
  • degraded: 'locale' markers feed spec 13's coverage transparency.
  • Consumes spec 07's SignalText (the locale travels on it).

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions