deriveCodeSourceId in gstack-gbrain-sync.ts produces invalid gbrain source IDs for github.com remotes #1320

@kengwei

Summary

/sync-gbrain --full against a GitHub-hosted repo fails at the code stage because deriveCodeSourceId generates source IDs that violate gbrain's [a-z0-9-]{1,32} validator.

Command / context

  • gstack: v1.26.3.0
  • gbrain: v0.27.0 (Postgres engine, Supabase pooler)
  • OS: macOS Darwin 25.4.0
  • Repo: https://github.com/kengwei/unerase
  • Invocation: /sync-gbrain --full (Claude Code skill)

Observed

$ bun run ~/.claude/skills/gstack/bin/gstack-gbrain-sync.ts --full

[gbrain-sync] mode=full engine=unknown

gstack-gbrain-sync (full):
  ERR   code         source registration failed: gbrain sources add gstack-code-github.com-kengwei-unerase failed: Invalid source id "gstack-code-github.com-kengwei-unerase". Must be 1-32 lowercase alnum chars with optional interior hyphens (e.g. "wiki", "yc-media").
 (0.7s)
  OK    memory       ingest pass complete (29.7s)
  OK    brain-sync   curated artifacts pushed (0.2s)

  2 ok, 1 error, 0 skipped

Memory + brain-sync stages run fine. Only the code stage fails, so the source never gets registered and code indexing for the repo never happens.

Root cause

deriveCodeSourceId() in ~/.claude/skills/gstack/bin/gstack-gbrain-sync.ts (around line 170 in v1.26.3.0):

function deriveCodeSourceId(repoPath: string): string {
  const remote = canonicalizeRemote(originUrl());
  if (remote) {
    return `gstack-code-${remote.replace(/[\/\s]+/g, "-").replace(/-+/g, "-")}`;
  }
  // fallback...
}

For https://github.com/kengwei/unerase.git:

  • canonicalizeRemote() → github.com/kengwei/unerase
  • Regex /[\/\s]+/g replaces slashes and whitespace but leaves dots
  • Result: gstack-code-github.com-kengwei-unerase — 38 chars AND contains .

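gbrain's sources add validator accepts only 1-32 lowercase alphanumeric characters with optional interior hyphens (roughly [a-z0-9-]{1,32}), so this ID fails on both length and character set. This affects essentially every github.com/<org>/<repo> remote.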
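The failure can be reproduced outside gstack with a minimal sketch. The names SOURCE_ID_RE, isValidSourceId, and currentDerivation are hypothetical, and the regex is an assumption reconstructed from the error message above, not gbrain's actual validator source:

```typescript
// Assumed approximation of gbrain's rule, reconstructed from the error text:
// 1-32 lowercase alphanumerics, with hyphens allowed only in the interior.
const SOURCE_ID_RE = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/;

function isValidSourceId(id: string): boolean {
  return SOURCE_ID_RE.test(id);
}

// Mirrors the replace chain quoted from deriveCodeSourceId() above.
function currentDerivation(remote: string): string {
  return `gstack-code-${remote.replace(/[\/\s]+/g, "-").replace(/-+/g, "-")}`;
}

const failingId = currentDerivation("github.com/kengwei/unerase");
console.log(failingId, failingId.length, isValidSourceId(failingId));
// prints: gstack-code-github.com-kengwei-unerase 38 false
```

Under this assumed rule, the ID is rejected twice over: the dot is outside the allowed character set, and 38 characters exceeds the 32-character limit.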

Expected

deriveCodeSourceId should always produce IDs that satisfy gbrain's [a-z0-9-]{1,32} validator, regardless of the remote URL shape.

Suggested fix

Two viable approaches:

// Option A: include dots in the replace, then truncate
return `gstack-code-${remote.replace(/[\/.\s]+/g, "-").replace(/-+/g, "-")}`.slice(0, 32);

// Option B: strip the host, use org-repo only
const orgRepo = remote.split("/").slice(-2).join("-");  // "kengwei-unerase"
return `gstack-code-${orgRepo}`.slice(0, 32);

Option B is more readable and yields the same ID regardless of host. The cost is that gitlab.com/kengwei/unerase and github.com/kengwei/unerase would collide, but that's vanishingly rare in practice. Option A preserves the host distinction.

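In both options the bare .slice(0, 32) is only a stopgap: it can cut a name mid-word, leave a trailing hyphen (which the validator rejects, since hyphens must be interior), and map two long remotes to the same ID. Either option should end with a proper length bound or hash-truncate step to handle pathologically long org/repo names.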

The fallback branch for repos without an origin (line ~174) already does .replace(/[^a-z0-9-]+/g, "-") correctly — that pattern would work as a model for the with-remote branch too.
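One way to combine the fallback branch's character filter with a hash-truncate step is sketched below. This is an illustration, not the gstack implementation: safeCodeSourceId, fnv1a, MAX_LEN, and PREFIX are hypothetical names, the 32-character limit is taken from the error message, and FNV-1a stands in for any stable hash.

```typescript
const MAX_LEN = 32; // limit taken from gbrain's error message
const PREFIX = "gstack-code-";

// FNV-1a (32-bit) rendered as 8 hex chars; a stand-in for any stable hash.
function fnv1a(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Hypothetical replacement for the with-remote branch of deriveCodeSourceId().
function safeCodeSourceId(label: string): string {
  const slug = label
    .toLowerCase()
    .replace(/[^a-z0-9-]+/g, "-") // same filter as the no-origin fallback
    .replace(/-+/g, "-") // collapse hyphen runs
    .replace(/^-|-$/g, ""); // hyphens must be interior
  const full = `${PREFIX}${slug}`;
  if (slug.length > 0 && full.length <= MAX_LEN) return full;

  // Too long (or empty): keep a readable head and append a hash of the
  // original label so distinct long labels stay distinct.
  const hash = fnv1a(label);
  const headLen = MAX_LEN - PREFIX.length - hash.length - 1;
  const head = slug.slice(0, headLen).replace(/-$/, "");
  return head ? `${PREFIX}${head}-${hash}` : `${PREFIX}${hash}`;
}

console.log(safeCodeSourceId("kengwei/unerase"));
// prints: gstack-code-kengwei-unerase
console.log(safeCodeSourceId("github.com/kengwei/unerase").length <= MAX_LEN);
// prints: true
```

Passing org/repo only (Option B) keeps short names fully readable, while passing the full canonical remote (Option A) falls through to the hashed form once the prefix plus slug exceeds 32 characters. In both cases the result stays within the length limit and never starts or ends with a hyphen.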

Workaround

Register the source manually with a short ID:

gbrain sources add gstack-code-<repo-basename> --path "$(pwd)" --federated
gbrain sync --source gstack-code-<repo-basename>

Related

Filed alongside garrytan/gbrain#457 (gbrain's first-sync path doesn't honor --strategy code even when the source ID is valid). The two bugs stack: gstack-side ID generator fails first; if it didn't, the gbrain-side first-sync would still skip code files. End-user impact is the same in either case (no code indexing).
