Summary
.tf, .tfvars, and .hcl files are skipped by gbrain sync --strategy code because none of the HCL extensions are in CODE_EXTENSIONS at src/core/sync.ts:46-75. Terraform repos (and any other HCL-based infra tooling — Packer, Vault, Nomad, Consul) have the vast majority of their app surface in .tf / .tfvars; gbrain currently treats them as invisible.
Repro
$ find . -name '*.tf' ! -path '*/.terraform/*' | wc -l
163
$ find . -name '*.tfvars' ! -path '*/.terraform/*' | wc -l
51
$ gbrain sync --source <id> --strategy code
[gbrain phase] import.collect_files done 4ms files=6
Found 6 code files # 0 .tf, 0 .tfvars
…
$ gbrain sources list --json | jq '.sources[] | select(.id=="<id>") | .page_count'
7 # 6 .json + 1 .py only — entire IaC body skipped
Root cause
src/core/sync.ts:46-75 declares the allowlist:
const CODE_EXTENSIONS = new Set<string>([
  '.ts', '.tsx', '.mts', '.cts',
  '.js', '.jsx', '.mjs', '.cjs',
  '.py', '.rb', '.go', '.rs', '.java', '.cs',
  '.cpp', '.cc', '.cxx', '.hpp', '.hxx', '.hh',
  '.c', '.h',
  '.php', '.swift', '.kt', '.kts', '.scala', '.sc',
  '.lua', '.ex', '.exs', '.elm', '.ml', '.mli',
  '.dart', '.zig', '.sol',
  '.sh', '.bash',
  '.css', '.html', '.htm',
  '.vue',
  '.json', '.yaml', '.yml', '.toml',
]);
Note that .toml and .yaml are included as config-file extensions. .tf / .tfvars are the structural equivalent for Terraform: a declarative config language with named blocks, variables, resource references, and module imports, with the same indexability profile as YAML/TOML and the same need for code-aware chunking.
The comment at lines 35-44 says this allowlist is "kept as-is for now for isAllowedByStrategy fast-path + tests" while detectCodeLanguage is the long-term source of truth, so HCL may also need to be added to whatever map detectCodeLanguage uses, depending on which one wins at runtime.
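Since two lookups may both need the new extensions, a small drift check can show where they disagree. This is only a sketch: findAllowlistDrift and the shape of the language map are hypothetical, because the issue does not show what detectCodeLanguage actually reads from.

```typescript
// Hypothetical result shape: extensions present in one lookup but not the other.
interface AllowlistDrift {
  missingFromLanguageMap: string[];
  missingFromAllowlist: string[];
}

// Compares an extension allowlist against an extension -> language map.
// Both arguments are assumptions about how the two sources of truth might
// be represented; neither is gbrain's real data structure.
function findAllowlistDrift(
  allowlist: Set<string>,
  languageByExtension: Map<string, string>,
): AllowlistDrift {
  const missingFromLanguageMap = Array.from(allowlist)
    .filter((ext) => !languageByExtension.has(ext))
    .sort();
  const missingFromAllowlist = Array.from(languageByExtension.keys())
    .filter((ext) => !allowlist.has(ext))
    .sort();
  return { missingFromLanguageMap, missingFromAllowlist };
}
```

If a check like this ran in the test suite, adding '.tf' to only one of the two places would fail loudly instead of silently skipping files.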
Suggested fix
Add HCL extensions to CODE_EXTENSIONS:
'.toml', '.hcl', '.tf', '.tfvars',
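To illustrate the effect of that one-line change, here is a minimal sketch that filters a list of paths against an allowlist before and after adding the HCL extensions. The helper names (extensionOf, filterCodeFiles) and the trimmed allowlist are made up for the example; the real check lives behind isAllowedByStrategy, whose signature is not shown in this issue.

```typescript
// Trimmed subset of the existing allowlist, for illustration only.
const CURRENT_EXTENSIONS: string[] = ['.ts', '.py', '.json', '.yaml', '.yml', '.toml'];
// Extensions proposed in this issue.
const HCL_EXTENSIONS: string[] = ['.hcl', '.tf', '.tfvars'];

// Hypothetical helper: lowercased final extension including the dot,
// or '' when the file name has none (for example 'Makefile' or '.gitignore').
function extensionOf(filePath: string): string {
  const base = filePath.slice(filePath.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  return dot > 0 ? base.slice(dot).toLowerCase() : '';
}

// Hypothetical stand-in for the allowlist fast path: keeps only the paths
// whose extension is in the given set.
function filterCodeFiles(paths: string[], allowlist: Set<string>): string[] {
  return paths.filter((p) => allowlist.has(extensionOf(p)));
}
```

Because only the final extension is matched, multi-part names such as prod.auto.tfvars or build.pkr.hcl are picked up by the same three entries.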
A first-party tree-sitter grammar exists (tree-sitter-hcl) and covers both .tf and .hcl. If the chunker has no HCL parser yet, the existing config-style fallback that handles .toml should chunk .tf / .tfvars reasonably — they're block-structured and line-oriented, well-suited to the recursive chunker.
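To make the "block-structured" point concrete, the sketch below splits HCL text into top-level blocks by counting braces per line. It is a deliberately naive illustration, not gbrain's recursive chunker and not a substitute for tree-sitter-hcl: the names HclChunk and chunkTopLevelBlocks are hypothetical, and unbalanced braces inside strings, comments, or heredocs would confuse it. Flat key = value lines, which make up most .tfvars files, fall outside any block and are ignored here.

```typescript
// Hypothetical chunk shape: one entry per top-level HCL block.
interface HclChunk {
  header: string;    // e.g. 'resource "azurerm_kubernetes_cluster" "aks"'
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
  text: string;
}

// Naive brace-depth splitter. A block starts on the first line that opens a
// brace at depth 0 and ends on the line where the depth returns to 0.
function chunkTopLevelBlocks(source: string): HclChunk[] {
  const lines = source.split('\n');
  const chunks: HclChunk[] = [];
  let depth = 0;
  let start = -1;
  for (let i = 0; i < lines.length; i++) {
    const opens = (lines[i].match(/\{/g) || []).length;
    const closes = (lines[i].match(/\}/g) || []).length;
    if (start === -1 && depth === 0 && opens > 0) {
      start = i;
    }
    depth += opens - closes;
    if (start !== -1 && depth <= 0) {
      chunks.push({
        header: lines[start].replace(/\s*\{.*$/, '').trim(),
        startLine: start + 1,
        endLine: i + 1,
        text: lines.slice(start, i + 1).join('\n'),
      });
      start = -1;
      depth = 0;
    }
  }
  return chunks;
}
```

Even this crude split yields headers such as module "aks", which is the kind of anchor the code-def style queries would need; a real parser would additionally expose attributes and references inside each block.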
Why it matters
Terraform is the dominant IaC tool in the Cloudflare / AWS / Azure ecosystem, and HCL is also the config language for Packer, Vault, Nomad, Consul, and Waypoint. A typical infra repo is 80-95% .tf / .tfvars by file count. Without HCL support, gbrain code-def / code-refs / code-callers return empty for every module, resource, and variable in the repo — exactly the questions an SRE / platform engineer would want to ask the brain ("where is azurerm_kubernetes_cluster configured?", "what calls module aks?", "which env declares enable_auto_upgrade?").
This is the same pattern as #709 (.astro missing), just for the IaC ecosystem instead of the static-site one.
Environment
- gbrain: v0.31.3 (commit 9c60b3a, master)
- Bun: 1.3.11
- Database: Akamai managed Postgres
- Platform: Linux 6.8.0-111-generic x86_64
- Repo: 163 .tf + 51 .tfvars + 20 .md + 9 .json; code sync indexed 7 of the 243 files
Related
- #709 (.astro missing): exact same shape, different ecosystem. Filing separately per the maintainer's labeling cadence; this one is small enough to be a good-first-issue candidate.
- (… --strategy code ignored): broader code-indexing umbrella.