
bug: .tf / .tfvars / .hcl extensions missing from CODE_EXTENSIONS — Terraform repos invisible to code sync #878

@johnybradshaw

Description


Summary

.tf, .tfvars, and .hcl files are skipped by gbrain sync --strategy code because none of the HCL extensions appear in CODE_EXTENSIONS (src/core/sync.ts:46-75). In Terraform repos, and in repos for other HCL-based infra tooling (Packer, Vault, Nomad, Consul), nearly all of the source lives in .tf / .tfvars files. As a result, gbrain currently treats these repos as almost entirely invisible.

Repro

$ find . -name '*.tf' ! -path '*/.terraform/*' | wc -l
163
$ find . -name '*.tfvars' ! -path '*/.terraform/*' | wc -l
51

$ gbrain sync --source <id> --strategy code
[gbrain phase] import.collect_files done 4ms files=6
Found 6 code files                # 0 .tf, 0 .tfvars
…
$ gbrain sources list --json | jq '.sources[] | select(.id=="<id>") | .page_count'
7                                  # 6 .json + 1 .py only — entire IaC body skipped

Root cause

src/core/sync.ts:46-75 declares the allowlist:

const CODE_EXTENSIONS = new Set<string>([
  '.ts', '.tsx', '.mts', '.cts',
  '.js', '.jsx', '.mjs', '.cjs',
  '.py', '.rb', '.go', '.rs', '.java', '.cs',
  '.cpp', '.cc', '.cxx', '.hpp', '.hxx', '.hh',
  '.c', '.h',
  '.php', '.swift', '.kt', '.kts', '.scala', '.sc',
  '.lua', '.ex', '.exs', '.elm', '.ml', '.mli',
  '.dart', '.zig', '.sol',
  '.sh', '.bash',
  '.css', '.html', '.htm',
  '.vue',
  '.json', '.yaml', '.yml', '.toml',
]);

Note that .toml and .yaml are already included as config-file extensions. .tf / .tfvars are the structural equivalent for Terraform: a declarative config language with named blocks, variables, resource references, and module calls. They have the same indexability profile as YAML/TOML and the same need for code-aware chunking.
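To make the failure mode concrete, here is a minimal TypeScript sketch of an extension fast path. It assumes the check boils down to a lowercased `extname` lookup against the set; the real isAllowedByStrategy may do more. The allowlist is abbreviated, and `isCodePath` / `partition` are hypothetical helpers, not gbrain code. With HCL absent, only the .json and .py paths survive, mirroring the repro.

```typescript
import { extname } from "node:path";

// Abbreviated copy of CODE_EXTENSIONS from src/core/sync.ts (no HCL entries, as in v0.31.3).
const CODE_EXTENSIONS = new Set<string>([
  ".ts", ".py", ".go", ".sh",
  ".json", ".yaml", ".yml", ".toml",
]);

// Hypothetical stand-in for the strategy fast path: accept a file only if
// its lowercased extension is in the allowlist.
function isCodePath(filePath: string): boolean {
  return CODE_EXTENSIONS.has(extname(filePath).toLowerCase());
}

// Split a file listing into what code sync would index and what it would skip.
function partition(paths: string[]): { indexed: string[]; skipped: string[] } {
  const indexed: string[] = [];
  const skipped: string[] = [];
  for (const p of paths) {
    (isCodePath(p) ? indexed : skipped).push(p);
  }
  return { indexed, skipped };
}

const { indexed, skipped } = partition([
  "modules/aks/main.tf",
  "envs/prod/terraform.tfvars",
  "packer/image.pkr.hcl",
  "policy/rules.json",
  "scripts/lint.py",
]);
console.log("indexed:", indexed); // only the .json and .py paths
console.log("skipped:", skipped); // every HCL file
```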

The comment at lines 35-44 says this allowlist is "kept as-is for now for isAllowedByStrategy fast-path + tests", while detectCodeLanguage is the long-term source of truth. HCL may therefore also need to be added to the extension map that detectCodeLanguage uses, depending on which check wins at runtime.
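One way to keep the two lists from drifting apart is to derive the fast-path set from the same map the language detector reads. This sketch assumes detectCodeLanguage is backed by an extension-to-language map. `LANGUAGE_BY_EXTENSION`, the `"hcl"` language id, and `detectLanguage` (standing in for detectCodeLanguage) are hypothetical names, not gbrain's.

```typescript
import { extname } from "node:path";

// Hypothetical single source of truth: extension -> language id.
// The real map behind detectCodeLanguage may use different names and ids.
const LANGUAGE_BY_EXTENSION: Record<string, string> = {
  ".ts": "typescript",
  ".py": "python",
  ".toml": "toml",
  ".yaml": "yaml",
  ".yml": "yaml",
  ".hcl": "hcl",
  ".tf": "hcl",
  ".tfvars": "hcl",
};

// Derive the fast-path allowlist from the map instead of maintaining it by hand.
const CODE_EXTENSIONS = new Set<string>(Object.keys(LANGUAGE_BY_EXTENSION));

// Stand-in for detectCodeLanguage: null means "not code".
function detectLanguage(filePath: string): string | null {
  return LANGUAGE_BY_EXTENSION[extname(filePath).toLowerCase()] ?? null;
}

// Invariant worth a unit test: a path passes the fast path
// exactly when a language is detected for it.
function consistent(filePath: string): boolean {
  const fastPath = CODE_EXTENSIONS.has(extname(filePath).toLowerCase());
  return fastPath === (detectLanguage(filePath) !== null);
}
```

With this shape, adding an extension to the map is the only change needed, and `consistent` can run over a fixture list of paths in the tests the comment mentions.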

Suggested fix

Add the HCL extensions to CODE_EXTENSIONS, next to .toml:

  '.json', '.yaml', '.yml', '.toml', '.hcl', '.tf', '.tfvars',
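Below is a sketch of the patched allowlist (abbreviated) with the HCL entries appended. It also skips `.terraform/`, which the repro's find command excludes. `terraform init` downloads providers and copies of remote modules into that directory, so once .tf is allowed those vendored module sources would be indexed too, unless gbrain's file walker already ignores it. `IGNORED_DIRS` and `isIndexable` are hypothetical; check the existing ignore logic before adding anything like them.

```typescript
import { extname } from "node:path";

const HCL_EXTENSIONS = [".hcl", ".tf", ".tfvars"] as const;

// Abbreviated allowlist with the proposed HCL entries appended after .toml.
const CODE_EXTENSIONS = new Set<string>([
  ".ts", ".py", ".go",
  ".json", ".yaml", ".yml", ".toml",
  ...HCL_EXTENSIONS,
]);

// Hypothetical: directories whose contents should never be indexed.
// .terraform/ holds providers and module copies fetched by `terraform init`.
const IGNORED_DIRS = new Set<string>([".terraform", ".git", "node_modules"]);

function isIndexable(filePath: string): boolean {
  // Check every directory component (not the file name) against the ignore list.
  const dirs = filePath.split(/[\\/]/).slice(0, -1);
  if (dirs.some((dir) => IGNORED_DIRS.has(dir))) return false;
  return CODE_EXTENSIONS.has(extname(filePath).toLowerCase());
}
```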

A tree-sitter grammar for HCL exists (tree-sitter-hcl) and covers both .tf and .hcl. If the chunker has no HCL parser yet, the config-style fallback that already handles .toml should chunk .tf / .tfvars reasonably: both are block-structured and line-oriented, which suits the recursive chunker.
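A block-oriented fallback can be approximated by splitting on balanced top-level brackets, so that each resource, module, or variable becomes one chunk. This is a naive sketch, not gbrain's recursive chunker. `chunkTopLevelBlocks` is hypothetical, and it counts brackets character by character, so it ignores brackets inside strings, comments, and heredocs (tree-sitter-hcl would handle those correctly). Leading comment lines stay attached to the block that follows them.

```typescript
// Hypothetical sketch: split HCL / Terraform source into top-level blocks.
// Assumption: brackets inside strings, comments, and heredocs are NOT
// special-cased, so unbalanced ones there will confuse the depth count.
const OPENERS = new Set(["{", "[", "("]);
const CLOSERS = new Set(["}", "]", ")"]);

function chunkTopLevelBlocks(source: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let depth = 0;

  for (const line of source.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Drop blank lines between blocks instead of emitting empty chunks.
    if (depth === 0 && current.length === 0 && trimmed === "") continue;
    current.push(line);
    for (const ch of line) {
      if (OPENERS.has(ch)) depth++;
      else if (CLOSERS.has(ch)) depth--;
    }
    const isComment = trimmed.startsWith("#") || trimmed.startsWith("//");
    // Close the chunk once brackets balance. Top-level comments and blank
    // lines do not close it, so a comment stays with the block it precedes.
    if (depth <= 0 && !isComment && trimmed !== "") {
      chunks.push(current.join("\n"));
      current = [];
      depth = 0;
    }
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}
```

In a .tfvars file, each single-line assignment becomes its own chunk; merging small adjacent chunks up to a size budget would be the natural next step.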

Why it matters

Terraform is the dominant IaC tool across the Cloudflare / AWS / Azure ecosystems, and HCL is also the config language for Packer, Vault, Nomad, Consul, and Waypoint. A typical infra repo is 80-95% .tf / .tfvars by file count (the repo below is 214 of 243 files, about 88%). Without HCL support, gbrain code-def / code-refs / code-callers return nothing for any module, resource, or variable in the repo. Those are exactly the questions an SRE or platform engineer would want to ask the brain, such as "where is azurerm_kubernetes_cluster configured?", "what calls module aks?", or "which env declares enable_auto_upgrade?"
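Answering those questions requires, at minimum, the labels each block defines. Here is a regex sketch that assumes `terraform fmt` layout, where block headers and .tfvars assignments start at column 0. `extractHclDefinitions` and the `HclDef` shape are hypothetical and are not gbrain's symbol schema. A tree-sitter-hcl parse would yield the same labels without the layout assumption.

```typescript
interface HclDef {
  kind: "resource" | "data" | "module" | "variable" | "output" | "assignment";
  name: string; // "TYPE.NAME" for resource/data, the single label otherwise
  line: number; // 1-based line of the header or assignment
}

// Block headers such as:
//   resource "azurerm_kubernetes_cluster" "aks" {
//   module "aks" {
const HEADER = /^(resource|data|module|variable|output)\s+"([^"]+)"(?:\s+"([^"]+)")?\s*\{/;
// Top-level assignments, as found in .tfvars files: enable_auto_upgrade = true
const ASSIGN = /^([A-Za-z_][\w-]*)\s*=/;

function extractHclDefinitions(source: string): HclDef[] {
  const defs: HclDef[] = [];
  source.split(/\r?\n/).forEach((text, i) => {
    const header = HEADER.exec(text);
    if (header) {
      const kind = header[1] as HclDef["kind"];
      // resource/data blocks carry two labels (type, name); the others carry one.
      const name = header[3] ? `${header[2]}.${header[3]}` : header[2];
      defs.push({ kind, name, line: i + 1 });
      return;
    }
    const assign = ASSIGN.exec(text);
    if (assign) defs.push({ kind: "assignment", name: assign[1], line: i + 1 });
  });
  return defs;
}
```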

This is the same pattern as #709 (.astro missing), just for the IaC ecosystem instead of the static-site one.

Environment

  • gbrain: v0.31.3 (commit 9c60b3a, master)
  • Bun: 1.3.11
  • Database: Akamai managed Postgres
  • Platform: Linux 6.8.0-111-generic x86_64
  • Repo: 163 .tf + 51 .tfvars + 20 .md + 9 .json — code sync indexed 7 of the 243 files
