
feat(linux-do): refactor adapters with unified feed, tags, user commands#434

Merged
jackwener merged 5 commits into jackwener:main from tiaot33:feat/linux-do-adapters
Mar 25, 2026

Conversation

@tiaot33
Contributor

@tiaot33 tiaot33 commented Mar 25, 2026

  • Replace hot/latest/category with unified feed command (tag/category/view routing)
  • Add tags, user-topics, user-posts commands
  • Add static data files for categories and tags lookup
  • Fix error handling: use CliError subclasses instead of raw Error
  • Fix Discourse API field mapping in search (tags, created)
  • Add strategy: cookie to all YAML adapters
  • Update docs and README command listings
  • Update E2E tests for new command signatures

Description

Refactor all linux-do (Discourse) adapters: consolidate commands, fix API issues, and add new capabilities.

Related issue:

Type of Change

  • ✨ New feature
  • 🐛 Bug fix
  • 📝 Documentation

Changes

New commands:

  • feed: unified topic listing replacing hot/latest/category (supports --view, --tag, --category filtering)
  • tags: list popular tags
  • user-topics: topics created by a user
  • user-posts: replies posted by a user

Bug fixes:

  • Fix tag URL path (/tag/{id}-tag/{id} instead of incorrect /tags/{id}-slug/{id})
  • Fix search.yaml referencing non-existent Discourse search API fields (views, likes)
  • Fix search.yaml tags showing [object Object] (Discourse returns tag objects, not strings)
  • Replace raw throw new Error with ArgumentError / CommandExecutionError / AuthRequiredError
  • Add missing strategy: cookie to all 6 YAML adapters
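
The `[object Object]` symptom in the search fix above comes from stringifying tag objects directly. A minimal sketch of the kind of mapping that avoids it follows; the `TagLike` shape and the `formatTags` helper are hypothetical, and the assumption that each tag object carries a `name` (or `slug`) field is not confirmed by this PR.

```typescript
// Hypothetical shape: assumes each tag object exposes `name` and/or `slug`.
type TagLike = string | { name?: string; slug?: string };

// Turn a mixed list of tag strings and tag objects into a display string.
function formatTags(tags: TagLike[] | undefined): string {
  return (tags ?? [])
    .map((tag) => (typeof tag === "string" ? tag : tag.name ?? tag.slug ?? ""))
    .filter((label) => label.length > 0)
    .join(", ");
}

// Naive stringification is what produces the broken output.
const broken = String({ name: "ChatGPT" }); // "[object Object]"
const fixed = formatTags([{ name: "ChatGPT" }, "ai"]); // "ChatGPT, ai"
```

Accepting plain strings as well keeps the helper usable if some endpoints still return tag names directly.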

Static data:

  • categories.data.ts — 50 categories with id/name/slug/parentCategoryId
  • tags.data.ts — full tag dataset for offline name/slug/id resolution

Docs & tests:

  • Rewrite docs/adapters/browser/linux-do.md with complete usage examples
  • Update command listings in README.md, README.zh-CN.md, docs/adapters/index.md
  • Update E2E tests in browser-auth.test.ts (7 tests covering all 7 commands)

Checklist

  • I ran the checks relevant to this PR
  • I updated tests or docs if needed
  • I included output or screenshots when useful

Documentation (if adding/modifying an adapter)

  • Added doc page under docs/adapters/ (if new adapter)
  • Updated docs/adapters/index.md table (if new adapter)
  • Updated sidebar in docs/.vitepress/config.mts (if new adapter)
  • Updated README.md / README.zh-CN.md when command discoverability changed
  • Used positional args for the command's primary subject unless a named flag is clearly better
  • Normalized expected adapter failures to CliError subclasses instead of raw Error

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Astro-Han
Contributor

Thanks for the effort consolidating the linux-do adapters! Found a few issues during review:

Critical

1. Tag URL construction is broken

feed.ts builds tag URLs as ${id}-tag/${id} (e.g. /tag/3-tag/3.json), which returns 403. Discourse tag routes are slug-based: /tag/<slug>.json. Verified: /tag/chatgpt.json → 200, /tag/3-tag/3.json → 403.

2. user-posts.yaml: filter param misused

The filter parameter in Discourse user_actions.json is an action type enum (e.g. 5 = replies), not a limit. The current mapping passes the limit value as filter, which returns wrong data or empty results.
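
One way to keep the two concerns apart is sketched below: `filter` always carries the action type, and the user's limit is applied to the returned items. The helper names are hypothetical, the value 5 is taken from the point above, and applying the limit client-side is an assumption made to avoid guessing at server-side paging parameters.

```typescript
// Assumption taken from the review point above: action type 5 means "replies".
const REPLY_ACTION_TYPE = 5;

// Build the request path; `filter` carries the action type, never the page size.
function userRepliesPath(username: string): string {
  const query = `username=${encodeURIComponent(username)}&filter=${REPLY_ACTION_TYPE}`;
  return `/user_actions.json?${query}`;
}

// Apply the user's --limit on the client side after the response arrives.
function applyLimit<T>(items: T[], limit: number): T[] {
  return items.slice(0, Math.max(0, Math.floor(limit)));
}
```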

3. user-posts.yaml: strip() is undefined

strip() is not a JavaScript built-in. This will throw ReferenceError at runtime.
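
If the intent was whitespace stripping (an assumption, since the YAML expression is not quoted here), the JavaScript counterpart of Python's or Ruby's `strip()` is `String.prototype.trim()`. A small sketch with a hypothetical `cleanExcerpt` helper:

```typescript
// Collapse internal whitespace runs and trim both ends.
// Null or undefined input is treated as an empty string.
function cleanExcerpt(raw: string | null | undefined): string {
  return (raw ?? "").replace(/\s+/g, " ").trim();
}

const excerpt = cleanExcerpt("  hello \n world "); // "hello world"
```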

Warnings

4. "Replies" count semantic regression

Old adapters used (posts_count || 1) - 1 to show reply count (excluding OP). New code returns raw posts_count, silently inflating the number by 1.
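
For reference, the old expression can be kept as a tiny helper so the "Replies" column keeps its meaning. The function name is hypothetical; the arithmetic is the one quoted above.

```typescript
// Reply count excludes the opening post; missing or zero counts yield 0.
function replyCount(postsCount: number | null | undefined): number {
  return (postsCount || 1) - 1;
}

const withReplies = replyCount(12); // 11
const onlyOp = replyCount(1); // 0
const missing = replyCount(undefined); // 0
```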

5. 4,300 lines of hardcoded static data is unnecessary

tags.data.ts (3,958 lines) and categories.data.ts (365 lines) account for 87% of this PR. This data will go stale, and it contradicts the live API calls in categories.yaml and tags.yaml. A user can discover a new tag via linux-do tags, but feed --tag will reject it because it's not in the local snapshot.

The right approach is to resolve tag/category slugs at runtime via the Discourse API. This eliminates both data files and simplifies feed.ts significantly.
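
A rough sketch of what runtime resolution could look like, assuming the tag list has already been fetched from the live API. The `TagInfo` shape and both helper names are hypothetical; only the `/tag/<slug>.json` route comes from point 1 above.

```typescript
// Hypothetical shape for a tag entry obtained from the live tag listing.
interface TagInfo {
  id: number;
  name: string;
  slug: string;
}

// Match user input against name, slug, or id (case-insensitive); null if unknown.
function resolveTagSlug(input: string, tags: TagInfo[]): string | null {
  const needle = input.trim().toLowerCase();
  const hit = tags.find(
    (tag) =>
      tag.name.toLowerCase() === needle ||
      tag.slug.toLowerCase() === needle ||
      String(tag.id) === needle,
  );
  return hit ? hit.slug : null;
}

// Slug-based topic listing route for a tag.
function tagFeedPath(slug: string): string {
  return `/tag/${encodeURIComponent(slug)}.json`;
}
```

Because the list is fetched at call time, a tag discovered through linux-do tags would resolve without any local snapshot.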

6. Breaking changes without migration path

hot, latest, category are deleted with no aliases or deprecation notice. Existing scripts will break silently.
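
A migration path could be as small as an alias table that rewrites the old commands to their feed equivalents and flags them as deprecated. This is only a sketch: the table and the `rewriteLegacyCommand` helper are hypothetical and do not reflect how opencli actually dispatches commands.

```typescript
// Hypothetical alias table: legacy command name -> equivalent feed invocation.
const LEGACY_ALIASES = new Map<string, (args: string[]) => string[]>([
  ["hot", (args) => ["feed", "--view", "hot", ...args]],
  ["latest", (args) => ["feed", "--view", "latest", ...args]],
  ["category", (args) => ["feed", "--category", ...args]],
]);

// Rewrite a legacy invocation; `deprecated` tells the caller to print a notice.
function rewriteLegacyCommand(argv: string[]): { argv: string[]; deprecated: boolean } {
  const [command, ...rest] = argv;
  const alias = command === undefined ? undefined : LEGACY_ALIASES.get(command);
  if (alias === undefined) {
    return { argv, deprecated: false };
  }
  return { argv: alias(rest), deprecated: true };
}
```

With this in place, `category 开发调优` would be forwarded as `feed --category 开发调优`, and existing scripts would see a deprecation notice instead of a failure.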

@Astro-Han
Contributor

One more design concern:

The refactor adds complexity without clear benefit

The original hot, latest, category commands are each ~40-50 lines of self-contained YAML — about 150 lines total. They're intuitive and require zero documentation:

opencli linux-do hot
opencli linux-do latest
opencli linux-do category 开发调优

The refactored version replaces them with:

opencli linux-do feed --view hot
opencli linux-do feed --view latest
opencli linux-do feed --category 开发调优 --tag ChatGPT

This trades three obvious commands for one overloaded command that requires users to learn --view, --tag, --category flags. The word "feed" itself is ambiguous — hot and latest are immediately clear.

The only justification for merging would be significant shared logic, but the original YAMLs had almost no duplication. After merging, feed.ts alone is 270 lines + 4,300 lines of static data — a net increase in complexity by an order of magnitude.

@tiaot33
Contributor Author

tiaot33 commented Mar 25, 2026

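  1. The tag URL works correctly on my end.
(two screenshots attached)
  5. The tag list and the category list rarely change. Without static data, every category-based fetch would first have to query the categories and then pass both the category slug and the category id, which makes the command harder for AI agents to call.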

@Astro-Han
Contributor

Astro-Han commented Mar 25, 2026

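Thanks for the reply! A few follow-ups:

On the tag URL: the standard Discourse tag route is /tag/<slug>.json (slug-based). The ${id}-tag/${id} form may work when logged in, but can you confirm it is also reachable when logged out?

On the static data: "tags rarely change" contradicts the fact that this PR itself ships a tags.yaml command that queries them live. If tags do not change, why is a live listing command needed? If they do change (linux.do users can create new tags), the static snapshot will reject valid new tags.

For AI callers, one runtime API call that resolves a tag name to its slug is simpler and more reliable than maintaining a 4,000-line static file.

On overall size: most existing adapters in the repo are under 1,000 lines of code (apart from popular sites such as twitter / xiaohongshu / douyin / boss / youtube), while this PR adds nearly 5,000 lines for a single site, 87% of it static data. The previous hot/latest/category YAML files totaled under 150 lines and were clear in function. The refactor raises complexity by an order of magnitude without a matching improvement in user experience.

Issues not yet addressed:

  • user-posts.yaml passes limit as filter (filter is an action type enum, not a pagination parameter)
  • strip() is not a JavaScript built-in
  • Reply count semantics changed (posts_count vs posts_count - 1)
  • hot/latest/category were removed without a compatibility transition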

@tiaot33
Contributor Author

tiaot33 commented Mar 25, 2026

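On the tag URL

On linux.do, .json endpoints have to pass Cloudflare, so a browser environment is required. I also tried it in an incognito window and it works.

As a side note, this endpoint has an RSS counterpart that does not need to pass Cloudflare. But the existing code already relied on cookies and required login, so I chose .json, which also returns permission-restricted topics that are only visible when logged in.

On the static data

Tags

I have no problem with dropping the static tag data.

Categories

Categories, however, have a second level. The first fetch of the category list only yields the ids of the subcategories (for example 开发调优, Lv1); querying by subcategory requires a further request to https://linux.do/categories.json?parent_category_id=4. On the web page, too, a subcategory can only be picked after selecting a top-level category.

The categories.yaml I submitted fetches top-level categories by default; subcategories require the --subcategories flag.

Subcategories make it possible to monitor the more important level-3 posts on the linux.do forum.

Without going through the query endpoint there is no way to get the category slug, so querying by subcategory is impossible. And the network requests that categories.yaml makes with --subcategories enabled do not quite match how a user normally operates.

I am not sure how best to handle this.

Remaining issues

  1. The following three issues are already fixed locally but not yet pushed:
  • user-posts.yaml passes limit as filter (filter is an action type enum, not a pagination parameter)
  • strip() is not a JavaScript built-in
  • Reply count semantics changed (posts_count vs posts_count - 1)
  2. On removing hot/latest/category without a compatibility transition: the old hot/latest/category commands only returned topic titles, with no URL or topic_id, so I think they were not usable for real work, only for display.
    If needed, I can restore hot/latest/category.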

@Astro-Han
Contributor

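Thanks for the detailed reply! Point by point:

Tag URL: understood. It passed the incognito-browser test, so no problem.

Static data: agreed on dropping the static tag data, and I understand the pain point that subcategory lookup needs multiple API calls. But a hardcoded 365-line .ts file is not the best solution; I suggest a local runtime cache.

On the first run of feed --category, automatically fetch /categories.json plus ?parent_category_id=X for each top-level category, and cache the full two-level category tree to ~/.opencli/cache/linux-do/categories.json. Later runs read the local file directly, which is as fast as hardcoding but does not go stale and adds nothing to the repo size. Give the cache a TTL (say 7 days) so it refreshes automatically. This is standard practice for CLI tools (brew, npm, and pip all use a similar pattern).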
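
The caching idea above can be sketched as follows. Everything here is illustrative: `CacheEntry`, `CacheStore`, and the helper names are hypothetical, and the store is left abstract so that a file-backed implementation pointing at ~/.opencli/cache/linux-do/categories.json could be plugged in. The 7-day TTL is the value suggested above.

```typescript
// Hypothetical on-disk format: payload plus the time it was saved.
interface CacheEntry<T> {
  savedAt: number; // epoch milliseconds
  data: T;
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Serialize data together with its save time.
function wrapForCache<T>(data: T, now: number): string {
  const entry: CacheEntry<T> = { savedAt: now, data };
  return JSON.stringify(entry);
}

// Return cached data if present, parseable, and younger than the TTL; else null.
function readFreshCache<T>(text: string | null, now: number, ttlMs: number = SEVEN_DAYS_MS): T | null {
  if (text === null) return null;
  try {
    const entry = JSON.parse(text) as CacheEntry<T> | null;
    if (typeof entry?.savedAt !== "number") return null;
    return now - entry.savedAt < ttlMs ? entry.data : null;
  } catch {
    return null; // a corrupt cache file is treated as a miss
  }
}

// Abstract storage so the same logic works with a file or an in-memory stub.
interface CacheStore {
  read(): string | null;
  write(text: string): void;
}

// Serve from cache when fresh, otherwise fetch, store, and return.
async function getCached<T>(
  store: CacheStore,
  fetchFresh: () => Promise<T>,
  now: number = Date.now(),
  ttlMs: number = SEVEN_DAYS_MS,
): Promise<T> {
  const cached = readFreshCache<T>(store.read(), now, ttlMs);
  if (cached !== null) return cached;
  const fresh = await fetchFresh();
  store.write(wrapForCache(fresh, now));
  return fresh;
}
```

Treating an unreadable cache as a miss means a damaged file repairs itself on the next run instead of failing the command.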

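The three bugs: I will take another look once the fixes are pushed.

On command structure: every adapter in the project uses flat command names: hackernews top, v2ex latest, bilibili hot. No adapter uses the feed --view hot pattern of one unified command with flag routing. I suggest improving the existing hot/latest/category commands (fix bugs, add missing fields, add a --tag filter) and adding the new features (tags, user-topics, user-posts) as standalone commands. That keeps the project consistent and avoids a breaking change.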

@jackwener jackwener merged commit e640462 into jackwener:main Mar 25, 2026
24 checks passed
@jackwener jackwener mentioned this pull request Mar 26, 2026