feat(web-search): default to Bing, drop Mojeek, dashboard engine switcher#1558

Merged: esengine merged 2 commits into main from feat/bing-default-drop-mojeek on May 22, 2026

@esengine (Owner)

Summary

Mojeek's IP-geolocation reputation system 403s most CN-residential clients on the very first request, leaving the default web_search tool unusable for the bulk of users. Swap the default to cn.bing.com, which:

  • Serves clean HTML from CN with no proxy required
  • Returns real URLs (not the /ck/a?u=a1<base64> click-tracking wrappers www.bing.com adds)
  • Has no per-IP daily quota
  • Returned 200 with zero anti-bot markers in live probes across three queries × two endpoints

Mojeek is removed rather than kept as an option — the engine list stays short, and any user who still wants a Mojeek-style backend can run SearXNG locally (/se searxng).

Changes

  • src/tools/web.ts: new searchBing + parseBingResults scrape <li class="b_algo"> blocks (h2 a[href] + div.b_caption p) from cn.bing.com. searchMojeek / parseMojeekResults / MOJEEK_ENDPOINT removed. Dispatcher fallback is now Bing.
  • src/config.ts: webSearchEngine() loader returns "bing" for unset / unknown / legacy "mojeek" values. The migration is read-only: we don't rewrite config.json, so a user who explicitly types /se mojeek later sees an immediate "invalid engine" reject, with no silent revert loop on the next launch.
  • src/cli/ui/slash/handlers/web-search-engine.ts + commands.ts: "mojeek" removed from validation set + argCompleter; usage hint now leads with bing.
  • Dashboard settings panel: new <select> under Defaults with all six valid backends; POST /api/settings validates the new webSearchEngine field, persists it to config, and includes it in the GET response + appliesAt hints.
  • i18n: webErrors.mojeek* → bing*; usageMojeek → usageBing; engine-list strings in error suggestions updated. Both EN and zh-CN.
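
The loader and the slash command treat the legacy value differently: the loader silently falls back to Bing, while the slash command rejects it. A minimal sketch of that contrast follows. The function and type names are hypothetical and not taken from the repository; only the fallback rule and the six engine names come from this PR and its follow-up commit.

```typescript
// Hypothetical names; only the engine list and the fallback rule come from the PR text.
const ENGINES = ["bing", "searxng", "metaso", "tavily", "perplexity", "exa"] as const;
type Engine = (typeof ENGINES)[number];

function isEngine(value: unknown): value is Engine {
  return typeof value === "string" && (ENGINES as readonly string[]).includes(value);
}

// Loader: tolerant. Unset, unknown, and legacy "mojeek" values all read back as "bing".
// It only reads the config object and never writes to it (read-only migration).
function loadWebSearchEngine(config: { webSearchEngine?: unknown }): Engine {
  return isEngine(config.webSearchEngine) ? config.webSearchEngine : "bing";
}

// Slash handler: strict. A removed or unknown name is rejected immediately.
type SetResult = { ok: true; engine: Engine } | { ok: false; error: string };

function setWebSearchEngine(arg: string): SetResult {
  const name = arg.trim().toLowerCase();
  if (!isEngine(name)) {
    return { ok: false, error: `invalid engine "${name}"; expected one of: ${ENGINES.join(", ")}` };
  }
  return { ok: true, engine: name };
}
```

Keeping the loader tolerant means an old config.json never blocks startup, and keeping the setter strict means a stale name cannot be persisted again.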

Test plan

  • parseBingResults against real cn.bing.com HTML fixture (captured live 2026-05), empty input, h2-less block skip
  • webSearch dispatcher hits cn.bing.com with browser UA and the right query string
  • webSearchEngine config loader: enum round-trip, default = bing, legacy "mojeek" value reads back as bing (read-only migration)
  • public-api.test.ts updated for the renamed export
  • npm run verify passes (3548 tests)
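
The dispatcher test checks two things: the target URL and the browser User-Agent. The sketch below shows a request builder that such a test could assert against. It is illustrative only: the `/search` path, the `q` parameter name, the User-Agent string, and every identifier are assumptions, not values taken from src/tools/web.ts.

```typescript
// Assumed endpoint path and query parameter; the PR text only names the host cn.bing.com.
const BING_ENDPOINT = "https://cn.bing.com/search";

// Placeholder desktop-browser User-Agent, not the one the project sends.
const BROWSER_UA =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36";

interface SearchRequest {
  url: string;
  headers: Record<string, string>;
}

// Build the request description without performing any network I/O,
// so a test can inspect the URL and headers directly.
function buildBingRequest(query: string): SearchRequest {
  return {
    url: `${BING_ENDPOINT}?q=${encodeURIComponent(query.trim())}`,
    headers: { "User-Agent": BROWSER_UA },
  };
}
```

Separating request construction from the fetch call keeps the URL and header checks deterministic and offline.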

…rd switcher

Mojeek's IP-geolocation reputation system 403s most CN-residential
clients on the very first request, leaving the default `web_search`
tool unusable for the bulk of this project's users. cn.bing.com serves
clean HTML from CN without a proxy, returns real URLs (not the
`/ck/a?u=a1<base64>` click-tracking wrappers the international `www.bing.com`
uses), and has no per-IP daily quota. Live probes across three queries
× two endpoints returned 200 with zero anti-bot markers.

- New `searchBing` + `parseBingResults` scrapes `<li class="b_algo">`
  blocks (h2 a[href] + div.b_caption p) from cn.bing.com.
- `webSearch` dispatcher's fallback is now Bing; `webSearchEngine()`
  loader returns "bing" for unset / unknown / legacy "mojeek" values.
- Legacy migration is read-only — we don't rewrite the user's
  config.json. `/search-engine mojeek` is no longer in the slash
  validation set, so users who explicitly retry the dead name see an
  immediate "invalid engine" reject instead of a silent revert loop
  on the next launch.
- Dashboard settings panel gets a Search Engine `<select>` with the
  six valid backends; POST /api/settings validates the new
  `webSearchEngine` field, persists to config, and includes it in the
  GET response + `appliesAt` hints.
- searchMojeek/parseMojeekResults/MOJEEK_ENDPOINT removed; i18n
  webErrors.mojeek* renamed to bing*; public-API surface swaps
  parseMojeekResults for parseBingResults.

Tests:
- parseBingResults: real cn.bing.com HTML fixture, empty input, h2-less
  block skip
- webSearch dispatcher hits cn.bing.com with browser UA
- webSearchEngine config loader: enum round-trip, default = bing, legacy
  "mojeek" value reads back as bing
… scanner

CodeQL flagged three polynomial-backtracking patterns in the regex-based
parser. node-html-parser is already in the file (used by
parseSearxngHtmlResults); using its selector API eliminates the
backtracking class entirely without changing parse semantics.
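
For readers without the source at hand, here is a dependency-free sketch of extracting results from `<li class="b_algo">` blocks without regexes. It is not the PR's code: the PR uses node-html-parser's selector API, while this sketch swaps in plain `indexOf` scans, which are also free of regex backtracking. All names are hypothetical, and it assumes double-quoted attributes, no `>` inside attribute values, and no HTML-entity decoding.

```typescript
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Drop everything between "<" and ">" in one left-to-right pass.
function stripTags(html: string): string {
  let out = "";
  let inTag = false;
  for (let i = 0; i < html.length; i++) {
    const ch = html.charAt(i);
    if (ch === "<") inTag = true;
    else if (ch === ">") inTag = false;
    else if (!inTag) out += ch;
  }
  return out.trim();
}

// Text between `open` and the next `close`, searching from `from`; null if either is missing.
function between(src: string, open: string, close: string, from = 0): string | null {
  const start = src.indexOf(open, from);
  if (start === -1) return null;
  const bodyStart = start + open.length;
  const end = src.indexOf(close, bodyStart);
  return end === -1 ? null : src.slice(bodyStart, end);
}

function parseBingResultsSketch(html: string): SearchResult[] {
  const results: SearchResult[] = [];
  const marker = '<li class="b_algo"';
  let pos = html.indexOf(marker);
  while (pos !== -1) {
    // A block runs from this marker to the next one (or to the end of input).
    const next = html.indexOf(marker, pos + marker.length);
    const block = html.slice(pos, next === -1 ? html.length : next);
    pos = next;

    const h2 = between(block, "<h2", "</h2>");
    if (h2 === null) continue; // h2-less block: skip
    const url = between(h2, 'href="', '"');
    const titleHtml = between(h2, ">", "</a>", h2.indexOf("<a"));
    if (url === null || titleHtml === null) continue;

    // Snippet: first <p> after the b_caption container, if present.
    const caption = block.indexOf('class="b_caption"');
    const p = caption === -1 ? null : between(block, "<p", "</p>", caption);
    // Re-prefix "<p" so stripTags also removes the rest of the opening tag.
    const snippet = p === null ? "" : stripTags("<p" + p);

    results.push({ title: stripTags(titleHtml), url, snippet });
  }
  return results;
}
```

Each scan moves forward through the input with `indexOf`, so a single lookup cannot backtrack the way a nested-quantifier regex can. A real parser remains the more robust choice for malformed markup.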
@esengine esengine merged commit 65c0d64 into main May 22, 2026
4 checks passed
@esengine esengine deleted the feat/bing-default-drop-mojeek branch May 22, 2026 14:57
esengine added a commit that referenced this pull request May 22, 2026
…se (#1565)

* chore(release): 0.49.0 — static-history TUI, queued steers, Bing default, lifecycle plans

Headline themes:
- TUI: Static-history renderer is the only path; virtual-viewport layers removed (#1529 stages 1-4)
- Chat: queued mid-turn steer handling so input mid-render doesn't drop or fight the live frame (#1501)
- Web search: default switches to Bing; dashboard engine switcher; Mojeek dropped (#1558)
- Plans: lifecycle evidence summaries surface why a plan is ready to accept (#1500)
- Desktop: native OS notifications for approvals + completion (#1519)
- i18n: CLI command output (/mcp /sessions /prune /theme) + approval-prompt labels translated (#1524, #1560)
- Security: SSRF block in web_fetch (#1544), edit-snapshot path containment (#1454), shell redirect sandbox (#1457), Task integrity guardrail (#1516)
- Tools: per-turn dispatch-rate limit (#1356); run_command discourages shell-based edits (#1514)
- Client: DeepSeek 429 → concurrency-limit hint (#1526); timeoutMs honored with AbortSignal (#1535); --no-proxy opt-out for direct route (#1507)
- Files: read/edit/restore preserves source encoding (GB18030 / UTF-8 BOM) (#1518)
- Context: pinned constraints survive folds + full tail capture (#1515, #1552)
- Refactor: lifecycle risk policy extracted into its own module (#1557)

See CHANGELOG for the full list.

* fix(context): align fold summary prefix with main agent for cache reuse

The summarizer call was sending a bespoke "You compress conversation
history" system prompt and no tools, guaranteeing a 0% cache hit
against the main agent's just-cached prefix. Reshape the request so
system + tools + head bytes mirror the live agent's last call — the
only novel bytes are the trailing summarize instruction.

Skill-pin handling now collects bodies read-only instead of stubbing
mid-head, so the cache prefix stays unbroken. The summarize
instruction names pinned skills so the model knows not to paraphrase
their bodies (which we append verbatim regardless).

Measured on a real session at 48.7K prompt tokens:
  OLD shape: 0.0% cache hit  → $0.145 per fold
  NEW shape: 99.6% cache hit → $0.015 per fold
  saving: 89.6% per fold
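
The reshaped summary request can be sketched as follows. This is an illustration under assumptions, not the repository's code: the `Message` and `ChatRequest` types, the function name, and the instruction wording are all hypothetical. What it shows is the idea from the commit message: reuse the live agent's system prompt, tools, and head messages unchanged, and append the summarize instruction as the only new bytes.

```typescript
// Hypothetical request types; the real client types are not shown in this PR.
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface ChatRequest {
  messages: Message[];
  tools: unknown[];
}

// Placeholder wording, not the project's actual instruction.
const SUMMARIZE_INSTRUCTION = "Summarize the conversation above for later reuse.";

// Build the fold request as "last live request + one trailing instruction",
// so the provider-side prompt cache can match the whole existing prefix.
function buildFoldRequest(lastAgentRequest: ChatRequest, pinnedSkills: string[]): ChatRequest {
  const pinNote =
    pinnedSkills.length > 0
      ? ` Do not paraphrase these pinned skills; their bodies are appended verbatim: ${pinnedSkills.join(", ")}.`
      : "";
  return {
    // Same system + head messages, same order, nothing stubbed mid-head.
    messages: [...lastAgentRequest.messages, { role: "user", content: SUMMARIZE_INSTRUCTION + pinNote }],
    // Same tool definitions as the live agent, since they are part of the cached prefix.
    tools: lastAgentRequest.tools,
  };
}
```

The design point is that prefix caches match from the first byte onward, so any difference in the system prompt or tool list invalidates everything after it.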

* tools: add fold-cache shape + live benchmarks

bench-fold-cache-shape.mjs replays real session jsonls, simulates
OLD vs NEW summary-call shapes at the fold point, and reports
byte-level shared-prefix with the main agent's preceding request.
Pure local — no API required.

bench-fold-cache-live.mjs sends one priming + two summary calls to
DeepSeek and reports prompt_cache_hit_tokens / cost for each shape.
Used to confirm the shape change actually translates to API-side
cache hits.
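
The shared-prefix measurement that the local benchmark reports can be approximated as below. This is a hedged sketch with hypothetical names, not the contents of bench-fold-cache-shape.mjs. It compares two serialized requests by UTF-16 code units, which equals the byte count only for ASCII input, whereas the actual script is described as byte-level.

```typescript
// Number of leading code units two serialized requests have in common.
function sharedPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a.charCodeAt(i) === b.charCodeAt(i)) i++;
  return i;
}

// Fraction of `candidate` that is already covered by the prefix it shares with `previous`.
// A value near 1 suggests the provider-side cache could reuse almost the whole prompt.
function sharedPrefixRatio(previous: string, candidate: string): number {
  if (candidate.length === 0) return 0;
  return sharedPrefixLength(previous, candidate) / candidate.length;
}
```

For example, `sharedPrefixRatio(JSON.stringify(mainRequest), JSON.stringify(foldRequest))` would compare a fold request against the main agent's preceding request, where `mainRequest` and `foldRequest` are whatever request objects the caller has on hand.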

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
esengine added a commit that referenced this pull request May 23, 2026
…1618)

The dashboard's <select> for switching web search backends shipped in
#1558 against the old Preact panel; the React port (#1418) dropped it
silently and the i18n strings have been dead code since. The desktop
settings panel never had it. Both surfaces now expose the same dropdown
under General → Behavior, alongside reasoning effort / edit mode /
budget — six engines (bing, searxng, metaso, tavily, perplexity, exa)
matching the `/search-engine` slash.

- protocol: `webSearchEngine` on SettingsEvent + SettingsPatch in both
  desktop and dashboard mirrors
- backend: `src/cli/commands/desktop.ts` settings_save persists the
  field via writeConfig; emitSettings reads it back via webSearchEngine()
- dashboard bridge: emitServerSettings forwards the field from
  `GET /api/settings` (server side was already wired by #1558)
- desktop i18n: copies the six already-translated strings that were
  living only in the dashboard bundle

API keys for metaso / tavily / perplexity / exa still go through the
slash command (or config.json) — UI-side key entry can come later if
demand shows up.

Co-authored-by: reasonix <reasonix@deepseek.com>