fix: typesense doc sync #11807

G0maa · 2025-11-27T08:10:26Z

Description of change

Currently typesnese index workflow fails a lot due to limited memory, this PR helps optimize memory usage a bit by:

Deleting old (failed) collections before syncing.
Deleting unused facets. Facets are used as parameters in search bar, we currently do not support that.

I've tested this on my own free-tier Typesense cloud deployment and it worked, but please note that this is not guaranteed to fix OUT_OF_MEMORY issues, we may eventually need to increase memory 😄

Pull-Request Checklist

Code is up-to-date with the master branch
This pull request links relevant issues as Fixes #00000
There are new or updated unit tests validating the change N\A
Documentation has been updated to reflect this change N\A

- Refactor workflow to delete temp collections. - Delete unsued facets from docsearch scraper config.

qodo-free-for-open-source-projects · 2025-11-27T08:10:45Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 Security concerns Sensitive information exposure: The script uses `TYPESENSE_API_KEY` from secrets, which is appropriate. However, curl commands with `-s` flag suppress errors but still output responses that might contain sensitive data in workflow logs. Consider using `-f` flag to fail on HTTP errors and redirect output to `/dev/null` for DELETE operations to minimize log exposure. Also, ensure that error messages from failed API calls don't leak the API key or other sensitive information in the workflow logs.
⚡ Recommended focus areas for review Error Handling The bash script lacks error handling for curl commands. If API requests fail (network issues, authentication problems), the script continues silently. Consider adding error checks after curl commands and using `set -e` to exit on errors. ALIAS_COLLECTION=$(curl -s -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/aliases/typeorm-docs" \ \| jq -r '.collection_name') if [ "$ALIAS_COLLECTION" = "null" ] \|\| [ -z "$ALIAS_COLLECTION" ]; then echo "Alias does not exist; skipping collection cleanup." exit 0 fi echo "Alias currently points to: $ALIAS_COLLECTION" COLLECTIONS=$(curl -s -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/collections" \ \| jq -r '.[].name') for col in $COLLECTIONS; do if [ "$col" != "$ALIAS_COLLECTION" ]; then echo "Deleting unaliased collection: $col" curl -s -X DELETE -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/collections/$col" fi Race Condition There's a potential race condition between fetching the alias collection and deleting collections. If the alias changes between the two operations, the currently aliased collection could be deleted. Consider re-checking the alias before each deletion. for col in $COLLECTIONS; do if [ "$col" != "$ALIAS_COLLECTION" ]; then echo "Deleting unaliased collection: $col" curl -s -X DELETE -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/collections/$col" fi done

qodo-free-for-open-source-projects · 2025-11-27T08:11:54Z

PR Code Suggestions ✨

Latest suggestions up to eecce94

Category	Suggestion	Impact
General	Add API error handling Add error handling for the `curl` command that fetches the alias information. The script should check the HTTP status code and exit if the API call fails. .github/workflows/docsearch.yml [22-24] -ALIAS_COLLECTION=$(curl -s -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ - "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/aliases/typeorm-docs" \ - \| jq -r '.collection_name') +ALIAS_RESPONSE=$(curl -s -w "\n%{http_code}" -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ + "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/aliases/typeorm-docs") +HTTP_CODE=$(echo "$ALIAS_RESPONSE" \| tail -n1) +ALIAS_COLLECTION=$(echo "$ALIAS_RESPONSE" \| head -n -1 \| jq -r '.collection_name') +if [ "$HTTP_CODE" -ne 200 ] && [ "$HTTP_CODE" -ne 404 ]; then + echo "Error: Failed to fetch alias information (HTTP $HTTP_CODE)" + exit 1 +fi + Apply / Chat Suggestion importance[1-10]: 8 __ Why: This is a valuable suggestion that adds crucial error handling for the initial API call. Failing to get the alias information correctly could lead to deleting the wrong collections, so exiting on failure makes the script much safer.	Medium
General	Add deletion error handling Add error handling to the `curl` DELETE request. Check the HTTP status code and log a warning if deleting a collection fails to prevent silent failures. .github/workflows/docsearch.yml [37-43] for col in $COLLECTIONS; do if [ "$col" != "$ALIAS_COLLECTION" ]; then echo "Deleting unaliased collection: $col" - curl -s -X DELETE -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ - "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/collections/$col" + RESPONSE=$(curl -s -w "\n%{http_code}" -X DELETE -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ + "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/collections/$col") + HTTP_CODE=$(echo "$RESPONSE" \| tail -n1) + if [ "$HTTP_CODE" -ne 200 ] && [ "$HTTP_CODE" -ne 404 ]; then + echo "Warning: Failed to delete collection $col (HTTP $HTTP_CODE)" + fi fi done Apply / Chat Suggestion importance[1-10]: 7 __ Why: The suggestion correctly points out that the `curl` command silences errors, and adds robust error handling by checking the HTTP status code, which improves the reliability of the cleanup script.	Medium
More

Previous suggestions

Suggestions up to commit 17ab414

Category	Suggestion	Impact
Possible issue	Filter collections by name pattern Filter the list of collections to be deleted to only include those starting with the `typeorm-docs` prefix. This prevents the accidental deletion of unrelated collections that might exist in the same Typesense instance. .github/workflows/docsearch.yml [33-37] COLLECTIONS=$(curl -s -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/collections" \ - \| jq -r '.[].name') + \| jq -r '.[].name \| select(startswith("typeorm-docs"))') for col in $COLLECTIONS; do Suggestion importance[1-10]: 9 __ Why: This suggestion addresses a critical flaw where the script could delete unrelated collections in a shared Typesense instance. Filtering collections by a specific prefix (`typeorm-docs`) is a crucial safeguard against accidental data loss.	High
General	Add deletion error handling Add error handling to the `curl` command that deletes Typesense collections. Check the HTTP response code to ensure the deletion was successful and log an error if it fails. .github/workflows/docsearch.yml [37-43] for col in $COLLECTIONS; do if [ "$col" != "$ALIAS_COLLECTION" ]; then echo "Deleting unaliased collection: $col" - curl -s -X DELETE -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ - "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/collections/$col" + RESPONSE=$(curl -s -w "\n%{http_code}" -X DELETE -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ + "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/collections/$col") + HTTP_CODE=$(echo "$RESPONSE" \| tail -n1) + if [ "$HTTP_CODE" != "200" ]; then + echo "Failed to delete collection $col (HTTP $HTTP_CODE)" + fi fi done Suggestion importance[1-10]: 7 __ Why: The suggestion correctly points out that the `curl` command to delete collections lacks error handling, which could lead to silent failures. Adding a check for the HTTP status code makes the cleanup script more robust and easier to debug.	Medium
General	Add API error handling Add explicit error handling for the `curl` command that fetches the alias information. Check the HTTP status code and exit with an error if the API call fails for reasons other than a `404 Not Found`. .github/workflows/docsearch.yml [22-24] -ALIAS_COLLECTION=$(curl -s -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ - "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/aliases/typeorm-docs" \ - \| jq -r '.collection_name') +ALIAS_RESPONSE=$(curl -s -w "\n%{http_code}" -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ + "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/aliases/typeorm-docs") +HTTP_CODE=$(echo "$ALIAS_RESPONSE" \| tail -n1) +ALIAS_COLLECTION=$(echo "$ALIAS_RESPONSE" \| head -n-1 \| jq -r '.collection_name') +if [ "$HTTP_CODE" != "200" ] && [ "$HTTP_CODE" != "404" ]; then + echo "Failed to fetch alias information (HTTP $HTTP_CODE)" + exit 1 +fi + Suggestion importance[1-10]: 6 __ Why: The suggestion improves the script's robustness by adding explicit error handling for the API call that fetches the alias. While the existing code has a check that implicitly handles some failures, this change makes the error handling more explicit and reliable, causing the job to fail on unexpected API errors.	Low

Suggestions up to commit 16b9b5b

Category	Suggestion	Impact
Possible issue	Use a while loop for safer iteration Replace the `for` loop with a `while read` loop to safely iterate over collection names. This prevents potential issues if collection names contain spaces or special characters. .github/workflows/docsearch.yml [37-43] -for col in $COLLECTIONS; do - if [ "$col" != "$ALIAS_COLLECTION" ]; then +echo "$COLLEctions" \| while IFS= read -r col; do + if [ -n "$col" ] && [ "$col" != "$ALIAS_COLLECTION" ]; then echo "Deleting unaliased collection: $col" curl -s -X DELETE -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \ "$TYPESENSE_PROTOCOL://$TYPESENSE_HOST:$TYPESENSE_PORT/collections/$col" fi done Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a potential bug where collection names with spaces would be processed incorrectly due to shell word splitting, and proposes a more robust iteration method.	Medium

gioboa

Thanks @G0maa it looks fine to me. Let's try this step in the main pipeline. 💪

alumni

LGTM, but I'm not familiar with Typesense and our account setup. @dlhck and @naorpeled would be the best candidates to review :)

alumni · 2025-11-28T15:15:19Z

Nice @G0maa, seems the indexing is now running without failing 👏🏻

Does anyone have any idea how the docs are published? Seems that on typeorm.io the docs stopped being published sometime after v0.3.27. Is that right?

Also: is Typesense indexing our markdown files or the website?

Maybe in the next meeting we can discuss docs too - we should talk about publishing + indexing for 2-3 versions (0.3, master and next).

G0maa · 2025-11-28T15:50:47Z

@alumni

Nice @G0maa, seems the indexing is now running without failing 👏🏻

For now 🙈

Does anyone have any idea how the docs are published? Seems that on typeorm.io the docs stopped being published sometime after v0.3.27. Is that right?

Sadly no idea.

Also: is Typesense indexing our markdown files or the website?

Based on these lines it's indexing the website. I think I saw it somewhere that Typesense supports indexing docusaurus directly (but will have to verify).

Maybe in the next meeting we can discuss docs too - we should talk about publishing + indexing for 2-3 versions (0.3, master and next).

Totally agree.

gioboa

typesense-docsearch-scraper is the command for indexing the website. we have it in the workflow.
In the latest run there is the out of memory error as well.

G0maa · 2025-11-28T16:10:10Z

In the latest run there is the out of memory error as well

That's not good 😅

alumni · 2025-11-28T16:10:14Z

@gioboa I noticed that, but it's in the part where it displays the recent logs (so I though that log might be from the previous runs, but I might be wrong).

gioboa · 2025-11-28T16:11:16Z

@gioboa I noticed that, but it's in the part where it displays the recent logs (so I though that log might be from the previous runs, but I might be wrong).

I run the task again, same error. :(

G0maa · 2025-11-28T16:27:11Z

Hmmmm... I'll try to figure out a workaround for this.

I believe we shouldn't need to increase memory, TypeORM docs aren't that massive.
Although I'm unaware what increases memory usage, collection size is a factor, but what about e.g. searches happening while writing a new collection? 🤔

Going through Typesense docs there's a way to limit the batch_size see. So I might modify that next, say 10 instead of 40.

alumni · 2025-11-28T16:38:41Z

@G0maa that might actually be a good idea, since we go OOM, not out of disk space. Some of the pages are long (but still, they should not take MBs).

Co-authored-by: Giorgio Boa <35845425+gioboa@users.noreply.github.com>

gioboa

I increased the typesense RAM, it's working fine now

Co-authored-by: Giorgio Boa <35845425+gioboa@users.noreply.github.com>

fix: typesense

16b9b5b

- Refactor workflow to delete temp collections. - Delete unsued facets from docsearch scraper config.

qodo-free-for-open-source-projects bot added Review effort 2/5 Possible security concern labels Nov 27, 2025

G0maa requested review from gioboa and pkuczynski and removed request for gioboa November 27, 2025 08:11

Merge branch 'master' into fix/typesense

17ab414

G0maa changed the title ~~fix: typesense~~ fix: typesense doc sync Nov 27, 2025

gioboa approved these changes Nov 27, 2025

View reviewed changes

G0maa requested review from alumni, dlhck and mguida22 November 28, 2025 08:27

alumni requested a review from naorpeled November 28, 2025 10:28

alumni reviewed Nov 28, 2025

View reviewed changes

naorpeled approved these changes Nov 28, 2025

View reviewed changes

Merge branch 'master' into fix/typesense

eecce94

dlhck approved these changes Nov 28, 2025

View reviewed changes

dlhck merged commit d0b5454 into typeorm:master Nov 28, 2025
10 checks passed

gioboa reviewed Nov 28, 2025

View reviewed changes

ThbltLmr pushed a commit to ThbltLmr/typeorm that referenced this pull request Dec 2, 2025

fix: typesense doc sync (typeorm#11807)

028a073

Co-authored-by: Giorgio Boa <35845425+gioboa@users.noreply.github.com>

gioboa reviewed Dec 2, 2025

View reviewed changes

Minishlink mentioned this pull request Jan 12, 2026

fix(postgres): handle timestamptz correctly #11916

Open

mgohin pushed a commit to mgohin/typeorm that referenced this pull request Jan 15, 2026

fix: typesense doc sync (typeorm#11807)

c6be99d

Co-authored-by: Giorgio Boa <35845425+gioboa@users.noreply.github.com>

Uh oh!

fix: typesense doc sync #11807

fix: typesense doc sync #11807

Conversation

G0maa commented Nov 27, 2025

Description of change

Pull-Request Checklist

Uh oh!

qodo-free-for-open-source-projects bot commented Nov 27, 2025

PR Reviewer Guide 🔍

Uh oh!

qodo-free-for-open-source-projects bot commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

gioboa left a comment

Choose a reason for hiding this comment

Uh oh!

alumni left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

alumni commented Nov 28, 2025

Uh oh!

G0maa commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gioboa left a comment

Choose a reason for hiding this comment

Uh oh!

G0maa commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alumni commented Nov 28, 2025

Uh oh!

gioboa commented Nov 28, 2025

Uh oh!

G0maa commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alumni commented Nov 28, 2025

Uh oh!

gioboa left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

qodo-free-for-open-source-projects bot commented Nov 27, 2025 •

edited

Loading

G0maa commented Nov 28, 2025 •

edited

Loading

G0maa commented Nov 28, 2025 •

edited

Loading

G0maa commented Nov 28, 2025 •

edited

Loading