Skip to content

update to Unicode 17.0.0#1116

Merged
tomtau merged 1 commit intopest-parser:masterfrom
tomtau:unicode-update-17
Sep 10, 2025
Merged

update to Unicode 17.0.0#1116
tomtau merged 1 commit intopest-parser:masterfrom
tomtau:unicode-update-17

Conversation

@tomtau
Copy link
Contributor

@tomtau tomtau commented Sep 10, 2025

Summary by CodeRabbit

  • Chores
    • Hardened the Unicode data update script with stricter error handling to prevent silent failures.
    • Switched to version-specific Unicode downloads, improving reproducibility and making it easier to target a specific Unicode release.
    • Ensured the generation step remains functionally the same, with no impact on existing behavior.
    • Overall, these changes improve reliability of the update workflow without altering user-facing features.

@tomtau tomtau requested a review from a team as a code owner September 10, 2025 07:41
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 10, 2025

Walkthrough

Updates a shell script to enforce unset-variable errors, parameterizes the Unicode data download URL using $UNICODE_VERSION, and retains the existing ucd-generate invocation. No exported/public interfaces are changed.

Changes

Cohort / File(s) Summary
Script hardening and versioned URL
update_unicode.sh
Add set -u; replace hard-coded UCD.zip URL with versioned path using $UNICODE_VERSION; retain ucd-generate call (no semantic change).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • NoahTheDuke

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title “update to Unicode 17.0.0” succinctly and accurately conveys the primary change of the pull request—upgrading the Unicode data version in the update_unicode.sh script—without extraneous detail or ambiguity.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

Poem

🐇 I hop through scripts with tidy cheer,
A set -u to catch what’s unclear.
Versioned paths I neatly twirl,
To fetch the proper Unicode swirl.
Zip, unzip—then generate true;
Carrots aligned, the build hops through. 🌿

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
update_unicode.sh (1)

13-15: Reproducibility: pin ucd-generate and avoid masking install failures.

The earlier cargo install ucd-generate || true (Line 4) masks failures and may use an arbitrary preinstalled version, risking non-reproducible outputs for these generated files. Pin and fail on install; also log the version before generation.

Suggested (outside the selected lines):

if ! command -v ucd-generate >/dev/null; then
  cargo install --locked ucd-generate@<known-good-version>
fi
ucd-generate --version

If you want, I can propose a full patched script with set -euo pipefail, pushd/popd, robust curl, and pinned ucd-generate.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d3fbaf9 and 6e2278c.

📒 Files selected for processing (2)
  • pest/src/unicode/script.rs (21 hunks)
  • update_unicode.sh (1 hunks)
🔥 Files not summarized due to errors (1)
  • pest/src/unicode/script.rs: Error: Server error: no LLM provider could handle the message
👮 Files not reviewed due to content moderation or server errors (1)
  • pest/src/unicode/script.rs
🧰 Additional context used
🪛 Shellcheck (0.10.0)
update_unicode.sh

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: check for semver compatibility
  • GitHub Check: check for no_std compatibility
  • GitHub Check: cargo hack check --feature-powerset
  • GitHub Check: Test Coverage
  • GitHub Check: Minimal Versions Testing
  • GitHub Check: Documentation check
  • GitHub Check: Unit, Style, and Lint Testing
  • GitHub Check: Fuzzing

@@ -1,14 +1,15 @@
#!/usr/bin/env bash
set -u
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Fail fast: prefer set -euo pipefail.

Only set -u is enabled. If curl/unzip fails (e.g., 404), the script will continue and later steps may silently generate stale files. Enable -e and -o pipefail to stop on errors.

Apply:

-set -u
+set -euo pipefail

Also consider cd guards per ShellCheck SC2164 (use pushd/popd or cd ... || exit 1).

🤖 Prompt for AI Agents
In update_unicode.sh around line 2, the script currently only uses "set -u"
which won’t stop execution on command failures or failed pipes; change the shell
options to "set -euo pipefail" to fail fast on errors and pipe failures, and
ensure any directory changes use guarded commands (e.g., use pushd/popd or "cd
<dir> || exit 1") so the script exits immediately if a cd fails.

Comment on lines +9 to 10
curl -LO https://www.unicode.org/Public/$UNICODE_VERSION/ucd/UCD.zip
unzip UCD.zip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Harden the download and validate UNICODE_VERSION.

  • Add a friendly check for UNICODE_VERSION (current set -u will emit a generic error).
  • Make curl fail on HTTP errors and surface logs; quote the URL; write to a known filename.
  • Minor: switch to rm -rf + mkdir -p and pushd/popd for idempotence and safer directory handling.

Minimal changes:

-curl -LO https://www.unicode.org/Public/$UNICODE_VERSION/ucd/UCD.zip
+ : "${UNICODE_VERSION:?Set UNICODE_VERSION (e.g., 17.0.0)}"
+curl -fSLo UCD.zip "https://www.unicode.org/Public/${UNICODE_VERSION}/ucd/UCD.zip"

Nice-to-haves (outside the selected lines):

rm -rf ./target/ucd/
mkdir -p ./target/ucd/
pushd ./target/ucd/ >/dev/null || exit 1
# ... curl/unzip ...
popd >/dev/null || exit 1

Optionally verify integrity if upstream provides checksums:

# Example if a checksum file exists alongside UCD.zip
curl -fSLo UCD.zip.md5 "https://www.unicode.org/Public/${UNICODE_VERSION}/ucd/UCD.zip.md5"
md5sum -c UCD.zip.md5
🤖 Prompt for AI Agents
In update_unicode.sh around lines 9-10, the script should validate
UNICODE_VERSION with a friendly error if unset, create/clean a target directory
(rm -rf then mkdir -p), change into it with pushd (and return with popd) for
idempotence, download the UCD zip to a known filename using curl with flags to
fail on HTTP errors and show errors (e.g. -fSLo) while quoting the URL built
from ${UNICODE_VERSION}, and then unzip that known filename; update the script
to implement these steps so downloads fail loudly, filenames are deterministic,
and directory handling is safe.

@tomtau tomtau merged commit 2cf69b8 into pest-parser:master Sep 10, 2025
9 checks passed
@tomtau tomtau deleted the unicode-update-17 branch September 10, 2025 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant