Release wasm + R bindings#175

Merged
nleroy917 merged 76 commits into master from dev
Oct 8, 2025

@nleroy917
Member

@nleroy917 nleroy917 commented Oct 5, 2025

This PR should introduce both wasm bindings (and push them to npm via CI/CD) and then R bindings. We need to bump versions for the following crates:

  1. gtars-core (added feature flags to gate bigtools)
  2. gtars-tokenizers (added some From impls for SpecialTokens)
  3. gtars-wasm (renamed, and then added tokenizers)

We will need to do the following releases:

  • v0.5.1: release the core crates and publish to crates.io (we updated gtars-core + gtars-tokenizers)
  • wasm-0.5.1: release the WebAssembly/JS/TS bindings

There's no need to do a release for the CLI or Python bindings, as nothing changed in those crates.

TODO:

  • bump versions as needed
  • changelog

Version bump/release/publish plan

gtars-core

- version = "0.5.0"
+ version = "0.5.1"

Release to crates.io using cargo publish

gtars-tokenizers

- version = "0.5.0"
+ version = "0.5.1"

Release to crates.io using cargo publish

gtars

- version = "0.5.0"
+ version = "0.5.1"

- gtars-core = { path = "../gtars-core", optional = true, version="0.5.0" }
- gtars-tokenizers = { path = "../gtars-tokenizers", features = ["huggingface"], optional = true, version="0.5.0" }
+ gtars-core = { path = "../gtars-core", features=["http", "bigbed"], optional = true, version="0.5.1" }
+ gtars-tokenizers = { path = "../gtars-tokenizers", features = ["huggingface"], optional = true, version="0.5.1" }

Release to crates.io using cargo publish. Create tag v0.5.1 and create release on GitHub.

gtars-python

- version = "0.5.0"
+ version = "0.5.1"
- gtars-core = { path = "../gtars-core" }
+ gtars-core = { path = "../gtars-core", features=["bigbed", "http"] }

Create tag py-0.5.1 and create release on GitHub. Release via PyPI CI/CD.

gtars-wasm

- version = "0.5.0"
+ version = "0.5.1"

Create tag wasm-0.5.1 and create release on GitHub. Release via NPM CI/CD.

gtars-r

+ version = "0.5.1"

Create tag r-0.5.1 and create release on GitHub. There is no CI/CD set up for this, so that is the only step.
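
For context on the features=[...] entries in the plan above, here is a rough sketch of how optional dependencies are typically gated behind Cargo features. The bigbed and http feature names come from the plan; the optional bigtools entry, its placeholder version, and the empty http list are assumptions for illustration, not the actual gtars-core manifest.

```toml
# gtars-core/Cargo.toml (illustrative sketch, not the real manifest)
[dependencies]
# Assumption: bigtools is pulled in only when the "bigbed" feature is on.
# The version below is a placeholder, not taken from the PR.
bigtools = { version = "0.5", optional = true }

[features]
default = []
# Enabling "bigbed" activates the optional bigtools dependency.
bigbed = ["dep:bigtools"]
# Assumption: "http" would list whichever optional HTTP client the crate uses.
http = []
```

Downstream crates that need these capabilities then opt in, as gtars and gtars-python do above with features=["bigbed", "http"]. A crate that leaves them out (presumably gtars-wasm) would build without bigtools.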

@sanghoonio
Member

sanghoonio commented Oct 7, 2025

The lib target gtars in package `gtars-r v0.5.0

One way to solve this is to change the [lib.name] attribute in the gtars-r crate to gtars_r. I'm not sure what the consequences of this are, however.

The problem with this is that extendr uses the package name in the DESCRIPTION file to generate wrappers that call useDynLib(gtars, .registration = TRUE), which means it looks for the gtars lib but can't find it because we changed it to gtars_r. We can rename the R package in the DESCRIPTION file to something like gtars.r or gtarsr and use this under [lib], but not gtars_r, because only alphanumeric and . characters are allowed. Alternatively, we could somehow keep it as gtars in the toml file.

@nleroy917
Member Author

nleroy917 commented Oct 7, 2025

@sanghoonio gotcha, thanks for the explanation. I think I am partial to gtarsr... what do others think?

The consequence of this is that when you install and import, you need to do this?

library(gtarsr)

That's a bit annoying; it'd be nice to just do library(gtars). I have this problem with the wasm bindings too, where you can't override the package name: it just uses the crate name, which is a non-starter since we can't call it gtars. It's a conflict and also just a bit confusing.

For reference:

So it'd be nice if everything was just gtars across environments. Right now I think we have a solution for everything:

  • Rust: wrapper crate called gtars:
// cargo add gtars
use gtars::tokenizers::{Tokenizer};
  • Python: python wheel called gtars:
# pip install gtars
import gtars
  • Web/JS: package called gtars:
// npm i @databio/gtars
import { Tokenizer } from '@databio/gtars'

So R would be the odd one out:

# install.packages("gtarsr")
library(gtarsr)

@sanghoonio
Member

The gtarsr name is definitely annoying. Is there any reason we can't use gtars for the R package if we rename everything else?

@nleroy917
Member Author

> The gtarsr is definitely annoying, is there any reason we can't use gtars for the R package if we rename everything else?

Well, the wrapper crate sort of needs to be called gtars; I think that's important.

@nleroy917
Member Author

nleroy917 commented Oct 8, 2025

OK, I don't actually think the gtars_r thing is a problem, right? I think it's only really an issue if we intend to publish to crates.io, which we aren't planning on doing, nor would that make any sense. Because the R bindings are completely downstream of the core crates (gtars, gtars-core, gtars-refget, etc.), it shouldn't actually be a problem.

Put another way, I think we would only have a problem if we tried to import this crate alongside something else like gtars. As long as the package name can be gtars-r, the [lib.name] is safe to be named gtars.

I just tested it locally with rextendr::document() and it seems to work well
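
A minimal sketch of the arrangement described above, where the Cargo package is named gtars-r but the library target keeps the name gtars so that useDynLib(gtars, .registration = TRUE) still resolves. The edition, publish, and crate-type values are assumptions based on the usual rextendr layout, not copied from the actual gtars-r manifest.

```toml
# gtars-r Cargo.toml (illustrative sketch)
[package]
name = "gtars-r"     # Cargo package name, distinct from the wrapper crate "gtars"
version = "0.5.1"
edition = "2021"     # assumption
publish = false      # assumption: this crate is not meant for crates.io

[lib]
name = "gtars"               # library name that the R package's useDynLib(gtars, ...) looks for
crate-type = ["staticlib"]   # assumption: the crate type rextendr templates typically use
```

With this split, the R side can stay library(gtars), while Cargo distinguishes the two packages by their package names.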

Member

@khoroshevskyi khoroshevskyi left a comment

A few questions/concerns, especially about the Python binding crate naming in Cargo.

Comment on lines +14 to +19
[patch."https://github.com/databio/gtars"]
gtars-core = { path = "gtars-core" }
gtars-io = { path = "gtars-io" }
gtars-igd = { path = "gtars-igd" }
gtars-refget = { path = "gtars-refget" }
gtars-tokenizers = { path = "gtars-tokenizers" }
Member

  1. Should these dependencies be here?
  2. What does patch.".." mean?

Member Author

question for @sanghoonio

Member

[patch] is for overriding dependencies for the R bindings. The Cargo.toml in gtars-r has these dependencies:

[dependencies]
extendr-api = { git = "https://github.com/extendr/extendr", branch = "master" }
anyhow = "1.0.82"
gtars-core = { git = "https://github.com/databio/gtars", branch = "dev" }
gtars-io = { git = "https://github.com/databio/gtars", branch = "dev" }
gtars-igd = { git = "https://github.com/databio/gtars", branch = "dev" }
gtars-refget = { git = "https://github.com/databio/gtars", branch = "dev" }
gtars-tokenizers = { git = "https://github.com/databio/gtars", branch = "dev", features = ["huggingface"] }

These are git dependencies because, when you install an R package, R copies the source files to a temp directory before compiling, which in our case would not include the parent gtars directory that contains the required workspace crates. So the dependencies here point to GitHub and are pulled from there instead. The problem is that if you make a change to a gtars workspace crate while developing the R bindings, the change isn't reflected until you push it to GitHub. The [patch] section in the workspace redirects these GitHub dependencies to the local paths, so that rextendr::document() (which compiles in place) uses the local workspace crates instead of the GitHub ones. Basically, it is just a workaround for a workaround to make development more convenient.

Member

@donaldcampbelljr donaldcampbelljr left a comment

This looks great.

@nleroy917 nleroy917 mentioned this pull request Oct 8, 2025
2 tasks
@nleroy917 nleroy917 merged commit c0e75df into master Oct 8, 2025
1 of 2 checks passed
4 participants