Generate Rust types from the official specification #151

@ebkalderon

Introduction

It would be awesome if this project also supported generating Rust types from the official spec.

This ticket was created in response to a brief discussion with @brettcannon over at ebkalderon/tower-lsp#361 (comment) and is intended to track future developments. Happy to get some productive dialog going! 😄

Background

Considering that installing and running a Python package as a prerequisite from a cargo build would be pretty messy, I would assume this effort would likely entail the creation of a pure Rust equivalent that can consume the same spec. Consider the following setup:

  • An lsprotocol-codegen crate would process the spec and generate a Rust file containing type and optionally trait definitions.
    • Provided as a library intended to be used from a Cargo build.rs script.
    • Essentially amounts to a simple generate() function that takes an input spec and some configuration switches.
    • Generates an lsprotocol.rs file in $OUT_DIR that can be included in a Rust project with:
      include!(concat!(env!("OUT_DIR"), "/lsprotocol.rs"));
    • Could also ship with a src/bin.rs that wraps this library and provides a CLI interface, should others prefer to use that.
    • See other popular build helper crates for prior art on this interface, e.g. prost-build (Protobuf), tonic-build (gRPC), and dbus-codegen (D-Bus).
    • Consider using schemafy for processing the JSON Schema (draft 4) file in this repo.
  • An lsprotocol crate would export native Rust types to the user.
    • Main crate that most downstream users would consume.
    • These types would be generated in advance by lsprotocol-codegen and vendored directly into the repository.
    • Updated periodically on an as-needed basis (perhaps automatically using a GitHub CI job) and then published to Crates.io.
    • Generated types would implement most common std traits (Clone, Debug, Default, Eq, PartialEq, etc).
    • Generated types would implement serde::Serialize and serde::Deserialize for easy (de)serialization to and from JSON using serde_json.
    • See the community de-facto standard Rust crate lsp-types for prior art.
      • Something with an API that looks very similar to this would be amazing! Dumb structs with all pub fields, enums used to represent multiple states, and everything implements Serialize and Deserialize.
      • The included Notification and Request traits are very handy for generic code. Would love to have that here too.
      • Not sure if the helper macros are essential to replicate. I rarely use them, personally, but some may find them nice.
      • Please note that the structs, enums, traits, and documentation in this particular crate are all updated by hand, at present. As such, a source of auto-generated Rust types would be wonderful to have (less manual labor for maintainers while delivering quicker updates to downstream users).
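
To make the API shape described above concrete, here is a minimal sketch of what the exported types might look like. All type and field names are simplified placeholders chosen for illustration, not the output of any existing generator, and the traits are modeled loosely on the Request and Notification traits in lsp-types with the serde bounds left out so the snippet stays dependency-free. A real crate would additionally derive serde::Serialize and serde::Deserialize.

```rust
// Sketch only: names and fields are simplified placeholders, not spec-accurate.

/// Plain data struct: every field is `pub`, common std traits are derived.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Position {
    pub line: u32,
    pub character: u32,
}

#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct HoverParams {
    pub uri: String,
    pub position: Position,
}

/// Enum used to represent mutually exclusive states.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum HoverContents {
    PlainText(String),
    Markdown(String),
}

/// Ties a request's method name to its parameter and result types.
pub trait Request {
    type Params;
    type Result;
    const METHOD: &'static str;
}

/// Ties a notification's method name to its parameter type.
pub trait Notification {
    type Params;
    const METHOD: &'static str;
}

/// Uninhabited marker type for the `textDocument/hover` request.
pub enum HoverRequest {}

impl Request for HoverRequest {
    type Params = HoverParams;
    type Result = Option<HoverContents>;
    const METHOD: &'static str = "textDocument/hover";
}

/// Uninhabited marker type for the `initialized` notification.
pub enum Initialized {}

impl Notification for Initialized {
    type Params = ();
    const METHOD: &'static str = "initialized";
}

/// Generic code can work with any request through the trait alone.
pub fn method_name<R: Request>() -> &'static str {
    R::METHOD
}
```

Because the structs derive Default and keep all fields pub, downstream code can build them with struct-update syntax, e.g. `HoverParams { uri: "file:///a.rs".to_string(), ..Default::default() }`.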

Ideally, the above Rust crates would share the same spec file as the Python project so the codegen for both could be tested for correctness in CI.
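
As a rough illustration of the lsprotocol-codegen half of this setup, the following sketch shows one possible shape for the generate() entry point. The `Config` struct, its `emit_traits` field, the function signature, and the spec file name in the comments are all assumptions made for this example. The body does not parse the spec; it only writes placeholder output, using nothing beyond the standard library.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical configuration switches for the generator.
#[derive(Clone, Debug, Default)]
pub struct Config {
    /// Also emit trait definitions alongside the types.
    pub emit_traits: bool,
}

/// Hypothetical entry point: reads the spec file at `spec`, writes
/// `lsprotocol.rs` into `out_dir`, and returns the path of the written file.
pub fn generate(spec: &Path, out_dir: &Path, config: &Config) -> io::Result<PathBuf> {
    let spec_text = fs::read_to_string(spec)?;

    // A real implementation would parse the spec and walk its definitions.
    // This placeholder only records the input size to stay dependency-free.
    let mut code = format!(
        "// @generated from a {}-byte spec. Do not edit.\n",
        spec_text.len()
    );
    code.push_str("pub struct Placeholder;\n");
    if config.emit_traits {
        code.push_str("pub trait Request { const METHOD: &'static str; }\n");
    }

    let out_path = out_dir.join("lsprotocol.rs");
    fs::write(&out_path, code)?;
    Ok(out_path)
}

// In a build.rs script, usage might look like this (file name is hypothetical;
// Cargo sets OUT_DIR for build scripts):
//
//     let out_dir = std::env::var("OUT_DIR").unwrap();
//     generate(Path::new("lsp.json"), Path::new(&out_dir), &Config::default()).unwrap();
```

The same function could back both the build.rs use case and a thin CLI wrapper, since it takes plain paths and a config value.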

Open Questions

One notable open question is how we should best handle "proposed" features. It would be nice if downstream users of lsprotocol could opt into certain not-quite-standardized features in advance, provided they explicitly agree to the API instability when switching this on. The de-facto community standard crate lsp-types does precisely this (as does my own downstream LSP library for Rust tower-lsp) using an off-by-default proposed = [] Cargo feature which, if enabled at compile-time, would activate these types in the public API.

If we choose to offer the same thing here, it would be fairly trivial on the surface: the lsprotocol crate would expose Rust types for the entire superset of the LSP specification, marking certain types #[cfg(feature = "proposed")] and offering users an off-by-default proposed = [] feature they can choose to enable if they wish. This would fall in line with other popular LSP crates used in the community today.
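
A minimal sketch of what that gating could look like in the generated source is shown here. The type names and fields are hypothetical, and the snippet assumes the crate's Cargo.toml declares the off-by-default feature as described above.

```rust
// Assumes Cargo.toml contains:
//     [features]
//     proposed = []

/// Stable type: always part of the public API.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct StableParams {
    pub uri: String,
}

/// Hypothetical proposed type: only compiled when the `proposed` feature is on.
#[cfg(feature = "proposed")]
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct ProposedParams {
    pub uri: String,
    pub experimental_flag: bool,
}

/// Reports whether this build was compiled with the `proposed` feature.
pub const fn proposed_enabled() -> bool {
    cfg!(feature = "proposed")
}
```

With the feature disabled, `ProposedParams` does not exist at all, so any downstream reference to it fails to compile until the user opts in.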

However, what remains to be seen is how fancy we'd like to be with lsprotocol-codegen. Consider the following questions:

  1. Does the current spec file used in this repo include proposed features at all? Is this something the equivalent Python lsprotocol package supports today? Is this something we even want to support?
  2. If we do want to support proposed protocol features in this crate, how should lsprotocol-codegen implement said support?
    • Presumably, lsprotocol-codegen would output a single lsprotocol.rs file containing all types with some marked #[cfg(feature = "proposed")], ready to be vendored into this repo as lsprotocol/src/lib.rs and published directly to Crates.io as the lsprotocol crate.
    • Should we add a config switch to lsprotocol_codegen::generate() and the CLI interface (if we choose to provide one) for the user to change the "proposed" feature name string in the output #[cfg]s to something else?
    • Should we support a config switch for not including types for proposed features in the lsprotocol.rs output entirely?
    • I don't think either of the above two bullets is necessary for an MVP, but they may be nice to have down the road.
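
The two optional switches from the bullets above could be sketched roughly as follows. `ProposedConfig`, its fields, and `render_type` are hypothetical names invented for this example, and the renderer emits only a unit struct to keep the idea visible.

```rust
/// Hypothetical switches controlling how proposed types are emitted.
#[derive(Clone, Debug)]
pub struct ProposedConfig {
    /// Feature name used in the emitted `#[cfg]` attribute.
    pub feature_name: String,
    /// When false, proposed types are left out of the output entirely.
    pub include_proposed: bool,
}

impl Default for ProposedConfig {
    fn default() -> Self {
        Self {
            feature_name: "proposed".to_string(),
            include_proposed: true,
        }
    }
}

/// Renders a unit struct named `name`, or returns `None` if the type is
/// proposed and the config excludes proposed types.
pub fn render_type(name: &str, is_proposed: bool, config: &ProposedConfig) -> Option<String> {
    if is_proposed && !config.include_proposed {
        return None;
    }
    let mut out = String::new();
    if is_proposed {
        // Gate the type behind the configured Cargo feature.
        out.push_str(&format!("#[cfg(feature = \"{}\")]\n", config.feature_name));
    }
    out.push_str(&format!("pub struct {};\n", name));
    Some(out)
}
```

With the defaults, a proposed type is emitted behind `#[cfg(feature = "proposed")]`; changing `feature_name` only alters the string inside the attribute, and setting `include_proposed` to false drops the type from the output.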

Another open question is regarding specification version support in general. Do we support the latest version of the LSP spec only? Or do we want to be able to perform codegen for all available versions of the spec? I presume the former, but I think it's good to get solid confirmation on this.

Versioning

Crates published to Crates.io are expected to adhere to semantic versioning as strictly as possible. This is quite significant for us because adding new pub fields to an existing struct or new variants to an existing enum is considered a breaking API change, unless those types are explicitly marked #[non_exhaustive] (reference), at which point users are forced to include a _ => fallthrough case or wildcard .. when matching on or destructuring those types.
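
For illustration, this is roughly what matching on a #[non_exhaustive] enum looks like. The enum is a hypothetical stand-in for a generated type. Note that the compiler only enforces the wildcard arm in other crates; inside the defining crate the attribute has no effect on matching.

```rust
/// Hypothetical generated enum that may gain variants in a later spec version.
#[non_exhaustive]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MarkupKind {
    PlainText,
    Markdown,
}

/// Written the way downstream code would have to write it.
pub fn describe(kind: MarkupKind) -> &'static str {
    match kind {
        MarkupKind::Markdown => "markdown",
        // Downstream crates must include this wildcard arm. Here it covers
        // `PlainText` as well as any variant added in the future.
        _ => "not markdown",
    }
}
```

Adding a third variant to `MarkupKind` later would then not break downstream matches, since the wildcard arm already accounts for it.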

We should not annotate every single type in the generated code with #[non_exhaustive], of course, since this would severely hamper the crate's usability. Downstream code would not be able to directly construct instances of any of the generated structs, even with ..Default::default() syntax, even though all fields are pub (see bug report rust-lang/rust#70564). We should carefully consider whether or not to mark certain types #[non_exhaustive] and when it would be most applicable to do so.
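
The construction problem can be sketched as follows, using a hypothetical, simplified struct. Both functions compile here because the snippet is a single crate; the comments mark which forms would be rejected from a downstream crate.

```rust
/// Hypothetical generated struct marked `#[non_exhaustive]`.
#[non_exhaustive]
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct HoverOptions {
    pub work_done_progress: bool,
}

/// Inside the defining crate, struct expressions still compile.
pub fn build_in_crate() -> HoverOptions {
    HoverOptions { work_done_progress: true }
}

// From another crate, both of these struct expressions are rejected:
//     HoverOptions { work_done_progress: true }
//     HoverOptions { work_done_progress: true, ..Default::default() }
//
// Downstream code has to start from a value the crate hands out and then
// assign the public fields one by one:
pub fn build_like_downstream() -> HoverOptions {
    let mut options = HoverOptions::default();
    options.work_done_progress = true;
    options
}
```

For structs with many fields, the second form is noticeably more verbose than a struct literal, which is the usability cost described above.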

Thankfully, serde seems to support serializing from and deserializing to #[non_exhaustive] types out of the box, so this should not be a factor in deciding whether or not to apply this attribute in the lsprotocol-codegen output.

Metadata

Labels

feature-request (Request for new features or functionality), rust
