
@jqnatividad
Collaborator

Implements #2858.

Because Gregorian date (gDate) type inference is heuristic, fast mode appends a double question mark suffix ("??") to the inferred type: the inference is based only on the available percentile values.

In comprehensive mode, the inferred type has a single question mark suffix ("?") to indicate higher confidence, since the inference is based on a scan of all values in the column.

@kulnor, after reading the spec (https://www.w3.org/TR/xmlschema-2/#date), I went with my interpretation of the prescribed "lexical representation" of gMonthDay, gDay and gMonth, including the prescribed number of hyphen prefixes, but it seems odd...

Can you check?
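For reference while checking, here is how the five lexical forms line up as we read XML Schema Part 2: gYear is `YYYY`, gYearMonth is `YYYY-MM`, gMonthDay is `--MM-DD`, gDay is `---DD` and gMonth is `--MM`. The Rust sketch below classifies a single value by those hyphen prefixes. The names `GType`, `fixed_digits` and `classify_gdate` are hypothetical and not taken from moarstats.rs; the sketch also assumes no timezone suffix, no signed years and exactly four-digit years.

```rust
/// Hypothetical enum for the five XSD Gregorian date types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GType {
    GYear,
    GYearMonth,
    GMonthDay,
    GDay,
    GMonth,
}

/// Parse exactly `n` ASCII digits, or return None.
fn fixed_digits(s: &str, n: usize) -> Option<u32> {
    if s.len() == n && s.bytes().all(|b| b.is_ascii_digit()) {
        s.parse().ok()
    } else {
        None
    }
}

/// Classify one value by its hyphen-prefixed lexical form.
/// Sketch only: timezone suffixes, signed years and years with
/// more than four digits are not handled.
fn classify_gdate(s: &str) -> Option<GType> {
    let month_ok = |m: u32| (1..=12).contains(&m);
    let day_ok = |d: u32| (1..=31).contains(&d);

    // Three leading hyphens: gDay, ---DD
    if let Some(rest) = s.strip_prefix("---") {
        let d = fixed_digits(rest, 2)?;
        return day_ok(d).then_some(GType::GDay);
    }
    // Two leading hyphens: gMonthDay (--MM-DD) or gMonth (--MM)
    if let Some(rest) = s.strip_prefix("--") {
        return match rest.split_once('-') {
            Some((m, d)) => {
                let (m, d) = (fixed_digits(m, 2)?, fixed_digits(d, 2)?);
                (month_ok(m) && day_ok(d)).then_some(GType::GMonthDay)
            }
            None => month_ok(fixed_digits(rest, 2)?).then_some(GType::GMonth),
        };
    }
    // No leading hyphen: gYearMonth (YYYY-MM) or gYear (YYYY)
    match s.split_once('-') {
        Some((y, m)) => {
            fixed_digits(y, 4)?;
            month_ok(fixed_digits(m, 2)?).then_some(GType::GYearMonth)
        }
        None => fixed_digits(s, 4).map(|_| GType::GYear),
    }
}
```

Checking the number of leading hyphens first keeps the branches unambiguous: `---` can only be gDay, while `--` is split into gMonthDay or gMonth by whether a second hyphen follows.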

- fast mode: uses percentile samples to infer the Gregorian date type
- comprehensive mode: scans all records for a column to infer the Gregorian date type
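The two modes above differ mainly in which values feed the inference and which confidence suffix is attached. The Rust sketch below assumes a hypothetical `infer_subtype()` that reports a subtype only when every supplied value maps to the same type; `ScanMode`, `infer_subtype` and `looks_like_gyear` are illustrative names, not the PR's actual code.

```rust
/// Hypothetical scan mode mirroring the fast/comprehensive split.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScanMode {
    Fast,
    Comprehensive,
}

/// Combine per-value type guesses into one inferred subtype.
/// In Fast mode `values` would be the percentile values; in
/// Comprehensive mode, every value in the column.
fn infer_subtype<F>(values: &[&str], mode: ScanMode, classify: F) -> Option<String>
where
    F: Fn(&str) -> Option<&'static str>,
{
    let mut inferred: Option<&'static str> = None;
    for &v in values {
        // Any value that matches no type means no subtype at all.
        let t = classify(v)?;
        match inferred {
            None => inferred = Some(t),
            Some(prev) if prev == t => {}
            // Mixed types across values: give up.
            Some(_) => return None,
        }
    }
    // "??" marks the lower-confidence sample-based guess, "?" the full scan.
    let suffix = match mode {
        ScanMode::Fast => "??",
        ScanMode::Comprehensive => "?",
    };
    inferred.map(|t| format!("{t}{suffix}"))
}

/// Toy classifier for the example: four ASCII digits => gYear.
fn looks_like_gyear(s: &str) -> Option<&'static str> {
    (s.len() == 4 && s.bytes().all(|b| b.is_ascii_digit())).then_some("gYear")
}
```

For example, `infer_subtype(&["1999", "2005", "2020"], ScanMode::Fast, looks_like_gyear)` yields `Some("gYear??")`, while the same values in comprehensive mode yield `Some("gYear?")`.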
Contributor

Copilot AI left a comment


Pull request overview

This pull request adds Gregorian date type inference capability to the moarstats command with two scanning modes: fast (uses percentile values) and comprehensive (uses min/max values). The feature adds an xsd_subtype field to identify XSD Gregorian date types (gYear, gYearMonth, gMonthDay, gDay, gMonth) with confidence indicators ("??" for fast mode, "?" for comprehensive mode).

Key Changes

  • Introduces --xsd-gdate-scan flag with "fast" (default) and "comprehensive" modes
  • Implements pattern-based detection for five Gregorian date types
  • Adds automatic fallback from fast to comprehensive mode when percentiles are unavailable
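The option handling and the automatic fallback described in these bullets can be sketched as follows. This is a minimal illustration under stated assumptions: the `GdateScan` enum, the `effective_mode()` helper and the error text are hypothetical, and only the `--xsd-gdate-scan` flag name and its two values come from the PR. An unset option defaults to fast mode, and fast mode falls back to comprehensive when no percentile values are available.

```rust
use std::str::FromStr;

/// Hypothetical parsed value of --xsd-gdate-scan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GdateScan {
    Fast,
    Comprehensive,
}

impl FromStr for GdateScan {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "fast" => Ok(GdateScan::Fast),
            "comprehensive" => Ok(GdateScan::Comprehensive),
            other => Err(format!("invalid --xsd-gdate-scan mode: {other}")),
        }
    }
}

/// Resolve the mode actually used for one column.
/// `percentiles` is None (or empty) when no percentile values exist.
fn effective_mode(
    requested: Option<&str>,
    percentiles: Option<&[String]>,
) -> Result<GdateScan, String> {
    // Default to fast mode when the option is not given.
    let mode = match requested {
        None => GdateScan::Fast,
        Some(s) => s.parse::<GdateScan>()?,
    };
    // Fast mode needs percentile values; otherwise fall back.
    let has_percentiles = percentiles.is_some_and(|p| !p.is_empty());
    if mode == GdateScan::Fast && !has_percentiles {
        Ok(GdateScan::Comprehensive)
    } else {
        Ok(mode)
    }
}
```

Resolving the mode per column keeps the fallback local: a column without percentiles is scanned in full without changing how other columns are handled.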

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File | Description
--- | ---
src/cmd/moarstats.rs | Implements Gregorian date type detection with a detect_gregorian_date_type() function, adds a parse_all_percentile_string_values() helper, integrates detection into infer_xsd_type(), and adds the --xsd-gdate-scan command-line option
tests/test_moarstats.rs | Adds a comprehensive test suite covering fast mode, comprehensive mode, Integer gYear detection, invalid mode handling, default behavior, and fallback scenarios; updates existing test expectations to reflect gYear detection for the precinct field

jqnatividad and others added 8 commits January 1, 2026 09:31
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…H Copilot review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

jqnatividad and others added 8 commits January 1, 2026 13:49
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
but not doing leap year, which is overkill for this func

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
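One of the commits above notes that day-of-month validation stops short of leap-year handling. A minimal sketch of that kind of check, assuming hypothetical helpers `max_day_in_month()` and `is_valid_month_day()` (not the PR's code): February always allows 29, which also suits gMonthDay values, since they carry no year to test.

```rust
/// Hypothetical upper bound for the day in a given month.
/// February always allows 29: leap years are deliberately ignored.
fn max_day_in_month(month: u32) -> Option<u32> {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => Some(31),
        4 | 6 | 9 | 11 => Some(30),
        2 => Some(29),
        _ => None,
    }
}

/// True when `day` is plausible for `month` (no year involved).
fn is_valid_month_day(month: u32, day: u32) -> bool {
    max_day_in_month(month).is_some_and(|max| (1..=max).contains(&day))
}
```

This rejects impossible combinations such as `--04-31` or `--02-30` without any calendar arithmetic.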
@jqnatividad jqnatividad merged commit ea44b9b into master Jan 1, 2026
16 checks passed
@jqnatividad jqnatividad deleted the 2858-moarstats-xsd-subtype-gregoriandate-inferencing branch January 1, 2026 18:59


2 participants