Skip to content

chore: refactor field creation for maintainability#26028

Merged
davidby-influx merged 2 commits intomaster-1.xfrom
DSB_field_refactor
Feb 18, 2025
Merged

chore: refactor field creation for maintainability#26028
davidby-influx merged 2 commits intomaster-1.xfrom
DSB_field_refactor

Conversation

@davidby-influx
Copy link
Copy Markdown
Contributor

@davidby-influx davidby-influx commented Feb 15, 2025

Refactor the new field creation code for maintainability.

Address review comments in the port work of the
field creation. Also fixes one bug in returning the wrong
error.

var err PartialWriteError
switch {
case errors.As(validateErr, &err):
// This will turn into an error later, outside this lambda
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer a lambda


// Field represents a series field. All of the fields must be hashable.
type Field struct {
Name string `json:"name,omitempty"`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing the json tagging? Just a question, not really a change comment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't JSON-ize this struct anywhere; it gets copied to a protobuf for serialization these days.

Copy link
Copy Markdown
Contributor

@philjb philjb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for changing CreateFieldIfNotExists to accept a string as the first argument and removing FieldBytes and removing the inline lambda that was confusing!

I have spotted and recommend another change to ValidateFields to avoid a tricky type switch on the returned error type since ValidateFields only returns a PartialWriteError - I do recommend this within this pr but won't hold up this pr's merging. I call this one "recommended and optionally, but i'd really like to see it done" :smiling.

I think have also spotted a broken check against timebytes for field keys... this is bad but very historical so should probably be a separate cleanup, if you want to do it at all; but i also support doing it in the this cleanup pr to avoid additional burden of porting. I call this one "suggested and totally optional".

mf := engine.MeasurementFields(name)
// Check with the field validator.
if newFields, validateErr = ValidateFields(mf, p, s.options.Config.SkipFieldSizeValidation); validateErr != nil {
var err PartialWriteError
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we're now in a mix of original code (before this PR and before #25998), but we are touching this and I now recommend another change for clarity, not correctness: removal of dead/unreachable code.

ValidateFields() only returns a PartialWriteError at this point so the switch { case errors.As(validateErr, &err) ... is more than is needed as the default: case is unreachable code and therefore complicates the logic flow.

My recommendation is to change ValidateFields to this signature (eliding args) so it returns the specific error it can only return:

func ValidateFields(...) ([]*FieldCreate, *PartialWriteError), update it internally for &PartialWriteError and then update here to

var newFields []*FieldCreate
		var validateErr *PartialWriteError
		name := p.Name()
		mf := engine.MeasurementFields(name)
		// Check with the field validator.
		if newFields, validateErr = ValidateFields(mf, p, s.options.Config.SkipFieldSizeValidation); validateErr != nil {
			if reason == "" {
				// record the first reason only
				reason = validateErr.Reason
			}
			dropped += validateErr.Dropped
			atomic.AddInt64(&s.stats.WritePointsDropped, int64(validateErr.Dropped))
			continue
		}

I especially recommend this change since determining what to return in the default case was already tricky as this pr description notes that it: Also fixes one bug in returning the wrong error.

coda:
I know this is slightly against the usual golang "return error interface in the ultimate return position" but in this case, for a helper method, I think it's clarifying instead of confusing: aka I find the switch with default more confusing than the momentary thought about why this helper doesn't return an error interface.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with this, because I can imagine ValidateFields returning other errors in the future. There may be validations that have to touch some other resource, which we could fail to obtain, for instance, which would cause more significant errors, so I would prefer to leave this as an error return type and switch on the type.

name := p.Name()
mf := engine.MeasurementFields(name)
// Check with the field validator.
if newFields, validateErr = ValidateFields(mf, p, s.options.Config.SkipFieldSizeValidation); validateErr != nil {
Copy link
Copy Markdown
Contributor

@philjb philjb Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also now notice (and should have noticed in #25998 instead that the point's field names ("key") is checked against timeBytes twice.

First here in validateSeriesAndFields:

influxdb/tsdb/shard.go

Lines 684 to 690 in cda0ef4

for iter.Next() {
if bytes.Equal(iter.FieldKey(), timeBytes) {
continue
}
validField = true
break
}

Actually - I think this first check against timeBytes is totally broken. The check is inverted - it's missing an ! meaning the second check is actually doing the work and this first check is a just a waste. EDIT: see next comment: this is doing some work but its nearly redundant with ValidateFields and a minor update there could delete one of the two checks for illegal time fields.

and then later in ValidateFields:

fieldKey := iter.FieldKey()
// Skip fields name "time", they are illegal.
if bytes.Equal(fieldKey, timeBytes) {
continue
}

My recommendation is to clean this up to but in a separate pr: It'll be one of the 5 of us who will next be reading this code so perhaps think kindly on our future selves!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing this. The first one, in validateSeriesAndFields checks that there is at least one field not named time in the point, otherwise it drops the point.

The second check in ValidateFields skips creating any fields named time.

Am I misreading the code?

Copy link
Copy Markdown
Contributor

@philjb philjb Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't those the same thing? aka they cause the same effect?

p0=Point{fields:["time"]} 
p1=Point{fields:["time", "time", "foo"} 

For these two points, the first p0 will be rejected by what is in validateSeriesAndFields (all fields = time), but it would also be rejected by ValidateFields. For the second p1, it would not be rejected by validateSeriesAndFields (one field not equal to "time"), and then in ValidateFields both of the time fields would be skipped but the foo field would be marked for creation.

If the check in validateSeriesAndFields is removed, the result is the same, only foo is marked for creation.

BUT I see that the test TestWriteTimeTag fails because it's looking for a PartialWriteError in the p0 case which ValidateFields doesn't return - this is a latent bug in Validatefields. So the rework is more than just a deletion - this is all a suggested change - not important.

I admit i misread the intention of the check in validateSeriesAndFields in my initial comment too! its not "wrong" as it is excluding all points where every field is an illegal one (and returning an error), but there's only one illegal field (time) so its redundant now. Perhaps there used to be more illegal ones or the intention of more illegal ones. It's also kind of weird to validate fields both before and inside ValidateFields.

All in - this is a pretty minor point (pun intended). And hopefully i'm not misreading it! (I'll edit my original comment some)

@davidby-influx davidby-influx merged commit 5f57633 into master-1.x Feb 18, 2025
10 checks passed
@davidby-influx davidby-influx deleted the DSB_field_refactor branch February 18, 2025 22:00
philjb added a commit that referenced this pull request Feb 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants