Skip to content

Extend survey functionality to include connection targets#3642

Merged
latobarita merged 3 commits intostellar:masterfrom
marta-lokhova:survey_extension
Jan 12, 2023
Merged

Extend survey functionality to include connection targets#3642
latobarita merged 3 commits intostellar:masterfrom
marta-lokhova:survey_extension

Conversation

@marta-lokhova
Copy link
Contributor

@marta-lokhova marta-lokhova commented Jan 6, 2023

Update network survey functionality to include connectivity targets (inbound/outbound). Note that the new-style survey requests will only be flooded to version-compatible nodes, so we'll see better results overtime as more nodes upgrade their core version.

Since this PR changes XDR, an appropriate change is needed in the XDR next repo (CI won't build currently)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it should work like this:
relaying nodes don't have access to the encrypted survey result (ie: SurveyResponseBody is not visible on the wire), it's probably better to just change the surveying code to respond with the new extended result, and leave everything else as is.

That way:

  • existing nodes will be able to relay requests and (encrypted) result even if they're running an older version of the survey code
  • we can deprecate right away responding with "old style" survey responses
  • old nodes can't survey the network (because they may receive responses from newer nodes), but that seems fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. My thinking was that as per our usual overlay processes, we provide backward compatibility for at least some time, which should allow old nodes to use survey functionality. We've never upgraded survey before, so perhaps we're lacking a process and backward compatibility expectations here. Given that this isn't threatening to consensus and can be mitigated with a version upgrade, maybe it's ok to stop supporting survey for older nodes (I don't know who is using though and if it will cause problems).

Also - the fact that old nodes can still properly process newer requests/responses is true for this particular version but it's not guaranteed in the future. As the survey format changes, we'll still have to do what this PR does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. My thinking was that as per our usual overlay processes, we provide backward compatibility for at least some time, which should allow old nodes to use survey functionality. We've never upgraded survey before, so perhaps we're lacking a process and backward compatibility expectations here. Given that this isn't threatening to consensus and can be mitigated with a version upgrade, maybe it's ok to stop supporting survey for older nodes (I don't know who is using though and if it will cause problems).

I think it's acceptable to require surveying nodes to run the latest version.

Also - the fact that old nodes can still properly process newer requests/responses is true for this particular version but it's not guaranteed in the future. As the survey format changes, we'll still have to do what this PR does.

I don't think this is true: the survey request and relaying messages should not change format. The only thing that should change is the format of the survey response (in response to a generic SURVEY_REQUEST message) and is encoded as an encrypted opaque blob.

Copy link
Contributor Author

@marta-lokhova marta-lokhova Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is true: the survey request and relaying messages should not change format. The only thing that should change is the format of the survey response (in response to a generic SURVEY_REQUEST message) and is encoded as an encrypted opaque blob.

I don't think we can guarantee this. We don't know what the survey needs will look like in the future. The request itself contains SurveyMessageCommandType which assumes the possibility of running different survey types. All I'm trying to say is that it's not a problem right now, but we might still need to deal with this in the future

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is not needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with my proposed change, you don't even need to update this version

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I guess you could keep this to make it easier for testing: a node with OVERLAY_PROTOCOL_VERSION<=26 would return old style surveys)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes behavior of overlay, so the version here needs to be bumped regardless (it will be 27 in 19.7.0. either way due to other changes too)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically speaking it's not because the part that is really part of the overlay protocol does not include the survey response format (old nodes would forward and respond to survey requests just fine)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow - if I'm talking to a node of a particular overlay version, I'm expecting a concrete protocol to be followed and a specific message format that I know about. Having the two nodes with "the same version", yet not being able to process each other's messages doesn't seem right.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Survey is not implemented at this layer: you can receive a new format survey response via an old peer just fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated way will be bool extendedSurvey = body.type() than couple with commandType

{
case SURVEY_TOPOLOGY:
return "SURVEY_TOPOLOGY";
case SURVEY_TOPOLOGY_EXTENDED:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably cleanup the XDR: there are more than 1 enum.

One contains the list of "commands", and the only supported command is SURVEY_TOPOLOGY.

Then there are different SurveyResponseBody versions (this PR adds 1 new one), so we should just create a new enum for that SurveyResponseType with values SURVEY_TOPOLOGY_RESPONSE_V0=0 and SURVEY_TOPOLOGY_RESPONSE_V1=1

As for this function we can just call xdr::xdr_traits<SurveyMessageCommandType>::enum_name(type) instead in getMsgSummary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it's updated. I will submit a PR in the vNext repo when this PR is approved.

Copy link
Contributor

@MonsieurNicolas MonsieurNicolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. You just need to rev the xdr for vNext and we'll merge.

Thank you

@MonsieurNicolas
Copy link
Contributor

r+ 4b72619

@latobarita latobarita merged commit 98bc79b into stellar:master Jan 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants