Skip to content

feat: Diff states#278

Merged
elementbound merged 42 commits intofoxssake:mainfrom
TheYellowArchitect:diff-states-no-serialization
Nov 20, 2024
Merged

feat: Diff states#278
elementbound merged 42 commits intofoxssake:mainfrom
TheYellowArchitect:diff-states-no-serialization

Conversation

@TheYellowArchitect
Copy link
Copy Markdown
Contributor

@TheYellowArchitect TheYellowArchitect commented Sep 14, 2024

All states are sent UNRELIABLE_ORDERED regardless if they are a full state (contains all properties) or a diff state (contains only the properties with new values)
Once the client receives any state, he sends a RELIABLE rpc to the server that he has indeed received a full state, and so it can start receiving diff states. Once this RELIABLE rpc is received by the server, all future states sent to that specific peer are diff states, instead of full states. Diff states are states whose property/values included are only those which have changed from the previous tick.

This implementation also solves the diff states problem mentioned in #264 (this is the 2nd way, and I think its the cleanest and most performant)

I have tested it and found no issues.

Solves #157
As for the bandwidth it saves, see forest-brawl example: "Displaceable:mass", "Displaceable:impulse", ":speed" which change only once via powerup, so 3/5 states are skipped with diff states. Assuming the variables were of same size, with diff states, there is 60% less bandwidth.

Requires #277 which is just static typing additions (so this PR is easy to read, instead of having double the size)

get:
return ProjectSettings.get_setting("netfox/rollback/enable_state_diffs", true)
set(v):
enable_state_diffs = v
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the setter needless? I am not familiar with Godot's setters/getters.
Also are project settings' values cached on runtime, or does it search project settings' values each time I get()?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason for this flag to be writable during runtime? I'd just go with a push_error message.

ProjectSettings.get_setting boils down to a Node.get call under the hood, so I'd say it's fine as it is. But if you want to be proactive about perf, feel free to introduce a variable with an underscore prefix that's initialized in _ready and use that.

Copy link
Copy Markdown
Contributor Author

@TheYellowArchitect TheYellowArchitect Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it push_error as you suggested. As for the second part of this, I opened #304

…RED) regardless of full or diff. The server always starts by sending full states. Once the client receives any (full) state, he sends an ack to the server to start sending diffs.
@olivatooo
Copy link
Copy Markdown

I really hope the author reads this

@elementbound
Copy link
Copy Markdown
Contributor

@olivatooo getting there 🙂

@elementbound
Copy link
Copy Markdown
Contributor

elementbound commented Oct 10, 2024

Cleaned up the PR a bit so we can focus on the actual changes. In the future, please keep in mind the following for PRs:

  • One PR should implement a single feature
  • The PR shouldn't contain changes not related to the feature it implements
  • Preferably don't start rewriting unrelated parts of the codebase along different coding conventions, instead start a discussion. In this PR specifically, I mean adding static typing annotations to variables inside function scopes.
  • Even though Godot modifies it, you can just not commit the Brawler scene 🙂

Thanks!

@TheYellowArchitect
Copy link
Copy Markdown
Contributor Author

TheYellowArchitect commented Oct 11, 2024

Even though Godot modifies it, you can just not commit the Brawler scene

I apologize for this slipping through, it pops up in each PR I make by default, I gotta manually exclude this, and it seems in this PR I missed it.
Btw, which godot version do you use to edit netfox? I use 4.1.0

The PR shouldn't contain changes not related to the feature it implements
One PR should implement a single feature
commit-message: "remove input diff fragment"

This PR still contains the input changes which are unrelated to diff states. I will move them to a different PR. Freely close that input refactor PR.


Btw on the removal of static typing, I gotta say its harder to read this PR without types, because you cant tell if a property variable is PropertyEntry or String

var result = PropertyEntry.new()
result.node = root.get_node(NodePath(path))
result.property = path.erase(0, path.find(":") + 1)
result.type = typeof(result.get_value())
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I make a PR for caching the type of the property entry?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet, that would need a separate discussion on why that would be beneficial.

static func apply(properties: Dictionary, cache: PropertyCache):
for property in properties:
var pe: PropertyEntry = cache.get_entry(property)
var pe = cache.get_entry(property)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no static typing here, at least a rename is required of pe to property_entry

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but either in a different PR, or later in this PR's lifecycle

@TheYellowArchitect
Copy link
Copy Markdown
Contributor Author

TheYellowArchitect commented Oct 12, 2024

I removed the input refactor so this PR is exclusively on state diffs. I found a weird bug though on Godot 4.1 (haven't tested other versions)

It sends state ticks in order (e.g. 45, 46, 47) but sometimes they arrive out of order (e.g. 45, 47, 46) which shouldn't happen because the RPC is "unreliable_ordered". I want to test this with other godot versions because until Godot 4.4 dev3 some RPC bugs are fixed, and this may include this.

Nonetheless, this made me realize that even if Godot doesn't bug, packet loss and packets being shuffled are a possibility, so it's good to cover those edge scenarios. So I made a condition block for this scenario (see the for picked_tick in range(_latest_state_tick, tick))

The above works and has no bugs but there is still that rare jittering bug where you run 0 ping but there is a jittering at some player once in 10 boots or something (which existed before this PR), which I suspect is caused by this "unreliable_ordered" failure. I will open an issue on this once I confirm its root and can reproduce it consistently.

Anyway, I left the debug comments and stuff inside, turning this into a draft for now until I remove the debug comments and look deeper into this bug (e.g. test other godot versions)
Despite it being in draft, it works as intended btw, so test freely.

I will open a PR for the input refactor later, gotta do tasks irl will take a few days.

@TheYellowArchitect TheYellowArchitect marked this pull request as draft October 12, 2024 11:09
sanitized[property_name] = value

# Duplicates the previous state(s), including the current one
# Also fixes packet loss (e.g. a state missing)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Packet loss doesn't need "fixing". It is intentional that the state history buffer may have gaps in it.


if (NetworkRollback.enable_state_diffs and not _has_received_full_state):
_has_received_full_state = true
_receive_full_state_ack.rpc_id(1, received_tick)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't hardcode the host peer ID - the RollbackSynchronizer has an owner, which owns the state.

return

if (_states.is_empty() || _latest_state_tick == -1):
_logger.error("Received a state without any properties from %s, but missing previous states!" % multiplayer.get_remote_sender_id())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't exactly follow this error message - what do you mean "without any properties"? There's no check here whether the received diff state has any properties.

Copy link
Copy Markdown
Contributor Author

@TheYellowArchitect TheYellowArchitect Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot find this codeblock in the latest version, but this condition was included for the scenario where the client received a diff state, without having previously received a full state. In the case of the full state being victim of packet loss (since it is "unreliable_ordered")

I agree that the error message is unhelpful and vague.

I suggest re-placing this condition block, and renaming the logger error message to something like:
"Received diff state, but there is no full state received prior."

Copy link
Copy Markdown
Contributor Author

@TheYellowArchitect TheYellowArchitect Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about this, this if condition shouldn't ever trigger because if the first full state suffers packet loss, the client doesn't send an ack to the server to start sending diff states. If you want to be safe, add the condition and we remove it in a subsequent PR if no one triggers it, otherwise it is better removed for clean code 👍

@elementbound
Copy link
Copy Markdown
Contributor

But the bug was that it should be -1 before any state is set. With this PR, _latest_state_tick starts with -1 and doesn't change until an actual state is added.

_latest_state was working as intended, as a way to track where to roll back in case we don't own input for the node. Since the original diff state impl expected gapless state history, I suspect how that managed to broke things. Either way, I still suggest to expect gaps in state history, and instead of always comparing to the previous state, specify which tick is the received state diffed from.

@elementbound elementbound changed the title Added diff states (no binary serialization, no static typing) - includes fix for the first-state problem. feat: Diff states Nov 10, 2024
var value = snapshot[property]
var authority := property_entry.node.get_multiplayer_authority()

if authority == sender:
Copy link
Copy Markdown
Contributor Author

@TheYellowArchitect TheYellowArchitect Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var authority := property_entry.node.get_multiplayer_authority() doesn't explain what type it is, and the reader cannot know unless he knows the return type of the function get_multiplayer_authority (either by clicking or by memory), so I suggest renaming to authority_id (or var authority: int)

If the rename happens to authority_id, it would be ideal to also rename sender: int parameter to sender_id: int, so the condition if authority == sender becomes if authority_id == sender_id which is intuitive without even looking at the codeblock above

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is reasonable to expect from people working with netcode to know the concept of authority in this context, or to look it up 🙂

Also I don't see the benefit of knowing the type of these variables. Reading out the condition, "if the authority is the sender" makes sense without types.

Copy link
Copy Markdown
Contributor Author

@TheYellowArchitect TheYellowArchitect Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally, if I see any variable operation (e.g. + or ==) without knowing the type, I am not certain what I read will work as I assume it to work (as a result, I have to spend time to look it up, in this case ctrl+click get_multiplayer_authority and then scroll up to parameters for sender variable type), because after all, authority could be some custom class, or have an operator overload.

Having offered the benefit of knowing the type of these variables, I suggest adding _id to these 2 variables to make these variables more clear to the reader, because not explicitly having type declaration creates ambiguity, and adding the suffix to the variable name clears the ambiguity

var _latest_state_tick: int = -1
var _states: Dictionary = {}
var _inputs: Dictionary = {}
var _latest_state: int = -1
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why rename _latest_state_tick to _latest_state?
The type declaration is not visible at any line below the declaration, so a user seeing the variable doesn't know if its a dictionary, integer, or a class, unless he scrolls to the top or ctrl+clicks for declaration

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't confident in the change, but your tick vs. state note on another comment was convincing. Renamed.

Though I wouldn't base my argument on the type info 🙂 If someone wants to know the type, the variable is statically typed so they can check it from autocomplete, or just look at the definition. The missing info here is not the language type ( int ) itself, but the semantic info that this marks a tick and not a full state.

# Also fixes packet loss (e.g. a state missing)
if (NetworkRollback.enable_state_diffs && _latest_state_tick != -1):
for picked_tick in range(_latest_state_tick, received_tick - 1):
_states[picked_tick + 1] = _states[picked_tick].duplicate(true)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enabling diff states, it often gets inside this condition with any packet loss (kinda often), and with this, you can gap missing states of the past. There is no reason to not have this, as it ensures smoothness imo.

# Duplicates the previous state(s), including the current one
# Also fixes packet loss (e.g. a state missing)
for picked_tick in range(_latest_state_tick, received_tick):
_states[picked_tick + 1] = _states[picked_tick].duplicate(true)
Copy link
Copy Markdown
Contributor Author

@TheYellowArchitect TheYellowArchitect Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will test this tomorrow (going to sleep now, its 00:58)
I remember that without this, packet losses lead to jittering and artifacts. It has been 2 weeks since I last updated this (how time flies) but I am certain that I had this codeblock which duplicates past states in order to prevent problem from packet losses (and even had some comments in the conversation tab, as to why I did this and also changed the latest_state_tick to -1)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember that without this, packet losses lead to jittering and artifacts.
I think that's because you assumed that state history was contiguous, so whenever a state was missing, you had to fill the gaps.

I've updated the PR to always diff from the last received full state, not the previous tick. So history gaps are not an issue.

Copy link
Copy Markdown
Contributor Author

@TheYellowArchitect TheYellowArchitect Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change removes a big benefit of this PR, because this means a full state is sent for example every 0 to 128 ticks, but if I set it to 0 (that is the configuration I want for my game), this PR will bug because no full states are sent again and the diff state will try to compare from latest full state sent, according to what you said (I am still reading and testing, haven't confirmed yet, give me an hour or so)

Generally, the choice to send exclusively diff states after the ack should be offered (as was with the inclusion of this codeblock)


For a complete review of the the additions, give me a few hours to confirm the above and test for artifacts/jitters etc

Copy link
Copy Markdown
Contributor Author

@TheYellowArchitect TheYellowArchitect Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR to always diff from the last received full state, not the previous tick. So history gaps are not an issue.

I want to offer a use-case example which is no longer possible if a missing diff property requires exclusively a full state property tick

Use-case

Our only property is position (Vector2) and let's say some boolean which constantly changes each tick and is irrelevant. We have ticks 50 to 60 for this example. Time from host to client is 2 ticks. And 2 ticks from client to host.

The server sends full state position [0,0] at tick 50.
It reaches the client at tick 52 (via rpc receive full state)
The client sends ack at tick 52, and it reaches the server at tick 54, which allows the server to start sending diffs.

Until the server receives the ack, it has sent full state positions [0,0] at tick 51, 52, 53, 54. Usual stuff so far.

The server sends diff state position [9,0] at tick 55 (reaches the client at tick 57)
The server sends diff state position [11,0] at tick 56 ( the packet is lost )
The server sends diff state position [13,0] at tick 57 ( the packet is lost )
The server sends empty diff state position at tick 58 (reaches client at tick 60)

The client at tick 60, will then set the position [0,0] it had received at tick 54 (full state tick) instead of what it had received at tick 55 aka [9,0]


Also, there is some bandwidth gain by being able to have exclusively diff states aka not sending full states between intervals

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, I understand why gaps in the state buffer would not work with always diffing with the previous state. That's why I've updated the PR to not do that.

In the example above, even if you fill the gaps, you'd just end up copying the last known state, so I doubt the empty diff state would result in the expected state.

I'm not convincet at all that diffing from the previous state would be reliable with anything but lossless network conditions. The best option I see is to ack diff states too. This way, only the initial full state send is required, the regular one can be disabled.

I'm open to implementing this in a separate PR. For now, I'd like to get the first version of diff states out so we can see how it performs.

Copy link
Copy Markdown
Contributor Author

@TheYellowArchitect TheYellowArchitect Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the example above, even if you fill the gaps, you'd just end up copying the last known state, so I doubt the empty diff state would result in the expected state.

The previous implementation would result in the expected state, because it bridged over the gaps gaps using the diffs. So for ticks 56, 57, 58, it would be a duplicate of state of tick 55 (which was a diff using full state of tick 54)

With this design choice implemented, I would argue against this PR because it makes the code more complicated, for negligible bandwidth gain, and also makes non-empty diffs followed up by empty diffs useless, since empty diffs will use full states (as they ignore the non-empty diffs already received)
In my tests (of previous implementation) at forest brawl, most diff states sent were empty. With the new changes, these empty states would get the properties of the full state tick of the interval, instead of the latest diff.

I think the final result with state diffs enabled (project settings) will have more jitters/artifacts than vanilla sending full states at all times, simply because some information is lost/ignored. This wasn't the case in the previous implementation, as both project settings had the same result

Copy link
Copy Markdown
Contributor Author

@TheYellowArchitect TheYellowArchitect Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This becomes more apparent, the more properties a state has:

Use-Case 2

State of 4 properties (A,B,C,D)
At tick 50, full state tick is sent
At tick 51 diff state is sent with only A updated
At tick 52 diff state is sent with only B updated
At tick 53 diff state is sent with only C updated
At tick 54, 55, 56, empty states are sent, but packet loss happens
At tick 57 empty state is sent, but without packet loss

The received client on receiving the tick 57, should get a state with the following properties:
A of tick 51
B of tick 52
C of tick 53
D of tick 50

In current implementation

#reference_tick here is 50
var reference_state = _states.get(reference_tick, {})

if (diff_state.is_empty()):
		_latest_state_tick = tick
		_states[tick] = reference_state

the received client on receiving the tick 57, gets a state with the following properties:
A,B,C,D of tick 50

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still do not feel you're familiar with this new implementation. Please make sure that you are before presenting any arguments.

Use-case 2
The received client on receiving the tick 57, should get a state with the following properties:

No. You've projected the data of the original implementation and ran it through the logic of the new implementation. That gets wrong results, but also doesn't lead anywhere.

An empty state in the new implementation means that the actual state is the same as the reference state.


Let's consider an example instead, with the following states:

tick 50: { a: 1, b: 2, c: 3 }
tick 51: { a: 8, b: 2, c: 3 }
tick 52: { a: 8, b: 9, c: 3 }
tick 53: { a: 8, b: 9, c: 7 }
tick 54: { a: 8, b: 9, c: 7 }
tick 55: { a: 8, b: 9, c: 7 }
tick 56: { a: 8, b: 9, c: 7 }

This would yield the following states sent:

tick 50, full state: { a: 1, b: 2, c: 3 }
tick 51, diff state from 50: { a: 8 }
tick 52, diff state from 50: { a: 8, b: 9 }
tick 53, diff state from 50: { a: 8, b: 9, c: 7 }
tick 54, diff state from 50: { a: 8, b: 9, c: 7 }
tick 55, diff state from 50: { a: 8, b: 9, c: 7 }
tick 56, diff state from 50: { a: 8, b: 9, c: 7 }

Remember, diffs are always calculated from the last full state sent. If let's say states 54 and 55 are lost due to packet loss, until state 56 arrives, the latest known state of 53 will be used.


Let's consider the same states, but now with always diffing with the previous state, as in the original implementation. The following states would be sent:

tick 50, full state: { a: 1, b: 2, c: 3 }
tick 51, diff state from 50: { a: 8 }
tick 52, diff state from 51: { b: 9 }
tick 53, diff state from 52: { c: 7 }
tick 54, diff state from 53: { }
tick 55, diff state from 54: { }
tick 56, diff state from 55: { }

If ticks 54 to 56 are lost, that's fine because nothing has changed. Instead, let's consider what happens when tick 52 is lost.

  • Tick 50 arrives, state is now { a: 1, b: 2, c: 3 }
  • Tick 51 arrives, state is now { a: 8, b: 2, c: 3 }
  • Tick 52 is lost, state will be backfilled as { a: 8, b: 2, c: 3 }
  • Tick 53 arrives, state is now { a: 8, b: 2, c: 7 }

The expected state for tick 53 is { a: 8, b: 9, c: 7 }, however, due to a lost message, it is now { a: 8, b: 2, c: 7 }. This is my issue with always diffing from the previous state, as I have mentioned earlier.


I think the final result with state diffs enabled (project settings) will have more jitters/artifacts than vanilla sending full states at all times, simply because some information is lost/ignored.

Strong disagree. States were already sent unreliably, gaps in the history buffer were always expected. If a state is unknown for a given tick, the latest known state is used.


I'm inclined to merge this PR in its current state. One improvement we could do in the future is to ACK states more frequently over unreliable, so there's an option for doing smaller diffs.

Copy link
Copy Markdown
Contributor Author

@TheYellowArchitect TheYellowArchitect Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the smooth explanation via the examples. You are right that I projected the data of the original implementation and ran it through the logic of the new implementation. Hence my use-case with empty diffs. So yes, my use-case 2 is outdated (assuming the code runs as advertised)

I haven't tested this new code yet, but from what you have written and from skimming the new code, this alternative implementation has the same result, but saves memory as it avoids state duplicating and filling the gaps, though it does increase bandwidth a bit. But bandwidth is negligible at this point when compared to the fact strings are sent for properties, so when #159 is implemented, there will be no bandwidth problems and huge game states will consume little bandwidth.

The expected state for tick 53 is { a: 8, b: 9, c: 7 }, however, due to a lost message, it is now { a: 8, b: 2, c: 7 }. This is my issue with always diffing from the previous state, as I have mentioned earlier.

It would be corrected by a future full state. For a workaround, I would make a feature where there is "state property redundancy" by default for 3 ticks, or a state property can be marked as "vital" or not (not being position for example, so it doesn't re-send multiple times)
That said, your implementation does solve this edge-case 👍

Copy link
Copy Markdown
Contributor Author

@TheYellowArchitect TheYellowArchitect Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One improvement we could do in the future is to ACK states more frequently over unreliable, so there's an option for doing smaller diffs.

You mean full state acks to be unreliable_ordered? Because I see diff states have unreliable_ordered acks implemented in your last commit

@elementbound
Copy link
Copy Markdown
Contributor

Added some metrics so we can track how much data we save by diff states. Only reports the number of properties, for actual data usage I don't see any other way than serializing stuff into a buffer, which is too much work for a simple metric.

image

@elementbound elementbound merged commit ba34da4 into foxssake:main Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants