Arbitrary Counts and Lists for GameServers, SDKs and Allocation #2716

@markmandel

Objective

With the recent work on Player Tracking, as well as its crossover into High Density Game Server support / re-allocation of Allocated GameServers, it seems that being able to provide arbitrary count values and/or lists of values that are tied to GameServers, much like Player Tracking values are right now, would be very useful for a wide variety of use cases.

This feature design’s intention is to replace Player Tracking with a generic way to track counts as well as lists against a GameServer by a user provided key, with integrated allocation, Fleet scheduling and SDK support, such that it can support the use case of player tracking as it currently stands, but also use cases like multi-tenant room server counting, or any other game specific value that could be utilised for a custom integration.

An added benefit would be that simple gauge data would be exposed as metrics as well, although we may not want to advocate this as a blessed path for exporting metrics alone, without taking advantage of the other functionality.

This feature would be built behind the GameServerCountsAndLists feature gate and should be on-par with PlayerTracking before the PlayerTracking functionality is removed.

Requirements

  • Define on a GameServer a set of attached lists and/or counters attached to an arbitrary, user supplied key
  • Out of scope: The ability to add, edit or delete GameServer keys for counters and lists at runtime. Keys should be explicitly predefined with the GameServer definition to put some limits on what can be stored against etcd and ideally avoid overloading the Kubernetes API control plane (although we will need strong documentation about this, as this will definitely put extra load on the control plane).

Counters

  • Counters can have an initial value (0 is the default).
  • Counters can have a set capacity (maximum value). The default capacity is 0, which is treated as the max of int64.
    • We deliberately are using the term “capacity” across both list and counters to be consistent between the two pieces of functionality.
  • Incrementing above the set capacity or decrementing below 0 will be a no-op, i.e. no operation to increment/decrement a counter will error.
  • Counters must be >= 0
  • SDK capability to atomically get, increment, decrement and set a counter local value, which is then set to the backing GameServer CRD status.
    • Note: There are race conditions we can’t avoid between SDK updates and Allocation or external updates.
  • SDK capability to change the maximum value for a counter.
  • The ability to atomically increment or decrement counts on allocation
    • If a user wants to ensure there is room for the increment or decrement, that should be explicitly included in the filter options (i.e. decrement by one, but filter for counts that are > 0 so that there is something to decrement).
    • If an attempt is made to increment/decrement a counter on a GameServer that does not have the specified counter (e.g. through an allocation), the operation is ignored.
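
The counter rules above can be sketched in Go as follows. This is only an illustration of the no-op semantics, not the Agones implementation: the `Counter` type and its methods are hypothetical names, and treating a capacity of 0 as "max of int64" is an assumption taken from the capacity bullet above.

```go
package main

import (
	"fmt"
	"math"
)

// Counter is a hypothetical in-memory model of one keyed counter.
type Counter struct {
	Count    int64
	Capacity int64 // assumption: 0 means "no explicit limit" (max int64)
}

// limit returns the effective upper bound for the counter.
func (c *Counter) limit() int64 {
	if c.Capacity == 0 {
		return math.MaxInt64
	}
	return c.Capacity
}

// Increment adds amount if the result stays within the limit.
// Otherwise it is a no-op that returns false; it never errors.
func (c *Counter) Increment(amount int64) bool {
	if amount < 0 || amount > c.limit()-c.Count {
		return false
	}
	c.Count += amount
	return true
}

// Decrement subtracts amount if the result stays at or above zero.
// Otherwise it is a no-op that returns false; it never errors.
func (c *Counter) Decrement(amount int64) bool {
	if amount < 0 || amount > c.Count {
		return false
	}
	c.Count -= amount
	return true
}

func main() {
	rooms := &Counter{Count: 1, Capacity: 100}

	ok := rooms.Increment(5)
	fmt.Println(ok, rooms.Count) // true 6

	ok = rooms.Increment(500) // would exceed capacity, so nothing changes
	fmt.Println(ok, rooms.Count) // false 6

	ok = rooms.Decrement(10) // would go below zero, so nothing changes
	fmt.Println(ok, rooms.Count) // false 6
}
```

Comparing `amount` against the remaining headroom, rather than adding first and checking afterwards, avoids int64 overflow when the capacity is left at its default.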

Lists

  • Can set a capacity. Defaults to 1000.
  • Capacity can be no larger than 1000 items. This could possibly be expanded in the future depending on use cases and/or performance.
  • SDK capability to atomically add, remove and check if values are in a list’s local value, which is then set to the backing GameServer CRD status.
    • Note: There are race conditions we can’t avoid between SDK updates and Allocation or external updates.
  • SDK capability to change the capacity (local and backing CRD status value)
  • The ability to atomically add items to list on allocation
    • Attempts to add to a list that is at capacity, will silently fail, since all operations are asynchronous. If you need to ensure there is space for an append operation, check with filters and/or the SDK first.
    • If an attempt is made to append to a list on a GameServer that does not have the specified list (e.g. through an allocation), the operation is ignored.
  • The ability to change the capacity from an allocation
  • Lists are essentially Sets that keep their insertion order, i.e. a List cannot contain more than one instance of a value. An attempt to insert a duplicate item into a List will result in a no-op.
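
As a rough illustration of the list rules above (insertion-ordered, no duplicates, bounded by capacity), here is a minimal Go sketch. The `List` type and its method names are hypothetical and only model the behaviour in memory; they are not the proposed SDK surface.

```go
package main

import "fmt"

// List is a hypothetical in-memory model of one keyed list.
type List struct {
	Capacity int
	Values   []string
}

// Contains reports whether value is already in the list.
func (l *List) Contains(value string) bool {
	for _, v := range l.Values {
		if v == value {
			return true
		}
	}
	return false
}

// Append adds value to the end of the list. Appending a duplicate, or
// appending to a list that is at capacity, is a no-op that returns false.
func (l *List) Append(value string) bool {
	if len(l.Values) >= l.Capacity || l.Contains(value) {
		return false
	}
	l.Values = append(l.Values, value)
	return true
}

// Delete removes value while preserving the order of the remaining items.
// It returns false if the value was not present.
func (l *List) Delete(value string) bool {
	for i, v := range l.Values {
		if v == value {
			l.Values = append(l.Values[:i], l.Values[i+1:]...)
			return true
		}
	}
	return false
}

func main() {
	players := &List{Capacity: 2}
	players.Append("xe9m")
	players.Append("9iuz")

	dup := players.Append("xe9m")  // duplicate value
	full := players.Append("k3p0") // list is at capacity
	fmt.Println(dup, full, players.Values) // false false [xe9m 9iuz]
}
```

A linear scan is used for membership because lists are capped at 1000 items; a map-backed set would be an equally valid choice.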

Allocation filtering and sorting

  • Allocation filter on count value (min, max)
  • Allocation filter on count available capacity (min, max)
  • Allocation filter on list available capacity (min, max)
  • Allocation filter on if single value is contained in a list
  • Allocation sorting / preference by a count value/list length, ascending or descending.
    • Packed: Within the node.
    • Distributed: Across the entire set.
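
The sorting requirement could look roughly like the sketch below. The `server` type and `sortByCounter` function are hypothetical, the sketch ignores the Packed / Distributed distinction, and placing GameServers without the key last is an assumption made for illustration. The direction is passed as a plain `preferLarger` flag so that the sketch does not take a position on how "ascending" and "descending" map onto it.

```go
package main

import (
	"fmt"
	"sort"
)

// server is a hypothetical, trimmed-down view of a GameServer.
type server struct {
	Name     string
	Counters map[string]int64
}

// sortByCounter orders servers in place by the counter stored under key.
// Servers that lack the key are placed last (an assumption of this sketch).
// preferLarger selects whether bigger or smaller counts come first.
func sortByCounter(servers []server, key string, preferLarger bool) {
	sort.SliceStable(servers, func(i, j int) bool {
		a, aOK := servers[i].Counters[key]
		b, bOK := servers[j].Counters[key]
		if aOK != bOK {
			return aOK // the server that has the key wins
		}
		if preferLarger {
			return a > b
		}
		return a < b
	})
}

func main() {
	servers := []server{
		{Name: "gs-a", Counters: map[string]int64{"rooms": 2}},
		{Name: "gs-b"}, // no "rooms" counter
		{Name: "gs-c", Counters: map[string]int64{"rooms": 7}},
	}
	sortByCounter(servers, "rooms", true)
	for _, s := range servers {
		fmt.Println(s.Name)
	}
	// Prints gs-c, gs-a, gs-b on separate lines.
}
```

A stable sort keeps the existing relative order of GameServers with equal counts, so an earlier ordering (for example by node) is not disturbed.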

Fleets scheduling

  • Fleet scale down sorting by a count value/list length, ascending or descending.
    • Packed: Within the node.
    • Distributed: Across the entire set.

Metrics

  • Expose count values and list lengths as gauge metric, with a label for the key the count or list is set under.
  • Expose counts and list capacities as gauge metric, with a label for the key the count or list is set under.

Background

There have been a lot of discussions and issues (with more on Slack as well) about weighted allocation, being able to store “session room” counts to be used on allocation, and sorting on Fleet scale down.

We’ve also always had a desire to be able to set some level of metrics through Agones from a GameServer as well.

Design ideas

Configuration

GameServers

Being able to set arbitrary counts and lists on a GameServer instance.

apiVersion: "agones.dev/v1"
kind: GameServer
metadata:
  generateName: "simple-game-server-"
spec:
  ports:
    - name: default
      portPolicy: Dynamic
      containerPort: 7654
  template:
    spec:
      containers:
        - name: simple-game-server
          image: gcr.io/agones-images/simple-game-server:0.13
  counters: # list of counters. Key value below is the key for each counter.
    rooms: # key for the counter (rooms)
      default: 1 # initial value
      capacity: 100 # maximum possible count value
  lists: # list of lists.
    players: # key for this list (players)
      capacity: 100 # maximum number of items in a list
    frogs: # key for another list (frogs), with the default 1000 item capacity

GameServer Status

This is where the current values and capacities of counters and lists are stored against the CRD. The values in the spec do not change once they have been initially declared.

status:
  # .. usual status values
  counters: # count values
    rooms: # values for key "rooms"
      count: 4 # current count for "rooms" key
      capacity: 100 # maximum value for "rooms" key
  lists: # list values
    players: # values for key "players"
      capacity: 100 # the current capacity as it has been set.
      values: # list of values set against this list
        - xe9m
        - 9iuz
    frogs: # values for key "frogs"
      values:
        - blue
        - green
        - orange

Fleets

apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
 name: simple-game-server
spec:
 replicas: 2
 priorities: # which gameservers in the Fleet are most important to keep around - impacts scale down logic
   - type: count # whether a count or a list. List uses the length as the value, count the current count value.
     key: rooms # The key to grab data from. If not found on some GameServers, those GameServers with the key will have priority over those without it.
     order: ascending # default is "ascending" so bigger number is better. "descending" would be "smaller number is better".
 template:
   spec:
     ports:
       - name: default
         containerPort: 7654
     template:
       spec:
         containers:
           - name: simple-game-server
             image: gcr.io/agones-images/simple-game-server:0.13
     counters: # list of counters. Key value below is the key for each counter.
       rooms: # key for counter (rooms)
         default: 1
         capacity: 100
     lists: # list of lists.
       players: # key for this list (players)
         capacity: 100 # set capacity
       frogs: # key for another list (frogs), with the default 1000 item capacity

Status

status:
 # ... usual fleet status values
 counters: # aggregate counter values
   rooms:
     total: 43 # total of count values for key "rooms"
     capacity: 100 # total capacity in all GameServers across the Fleet under the "rooms" key
 lists: # aggregate list values
   players:
     count: 58 # total number of list items in all GameServers across the Fleet under the "players" key
     capacity: 200 # total capacity in all GameServers across the Fleet under the "players" key
   frogs:
     count: 12
     capacity: 88
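
The aggregate counter values in the Fleet status could be derived along these lines. This is a sketch only: `counterStatus`, `aggregate` and `aggregateCounters` are hypothetical names, and each GameServer is reduced to a map of its counters for brevity.

```go
package main

import "fmt"

// counterStatus is a hypothetical per-GameServer counter value.
type counterStatus struct {
	Count    int64
	Capacity int64
}

// aggregate is a hypothetical Fleet-level total for one counter key.
type aggregate struct {
	Total    int64
	Capacity int64
}

// aggregateCounters sums counts and capacities per key across GameServers.
func aggregateCounters(gameServers []map[string]counterStatus) map[string]aggregate {
	out := map[string]aggregate{}
	for _, counters := range gameServers {
		for key, c := range counters {
			a := out[key]
			a.Total += c.Count
			a.Capacity += c.Capacity
			out[key] = a
		}
	}
	return out
}

func main() {
	fleet := []map[string]counterStatus{
		{"rooms": {Count: 4, Capacity: 100}},
		{"rooms": {Count: 3, Capacity: 100}},
	}
	agg := aggregateCounters(fleet)
	fmt.Println(agg["rooms"].Total, agg["rooms"].Capacity) // 7 200
}
```

List aggregates would follow the same shape, using the list length in place of the count.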

FleetAutoscaling

Count based autoscaling

apiVersion: "autoscaling.agones.dev/v1"
kind: FleetAutoscaler
metadata:
 name: fleet-autoscaler-count
spec:
 fleetName: fleet-example
 policy:
   type: Count # count based autoscaling
   count:
     # The key for the count value.
     key: rooms
     # Size of a buffer of counted items that are available in the Fleet.
     # It can be specified either in absolute (e.g. 5) or percentage format (e.g. 5%)
     bufferCount: 5
     # minimum aggregate count capacity that can be provided by this FleetAutoscaler.
     # if not specified, the actual minimum capacity will be bufferCount
     minCount: 10
     # maximum aggregate count capacity that can be provided by this FleetAutoscaler.
     # required
     maxCount: 100
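
One possible reading of the count policy above, sketched in Go. Everything here is an assumption made for illustration: that the buffer is added on top of the current aggregate count, that `minCount` and `maxCount` bound the aggregate capacity, that every GameServer contributes the same capacity, and that only the absolute buffer format is handled. `countPolicy` and `desiredReplicas` are hypothetical names.

```go
package main

import "fmt"

// countPolicy mirrors the three numeric fields of the count policy.
// The Go type itself is hypothetical.
type countPolicy struct {
	BufferCount int64 // absolute buffer only; percentages are not handled
	MinCount    int64
	MaxCount    int64
}

// desiredReplicas returns how many GameServers would be needed so that the
// aggregate capacity covers the current aggregate count plus the buffer,
// bounded by MinCount and MaxCount. perServerCapacity must be > 0.
func desiredReplicas(aggregateCount, perServerCapacity int64, p countPolicy) int64 {
	target := aggregateCount + p.BufferCount
	if target < p.MinCount {
		target = p.MinCount
	}
	if target > p.MaxCount {
		target = p.MaxCount
	}
	// Round up to whole GameServers, without exceeding MaxCount in aggregate.
	replicas := (target + perServerCapacity - 1) / perServerCapacity
	if maxReplicas := p.MaxCount / perServerCapacity; replicas > maxReplicas {
		replicas = maxReplicas
	}
	return replicas
}

func main() {
	p := countPolicy{BufferCount: 5, MinCount: 10, MaxCount: 100}
	// 43 rooms in use, 10 rooms per GameServer: 48 needed, so 5 GameServers.
	fmt.Println(desiredReplicas(43, 10, p)) // 5
}
```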

List based autoscaling

apiVersion: "autoscaling.agones.dev/v1"
kind: FleetAutoscaler
metadata:
  name: fleet-autoscaler-list
spec:
  fleetName: fleet-example
  policy:
    type: List # List based autoscaling.
    list:
      # The key for the list.
      key: players
      # Size of a buffer based on the list capacity that is available over the current aggregate list length in the Fleet.
      # It can be specified either in absolute (e.g. 5) or percentage format (e.g. 5%)
      bufferLength: 5
      # minimum aggregate list capacity that can be provided by this FleetAutoscaler.
      # if not specified, the actual minimum capacity will be bufferLength
      minLength: 10
      # maximum aggregate list capacity that can be provided by this FleetAutoscaler.
      # required
      maxLength: 100

Allocations

kind: GameServerAllocation
spec:
 # Which gameservers in the selector set are most important to keep around - impacts which GameServer is checked first.
 # First item on the array of priorities is the most important for sorting.
 priorities:
   - type: count # whether a count or a list. List uses the length as the value, count the current count value.
     key: rooms # The key to grab data from. If not found on the GameServer, has no impact.
     order: ascending # default is "ascending" so bigger number is better. "descending" would be "smaller number is better".
 selectors:
   - matchLabels:
         agones.dev/fleet: simple-game-server
     counters: # filter on counter min and max values
       rooms: # use "rooms" key values
         min: 4 # filters on count values (optional, defaults to 0)
         max: 20 # (optional, defaults to max int)
         minAvailable: 0 # filters on the capacity left on a GameServer (optional, defaults to 0)
         maxAvailable: 99 # (optional, defaults to max int)
     lists: # filter on lists
       players:
         minAvailable: 0 # filters on the capacity left on a GameServer
         maxAvailable: 99
       frogs:
         contains: orange # filter on if this value is found in the list.
 counters: # apply an action to a counter
   rooms:
     action: increment # "increment" or "decrement" a count.
     amount: 1 # how much by. defaults to 1.
 lists: # apply an action to a list.
   players:
     append: # (optional) append these values to the list
       - x7un
       - 8inz
     capacity: 40 # (optional) change the capacity of the list to this value.
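
The counter filter in the selector above could be evaluated roughly as follows. `counterSelector` and its methods are hypothetical names; the defaults follow the comments in the example (0 for the minimums, max int for the maximums), and "available" is assumed to mean capacity minus the current count.

```go
package main

import (
	"fmt"
	"math"
)

// counterSelector is a hypothetical Go form of one counter filter.
type counterSelector struct {
	Min, Max                   int64 // bounds on the current count
	MinAvailable, MaxAvailable int64 // bounds on capacity minus count
}

// newCounterSelector returns a selector with the documented defaults,
// which match every counter.
func newCounterSelector() counterSelector {
	return counterSelector{Max: math.MaxInt64, MaxAvailable: math.MaxInt64}
}

// matches reports whether a counter with the given count and capacity
// passes all four bounds.
func (s counterSelector) matches(count, capacity int64) bool {
	available := capacity - count
	return count >= s.Min && count <= s.Max &&
		available >= s.MinAvailable && available <= s.MaxAvailable
}

func main() {
	sel := newCounterSelector()
	sel.Min, sel.Max, sel.MaxAvailable = 4, 20, 99

	fmt.Println(sel.matches(4, 100)) // true: count 4, 96 available
	fmt.Println(sel.matches(1, 100)) // false: count is below min
}
```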

SDK

The SDK will batch operations every 1 second for performance reasons, but changes made through the SDK will be atomically accurate through the SDK. Changes made through Allocation or the Kubernetes API will be eventually consistent when coming back to the SDK.

Question: In PlayerTracking, we told users to either use the K8s API or use the SDK commands. Can we do that here? Should we do that here? I’d like to avoid it with the strategy written above.

Counter

All functions will error if the key was not predefined in the GameServer resource on creation.

Alpha().CountGet(key): integer

Returns the current count under the provided key.

Alpha().CountIncrement(key, amount): boolean

Increment a counter by a given amount. Will max at max(int64).

Will execute the increment operation against the current CRD value.

Returns false if the count is at the current capacity (to the latest knowledge of the SDK), and no increment will occur.

Note: A potential race condition here is that if count values are set from both the SDK and through the K8s API (Allocation or otherwise), any value incremented past the capacity will get silently truncated, since the SDK increment operation back to the CRD value is batched and asynchronous.

Alpha().CountDecrement(key, amount): boolean

Decrements the current count by the provided amount. Will not go below 0.

Will execute the decrement operation against the current CRD value.

Returns false if the count is at 0 (to the latest knowledge of the SDK), and no decrement will occur.

Alpha().CountSet(key, amount)

Sets a count at a given value. Use with care, as this will overwrite any previous invocations’ value.

Alpha().CountSetCapacity(key, capacity)

Update the capacity for a given count. A capacity of 0 is no capacity.

Alpha().CountGetCapacity(key): integer

Get the current capacity for this specific count.

Lists

All functions will error if the key was not predefined in the GameServer resource on creation.

Alpha().ListAppend(key, value): boolean

Appends the provided value to the list. If the list is already at capacity, the append does not occur (see the return value below).

Will retrieve the current CRD value before executing the append operation.

Returns false, if the value already exists in the list, or if the list is already at capacity (to the latest knowledge of the SDK).

Note: A potential race condition here is that if list values are set from both the SDK and through the K8s API (Allocation or otherwise), any value appended past the capacity will get silently truncated, since the SDK append operation back to the CRD value is batched and asynchronous.
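
To make the truncation in this note concrete, here is a minimal sketch of how a batched SDK append might be merged onto the latest CRD value. `mergeAppends` is a hypothetical helper, and merging CRD values first followed by pending SDK values is an assumption, not a statement of how the sidecar will be implemented.

```go
package main

import "fmt"

// mergeAppends applies batched SDK appends on top of the latest list values
// read from the CRD. Duplicates are dropped, insertion order is kept, and
// anything past capacity is silently truncated.
func mergeAppends(crdValues, pending []string, capacity int) []string {
	seen := make(map[string]bool, len(crdValues)+len(pending))
	merged := make([]string, 0, len(crdValues)+len(pending))
	for _, group := range [][]string{crdValues, pending} {
		for _, v := range group {
			if !seen[v] {
				seen[v] = true
				merged = append(merged, v)
			}
		}
	}
	if len(merged) > capacity {
		merged = merged[:capacity]
	}
	return merged
}

func main() {
	// "c" was appended through an Allocation while the SDK still believed
	// the list only held "a" and "b", so its second append no longer fits.
	crd := []string{"a", "b", "c"}
	pending := []string{"d", "e"}
	fmt.Println(mergeAppends(crd, pending, 4)) // [a b c d]
}
```

In this example the SDK call that appended "e" would already have returned true, which is the silent failure the note warns about.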

Alpha().ListDelete(key, value): boolean

Delete the specified value from the list.

Returns false if the value is not found in the list (to the latest knowledge of the SDK).

Alpha().ListSetCapacity(key, capacity)

Update the capacity for a given list. Capacity must be between 1 and 1000.

Alpha().ListGetCapacity(key): integer

Get the current capacity for this specific list.

Alpha().ListContains(key, value): boolean

Returns true if the given list contains a provided value.

Alpha().ListLength(key): integer

Returns the current length of the given list.

Alpha().ListGet(key): []string

Returns the contents of the given list.

Metrics

Metrics should be exported, using the key that the metric is stored under as a label on the metrics, in aggregate across all GameServers, giving us the ability to export basic numeric values as gauge metrics.

The Fleet name is attached as a label to each metric.

Counters

Total of all counters on all GameServers, by key

agones_gameservers_counter_total[key=${key}]

Total count capacity of all GameServers, by key

agones_gameservers_counter_capacity_total[key=${key}]

Lists

Total number of items in lists across all GameServers, by key

agones_gameservers_list_length_total[key=${key}]

Total list capacity of all GameServers, by key

agones_gameservers_list_capacity_total[key=${key}]

Dashboards

Since we are using labels, we can create some generic dashboards with dropdowns for each fleet, and names for counts and lists.

Critical User Journeys

Some high level summaries for some user journeys that could be utilised with this new functionality.

Player Tracking

Player tracking could be implemented in essentially the same way that is possible now, but we could also take an approach that could reserve player connections at allocation time.

An end user could now add a player at allocation time to the GameServer, blocking that space for the player. A gameserver binary could watch for that addition, then wait a determined amount of time before removing it from a “players” list if that player has not yet connected.

For example:

kind: GameServerAllocation
spec:
 selectors:
   - matchLabels:
         agones.dev/fleet: simple-game-server
     lists: # filter on lists
       players:
         minAvailable: 0 # filters on the capacity left on a GameServer
         maxAvailable: 99
     gameServerState: Allocated
   - matchLabels:
       agones.dev/fleet: simple-game-server
     gameServerState: Ready
 lists: # apply an action to a list.
   players:
     append: # (optional) append these values to the list
       - x7un
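
On the game server side, the "wait, then remove if the player never connected" step could be tracked with something like the sketch below. The `reservations` type is hypothetical, time is modelled as plain seconds to keep the example deterministic, and the actual removal from the “players” list (for example through the proposed Alpha().ListDelete) is left to the caller.

```go
package main

import (
	"fmt"
	"sort"
)

// reservations tracks player ids that were added to the list (for example
// at allocation time) but have not connected yet.
type reservations struct {
	timeout   int64            // seconds a reserved slot is held
	pending   map[string]int64 // player id -> time it was first observed
	connected map[string]bool
}

func newReservations(timeoutSeconds int64) *reservations {
	return &reservations{
		timeout:   timeoutSeconds,
		pending:   map[string]int64{},
		connected: map[string]bool{},
	}
}

// Observe records any ids in the current list values that are new.
func (r *reservations) Observe(listValues []string, now int64) {
	for _, id := range listValues {
		if _, known := r.pending[id]; !known && !r.connected[id] {
			r.pending[id] = now
		}
	}
}

// Connected marks a player as having joined, so the slot is kept.
func (r *reservations) Connected(id string) {
	delete(r.pending, id)
	r.connected[id] = true
}

// Expired returns, in sorted order, the ids whose reservation has run out,
// and forgets them. The caller is expected to remove them from the list.
func (r *reservations) Expired(now int64) []string {
	var out []string
	for id, since := range r.pending {
		if now-since >= r.timeout {
			out = append(out, id)
			delete(r.pending, id)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	r := newReservations(30)
	r.Observe([]string{"x7un", "8inz"}, 0)
	r.Connected("8inz")

	fmt.Println(r.Expired(10)) // []
	fmt.Println(r.Expired(31)) // [x7un]
}
```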

Room based High Density Game Servers

This could now be handled as an integer value as a count, or as a list with individual room ids.

A count based Allocation could look something like:

kind: GameServerAllocation
spec:
 priorities: # which gameservers in the selector set are most important to keep around - impacts which GameServer is checked first.
   - type: count # whether a count or a list. List uses the length as the value, count the current count value.
     key: rooms # The key to grab data from. If not found on the GameServer, has no impact.
     order: ascending # default is "ascending" so bigger number is better. "descending" would be "smaller number is better".
 selectors:
   - matchLabels:
         agones.dev/fleet: simple-game-server
     counters: # filter on counter min and max values
       rooms: # one room available, against capacity
         minAvailable: 1
         maxAvailable: 1
     gameServerState: Allocated
   - matchLabels:
       agones.dev/fleet: simple-game-server
     gameServerState: Ready
 counters: # apply an action to a counter
   rooms:
     action: increment # "increment" or "decrement" a count.
     amount: 1 # how much by. defaults to 1.

This would prioritise allocation to servers that have more rooms currently running, and increment the value of the room count at allocation time, which could be picked up on by SDK.WatchGameServer().

A list based Allocation could look something like:

kind: GameServerAllocation
spec:
 priorities: # which gameservers in the selector set are most important to keep around - impacts which GameServer is checked first.
   - type: list # whether a count or a list. List uses the length as the value, count the current count value.
     key: rooms # The key to grab data from. If not found on the GameServer, has no impact.
     order: ascending # default is "ascending" so bigger number is better. "descending" would be "smaller number is better".
 selectors:
   - matchLabels:
         agones.dev/fleet: simple-game-server
     lists: # filter on lists
       rooms:
         minAvailable: 1 # 1 room available, please
         maxAvailable: 1
     gameServerState: Allocated
   - matchLabels:
       agones.dev/fleet: simple-game-server
     gameServerState: Ready
 lists: # apply an action to a list.
   rooms:
     append: # (optional) append these values to the list
       - x7un

If you then wanted to allocate to the GameServer with the specific Room session, you could do the following:

kind: GameServerAllocation
spec:
 selectors:
   - matchLabels:
         agones.dev/fleet: simple-game-server
     lists: # filter on lists
       rooms:
         contains: x7un # filter on if this value is found in the list.
     gameServerState: Allocated
   - matchLabels:
       agones.dev/fleet: simple-game-server
     gameServerState: Ready

Note: An end user could still use the “label locking” method for high density game servers as well / still. This just provides another way to solve the same problem that may be more applicable for some use cases.

Game Specific Weight allocation

With this new functionality, if you wanted to prioritise Allocation based on how many blueberries were available in your game server (or any arbitrary thing), you could now do this as well. I’ve had conversations with people on how to preferentially “Allocate to the most interesting GameServer” - this would allow you to do exactly that, through an arbitrary counter tracked at the GameServer level.

For example:

kind: GameServerAllocation
spec:
 priorities: # which gameservers in the selector set are most important to keep around - impacts which GameServer is checked first.
   - type: count # whether a count or a list. List uses the length as the value, count the current count value.
     key: blueberries # The key to grab data from. If not found on the GameServer, has no impact.
     order: ascending # more blueberries, is better
 selectors:
   - matchLabels:
         agones.dev/fleet: simple-game-server
     gameServerState: Allocated
   - matchLabels:
       agones.dev/fleet: simple-game-server
     gameServerState: Ready

The blueberries key would then be incremented and decremented with Alpha().CountIncrement(key, amount) and Alpha().CountDecrement(key, amount) from within the game server binary as needed.

Alternatives considered

We could continue having specific integrations for each specific use case -- much like we did for player tracking. Personally, this is what often dissuaded me from adding more specific solutions to specific problems in many of the tickets above -- their specificity. i.e. “This solution works for this specific problem”. I personally prefer more generic solutions that can power a wide multitude of solutions. I genuinely believe that Agones’ power comes from its configurability and flexibility. That tradeoff does come with a higher cost for integration and greater overall complexity of the stack, but I don’t think the project would be as successful as it is without that flexibility.

I think the difference in player tracking was that it felt generic “enough” across use cases that it made sense. But I think this new approach is even more generic, and allows for a much wider set of use cases (probably ones we haven’t thought of yet), without needing to build out yet another CRD and SDK implementation, and without sacrificing capability (in fact I think it adds capability). Which is also why I’m quite excited about it.


Work Items

List of individual work items on this design, so it doesn't seem so overwhelming 😃

API Surfaces

This is not implementation, this is creating placeholders for data, CRD structures, proto API definitions, and stubs for SDK methods.

  • Feature Flag creation
  • CRD Updates
    • GameServer CRD updates
    • GameServerSet CRD updates
    • Fleet CRD updates
    • FleetAutoscaling CRD updates
    • GameServerAllocation CRD Updates
  • .proto updates
    • Allocation .proto updates
    • Alpha SDK .proto updates and stub methods on SdkServer

Implementation

Building functionality on top of the API surfaces that have been built out above.

  • Defaults
    • Defaults for counts on GameServerSpec
    • Defaults for lists on GameServerSpec
    • Population of GameServer -> Status on creation
  • Validation
    • Validation for counts on GameServerSpec
    • Validation for lists on GameServerSpec
  • Fleets
    • Fleet status aggregate values (also with GameServerSet)
    • Fleet scale down prioritisation
  • Autoscaling
    • FleetAutoscaling based on a count
    • FleetAutoscaling based on a list
  • GameServerAllocation
    • Conversion from .proto allocation to a GameServerAllocation
    • GameServer selection prioritisation
    • Allocation filtering on counts
    • Allocation filtering on lists
    • Allocation actions on counts (increment / decrement)
    • Allocation actions on lists (append)
    • Allocation change capacity on counts
    • Allocation change capacity on lists
  • SDK Implementation
  • Metrics
    • Expose metrics
    • (Optional) Create a generic dashboard based on the labels we use with our metrics.
  • Other language SDKs
    • Rust SDK implementation and conformance tests
    • C# SDK implementation and conformance tests
    • node.js SDK implementation and conformance tests
    • REST conformance tests
    • CPP implementation and conformance tests
    • Unity implementation and conformance tests
    • Unreal implementation and conformance tests #3651
