Add Stats to DatastreamTaskImpl #855
surajkn left a comment
Just curious, why do we want to save these stats in ZK instead of simply reporting them as task-level metrics? Is it because in Ingraphs it's hard or not possible to identify a specific task's metrics?
There is a limit on the number of metrics that we can emit from the container. Also, if we want to build a diagnostics command that collects this information from large clusters and analyzes the data, that is difficult to do with metrics. Finally, these metrics are emitted only by the leader, and after a leader switch they will not be emitted until the datastream is restarted.
KeyBuilder.datastreamTaskState(_cluster, task.getConnectorType(), task.getDatastreamTaskName());
_zkclient.ensurePath(taskStatePath);

// save the task stats.
Update the task node's directory structure in the method description above. This is a new subdirectory "stats" under the task, correct?
No, the "stats" directory will be inside the "state" directory and will be conditional.
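To make the layout discussed above concrete, here is a minimal sketch of how the "stats" node could nest under the task's "state" node. The class `TaskStatePaths`, the path format string, and the rule that the node is created only when the stats JSON is non-empty are all assumptions for illustration; the thread only says the directory is "conditional" and does not show the real `KeyBuilder` paths.

```java
import java.util.Optional;

// Hypothetical helper, not part of Brooklin. The path layout is an assumption.
final class TaskStatePaths {
    private TaskStatePaths() { }

    // Assumed layout of the task "state" node.
    static String taskState(String cluster, String connectorType, String taskName) {
        return String.format("/%s/connectors/%s/%s/state", cluster, connectorType, taskName);
    }

    // "stats" sits inside "state". Assumption: it is only present when the
    // task actually has stats to save.
    static Optional<String> taskStats(String cluster, String connectorType,
                                      String taskName, String statsJson) {
        if (statsJson == null || statsJson.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(taskState(cluster, connectorType, taskName) + "/stats");
    }

    public static void main(String[] args) {
        // Sample values are made up for the demo.
        System.out.println(taskStats("brooklin", "connectorA", "task_0",
                "{\"isThroughputRateLatest\":true}"));
        System.out.println(taskStats("brooklin", "connectorA", "task_0", ""));
    }
}
```

Returning an `Optional` keeps the "conditional" part explicit for callers: no path means nothing should be written to ZK for that task.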
    DatastreamTaskImpl newTask) {
  PartitionAssignmentStatPerTask stat = PartitionAssignmentStatPerTask.fromJson(((DatastreamTaskImpl) task).getStats());
  if (partitionInfoMap.isEmpty()) {
    stat.isThroughputRateLatest = false;
Does it make sense to have a timestamp field here instead of the latest flag, so that we get a more accurate sense of when the partition throughput distribution was last computed?
Yes, we can add a timestamp. We still need the latest flag, because not all partition assignments will use throughput-based balancing.
I will address it separately.
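For reference, a rough sketch of how the latest flag and a timestamp could coexist, as suggested in this thread. Only `isThroughputRateLatest` and the empty-map check come from the diff; the class `TaskStatSketch`, the other field names, and the `update` method are hypothetical.

```java
import java.util.Map;

// Hypothetical stand-in for the per-task stat object; not the real class.
final class TaskStatSketch {
    boolean isThroughputRateLatest;   // field name taken from the diff
    long throughputRateTimestampMs;   // assumed field: when the rate was last computed
    long throughputRate;              // assumed field: sum of partition rates

    // partitionThroughput maps partition name to its throughput. An empty map
    // means this assignment did not use throughput information.
    void update(Map<String, Long> partitionThroughput, long nowMs) {
        if (partitionThroughput.isEmpty()) {
            // Keep the previous rate and timestamp, but mark them as stale.
            isThroughputRateLatest = false;
            return;
        }
        long sum = 0;
        for (long rate : partitionThroughput.values()) {
            sum += rate;
        }
        throughputRate = sum;
        throughputRateTimestampMs = nowMs;
        isThroughputRateLatest = true;
    }
}
```

With both fields, a reader can tell whether the rate came from the most recent assignment (the flag) and how old it is (the timestamp), even when the latest assignment did not use throughput-based balancing.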
...rver/src/main/java/com/linkedin/datastream/server/assignment/LoadBasedPartitionAssigner.java
.../src/test/java/com/linkedin/datastream/server/assignment/TestLoadBasedPartitionAssigner.java
We frequently hear requests for task-level metrics that can be retrieved through the brooklin-service endpoint for diagnostics.
LoadBasedPartitionAssignmentStrategy distributes partitions evenly based on load. To debug and validate the distribution, it is important to be able to pull out metrics at the task level and perform offline analysis on the data.
This PR exposes a new knob, stats, that can be used to save task-level stats in ZooKeeper. The stats can then be retrieved in the same way as data from other endpoints.
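As an illustration of the kind of offline analysis the description has in mind, the sketch below computes a simple balance score from per-task throughput values once they have been pulled from the saved stats. The class `ThroughputBalanceCheck` and the max-to-mean ratio are assumptions chosen for the example; the PR does not prescribe any particular metric.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical offline check, not part of the PR.
final class ThroughputBalanceCheck {
    private ThroughputBalanceCheck() { }

    // Ratio of the busiest task's throughput to the mean across tasks.
    // 1.0 means perfectly even; larger values mean more skew.
    // Assumes throughput values are non-negative.
    static double maxToMeanRatio(Collection<Long> perTaskThroughput) {
        if (perTaskThroughput.isEmpty()) {
            throw new IllegalArgumentException("no tasks to analyze");
        }
        long max = 0;
        long sum = 0;
        for (long value : perTaskThroughput) {
            max = Math.max(max, value);
            sum += value;
        }
        if (sum == 0) {
            return 1.0; // all tasks idle, treat as balanced
        }
        double mean = (double) sum / perTaskThroughput.size();
        return max / mean;
    }

    public static void main(String[] args) {
        // Sample numbers are made up for the demo.
        System.out.println(maxToMeanRatio(Arrays.asList(300L, 100L, 200L)));
    }
}
```

A ratio that stays close to 1.0 across rebalances would suggest the load-based assignment is spreading partitions evenly, while a persistently high ratio points at tasks worth inspecting.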