Add LastSyncTime to volume replication status#232
Conversation
| // remove the prefix keys in volume replication class parameters | ||
| parameters := filterPrefixedParameters(replicationParameterPrefix, vrcObj.Spec.Parameters) | ||
| schedulingTime := parameters["schedulingInterval"] | ||
| scheduleTime, _ := time.ParseDuration(schedulingTime) |
There was a problem hiding this comment.
describe the schedulingInterval parameter somewhere
8b345f5 to
13f6350
Compare
| message GetVolumeReplicationInfoResponse{ | ||
| // Holds the last sync time. | ||
| // This field is REQUIRED. | ||
| int64 last_sync_time = 1; |
There was a problem hiding this comment.
There was a problem hiding this comment.
This is still not implemented,
with this change, we can remove some code for conversion in other functions too.
| schedulingTime := parameters["schedulingInterval"] | ||
| scheduleTime, _ := time.ParseDuration(schedulingTime) |
There was a problem hiding this comment.
There was a problem hiding this comment.
Yes it can be empty and it's optional
| } | ||
|
|
||
| instance.Status.LastCompletionTime = getCurrentTime() | ||
| var last_sync_time int64 |
| instance.Status.LastCompletionTime = getCurrentTime() | ||
| var last_sync_time int64 | ||
| var requeueForInfo bool | ||
| if instance.Spec.ReplicationState == replicationv1alpha1.Primary { |
There was a problem hiding this comment.
Is it required only for primary state?
There was a problem hiding this comment.
Yes, we need to resync only when it is in primary state.
There was a problem hiding this comment.
The primary end has the status, that it reads from the secondary, and so we use that to update the lastSyncTime.
In cases where we demoted a Primary and are waiting for both Secondaries to be in sync, the timestamp is still useful. Tells us how far back the sync is, before either image can be gracefully promoted.
The demoted snapshot sync time is not critical at present, it would also just show that the lastSyncTime is in the past as compared to the demoted snapshot, and ideally once the demoted snapshot is synced the timestamps would be the same. So not sure if it is useful as such.
Needs more thought on how to use that information, hence sticking to when image is primary at one end is useful.
| // remove the prefix keys in volume replication class parameters | ||
| parameters := filterPrefixedParameters(replicationParameterPrefix, vrcObj.Spec.Parameters) | ||
| schedulingTime := parameters["schedulingInterval"] | ||
| scheduleTime, _ := time.ParseDuration(schedulingTime) |
There was a problem hiding this comment.
We want to update the last sync time after every interval and that is, scheduling interval and hence, it's required.
There was a problem hiding this comment.
from VolumeReplication CR schedulingInterval is optional
There was a problem hiding this comment.
okay, thanks for notifying, will check on that and update the PR.
| return &metav1NowTime | ||
| } | ||
|
|
||
| func getTime(time time.Time) *metav1.Time { |
There was a problem hiding this comment.
No need for a function for this one.
13f6350 to
23037e5
Compare
| parameters := filterPrefixedParameters(replicationParameterPrefix, vrcObj.Spec.Parameters) | ||
| schedulingTime := parameters["schedulingInterval"] | ||
| if schedulingTime == "" { | ||
| schedulingTime = "1m" |
There was a problem hiding this comment.
define it as a constant ?
and write elaborate how it will be used.
There was a problem hiding this comment.
If the schedule is not set for this VR, then the pool level schedule (if available) should be used for the reconcile requeue. If that is not available then we can either NOT requeue or requeue for a duration till the last sync time is not empty (as post that there would be no further syncs).
There is a possibility that a pool level schedule is added later, once we make the above not to requeue decision, but that would either require a restart of the plugin (so that all VRs are reconciled), or reconciling based on added pool level schedules, or waiting for the periodic reconcile (24h) at which time this will be picked up.
We possibly want to track the above for the future, for now if there is no schedule we can default to a something (1m/5m), as the current DR stack comes with a schedule and should not break the functionality.
| message GetVolumeReplicationInfoResponse{ | ||
| // Holds the last sync time. | ||
| // This field is REQUIRED. | ||
| int64 last_sync_time = 1; |
There was a problem hiding this comment.
This is still not implemented,
with this change, we can remove some code for conversion in other functions too.
6478936 to
817bea4
Compare
| return 0, resp.Error | ||
| } | ||
|
|
||
| infoResponse, ok := resp.Response.(*proto.GetVolumeReplicationInfoResponse) | ||
| if !ok { | ||
| err := fmt.Errorf("received response of unexpected type") | ||
| vr.logger.Error(err, "unable to parse response") | ||
|
|
||
| return 0, err |
There was a problem hiding this comment.
| return 0, resp.Error | |
| } | |
| infoResponse, ok := resp.Response.(*proto.GetVolumeReplicationInfoResponse) | |
| if !ok { | |
| err := fmt.Errorf("received response of unexpected type") | |
| vr.logger.Error(err, "unable to parse response") | |
| return 0, err | |
| return nil, resp.Error | |
| } | |
| infoResponse, ok := resp.Response.(*proto.GetVolumeReplicationInfoResponse) | |
| if !ok { | |
| err := fmt.Errorf("received response of unexpected type") | |
| vr.logger.Error(err, "unable to parse response") | |
| return nil, err |
| metav1Time := metav1.NewTime(lastSyncTime1) | ||
| instance.Status.LastSyncTime = &metav1Time | ||
| requeueForInfo = true | ||
|
|
| scheduleTime, err := time.ParseDuration(schedulingTime) | ||
| if err != nil { | ||
| logger.Error(err, "failed to parse time: %v", schedulingTime) | ||
| } |
There was a problem hiding this comment.
Take a look at these lines again.
| schedulingTime := parameters["schedulingInterval"] | ||
| if schedulingTime == "" { | ||
| schedulingTime = "1m" | ||
| } | ||
| scheduleTime, err := time.ParseDuration(schedulingTime) |
There was a problem hiding this comment.
use rawScheduleTime instead schedulingTime,
it is confusing.
| parameters := filterPrefixedParameters(replicationParameterPrefix, vrcObj.Spec.Parameters) | ||
| schedulingTime := parameters["schedulingInterval"] | ||
| if schedulingTime == "" { | ||
| schedulingTime = "1m" |
|
|
||
| var requeueForInfo bool | ||
| if instance.Spec.ReplicationState == replicationv1alpha1.Primary { | ||
| lastSyncTime, err := r.getVolumeReplicationInfo(vr) |
There was a problem hiding this comment.
maybe call this info instead of lastSyncTime
| } | ||
|
|
||
| // getVolumeReplicationInfo gets volume replication info. | ||
| func (r *VolumeReplicationReconciler) getVolumeReplicationInfo(vr *volumeReplicationInstance) (*timestamppb.Timestamp, error) { |
There was a problem hiding this comment.
this does not return an info kind of object? return something like what volumeReplication.GetInfo() does, or rename the function to return a LastSyncTimestamp?
There was a problem hiding this comment.
yeah for now it just returns last sync time, but this name is given just with the thought of future implementation, we may like to return other informations along with the last sync time, hence don't want it to be specific.
There was a problem hiding this comment.
The current state is confusing, so you will have to decide what this function does. It can be renamed once it gets extended, no need to prepare function names for things in the future. Leave a comment with the possibility if you think it is useful.
| if err != nil { | ||
| logger.Error(err, "Failed to get volume replication info") | ||
| } | ||
| lastSyncTime1 := lastSyncTime.AsTime() |
There was a problem hiding this comment.
if this can not fail, just place it in metav1Time := metav1.NewTime(lastSyncTime.AsTime()).
metav1Time sounds more like a package name, similar to metav1. This is just a variable, so why not call it time?
| instance.Status.LastCompletionTime = getCurrentTime() | ||
|
|
||
| var requeueForInfo bool | ||
| if instance.Spec.ReplicationState == replicationv1alpha1.Primary { |
There was a problem hiding this comment.
@Madhu-1 If an image is in the split-brain state, is there a TS that is usable?
There was a problem hiding this comment.
AFAIK no during split-brain no sync will happen this is not useful.
There was a problem hiding this comment.
In split-brain, I do not assume one of the images is marked Primary? If that is the case, the lastSyncTime should be the time of the last successful sync, maybe.
There was a problem hiding this comment.
since we are updating it in every interval, I guess if we reach split brain case, it would already have that, cc @ShyamsundarR
There was a problem hiding this comment.
The question really is what does RBD report, and then what should the proto report. In the case of split-brain we should report not-synced time (which IMHO is the "0" timestamp), or if available the last time it was synced...
There was a problem hiding this comment.
I agree, on a sync failure (like split-brain), no LastSyncTime should be reported, but ideally an error. Either the automatic rescheduling should stop, or be done with an exponential backoff.
There was a problem hiding this comment.
When the image is in replaying state with RBD we get the additional description, if stopped or split-brain the additional data is not available, and we will hence default to the "0" time stamp. We should be fine with that.
|
|
||
| var requeueForInfo bool | ||
| if instance.Spec.ReplicationState == replicationv1alpha1.Primary { | ||
| lastSyncTime, err := r.getVolumeReplicationInfo(vr) |
There was a problem hiding this comment.
Just a heads up!
The last sync time can be "0" or empty, i.e it was never synced even once, we need to determine how that value would be reflected. One option is to set it to the lower bound of UTC epoch (I am a bit confused on what Timestamp holds, UTC epoch (since 1970), or since 0001-01-01T00:00:00Z)
e9ad9e1 to
bccc4ef
Compare
bccc4ef to
7e294fc
Compare
| rawScheduleTime := parameters["schedulingInterval"] | ||
| if rawScheduleTime == "" { | ||
| rawScheduleTime = "1m" | ||
| } | ||
| scheduleTime, err := time.ParseDuration(rawScheduleTime) | ||
| if err != nil { | ||
| logger.Error(err, "failed to parse time: %v", rawScheduleTime) | ||
| return ctrl.Result{}, err | ||
| } |
There was a problem hiding this comment.
| rawScheduleTime := parameters["schedulingInterval"] | |
| if rawScheduleTime == "" { | |
| rawScheduleTime = "1m" | |
| } | |
| scheduleTime, err := time.ParseDuration(rawScheduleTime) | |
| if err != nil { | |
| logger.Error(err, "failed to parse time: %v", rawScheduleTime) | |
| return ctrl.Result{}, err | |
| } | |
| const defaultScheduleTime = time.Minute | |
| func getScheduleTime(parameters map[string]string) (time.Duration) { | |
| rawScheduleTime := parameters["schedulingInterval"] | |
| if rawScheduleTime == "" { | |
| return defaultScheduleTime | |
| } | |
| scheduleTime, err := time.ParseDuration(rawScheduleTime) | |
| if err != nil { | |
| logger.Error(err, "failed to parse time: %v", rawScheduleTime) | |
| return defaultScheduleTime | |
| } | |
| return scheduleTime | |
| } |
Create a constant and a function like above and use it .
ca61302 to
e4bc1d9
Compare
|
|
||
| scheduleTime := getScheduleTime(parameters, logger) | ||
| if requeueForInfo { | ||
| logger.Info("volume is primary, requeuing to get volume replication info") |
There was a problem hiding this comment.
logging is not much helpful. Remove it on add the next queue interval time. and also move line 388 inside the if condition
| } | ||
|
|
||
| // getVolumeReplicationInfo gets volume replication info. | ||
| func (r *VolumeReplicationReconciler) getLastSyncTime(vr *volumeReplicationInstance) (*timestamppb.Timestamp, error) { |
There was a problem hiding this comment.
IMO we should not be doing this. Currently, we have only one parameter in the GetInfo this looks fine, If we add more parameters we need to write a new function for it. My suggestion is to return a complete response infoResponse and use it in caller
| ResyncVolume(volumeID, replicationID string, force bool, secretName, secretNamespace string, parameters map[string]string) (*proto. | ||
| ResyncVolumeResponse, error) | ||
| // GetVolumeReplicationInfo RPC call to get volume replication info. | ||
| GetVolumeReplicationInfo(volumeID, replicationID string, secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error) |
There was a problem hiding this comment.
| GetVolumeReplicationInfo(volumeID, replicationID string, secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error) | |
| GetVolumeReplicationInfo(volumeID, replicationID , secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error)``` |
| func (rc *replicationClient) GetVolumeReplicationInfo(volumeID, replicationID string, | ||
| secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error) { |
There was a problem hiding this comment.
| func (rc *replicationClient) GetVolumeReplicationInfo(volumeID, replicationID string, | |
| secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error) { | |
| func (rc *replicationClient) GetVolumeReplicationInfo(volumeID, replicationID, | |
| secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error) { |
| } | ||
|
|
||
| lastsynctime := resp.GetLastSyncTime() | ||
| if lastsynctime == nil { |
There was a problem hiding this comment.
in case there is no local snapshot timestamp and the csi driver return nil.
d93d7e8 to
7418c6f
Compare
| pvcDataSource = "PersistentVolumeClaim" | ||
| volumeReplicationClass = "VolumeReplicationClass" | ||
| volumeReplication = "VolumeReplication" | ||
| defaultScheduleTime = time.Minute |
There was a problem hiding this comment.
Every minute sounds very often as a default, how about hourly? Consider that there can be many volumes, and the LastSyncTime may not be critical information. In environments with hundred (to name a random number) volume, users should not run into high resource consumption because of the constant reconciling.
For environments that require more strict updates on syncing, the parameter can be used to change it.
| instance.Status.LastCompletionTime = getCurrentTime() | ||
|
|
||
| var requeueForInfo bool | ||
| if instance.Spec.ReplicationState == replicationv1alpha1.Primary { |
There was a problem hiding this comment.
I agree, on a sync failure (like split-brain), no LastSyncTime should be reported, but ideally an error. Either the automatic rescheduling should stop, or be done with an exponential backoff.
| } | ||
| lastSyncTime := info.GetLastSyncTime() | ||
|
|
||
| lastsynctime := metav1.NewTime(lastSyncTime.AsTime()) |
There was a problem hiding this comment.
very confusing naming, there is lastSyncTime and lastsynctime. If you really need both, make sure to call them differently (maybe ts or timestamp and lastSyncTime or just t).
This commit adds lastsynctime to volume replicationstatus and updates the crds. Signed-off-by: yati1998 <ypadia@redhat.com>
This commit creates internal grpc to get volume replication information. Signed-off-by: yati1998 <ypadia@redhat.com>
this commit updates the vendor to get latest spec. Signed-off-by: yati1998 <ypadia@redhat.com>
42af0cf to
f0c7f15
Compare
|
@nixpanic @ShyamsundarR @Rakshith-R @Madhu-1 done with all the changes here as well. |
| if instance.Spec.ReplicationState == replicationv1alpha1.Secondary { | ||
| instance.Status.LastSyncTime = nil | ||
| } |
There was a problem hiding this comment.
This should go at line 382,
otherwise there will be no update to status.
f0c7f15 to
31c190b
Compare
nixpanic
left a comment
There was a problem hiding this comment.
Just the documentation for the new parameter needs to be added. Otherwise it looks good to me.
This commit adds reconcile logic to update the last sync time. Signed-off-by: yati1998 <ypadia@redhat.com>
31c190b to
07aff99
Compare
|
If the RPC is not implemented, this will be a breaking change. we need to address it later |
There was a problem hiding this comment.
@yati1998 ,can you open an issue to track handling NotImplemented Error ?
We can set lastSyncTime to nil if getvolrepInfo returns this error.
We'll need to fix this before another release of csi-addons operator.
Syncing latest changes from upstream main for kubernetes-csi-addons
This PR adds lastsynctime to volumereplication status and updates the reconcile logic to get the LastSynctime