Overview of the Issue
VTOrc sometimes emits spurious log lines such as:
DiscoverInstance(10.10.10.10:3307) instance is nil in 0.002s (Backend: 0.002s, Instance: 0.000s), error=tablet alias is nil
I have looked at the code and I know how this happens. Say you start with a vttablet that has hostname h1, port p1, and alias a1. The VTOrc backend then has one row in vitess_tablet for this tablet holding all three values (h1, p1, a1), and one row in database_instance for this tablet holding h1 and p1.
Now suppose this tablet is evicted by Kubernetes and restarts on a different machine. The tablet's alias stays the same, but its host and port change, say to h2 and p2.
When VTOrc refreshes its information from the topo server, it sees the new record for the vttablet and tries to insert a row into vitess_tablet with the values h2, p2, and a1. Because there is a uniqueness constraint on alias, the insert replaces the existing row, so the first row is removed automatically. VTOrc also loads the MySQL information for this tablet and writes a row into database_instance with the values h2 and p2. That table does not store the alias, so no uniqueness constraint is violated, and database_instance now holds both rows: (h1, p1) and (h2, p2).
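The refresh described above can be sketched with a small in-memory model. This is only an illustration, not VTOrc's code: the `backend` type, its two maps, and `refreshTablet` are hypothetical stand-ins for the vitess_tablet and database_instance tables, and the port numbers used with it are placeholders for p1 and p2.

```go
package main

// instanceKey is a hypothetical stand-in for the (hostname, port) pair
// that identifies a row in database_instance.
type instanceKey struct {
	Host string
	Port int
}

// backend is a hypothetical in-memory model of the two backend tables.
type backend struct {
	// vitessTablet is keyed by alias, mirroring the uniqueness constraint on alias.
	vitessTablet map[string]instanceKey
	// databaseInstance is keyed by host and port only; it has no alias column.
	databaseInstance map[instanceKey]bool
}

func newBackend() *backend {
	return &backend{
		vitessTablet:     map[string]instanceKey{},
		databaseInstance: map[instanceKey]bool{},
	}
}

// refreshTablet mimics one refresh from the topo server for a single tablet.
func (b *backend) refreshTablet(alias, host string, port int) {
	key := instanceKey{Host: host, Port: port}
	// Same alias: the earlier vitess_tablet row is replaced.
	b.vitessTablet[alias] = key
	// New host and port: a second database_instance row is added and the
	// old one is left in place.
	b.databaseInstance[key] = true
}
```

Calling `refreshTablet("a1", "h1", 3306)` and then `refreshTablet("a1", "h2", 3307)` on a fresh backend leaves one entry in `vitessTablet` but two entries in `databaseInstance`, which is the state the paragraph above ends in.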
Next, VTOrc runs the check that decides which tablets need to be forgotten. This check compares tablet aliases only, and since the alias of this tablet did not change, it concludes there is nothing to forget.
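A minimal sketch of an alias-only forget check shows why the stale row survives. The function name `aliasesToForget` and its signature are assumptions made for illustration; the real check in VTOrc may be structured differently.

```go
package main

import "sort"

// aliasesToForget is a hypothetical alias-only check: it returns the stored
// aliases that the topo server no longer reports. Host and port are never
// consulted, so a tablet that moved but kept its alias is not reported.
func aliasesToForget(stored, fromTopo []string) []string {
	live := make(map[string]bool, len(fromTopo))
	for _, alias := range fromTopo {
		live[alias] = true
	}
	var forget []string
	for _, alias := range stored {
		if !live[alias] {
			forget = append(forget, alias)
		}
	}
	sort.Strings(forget)
	return forget
}
```

With `stored` and `fromTopo` both containing only `a1`, the result is empty, even though the tablet behind `a1` now lives on a different host and port.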
Overall, this sequence leaves a row in database_instance that should have been removed and that has no corresponding row in vitess_tablet. ReadOutdatedInstanceKeys picks up this record and tries to refresh its information, which fails with:
DiscoverInstance(10.10.10.10:3307) instance is nil in 0.002s (Backend: 0.002s, Instance: 0.000s), error=tablet alias is nil
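The leftover rows can be described as an anti-join between the two tables. The sketch below is an assumption-laden illustration rather than anything in VTOrc: `instanceKey` and `orphanedInstances` are hypothetical names, and the inputs stand in for the contents of vitess_tablet and database_instance.

```go
package main

// instanceKey is a hypothetical stand-in for a (hostname, port) pair.
type instanceKey struct {
	Host string
	Port int
}

// orphanedInstances returns the database_instance keys that no vitess_tablet
// row points at. tablets maps alias to the host and port currently recorded
// for it; instances lists the keys present in database_instance.
func orphanedInstances(tablets map[string]instanceKey, instances []instanceKey) []instanceKey {
	known := make(map[instanceKey]bool, len(tablets))
	for _, key := range tablets {
		known[key] = true
	}
	var orphans []instanceKey
	for _, key := range instances {
		if !known[key] {
			orphans = append(orphans, key)
		}
	}
	return orphans
}
```

For the scenario in this issue, where `a1` now maps to (h2, p2) while database_instance still holds both (h1, p1) and (h2, p2), the function returns only the (h1, p1) key. Any refresh attempted for that key has no alias to resolve.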
Reproduction Steps
See the sequence of steps described in the overview above.
Binary Version
Operating System and Environment details
Log Fragments
No response