-
Notifications
You must be signed in to change notification settings - Fork 0
feat: implement failed mutations log #19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
mwalkiewicz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some comments, nitpicks mostly. I cannot see any prettier way of serializing the protobufs, but honsetly I'm not sure if dumping mutation contents to full json instead of base64 representation of protobuf bytes will be useful to the users. Dumping protobufs to JSON is the most complicated part of this code and I don't think that examining exact contents of mutations that have failed would be useful to anyone who just want to replay them onto the secondary database when problems are over.
Reviewable status: 0 of 9 files reviewed, 10 unresolved discussions (waiting on @dopiera and @prawilny)
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Appender.java, line 37 at r1 (raw file):
* <p>Implementations should be thread-safe. */ public abstract class Appender implements AutoCloseable {
It looks like if it should be an interface, not a class.
An interface is not permitted to have static methods, but I'd argue that we do not need them.
Currently CreateDefault is used to hide DefaultAppender as an implementation detail, in order to allow the users, i presume, to create loggers using new Logger(Appender.CreateDefault(), new MySerializer()). However I think that it won't work that way, because it would force users to pass such a Logger to a Connection somehow, what would prevent them from using connectionFactory. What we can do instead is to create a configuration parameter FAILLOG_APPENDER_CLASS_IMPL and use its value to create instances of appropriate class, by default our DefaultAppender, or any other user-defined class. I'd also argue that such a implementation should have a MyAppender(Configuration configuration) constructor that would read parameters from configuration file, because those parameters might be different for different implementations.
The same applies to Serializer.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Appender.java, line 49 at r1 (raw file):
* details. */ public static Appender CreateDefault(String pathPrefix, int maxBufferSize, boolean dropOnOverflow)
According to the style guide methods names should be written using camelCalse, not CamelCase.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Appender.java, line 63 at r1 (raw file):
* performed by a separate thread, which objects of this class create. */ class DefaultAppender extends Appender {
This class should be moved to a separate file. The style guide forbids multiple top-level classes in single file.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/JsonSerializer.java, line 67 at r1 (raw file):
return translated; } ArrayList<Descriptors.FileDescriptor> deps = new ArrayList<>();
nit: This method is currently responsible for handling cache entries and creating new translated. Creating new translated can be extracted to a separate method to make the code simpler.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/JsonSerializer.java, line 104 at r1 (raw file):
Descriptors.Descriptor translatedMessageDescriptor = translatedFileDescriptor.findMessageTypeByName(typeName); DynamicMessage translatedMessage = null;
Nit: Declaration and assignment in try block can be merged.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Logger.java, line 27 at r1 (raw file):
*/ public class Logger implements AutoCloseable { Serializer serializer;
private final?
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Logger.java, line 76 at r1 (raw file):
* @throws InterruptedException in case the appender thread was interrupted */ void mutationFailed(Mutation mutation, Throwable failureCause) throws InterruptedException {
RowMutations won't be accepted here, is this intended and users should pass each mutation separately?
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Serializer.java, line 23 at r1 (raw file):
import java.nio.charset.StandardCharsets; import java.util.Date; import org.apache.hadoop.hbase.client.*;
The style guide forbids * imports.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Serializer.java, line 69 at r1 (raw file):
@Override public byte[] serialize(Mutation mutation, Throwable failureCause) { ClientProtos.MutationProto.MutationType mutationType;
I think that embeding MutationType as a field into OperatoinType and moving those branches to a separate static method returning OperationType would increase readability.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/test/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/SerializerTest.java, line 20 at r1 (raw file):
import static org.hamcrest.CoreMatchers.containsString; import static org.hamcrest.CoreMatchers.equalTo; import static org.hamcrest.MatcherAssert.assertThat;
Is there any reason for using this library instead of Google Truth which supports assertThat(x).isEqualTo(y) constructs, which, by the way, you wanted me to use some time ago? :D
dopiera
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point. I think it has the benefit of being able to read the logs without a custom tool (and also be able to query the logs in e.g. kibana, so that the user can ignore the mutations they don't care about.
Reviewable status: 0 of 11 files reviewed, 8 unresolved discussions (waiting on @mwalkiewicz and @prawilny)
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Appender.java, line 37 at r1 (raw file):
Previously, mwalkiewicz (Mateusz Walkiewicz) wrote…
It looks like if it should be an interface, not a class.
An interface is not permitted to have static methods, but I'd argue that we do not need them.Currently
CreateDefaultis used to hideDefaultAppenderas an implementation detail, in order to allow the users, i presume, to create loggers usingnew Logger(Appender.CreateDefault(), new MySerializer()). However I think that it won't work that way, because it would force users to pass such aLoggerto aConnectionsomehow, what would prevent them from usingconnectionFactory. What we can do instead is to create a configuration parameterFAILLOG_APPENDER_CLASS_IMPLand use its value to create instances of appropriate class, by default ourDefaultAppender, or any other user-defined class. I'd also argue that such a implementation should have aMyAppender(Configuration configuration)constructor that would read parameters from configuration file, because those parameters might be different for different implementations.The same applies to
Serializer.
Done.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Appender.java, line 49 at r1 (raw file):
Previously, mwalkiewicz (Mateusz Walkiewicz) wrote…
According to the style guide methods names should be written using
camelCalse, notCamelCase.
Removed
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Appender.java, line 63 at r1 (raw file):
Previously, mwalkiewicz (Mateusz Walkiewicz) wrote…
This class should be moved to a separate file. The style guide forbids multiple top-level classes in single file.
Done.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/JsonSerializer.java, line 67 at r1 (raw file):
Previously, mwalkiewicz (Mateusz Walkiewicz) wrote…
nit: This method is currently responsible for handling cache entries and creating new
translated. Creating newtranslatedcan be extracted to a separate method to make the code simpler.
Done
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Logger.java, line 27 at r1 (raw file):
Previously, mwalkiewicz (Mateusz Walkiewicz) wrote…
private final?
Done.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Logger.java, line 76 at r1 (raw file):
Previously, mwalkiewicz (Mateusz Walkiewicz) wrote…
RowMutationswon't be accepted here, is this intended and users should pass each mutation separately?
Yes that was the intention. If we want some additional context (e.g. index of the mutation in a batch accompanied by some opaque batch ID, we could add that later.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Serializer.java, line 23 at r1 (raw file):
Previously, mwalkiewicz (Mateusz Walkiewicz) wrote…
The style guide forbids
*imports.
Done.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Serializer.java, line 69 at r1 (raw file):
Previously, mwalkiewicz (Mateusz Walkiewicz) wrote…
I think that embeding
MutationTypeas a field intoOperatoinTypeand moving those branches to a separate static method returningOperationTypewould increase readability.
OperationType is serializable. We could do this, but suddenly we'd be serializing two names of the same thing (awkward) or we'd have to make deserialization non-trivial (less readable than what we have, I think). Without this, we have the problem of returning a pair, so we'd have to have two mappings, yielding it also less readable IMO.
But maybe I'm missing some obvious solution?
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/test/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/SerializerTest.java, line 20 at r1 (raw file):
Previously, mwalkiewicz (Mateusz Walkiewicz) wrote…
Is there any reason for using this library instead of Google Truth which supports
assertThat(x).isEqualTo(y)constructs, which, by the way, you wanted me to use some time ago? :D
Not really :) Let me redo this.
prawilny
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 9 files at r1, 6 of 8 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @mwalkiewicz)
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Serializer.java, line 23 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Done.
Not in all of the files
mwalkiewicz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 9 files at r1, 4 of 8 files at r2, 1 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @dopiera and @prawilny)
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/DefaultSerializer.java, line 60 at r3 (raw file):
operationType = LogEntry.OperationType.PUT; } else { throw new RuntimeException("Invalid mutation type: " + mutation.getClass().toString());
nit: I think thatIllegalArgumentException would be more specific.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/DefaultSerializer.java, line 63 at r3 (raw file):
} try { ClientProtos.MutationProto mutationProto =
nit: I think that this moving this to a separate serializeMutationToProtobuf(mutationType, mutation) would make this method more readable.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Serializer.java, line 69 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
OperationType is serializable. We could do this, but suddenly we'd be serializing two names of the same thing (awkward) or we'd have to make deserialization non-trivial (less readable than what we have, I think). Without this, we have the problem of returning a pair, so we'd have to have two mappings, yielding it also less readable IMO.
But maybe I'm missing some obvious solution?
Maybe I'm wrong, but from what I have seen in your tests and somwhere in documentation, enums are serialized as enum constant name (e.g. "PUT"), not as their contents ({"name": "put"} etc.), unless explicitly annotated to be handled otherwise. Or are we talking about different serializations?
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/test/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/DefaultSerializerTest.java, line 31 at r3 (raw file):
import org.apache.hadoop.hbase.Cell; import org.apache.hadoop.hbase.CellScanner; import org.apache.hadoop.hbase.client.*;
* import.
This is two allow for easier testing and to make sure that the user can provide their `Appender` without having to modify the mirroring client.
dopiera
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 11 files reviewed, 2 unresolved discussions (waiting on @mwalkiewicz and @prawilny)
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Serializer.java, line 23 at r1 (raw file):
Previously, prawilny (Adam Czajkowski) wrote…
Not in all of the files
Ooops, fixed now.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/Serializer.java, line 69 at r1 (raw file):
Previously, mwalkiewicz (Mateusz Walkiewicz) wrote…
Maybe I'm wrong, but from what I have seen in your tests and somwhere in documentation, enums are serialized as enum constant name (e.g.
"PUT"), not as their contents ({"name": "put"}etc.), unless explicitly annotated to be handled otherwise. Or are we talking about different serializations?
You're right, it is smarter than I thought.
bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/test/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/faillog/DefaultSerializerTest.java, line 31 at r3 (raw file):
Previously, mwalkiewicz (Mateusz Walkiewicz) wrote…
*import.
Done.
mwalkiewicz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r4, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @dopiera)
Failed secondary mutations are written to disk in JSON format, which the user can parse programmatically or inspect visually. Each failure is logged as a separate line, which makes it compatible with solutions like logstash.
* Initial implementation of hbase1.x Mirroring Client (#1) * Initial implementation of hbase1.x Mirroring Client * MirroringTable and AsyncTableWrapper (#4) * MirroringTable and AsyncTableWrapper - Initial implementation of MirroringTable. - Wrapper around HBase 1.x Table enabling scheduling asynchronous operations. - Asynchronous verification of get/exists results. * Add missing licence header to TestResultComparator (#6) * Setup running tests in GitHub Actions (#7) * Asynchronous ResultScanner (#5) * Asynchronous ResultScanner * Implement faillog.Appender (#2) `faillog.Appender` is the lowermost component of the infrastructure for dumping failed mutations in the `MirroringClient`. `faillog/README.md` explains the design decisions in a bit more detail. `faillog.Appender` essentially logs arbitrary data asynchronously, through a separate thread reading from a bounded buffer. * ListenableReferenceCounter (#8) * Count references in asynchronous tasks before closing Table/Scanner. * Flow controller (#10) * Flow Controller * Flow Control strategy based on single requests queue * Mirroring table: writes (#12) * Add missing condition in result comparator (#15) * Add more mirroring config opitons (#16) * MismatchDetector implementation. * FlowControllerStrategy implementation. * Maximal number of outstanding requests used by FlowControllerStrategy. * Primary/Secondary connection implementation option now accepts "default" value which can be used when default HBase Connection implementation should be used by MirroringConnection. * Make AsyncTableWrapper ListenableCloseable (#20) Implements our standard interface for objects that can run callbacks after asynchronous close to simplify reference counting of MirroringTable. * MirroringConnection: use new config options (#17) * MirroringResultScanner improvements (#18) * Count references to MirroringResultScanner Current implementation of MirroringResultScanner doesn't count verificaiton requests that it has scheduled and allows to close instances while verificaiton requests are in-flight. This causes lost verifications. This PR fixes this issue by counting references to MirroringResultScanner instances when scheduling verification requests. Moreover, ListenableClosable interface is implemented for consistency with other classes that use this scheme, because now the MirroringResultScanner instances will be closed asynchronously, when all scheduled requests have finished. * MirroringTable: count references to scanners and secondary wrapper (#21) Current MirroringTable implementaion does not count its references held by MirroringResultScanners and SecondaryAsyncWrapper, thus MirroringConnection consideres it closed before we are sure that all asynchronous operations have completed. This PR adds correct reference counting of MirroringTable based on work done in previously merged PRs. * Result Scanner - ensure verification ordering (#22) Current implementation assumes that next() operations on primary and secondary scanners are called in the same order and uses this assumption to match results for verification. However, next()s on secondary database are called asynchronusly and their order is not defined, which causes invalid mismatch reports. This PR fixes this problem by placing data to be verified - results of next()s called on primary scanner - and details of next() call - number of requested elements - call on a queue. Each asynchronous call to next() is synchonized and pops a single element from that queue. Appropriate next is called based on number of requested elements. Then results of that request and results from the queue are verified. This ensures that results of next()s passed to verification are correctly matched and ordered. * Integration tests (#23) HBase 1.x integration tests * Add trace logging. (#24) * Estimate memory overhead in RequestResourceDescription (#25) * Tests: extract executor service to TestRule (#26) Extract executor service utilities into TestRule to facilitate code reuse in other test classes. * Integration tests: read configuration from xml files * MirroringBufferedMutator * MirroringBufferedMutator: integration tests (#9) * Fix error introduced in rebase (#11) * Obtain resource reservation before scheduling secondary calls (#4) Fixes a bug when secondary database request Future was created before obtaining resources from FlowController. * Integration tests - MirroringTable operations (#10) * MirroringAsyncConfiguration (#5) Add configuration class to be used by MirroringAsyncConnection. * SecondaryWriteErrorConsumer in MirroringTable (#15) Use SecondaryWriteErrorConsumer to handle write errors in secondary database in MirroringTable's writes. * Use Put to implement Increment and Append (#16) * refactor: extract functions using reflection into package utils.reflection * refactor: extract BatchHelpers into utils Extract common part of batch() helpers into a class and add Predicate argument to nested classes' constructors making it possible to reuse the code in 2.x client. * feat: Initial implementation of a subset of asynchronous HBase 2.x Mirroring Client Contains basic implementation of MirroringAsyncConnection and MirroringAsyncTable. * refactor: extract FlowController's request cancellation into a method * fix: Increment ITs fail with Bigtable as primary (#21) We were setting timerange on Increment objects used in integration tests without any reason and Bigtable doesn't support this operation. Setting timerange in ITs was removed. * fix: RequestScheduling should handle rejected resource reservations (#24) Custom FlowControlerStrategy implementations might, contrary to the default implementation, resolve reservation requests with exception, what we should handle by not performing the action that had to acquire the resources. * feat: Add OpenCensus tracing and metrics. (#14) * fix: make BatchHelpers skip verification of empty read results BatchHelpers provides error handling of batch() when there may be some partial results. Before the commit, matching successful reads were redundantly verified if there were none of them. This commit brings back the behaviour from up to 5a29253: when there are no successful matching reads, a MismatchDetector isn't called on empty arrays. * refactor: make MirroringAsync{Connection,Table} use SecondaryWriteErrorConsumerWithMetrics BatchHelpers require using SecondaryWriteErrorConsumerWithMetrics API. * refactor: make AsyncRequestScheduling accept CompletableFuture<ResourceReservation> instead of ResourceReservation This change is split off from commit introducing MirroringAsyncTable#batch() * feat: implement batch() in MirroringAsyncTable Implementation of MirroringAsyncTable's batch() and MirroringAsyncTable's methods such as get(List<Get) and put(List<Put>) using it. * feat: implement failed mutations log (#19) Failed secondary mutations are written to disk in JSON format, which the user can parse programmatically or inspect visually. Each failure is logged as a separate line, which makes it compatible with solutions like logstash. * refactor: split SplitBatchResponse (#40) SplitBatchResponse was refactored into two parts: splitting into reads/writes and failed/successful. This makes the code simpler and easier to maintain. * refactor: extract helper methods in tests (#48) * refactor: remove redundant writeWithControlFlow argument * feat: copy HBase operations' input lists (#57) * refactor: remove redundant field from MirroringConnection (#55) * feat: verification with sampling (#28) * fix: mirror Increment/Append in batch() using Put. (#47) * refactor: Move HBaseOperation into WriteOperationInfo (#68) * refactor: remove redundant parameter from scheduleWriteWithControlFlow (#69) * fix: integration tests - fix build (#70) * fix: count references to batch operations (#63) * fix: close underlying connections when MirroringConnection is closed (#49) * refactor: fix IDE warnings in MirroringAsyncTable test (#64) * fix: integration tests - check if write errors were reported (#71) * feat: make SecondaryWriteErrorConsumer accept error cause and operation (#65) * fix: do not call callbacks with lock held (#53) * refactor: use AccumulatedExceptions where appropriate (#54) * fix: fix key used in verification sampling ITs (#77) * feat: use faillog for handling write errors (#66) * refactor: add utilities for Futures (FutureUtils) * feat: defer closing connections until async operations complete (#37) Mirroring client schedules asynchronous operations - to mirror the mutations and to verify reads. Before this PR, closing the MirroringAsyncConnection would result in closing the underlying connections immediately. This made the pending asynchronous operations fail. This PR defers closing the underlying connections until all pending operations complete. It is achieved by reference counting the operations. * feat: implement AsyncTableBuilder (#42) * feat: implement MirroringAsyncTable#checkAndMutate (#43) * fix: Implement single Append and Increment as Put (#38) * refactor: simplify SecondaryWriteErrorConsumer API (#78) * feat: concurrent writes in MirroringTable (#79) * test: fix failing concurrent write test (#120) * refactor: renames and moves in RequestScheduling (#87) * wip: handover session comments Co-authored-by: Mateusz Walkiewicz <mwalkiewicz@unoperate.com> Co-authored-by: Adam Czajkowski <prawilny@unoperate.com>
* Initial implementation of hbase1.x Mirroring Client (#1) * Initial implementation of hbase1.x Mirroring Client * MirroringTable and AsyncTableWrapper (#4) * MirroringTable and AsyncTableWrapper - Initial implementation of MirroringTable. - Wrapper around HBase 1.x Table enabling scheduling asynchronous operations. - Asynchronous verification of get/exists results. * Add missing licence header to TestResultComparator (#6) * Setup running tests in GitHub Actions (#7) * Asynchronous ResultScanner (#5) * Asynchronous ResultScanner * Implement faillog.Appender (#2) `faillog.Appender` is the lowermost component of the infrastructure for dumping failed mutations in the `MirroringClient`. `faillog/README.md` explains the design decisions in a bit more detail. `faillog.Appender` essentially logs arbitrary data asynchronously, through a separate thread reading from a bounded buffer. * ListenableReferenceCounter (#8) * Count references in asynchronous tasks before closing Table/Scanner. * Flow controller (#10) * Flow Controller * Flow Control strategy based on single requests queue * Mirroring table: writes (#12) * Add missing condition in result comparator (#15) * Add more mirroring config opitons (#16) * MismatchDetector implementation. * FlowControllerStrategy implementation. * Maximal number of outstanding requests used by FlowControllerStrategy. * Primary/Secondary connection implementation option now accepts "default" value which can be used when default HBase Connection implementation should be used by MirroringConnection. * Make AsyncTableWrapper ListenableCloseable (#20) Implements our standard interface for objects that can run callbacks after asynchronous close to simplify reference counting of MirroringTable. * MirroringConnection: use new config options (#17) * MirroringResultScanner improvements (#18) * Count references to MirroringResultScanner Current implementation of MirroringResultScanner doesn't count verificaiton requests that it has scheduled and allows to close instances while verificaiton requests are in-flight. This causes lost verifications. This PR fixes this issue by counting references to MirroringResultScanner instances when scheduling verification requests. Moreover, ListenableClosable interface is implemented for consistency with other classes that use this scheme, because now the MirroringResultScanner instances will be closed asynchronously, when all scheduled requests have finished. * MirroringTable: count references to scanners and secondary wrapper (#21) Current MirroringTable implementaion does not count its references held by MirroringResultScanners and SecondaryAsyncWrapper, thus MirroringConnection consideres it closed before we are sure that all asynchronous operations have completed. This PR adds correct reference counting of MirroringTable based on work done in previously merged PRs. * Result Scanner - ensure verification ordering (#22) Current implementation assumes that next() operations on primary and secondary scanners are called in the same order and uses this assumption to match results for verification. However, next()s on secondary database are called asynchronusly and their order is not defined, which causes invalid mismatch reports. This PR fixes this problem by placing data to be verified - results of next()s called on primary scanner - and details of next() call - number of requested elements - call on a queue. Each asynchronous call to next() is synchonized and pops a single element from that queue. Appropriate next is called based on number of requested elements. Then results of that request and results from the queue are verified. This ensures that results of next()s passed to verification are correctly matched and ordered. * Integration tests (#23) HBase 1.x integration tests * Add trace logging. (#24) * Estimate memory overhead in RequestResourceDescription (#25) * Tests: extract executor service to TestRule (#26) Extract executor service utilities into TestRule to facilitate code reuse in other test classes. * Integration tests: read configuration from xml files * MirroringBufferedMutator * MirroringBufferedMutator: integration tests (#9) * Fix error introduced in rebase (#11) * Obtain resource reservation before scheduling secondary calls (#4) Fixes a bug when secondary database request Future was created before obtaining resources from FlowController. * Integration tests - MirroringTable operations (#10) * MirroringAsyncConfiguration (#5) Add configuration class to be used by MirroringAsyncConnection. * SecondaryWriteErrorConsumer in MirroringTable (#15) Use SecondaryWriteErrorConsumer to handle write errors in secondary database in MirroringTable's writes. * Use Put to implement Increment and Append (#16) * refactor: extract functions using reflection into package utils.reflection * refactor: extract BatchHelpers into utils Extract common part of batch() helpers into a class and add Predicate argument to nested classes' constructors making it possible to reuse the code in 2.x client. * feat: Initial implementation of a subset of asynchronous HBase 2.x Mirroring Client Contains basic implementation of MirroringAsyncConnection and MirroringAsyncTable. * refactor: extract FlowController's request cancellation into a method * fix: Increment ITs fail with Bigtable as primary (#21) We were setting timerange on Increment objects used in integration tests without any reason and Bigtable doesn't support this operation. Setting timerange in ITs was removed. * fix: RequestScheduling should handle rejected resource reservations (#24) Custom FlowControlerStrategy implementations might, contrary to the default implementation, resolve reservation requests with exception, what we should handle by not performing the action that had to acquire the resources. * feat: Add OpenCensus tracing and metrics. (#14) * fix: make BatchHelpers skip verification of empty read results BatchHelpers provides error handling of batch() when there may be some partial results. Before the commit, matching successful reads were redundantly verified if there were none of them. This commit brings back the behaviour from up to 5a29253: when there are no successful matching reads, a MismatchDetector isn't called on empty arrays. * refactor: make MirroringAsync{Connection,Table} use SecondaryWriteErrorConsumerWithMetrics BatchHelpers require using SecondaryWriteErrorConsumerWithMetrics API. * refactor: make AsyncRequestScheduling accept CompletableFuture<ResourceReservation> instead of ResourceReservation This change is split off from commit introducing MirroringAsyncTable#batch() * feat: implement batch() in MirroringAsyncTable Implementation of MirroringAsyncTable's batch() and MirroringAsyncTable's methods such as get(List<Get) and put(List<Put>) using it. * feat: implement failed mutations log (#19) Failed secondary mutations are written to disk in JSON format, which the user can parse programmatically or inspect visually. Each failure is logged as a separate line, which makes it compatible with solutions like logstash. * refactor: split SplitBatchResponse (#40) SplitBatchResponse was refactored into two parts: splitting into reads/writes and failed/successful. This makes the code simpler and easier to maintain. * refactor: extract helper methods in tests (#48) * refactor: remove redundant writeWithControlFlow argument * feat: copy HBase operations' input lists (#57) * refactor: remove redundant field from MirroringConnection (#55) * feat: verification with sampling (#28) * fix: mirror Increment/Append in batch() using Put. (#47) * refactor: Move HBaseOperation into WriteOperationInfo (#68) * refactor: remove redundant parameter from scheduleWriteWithControlFlow (#69) * fix: integration tests - fix build (#70) * fix: count references to batch operations (#63) * fix: close underlying connections when MirroringConnection is closed (#49) * refactor: fix IDE warnings in MirroringAsyncTable test (#64) * fix: integration tests - check if write errors were reported (#71) * feat: make SecondaryWriteErrorConsumer accept error cause and operation (#65) * fix: do not call callbacks with lock held (#53) * refactor: use AccumulatedExceptions where appropriate (#54) * fix: fix key used in verification sampling ITs (#77) * feat: use faillog for handling write errors (#66) * refactor: add utilities for Futures (FutureUtils) * feat: defer closing connections until async operations complete (#37) Mirroring client schedules asynchronous operations - to mirror the mutations and to verify reads. Before this PR, closing the MirroringAsyncConnection would result in closing the underlying connections immediately. This made the pending asynchronous operations fail. This PR defers closing the underlying connections until all pending operations complete. It is achieved by reference counting the operations. * feat: implement AsyncTableBuilder (#42) * feat: implement MirroringAsyncTable#checkAndMutate (#43) * fix: Implement single Append and Increment as Put (#38) * refactor: simplify SecondaryWriteErrorConsumer API (#78) * feat: concurrent writes in MirroringTable (#79) * test: fix failing concurrent write test (#120) * refactor: renames and moves in RequestScheduling (#87) * wip: handover session comments Co-authored-by: Mateusz Walkiewicz <mwalkiewicz@unoperate.com> Co-authored-by: Adam Czajkowski <prawilny@unoperate.com>
This PR implements the failed mutation log. Its design is documented in the
README.mdin thefaillogpackage.It uses the existing
faillog.Appenderto write data to files. Serialization is performed by dumping the mutations and their context to JSON. The part of the JSON corresponding to the mutation itself is obtained by utilizing HBase client internals, which convert the relevantMutations to protocol buffers. These protocol buffers are then translated into JSON. The translation is tricky due to protocol buffers being shaded - please refer to the class description for more details.The
Loggerclass is to be plugged into theMirroringClientin another PR.This change is