
data tokenization refactoring #10

Merged
ilya-kozyrev merged 1 commit into ProtegrityIntegrationTemplate from ProtegrityIntegrationTemplate_refactoring
Mar 23, 2021

Conversation

@Nuzhdina-Elena

Refactoring moved from BeamFromDemo.
DataTokenizationTest fixed.


@ilya-kozyrev ilya-kozyrev left a comment


LGTM

@ilya-kozyrev ilya-kozyrev merged commit 5362a0f into ProtegrityIntegrationTemplate Mar 23, 2021
* Logger for class.
*/
private static final Logger LOG = LoggerFactory.getLogger(DSGTokenizationFn.class);
private static final Logger LOG = LoggerFactory.getLogger(BigQueryIO.class);

Seems like it's a wrong class for logger
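
To illustrate the point of this comment, here is a minimal sketch of a logger bound to its enclosing class. It is not the PR's code: the class below is a bare stand-in that only reuses the name `DSGTokenizationFn`, and `java.util.logging` is swapped in for SLF4J so the snippet has no external dependencies.

```java
import java.util.logging.Logger;

// Bare stand-in for the class under review (assumption: the real class has other members).
class DSGTokenizationFn {
  // The logger is named after the enclosing class, so every message is
  // attributed to DSGTokenizationFn rather than to an unrelated class.
  static final Logger LOG = Logger.getLogger(DSGTokenizationFn.class.getName());

  static void logFailure(String uri) {
    LOG.warning("Send to DSG '" + uri + "' failed");
  }
}
```

With SLF4J the same idea would presumably be expressed by passing the enclosing class to `LoggerFactory.getLogger`, as the first of the two quoted logger lines does.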

Comment on lines -301 to -306
if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
  LOG.error("Send to DSG '{}' failed with '{}'",
      this.dsgURI,
      response.getStatusLine());
}


Probably, you should try rebasing on master - this is a recent fix from Mikhail, we shouldn't delete it 🙂

return pipeline
/*
* Step 3: Write jsons to dead-letter gcs that were successfully processed.
* Step 1: Read CSV file(s) from Cloud Storage using {@link CsvConverters.ReadCsv}.

I think this comment is redundant

Comment on lines +184 to +191
PCollection<GenericRecord> genericRecords = pipeline.apply(
    "ReadAvroFiles",
    AvroIO.readGenericRecords(avroSchema).from(options.getInputGcsFilePattern()));
return genericRecords
    .apply(
        "GenericRecordToRow", MapElements.into(TypeDescriptor.of(Row.class))
            .via(AvroUtils.getGenericRecordToRowFunction(beamSchema)))
    .setCoder(RowCoder.of(beamSchema));

Let's merge these 2 parts into 1, the same way it is done in the writeAvro function.

    return writeCsv(input, schema.getFieldNames());
  default:
    throw new IllegalStateException(
        "No valid format for output data is provided. Please, choose JSON or CSV or AVRO.");

Please, change the message to this:
"No valid format for output data is provided. Please, choose JSON, CSV, or AVRO."
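
A minimal sketch of how the `default` branch might read with the requested wording. The `OutputFormat` enum, the `OutputFormatCheck` class and the `describe` method are hypothetical names introduced only for illustration; the PR's actual switch dispatches to write functions such as `writeCsv`, which are not reproduced here.

```java
// Hypothetical enum: the real format type used by the pipeline is not shown in the excerpt.
enum OutputFormat { JSON, CSV, AVRO, UNKNOWN }

class OutputFormatCheck {
  // Message text exactly as requested in the review comment.
  static final String MESSAGE =
      "No valid format for output data is provided. Please, choose JSON, CSV, or AVRO.";

  // Returns a short label for supported formats and fails fast for anything else.
  static String describe(OutputFormat format) {
    switch (format) {
      case JSON:
        return "json";
      case CSV:
        return "csv";
      case AVRO:
        return "avro";
      default:
        throw new IllegalStateException(MESSAGE);
    }
  }
}
```

Keeping the message in one constant also makes it straightforward to assert on in a unit test.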


}
}
}

Please also don't delete the log4j.properties file.

Amar3tto pushed a commit that referenced this pull request Mar 27, 2025
* Type handler (#10)

* Added Type Handler for Cassandra
* Added UT

* PR review comments fixes (#13)

* Refactored the code

* Handle Minor fixes

* Casting fixes

PR review comments fixed

* Type handler review cm (#14)

* updated comments

* Add additional missing case as well (#16)

* Resolve review comments and added test cases for null, min, max and incorrect value format

* Handle Exception

* Handle INet

* Handle Code Format

* Added format and removed raw import

* Added Null handling

* Added test cases for all remaining of cassandra datatype

* Address the formatter case

* Updated Comment

* mvn spotless:apply fixes

* Type handler ut fixes (#17)

* removed unwanted <> from comments

* Added Fixes for String format

* Fix spotapply

* Type handler ut fixes (#18)

* removed unwanted <> from comments

* Added Fixes for String format

* Fix spotapply

* Fixed the review comments and added a test with a non-zero offset and a happy-path test

* Handle Spotless fixes

---------

Co-authored-by: Narendra Rajput <narendra.rajput@ollion.com>

* Handle Inet from GCP package and also added Long Cast for List of Long

* Fix UT and Checklist style

* Handle Timestamp to Instant

* Fix UT and consolidate Instant Parsing

* Handle fix related to date conversion

* improved date parsing

* Fixed review comments and coverage (#23)

* Fixed review comments and increased coverage

* ut_coverage_fixes

* Fixed UT for Inetaddress

---------

Co-authored-by: taherkl <taher.lakdawala@ollion.com>
Co-authored-by: pawankashyapollion <v-pawan.kumar@ollion.com>
Co-authored-by: Narendra Rajput <narendra.rajput@ollion.com>

3 participants