feat(cassandra): Enable case-sensitive support#25690
Conversation
44c3dcc to
73ec9cd
Compare
agrawalreetika
left a comment
There was a problem hiding this comment.
Thanks for making the changes and adding the support.
I think currently, schemas are still getting converted to lowercase? https://github.com/bibith4/presto/blob/73ec9cd4a49982120eb3af5e781a943baf755205/presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraMetadata.java#L101
Please verify and make the changes along with the tests accordingly.
5dd62cf to
c81c097
Compare
@agrawalreetika Added the requested changes. Please check |
| session.execute("CREATE KEYSPACE test_keyspace WITH REPLICATION = {'class':'SimpleStrategy', 'replication_factor': 1}"); | ||
| assertContainsEventually(() -> getQueryRunner().execute("SHOW SCHEMAS FROM cassandra"), resultBuilder(getSession(), createUnboundedVarcharType()) | ||
| .row("test_keyspace") | ||
| .build(), new Duration(1, MINUTES)); |
There was a problem hiding this comment.
Lets add schema creation for same KEYSPACE in upper case as well and then check for lowercase and upper cae for in testShowSchemas
There was a problem hiding this comment.
Please add the documenation details as well for new config - https://prestodb.io/docs/current/connector/cassandra.html#configuration-properties
There was a problem hiding this comment.
Please add the documenation details as well for new config - https://prestodb.io/docs/current/connector/cassandra.html#configuration-properties
Yes, please add documentation. You can use the MySQL doc as a model where the case-sensitive-name-matching property is documented in the last row of the table.
There was a problem hiding this comment.
@agrawalreetika @steveburnett updated the document with new config property. Please check
There was a problem hiding this comment.
Lets add schema creation for same KEYSPACE in upper case as well and then check for lowercase and upper cae for in testShowSchemas
@agrawalreetika Added the required changes. Please check
c81c097 to
6b547ef
Compare
steveburnett
left a comment
There was a problem hiding this comment.
Thanks for the docs! Minor nits.
6b547ef to
58971f4
Compare
steveburnett
left a comment
There was a problem hiding this comment.
LGTM! (docs)
Pull updated branch, new local doc build, looks good. Thanks!
|
@jaystarshot Could you please review this? |
imjalpreet
left a comment
There was a problem hiding this comment.
Thanks, @bibith4. I had a first look and have a few suggestions/clarifications.
Overall, I have one question. In the NativeCassandraSession class, I still see some case-insensitive matching for schema names present. Is that expected? If not, we should fix that.
| keyspace.getTables().stream(), | ||
| keyspace.getMaterializedViews().stream()) | ||
| .filter(table -> table.getName().equalsIgnoreCase(caseInsensitiveTableName)) | ||
| .filter(table -> caseSensitiveNameMatchingEnabled ? table.getName().equals(caseInsensitiveTableName) : table.getName().equalsIgnoreCase(caseInsensitiveTableName)) |
There was a problem hiding this comment.
The variable name caseInsensitiveTableName is confusing and doesn't sound accurate now. I am assuming it will not be case-insensitive, only then can we match it case-sensitively. Please update it accordingly wherever it is used.
There was a problem hiding this comment.
Updated the variable name. Please check
|
|
||
| private static void checkColumnNames(List<ColumnMetadata> columns) | ||
| { | ||
| Map<String, ColumnMetadata> lowercaseNameToColumnMap = new HashMap<>(); |
There was a problem hiding this comment.
Same here, we need to update the map name, it is not just lowercaseNameToColumnMap anymore.
There was a problem hiding this comment.
updated the variable name . Please check
5756fdf to
44e380d
Compare
44e380d to
6fad5f5
Compare
@imjalpreet Updated the logic to support case sensitivity for schemas as well. Please check |
imjalpreet
left a comment
There was a problem hiding this comment.
@bibith4, some changes appear to be in the wrong commit. Please check and update the commits to include only relevant changes. It would really help in the review process. Thank you!
@imjalpreet As discussed created a seperate PR for Enable and fix all Cassandra connector tests in CI(#26022 (comment)). |
aa7e9d7 to
a1fbb34
Compare
|
@imjalpreet Updated the PR to include only mixed-case support for the Cassandra change. Can you please take a look. |
| session.execute("DROP KEYSPACE keyspace_1"); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
@bibith4 Can you add a test method for ALTER operations as well?
There was a problem hiding this comment.
@agrawalreetika Cassandra does not support ALTER operations
| try { | ||
| for (String tableName : cassandraSession.getCaseSensitiveTableNames(schemaName)) { | ||
| tableNames.add(new SchemaTableName(schemaName, tableName.toLowerCase(ENGLISH))); | ||
| String finalTableName = normalizeIdentifier(session, tableName); |
There was a problem hiding this comment.
nit:
| String finalTableName = normalizeIdentifier(session, tableName); | |
| String normalizedTableName = normalizeIdentifier(session, tableName); |
There was a problem hiding this comment.
@imjalpreet Changed as per suggestion. Please check
| for (CassandraColumnHandle columnHandle : table.getColumns()) { | ||
| columnHandles.put(CassandraCqlUtils.cqlNameToSqlName(columnHandle.getName()).toLowerCase(ENGLISH), columnHandle); | ||
| String columnName = CassandraCqlUtils.cqlNameToSqlName(columnHandle.getName()); | ||
| String finalColumnName = normalizeIdentifier(session, columnName); |
There was a problem hiding this comment.
| String finalColumnName = normalizeIdentifier(session, columnName); | |
| String normalizedColumnName = normalizeIdentifier(session, columnName); |
There was a problem hiding this comment.
@imjalpreet Changed as per suggestion. Please check
| ImmutableMap.Builder<String, ColumnHandle> columnHandles = ImmutableMap.builder(); | ||
| for (CassandraColumnHandle columnHandle : table.getColumns()) { | ||
| columnHandles.put(CassandraCqlUtils.cqlNameToSqlName(columnHandle.getName()).toLowerCase(ENGLISH), columnHandle); | ||
| String columnName = CassandraCqlUtils.cqlNameToSqlName(columnHandle.getName()); |
There was a problem hiding this comment.
nit: static import for cqlNameToSqlName
There was a problem hiding this comment.
@imjalpreet Changed as per suggestion. Please check
| if (result != null) { | ||
| throw new PrestoException( | ||
| NOT_SUPPORTED, | ||
| format("More than one keyspace has been found for the case insensitive schema name: %s -> (%s, %s)", | ||
| caseInsensitiveSchemaName, result.getName(), keyspace.getName())); | ||
| format("More than one keyspace has been found for the schema name: %s -> (%s, %s)", | ||
| caseSensitiveSchemaName, result.getName(), keyspace.getName())); | ||
| } |
There was a problem hiding this comment.
Do we need this check when caseSensitiveNameMatchingEnabled is true? I think it can only happen if we match case-insensitively.
There was a problem hiding this comment.
@imjalpreet Changed to check only if caseSensitiveNameMatchingEnabled is false
| keyspace.getTables().stream(), | ||
| keyspace.getMaterializedViews().stream()) | ||
| .filter(table -> table.getName().equalsIgnoreCase(caseInsensitiveTableName)) | ||
| .filter(table -> caseSensitiveNameMatchingEnabled ? table.getName().equals(caseSensitiveTableName) : table.getName().equalsIgnoreCase(caseSensitiveTableName)) |
There was a problem hiding this comment.
nit: maybe we can make the matchesSchemaName method generic and use it for both table and schema checks.
There was a problem hiding this comment.
@imjalpreet Changed to make the method generic
| if (columnMap.containsKey(columnNameKey)) { | ||
| throw new PrestoException( | ||
| NOT_SUPPORTED, | ||
| format("More than one column has been found for the case insensitive column name: %s -> (%s, %s)", | ||
| lowercaseName, lowercaseNameToColumnMap.get(lowercaseName).getName(), column.getName())); | ||
| columnNameKey, columnMap.get(columnNameKey).getName(), column.getName())); |
There was a problem hiding this comment.
Same question here. Do we need this when caseSensitiveNameMatchingEnabled is true?
There was a problem hiding this comment.
@imjalpreet Changed to check only if caseSensitiveNameMatchingEnabled is false
| { | ||
| private CassandraServer server; | ||
| private CassandraSession session; | ||
| private static final String KEYSPACE = "test_connetor"; |
There was a problem hiding this comment.
| private static final String KEYSPACE = "test_connetor"; | |
| private static final String KEYSPACE = "test_connector"; |
| assertQueryFails("CREATE TABLE test_connector.TEST_CREATEAS_FAIL_Join AS SELECT c.custkey, o.orderkey FROM " + | ||
| "tpch.customer c INNER JOIN tpch.ORDERS1 o ON c.custkey = o.custkey WHERE c.mktsegment = 'BUILDING'", "Table cassandra.tpch.ORDERS1 does not exist"); //failure scenario since tpch.ORDERS1 doesn't exist |
There was a problem hiding this comment.
What is the significance of this test? Shouldn't we test with just ORDERS rather than ORDERS1 as lowercase orders exists?
There was a problem hiding this comment.
@imjalpreet Corrected to use ORDERS. Please check
| "tpch.customer Cus INNER JOIN tpch.orders Ord ON Cus.custkey = Ord.custkey WHERE Cus.mktsegment = 'BUILDING'"); | ||
| assertTrue(getQueryRunner().tableExists(session, "Test_CreateAs_Mixed_Join")); | ||
| } | ||
| finally { |
There was a problem hiding this comment.
nit: you missed dropping TEST_CREATEAS_Join
There was a problem hiding this comment.
@imjalpreet added drop statement for TEST_CREATEAS_Join
| assertQuery("SELECT COUNT(*) FROM test_connetor.test_select", "VALUES 2"); | ||
| } | ||
| finally { | ||
| getQueryRunner().execute(session, "DROP TABLE IF EXISTS TEST_SELECT"); |
There was a problem hiding this comment.
We created lowercase TEST_SELECT
a1fbb34 to
6dfb570
Compare
imjalpreet
left a comment
There was a problem hiding this comment.
Thank you, @bibith4.
Changes LGTM now, last few nits.
| format("More than one keyspace has been found for the case insensitive schema name: %s -> (%s, %s)", | ||
| caseInsensitiveSchemaName, result.getName(), keyspace.getName())); | ||
| format("More than one keyspace has been found for the schema name: %s -> (%s, %s)", | ||
| caseSensitiveSchemaName, result.getName(), keyspace.getName())); |
There was a problem hiding this comment.
nit: I think we should print the lowercased schema name in the error message
| format("More than one table has been found for the case insensitive table name: %s -> (%s)", | ||
| caseInsensitiveTableName, tableNames)); | ||
| caseSensitiveTableName, tableNames)); | ||
| } |
There was a problem hiding this comment.
Missed this the last time, similar to schema and column names, this is also only needed when caseSensitiveNameMatchingEnabled is false
There was a problem hiding this comment.
Sorry, ignore this. Its already handled.
There was a problem hiding this comment.
But print the lowercased table name in error message
| private static void checkColumnNames(List<ColumnMetadata> columns) | ||
| { | ||
| Map<String, ColumnMetadata> lowercaseNameToColumnMap = new HashMap<>(); | ||
| Map<String, ColumnMetadata> columnMap = new HashMap<>(); |
There was a problem hiding this comment.
nit: maybe this line change can now be reverted
imjalpreet
left a comment
There was a problem hiding this comment.
Thank you, @bibith4.
LGTM.
| connectorProperties = new HashMap<>(ImmutableMap.copyOf(connectorProperties)); | ||
| connectorProperties.putIfAbsent("cassandra.contact-points", server.getHost()); | ||
| connectorProperties.putIfAbsent("cassandra.native-protocol-port", Integer.toString(server.getPort())); | ||
| connectorProperties.putIfAbsent("cassandra.allow-drop-table", "true"); | ||
|
|
There was a problem hiding this comment.
Are these related to the case sensitivity changes?
There was a problem hiding this comment.
These were already being used. The change is to add them to the newly introduced connectorProperties map.
Description
Enable case sensitivity support in Cassandra connector
Motivation and Context
Cassandra connector always treated table and column names as lowercase. With this change, the connector can preserve case sensitivity, allowing the creation of tables and columns with the same name but different casing, depending on the configuration setting.
Impact
Supports creating tables and columns with the same name in different cases
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.