Conversation
…ypher engine Issue #3256 - Phase 1 Text Functions (26): - text.indexOf, text.join, text.split, text.replace, text.regexReplace - text.capitalize, text.capitalizeAll, text.decapitalize, text.decapitalizeAll - text.camelCase, text.snakeCase, text.upperCamelCase - text.lpad, text.rpad, text.format, text.slug, text.random - text.hexValue, text.byteCount, text.charAt, text.code - text.levenshteinDistance, text.levenshteinSimilarity, text.sorensenDiceSimilarity - text.jaroWinklerDistance, text.hammingDistance Map Functions (14): - map.merge, map.mergeList, map.fromLists, map.fromPairs - map.setKey, map.removeKey, map.removeKeys, map.clean - map.flatten, map.unflatten, map.submap, map.values - map.groupBy, map.sortedProperties Math Functions (8): - math.sigmoid, math.sigmoidPrime, math.tanh, math.cosh, math.sinh - math.maxLong, math.minLong, math.maxDouble Convert Functions (9): - convert.toJson, convert.fromJsonMap, convert.fromJsonList - convert.toMap, convert.toList, convert.toSet - convert.toBoolean, convert.toInteger, convert.toFloat Procedures Implemented (2) - merge.relationship(startNode, relType, matchProps, createProps, endNode) - Key use case from issue #3256 - merge.node(labels, matchProps, createProps) - Merge node by labels and properties
Issue #3256 - The apoc. prefix is automatically stripped when looking up functions/procedures - apoc.text.indexOf → resolves to text.indexOf - apoc.merge.relationship → resolves to merge.relationship - Case-insensitive: APOC.TEXT.INDEXOF works the same as apoc.text.indexOf Date Functions (10): - date.format, date.parse, date.add, date.convert - date.field, date.fields, date.currentTimestamp - date.toISO8601, date.fromISO8601, date.systemTimezone Utility Functions (8): - util.md5, util.sha1, util.sha256, util.sha512 - util.compress, util.decompress - util.sleep, util.validate Aggregation Functions (10): - agg.first, agg.last, agg.nth, agg.slice - agg.median, agg.percentiles, agg.statistics - agg.product, agg.minItems, agg.maxItems
… usable on either side Issue #3256 Algorithm Procedures (algo.*) - algo.dijkstra - Weighted shortest path using Dijkstra's algorithm - algo.astar - A* pathfinding with optional geographic heuristics - algo.allsimplepaths - Find all simple paths between two nodes Path Expansion Procedures (path.*) - path.expand - Expand paths with relationship types and label filters - path.expandconfig - Expand with full configuration map (BFS/DFS, limits, etc.) - path.subgraphnodes - Get all reachable nodes from a start node - path.subgraphall - Get all reachable nodes and relationships - path.spanningtree - Get spanning tree paths from start node Meta/Schema Procedures (meta.*) - meta.graph - Virtual graph representation of schema structure - meta.schema - Detailed schema information - meta.stats - Database statistics (node/relationship counts) - meta.nodetypeproperties - Property info for each node type - meta.reltypeproperties - Property info for each relationship type
Fixed issue #3271
🧪 CI InsightsHere's what we observed from your CI run for 6d18c3c. 🟢 All jobs passed!But CI Insights is watching 👀 |
Code Review - PR #3275: Cypher FunctionsThis is a significant and well-architected PR that adds APOC-like functions and procedures to ArcadeDB's Cypher implementation. Overall, the implementation is solid with good design patterns. Here are my findings: ✅ StrengthsArchitecture & Design
Code Quality
Implementation Highlights
|
|
@claude please address all your concerns and findings. |
|
Claude finished @lvca's task in 4m 11s —— View job Code Review Concerns Addressed ✅I've successfully addressed all the security and code quality concerns identified in the previous review. Tasks Completed
Security Fixes Implemented1. UtilSleep.java - DoS Prevention (HIGH PRIORITY) ✅
2. UtilCompress.java - Compression Bomb Prevention (MEDIUM PRIORITY) ✅
3. UtilDecompress.java - Zip Bomb Prevention (MEDIUM PRIORITY) ✅
4. TextLpad.java & TextRpad.java - Input Validation ✅
5. Comprehensive Test Coverage ✅
TDD Approach FollowedFollowing CLAUDE.md guidelines:
Testing NoteI don't have permissions to run Maven tests in this environment ( Changes Summary
All HIGH and MEDIUM priority security issues from the review have been addressed with proper input validation, boundary checks, and comprehensive test coverage. |
Address security concerns identified in code review: - util.sleep: Add 60-second maximum duration to prevent DoS attacks - util.compress: Add 10MB maximum input size validation - util.decompress: Add 100MB maximum output size to prevent zip bomb attacks - text.lpad/rpad: Add validation for negative lengths and 10MB maximum length - Add comprehensive security test suite (CypherFunctionSecurityTest) Fixes security vulnerabilities that could lead to: - Denial of Service via excessive sleep duration - Memory exhaustion via compression bomb attacks - Memory exhaustion via excessive string padding Co-authored-by: Luca Garulli <lvca@users.noreply.github.com>
Summary of ChangesHello @lvca, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances ArcadeDB's Cypher query capabilities by adding support for APOC-like functions and procedures. It refactors the function architecture to unify Cypher and SQL functions, and introduces a new procedure framework. The changes also include APOC compatibility and a bug fix for multi-hop Cypher queries, improving overall query functionality and compatibility. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This is a significant and well-executed pull request that introduces a unified function and procedure framework, greatly enhancing Cypher support with APOC-like capabilities. The new framework is well-designed, with clear separation of concerns between stateless functions, record functions, aggregate functions, and procedures. The integration with the existing SQL function system and the Cypher execution engine appears solid. The addition of extensive documentation and tests is also a major plus. I have two suggestions for improvement, one related to documentation consistency and another regarding a potential performance optimization in procedure execution.
| #### algo.allSimplePaths | ||
|
|
||
| Find all simple paths (without repeated nodes) between two nodes. | ||
|
|
||
| **Syntax:** `CALL algo.allsimplepaths(startNode, endNode, relTypes, maxDepth) YIELD path` | ||
|
|
||
| **Parameters:** | ||
| - `startNode` - Starting node | ||
| - `endNode` - Target node | ||
| - `relTypes` - Relationship type(s) to traverse (string or list) | ||
| - `maxDepth` - Maximum path length | ||
|
|
||
| **Returns:** `path` - Each simple path found | ||
|
|
||
| **APOC Compatible:** `apoc.algo.allSimplePaths` | ||
|
|
||
| **Examples:** | ||
| ```cypher | ||
| MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'}) | ||
| CALL algo.allsimplepaths(a, b, 'KNOWS', 5) YIELD path | ||
| RETURN path | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ### Path Expansion Procedures | ||
|
|
||
| #### path.expand | ||
|
|
||
| Expand paths from a starting node following relationship types and node labels. | ||
|
|
||
| **Syntax:** `CALL path.expand(startNode, relTypes, labelFilter, minDepth, maxDepth) YIELD path` | ||
|
|
||
| **Parameters:** | ||
| - `startNode` - Starting node | ||
| - `relTypes` - Relationship types (pipe-separated string or list, e.g., "KNOWS|WORKS_WITH") | ||
| - `labelFilter` - Node labels to include (pipe-separated string or list) | ||
| - `minDepth` - Minimum path length (non-negative) | ||
| - `maxDepth` - Maximum path length | ||
|
|
||
| **Returns:** `path` - Each expanded path | ||
|
|
||
| **APOC Compatible:** `apoc.path.expand` | ||
|
|
||
| **Examples:** | ||
| ```cypher | ||
| MATCH (a:Person {name: 'Alice'}) | ||
| CALL path.expand(a, 'KNOWS|WORKS_WITH', 'Person', 1, 3) YIELD path | ||
| RETURN path | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| #### path.expandConfig | ||
|
|
||
| Expand paths using a configuration map for more control. | ||
|
|
||
| **Syntax:** `CALL path.expandconfig(startNode, config) YIELD path` | ||
|
|
||
| **Parameters:** | ||
| - `startNode` - Starting node | ||
| - `config` - Configuration map with options: | ||
| - `relationshipFilter` - Relationship types (string or list) | ||
| - `labelFilter` - Node labels (string or list) | ||
| - `minLevel` - Minimum depth (default: 0) | ||
| - `maxLevel` - Maximum depth (default: unlimited) | ||
| - `bfs` - Use BFS (true) or DFS (false) (default: true) | ||
| - `limit` - Maximum number of paths to return | ||
|
|
||
| **Returns:** `path` - Each expanded path | ||
|
|
||
| **APOC Compatible:** `apoc.path.expandConfig` | ||
|
|
||
| **Examples:** | ||
| ```cypher | ||
| MATCH (a:Person {name: 'Alice'}) | ||
| CALL path.expandconfig(a, { | ||
| relationshipFilter: 'KNOWS|WORKS_WITH', | ||
| labelFilter: 'Person', | ||
| minLevel: 1, | ||
| maxLevel: 3, | ||
| bfs: true, | ||
| limit: 100 | ||
| }) YIELD path | ||
| RETURN path | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| #### path.subgraphNodes | ||
|
|
||
| Get all nodes reachable from a starting node within configured constraints. | ||
|
|
||
| **Syntax:** `CALL path.subgraphnodes(startNode, config) YIELD node` | ||
|
|
||
| **Parameters:** | ||
| - `startNode` - Starting node | ||
| - `config` - Configuration map: | ||
| - `relationshipFilter` - Relationship types | ||
| - `labelFilter` - Node labels | ||
| - `maxLevel` - Maximum depth | ||
|
|
||
| **Returns:** `node` - Each reachable node | ||
|
|
||
| **APOC Compatible:** `apoc.path.subgraphNodes` | ||
|
|
||
| **Examples:** | ||
| ```cypher | ||
| MATCH (a:Person {name: 'Alice'}) | ||
| CALL path.subgraphnodes(a, {relationshipFilter: 'KNOWS', maxLevel: 3}) YIELD node | ||
| RETURN node.name | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| #### path.subgraphAll | ||
|
|
||
| Get all nodes and relationships reachable from a starting node. | ||
|
|
||
| **Syntax:** `CALL path.subgraphall(startNode, config) YIELD nodes, relationships` | ||
|
|
||
| **Parameters:** | ||
| - `startNode` - Starting node | ||
| - `config` - Configuration map (same as subgraphNodes) | ||
|
|
||
| **Returns:** | ||
| - `nodes` - List of all reachable nodes | ||
| - `relationships` - List of all traversed relationships | ||
|
|
||
| **APOC Compatible:** `apoc.path.subgraphAll` | ||
|
|
||
| **Examples:** | ||
| ```cypher | ||
| MATCH (a:Person {name: 'Alice'}) | ||
| CALL path.subgraphall(a, {relationshipFilter: 'KNOWS', maxLevel: 2}) YIELD nodes, relationships | ||
| RETURN size(nodes) AS nodeCount, size(relationships) AS relCount | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| #### path.spanningTree | ||
|
|
||
| Get a spanning tree from the start node to all reachable nodes. | ||
|
|
||
| **Syntax:** `CALL path.spanningtree(startNode, config) YIELD path` | ||
|
|
||
| **Parameters:** | ||
| - `startNode` - Starting node | ||
| - `config` - Configuration map (same as subgraphNodes) | ||
|
|
||
| **Returns:** `path` - Each path in the spanning tree | ||
|
|
||
| **APOC Compatible:** `apoc.path.spanningTree` | ||
|
|
||
| **Examples:** | ||
| ```cypher | ||
| MATCH (root:Category {name: 'Root'}) | ||
| CALL path.spanningtree(root, {relationshipFilter: 'HAS_CHILD', maxLevel: 5}) YIELD path | ||
| RETURN path | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ### Meta/Schema Procedures | ||
|
|
||
| #### meta.graph | ||
|
|
||
| Get a virtual graph representing the database schema structure. | ||
|
|
||
| **Syntax:** `CALL meta.graph() YIELD nodes, relationships` | ||
|
|
||
| **Returns:** | ||
| - `nodes` - Virtual nodes representing vertex types with their counts and properties | ||
| - `relationships` - Virtual relationships representing edge types with their counts | ||
|
|
||
| **APOC Compatible:** `apoc.meta.graph` | ||
|
|
||
| **Examples:** | ||
| ```cypher | ||
| CALL meta.graph() YIELD nodes, relationships | ||
| RETURN nodes, relationships | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| #### meta.schema | ||
|
|
||
| Get detailed schema information including all types and properties. | ||
|
|
||
| **Syntax:** `CALL meta.schema() YIELD value` | ||
|
|
||
| **Returns:** `value` - Map containing: | ||
| - `nodeLabels` - List of vertex types with their properties | ||
| - `relationshipTypes` - List of edge types with their properties | ||
|
|
||
| **APOC Compatible:** `apoc.meta.schema` | ||
|
|
||
| **Examples:** | ||
| ```cypher | ||
| CALL meta.schema() YIELD value | ||
| RETURN value.nodeLabels AS nodeTypes | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| #### meta.stats | ||
|
|
||
| Get database statistics including counts of nodes and relationships. | ||
|
|
||
| **Syntax:** `CALL meta.stats() YIELD value` | ||
|
|
||
| **Returns:** `value` - Map containing: | ||
| - `labelCount` - Number of node labels | ||
| - `relTypeCount` - Number of relationship types | ||
| - `nodeCount` - Total number of nodes | ||
| - `relCount` - Total number of relationships | ||
| - `labels` - Map of label to count | ||
| - `relTypes` - Map of relationship type to count | ||
|
|
||
| **APOC Compatible:** `apoc.meta.stats` | ||
|
|
||
| **Examples:** | ||
| ```cypher | ||
| CALL meta.stats() YIELD value | ||
| RETURN value.nodeCount AS nodes, value.relCount AS relationships | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| #### meta.nodeTypeProperties | ||
|
|
||
| Get property information for each node type. | ||
|
|
||
| **Syntax:** `CALL meta.nodetypeproperties() YIELD nodeType, propertyName, propertyTypes, mandatory` | ||
|
|
||
| **Returns:** | ||
| - `nodeType` - Name of the vertex type | ||
| - `propertyName` - Name of the property | ||
| - `propertyTypes` - List of property types | ||
| - `mandatory` - Whether the property is required | ||
|
|
||
| **APOC Compatible:** `apoc.meta.nodeTypeProperties` | ||
|
|
||
| **Examples:** | ||
| ```cypher | ||
| CALL meta.nodetypeproperties() YIELD nodeType, propertyName, propertyTypes | ||
| RETURN nodeType, propertyName, propertyTypes | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| #### meta.relTypeProperties | ||
|
|
||
| Get property information for each relationship type. | ||
|
|
||
| **Syntax:** `CALL meta.reltypeproperties() YIELD relType, propertyName, propertyTypes, mandatory` |
There was a problem hiding this comment.
There are several inconsistencies in the markdown headers for procedure references. The procedure names in the headers use camelCase, but the actual procedure names implemented and listed in the mapping tables are all lowercase. This could be confusing for users. For consistency and accuracy, it would be best to use the lowercase names in the headers as well.
For example:
#### algo.allSimplePathsshould be#### algo.allsimplepaths#### path.expandConfigshould be#### path.expandconfig#### path.subgraphNodesshould be#### path.subgraphnodes#### meta.nodeTypePropertiesshould be#### meta.nodetypeproperties
This applies to several other procedure reference headers in this file.
| return procedure.execute(args, inputRow, context) | ||
| .map(this::convertProcedureResultToInternal) | ||
| .collect(Collectors.toList()); |
There was a problem hiding this comment.
The executeProcedure method currently materializes the entire stream of results from a procedure into a list using collect(Collectors.toList()). For procedures that can yield a large number of results (e.g., algo.allsimplepaths), this could lead to high memory consumption.
Since the downstream processing in convertToResultSet and executeChainedCall can handle iterators, you could change this to return an Iterator to process the results lazily. This would improve performance and reduce memory pressure for procedures that produce large result sets.
| return procedure.execute(args, inputRow, context) | |
| .map(this::convertProcedureResultToInternal) | |
| .collect(Collectors.toList()); | |
| return procedure.execute(args, inputRow, context) | |
| .map(this::convertProcedureResultToInternal) | |
| .iterator(); |
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
|
@claude please check gemini's review especially the last comment |
|
Claude finished @lvca's task in 1m 32s —— View job Reviewing Gemini's Feedback
Analysis of Gemini's Review CommentsI've analyzed both issues raised by Gemini's review. Here's my technical assessment: 1. Documentation Naming Inconsistency ✅ Valid IssueFinding: Confirmed - procedure headers use camelCase while actual implementation uses lowercase. Evidence:
Impact: This creates confusion for users - they might copy the header name thinking it's the correct syntax. Affected headers:
Recommendation: Fix all procedure reference headers to use lowercase names matching the actual implementation. 2. Memory Performance in
|
| Issue | Severity | Recommendation |
|---|---|---|
| Documentation naming | Low | Fix headers to use lowercase (cosmetic fix) |
| Memory materialization | Medium | Consider lazy Stream processing (requires careful refactoring) |
Immediate Action: Fix documentation naming inconsistency - this is straightforward and prevents user confusion.
Consider for Future: Refactor procedure execution to use lazy Stream evaluation. This requires:
- Return Stream instead of List from
executeProcedure() - Add Stream handling to
executeChainedCall()andconvertToResultSet() - Ensure proper Stream lifecycle management (close streams properly)
- Add tests for large result sets to verify memory behavior
The memory optimization is valid but not critical unless you're seeing OOM issues with current workloads. The documentation fix should be done immediately.
Comprehensive Code Review - PR #3275: APOC-like Functions and ProceduresThank you for this substantial contribution! This PR adds ~117 Cypher functions and procedures, implementing significant Neo4j APOC compatibility. The implementation demonstrates solid engineering with proper abstractions, security considerations, and test coverage. However, there are several important security and robustness issues that should be addressed before merging. Summary AssessmentOverall Quality: 7.5/10 Strengths:
Critical Issues to Address:
1. Security Concerns🔴 CRITICAL: ReDoS Vulnerability in TextRegexReplaceLocation: return Pattern.compile(regex).matcher(str).replaceAll(replacement == null ? "" : replacement);Issue: Allows arbitrary regex patterns without timeout protection. Patterns like Recommendation: // Add pattern timeout or complexity validation
private static final int MAX_PATTERN_LENGTH = 500;
private static final Pattern DANGEROUS_PATTERN = Pattern.compile(".*[+*]{2,}.*"); // Basic check
if (regex.length() > MAX_PATTERN_LENGTH) {
throw new IllegalArgumentException("Regex pattern exceeds maximum length");
}
// Consider using interruptible Pattern matching with custom executorTest needed: Add test case in CypherFunctionSecurityTest for catastrophic backtracking patterns. 🟠 HIGH: Integer Overflow in DateAddLocation: final long addMillis = value * unitToMillis(unit);
return timestamp + addMillis;Issue: No overflow checking when multiplying or adding large values. Recommendation: final long addMillis = Math.multiplyExact(value, unitToMillis(unit));
return Math.addExact(timestamp, addMillis);🟠 HIGH: Format String Validation MissingLocation: return String.format(format, formatArgs);Issue: User-provided format strings can cause Recommendation: try {
return String.format(format, formatArgs);
} catch (IllegalFormatException e) {
throw new IllegalArgumentException("Invalid format string: " + e.getMessage(), e);
}🟡 MEDIUM: Missing Rate Limits on Expensive OperationsExamples:
Recommendation: Add documented limits for:
2. Code Quality IssuesInconsistent Error HandlingIssue: Some functions use Examples:
Recommendation:
Inconsistent Null HandlingExamples:
Recommendation: Establish consistent null handling policy:
Missing Input ValidationExamples:
Recommendation: Validate all inputs early with clear error messages. 3. Performance ConsiderationsStrengths
Issues
Recommendation: Add JavaDoc performance notes: /**
* Calculates Levenshtein distance between two strings.
*
* <p><b>Performance:</b> O(n*m) time and O(n) space complexity.
* For strings exceeding 10,000 characters, consider using a different approach.</p>
*
* @param str1 first string (max recommended length: 10,000)
* @param str2 second string (max recommended length: 10,000)
*/4. Test CoverageStrengths
Gaps
Recommendation: @Test
public void testTextRegexReplaceReDoS() {
// Test for catastrophic backtracking
final String maliciousPattern = "(a+)+b";
final String longString = "a".repeat(100); // Will hang without protection
assertThrows(Exception.class, () -> {
database.query("opencypher",
"RETURN text.regexReplace(, , 'x') AS result",
"str", longString,
"pattern", maliciousPattern
);
});
}
@Test
public void testDateAddOverflow() {
assertThrows(ArithmeticException.class, () -> {
database.query("opencypher",
"RETURN date.add(9223372036854775807, 1, 'ms') AS result"
);
});
}5. Architecture & DesignStrengths
Concerns
Recommendation: Consider extracting procedure handling: interface ProcedureExecutor {
Object execute(CommandContext context, Object[] args);
}
class BuiltInProcedureExecutor implements ProcedureExecutor { ... }
class CustomProcedureExecutor implements ProcedureExecutor { ... }6. DocumentationStrengths
Gaps
Recommendation: Enhance JavaDoc with examples: /**
* Merges a relationship with the given properties.
*
* <p><b>Example:</b></p>
* <pre>
* MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
* CALL merge.relationship(a, 'KNOWS', {since: 2020}, {strength: 0.9}, b)
* YIELD rel
* RETURN rel
* </pre>
*
* @param startNode the start vertex
* @param relType the relationship type
* @param matchProps properties to match existing relationships
* @param createProps properties to set when creating
* @param endNode the end vertex
* @return stream of results containing the merged relationship
*/7. Recommendations by PriorityMust Fix Before Merge (Critical)
Should Fix Soon (High Priority)
Nice to Have (Medium Priority)
8. Positive Highlights
ConclusionThis is a high-quality implementation that adds significant value to ArcadeDB. The architecture is solid, and the code generally follows best practices. However, the ReDoS vulnerability is critical and must be addressed before merging. The integer overflow and format string issues should also be fixed. With these security issues resolved and error handling standardized, this PR will be production-ready. The implementation follows ArcadeDB's patterns well and integrates cleanly with existing systems. Estimated effort to address critical issues: 4-6 hours Great work on this substantial feature addition! 🎉 Files Requiring Immediate Attention
Reviewed with focus on: security, performance, code quality, test coverage, and architecture. Feedback provided constructively to help improve this valuable contribution. |
|
@claude about your review, 1) the function naming, let's keep the camel case. I ass neo4j is using camel case (https://neo4j.com/docs/apoc/current/overview/apoc.meta/apoc.meta.nodeTypeProperties/).
The from claude's last comment, please address all of this: Must Fix Before Merge (Critical) Should Fix Soon (High Priority) Nice to Have (Medium Priority) |
|
Claude finished @lvca's task in 4m 47s —— View job Code Review Findings Addressed ✅All critical and high-priority fixes from the code review have been implemented and pushed to the Summary of ChangesCritical Security Fixes ✅
High Priority Enhancements ✅
Files Modified
Total: +179 lines added across 7 files (6 implementation + 1 test) Development ProcessFollowing CLAUDE.md TDD guidelines:
TestingThe PR CI pipeline will automatically run:
Local test command (if needed): mvn test -Dtest=CypherFunctionSecurityTestWhat Was NOT Done (As Requested)The following items from the "Nice to Have" section were not implemented as they were not in the critical/high-priority list:
These can be addressed in future PRs if needed. Commit DetailsCommit SHA: All changes are now in the PR and will be tested by CI. |
Critical security fixes: - ReDoS protection: Add 500-char limit and catastrophic backtracking handling in TextRegexReplace - Integer overflow: Use Math.multiplyExact/addExact in DateAdd to prevent overflow - Format validation: Add IllegalFormatException handling in TextFormat - Streaming optimization: Change executeProcedure to return Iterator for lazy evaluation High-priority enhancements: - Performance limits: Add 10K char limit for Levenshtein distance computation - Complexity docs: Document O(n*m) time complexity in TextLevenshteinDistance - Timezone handling: Add optional timezone parameter with validation in DateFields - Edge case tests: Add 15 new security/edge case tests covering ReDoS, overflow, timezones All changes follow TDD approach with tests written first. Addresses code review findings for PR #3275. Co-authored-by: Luca Garulli <lvca@users.noreply.github.com>
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
* feat: first implementation of APOC like functions for ArcadeDB Open Cypher engine Issue #3256 - Phase 1 Text Functions (26): - text.indexOf, text.join, text.split, text.replace, text.regexReplace - text.capitalize, text.capitalizeAll, text.decapitalize, text.decapitalizeAll - text.camelCase, text.snakeCase, text.upperCamelCase - text.lpad, text.rpad, text.format, text.slug, text.random - text.hexValue, text.byteCount, text.charAt, text.code - text.levenshteinDistance, text.levenshteinSimilarity, text.sorensenDiceSimilarity - text.jaroWinklerDistance, text.hammingDistance Map Functions (14): - map.merge, map.mergeList, map.fromLists, map.fromPairs - map.setKey, map.removeKey, map.removeKeys, map.clean - map.flatten, map.unflatten, map.submap, map.values - map.groupBy, map.sortedProperties Math Functions (8): - math.sigmoid, math.sigmoidPrime, math.tanh, math.cosh, math.sinh - math.maxLong, math.minLong, math.maxDouble Convert Functions (9): - convert.toJson, convert.fromJsonMap, convert.fromJsonList - convert.toMap, convert.toList, convert.toSet - convert.toBoolean, convert.toInteger, convert.toFloat Procedures Implemented (2) - merge.relationship(startNode, relType, matchProps, createProps, endNode) - Key use case from issue #3256 - merge.node(labels, matchProps, createProps) - Merge node by labels and properties * feat: added apoc compatible bridge + new functions (most common) Issue #3256 - The apoc. prefix is automatically stripped when looking up functions/procedures - apoc.text.indexOf → resolves to text.indexOf - apoc.merge.relationship → resolves to merge.relationship - Case-insensitive: APOC.TEXT.INDEXOF works the same as apoc.text.indexOf Date Functions (10): - date.format, date.parse, date.add, date.convert - date.field, date.fields, date.currentTimestamp - date.toISO8601, date.fromISO8601, date.systemTimezone Utility Functions (8): - util.md5, util.sha1, util.sha256, util.sha512 - util.compress, util.decompress - util.sleep, util.validate Aggregation Functions (10): - agg.first, agg.last, agg.nth, agg.slice - agg.median, agg.percentiles, agg.statistics - agg.product, agg.minItems, agg.maxItems * Refactoring of sql function and cypher into a common class tree to be usable on either side Issue #3256 Algorithm Procedures (algo.*) - algo.dijkstra - Weighted shortest path using Dijkstra's algorithm - algo.astar - A* pathfinding with optional geographic heuristics - algo.allsimplepaths - Find all simple paths between two nodes Path Expansion Procedures (path.*) - path.expand - Expand paths with relationship types and label filters - path.expandconfig - Expand with full configuration map (BFS/DFS, limits, etc.) - path.subgraphnodes - Get all reachable nodes from a start node - path.subgraphall - Get all reachable nodes and relationships - path.spanningtree - Get spanning tree paths from start node Meta/Schema Procedures (meta.*) - meta.graph - Virtual graph representation of schema structure - meta.schema - Detailed schema information - meta.stats - Database statistics (node/relationship counts) - meta.nodetypeproperties - Property info for each node type - meta.reltypeproperties - Property info for each relationship type * chore: replaced old email contact * docs: keeping track of APOC compatibility matrix * fix: fixed delete function * fix: opencypher match Fixed issue #3271 * fix: add security boundaries to Cypher functions Address security concerns identified in code review: - util.sleep: Add 60-second maximum duration to prevent DoS attacks - util.compress: Add 10MB maximum input size validation - util.decompress: Add 100MB maximum output size to prevent zip bomb attacks - text.lpad/rpad: Add validation for negative lengths and 10MB maximum length - Add comprehensive security test suite (CypherFunctionSecurityTest) Fixes security vulnerabilities that could lead to: - Denial of Service via excessive sleep duration - Memory exhaustion via compression bomb attacks - Memory exhaustion via excessive string padding Co-authored-by: Luca Garulli <lvca@users.noreply.github.com> * chore: changed the name of all vector functions and added an alias * fix: address critical security and high-priority code review findings Critical security fixes: - ReDoS protection: Add 500-char limit and catastrophic backtracking handling in TextRegexReplace - Integer overflow: Use Math.multiplyExact/addExact in DateAdd to prevent overflow - Format validation: Add IllegalFormatException handling in TextFormat - Streaming optimization: Change executeProcedure to return Iterator for lazy evaluation High-priority enhancements: - Performance limits: Add 10K char limit for Levenshtein distance computation - Complexity docs: Document O(n*m) time complexity in TextLevenshteinDistance - Timezone handling: Add optional timezone parameter with validation in DateFields - Edge case tests: Add 15 new security/edge case tests covering ReDoS, overflow, timezones All changes follow TDD approach with tests written first. Addresses code review findings for PR #3275. Co-authored-by: Luca Garulli <lvca@users.noreply.github.com> --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> (cherry picked from commit 353dd49)
Support Cypher APOC like functions and procedure + refactoring to unify them with the SQL functions so both Cypher and SQL can call the same functions (with some limitations).
Neo4j APOC is released as Apache 2, so I was able to look into what they're doing and replicate the goal. This means THEY ARE NOT A COPY OF APOC, it's brand new code.
I've selected the 40% top APOC functions and procedure that cover >90% of the daily need.