feat: implements s3-compatible api gateway for curvine#234
feat: implements s3-compatible api gateway for curvine#234szbr9486 merged 34 commits intoCurvineIO:mainfrom
Conversation
There was a problem hiding this comment.
Change file name as curvine-s3-gateway.sh
build/bin/curvine-gateway.sh
Outdated
| . "$(cd "`dirname "$0"`"; pwd)"/../conf/curvine-env.sh | ||
|
|
||
| # Service configuration | ||
| SERVICE_NAME="curvine-gateway" |
| if let Some(hdr) = head { | ||
| if hdr.content_size == 0 { | ||
| let next=circle_hasher.next( | ||
| "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", |
There was a problem hiding this comment.
Here is the question of why will have this fixed string
There was a problem hiding this comment.
This string is the SHA256 hash of the empty string. In S3 chunked uploads, when an empty chunk (content_size = 0) is encountered, the SHA256 hash of the empty string is used as the payload hash.
There was a problem hiding this comment.
This is a standard requirement of the AWS S3 protocol. Each block transfer must end with an empty block, which is part of the protocol specification and is used to mark the completion of the transfer and perform final signature verification.
There was a problem hiding this comment.
It is recommended to use this string as a constant and add an annotation
|
Why do we need this S3 gateway? If Curvine already provides a FUSE interface, we could use the MinIO gateway directly, similar to how JuiceFS does. If our goal is better performance rather than a gateway that delegates all network traffic, we might consider introducing a Curvine agent to simulate the S3 interface on each client side. |
| pub region: String, | ||
| /// Temporary directory for multipart uploads | ||
| pub multipart_temp: String, | ||
| /// S3 access key (optional, falls back to environment variables) |
There was a problem hiding this comment.
There is no need to annotate every variable
There was a problem hiding this comment.
Streamline the annotation!
- Fix incorrect prefix directory traversal logic in ListObjectHandler - Change from prefix-as-directory to proper string prefix matching - Move bucket path calculation outside async block to fix lifetime issues - Add proper prefix filtering with starts_with() method - Ensure S3 standard compliance for list-objects-v2 --prefix parameter Test improvement: 89% -> 92% pass rate (Test 7.1 now passes)
- Add bucket existence check in CreateBucketHandler before mkdir - Return BucketAlreadyExists error for existing buckets - Map BucketAlreadyExists to HTTP 409 status code in s3_api - Ensure S3 standard compliance for duplicate bucket handling Test improvement: 92% -> 96% pass rate (Test 10.3 now passes)
Fix grep -q . matching 'None' output incorrectly. Use proper string comparison for empty bucket check. Test improvement: 96% -> 100% pass rate
Add x-amz-meta-* headers from object metadata to HEAD responses
Add x-amz-meta-* headers to GET object responses for consistency with HEAD
- Implement 4KB chunked streaming for GET operations - Fix suffix range detection threshold (u64::MAX / 2) - Add 1GB limit for suffix ranges with 416 error response - Ensure constant memory usage regardless of file size - All range types working: bytes=0-100, bytes=100-, bytes=-100 - Performance: 100MB file download in 0.5s with 4KB RAM
- Convert ListObjectHandler to use #[async_trait::async_trait] - Simplify trait definition from Pin<Box<Future>> to async fn - Remove Box::pin wrappers in implementation - Clean up unused Future import - Maintain consistent async pattern across all handlers except GetObjectHandler
- Replace hardcoded current time with actual FileStatus.mtime - Convert timestamp from milliseconds to ISO 8601 format - Add fallback to current time if mtime is invalid - Each bucket now shows its real creation time instead of call time - Improve S3 API compatibility with proper metadata handling
- Replace hardcoded us-east-1 with dynamic region mapping - Support common AWS regions: us-east-1, us-west-1/2, eu-west-1, etc. - Add fallback with warning for unsupported regions - Ensure return type compatibility with &'static str requirement - Improve configuration consistency across S3 API responses
- Convert 6 dependencies to use workspace = true for version consistency - Unified: clap, thiserror, rand, toml, log, futures - Keep axum 0.8.4 independent for S3 compatibility and modern features - Reduce version management overhead and eliminate conflicts - All projects compile successfully with mixed axum versions
- Added detailed function documentation for handle_create_bucket - Added comprehensive comments for handle_delete_bucket - Added thorough documentation for handle_delete_object - Enhanced comments explain features, parameters, and HTTP response codes - Improved code readability and maintainability for S3 gateway - All critical S3 operations now have professional-grade documentation
1. Replace archive_status String with type-safe ArchiveStatus enum - Add ArchiveStatus enum with ARCHIVE_ACCESS and DEEP_ARCHIVE_ACCESS variants - Implement FromStr and Display traits for proper serialization - Update HeadObjectResult to use Option<ArchiveStatus> 2. Begin handle_put_object method refactoring for better maintainability - Extract parse_put_object_path helper function for URL parsing - Add validate_content_sha256 helper for authentication validation - Simplify main method logic with cleaner error handling - Improve code readability and reduce method complexity
- Delete validate_content_sha256 helper function that was never called - Clean up dead code to eliminate compiler warnings - Maintain only actively used helper functions for better code hygiene
- Convert PollRead trait to use #[async_trait::async_trait] - Update all PollRead implementations to use async fn instead of Pin<Box<Future>> - Simplify tokio::fs::File, BodyReader, BufReader, and InMemoryPollReader implementations - Remove complex lifetime parameters and Box::pin wrappers - Improve code readability and reduce allocation overhead - Maintain full compatibility with existing async I/O operations - Clean up unused async_trait import to eliminate warnings
- Convert PollWrite trait to use #[async_trait::async_trait] - Update BodyWriter PollWrite implementation to async fn format - Optimize AccesskeyStore trait with async_trait for cleaner authentication - Remove Box::pin wrappers and complex lifetime parameters - Simplify StaticAccessKeyStore implementation with direct async method - Improve code readability and reduce allocation overhead - Maintain full compatibility with existing authentication flow
- Update Response impl to match BodyWriter trait async_trait definition - Add #[async_trait::async_trait] annotation to impl block - Convert get_body_writer method to async fn format - Remove Box::pin wrapper and complex lifetime parameters - Fix lifetime parameter mismatch between trait and implementation - Ensure full build compatibility with all Curvine components
- Merge 5 separate shell test scripts into comprehensive_s3_test.sh - Create comprehensive Python test suite with boto3 integration - Remove redundant test files (test_s3_complete.sh, test_streaming.sh, etc.) - Implement modular Python test classes for better maintainability - Add comprehensive test coverage: basic ops, range requests, streaming, performance, errors, metadata - All tests use English language and follow clean code principles - Shell test: 24/24 tests passed - full S3 compatibility verified - Python test: Ready for execution (requires boto3 installation) - Improved test organization and reduced maintenance overhead
|
First of all, why not use MinIO Gateway + FUSE directly? Because there is a performance bottleneck. The FUSE layer adds additional system call overhead. Each S3 operation must go through: S3 client → MinIO Gateway → FUSE → kernel → Curvine. The multi-layer conversion leads to increased latency and CPU consumption. In addition, the file system semantics of FUSE do not fully match the object storage semantics of S3, and it is difficult to ensure the consistency of concurrent access. The suggestion about Curvine worker s3 Agent is indeed valuable, and support will be considered in scenarios with higher performance requirements in the future. It is still necessary to provide a centralized gateway at present. Many machine learning and AI training directly use the s3 client. Curvine provides a standard s3 gateway to make it more convenient for users to access. |
Key fixes: 1. Replace redundant field names in struct initialization 2. Use ok_or() instead of ok_or_else() with closures 3. Replace len() > 0 with is_empty() checks 4. Fix redundant pattern matching using is_err() method 5. Use strip_prefix() instead of manual string slicing 6. Combine nested format calls to reduce allocations 7. Initialize struct fields directly instead of reassignment 8. Rename utils module to s3_utils to avoid name conflicts 9. Remove unused lifetime parameters 10. Improve code quality following Rust best practices All changes maintain functionality while improving performance.
- Create custom AuthError type for extract_args function - Add allow annotation for too_many_arguments - Combine duplicate if conditions in PutObjectOption - Fix to_string usage in format! macro - Ensure all code passes GitHub CI clippy checks
* feat: add s3 object gateway * Refactoring and optimization * Refactoring and optimization * fix: update xmlns in S3 response * add detailed comments * add detailed comments * fix log level * fix: s3_gw: fix async trait * fix: unified runtime * fix: eliminate rt.block_on calls * fix: create delete bucket handler use async trait * fix: Implement complete Range request suffix syntax support * fix: ListObjects prefix filtering functionality - Fix incorrect prefix directory traversal logic in ListObjectHandler - Change from prefix-as-directory to proper string prefix matching - Move bucket path calculation outside async block to fix lifetime issues - Add proper prefix filtering with starts_with() method - Ensure S3 standard compliance for list-objects-v2 --prefix parameter Test improvement: 89% -> 92% pass rate (Test 7.1 now passes) * fix: Return 409 Conflict for duplicate bucket creation - Add bucket existence check in CreateBucketHandler before mkdir - Return BucketAlreadyExists error for existing buckets - Map BucketAlreadyExists to HTTP 409 status code in s3_api - Ensure S3 standard compliance for duplicate bucket handling Test improvement: 92% -> 96% pass rate (Test 10.3 now passes) * fix: Bucket empty verification logic Fix grep -q . matching 'None' output incorrectly. Use proper string comparison for empty bucket check. Test improvement: 96% -> 100% pass rate * fix: HEAD object metadata in HTTP response Add x-amz-meta-* headers from object metadata to HEAD responses * fix: GET object metadata in HTTP response Add x-amz-meta-* headers to GET object responses for consistency with HEAD * Rename curvine-object to curvine-gateway * Optimize streaming GET and fix range requests with 1GB limit - Implement 4KB chunked streaming for GET operations - Fix suffix range detection threshold (u64::MAX / 2) - Add 1GB limit for suffix ranges with 416 error response - Ensure constant memory usage regardless of file size - All range types working: bytes=0-100, bytes=100-, bytes=-100 - Performance: 100MB file download in 0.5s with 4KB RAM * Optimize ListObjectHandler with async_trait - Convert ListObjectHandler to use #[async_trait::async_trait] - Simplify trait definition from Pin<Box<Future>> to async fn - Remove Box::pin wrappers in implementation - Clean up unused Future import - Maintain consistent async pattern across all handlers except GetObjectHandler * Fix ListBucketHandler to use real file metadata - Replace hardcoded current time with actual FileStatus.mtime - Convert timestamp from milliseconds to ISO 8601 format - Add fallback to current time if mtime is invalid - Each bucket now shows its real creation time instead of call time - Improve S3 API compatibility with proper metadata handling * Fix GetBucketLocationHandler to return configured region - Replace hardcoded us-east-1 with dynamic region mapping - Support common AWS regions: us-east-1, us-west-1/2, eu-west-1, etc. - Add fallback with warning for unsupported regions - Ensure return type compatibility with &'static str requirement - Improve configuration consistency across S3 API responses * Optimize curvine-gateway dependencies to use workspace config - Convert 6 dependencies to use workspace = true for version consistency - Unified: clap, thiserror, rand, toml, log, futures - Keep axum 0.8.4 independent for S3 compatibility and modern features - Reduce version management overhead and eliminate conflicts - All projects compile successfully with mixed axum versions * Add comprehensive English documentation for S3 API handlers - Added detailed function documentation for handle_create_bucket - Added comprehensive comments for handle_delete_bucket - Added thorough documentation for handle_delete_object - Enhanced comments explain features, parameters, and HTTP response codes - Improved code readability and maintainability for S3 gateway - All critical S3 operations now have professional-grade documentation * Improve S3 API with ArchiveStatus enum and PUT object refactoring 1. Replace archive_status String with type-safe ArchiveStatus enum - Add ArchiveStatus enum with ARCHIVE_ACCESS and DEEP_ARCHIVE_ACCESS variants - Implement FromStr and Display traits for proper serialization - Update HeadObjectResult to use Option<ArchiveStatus> 2. Begin handle_put_object method refactoring for better maintainability - Extract parse_put_object_path helper function for URL parsing - Add validate_content_sha256 helper for authentication validation - Simplify main method logic with cleaner error handling - Improve code readability and reduce method complexity * Remove unused validate_content_sha256 function - Delete validate_content_sha256 helper function that was never called - Clean up dead code to eliminate compiler warnings - Maintain only actively used helper functions for better code hygiene * Optimize PollRead trait with async_trait for better performance - Convert PollRead trait to use #[async_trait::async_trait] - Update all PollRead implementations to use async fn instead of Pin<Box<Future>> - Simplify tokio::fs::File, BodyReader, BufReader, and InMemoryPollReader implementations - Remove complex lifetime parameters and Box::pin wrappers - Improve code readability and reduce allocation overhead - Maintain full compatibility with existing async I/O operations - Clean up unused async_trait import to eliminate warnings * Optimize PollWrite trait and AccesskeyStore with async_trait - Convert PollWrite trait to use #[async_trait::async_trait] - Update BodyWriter PollWrite implementation to async fn format - Optimize AccesskeyStore trait with async_trait for cleaner authentication - Remove Box::pin wrappers and complex lifetime parameters - Simplify StaticAccessKeyStore implementation with direct async method - Improve code readability and reduce allocation overhead - Maintain full compatibility with existing authentication flow * Fix BodyWriter trait implementation compilation error - Update Response impl to match BodyWriter trait async_trait definition - Add #[async_trait::async_trait] annotation to impl block - Convert get_body_writer method to async fn format - Remove Box::pin wrapper and complex lifetime parameters - Fix lifetime parameter mismatch between trait and implementation - Ensure full build compatibility with all Curvine components * Consolidate and enhance test suite - Merge 5 separate shell test scripts into comprehensive_s3_test.sh - Create comprehensive Python test suite with boto3 integration - Remove redundant test files (test_s3_complete.sh, test_streaming.sh, etc.) - Implement modular Python test classes for better maintainability - Add comprehensive test coverage: basic ops, range requests, streaming, performance, errors, metadata - All tests use English language and follow clean code principles - Shell test: 24/24 tests passed - full S3 compatibility verified - Python test: Ready for execution (requires boto3 installation) - Improved test organization and reduced maintenance overhead * refactor: rename s3 test files * Fix S3 gateway configuration issues * Fix clippy warnings and errors Key fixes: 1. Replace redundant field names in struct initialization 2. Use ok_or() instead of ok_or_else() with closures 3. Replace len() > 0 with is_empty() checks 4. Fix redundant pattern matching using is_err() method 5. Use strip_prefix() instead of manual string slicing 6. Combine nested format calls to reduce allocations 7. Initialize struct fields directly instead of reassignment 8. Rename utils module to s3_utils to avoid name conflicts 9. Remove unused lifetime parameters 10. Improve code quality following Rust best practices All changes maintain functionality while improving performance. * Fix remaining clippy errors for CI compliance - Create custom AuthError type for extract_args function - Add allow annotation for too_many_arguments - Combine duplicate if conditions in PutObjectOption - Fix to_string usage in format! macro - Ensure all code passes GitHub CI clippy checks
* feat: add s3 object gateway * Refactoring and optimization * Refactoring and optimization * fix: update xmlns in S3 response * add detailed comments * add detailed comments * fix log level * fix: s3_gw: fix async trait * fix: unified runtime * fix: eliminate rt.block_on calls * fix: create delete bucket handler use async trait * fix: Implement complete Range request suffix syntax support * fix: ListObjects prefix filtering functionality - Fix incorrect prefix directory traversal logic in ListObjectHandler - Change from prefix-as-directory to proper string prefix matching - Move bucket path calculation outside async block to fix lifetime issues - Add proper prefix filtering with starts_with() method - Ensure S3 standard compliance for list-objects-v2 --prefix parameter Test improvement: 89% -> 92% pass rate (Test 7.1 now passes) * fix: Return 409 Conflict for duplicate bucket creation - Add bucket existence check in CreateBucketHandler before mkdir - Return BucketAlreadyExists error for existing buckets - Map BucketAlreadyExists to HTTP 409 status code in s3_api - Ensure S3 standard compliance for duplicate bucket handling Test improvement: 92% -> 96% pass rate (Test 10.3 now passes) * fix: Bucket empty verification logic Fix grep -q . matching 'None' output incorrectly. Use proper string comparison for empty bucket check. Test improvement: 96% -> 100% pass rate * fix: HEAD object metadata in HTTP response Add x-amz-meta-* headers from object metadata to HEAD responses * fix: GET object metadata in HTTP response Add x-amz-meta-* headers to GET object responses for consistency with HEAD * Rename curvine-object to curvine-gateway * Optimize streaming GET and fix range requests with 1GB limit - Implement 4KB chunked streaming for GET operations - Fix suffix range detection threshold (u64::MAX / 2) - Add 1GB limit for suffix ranges with 416 error response - Ensure constant memory usage regardless of file size - All range types working: bytes=0-100, bytes=100-, bytes=-100 - Performance: 100MB file download in 0.5s with 4KB RAM * Optimize ListObjectHandler with async_trait - Convert ListObjectHandler to use #[async_trait::async_trait] - Simplify trait definition from Pin<Box<Future>> to async fn - Remove Box::pin wrappers in implementation - Clean up unused Future import - Maintain consistent async pattern across all handlers except GetObjectHandler * Fix ListBucketHandler to use real file metadata - Replace hardcoded current time with actual FileStatus.mtime - Convert timestamp from milliseconds to ISO 8601 format - Add fallback to current time if mtime is invalid - Each bucket now shows its real creation time instead of call time - Improve S3 API compatibility with proper metadata handling * Fix GetBucketLocationHandler to return configured region - Replace hardcoded us-east-1 with dynamic region mapping - Support common AWS regions: us-east-1, us-west-1/2, eu-west-1, etc. - Add fallback with warning for unsupported regions - Ensure return type compatibility with &'static str requirement - Improve configuration consistency across S3 API responses * Optimize curvine-gateway dependencies to use workspace config - Convert 6 dependencies to use workspace = true for version consistency - Unified: clap, thiserror, rand, toml, log, futures - Keep axum 0.8.4 independent for S3 compatibility and modern features - Reduce version management overhead and eliminate conflicts - All projects compile successfully with mixed axum versions * Add comprehensive English documentation for S3 API handlers - Added detailed function documentation for handle_create_bucket - Added comprehensive comments for handle_delete_bucket - Added thorough documentation for handle_delete_object - Enhanced comments explain features, parameters, and HTTP response codes - Improved code readability and maintainability for S3 gateway - All critical S3 operations now have professional-grade documentation * Improve S3 API with ArchiveStatus enum and PUT object refactoring 1. Replace archive_status String with type-safe ArchiveStatus enum - Add ArchiveStatus enum with ARCHIVE_ACCESS and DEEP_ARCHIVE_ACCESS variants - Implement FromStr and Display traits for proper serialization - Update HeadObjectResult to use Option<ArchiveStatus> 2. Begin handle_put_object method refactoring for better maintainability - Extract parse_put_object_path helper function for URL parsing - Add validate_content_sha256 helper for authentication validation - Simplify main method logic with cleaner error handling - Improve code readability and reduce method complexity * Remove unused validate_content_sha256 function - Delete validate_content_sha256 helper function that was never called - Clean up dead code to eliminate compiler warnings - Maintain only actively used helper functions for better code hygiene * Optimize PollRead trait with async_trait for better performance - Convert PollRead trait to use #[async_trait::async_trait] - Update all PollRead implementations to use async fn instead of Pin<Box<Future>> - Simplify tokio::fs::File, BodyReader, BufReader, and InMemoryPollReader implementations - Remove complex lifetime parameters and Box::pin wrappers - Improve code readability and reduce allocation overhead - Maintain full compatibility with existing async I/O operations - Clean up unused async_trait import to eliminate warnings * Optimize PollWrite trait and AccesskeyStore with async_trait - Convert PollWrite trait to use #[async_trait::async_trait] - Update BodyWriter PollWrite implementation to async fn format - Optimize AccesskeyStore trait with async_trait for cleaner authentication - Remove Box::pin wrappers and complex lifetime parameters - Simplify StaticAccessKeyStore implementation with direct async method - Improve code readability and reduce allocation overhead - Maintain full compatibility with existing authentication flow * Fix BodyWriter trait implementation compilation error - Update Response impl to match BodyWriter trait async_trait definition - Add #[async_trait::async_trait] annotation to impl block - Convert get_body_writer method to async fn format - Remove Box::pin wrapper and complex lifetime parameters - Fix lifetime parameter mismatch between trait and implementation - Ensure full build compatibility with all Curvine components * Consolidate and enhance test suite - Merge 5 separate shell test scripts into comprehensive_s3_test.sh - Create comprehensive Python test suite with boto3 integration - Remove redundant test files (test_s3_complete.sh, test_streaming.sh, etc.) - Implement modular Python test classes for better maintainability - Add comprehensive test coverage: basic ops, range requests, streaming, performance, errors, metadata - All tests use English language and follow clean code principles - Shell test: 24/24 tests passed - full S3 compatibility verified - Python test: Ready for execution (requires boto3 installation) - Improved test organization and reduced maintenance overhead * refactor: rename s3 test files * Fix S3 gateway configuration issues * Fix clippy warnings and errors Key fixes: 1. Replace redundant field names in struct initialization 2. Use ok_or() instead of ok_or_else() with closures 3. Replace len() > 0 with is_empty() checks 4. Fix redundant pattern matching using is_err() method 5. Use strip_prefix() instead of manual string slicing 6. Combine nested format calls to reduce allocations 7. Initialize struct fields directly instead of reassignment 8. Rename utils module to s3_utils to avoid name conflicts 9. Remove unused lifetime parameters 10. Improve code quality following Rust best practices All changes maintain functionality while improving performance. * Fix remaining clippy errors for CI compliance - Create custom AuthError type for extract_args function - Add allow annotation for too_many_arguments - Combine duplicate if conditions in PutObjectOption - Fix to_string usage in format! macro - Ensure all code passes GitHub CI clippy checks
Summary
Implements a complete S3-compatible API gateway that provides AWS S3 API access to Curvine.
Implemented S3 Operations
Bucket Operations
Object Operations
Advanced Features
bytes=0-100bytes=100-bytes=-100x-amz-meta-*headersPerformance & Compatibility
This implementation enables Curvine to serve as a drop-in replacement for AWS S3 in existing applications and tools.