
feat: add a cleanup mechanism for expired inodes (#41) #175

Merged
bigbigxu merged 1 commit into CurvineIO:main from jlon:ttl on Aug 12, 2025
Conversation

@jlon jlon (Contributor) commented Aug 12, 2025

🎯 Overview

This PR introduces a comprehensive TTL (Time-To-Live) cleanup system for Curvine's inode management, enabling automatic lifecycle management of files and directories with configurable expiration policies.

✨ Key Features

🏗️ Modular Architecture

  • TTL Manager: High-level orchestration and unified API
  • TTL Service: Service layer for cleanup operations
  • TTL Checker: Core expiration processing with bucket-based approach
  • TTL Executor: Filesystem-integrated cleanup execution
  • TTL Scheduler: Heartbeat-based periodic cleanup scheduling
  • TTL Bucket: Time-based organization for efficient batch processing

🔧 Core Capabilities

  • Multiple TTL Actions: Support for Delete, Move, and Free operations
  • Bucket-based Processing: Efficient time-interval organization for batch operations
  • Intelligent Retry Logic: Configurable retry attempts with timeout protection
  • Path Caching: High-performance inode path resolution with LRU caching
  • UFS Integration: Ready for Unified File System data migration
  • Thread-safe Operations: Concurrent access support with DashMap and RwLock

⚙️ Configuration System

pub struct TtlCleanupConfig {
    pub check_interval_ms: u64,        // Cleanup frequency (default: 60s)
    pub max_retry_count: u32,          // Max retry attempts (default: 3)
    pub max_retry_duration_ms: u64,    // Global retry timeout (default: 30min)
    pub retry_interval_ms: u64,        // Retry delay (default: 5s)
    pub bucket_interval_ms: u64,       // Bucket time window (default: 1h)
    pub cleanup_timeout_ms: u64,       // Operation timeout (default: 30s)
}
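The defaults quoted in the field comments above can be expressed as a `Default` impl. The sketch below restates the struct for completeness and fills in the defaults as written; it mirrors the configuration section, not necessarily the PR's exact code:

```rust
pub struct TtlCleanupConfig {
    pub check_interval_ms: u64,
    pub max_retry_count: u32,
    pub max_retry_duration_ms: u64,
    pub retry_interval_ms: u64,
    pub bucket_interval_ms: u64,
    pub cleanup_timeout_ms: u64,
}

impl Default for TtlCleanupConfig {
    fn default() -> Self {
        Self {
            check_interval_ms: 60 * 1000,          // 60s cleanup frequency
            max_retry_count: 3,                    // 3 retry attempts
            max_retry_duration_ms: 30 * 60 * 1000, // 30min global retry budget
            retry_interval_ms: 5 * 1000,           // 5s retry delay
            bucket_interval_ms: 60 * 60 * 1000,    // 1h bucket window
            cleanup_timeout_ms: 30 * 1000,         // 30s operation timeout
        }
    }
}
```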

📊 Monitoring & Statistics

  • Comprehensive cleanup result tracking
  • Execution time monitoring with timeout detection
  • Success/failure statistics with detailed error reporting
  • Retry attempt tracking and bucket processing metrics

🔄 TTL Workflow

  1. Registration: Files/directories register TTL metadata during creation
  2. Bucket Organization: Inodes organized into time-based buckets for efficient processing
  3. Scheduled Cleanup: Heartbeat scheduler triggers periodic cleanup operations
  4. Batch Processing: Expired buckets processed in batches for optimal performance
  5. Action Execution: TTL actions (delete/move/free) executed based on storage policy
  6. Retry Management: Failed operations retried with exponential backoff

🛠️ Technical Implementation

Storage Policy Integration

// TTL configuration derived from storage policy
pub struct TtlConfig {
    pub ttl_ms: u64,              // Time-to-live duration
    pub action: TtlAction,         // Cleanup action (Delete/Move/Free)
    pub creation_time_ms: u64,     // Creation timestamp
}

Efficient Bucket Management

  • BTreeMap-based sorted storage for O(log n) bucket lookup
  • DashMap for concurrent access to inode-to-bucket mapping
  • Automatic cleanup of empty expired buckets
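The bucket scheme above can be sketched as follows: a bucket id is the expiry timestamp truncated to the bucket window, and a cleanup pass drains every bucket whose window lies entirely before "now". This is a minimal illustration of the BTreeMap side only (the concurrent inode-to-bucket DashMap is omitted), and names are hypothetical:

```rust
use std::collections::BTreeMap;

/// Map an expiry timestamp to its time-window bucket.
fn bucket_id(expiry_ms: u64, bucket_interval_ms: u64) -> u64 {
    expiry_ms / bucket_interval_ms
}

/// Remove all fully-expired buckets and return their inode ids for batch
/// processing. `range(..cutoff)` gives the O(log n) sorted lookup; removing
/// the drained keys is the "automatic cleanup of empty expired buckets".
fn drain_expired(
    buckets: &mut BTreeMap<u64, Vec<u64>>,
    now_ms: u64,
    bucket_interval_ms: u64,
) -> Vec<u64> {
    let cutoff = now_ms / bucket_interval_ms;
    let expired: Vec<u64> = buckets.range(..cutoff).map(|(k, _)| *k).collect();
    let mut inodes = Vec::new();
    for k in expired {
        if let Some(mut v) = buckets.remove(&k) {
            inodes.append(&mut v);
        }
    }
    inodes
}
```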

Filesystem Integration

  • Path resolution caching for improved performance
  • Real filesystem operations through MasterFilesystem
  • Storage policy awareness for action determination
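The LRU path cache mentioned above could look something like the sketch below: a bounded map from inode id to resolved path, with least-recently-used eviction. This is purely illustrative (the PR's cache may differ in capacity handling and eviction details), and the linear `retain` recency update is the simplest correct form, not the fastest:

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal LRU sketch for inode-id -> full-path caching.
struct PathCache {
    cap: usize,
    map: HashMap<u64, String>,
    order: VecDeque<u64>, // front = least recently used
}

impl PathCache {
    fn new(cap: usize) -> Self {
        Self { cap, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&mut self, id: u64) -> Option<&String> {
        if self.map.contains_key(&id) {
            // Hit: move to most-recently-used position.
            self.order.retain(|&k| k != id);
            self.order.push_back(id);
            self.map.get(&id)
        } else {
            None
        }
    }

    fn put(&mut self, id: u64, path: String) {
        if self.map.insert(id, path).is_none() {
            self.order.push_back(id);
            if self.order.len() > self.cap {
                // Evict the least recently used entry.
                if let Some(old) = self.order.pop_front() {
                    self.map.remove(&old);
                }
            }
        } else {
            // Overwrite refreshes recency.
            self.order.retain(|&k| k != id);
            self.order.push_back(id);
        }
    }
}
```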

🧪 Usage Example

// Initialize TTL system
let ttl_manager = InodeTtlManager::create(filesystem, master_conf)?;
ttl_manager.initialize()?;

// Register file with TTL
let ttl_config = TtlConfig::new(3600000, TtlAction::Delete); // 1 hour TTL
let metadata = TtlInodeMetadata::new(inode_id, ttl_config);
ttl_manager.add_inode(metadata)?;

// Cleanup is handled automatically by scheduler
// Manual cleanup also supported:
let result = ttl_manager.cleanup()?;

🔧 Configuration Integration

The system integrates with existing MasterConf configuration:

  • ttl_checker_interval_ms: Cleanup execution frequency
  • ttl_bucket_interval_ms: Bucket time window size
  • ttl_checker_retry_attempts: Maximum retry attempts

🚦 Testing & Validation

  • Comprehensive error handling with custom TtlError types
  • Configuration validation with sensible defaults
  • Extensive logging for debugging and monitoring
  • Thread-safety validation for concurrent operations

📈 Performance Characteristics

  • O(log n) bucket lookup using BTreeMap
  • Batch processing reduces filesystem operation overhead
  • Path caching minimizes redundant path resolution
  • Concurrent processing with thread-safe data structures

🎛️ Future Enhancements

  • UFS data migration implementation (placeholder ready)
  • Advanced cleanup policies (size-based, access-based)
  • Metrics integration with Prometheus
  • Journal system integration for TTL operations

Compatibility

  • Fully backward compatible with existing inode system
  • Optional TTL functionality - existing files unaffected
  • Configurable system - can be disabled if not needed

This implementation provides a production-ready TTL system that enhances Curvine's data lifecycle management capabilities while maintaining high performance and reliability standards.

@lzjqsdd lzjqsdd self-requested a review August 12, 2025 07:11
@jlon jlon force-pushed the ttl branch 2 times, most recently from 3097fd6 to 6a0dbc5 on August 12, 2025 08:26
@bigbigxu bigbigxu merged commit e2b55cf into CurvineIO:main Aug 12, 2025
3 checks passed
@lzjqsdd lzjqsdd added the enhancement New feature or request label Sep 22, 2025