Follow-up for standalone Caplin

Follow-up to PR #20035 to address the following concerns from Claude:

  **HIGH - Potential correctness issues**

  1. ExecutionEngineReader.CacheBody eviction is not thread-safe with reads                                                                                                                                         
  
  getters/execution_rpc.go:93-101 - The eviction in CacheBody iterates and deletes from r.cache while holding r.mu (write lock), which is fine. But the threshold calculation r.highestCached - 64 can underflow to 
  uint64(max) if highestCached < 64. This would evict nothing (all keys are < max), so it's not a correctness bug, but worth a comment or explicit guard.
                                                                                                                                                                                                                    
  2. cmd/caplin/main.go:100 - beaconCfg passed as nil to NewExecutionClientEngineRPC                                                                                                                                
  
  cc, err := execution_client.NewExecutionClientEngineRPC(cfg.JwtSecret, cfg.EngineAPIAddr, cfg.EngineAPIPort, nil)                                                                                                 
                                                                                                                                                                                                                    
  The nil beaconCfg means getAssembledBlockV3 and getAssembledBlockFromResponse will hit the "beaconCfg not set" error if SetBeaconChainConfig isn't called before block production. This is presumably set later   
  via caplin1.RunCaplinService, but I don't see SetBeaconChainConfig called on the ExecutionClientEngine in that path. If beaconCfg propagation relies on the ExecutionEngineReader.SetBeaconChainConfig, that's a  
  different object. The ExecutionClientEngine itself may never get its beaconCfg set in standalone mode, which would cause GetAssembledBlock to fail at block production time.                                      
                              
  Follow-up question: Is SetBeaconChainConfig called on the ExecutionClientEngine somewhere in RunCaplinService? If not, this is a block-production blocker in standalone remote mode.                              
  
  3. executionPayloadToEth1Block - missing fields for Electra+ forks                                                                                                                                                
                              
  execution_client_engine.go:470-530 - The conversion handles transactions, withdrawals, blob gas fields. But for Electra+, there may be additional fields (e.g., execution requests are not stored on the Eth1Block
   itself here — they're returned separately via requestsBundle). This seems intentional since execution requests come from the GetPayloadResponse.ExecutionRequests, not from the payload itself. Confirm this is
  correct.                                                                                                                                                                                                          
                              
  **MEDIUM - Design and robustness concerns**                                                                                                                                                                           
  
  4. ForkChoiceUpdate version mapping has an implicit Gloas catch-all                                                                                                                                               
                              
  execution_client_engine.go:222-231:                                                                                                                                                                               
  case clparams.DenebVersion, clparams.ElectraVersion, clparams.FuluVersion:
      resp, err = cc.engine.ForkchoiceUpdatedV3(...)                        
  default: // Gloas+ (Amsterdam)                                                                                                                                                                                    
      resp, err = cc.engine.ForkchoiceUpdatedV4(...)                                                                                                                                                                
                                                                                                                                                                                                                    
  The default case silently sends V4 for any unknown future version. If a new version is added to clparams without updating this switch, it could send the wrong FCU version. An explicit case                      
  clparams.GloasVersion: with a default that returns an error (matching the pattern in NewPayload) would be safer.                                                                                                  
                              
  5. onNewFinalized cleanup — O(n) sync.Map iteration                                                                                                                                                               
                              
  forkchoice/utils.go:75-102 - Four sync.Map.Range calls on every finalization. On a long-running node these maps grow with every block and are only pruned here. The sync.Map deletion during Range is safe, but   
  the linear scan over all entries could be expensive. A sorted structure or LRU with slot-indexed eviction would be more efficient. Not a blocker for merge, but worth a follow-up.
                                                                                                                                                                                                                    
  6. IsCanonicalHash remote mode does two RPC calls                                                                                                                                                                 
  
  execution_client_engine.go:303-315 - eth_getBlockByHash then eth_getBlockByNumber. This is correct for handling uncle blocks, but is called on the hot path during block processing. Consider caching or a        
  single-call approach if this becomes a performance bottleneck.
                                                                                                                                                                                                                    
  7. Ready() always returns true in remote mode                                                                                                                                                                     
  
  execution_client_engine.go:320:                                                                                                                                                                                   
  return true, nil            
                  
  This means caplin won't wait for the EL to be ready in standalone mode. If the EL is still syncing, caplin could start sending payloads that get rejected. Consider calling an RPC method (e.g., eth_syncing) to
  check actual readiness.                                                                                                                                                                                           
  
  **LOW - Style, minor issues**                                                                                                                                                                                         
                              
  8. EngineAPIRPCClient — repetitive boilerplate                                                                                                                                                                    
  
  The 25+ methods in engine_api_rpc_client.go are all identical CallContext wrappers. Consider a generic helper to reduce duplication:                                                                              
  func callEngine[T any](c *EngineAPIRPCClient, ctx context.Context, method string, args ...any) (*T, error) { ... }
  Not blocking, just a suggestion for maintainability.                                                                                                                                                              
                                                                                                                                                                                                                    
  9. caplin-hive-testing.md references absolute paths                                                                                                                                                               
                                                                                                                                                                                                                    
  .github/docs/caplin-hive-testing.md:27 - /erigon/mark/hive/ is a developer-local path. Consider making paths relative or using $HIVE_DIR placeholder.                                                             
                                                                                                                                                                                                                    
  10. caplin-assertoor.io file extension                                                                                                                                                                            
                              
  .github/workflows/kurtosis/caplin-assertoor.io - The .io extension is unusual for a YAML config file. The other files in that directory use .io too, so this is consistent with the existing pattern, just noting 
  it.                         
                                                                                                                                                                                                                    
  11. ParticipationBitList.MarshalJSON allocates []string on every call                                                                                                                                             
  
  solid/participation_bitlist.go:222-226 - For large validator sets (1M+ validators), this allocates a large string slice. The UnmarshalJSON fallback to hex format is good for backwards compatibility.            
                              
  12. block_1.json test fixture is ~2000 lines                                                                                                                                                                      
                              
  The test data file is very large. Consider whether it could be generated or compressed.                                                                                                                                                                                                                                       
                                                                                                                                                                                                                    
  **Questions for the author**                                                                                                                                                                                          
                                                                                                                                                                                                                    
  1. Where does SetBeaconChainConfig get called on the ExecutionClientEngine in standalone mode? (HIGH #2 above)                                                                                                    
  2. The consuming flag on EngineServer — in embedded engine API mode, consuming is true because !config.PolygonPosSingleSlotFinality. Is this correct for all chain types caplin targets, or could Polygon PoS
  chains break this?                                                                                                                                                                                                
  3. The ForkChoiceUpdate sends checkpoints from FinalizedCheckpoint()/JustifiedCheckpoint() (realized values). The spec uses finalized/justified from the store. Since you added unrealized checkpoints to the fork
   choice filter, should the FCU also use unrealized values?                                                                                                                                                        
                              


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Follow-up for standalone Caplin #20151

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Follow-up for standalone Caplin #20151

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions