[TEST] Chaos testing - Scenario 01 Postgres DB Latency #39
Ported from bitcoin-sv/teranode commit 79331ba

This commit introduces chaos engineering testing infrastructure for Teranode using Toxiproxy to simulate network failures and database latency issues.

Key additions:
- Chaos test scenario 01: Database latency testing
- Toxiproxy integration for network failure simulation
- Docker Compose configuration with toxiproxy services
- Comprehensive documentation and helper scripts
- Test runner scripts for automated chaos testing

Changes to existing files:
- services/blockassembly/server_test.go: Fixed data race using channel-based approach instead of mutex (superior to the mutex approach in commit 237592c)
- compose/docker-compose-ss.yml: Added toxiproxy services and updated volume paths
Ported from bitcoin-sv/teranode commit 3a1b48e

This commit moves chaos testing to a separate workflow file to prevent it from affecting Claude Code review and SonarQube from triggering on fork-based PRs.

Key changes:
- Created new .github/workflows/teranode_pr_chaostests.yaml workflow
- Adapted workflow to use SHA-pinned GitHub Actions per public repo security requirements
- Chaos tests now run independently from main PR test workflow
🤖 Claude Code Review
Status: Complete

This PR introduces chaos testing infrastructure using Toxiproxy to validate system resilience under database latency conditions. The implementation includes comprehensive test coverage, helper scripts, and documentation.

Analysis: The implementation follows solid chaos engineering principles with proper test isolation, cleanup, and recovery validation. The Toxiproxy client implementation is clean with appropriate error handling. The workflow integration is minimal and focused.

Minor observations:
Positive aspects:

Overall this is a well-structured addition to the testing infrastructure. The observations above are minor and do not block the PR's core functionality.
…n#39) Co-authored-by: rid3thespiral <24756750+rid3thespiral@users.noreply.github.com>
Closes CodeQL alert bsv-blockchain#39. The Aerospike connection URL (which may contain user:password in the userinfo) and the ClientPolicy struct (which has a Password field) were being logged at Debug level. Adds redactURL and aerospikePolicySummary helpers and rewires the log line to use them. URL password placeholder uses 'REDACTED' (no special chars) to avoid percent-encoding by url.UserPassword.



🎯 What Was Added
Infrastructure (compose/):
- Modified docker-compose-ss.yml with toxiproxy-postgres and toxiproxy-kafka services
- Added scripts/toxiproxy-config.json - Proxy configuration for PostgreSQL and Kafka
- Added scripts/chaos-test-helpers.sh - 305-line bash helper script for manual chaos testing
- Added scripts/toxiproxy-chaos-testing.md - 449-line comprehensive chaos testing guide
- Fixed grafana/dashboards/main.yaml structure
Test Implementation (test/chaos/):
- toxiproxy_client.go (269 lines) - Full Toxiproxy HTTP API client with latency, bandwidth, and timeout toxics
- scenario_01_database_latency_test.go (337 lines) - Complete database latency chaos test with:
  - Baseline performance measurement
  - Latency injection (5000ms)
  - Timeout failure testing
  - Slow query success validation
  - Multiple query latency verification
  - Retry behavior testing
  - Recovery validation
  - Data consistency checks
Documentation (test/chaos/):
- README.md (320 lines) - Complete chaos testing documentation
- quickstart.md (210 lines) - Quick start guide
- implementation_summary.md (351 lines) - Implementation details and troubleshooting
- run_scenario_01.sh (99 lines) - Automated test runner script
✅ Test Status
All tests passing - Scenario 01 validates database latency handling correctly (~144s runtime)

This commit establishes the foundation for chaos engineering testing in Teranode, demonstrating proper failure injection, observation, and recovery validation patterns.