[TEST] Chaos testing - Scenario 01 Postgres DB Latency #39

Merged
oskarszoon merged 3 commits into bsv-blockchain:main from oskarszoon:feature/chaos-testing on Oct 22, 2025

@oskarszoon

Contributor

🎯 What Was Added
Infrastructure (compose/):
Modified docker-compose-ss.yml with toxiproxy-postgres and toxiproxy-kafka services
Added scripts/toxiproxy-config.json - Proxy configuration for PostgreSQL and Kafka
Added scripts/chaos-test-helpers.sh - 305-line bash helper script for manual chaos testing
Added scripts/toxiproxy-chaos-testing.md - 449-line comprehensive chaos testing guide
Fixed grafana/dashboards/main.yaml structure

Test Implementation (test/chaos/):
toxiproxy_client.go (269 lines) - Full Toxiproxy HTTP API client with latency, bandwidth, timeout toxics
scenario_01_database_latency_test.go (337 lines) - Complete database latency chaos test with:

Baseline performance measurement
Latency injection (5000ms)
Timeout failure testing
Slow query success validation
Multiple query latency verification
Retry behavior testing
Recovery validation
Data consistency checks

Documentation (test/chaos/):
README.md (320 lines) - Complete chaos testing documentation
quickstart.md (210 lines) - Quick start guide
implementation_summary.md (351 lines) - Implementation details and troubleshooting
run_scenario_01.sh (99 lines) - Automated test runner script

✅ Test Status
All tests passing - Scenario 01 validates database latency handling correctly (~144s runtime).

This commit establishes the foundation for chaos engineering testing in Teranode, demonstrating proper failure injection, observation, and recovery validation patterns.

rid3thespiral and others added 3 commits October 22, 2025 16:27
Ported from bitcoin-sv/teranode commit 79331ba

This commit introduces chaos engineering testing infrastructure for Teranode
using Toxiproxy to simulate network failures and database latency issues.

Key additions:
- Chaos test scenario 01: Database latency testing
- Toxiproxy integration for network failure simulation
- Docker Compose configuration with toxiproxy services
- Comprehensive documentation and helper scripts
- Test runner scripts for automated chaos testing

Changes to existing files:
- services/blockassembly/server_test.go: Fixed data race using channel-based
  approach instead of mutex (superior to the mutex approach in commit 237592c)
- compose/docker-compose-ss.yml: Added toxiproxy services and updated volume paths

Ported from bitcoin-sv/teranode commit 3a1b48e

This commit moves chaos testing to a separate workflow file so that it no longer
affects whether Claude Code review and SonarQube trigger on fork-based PRs.

Key changes:
- Created new .github/workflows/teranode_pr_chaostests.yaml workflow
- Adapted workflow to use SHA-pinned GitHub Actions per public repo security requirements
- Chaos tests now run independently from main PR test workflow

@github-actions

github-actions Bot commented Oct 22, 2025

Contributor

🤖 Claude Code Review

Status: Complete


This PR introduces chaos testing infrastructure using Toxiproxy to validate system resilience under database latency conditions. The implementation includes comprehensive test coverage, helper scripts, and documentation.

Analysis:

The implementation follows solid chaos engineering principles with proper test isolation, cleanup, and recovery validation. The Toxiproxy client implementation is clean with appropriate error handling. The workflow integration is minimal and focused.

Minor observations:

  1. Hardcoded credentials in test (test/chaos/scenario_01_database_latency_test.go:40-41): The database connection strings contain the hardcoded password really_strong_password_change_me. While this matches the docker-compose configuration and is suitable for local testing, ensure this is documented as test-only.

  2. Bash script error handling (test/chaos/run_scenario_01.sh:49,59): The nc command checks use grep -q succeeded which may be fragile across different nc implementations. Consider using a more portable check or adding a comment about requiring GNU netcat.

  3. Volume path changes (compose/docker-compose-ss.yml:92-93): The PR changes volume paths from ./compose/postgres/... to ./postgres/.... Verify these paths exist relative to the compose directory to avoid runtime errors.

Positive aspects:

  • Proper test cleanup with defer statements
  • Comprehensive scenario coverage including timeouts, retries, and recovery
  • Good separation of concerns between client and test logic
  • Workflow only runs minimal required services for efficiency

Overall this is a well-structured addition to the testing infrastructure. The observations above are minor and do not block the PR's core functionality.

@sonarqubecloud


@oskarszoon oskarszoon merged commit e246183 into bsv-blockchain:main Oct 22, 2025
8 checks passed
torrejonv pushed a commit to torrejonv/teranode that referenced this pull request Oct 26, 2025
…n#39)

Co-authored-by: rid3thespiral <24756750+rid3thespiral@users.noreply.github.com>
oskarszoon added a commit to oskarszoon/teranode that referenced this pull request May 22, 2026
Closes CodeQL alert bsv-blockchain#39. The Aerospike connection URL (which may contain
user:password in the userinfo) and the ClientPolicy struct (which has a
Password field) were being logged at Debug level. Adds redactURL and
aerospikePolicySummary helpers and rewires the log line to use them.

URL password placeholder uses 'REDACTED' (no special chars) to avoid
percent-encoding by url.UserPassword.
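
The commit names a redactURL helper but its body is not shown here. A minimal sketch of what such a helper might look like, using only the standard net/url package, follows; the behavior on parse failure is an assumption.

```go
package main

import "net/url"

// redactURL returns raw with any userinfo password replaced by REDACTED.
// Hypothetical implementation; the helper in the referenced commit may differ.
func redactURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		// Never echo an unparseable string that may contain a secret.
		return "invalid-url"
	}
	// Userinfo methods are nil-safe, so URLs without credentials pass through.
	if _, hasPassword := u.User.Password(); hasPassword {
		u.User = url.UserPassword(u.User.Username(), "REDACTED")
	}
	return u.String()
}
```

For example, `redactURL("aerospike://admin:s3cret@localhost:3000/test")` yields `aerospike://admin:REDACTED@localhost:3000/test`. A plain alphanumeric placeholder survives `u.String()` unchanged, which matches the commit's stated reason for choosing it.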
3 participants