tracing-disable-must-be-zero-cpu #40

Merged
oskarszoon merged 3 commits into bsv-blockchain:main from freemans13:bug/tracing-disable-must-be-zero-cpu
Oct 23, 2025

@freemans13

Collaborator

No description provided.

rid3thespiral added a commit to rid3thespiral/teranode that referenced this pull request Oct 23, 2025
…in#40)

Implement comprehensive chaos testing for Kafka broker failures to validate
system resilience and recovery capabilities.

## Changes

### New Test Implementation
- **test/chaos/scenario_02_kafka_broker_failure_test.go**: Complete chaos test
  for Kafka broker failure scenarios with 9 test phases:
  1. Baseline performance validation
  2. Latency injection (3s via toxiproxy)
  3. Sync producer behavior under latency
  4. Async producer behavior under latency
  5. Complete broker failure simulation (100% connection drops)
  6. Producer failure handling validation
  7. Consumer failure handling validation
  8. System recovery verification
  9. Message consistency validation

### Test Automation
- **test/chaos/run_scenario_02.sh**: Automated test runner with:
  - Pre-flight checks for Kafka and toxiproxy services
  - Auto-start docker compose if needed
  - Service connectivity verification
  - Automatic cleanup after test completion
  - Colored output for better readability

### Infrastructure Fixes
- **compose/docker-compose-ss.yml**:
  - Pin PostgreSQL to version 16 (prevent breaking upgrades)
  - Expose Kafka ports 9092 and 9093 to host
  - Update Kafka advertised listener to localhost for external access
  - Fixes PostgreSQL restart loop and Kafka connectivity issues

### Documentation
- **test/chaos/README.md**: Updated with:
  - Complete Scenario 2 documentation
  - Test phases and expected results
  - Usage instructions with helper scripts
  - Updated test duration estimates
  - Marked Scenario 2 as implemented

## Test Results

All tests passing (132.26 seconds):
- ✅ Baseline Performance
- ✅ Latency Injection
- ✅ Producer With Latency (sync and async)
- ✅ Broker Failure Injection
- ✅ Producer/Consumer Under Failure
- ✅ Recovery Verification
- ✅ Message Consistency

## Testing

```bash
# Run Scenario 2 only
./test/chaos/run_scenario_02.sh

# Run all chaos tests
go test -v ./test/chaos/...
```

## Prerequisites

- Docker compose with toxiproxy-kafka running
- Kafka accessible on localhost:9092 (direct) and localhost:19092 (via toxiproxy)
- Toxiproxy API on localhost:8475
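
As an illustration of the latency-injection phase listed above, here is a minimal Go sketch that builds the HTTP request toxiproxy expects when a latency toxic is added. It is not the code from the chaos test. The proxy name `kafka`, the toxic name, and the helpers `latencyToxic` and `newToxicRequest` are hypothetical; only the API address `localhost:8475` and the 3 s latency come from the text above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// toxic mirrors the JSON body accepted by POST /proxies/{proxy}/toxics.
type toxic struct {
	Name       string         `json:"name"`
	Type       string         `json:"type"`
	Stream     string         `json:"stream"`
	Toxicity   float64        `json:"toxicity"`
	Attributes map[string]int `json:"attributes"`
}

// latencyToxic describes a delay of ms milliseconds applied to all
// downstream traffic. The toxic name is an arbitrary, hypothetical label.
func latencyToxic(ms int) toxic {
	return toxic{
		Name:       "kafka_latency",
		Type:       "latency",
		Stream:     "downstream",
		Toxicity:   1.0,
		Attributes: map[string]int{"latency": ms},
	}
}

// newToxicRequest builds (but does not send) the request that adds t
// to the named proxy.
func newToxicRequest(apiBase, proxy string, t toxic) (*http.Request, error) {
	body, err := json.Marshal(t)
	if err != nil {
		return nil, err
	}
	url := apiBase + "/proxies/" + proxy + "/toxics"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// "kafka" is an assumed proxy name; check the compose file for the real one.
	req, err := newToxicRequest("http://localhost:8475", "kafka", latencyToxic(3000))
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// To apply the toxic, send it with http.DefaultClient.Do(req).
}
```

Building the request separately from sending it keeps the sketch runnable without a toxiproxy instance.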

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ordishs

ordishs commented Oct 23, 2025

Collaborator

/claude-review

@github-actions

github-actions Bot commented Oct 23, 2025

Contributor

🤖 Claude Code Review

Status: Complete


Current Review:

This PR implements a global flag-based optimization to eliminate tracing overhead when tracing is disabled. The implementation looks solid overall with good test coverage.

Key Changes:

  • Added tracingEnabled atomic bool flag in otel.go
  • Fast-path returns in Tracer(), Start(), and DecoupleTracingSpan() when disabled
  • Singleton no-op tracer pattern to avoid allocations
  • Comprehensive test coverage for enabled/disabled states

No critical issues found. The implementation follows proper patterns:

  • Uses atomic.Bool for thread-safe flag access
  • Singleton no-op tracer eliminates allocations when disabled
  • Explicit SetTracingEnabled(false) call in daemon ensures flag is set
  • All existing tests properly save/restore state

The optimization should effectively reduce CPU overhead when tracing is disabled by avoiding OpenTelemetry lookups and allocations.
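
The pattern described in the review can be sketched roughly as follows. This is a minimal illustration, not the code in otel.go: the `Span`, `noopSpan`, and `realSpan` types and the `sharedNoop` variable are hypothetical stand-ins for the OpenTelemetry types, and the `Start` signature is an assumption. Only the names `tracingEnabled`, `SetTracingEnabled`, and `Start` come from the review.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// Span is a hypothetical stand-in for an OpenTelemetry span.
type Span interface {
	End()
	Recording() bool
}

// noopSpan does nothing. It has no fields, so one shared value can be
// handed to every caller.
type noopSpan struct{}

func (noopSpan) End()            {}
func (noopSpan) Recording() bool { return false }

// realSpan is a hypothetical recording span, allocated per call when
// tracing is enabled.
type realSpan struct{ name string }

func (*realSpan) End()            {}
func (*realSpan) Recording() bool { return true }

var (
	// tracingEnabled defaults to false (the zero value of atomic.Bool).
	tracingEnabled atomic.Bool
	// sharedNoop is the singleton returned on the disabled fast path.
	sharedNoop Span = noopSpan{}
)

// SetTracingEnabled flips the global flag; safe to call from any goroutine.
func SetTracingEnabled(on bool) { tracingEnabled.Store(on) }

// Start returns the shared no-op span when tracing is disabled, so the
// disabled path costs one atomic load and no allocation.
func Start(ctx context.Context, name string) (context.Context, Span) {
	if !tracingEnabled.Load() {
		return ctx, sharedNoop
	}
	return ctx, &realSpan{name: name}
}

func main() {
	ctx := context.Background()

	SetTracingEnabled(false)
	_, off := Start(ctx, "validate")

	SetTracingEnabled(true)
	_, on := Start(ctx, "validate")

	fmt.Println(off.Recording(), on.Recording())
}
```

Tests that toggle a global flag like this would typically restore the previous value afterwards, which matches the review's note that existing tests save and restore state.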

@sonarqubecloud


@oskarszoon oskarszoon merged commit 25173dd into bsv-blockchain:main Oct 23, 2025
8 checks passed
torrejonv pushed a commit to torrejonv/teranode that referenced this pull request Oct 26, 2025
Co-authored-by: oskarszoon <1449115+oskarszoon@users.noreply.github.com>
oskarszoon added a commit that referenced this pull request May 11, 2026
…he (#842)

Pins github.com/bitcoin-sv/bdk/module/gobdk to a pseudo-version of the
hot-fix on top of v1.2.3 (bitcoin-sv/bdk PR #40) that adds a per-instance
CachingScriptChecker overriding CheckSig to short-circuit identical
signature verifications within one EvalScript run.

Without the hot-fix the validator stalls for hours on testnet block
1,451,505 / tx 7bc9a3408dd0c87b835c887a0bce22c20788fc3c4b953929d4367656d80acab5,
whose 490 KB locking script performs 245,001 identical ECDSA verifications.
The cgo entry point is uninterruptible from Go so it presents to operators
as a hang in _Cfunc_ScriptEngine_VerifyScript. With the hot-fix the
verifier completes in ~70 s.

Do not bump gobdk to v1.2.4 (or later) until bitcoin-sv/bdk PR #41 — the
master/v1.2.4 port of the same hot-fix — has been merged.
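
The short-circuiting idea in this commit message can be illustrated with a small Go sketch. The actual hot-fix lives in bitcoin-sv/bdk and is not shown here; the `cachingChecker` type, its cache key, and the choice to cache both successes and failures are assumptions made for illustration.

```go
package main

import "fmt"

// sigKey identifies one verification by its inputs. Strings are used
// because byte slices cannot be map keys.
type sigKey struct{ sig, pubKey, digest string }

// cachingChecker wraps an expensive verify function and remembers the
// result for each distinct input. One instance is meant to live for a
// single script evaluation (an assumption about the intended scope).
type cachingChecker struct {
	verify func(sig, pubKey, digest []byte) bool
	seen   map[sigKey]bool
}

func newCachingChecker(verify func(sig, pubKey, digest []byte) bool) *cachingChecker {
	return &cachingChecker{verify: verify, seen: make(map[sigKey]bool)}
}

// CheckSig returns the cached result for inputs it has already seen and
// calls the underlying verifier only for new ones.
func (c *cachingChecker) CheckSig(sig, pubKey, digest []byte) bool {
	k := sigKey{string(sig), string(pubKey), string(digest)}
	if ok, hit := c.seen[k]; hit {
		return ok
	}
	ok := c.verify(sig, pubKey, digest)
	c.seen[k] = ok
	return ok
}

func main() {
	calls := 0
	// slow stands in for a real ECDSA verification.
	slow := func(sig, pubKey, digest []byte) bool {
		calls++
		return true
	}

	c := newCachingChecker(slow)
	for i := 0; i < 245001; i++ {
		c.CheckSig([]byte("sig"), []byte("pub"), []byte("hash"))
	}
	fmt.Println("expensive verifications:", calls)
}
```

With identical inputs repeated many times, the expensive verifier runs once and every later call is a map lookup.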