pss: Improve pressure backstop queue handling #1680
kortatu wants to merge 9 commits into ethersphere:master
Conversation
…to ensure channel capacity for pending messages
zelig
left a comment
So the big-picture motivation is to be able to requeue a msg for retry even if a new message could not be enqueued.
I believe the cleanest way to do this is:
outbox := make([]msg, queueCapacity)
slots := make(chan int, queueCapacity)
process := make(chan int)

func enqueue(m msg) {
    ...
    select {
    case i := <-slots:
        outbox[i] = m
        select {
        case process <- i:
        case <-quit:
        }
    default:
        // queue contention error
    }
}
...
// each worker can just reinsert async
for i := range process {
    // send and forward
    // if it fails to send, reinsert:
    select {
    case process <- i:
    case <-quit:
    }
    // if it succeeds, give back the slot
    slots <- i // never blocks
}
…ss struct. New tests for enqueueing
…l for a reenqued message
I see you chose the solution with the mutexes. Which is fine; I already said feel free to choose between the alternatives.
I did have some second thoughts over the weekend though. I think the solution we should pick is the one that consumes the least resources, especially since this code can get called a lot. Would you mind benchmarking this solution against a channel-based one akin to what @zelig proposed, please?
pss/pss.go
Outdated
p.outboxMutex.Lock()
defer p.outboxMutex.Unlock()
pendingSize := p.getPending()
// Only allow defaultOutboxCapacity messages at most processed (both enqueued or being forwarded)
defaultOutboxCapacity -> capacity of outbox
It sounds good to me.
The only thing is that we only implemented the mechanism to ensure that a re-enqueued message always has a slot booked in the outbox; we didn't implement parallel processing of messages.
I think that in order to test both solutions we should compare them with parallel processing in both cases.
Anyway, I will implement @zelig's proposal (with the augmented buffer to avoid deadlocks) in a different branch to compare them.
Great. Should we change the label to in progress again maybe?
Indeed. Until we can compare performance with the other solution it should be in progress.
metrics.GetOrRegisterCounter("pss.enqueue.outbox.full", nil).Inc(1)
if pending {
    log.Crit("unexpected outbox full for pending message!")
log.Crit actually panics on its own
}
reEnqueue := func(iteration int) {
    time.Sleep(1 * time.Millisecond)
Please let's not use Sleeps in tests if we can help it.
pss/pss_test.go
Outdated
topic := [4]byte{}
data := []byte{0x66, 0x6f, 0x6f}

msg := testMessage(messageAddr, topic, data)
Maybe we can just have testRandomMessage and conceal the addr, topic, data within?
Changed testMessage to testRandomMessage and conceal addr, topic and data inside
Added benchmark test for outbox message processing
Added parallelization of message forwarding. Now the main routine spawns a new goroutine for each message extracted from the outbox channel.
Added b.N loop in benchmark tests. Now results are more stable:
$ go test -v -bench=BenchmarkMessageProcessing -run=^$
goos: linux
goarch: amd64
pkg: github.com/ethersphere/swarm/pss
BenchmarkMessageProcessing/0.00_-4 1 1426398612 ns/op 171242400 B/op 1792064 allocs/op
BenchmarkMessageProcessing/0.01_-4 1 1073405959 ns/op 80046600 B/op 1528512 allocs/op
BenchmarkMessageProcessing/0.05_-4 2 878509165 ns/op 67466820 B/op 1391229 allocs/op
PASS
ok github.com/ethersphere/swarm/pss 5.414s
obsoleted by #1695
We wanted pending forward messages to always have a slot booked in the outbox queue. Although we kept the channel to implement that queue, we first check whether there is space to add a new message, taking pending messages into account.
When a message is enqueued for the first time, the forwardPending counter is incremented; when that message is sent or discarded, forwardPending is decremented.
There is a new pending flag in the enqueue function. When that flag is true, the message already has a slot booked in the channel, so the capacity check is skipped. When the flag is false (the common case) it is a new message, and before adding it to the queue we need to check that there is actually space left (forwardPending < defaultOutboxCapacity).
This way the number of messages in the channel will always be <= forwardPending.
Both the update of the forwardPending counter and the enqueue of the message are protected with a mutex, so they stay consistent under concurrent access.
References issue #1654