Skip to content

prepare API to get/set RandomState instances unique to each batch index#4448

Closed
grafi-tt wants to merge 4 commits intochainer:masterfrom
grafi-tt:randomstate-per-batch-idx-api
Closed

prepare API to get/set RandomState instances unique to each batch index#4448
grafi-tt wants to merge 4 commits intochainer:masterfrom
grafi-tt:randomstate-per-batch-idx-api

Conversation

@grafi-tt
Copy link
Copy Markdown
Contributor

@grafi-tt grafi-tt commented Mar 8, 2018

Part of #4230.

iterators.get_random_state (to be called by user) and iterators.random_state.set_random_state (to be called by iterator implementation) are implemented, and get_random_state is documented.

Currently iterators set dummy state by set_random_state which raises NotImplementedError immediately when get_random_state is called.

@delta2323
Copy link
Copy Markdown
Member

Sorry for not my responding so long. Thank you for making a smaller PR. I'll take a look.

@grafi-tt
Copy link
Copy Markdown
Contributor Author

@delta2323 Many thanks, and congratulation for 4.0.0 RC release!


def dummy_state():
raise NotImplementedError
with random_state.set_random_state(dummy_state):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me this context manager doesn't take any effect on the return batch. If so, are you thinking to modify the code in the PRs following this one?



def get_random_state():
"""Get a :class:`~numpy.random.RandomState` during iteration.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "during iteration" is not needed as users can call this function any time, although the most expected use case is to call it during iterations.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to note that returned instance is a thread-local variable.

def get_random_state():
"""Get a :class:`~numpy.random.RandomState` during iteration.

The instance obtained is unique to each index on a batch. You can
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the same reason as the previous one, I would suggest to describe it as one of expected use cases.


"""
state = _random_state.value
if not isinstance(state, numpy.random.RandomState):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you assume as a type of state other than RandomState?

Returns:
A list of :class:`numpy.random.RandomState` instances.
"""
return [derive_random_state() for _ in range(num)]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think six.moves.range is better because num can be arbitrarily large.

@delta2323
Copy link
Copy Markdown
Member

I added several comments. Could you check that?

@delta2323 delta2323 added the cat:feature Implementation that introduces new interfaces. label Mar 28, 2018
@delta2323
Copy link
Copy Markdown
Member

delta2323 commented May 14, 2018

@grafi-tt Hi, can you spare time to work on this PR? But I heard from other members that we may not need this PR any more. How do you think?

@grafi-tt
Copy link
Copy Markdown
Contributor Author

@delta2323 Sorry, I don't have enough time to work on the PRs. I'm fine with closing them.

It seems MultiprocessIterator isn't of high priority at all, because MultithreadIterator works well on typical workloads, as numpy and recent version of Pillow faithfully release GIL.

Just out of curiously, what other members say? (If you can't reveal, it's OK.)

@grafi-tt grafi-tt closed this May 28, 2018
@delta2323
Copy link
Copy Markdown
Member

Sorry for late response. Thank you for closing the PR. One of my member agreed with you in that it is enough to use MultiThreadIterator for preprocessing in which Python overhead and GIL is not problematic.

@okuta okuta added this to the Closed issues and PRs milestone Jun 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cat:feature Implementation that introduces new interfaces.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants