ReplayBuffer
- class cpprb.ReplayBuffer(size, env_dict=None, next_of=None, *, stack_compress=None, default_dtype=None, Nstep=None, mmap_prefix=None, **kwargs)
Bases: object

Replay Buffer class to store transitions and to sample them randomly.

The transition can contain anything compatible with NumPy data types. Users can freely specify the layout through the env_dict parameter at the constructor. A typical transition contains observation (obs), action (act), reward (rew), the next observation (next_obs), and done (done).

>>> env_dict = {"obs": {"shape": (4,4)},
...             "act": {"shape": 3, "dtype": np.int16},
...             "rew": {},
...             "next_obs": {"shape": (4,4)},
...             "done": {}}
In this class, sampling is random sampling and the same transition can be chosen multiple times.
Initialize ReplayBuffer
- Parameters:
size (int) – Buffer size
env_dict (dict of dict, optional) – Dictionary specifying environments. The keys of env_dict become environment names. The values of env_dict, which are also dict, define "shape" (default 1) and "dtype" (fallback to default_dtype).
next_of (str or array like of str, optional) – Value names whose next items share a memory region. The "next_"-prefixed items (e.g. next_obs for obs) are automatically added to env_dict without duplicating memory.
stack_compress (str or array like of str, optional) – Value names whose duplicated stack dimension is compressed. The values must be stacked along the last dimension.
default_dtype (numpy.dtype, optional) – Fallback dtype. The default value is numpy.single.
Nstep (dict, optional) – If this option is specified, Nstep reward is used. Nstep["size"] is an int specifying the step size of the Nstep reward. Nstep["rew"] is a str or array like of str specifying the Nstep rewards to be summed. Nstep["gamma"] is a float specifying the discount factor; its default is 0.99. Nstep["next"] is a str or list of str specifying the next values to be moved. When this option is enabled, "done" is required in env_dict.
mmap_prefix (str, optional) – File name prefix used to map buffer data with mmap. If None (default), data are stored only in memory. This feature is designed for very large data which cannot fit in physical memory.
Examples
Create a simple replay buffer with a buffer size of \(10^6\).

>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": 3}, "act": {}, "rew": {},
...                    "next_obs": {"shape": 3}, "done": {}})
Create a replay buffer with np.float64, but with "act" as np.int8.

>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": 3}, "act": {"dtype": np.int8},
...                    "rew": {},
...                    "next_obs": {"shape": 3}, "done": {}},
...                   default_dtype = np.float64)
Create a replay buffer with next_of memory compression for "obs". In this example, "next_obs" is automatically added and shares memory with "obs".

>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": 3}, "act": {}, "rew": {}, "done": {}},
...                   next_of="obs")
Create a replay buffer with stack_compress memory compression for "obs". The stacked data must be a sliding window over sequential data, and the last dimension is the stack dimension.

>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": (3,2)}},
...                   stack_compress="obs")
>>> rb.add(obs=[[1,2],
...             [1,2],
...             [1,2]])
0
>>> rb.add(obs=[[2,3],
...             [2,3],
...             [2,3]])
1
Create a very large replay buffer mapped on disk.

>>> rb = ReplayBuffer(1e+9, {"obs": {"shape": 3}}, mmap_prefix="rb_data")
Methods Summary
add(self, **kwargs)Add transition(s) into replay buffer.
clear(self)Clear replay buffer.
get_all_transitions(self, bool shuffle)Get all transitions stored in replay buffer.
get_buffer_size(self)Get buffer size
get_current_episode_len(self)Get current episode length
get_next_index(self)Get the next index to store
get_stored_size(self)Get stored size
is_Nstep(self)Get whether Nstep is used or not
load_transitions(self, file)Load transitions from file
on_episode_end(self)Call on episode end
sample(self, batch_size)Sample the stored transitions randomly with specified size
save_transitions(self, file, *[, safe])Save transitions to file
Methods Documentation
- add(self, **kwargs)
Add transition(s) into replay buffer.
Multiple sets of transitions can be added simultaneously.
- Parameters:
**kwargs (array like or float or int) – Transitions to be stored.
- Returns:
The first index of the stored position. If all transitions are stored into the NstepBuffer and no transitions are stored into the main buffer, None is returned.
- Return type:
int or None
- Raises:
KeyError – If any values defined at constructor are missing.
Warning
All values must be passed as keyword arguments. It is the user's responsibility to ensure that all values have the same step size.
Examples
>>> rb = ReplayBuffer(1e+6, {"obs": {"shape": 3}})
Add a single transition: [1,2,3].

>>> rb.add(obs=[1,2,3])

Add three sequential step transitions, [1,2,3], [4,5,6], and [7,8,9], simultaneously.

>>> rb.add(obs=[[1,2,3],
...             [4,5,6],
...             [7,8,9]])
- clear(self) → void
Clear replay buffer.
Set index and stored_size to 0.

Example

>>> rb = ReplayBuffer(5, {"done": {}})
>>> rb.add(done=0)
>>> rb.get_stored_size()
1
>>> rb.get_next_index()
1
>>> rb.clear()
>>> rb.get_stored_size()
0
>>> rb.get_next_index()
0
- get_all_transitions(self, shuffle: bool = False)
Get all transitions stored in replay buffer.
- Parameters:
shuffle (bool, optional) – When True, transitions are shuffled. The default value is False.
- Returns:
transitions – All transitions stored in this replay buffer.
- Return type:
dict of numpy.ndarray
- get_buffer_size(self) → size_t
Get buffer size
- Returns:
buffer size
- Return type:
size_t
- get_current_episode_len(self) → size_t
Get current episode length
- Returns:
Current episode length
- Return type:
size_t
- get_next_index(self) → size_t
Get the next index to store
- Returns:
the next index to store
- Return type:
size_t
- get_stored_size(self) → size_t
Get stored size
- Returns:
stored size
- Return type:
size_t
- is_Nstep(self) → bool
Get whether Nstep is used or not
- Returns:
Whether Nstep is used
- Return type:
bool
- load_transitions(self, file)
Load transitions from file
- Parameters:
file (str or file-like object) – File to read data
- Raises:
ValueError – When file format is wrong.
Warning
In order to avoid a security vulnerability, you must not load untrusted files, since this method is based on pickle.
- on_episode_end(self) → void
Call on episode end
Finalize the current episode by moving remaining Nstep buffer transitions, evacuating overlapped data for memory compression features, and resetting episode length.
Notes
Calling this function at episode end is the user's responsibility, since episode exploration can be terminated at a certain length even though no done flag from the environment is set.
- sample(self, batch_size)
Sample the stored transitions randomly with specified size
- Parameters:
batch_size (int) – sampled batch size
- Returns:
sample – Sampled batch transitions, which might contain the same transition multiple times.
- Return type:
dict of ndarray
Examples
>>> rb = ReplayBuffer(1e+6, {"obs": {"shape": 3}})
>>> rb.add(obs=[1,2,3])
>>> rb.add(obs=[[1,2,3],[1,2,3]])
>>> rb.sample(4)
{'obs': array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]], dtype=float32)}
- save_transitions(self, file, *, safe=True)
Save transitions to file
- Parameters:
file (str or file-like object) – File to write data
safe (bool, optional) – If False, a more aggressive compression is tried, which might encounter future incompatibility.