ReplayBuffer
- class cpprb.ReplayBuffer(size, env_dict=None, next_of=None, *, stack_compress=None, default_dtype=None, Nstep=None, mmap_prefix=None, **kwargs)
Bases: object

Replay Buffer class to store transitions and to sample them randomly.

The transition can contain anything compatible with NumPy data types. Users can freely specify the layout through the env_dict parameter at the constructor. A typical transition contains observation (obs), action (act), reward (rew), the next observation (next_obs), and done (done).

>>> env_dict = {"obs": {"shape": (4,4)},
...             "act": {"shape": 3, "dtype": np.int16},
...             "rew": {},
...             "next_obs": {"shape": (4,4)},
...             "done": {}}
In this class, sampling is random sampling and the same transition can be chosen multiple times.
Initialize ReplayBuffer
- Parameters:
size (int) – Buffer size
env_dict (dict of dict, optional) – Dictionary specifying environments. The keys of env_dict become environment names. The values of env_dict, which are also dict, define "shape" (default 1) and "dtype" (fallback to default_dtype).
next_of (str or array like of str, optional) – Value names whose next items share a memory region. The "next_"-prefixed items (e.g. next_obs for obs) are automatically added to env_dict without duplicating memory.
stack_compress (str or array like of str, optional) – Value names whose duplicated stack dimension is compressed. The values must be stacked along the last dimension.
default_dtype (numpy.dtype, optional) – Fallback dtype. The default value is numpy.single.
Nstep (dict, optional) – If this option is specified, Nstep reward is used. Nstep["size"] is an int specifying the step size of the Nstep reward. Nstep["rew"] is a str or array like of str specifying the Nstep rewards to be summed. Nstep["gamma"] is a float specifying the discount factor; its default is 0.99. Nstep["next"] is a str or list of str specifying the next values to be moved. When this option is enabled, "done" is required in env_dict.
mmap_prefix (str, optional) – File name prefix used to map buffer data with mmap. If None (default), data are stored only in memory. This feature is designed for very large data which cannot fit in physical memory.
Examples
Create a simple replay buffer with a buffer size of \(10^6\).

>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": 3}, "act": {}, "rew": {},
...                    "next_obs": {"shape": 3}, "done": {}})
Create a replay buffer with np.float64, but with "act" as np.int8.

>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": 3}, "act": {"dtype": np.int8},
...                    "rew": {},
...                    "next_obs": {"shape": 3}, "done": {}},
...                   default_dtype = np.float64)
Create a replay buffer with next_of memory compression for "obs". In this example, "next_obs" is automatically added and shares memory with "obs".

>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": 3}, "act": {}, "rew": {}, "done": {}},
...                   next_of="obs")
Create a replay buffer with stack_compress memory compression for "obs". The stacked data must be a sliding window over sequential data, and the last dimension is the stack dimension.

>>> rb = ReplayBuffer(1e+6,
...                   {"obs": {"shape": (3,2)}},
...                   stack_compress="obs")
>>> rb.add(obs=[[1,2],
...             [1,2],
...             [1,2]])
0
>>> rb.add(obs=[[2,3],
...             [2,3],
...             [2,3]])
1
Create a very large replay buffer mapped on disk.

>>> rb = ReplayBuffer(1e+9, {"obs": {"shape": 3}}, mmap_prefix="rb_data")
Methods Summary
add(self, **kwargs)Add transition(s) into replay buffer.
clear(self)Clear replay buffer.
get_all_transitions(self, bool shuffle)Get all transitions stored in replay buffer.
get_buffer_size(self)Get buffer size
get_current_episode_len(self)Get current episode length
get_next_index(self)Get the next index to store
get_stored_size(self)Get stored size
is_Nstep(self)Get whether Nstep is used or not
load_transitions(self, file)Load transitions from file
on_episode_end(self)Call on episode end
sample(self, batch_size)Sample the stored transitions randomly with specified size
save_transitions(self, file, *[, safe])Save transitions to file
Methods Documentation
- add(self, **kwargs)
Add transition(s) into replay buffer.
Multiple sets of transitions can be added simultaneously.
- Parameters:
**kwargs (array like or float or int) – Transitions to be stored.
- Returns:
The first index of the stored position. If all transitions are stored into the NstepBuffer and no transitions are stored into the main buffer, None is returned.
- Return type:
int or None
- Raises:
KeyError – If any values defined at constructor are missing.
Warning
All values must be passed as keyword arguments. It is the user's responsibility to ensure that all values have the same step size.
Examples
>>> rb = ReplayBuffer(1e+6, {"obs": {"shape": 3}})
Add a single transition: [1,2,3].

>>> rb.add(obs=[1,2,3])

Add three sequential step transitions, [1,2,3], [4,5,6], and [7,8,9], simultaneously.

>>> rb.add(obs=[[1,2,3],
...             [4,5,6],
...             [7,8,9]])
- clear(self) → void
Clear replay buffer.
Set index and stored_size to 0.

Example

>>> rb = ReplayBuffer(5, {"done": {}})
>>> rb.add(done=0)
>>> rb.get_stored_size()
1
>>> rb.get_next_index()
1
>>> rb.clear()
>>> rb.get_stored_size()
0
>>> rb.get_next_index()
0
- get_all_transitions(self, shuffle: bool = False)
Get all transitions stored in replay buffer.
- Parameters:
shuffle (bool, optional) – When True, transitions are shuffled. The default value is False.
- Returns:
transitions – All transitions stored in this replay buffer.
- Return type:
dict of numpy.ndarray
- get_buffer_size(self) → size_t
Get buffer size
- Returns:
buffer size
- Return type:
size_t
- get_current_episode_len(self) → size_t
Get current episode length
- Returns:
Current episode length
- Return type:
size_t
- get_next_index(self) → size_t
Get the next index to store
- Returns:
the next index to store
- Return type:
size_t
- get_stored_size(self) → size_t
Get stored size
- Returns:
stored size
- Return type:
size_t
- is_Nstep(self) → bool
Get whether Nstep is used or not
- Returns:
Whether Nstep is used
- Return type:
bool
- load_transitions(self, file)
Load transitions from file
- Parameters:
file (str or file-like object) – File to read data
- Raises:
ValueError – When file format is wrong.
Warning
In order to avoid a security vulnerability, you must not load untrusted files, since this method is based on pickle.
- on_episode_end(self) → void
Call on episode end
Finalize the current episode by moving remaining Nstep buffer transitions, evacuating overlapped data for memory compression features, and resetting episode length.
Notes
Calling this function at episode end is the user's responsibility, since episode exploration can be terminated at a certain length even though no done flag from the environment is set.
- sample(self, batch_size)
Sample the stored transitions randomly with specified size
- Parameters:
batch_size (int) – sampled batch size
- Returns:
sample – Sampled batch transitions, which might contain the same transition multiple times.
- Return type:
dict of ndarray
Examples
>>> rb = ReplayBuffer(1e+6, {"obs": {"shape": 3}})
>>> rb.add(obs=[1,2,3])
>>> rb.add(obs=[[1,2,3],[1,2,3]])
>>> rb.sample(4)
{'obs': array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]], dtype=float32)}
- save_transitions(self, file, *, safe=True)
Save transitions to file
- Parameters:
file (str or file-like object) – File to write data
safe (bool, optional) – If False, a more aggressive compression is tried, which might encounter future incompatibility.