Review Request : G. Viejo, B. Girard, M. Khamassi #14
EDITOR Thank you for the submission. I will soon assign two reviewers. In the meantime, could you edit your first comment to follow the structure of #3? Also, I think the link to the PDF you provided is wrong.
EDITOR Julien Vitay @vitay has agreed to review your submission.
EDITOR Georgios Detorakis @gdetor has agreed to review your submission.
EDITOR @gviejo Before the actual review starts, you might consider cleaning your repo (you added some temporary files):
AUTHOR @rougier
REVIEWER 1

**Article**

The article describes precisely what the reproduced model does, what the main difficulties in reimplementing the model were (missing parameter values, typos), and where the reproduction differs from the original. The authors sufficiently justify the small difference in the probabilities of action. The authors only reproduce the first figure of the original article, which is sufficient. If the authors find the time to implement the other tasks, or at least to hint at what should be changed in order to reproduce them, it would be of great value. The manuscript has some formatting issues:
| Name | Description | Value |
|------|-------------|-------|
| $\sigma$ | Updating rate of the average reward | 0.02 |
| $\eta$ | Variance of evolution noise | 0.0001 |
| $P_n$ | Variance of observation noise | 0.05 |
| $\beta$ | Rate of exploration | 1.0 |
| $\rho$ | Update rate of the reward function | 0.1 |
| $\gamma$ | Discount factor | 0.95 |
| $\tau$ | Time step of graph exploration | 0.08 |
| depth | Depth of search in graph exploration | 3 |
| $\phi$ | Update rate of the transition function | 0.5 |
| init cov | Initialisation of covariance matrice | 1.0 |
| $\kappa$ | Unscentered transform parameters | 0.1 |

Notice the typo in "Initialisation of covariance matrice" for init cov.
> **Initialization**
>> $Q(s, a)^{Goal-Directed} = \{ 0, \ldots \}$
>> $Q(s, a)^{Habitual} = \{ 0, \dots \}$
>> $\Sigma = \left(\begin{array}{cccc} cov \times \eta & 0 & \ldots & 0 \\ 0 & cov \times \eta & \ldots & \vdots \\ \vdots & \ldots & \ddots & 0 \\ 0 & \ldots & 0 & cov \times \eta \end{array}\right)$
>> $R(S1, EM) = 1$
>> $\bar{R} = 0$
>> $\hat{R}(s,a) = \{0,\ldots\}$
> **Main Loop**
>> $\textbf{FOR}\ i = 1:T$
>>> $s_t = S_0$
>>> $\textbf{IF}\ i = T_{devaluation}$
>>>> $R(S1, EM) = 0$
>>>> $\hat{R}(S1, EM) = -1$
>>> $\textbf{WHILE}\ s_t \neq S1 \wedge a_t \neq EM$
>>>> $a_t = \textbf{Selection}(s_t)$
>>>> $r_t = R(s_t,a_t)$
>>>> $s_{t+1} = transition(s_t, a_t)$
>>>> $\textbf{Update}(s_t,a_t, s_{t+1}, r_t)$
> **Selection**
>> $\{a_1,\ldots,a_i,\ldots\} \leftarrow sort(Q(s_t,a_i))$
>> $VPI(s_t, a_1) = (Q(s_t,a_2)^{H}-Q(s_t,a_1)^{H})P(Q(s_t,a_1)^{H}<Q(s_t,a_2)^{H})+ \frac{\sigma(s_t,a_t)}{\sqrt{2\pi}} e^{-\frac{(Q(s_t,a_2)^H - Q(s_t,a_1)^H)^2}{2\sigma(s_t,a_t)^2}}$
>> $VPI(s_t, a_i) = (Q(s_t,a_i)^{H}-Q(s_t,a_1)^{H})P(Q(s_t,a_i)^{H}>Q(s_t,a_1)^{H}) + \frac{\sigma(s_t,a_t)}{\sqrt{2\pi}} e^{-\frac{(Q(s_t,a_1)^H - Q(s_t,a_i)^H)^2}{2\sigma(s_t,a_t)^2}}$
>> $\textbf{FOR}\ i \in \{a_1, a_2,\ldots, a_i, \ldots\}$
>>> $\textbf{IF}\ VPI(s_t, a_i) \geq \tau \bar{R}$
>>>> $Q(s_t,a_i) = \hat{R}(s_t,a_i) + \gamma \sum\limits_{s'}p_{T}(\{s,a\}\rightarrow s') \max\limits_{b \in A} Q(s',b)^{Goal-directed}$
>>> $\textbf{ELSE}$
>>>> $Q(s_t,a_i) = Q(s_t,a_i)^{Habitual}$
>> $a_t \leftarrow \textit{SoftMax}(Q(s_t,a), \beta)$
> **Update**
>> $\bar{R} = (1-\sigma) \bar{R} + \sigma r_t$
>> $\hat{R}(s_t,a_t) = (1 - \rho) \hat{R}(s_t,a_t) + \rho r_t$
>> $p_{T}(s_t, a_t, s_{t+1}) = (1 - \phi) p_{T}(s_t, a_t, s_{t+1}) + \phi$
>> *Specific to Kalman Q-learning:*
>> $\Theta = \{ \theta_j,\ 0 \leq j \leq 2|S.A|\}$
>> $\check{W} = \{ w_j,\ 0 \leq j \leq 2|S.A| \}$
>> $\check{R} = \{ \check{r}_j = \theta_j(s_t,a_t) - \gamma \max\limits_{b \in A} \theta_j(s_{t+1},b),\ 0 \leq j \leq 2|S.A|\}$
>> $r_{predicted} = \sum\limits_{j=0}^{2|S.A|} w_j \check{r}_j$
>> $P_{\theta_j \check{r}_j} = \sum\limits_{j=0}^{2|S.A|} w_j (\theta_j - Q^{Habitual}_t)(\check{r}_j - r_{predicted})$
>> $P_{\check{r}_j} = \sum\limits_{j=0}^{2|S.A|} w_j (\check{r}_j - r_{predicted})^2 + P_n$
>> $K_t = P_{\theta_j \check{r}_j} P_{\check{r}_j}^{-1}$
>> $\delta_t = r_t - r_{predicted}$
>> $Q_{t+1}^{Habitual} = Q_{t}^{H} + K_t \delta_t$
>> $P_{t+1}^{H} = P_{t}^{H} - K_t P_{\Sigma_t} K_t^T$
As the algorithm is quite complex, it would be good to add more comments inside it, such as:

>> `# Sort the Q-values in descending order`
>> $\{a_1,\ldots,a_i,\ldots\} \leftarrow sort(Q(s_t,a_i))$
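To make the arbitration step easier to follow, here is a rough Python sketch of the Selection rule as I read it (the function and variable names are mine, not taken from the authors' repository, and scipy is assumed for the normal CDF):

```python
import numpy as np
from scipy.stats import norm

def select_action(q_habit, sigma, q_goal, r_bar, tau, beta, rng=np.random):
    """VPI-based arbitration between habitual and goal-directed values,
    followed by softmax action selection (a sketch, not the authors' code)."""
    order = np.argsort(q_habit)[::-1]        # a_1 = action with the best habitual value
    a1, a2 = order[0], order[1]

    # Value of perfect information under a Gaussian posterior on each Q-value
    vpi = np.empty(len(q_habit))
    for i in range(len(q_habit)):
        # the best action is compared to the runner-up, every other action to the best
        gap = q_habit[a2] - q_habit[a1] if i == a1 else q_habit[i] - q_habit[a1]
        vpi[i] = (gap * norm.cdf(gap / sigma[i])
                  + sigma[i] / np.sqrt(2 * np.pi) * np.exp(-gap**2 / (2 * sigma[i]**2)))

    # Deliberate (use the goal-directed value) only where it is worth the time cost
    q = np.where(vpi >= tau * r_bar, q_goal, q_habit)

    # Softmax action selection with exploration rate beta
    p = np.exp(beta * (q - q.max()))
    p = p / p.sum()
    return rng.choice(len(q), p=p)
```

The point of the two branches for the gap is that the VPI of the currently best action is computed against the runner-up, while the VPI of every other action is computed against the best one.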
**Code**
As it stands, the code only runs under Python 2. By calling the print function like this:

```python
print(exp, nb_trials, deval_time)  # in run.py
print("Parameters not found")      # in models.py
```

as well as using `from __future__ import print_function`:

```python
from __future__ import print_function
try:
    range = xrange   # Python 2: the lazy xrange replaces range
except NameError:
    pass             # Python 3: xrange does not exist, range is already lazy
import numpy as np
...
for i in range(nb_blocs):
    for exp, nb_trials, deval_time in zip(['mod','ext'], [nb_iter_mod, nb_iter_ext], [deval_mod_time, deval_ext_time]):
        ...
        for j in range(nb_trials):
            ...
```

the code would run with both Python 2 and Python 3. I suggest making this change, as the README specifies "at least python 2.7.3", which includes Python 3.x.
The functions would also benefit from docstrings, for example:

```python
def createQValuesDict(states, actions):
    """
    Returns a dictionary containing the Q-values of all state-action pairs.
    The keys are either:
      * states, in which case the value is a list of action indices,
      * (state, action) tuples, in which case ...
      * (state, index) tuples, in which case ...

    Parameters:
      * states: list of states
      * actions: list of actions
    """
    values = {0: np.zeros(len(states)*len(actions))}
    tmp = 0
    for s in states:
        values[s] = []
        n = 0
        for a in actions:
            values[(s, a)] = tmp
            values[s].append(tmp)
            values[(s, n)] = a
            tmp = tmp + 1
            n = n + 1
    return values
```

Even inside a single method, it would be good for comprehension to add comments describing what is currently being done, for example:

```python
def updateValue(self, reward):
    # Retrieve the current state, action and reward
    r = (reward == 1)*1
    s = self.state
    a = self.action
    # Compute the sigma points
    sigma_points, weights = computeSigmaPoints(self.values[0], self.covariance['cov'], self.kappa)
    # Predicted rewards
    rewards_predicted = (sigma_points[:, self.values[(s, a)]] - self.gamma*np.max(sigma_points[:, self.values[s]], 1)).reshape(len(sigma_points), 1)
    reward_predicted = np.dot(rewards_predicted.flatten(), weights.flatten())
    # Covariance
    cov_values_rewards = np.sum(weights*(sigma_points-self.values[0])*(rewards_predicted-reward_predicted), 0)
    cov_rewards = np.sum(weights*(rewards_predicted-reward_predicted)**2) + self.var_obs
    # Kalman gain
    kalman_gain = cov_values_rewards/cov_rewards
    # Update the values
    self.values[0] = self.values[0] + kalman_gain*(reward-reward_predicted)
    # Update the covariance
    self.covariance['cov'][:, :] = self.covariance['cov'][:, :] - (kalman_gain.reshape(len(kalman_gain), 1)*cov_rewards)*kalman_gain
```

This may be redundant with the method names, but it allows a much faster comprehension of the code.
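For readers less familiar with the unscented transform behind the Kalman Q-learning, a standard construction of the sigma points and weights looks roughly as follows (this is an assumption about what computeSigmaPoints does, not the authors' actual implementation):

```python
import numpy as np

def compute_sigma_points(mean, cov, kappa):
    """Standard unscented-transform sigma points for a Gaussian N(mean, cov).

    Returns 2n+1 points and their weights; the weighted sample mean and
    covariance of the points reproduce (mean, cov)."""
    n = len(mean)
    # Matrix square root of (n + kappa) * cov via the Cholesky decomposition
    scale = np.linalg.cholesky((n + kappa) * cov)
    points = np.tile(mean, (2 * n + 1, 1))
    points[1:n + 1] += scale.T               # mean + columns of the square root
    points[n + 1:] -= scale.T                # mean - columns of the square root
    weights = np.full((2 * n + 1, 1), 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    return points, weights
```

With the $\kappa$ from the parameter table, the central point receives weight $\kappa/(n+\kappa)$ and the remaining $2n$ points share the rest equally.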
REVIEWER 2

**Article**

The text explains the main goal of this replication and provides all the necessary parameters as well as the algorithm of the implementation. The table is not so clear: it would be better to put the parameters on separate rows in the first column, the comments regarding each parameter in a second column, and finally the arithmetic values of the parameters in a third column. It would be nice if the authors could explain (or speculate) why there is a slight difference in time steps and probabilities of action (different parameters, different implementation of Kalman Q-learning).

**Source code**

The code runs smoothly on a Lenovo ThinkPad X250 laptop (Linux Fedora 22). It consumes ~44 MB of physical memory and takes ~274.0 s to finish execution and plot the results. The results are qualitatively similar to the ones in the original article. Here are some comments regarding the source code:

**Typos**

In the introduction:

**License**

The authors use a BSD license for their source code, although they do not provide any LICENSE file along with their sources. In addition, they have not used the proper BSD statement within their *.py files (see https://opensource.org/licenses/BSD-3-Clause).
EDITOR Dear @gviejo, could you address the reviewers' comments?
AUTHOR I changed the repository according to the reviewers' comments, except for the use of ConfigParser. I chose to clean up some functions related to the parameters dictionary that were not used.
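For reference, the ConfigParser suggestion would have looked roughly like this (a sketch only: `parameters.cfg` and the section and option names are hypothetical, not files from the repository):

```python
try:
    from configparser import ConfigParser     # Python 3
except ImportError:
    from ConfigParser import ConfigParser     # Python 2

config = ConfigParser()
config.read('parameters.cfg')                  # hypothetical file name

# One option per parameter of the table in the article, e.g. in a [model] section
beta = config.getfloat('model', 'beta')        # rate of exploration
gamma = config.getfloat('model', 'gamma')      # discount factor
kappa = config.getfloat('model', 'kappa')      # unscented transform parameter
```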
EDITOR I'm not sure either what @gdetor refers to exactly. I think you're supposed to include the full text, but a reference to the (new) BSD license might be OK. You may need to add the "all rights reserved" statement though. @gdetor, @vitay Could you have a look at the changes to check if they are OK for you?
REVIEWER 1 The changes are great, thanks. It is good to publish.
REVIEWER 2
EDITOR @gviejo Congratulations, your submission has been accepted. Please do not modify your repository until further notice.
This submission has been accepted for publication, and has been published at https://github.com/ReScience/ReScience/wiki/Current-Issue |
AUTHOR
Dear @ReScience/editors,
I request a review for the replication of the following paper:
Keramati, M., Dezfouli, A., & Piray, P. (2011). Speed/accuracy trade-off between the habitual and the goal-directed processes. PLoS Computational Biology, 7(5), e1002055.
I believe the original results have been faithfully replicated, as explained in the accompanying article.
The repository lives at https://github.com/gviejo/ReScience-submission/tree/VIEJO-GIRARD-KHAMASSI-2016