Review Request : G. Viejo, B. Girard, M. Khamassi #14
EDITOR Thank you for the submission. I will soon assign two reviewers. In the meantime, could you edit your first comment to follow the structure of #3? Also, I think the link to the PDF you provided is wrong.
EDITOR Julien Vitay @vitay has agreed to review your submission.
EDITOR Georgios Detorakis @gdetor has agreed to review your submission.
EDITOR @gviejo Before the actual review starts, you might consider cleaning your repo (you added some temporary files):
AUTHOR @rougier
REVIEWER 1

**Article**

The article describes precisely what the reproduced model does, what the main difficulties in reimplementing the model were (missing parameter values, typos), and where the reproduction differs from the original. The authors sufficiently justify the small difference in the probabilities of action. The authors only reproduce the first figure of the original article, which is sufficient. If the authors find the time to implement the other tasks, or at least to hint at what should be changed in order to reproduce them, it would be of great value. The manuscript has some formatting issues:
| Name | Description | Value |
|------|-------------|-------|
| $\sigma$ | Updating rate of the average reward | 0.02 |
| $\eta$ | Variance of evolution noise | 0.0001 |
| $P_n$ | Variance of observation noise | 0.05 |
| $\beta$ | Rate of exploration | 1.0 |
| $\rho$ | Update rate of the reward function | 0.1 |
| $\gamma$ | Discount factor | 0.95 |
| $\tau$ | Time step of graph exploration | 0.08 |
| depth | Depth of search in graph exploration | 3 |
| $\phi$ | Update rate of the transition function | 0.5 |
| init cov | Initialisation of covariance matrice | 1.0 |
| $\kappa$ | Unscentered transform parameters | 0.1 |

Notice the typo in "Initialisation of covariance matrice" for init cov.
> **Initialization**
>> $Q(s, a)^{Goal-Directed} = \{ 0, \ldots \}$
>> $Q(s, a)^{Habitual} = \{ 0, \dots \}$
>> $\Sigma = \left(\begin{array}{cccc} cov \times \eta & 0 & \ldots & 0 \\ 0 & cov \times \eta & \ldots & \vdots \\ \vdots & \ldots & \ddots & 0 \\ 0 & \ldots & 0 & cov \times \eta \end{array}\right)$
>> $R(S1, EM) = 1$
>> $\bar{R} = 0$
>> $\hat{R}(s,a) = \{0,\ldots\}$
> **Main Loop**
>> $\textbf{FOR}\ i = 1:T$
>>> $s_t = S_0$
>>> $\textbf{IF}\ i = T_{devaluation}$
>>>> $R(S1, EM) = 0$
>>>> $\hat{R}(S1, EM) = -1$
>>> $\textbf{WHILE}\ s_t \neq S1 \wedge a_t \neq EM$
>>>> $a_t = \textbf{Selection}(s_t)$
>>>> $r_t = R(s_t,a_t)$
>>>> $s_{t+1} = transition(s_t, a_t)$
>>>> $\textbf{Update}(s_t,a_t, s_{t+1}, r_t)$
> **Selection**
>> $\{a_1,\ldots,a_i,\ldots\} \leftarrow sort(Q(s_t,a_i))$
>> $VPI(s_t, a_1) = (Q(s_t,a_2)^{H}-Q(s_t,a_1)^{H})P(Q(s_t,a_1)^{H}<Q(s_t,a_2)^{H})+ \frac{\sigma(s_t,a_t)}{\sqrt{2\pi}} e^{-\frac{(Q(s_t,a_2)^H - Q(s_t,a_1)^H)^2}{2\sigma(s_t,a_t)^2}}$
>> $VPI(s_t, a_i) = (Q(s_t,a_i)^{H}-Q(s_t,a_1)^{H})P(Q(s_t,a_i)^{H}>Q(s_t,a_1)^{H}) + \frac{\sigma(s_t,a_t)}{\sqrt{2\pi}} e^{-\frac{(Q(s_t,a_1)^H - Q(s_t,a_i)^H)^2}{2\sigma(s_t,a_t)^2}}$
>> $\textbf{FOR}\ i \in \{a_1, a_2,\ldots, a_i, \ldots\}$
>>> $\textbf{IF}\ VPI(s_t, a_i) \geq \tau \bar{R}$
>>>> $Q(s_t,a_i) = \hat{R}(s_t,a_i) + \gamma \sum\limits_{s'}p_{T}(\{s,a\}\rightarrow s') \max\limits_{b \in A} Q(s',b)^{Goal-directed}$
>>> $\textbf{ELSE}$
>>>> $Q(s_t,a_i) = Q(s_t,a_i)^{Habitual}$
>> $a_t \leftarrow \textit{SoftMax}(Q(s_t,a), \beta)$
> **Update**
>> $\bar{R} = (1-\sigma) \bar{R} + \sigma r_t$
>> $\hat{R}(s_t,a_t) = (1 - \rho) \hat{R}(s_t,a_t) + \rho r_t$
>> $p_{T}(s_t, a_t, s_{t+1}) = (1 - \phi) p_{T}(s_t, a_t, s_{t+1}) + \phi$
>> *Specific to Kalman Q-learning:*
>> $\Theta = \{ \theta_j,\ 0 \leq j \leq 2|S.A|\}$
>> $\check{W} = \{ w_j,\ 0 \leq j \leq 2|S.A| \}$
>> $\check{R} = \{ \check{r}_j = \theta_j(s_t,a_t) - \gamma \max\limits_{b \in A} \theta_j(s_{t+1},b),\ 0 \leq j \leq 2|S.A|\}$
>> $r_{predicted} = \sum\limits_{j=0}^{2|S.A|} w_j \check{r}_j$
>> $P_{\theta_j \check{r}_j} = \sum\limits_{j=0}^{2|S.A|} w_j (\theta_j - Q^{Habitual}_t)(\check{r}_j - r_{predicted})$
>> $P_{\check{r}_j} = \sum\limits_{j=0}^{2|S.A|} w_j (\check{r}_j - r_{predicted})^2 + P_n$
>> $K_t = P_{\theta_j \check{r}_j} P_{\check{r}_j}^{-1}$
>> $\delta_t = r_t - r_{predicted}$
>> $Q_{t+1}^{Habitual} = Q_{t}^{H} + K_t \delta_t$
>> $P_{t+1}^{H} = P_{t}^{H} - K_t P_{\Sigma_t} K_t^T$
As the algorithm is quite complex, it would be good to add more comments inside it, such as:

>> `# Sort the Q-values in descending order`
>> $\{a_1,\ldots,a_i,\ldots\} \leftarrow sort(Q(s_t,a_i))$
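To make the arbitration step easier to follow, here is a rough Python sketch of the Selection rule as I read it (the function and variable names are mine, not taken from the authors' repository, and scipy is assumed for the normal CDF):

```python
import numpy as np
from scipy.stats import norm

def select_action(q_habit, sigma, q_goal, r_bar, tau, beta, rng=np.random):
    """VPI-based arbitration between habitual and goal-directed values,
    followed by softmax action selection (a sketch, not the authors' code)."""
    order = np.argsort(q_habit)[::-1]        # a_1 = action with the best habitual value
    a1, a2 = order[0], order[1]

    # Value of perfect information under a Gaussian posterior on each Q-value
    vpi = np.empty(len(q_habit))
    for i in range(len(q_habit)):
        # the best action is compared to the runner-up, every other action to the best
        gap = q_habit[a2] - q_habit[a1] if i == a1 else q_habit[i] - q_habit[a1]
        vpi[i] = (gap * norm.cdf(gap / sigma[i])
                  + sigma[i] / np.sqrt(2 * np.pi) * np.exp(-gap**2 / (2 * sigma[i]**2)))

    # Deliberate (use the goal-directed value) only where it is worth the time cost
    q = np.where(vpi >= tau * r_bar, q_goal, q_habit)

    # Softmax action selection with exploration rate beta
    p = np.exp(beta * (q - q.max()))
    p = p / p.sum()
    return rng.choice(len(q), p=p)
```

The point of the two branches for the gap is that the VPI of the currently best action is computed against the runner-up, while the VPI of every other action is computed against the best one.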
**Code**
As it stands, the code only runs under Python 2. By calling the print function like this:

```python
print(exp, nb_trials, deval_time)  # in run.py
print("Parameters not found")      # in models.py
```

as well as using `from __future__ import print_function`:

```python
from __future__ import print_function
try:
    range = xrange   # Python 2: the lazy xrange replaces range
except NameError:
    pass             # Python 3: xrange does not exist, range is already lazy
import numpy as np
...
for i in range(nb_blocs):
    for exp, nb_trials, deval_time in zip(['mod','ext'], [nb_iter_mod, nb_iter_ext], [deval_mod_time, deval_ext_time]):
        ...
        for j in range(nb_trials):
            ...
```

the code would run with both Python 2 and Python 3. I suggest making this change, as the README specifies "at least python 2.7.3", which includes Python 3.x.
The functions would also benefit from docstrings, for example:

```python
def createQValuesDict(states, actions):
    """
    Returns a dictionary containing the Q-values of all state-action pairs.
    The keys are either:
      * states, in which case the value is a list of action indices,
      * (state, action) tuples, in which case ...
      * (state, index) tuples, in which case ...

    Parameters:
      * states: list of states
      * actions: list of actions
    """
    values = {0: np.zeros(len(states)*len(actions))}
    tmp = 0
    for s in states:
        values[s] = []
        n = 0
        for a in actions:
            values[(s, a)] = tmp
            values[s].append(tmp)
            values[(s, n)] = a
            tmp = tmp + 1
            n = n + 1
    return values
```

Even inside a single method, it would be good for comprehension to add comments describing what is currently being done, for example:

```python
def updateValue(self, reward):
    # Retrieve the current state, action and reward
    r = (reward == 1)*1
    s = self.state
    a = self.action
    # Compute the sigma points
    sigma_points, weights = computeSigmaPoints(self.values[0], self.covariance['cov'], self.kappa)
    # Predicted rewards
    rewards_predicted = (sigma_points[:, self.values[(s, a)]] - self.gamma*np.max(sigma_points[:, self.values[s]], 1)).reshape(len(sigma_points), 1)
    reward_predicted = np.dot(rewards_predicted.flatten(), weights.flatten())
    # Covariance
    cov_values_rewards = np.sum(weights*(sigma_points-self.values[0])*(rewards_predicted-reward_predicted), 0)
    cov_rewards = np.sum(weights*(rewards_predicted-reward_predicted)**2) + self.var_obs
    # Kalman gain
    kalman_gain = cov_values_rewards/cov_rewards
    # Update the values
    self.values[0] = self.values[0] + kalman_gain*(reward-reward_predicted)
    # Update the covariance
    self.covariance['cov'][:, :] = self.covariance['cov'][:, :] - (kalman_gain.reshape(len(kalman_gain), 1)*cov_rewards)*kalman_gain
```

This may be redundant with the method names, but it allows a much faster comprehension of the code.
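For readers less familiar with the unscented transform behind the Kalman Q-learning, a standard construction of the sigma points and weights looks roughly as follows (this is an assumption about what computeSigmaPoints does, not the authors' actual implementation):

```python
import numpy as np

def compute_sigma_points(mean, cov, kappa):
    """Standard unscented-transform sigma points for a Gaussian N(mean, cov).

    Returns 2n+1 points and their weights; the weighted sample mean and
    covariance of the points reproduce (mean, cov)."""
    n = len(mean)
    # Matrix square root of (n + kappa) * cov via the Cholesky decomposition
    scale = np.linalg.cholesky((n + kappa) * cov)
    points = np.tile(mean, (2 * n + 1, 1))
    points[1:n + 1] += scale.T               # mean + columns of the square root
    points[n + 1:] -= scale.T                # mean - columns of the square root
    weights = np.full((2 * n + 1, 1), 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    return points, weights
```

With the $\kappa$ from the parameter table, the central point receives weight $\kappa/(n+\kappa)$ and the remaining $2n$ points share the rest equally.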
REVIEWER 2

**Article**

The text explains the main goal of this replication and provides all the necessary parameters as well as the algorithm of the implementation. The table is not so clear: it would be better to put the parameters on separate rows in the first column, the comments regarding each parameter in a second column, and finally the arithmetic values of the parameters in a third column. It would be nice if the authors could explain (or speculate) why there is a slight difference in time steps and probabilities of action (different parameters, different implementation of Kalman Q-learning).

**Source code**

The code runs smoothly on a Lenovo ThinkPad X250 laptop (Linux Fedora 22). It consumes ~44 MB of physical memory and takes ~274.0 s to finish execution and plot the results. The results are qualitatively similar to the ones in the original article. Here are some comments regarding the source code:

**Typos**

In the introduction:

**License**

The authors use a BSD license for their source code, although they do not provide any LICENSE file along with their sources. In addition, they have not used the proper BSD statement within their *.py files (see https://opensource.org/licenses/BSD-3-Clause).
EDITOR Dear @gviejo, could you address the reviewers' comments?
AUTHOR I changed the repository according to the reviewers' comments, except for the use of ConfigParser. I chose to clean up some functions related to the parameters dictionary that were not used.
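For reference, the ConfigParser suggestion would have looked roughly like this (a sketch only: `parameters.cfg` and the section and option names are hypothetical, not files from the repository):

```python
try:
    from configparser import ConfigParser     # Python 3
except ImportError:
    from ConfigParser import ConfigParser     # Python 2

config = ConfigParser()
config.read('parameters.cfg')                  # hypothetical file name

# One option per parameter of the table in the article, e.g. in a [model] section
beta = config.getfloat('model', 'beta')        # rate of exploration
gamma = config.getfloat('model', 'gamma')      # discount factor
kappa = config.getfloat('model', 'kappa')      # unscented transform parameter
```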
EDITOR I'm not sure either what @gdetor refers to exactly. I think you're supposed to include the full text, but a reference to the (new) BSD license might be OK. You may need to add the "all rights reserved" statement though. @gdetor, @vitay Could you have a look at the changes to check if they are OK for you?
REVIEWER 1 The changes are great, thanks. It is good to publish.
REVIEWER 2
EDITOR @gviejo Congratulations, your submission has been accepted. Please do not modify your repository until further notice.
This submission has been accepted for publication, and has been published at https://github.com/ReScience/ReScience/wiki/Current-Issue |
AUTHOR
Dear @ReScience/editors,
I request a review for the replication of the following paper:
Keramati, M., Dezfouli, A., & Piray, P. (2011). Speed/accuracy trade-off between the habitual and the goal-directed processes. PLoS Computational Biology, 7(5), e1002055.
I believe the original results have been faithfully replicated, as explained in the accompanying article.
The repository lives at https://github.com/gviejo/ReScience-submission/tree/VIEJO-GIRARD-KHAMASSI-2016