Review Request: Vitay #19
|
EDITOR Thanks @vitay for the submission. I will soon assign two reviewers.
|
EDITOR @neuronalX Could you review this submission?
|
REVIEWER 1 Yes I can!
|
EDITOR Thanks @neuronalX, I will assign a second reviewer and the review will start.
|
EDITOR @vitay Sorry for the delay, but because of the period (conferences and/or vacations), many reviewers are not available. I am still searching and will keep you informed here.
|
EDITOR @vitay The second reviewer will be @piero-le-fou. @neuronalX @piero-le-fou Thanks for accepting; you can start the review now and post your comments in this thread.
|
EDITOR @neuronalX @piero-le-fou Any update?
|
REVIEWER 2 So, I have been a bit unlucky with the simulations, but one of them was my own fault: I did not read the Readme file carefully. Nonetheless, this might be helpful for others: check that you have the latest versions of the required Python packages (see the Readme). In my case, the code executed properly until the "linspace" call with a dtype argument, at the end of the Fig2.py file. Another issue: the end of the Fig3.py script has a missing import and a missing variable conversion.
Other than that, all the code runs perfectly!
|
AUTHOR Thanks for the report. I uploaded a fix for Fig3.py. The script took so long (3 days) that I ran the simulation only once without the plot, and never checked that it worked as a standalone script... Also good to know that np.linspace(..., dtype=) is recent; perhaps I should avoid using it for broader compatibility.
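A note for readers hitting the same problem: the dtype argument of np.linspace() was only added around numpy 1.9, so older installations fail on it. A minimal sketch of a backwards-compatible alternative (with hypothetical bounds, not the actual values from Fig2.py) is to cast the result explicitly:

```python
import numpy as np

# np.linspace(0, 100, 51, dtype=np.int32) fails on numpy < 1.9.
# Casting afterwards works on all versions:
indices = np.linspace(0, 100, 51).astype(np.int32)
print(indices.dtype)   # int32
```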
|
EDITOR @neuronalX Any update on your side?
|
REVIEWER 1 Dear Julien Vitay,

Because the code is well written, I was able to go through it and understand most of it. There are a few things I was not able to understand, so I have some small remarks on the code, organised by file. (I will give comments on the paper later.)

RecurrentNetwork.py
- Does it need to be a list in which you append the 2950 arrays/scalars (for Fig. 1) one at a time? Or could you use a numpy array instead? (This may be more efficient for long simulations.)
- Why is the index "t" not at the same place in the two uses of "trajectory" and in the use of "stimulus"?
- Could you please give more explanations about the dimensions/shape of "error" in the code (and paper)? Moreover, could you please explain why you use 0 indices in all the related lines?
- Optional remark: could you use the more explicit "_" instead of "dummy"?

Fig1.py
- The names of the variables do not seem consistent for the output of net.simulate(). Should "trajectory" and "pretraining" be understood as "initial_trajectory" and "(pre)trained_trajectory" respectively?

Fig2.py
- I got a warning at lines 74 and 77, because "t_perturbation" is a float and not an int.
- Optional: could you increase the compatibility of the np.linspace() call? Do you really need "dtype=np.int32"? Without it, the code should be compatible with earlier versions of numpy.
- Optional: could you use an exception for data loading? It would be better for end users; otherwise the error one gets is not explicit (it points to the next line). (NB: I am not sure for which versions of Python this code works.)

Fig3.py
- Concerning the simulation time ("3 days"), did you try changing the numpy arrays from dtype "float64" to "float32"? (Three days is a bit long for someone who wants to try the code, so it would be interesting to improve it.)
- Moreover (but I may not have understood it well), you might also save time by using the same "10 initial networks" (as in the original experiment) and by training the recurrent connections only once, because retraining the readout can be done for different delays with the same recurrent dynamics.

Thank you in advance for your answers!
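For readers, the preallocation suggestion in the RecurrentNetwork.py remarks can be sketched as follows (with made-up sizes and a dummy simulation step, not the actual code from the submission):

```python
import numpy as np

T, N = 2950, 200   # hypothetical: number of recorded steps, number of neurons

# Appending each step's rates to a Python list, then converting at the end:
history_list = []
for t in range(T):
    rates = np.zeros(N)          # stands in for one simulation step
    history_list.append(rates)
recording = np.array(history_list)

# Preallocating a numpy array instead avoids T small allocations and the
# final conversion, which can matter for long simulations:
recording = np.empty((T, N))
for t in range(T):
    recording[t] = np.zeros(N)   # write the rates of step t in place
```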
|
EDITOR @vitay Can you address the comments?
|
AUTHOR Thanks a lot for the very useful comments. I pushed the corresponding changes.

RecurrentNetwork.py
I added a comment in the article on this.

Fig1.py
The variable names have been made more consistent.

Fig2.py

Fig3.py
|
EDITOR Thanks @vitay @neuronalX @piero-le-fou. Are you satisfied with the changes?
|
REVIEWER 2 I would like to add a few remarks. But first, I should mention two typos at the beginning of the paper:

First, I want to emphasise that the author did a great job at reproducing the original paper's results, and that the author's additional comments on the details of the implementation are quite useful and even interesting. In addition, the code provided by the author is substantially more readable than the original Matlab code.

The only question I have regards the reproduction of Figure 2. It seems that, in the original paper, each instance of the perturbation injected into the network elicited temporarily divergent trajectories; in other words, in each simulation the effect of the perturbation was different. However, this does not seem to be the case in the reproduced figure, and I noticed in the code that the same connectivity and impulse are used for each simulation. Was a different impulse input connectivity used for each simulation in the original code? If not, how would you explain this difference? Although I do not doubt that, whatever the impulse conditions, the perturbed trajectory would return to its attracting course, I suppose that one of the goals of the figure was to demonstrate that, for different perturbations of a given amplitude, the trajectory always ultimately converges back within a reasonable amount of time.

Of minor importance, but since it was brought up by the other reviewer: a more elegant way of discarding irrelevant outputs of a function in Python is to index the output directly, e.g. out2 = f(arg)[1].

Thank you
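The two styles of discarding an unused output can be illustrated with a toy function (hypothetical, not taken from the submission):

```python
# A made-up two-output function, standing in for something like net.simulate():
def simulate(arg):
    trajectory = [arg] * 3
    readout = arg * 2
    return trajectory, readout

# Option 1: unpack into a throwaway name, as the other reviewer suggested:
_, readout = simulate(5)

# Option 2: index the returned tuple directly, as suggested here:
readout = simulate(5)[1]   # -> 10
```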
|
AUTHOR Thanks for the comments. I uploaded the manuscript with the missing words.

For the perturbation in Fig. 2, I am not fully sure: the original code does not provide the complete script for the figure; it is a GUI where the user can click during the trajectory to perturb the neurons. The text only states:

So I think it should be the same perturbation every time, as only the weights are random. I just realized that I used two different perturbation neurons (one for each word), while the text implies only one, but that should not make a big difference. Perhaps Laje and Buonomano indeed used a different perturbation every time (with different weights), which would explain the higher variance. The perturbation is rather short (10 ms), so it should not have such a dramatic effect on the trajectories if the perturbation is deterministic.
|
REVIEWER 2 @rougier
|
EDITOR @piero-le-fou OK, thanks for the review.
|
EDITOR @neuronalX Any update on your review?
|
REVIEWER 1 Thank you for the answers, the modifications in the code, and the interesting explanations. Hereafter are some comments on the paper.

Strongly or weakly connected?
At the beginning of the paper, you talk about "weakly/sparsely connected" networks (line 3 of the Introduction and line 2 of the Methods), and also about "strongly" connected ones (l. 4 of the Introduction). And then, you state:

Are the networks used by Laje and Buonomano strongly or sparsely connected?

Strongly/weakly connected vs. chaotic/deterministic
You seem to state that weakly connected RNNs are deterministic and strongly connected ones are chaotic. You should clearly state the references for such statements, because from my understanding of reservoir computing these chaotic/deterministic properties do not come from the sparseness of the network, but are for instance more related to the "effective spectral radius" [1] of the weight matrix (in the absence of output feedback). Moreover, the order of the statements could be read as opposing chaotic and deterministic: a deterministic network can be chaotic.

[1] Jaeger, Herbert, Mantas Lukoševičius, Dan Popovici, and Udo Siewert (2007). Optimization and Applications of Echo State Networks with Leaky-Integrator Neurons. Neural Networks 20(3): 335–352.

"supervised learning (gradient descent-like)"
Could you also please give a reference for this statement? One particularity of Reservoir Computing (RC) over "more classical" learning schemes is the possibility of one-shot learning of the weights by linear or ridge regression. Thus, I do not think that "gradient descent-like" is a good summarizing qualifier for RC.
|
AUTHOR Strongly or sparsely connected?
Actually both: the connection matrix is sparse (connection probability of 10%), but the weights, when they exist, are strong (gain g > 1.5). I called the second aspect "strongly connected", but maybe a more correct term would be "high-gain regime", as in the original article.

Chaotic/deterministic
My understanding (I am not an expert) was based on citations of this paper: Sompolinsky, Crisanti, and Sommers (1988). Chaos in Random Neural Networks. Phys. Rev. Lett. 61, 259. doi:10.1103/PhysRevLett.61.259. For example, Sussillo and Abbott (2009) state:

So I inferred that the chaotic behavior of a recurrent network depends mostly on the gain of the connections (g > 1.5). There is probably a link between this gain and the effective spectral radius of Jaeger et al. (2007), but I am not sure of its exact form.

supervised learning (gradient descent-like)
My bad, I thought only iterative rules like RLS were used. For all these reasons, I changed the introduction to:

I hope it is less confusing that way.
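For readers, the link between the gain g and the spectral radius discussed above can be checked numerically. This is a sketch with assumed parameters (N = 800 neurons, connection probability p = 0.1), not the submission's actual network code: for a sparse random matrix with Gaussian weights of standard deviation g/sqrt(p*N), each entry has variance g²/N, so by the circular law the spectral radius is close to g.

```python
import numpy as np

rng = np.random.default_rng(42)
N, p, g = 800, 0.1, 1.5    # assumed network size, sparsity and gain

# Sparse Gaussian weights, scaled as in Sompolinsky et al. (1988):
# a connection exists with probability p and has variance g**2 / (p * N).
mask = rng.random((N, N)) < p
W = mask * rng.normal(0.0, g / np.sqrt(p * N), (N, N))

# The spectral radius approaches g for large N, which is why g > 1
# (here 1.5) pushes the network into the chaotic regime.
radius = np.max(np.abs(np.linalg.eigvals(W)))
print(round(radius, 2))   # close to g = 1.5
```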
|
REVIEWER 1 @vitay Thank you very much for the clarifications. I now understand much better what you mean. Of course the strength of the connections (which is related to the spectral radius) influences the chaotic/non-chaotic regime.
|
EDITOR @neuronalX Do you have any other comments, or are you happy with the proposed changes?
|
EDITOR @neuronalX Can you tell me whether you accept/reject or intend to post new comments?
|
REVIEWER 1 Yes, I will have more comments.
|
EDITOR OK. Let's allow two more weeks, does that sound good?
|
REVIEWER 1 I hope to finish this week.

On 27.09.2016 at 12:54, Nicolas P. Rougier wrote:
|
REVIEWER 1 Hello, I am very sorry for the long delay.

1st equation of the network
Please define:

Input matrix
|
EDITOR Thanks @neuronalX. Provided @vitay makes the recommended corrections, would you recommend acceptance?
|
REVIEWER 1 @rougier Sure!
|
AUTHOR Thanks, I have pushed the desired modifications.

1st equation of the network
The sentence is now:

Input matrix Win
Win is indeed not scaled, as only 1 to 4 input neurons are used in the experiments, and they are not even activated at the same time (the impulses occur at different times). I added the following sentence:

I_0 & chaotic behavior
I am not sure about this: I ran the simulations without noise and the trajectories were always deterministic over the durations considered here. Even with this amount of noise, trajectories diverge only after 1 s. Perhaps if you simulated long enough, you would see some divergence without noise as rounding errors accumulate. It seems that the impulses bring the network into a highly deterministic state which "survives" the chaotic nature of the network. I simply modified the sentence to:

typo Wout
Fixed.

error[i, 0]
That is because I use (N, 1) shapes for vectors instead of (N,). Addressing error[i] returns an array of shape (1,), which would work in this case, but is not as clean. I extended the description of the error vector:

g
This is the value taken by Laje and Buonomano in the original article. With g = 1.5 the network is less chaotic and it takes fewer training trials to "tame" that chaos. I added the following comment:

exact function
I have not systematically studied all possible functions, so I cannot make a strong claim, but my guess is that as long as the dynamics of the reservoir are complex enough, the read-out neurons can learn virtually any function (except perhaps at high frequencies). Here we focus on learning the recurrent weights (i.e. exhibiting stable trajectories for long enough), not the readout ones, so any function of the same duration could have been used (e.g. a flat one). The gaussian bump after a delay is primarily there to attract time-perception researchers. I have extended that comment a bit:
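The shape distinction behind the error[i, 0] answer can be made concrete with a toy example (not the actual error vector from the code):

```python
import numpy as np

# Column-vector convention: shape (N, 1) rather than (N,)
error = np.zeros((10, 1))

print(error[2].shape)   # (1,)  -> still an array, not a scalar
print(error[2, 0])      # 0.0   -> the scalar itself
```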
|
EDITOR Congratulations @vitay, your submission has been accepted and will soon be published.
|
EDITOR @vitay Can you provide a list of keywords?
|
AUTHOR For the model: recurrent neural networks, reservoir computing, dynamical systems, learning, chaos. Perhaps also the Python/Numpy keywords; I do not know if they are implicit.
|
EDITOR The article has been published and will appear soon on https://rescience.github.io/read
AUTHOR Dear @ReScience/editors,

I request a review for the reproduction of the following paper:

I believe the original results have been faithfully reproduced, as explained in the accompanying article.

The repository lives at https://github.com/vitay/ReScience-submission/tree/vitay

Best regards,
Julien Vitay
(Review dates: 01 July 2016, 01 July 2016, 11 July 2016, 07 October 2016, 29 August 2016, 07 October 2016)