<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.2">Jekyll</generator><link href="https://b-mu.github.io//feed.xml" rel="self" type="application/atom+xml" /><link href="https://b-mu.github.io//" rel="alternate" type="text/html" /><updated>2022-07-23T21:47:14+00:00</updated><id>https://b-mu.github.io//feed.xml</id><title type="html">Baorun (Lauren) Mu</title><subtitle>&quot;Study hard what interests you the most in the most undisciplined, irreverent and original manner possible.&quot; -- Richard Feynman</subtitle><entry><title type="html">Mount a remote file system on Mac via sshfs</title><link href="https://b-mu.github.io//jekyll/update/2022/07/08/mac-sshfs.html" rel="alternate" type="text/html" title="Mount a remote file system on Mac via sshfs" /><published>2022-07-08T04:21:00+00:00</published><updated>2022-07-08T04:21:00+00:00</updated><id>https://b-mu.github.io//jekyll/update/2022/07/08/mac-sshfs</id><content type="html" xml:base="https://b-mu.github.io//jekyll/update/2022/07/08/mac-sshfs.html">&lt;ol&gt;
  &lt;li&gt;Install &lt;a href=&quot;https://www.macports.org/install.php&quot;&gt;MacPorts&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;e.g. for Monterey (v12), simply download the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.pkg&lt;/code&gt; installer and run it&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Install &lt;a href=&quot;https://ports.macports.org/port/sshfs/&quot;&gt;sshfs&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo port install sshfs&lt;/code&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Install &lt;a href=&quot;https://osxfuse.github.io/&quot;&gt;macFuse&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;in System Preferences, allow the system software from developer Benjamin Fleischer so that the extension can load&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Create a directory and mount the remote file system
    &lt;ul&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mkdir local_fs&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sshfs $USERNAME@HOST:DIR local_fs&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;if the server requires key authentication, use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sshfs $USERNAME@HOST:DIR local_fs -o IdentityFile=PATH_TO_KEY&lt;/code&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;(note) If the connection breaks, the local directory where the remote file system is mounted may show as busy and can no longer be read or written. Force-unmount it with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;umount -f local_fs&lt;/code&gt; and mount it at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;local_fs&lt;/code&gt; again (as in step 4)&lt;/li&gt;
&lt;/ol&gt;
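&lt;p&gt;The mount commands from step 4 can be collected into one snippet. This is a dry-run sketch: the username, host, remote directory, and key path below are placeholders, and the echo only prints the sshfs command; drop the echo (and substitute your own values) to mount for real.&lt;/p&gt;

```shell
# Dry-run sketch of step 4 (placeholders: alice, server.example.com,
# /home/alice, and the key path are examples, not values from this post).
USERNAME=alice
HOST=server.example.com
DIR=/home/alice
mkdir -p local_fs    # create the mount point (step 4)
# print the mount command; remove the echo to actually run sshfs
echo sshfs "$USERNAME@$HOST:$DIR" local_fs -o IdentityFile="$HOME/.ssh/id_ed25519"
```

&lt;p&gt;If the mount later goes stale, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;umount -f local_fs&lt;/code&gt; followed by the same sshfs command recovers it.&lt;/p&gt;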

&lt;h1 id=&quot;reference&quot;&gt;Reference&lt;/h1&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://stackoverflow.com/questions/37458814/how-to-open-remote-files-in-sublime-text-3&quot;&gt;Stackoverflow: How to open remote files in sublime text 3&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://stackoverflow.com/questions/14057830/unmount-the-directory-which-is-mounted-by-sshfs-in-mac&quot;&gt;Stackoverflow: Unmount the directory which is mounted by sshfs in Mac&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://askubuntu.com/questions/777116/sshfs-giving-remote-host-has-disconnected&quot;&gt;Ask Ubuntu: sshfs giving “remote host has disconnected”&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">Install MacPorts e.g. for Monterey v12, simply download the .pkg and install Install sshfs sudo port install sshfs Install macFuse need to allow system software from Benjamin Fleischer in System Preferences to use this extension Create a directory and mount the remote file system mkdir local_fs sshfs $USERNAME@HOST:DIR local_fs if the server requires key authentication, use sshfs $USERNAME@HOST:DIR local_fs -o IdentityFile=PATH_TO_KEY (note) If the connection is broken, the local directory that the remote system is mounted might show as busy but cannot be read or written to. One can unmount the fs by umount -f local_fs and mount at local_fs again (as in step 4)</summary></entry><entry><title type="html">Get Started with SciNet Mist</title><link href="https://b-mu.github.io//jekyll/update/2022/03/02/scinet-mist-get-started.html" rel="alternate" type="text/html" title="Get Started with SciNet Mist" /><published>2022-03-02T04:21:00+00:00</published><updated>2022-03-02T04:21:00+00:00</updated><id>https://b-mu.github.io//jekyll/update/2022/03/02/scinet-mist-get-started</id><content type="html" xml:base="https://b-mu.github.io//jekyll/update/2022/03/02/scinet-mist-get-started.html">&lt;p&gt;This is a tutorial for those who are new to Mist, a GPU cluster in the SciNet supercomputer center.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Register a &lt;a href=&quot;https://ccdb.computecanada.ca/&quot;&gt;Compute Canada Database (CCDB)&lt;/a&gt; account
    &lt;ul&gt;
      &lt;li&gt;an account can have one of several roles; for a student researcher to be added to a group, a sponsor code is needed&lt;/li&gt;
      &lt;li&gt;it takes a few hours for the account to be activated by the consortium&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Request access to Niagara and Mist on &lt;a href=&quot;https://ccdb.computecanada.ca/services/opt_in?&quot;&gt;Compute Canada&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;approval takes a few hours; a notification email will be sent&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Set up ssh public/private key:
    &lt;ul&gt;
      &lt;li&gt;generate a key via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ssh-keygen -t $TYPE -f $KEY_NAME&lt;/code&gt;, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$TYPE&lt;/code&gt; can be any supported public key algorithm: rsa, ecdsa, or ed25519 (ed25519 is a good default; dsa is deprecated in recent OpenSSH releases)&lt;/li&gt;
      &lt;li&gt;copy the public key (i.e. the file with the .pub extension) to Compute Canada’s webpage: select My Account -&amp;gt; Manage SSH Keys&lt;/li&gt;
      &lt;li&gt;give the key a name, then click add key&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;ssh to Mist
    &lt;ul&gt;
      &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ssh -i $PATH_TO_KEY -Y $USERNAME@mist.scinet.utoronto.ca&lt;/code&gt;, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$PATH_TO_KEY&lt;/code&gt; is the private key generated in step 3 and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$USERNAME&lt;/code&gt; is that of the CCDB account registered in step 1&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Load/Install software modules
    &lt;ul&gt;
      &lt;li&gt;load anaconda: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;module load anaconda3&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;create a virtual environment: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;conda create -n $ENV_NAME python=$PYTHON_VERSION&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;install any requirements in IBM Open-CE Conda Channel:
        &lt;ul&gt;
          &lt;li&gt;e.g. PyTorch, CUDAToolkit: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;conda install -c /scinet/mist/ibm/open-ce pytorch=1.10.2 cudatoolkit=11.2&lt;/code&gt;&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;(Heads up) Large datasets like ImageNet can exhaust the disk quota of the personal directory under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/home ($HOME)&lt;/code&gt;; download and store them in the personal directory under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/scratch ($SCRATCH)&lt;/code&gt; instead&lt;/li&gt;
  &lt;li&gt;(Optional, but recommended) Request a debugjob via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;debugjob --clean -g $NUM_GPUS&lt;/code&gt; and test the code on a small-scale experiment first. This command gives an interactive session with one hour of compute&lt;/li&gt;
&lt;/ol&gt;
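&lt;p&gt;Step 3 in a single runnable command. The key type and file name here are just reasonable defaults, not values mandated by SciNet; the empty passphrase keeps the demo non-interactive, though a real key may deserve one.&lt;/p&gt;

```shell
# Generate an ed25519 keypair non-interactively ("scinet_key" is an
# arbitrary example name; -N '' sets an empty passphrase for the demo).
ssh-keygen -q -t ed25519 -N '' -f ./scinet_key
ls scinet_key scinet_key.pub   # the .pub file is what gets pasted into CCDB
```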

&lt;h1 id=&quot;reference&quot;&gt;Reference&lt;/h1&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.computecanada.ca/wiki/Compute_Canada_Documentation&quot;&gt;Compute Canada Documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.scinethpc.ca/getting-a-scinet-account/&quot;&gt;Getting Access to SciNet Systems and Services&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.ssh.com/academy/ssh/keygen&quot;&gt;Keygen (SSH Academy)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.scinet.utoronto.ca/index.php/Mist&quot;&gt;Mist Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">This is a tutorial for those who are new to Mist, a GPU cluster in the SciNet supercomputer center.</summary></entry><entry><title type="html">Multi-Node Distributed Neural Network Training on AWS</title><link href="https://b-mu.github.io//jekyll/update/2022/02/13/aws-multinode.html" rel="alternate" type="text/html" title="Multi-Node Distributed Neural Network Training on AWS" /><published>2022-02-13T04:21:00+00:00</published><updated>2022-02-13T04:21:00+00:00</updated><id>https://b-mu.github.io//jekyll/update/2022/02/13/aws-multinode</id><content type="html" xml:base="https://b-mu.github.io//jekyll/update/2022/02/13/aws-multinode.html">&lt;p&gt;note: the following tutorial uses an example with two p3.16xlarge instances (8 V100 GPUs each, i.e. 16 GPUs on 2 nodes), but it is easy to generalize to more instances (if AWS’s usage limits and capacity availability allow)&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Launch multiple instances at once in AWS management console
    &lt;ul&gt;
      &lt;li&gt;click &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Launch Instances&lt;/code&gt; button&lt;/li&gt;
      &lt;li&gt;change the number of instances to 2 in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Step 3: Configure Instance Details&lt;/code&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Allow all traffic (by changing the inbound/outbound rules) in the security group shared by all instances&lt;/li&gt;
  &lt;li&gt;ssh to instances (in different terminals)
    &lt;ul&gt;
      &lt;li&gt;run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ssh -i {PRIVATE KEY} {USERNAME}@{PUBLIC IPv4 ADDRESS}&lt;/code&gt;, where the username depends on the AMI (e.g. ubuntu for Ubuntu images)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Config network interface for every node
    &lt;ul&gt;
      &lt;li&gt;run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ifconfig&lt;/code&gt; to find the network interface name (e.g. ens3)&lt;/li&gt;
      &lt;li&gt;run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;export NCCL_SOCKET_IFNAME=ens3&lt;/code&gt; to set the environment variable&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Add multi-node support in python script
    &lt;ul&gt;
      &lt;li&gt;in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;init_process_group&lt;/code&gt;, we should provide:
        &lt;ul&gt;
          &lt;li&gt;url: one can use the private IPv4 address of node 0 and a free port, e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tcp://172.31.22.234:23456&lt;/code&gt;
            &lt;ul&gt;
              &lt;li&gt;this is one of several initialization methods (another option is a shared file system, which needs more configuration)&lt;/li&gt;
            &lt;/ul&gt;
          &lt;/li&gt;
          &lt;li&gt;node index (e.g. 0 and 1 with two nodes)&lt;/li&gt;
          &lt;li&gt;number of processes per node (here, GPUs per node): used to calculate the global rank of a GPU&lt;/li&gt;
          &lt;li&gt;world size: total number of GPUs to be used in all nodes&lt;/li&gt;
          &lt;li&gt;global rank: calculated from the above parameters and local rank of a GPU&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;note: the local rank of a GPU will be set automatically by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.distributed.launch&lt;/code&gt; (see how to use it in the next step)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.distributed.launch&lt;/code&gt; to run the training script in a distributed setting
    &lt;ul&gt;
      &lt;li&gt;run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 {PYTHON_FILE_NAME} {ARGS OF PYTHON SCRIPT}&lt;/code&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Go fast girl!&lt;/li&gt;
&lt;/ol&gt;
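&lt;p&gt;The steps above end in one launch command per node. The snippet below prints that command for both nodes as a dry run (remove the echo to actually launch); train.py, its tcp argument, and the node-0 address 172.31.22.234 are placeholders following the example in step 5, and --node_rank is the launcher flag that distinguishes the two nodes.&lt;/p&gt;

```shell
# Print the per-node launch command (dry run; drop the echo to launch).
# train.py and the tcp:// URL are placeholders for your script and the
# private IPv4 address of node 0 plus a free port.
for NODE_RANK in 0 1; do
  CMD="python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=$NODE_RANK train.py tcp://172.31.22.234:23456"
  echo "$CMD"
done
```

&lt;p&gt;Run the node_rank=0 command on node 0 and the node_rank=1 command on node 1, each in its own terminal from step 3.&lt;/p&gt;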

&lt;h1 id=&quot;reference&quot;&gt;Reference&lt;/h1&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/inkawhich/pt-distributed-tutorial/blob/master/pytorch-aws-distributed-tutorial.py&quot;&gt;pt-distributed-tutorial by Nathan Inkawhich&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">note: the following tutorial will use an example with two p3.16 instance (16 V100 GPUs on 2 nodes), but it is easy to generalize to more instances (if AWS’s use limit and capacity availability agrees)</summary></entry><entry><title type="html">Download ImageNet-1k Dataset on AWS</title><link href="https://b-mu.github.io//jekyll/update/2022/02/12/aws-imagenet.html" rel="alternate" type="text/html" title="Download ImageNet-1k Dataset on AWS" /><published>2022-02-12T04:21:00+00:00</published><updated>2022-02-12T04:21:00+00:00</updated><id>https://b-mu.github.io//jekyll/update/2022/02/12/aws-imagenet</id><content type="html" xml:base="https://b-mu.github.io//jekyll/update/2022/02/12/aws-imagenet.html">&lt;ol&gt;
  &lt;li&gt;Launch a t2.large instance&lt;/li&gt;
  &lt;li&gt;Create a 1000 GB gp2 volume on AWS (it has to be in the same availability zone as the instance)&lt;/li&gt;
  &lt;li&gt;Attach the volume created in step 2 to the instance&lt;/li&gt;
  &lt;li&gt;Format the volume and mount in the system
    &lt;ul&gt;
      &lt;li&gt;run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lsblk&lt;/code&gt; to check the device name of the volume created in step 2
        &lt;ul&gt;
          &lt;li&gt;e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/dev/xvdf&lt;/code&gt;&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;format the volume: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo mkfs -t ext4 $DEVICE_NAME&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;mount the volume:
        &lt;ul&gt;
          &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo mkdir imagenet&lt;/code&gt;&lt;/li&gt;
          &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo mount $DEVICE_NAME imagenet&lt;/code&gt; where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$DEVICE_NAME&lt;/code&gt; should be replaced by the device name found above&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Sign up on &lt;a href=&quot;https://image-net.org/index.php&quot;&gt;image-net.org&lt;/a&gt; (if one does not have an account)&lt;/li&gt;
  &lt;li&gt;Download &lt;a href=&quot;https://image-net.org/challenges/LSVRC/2012/2012-downloads.php&quot;&gt;ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012)
&lt;/a&gt;
    &lt;ul&gt;
      &lt;li&gt;Training data
        &lt;ul&gt;
          &lt;li&gt;right click Training images (Task 1 &amp;amp; 2) on ImageNet’s website and copy link address&lt;/li&gt;
          &lt;li&gt;run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo nohup wget $LINK&lt;/code&gt; where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$LINK&lt;/code&gt; should be replaced by the link address copied from the website (takes hours!)&lt;/li&gt;
          &lt;li&gt;extract: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tar -xf ILSVRC2012_img_train.tar&lt;/code&gt;&lt;/li&gt;
          &lt;li&gt;create directories &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;find . -name &quot;*.tar&quot; | while read NAME ; do mkdir -p &quot;${NAME%.tar}&quot;; tar -xvf &quot;${NAME}&quot; -C &quot;${NAME%.tar}&quot;; rm -f &quot;${NAME}&quot;; done&lt;/code&gt;&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;Validation data
        &lt;ul&gt;
          &lt;li&gt;right click Validation images (all tasks) on ImageNet’s website and copy link address&lt;/li&gt;
          &lt;li&gt;run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo nohup wget $LINK&lt;/code&gt; where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$LINK&lt;/code&gt; should be replaced by the link address copied from the website&lt;/li&gt;
          &lt;li&gt;download &lt;a href=&quot;https://github.com/juliensimon/aws/blob/master/mxnet/imagenet/build_validation_tree.sh&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build_validation_tree.sh&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
          &lt;li&gt;create directories by running &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sh build_validation_tree.sh&lt;/code&gt;&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;note:
        &lt;ul&gt;
          &lt;li&gt;the training and validation datasets should each contain 1000 class directories, named n01440764 through n15075141&lt;/li&gt;
          &lt;li&gt;make sure only the class directories are inside the “train” and “validation” directories; otherwise, the targets can be wrong during training&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Create a snapshot from the volume that contains the dataset in AWS EC2 management console&lt;/li&gt;
  &lt;li&gt;Terminate the instance used to download the dataset. Launch a new instance for training. Attach the volume to the new instance and mount it as in step 4.&lt;/li&gt;
  &lt;li&gt;Load the dataset to the training program with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torchvision.datasets.ImageFolder&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Start training!&lt;/li&gt;
&lt;/ol&gt;
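&lt;p&gt;The per-class extraction loop in step 6 can be rehearsed safely on synthetic data first. The snippet below fabricates two tiny class tarballs (stand-ins for the real n01440764.tar and friends; no ImageNet download involved), runs the exact loop from the post, and leaves one directory per class with the tarballs removed.&lt;/p&gt;

```shell
# Rehearse the step-6 extraction loop on fabricated class tarballs.
workdir=$(mktemp -d)
cd "$workdir"
for class in n01440764 n01443537; do      # two stand-in WordNet IDs
  mkdir "${class}_src"
  touch "${class}_src/img1.JPEG"
  tar -cf "${class}.tar" -C "${class}_src" .
  rm -r "${class}_src"
done
# the loop from the post: one directory per class tar, tarball deleted after
find . -name "*.tar" | while read NAME ; do
  mkdir -p "${NAME%.tar}"
  tar -xf "${NAME}" -C "${NAME%.tar}"
  rm -f "${NAME}"
done
```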

&lt;h1 id=&quot;reference&quot;&gt;Reference&lt;/h1&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://housekdk.gitbook.io/ml/ml/cv/imagenet-horovod&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[Hands-on]&lt;/code&gt; Fast Training ImageNet on on-demand EC2 GPU instances with Horovod&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.slavv.com/learning-machine-learning-on-the-cheap-persistent-aws-spot-instances-668e7294b6d8&quot;&gt;Learning Machine Learning on the cheap: Persistent AWS Spot Instances&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">Launch a t2.large instance Create a 1000G GP2 volume on AWS (has to be in the same zone with the instance) Attach the volume created in step 1 to the instance Format the volume and mount in the system run lsblk to check the device name of the volume created in step 1 e.g. /dev/xvdf format the volume: sudo mkfs -t ext4 $DEVICE_NAME mount the volume: sudo mkdir imagenet sudo mount $DEVICE_NAME imagenet where DEVICE_NAME should be replaced by that found above Sign up on image-net.org (if one does not have an account) Download ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) Training data right click Training images (Task 1 &amp;amp; 2) on ImageNet’s website and copy link address run sudo nohup wget $LINK where $Link should be replaced by the copied link address from website (takes hours!) extract: tar -xf ILSVRC2012_img_train.tar create directories find . -name &quot;*.tar&quot; | while read NAME ; do mkdir -p &quot;${NAME%.tar}&quot;; tar -xvf &quot;${NAME}&quot; -C &quot;${NAME%.tar}&quot;; rm -f &quot;${NAME}&quot;; done Validation data right click Validation images (all tasks) on ImageNet’s website and copy link address run sudo nohup wget $LINK where $Link should be replaced by the copied link address from website download build_validation_tree.sh create directories by running sh build_validation_tree.sh note: both training and validation dataset should have 1000 directories in total, from n01440764 to n15075141 make sure only directories with training images are in the “train” and “validation” directories, otherwise, the target can be wrong at training Create a snapshot from the volume that contains the dataset in AWS EC2 management console Terminate the instance used to download the dataset. Launch a new instance for training. Attach the volume to the new instance for training and mount the volume as in step 3. 
Load the dataset to the training program with torchvision.datasets.ImageFolder Start training!</summary></entry><entry><title type="html">Truncated Backpropagation for Bilevel Optimization</title><link href="https://b-mu.github.io//jekyll/update/2021/12/24/tbp-for-bo.html" rel="alternate" type="text/html" title="Truncated Backpropagation for Bilevel Optimization" /><published>2021-12-24T04:21:00+00:00</published><updated>2021-12-24T04:21:00+00:00</updated><id>https://b-mu.github.io//jekyll/update/2021/12/24/tbp-for-bo</id><content type="html" xml:base="https://b-mu.github.io//jekyll/update/2021/12/24/tbp-for-bo.html">&lt;p&gt;This post is a paper-reading note. Paper: &lt;a href=&quot;https://arxiv.org/abs/1810.10667&quot;&gt;Truncated Backpropagation for Bilevel Optimization, Amirreza Shaban &amp;amp; Ching-An Cheng et al. AISTATS 2018&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;bilevel-optimization-bo&quot;&gt;Bilevel Optimization (BO)&lt;/h1&gt;
&lt;ul&gt;
  &lt;li&gt;one application of BO in Machine Learning is &lt;strong&gt;Hyperparameter Optimization&lt;/strong&gt; (HO)
\[ \text{min}_{\lambda} f(\hat{w}, \lambda) \text{ s.t. } \hat{w} \approx \text{argmin}_w g(w, \lambda) \]
    &lt;ul&gt;
      &lt;li&gt;denote parameter and hyperparameter by $w$ and $\lambda$, the outer and inner objective function by $f$ (validation loss function) and $g$ (training loss function)&lt;/li&gt;
      &lt;li&gt;for this setup, the outer objective $f$ (the validation loss) does not directly depend on the hyperparameter $\lambda$ (only through $\hat{w}$), so the direct gradient $\frac{\partial f}{\partial \lambda} = 0$&lt;/li&gt;
      &lt;li&gt;note: we follow &lt;a href=&quot;https://arxiv.org/abs/1810.10667&quot;&gt;Shaban et al. 2018&lt;/a&gt;, which includes the algorithm that solves the inner objective as part of the problem’s formulation. Thus, we use $\hat{w}$ to denote the solution given by the solver instead of the exact minimizer $w^{*}$&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;challenges:
    &lt;ul&gt;
      &lt;li&gt;dependency of the optimization of $\lambda$ on the inner problem is complicated, so that evaluating &lt;strong&gt;exact gradients&lt;/strong&gt; is not scalable for high-dimensional problems&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;prior works:
    &lt;ul&gt;
      &lt;li&gt;other approaches for HO: Grid Search (black box, run the training procedure many times), Random Search, Bayesian Optimization, hypernetwork (&lt;a href=&quot;https://arxiv.org/abs/1903.03088&quot;&gt;MacKay &amp;amp; Vicol et al. 2019&lt;/a&gt;, &lt;a href=&quot;https://arxiv.org/abs/2010.13514&quot;&gt;Bae et al. 2020&lt;/a&gt;)&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;Implicit Differentiation&lt;/strong&gt;: rely on estimate of Jacobian $J_{\lambda} \hat{w} = \frac{\partial \hat{w}}{\partial \lambda}$
        &lt;ul&gt;
          &lt;li&gt;note: this method relies on the assumption that $\hat{w} = w^{*}$, i.e. optimality of the approximate solution&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;Dynamical System perspective&lt;/strong&gt;: treat the iterative algorithm to solve the inner problem as a dynamical system and apply backpropagation through the system&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
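&lt;p&gt;As a concrete instance of the setup above (an illustrative example, not one from the paper), take ridge regression: the inner problem fits the weights on the training split with an L2 penalty whose strength is the hyperparameter, and the outer objective is the unregularized validation loss.&lt;/p&gt;

```latex
% Illustrative HO instance (not from the paper): ridge regression.
% The inner objective g is strongly convex in w whenever lambda > 0,
% matching the local strong-convexity assumption used later.
g(w, \lambda) = \lVert X_{\mathrm{tr}} w - y_{\mathrm{tr}} \rVert^2
              + \lambda \lVert w \rVert^2,
\qquad
f(\hat{w}, \lambda) = \lVert X_{\mathrm{val}} \hat{w} - y_{\mathrm{val}} \rVert^2
```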

&lt;h1 id=&quot;truncated-backpropagation-for-bilevel-optimization&quot;&gt;Truncated Backpropagation for Bilevel Optimization&lt;/h1&gt;
&lt;ul&gt;
  &lt;li&gt;idea: &lt;strong&gt;approximate gradients&lt;/strong&gt; using Truncated Backpropagation (TBP) through “time” (as in a previous post, TBP reduces time and space complexities by removing long term dependencies)
    &lt;ul&gt;
      &lt;li&gt;note: here “time” refers to the optimization steps performed to solve the inner problem&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;hypergradient&lt;/strong&gt;
    &lt;ul&gt;
      &lt;li&gt;following the Dynamic System perspective, the iterative optimization algorithm that solves inner problem is viewed as a dynamical system: denote the transition function by $\Xi_t$ and the number of iterations by $T$
  \[ w_{t + 1} =  \Xi_{t + 1} (w_t, \lambda), ~~~ w_0 = \Xi_{0} (\lambda) \text{ at } t = 0, ~~~ \hat{w} = w_T\]&lt;/li&gt;
      &lt;li&gt;unrolling the computation graph:
  \[ \frac{df}{d\lambda} = \frac{\partial f}{\partial \lambda} + \sum_{t = 0}^T B_t A_{t + 1} … A_T \frac{\partial f}{\partial \hat{w}} ~~~ \text{# by the Chain Rule}\]
  \[ \text{where } A_{t + 1} = \frac{\partial \Xi_{t + 1}(w_t, \lambda)}{\partial w_t} = \frac{\partial w_{t + 1}}{\partial w_t}, B_{t + 1} = \frac{\partial \Xi_{t + 1}(w_t, \lambda)}{\partial \lambda} = \frac{\partial w_{t + 1}}{\partial \lambda} \text{ for } t \geq 0, B_0 = \frac{d \Xi_{0} (\lambda)}{d \lambda}\]&lt;/li&gt;
      &lt;li&gt;dimensions:
        &lt;ul&gt;
          &lt;li&gt;denote $w_t \in \mathbb{R}^M$ and $\lambda \in \mathbb{R}^N$, then $A_t \in \mathbb{R}^{M \times M}$, $B_t \in \mathbb{R}^{N \times M}$&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;reverse mode differentiation&lt;/strong&gt; (RMD):
  \[ \frac{d f}{d \lambda} = h_{-1}, ~~~ \alpha_T = \frac{\partial f}{\partial \hat{w}}, ~~~ h_T = \frac{\partial f}{\partial \lambda}\]
  \[ h_{t - 1} = h_t + B_t \alpha_t, ~~~ \alpha_{t - 1} = A_t\alpha_t, ~~~ t = T, \dots, 0 \]
        &lt;ul&gt;
          &lt;li&gt;need to store intermediate values $\{w_t \in \mathbb{R}^M\}_{t = 1}^T$, then the space requirement is $O(MT)$&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;forward mode differentiation&lt;/strong&gt; (FMD):
  \[ \frac{d f}{d \lambda} = Z_T \frac{\partial f}{\partial \hat{w}} + \frac{\partial f}{\partial \lambda}, ~~~ Z_0 = B_0\]
  \[ Z_{t + 1} = Z_t A_{t + 1} + B_{t + 1}, ~~~ t = 0, …, T - 1\]
        &lt;ul&gt;
          &lt;li&gt;need to propagate the matrices $Z_t \in \mathbb{R}^{N \times M}$, so the time complexity is $N$ times that of RMD (matrix-matrix vs. matrix-vector multiplications), but there is no need to store the intermediate values $w_t$ from the forward pass&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;K-step &lt;strong&gt;truncated backpropagation&lt;/strong&gt; (K-RMD):
  \[ h_{T - K} = \frac{\partial f}{\partial \lambda} + \sum_{t = T - K + 1}^T B_t A_{t + 1} … A_T \frac{\partial f}{\partial \hat{w}} \]
        &lt;ul&gt;
          &lt;li&gt;only need to store $\{w_t\}_{t = T - K + 1}^T$, so the space requirement is $O(MK)$&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;main &lt;strong&gt;theoretical results&lt;/strong&gt; (informal):
    &lt;ol&gt;
      &lt;li&gt;(Accuracy) if the inner problem is locally strongly convex around $\hat{w}$, then the bias of $h_{T - K}$ decays exponentially in K.&lt;/li&gt;
      &lt;li&gt;(Sufficient Descent) if the inner problem $g$ is second-order continuously differentiable, then $-h_{T - K}$ is a sufficient descent direction.&lt;/li&gt;
      &lt;li&gt;(Convergence) following the above results, we have that on-average convergence to $\epsilon$-approximate stationary point is guaranteed by $O(\text{log } 1 / \epsilon)$-step truncated backpropagation&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;relation with &lt;strong&gt;Implicit Differentiation&lt;/strong&gt;:
    &lt;ul&gt;
      &lt;li&gt;in the limit where $\hat{w}$ converges to $w^{*}$, $h_{T - K}$ can be viewed as an order-K (i.e. first K terms) Taylor series approximating the matrix inverse in the total derivative, the residual term has an upper bound
  \[ \frac{df}{d\lambda} = \frac{\partial f}{\partial \lambda} - \frac{\partial^2 g}{\partial \lambda \partial w} \bigg( \frac{\partial^2 g}{\partial w^2} \bigg)^{-1} \frac{\partial f}{\partial \hat{w}}\]
        &lt;ul&gt;
          &lt;li&gt;note: the above equation relies on the assumptions that (1) g is second-order continuously differentiable (2) there exists a unique optimal solution $w^{*}$ and all the derivatives are evaluated at $w^{*}$&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;experiment: compare K-step truncated RMD to K-step &lt;strong&gt;Conjugate Gradient&lt;/strong&gt; (CG)
        &lt;ul&gt;
          &lt;li&gt;both require local strong-convexity to ensure a good approximation&lt;/li&gt;
          &lt;li&gt;if $w^{*}$ is available, then CG gives a smaller bias&lt;/li&gt;
          &lt;li&gt;in practice, $w^{*}$ is usually unknown, K-step truncated RMD has a weaker assumption&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">This post is a paper-reading note. Paper: Truncated Backpropagation for Bilevel Optimization, Amirreza Shaban &amp;amp; Ching-An Cheng et al. AISTATS 2018.</summary></entry><entry><title type="html">Variational Inference: ELBO and reparameterization trick</title><link href="https://b-mu.github.io//jekyll/update/2021/12/23/elbo.html" rel="alternate" type="text/html" title="Variational Inference: ELBO and reparameterization trick" /><published>2021-12-23T04:21:00+00:00</published><updated>2021-12-23T04:21:00+00:00</updated><id>https://b-mu.github.io//jekyll/update/2021/12/23/elbo</id><content type="html" xml:base="https://b-mu.github.io//jekyll/update/2021/12/23/elbo.html">&lt;p&gt;This post is a short review of the Evidence Lower Bound (ELBO), which is the standard objective function to be optimized in Variational Inference.&lt;/p&gt;

&lt;h1 id=&quot;variational-inference&quot;&gt;Variational Inference&lt;/h1&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;latent variables&lt;/strong&gt;
    &lt;ul&gt;
      &lt;li&gt;latent/hidden variable: a random variable that is not observed, so it cannot be conditioned on directly during inference&lt;/li&gt;
      &lt;li&gt;Let the latent r.v. $\mathbf{Z}$ have distribution $p_{\theta^*}$ and the variable $\mathbf{X}$ have conditional distribution $p_{\theta^*} (x \vert z)$&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;objective&lt;/strong&gt;
    &lt;ul&gt;
      &lt;li&gt;get a maximum likelihood estimate for $\theta$, denoted $\theta_{MLE}$, so that we can estimate the conditional distribution of a variable given the latent variable, $p_{\theta_{MLE}}(x \vert z)$, and the marginal likelihood of a variable, $p_{\theta_{MLE}}(x)$&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;maximum likelihood estimate&lt;/strong&gt; $\theta_{MLE}$: by maximizing the marginal likelihood
  \[p_{\theta}(x) = \int p_{\theta} (x \vert z) p_{\theta}(z) dz\]
        &lt;ul&gt;
          &lt;li&gt;suppose $p_{\theta} (x)$ and $p_{\theta} (z \vert x)$ are &lt;em&gt;intractable&lt;/em&gt;&lt;/li&gt;
          &lt;li&gt;&lt;strong&gt;Variational Inference&lt;/strong&gt;: approximate the posterior $p_{\theta}(z \vert x)$ then use it to estimate a lower bound on $\text{log } p_{\theta}(x)$ to update $\theta$
            &lt;ul&gt;
              &lt;li&gt;Let $q_{\phi} (z \vert x)$ be an approximating distribution for $p_{\theta} (z \vert x)$&lt;/li&gt;
              &lt;li&gt;$q_{\phi}$ is fit to $p_{\theta}$ by minimizing the &lt;strong&gt;Kullback-Leibler (KL) divergence&lt;/strong&gt; \[D_{KL}(q_{\phi} (z \vert x) ~\Vert~ p_{\theta} (z \vert x))\]&lt;/li&gt;
            &lt;/ul&gt;
          &lt;/li&gt;
          &lt;li&gt;the idea of using the posterior $p_{\theta} (z \vert x)$ to estimate the marginal likelihood is also used in the EM algorithm&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;evidence lower bound (ELBO)&lt;/strong&gt;
    &lt;ul&gt;
      &lt;li&gt;derivation:
  \(\begin{align}
  &amp;amp;p_{\theta}(z \vert x) = \frac{p_{\theta} (x, z)}{p_{\theta} (x)} ~~~ \# \text{ by definition of conditional probability}\\
  \Rightarrow &amp;amp;\text{log } p_{\theta} (x) = - \text{log } p_{\theta} (z \vert x) + \text{log } p_{\theta} (x, z) ~~~ \# \text{ take log}\\
  \Rightarrow &amp;amp;\text{log } p_{\theta} (x) = - \text{log } p_{\theta} (z \vert x) + \text{log } p_{\theta} (x, z) + \text{log } q_{\phi} (z \vert x) - \text{log } q_{\phi} (z \vert x) ~~~ \# \text{ add and subtract}\\
  \Rightarrow &amp;amp;\text{log } p_{\theta} (x) = \text{log } \frac{q_{\phi} (z \vert x)}{p_{\theta} (z \vert x)} + \text{log } \frac{p_{\theta} (x, z)}{q_{\phi} (z \vert x)} ~~~ \# \text{ rearrange}\\
  \Rightarrow &amp;amp;\text{log } p_{\theta} (x) = \underbrace{E_{z \sim q_{\phi}}\bigg[ \text{log } \frac{q_{\phi}(z \vert x)}{p_{\theta}(z \vert x)}\bigg]}_{D_{KL}(q_{\phi} \Vert p_{\theta})} + E_{z \sim q_{\phi}} [\text{log } p_{\theta}(x, z) - \text{log } q_{\phi} (z \vert x)] ~~~\# \text{ take expectation w.r.t. } z
  \end{align}\)
        &lt;ul&gt;
          &lt;li&gt;Holding $\theta$ fixed, the LHS is fixed. If we increase $E_{z \sim q_{\phi}} [\text{log } p_{\theta} (x, z) - \text{log } q_{\phi}(z \vert x)]$ w.r.t. $\phi$, then the KL divergence decreases and the approximating distribution improves.&lt;/li&gt;
          &lt;li&gt;Since the KL divergence is non-negative, we have \[\text{log } p_{\theta} (x) \geq E_{z \sim q_{\phi}} \bigg[ \text{log } p_{\theta} (x, z) - \text{log } q_{\phi} (z \vert x) \bigg]\]&lt;/li&gt;
          &lt;li&gt;Denote the lower bound (ELBO) on the RHS by $\mathcal{L}(\theta, \phi; x)$. It is the objective function maximized in Variational Inference.&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;an analytical form may not be available for ELBO, use a Monte Carlo estimate of the expectation instead:
        &lt;ul&gt;
          &lt;li&gt;e.g. sample $z_i ~ (i = 1, \dots, N)$ from $q_{\phi} (z \vert x)$, then
  \[\hat{\mathcal{L}}(\theta, \phi; x) = \frac{1}{N} \sum_{i = 1}^N \bigg[\text{log } p_{\theta} (x, z_i) - \text{log } q_{\phi} (z_i \vert x) \bigg] \]
            &lt;ul&gt;
              &lt;li&gt;note: gradient-based methods are not directly applicable to updating $\phi$, since sampling $z_i$ from $q_{\phi}$ is not a differentiable operation. This motivates the next section.&lt;/li&gt;
            &lt;/ul&gt;
          &lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;reparameterization trick&lt;/strong&gt;
    &lt;ul&gt;
      &lt;li&gt;TODO(bmu)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
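&lt;p&gt;The Monte Carlo ELBO estimate above can be sanity-checked on a toy model. The sketch below is an illustration added here, not part of the original derivation: it uses a conjugate Gaussian model with prior $z \sim N(0, 1)$ and likelihood $x \vert z \sim N(z, 1)$, where $\text{log } p_{\theta}(x)$ is tractable, so the bound can be verified directly; all variable names are assumptions of the sketch.&lt;/p&gt;

```python
# Hedged sketch: Monte Carlo estimate of the ELBO for a toy conjugate
# Gaussian model where log p(x) is tractable, so the bound can be checked.
# The names (mu_q, sigma_q, n_samples) are illustrative, not from the post.
import math
import random

def log_normal(v, mean, var):
    # log density of N(mean, var) evaluated at v
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)

def elbo_hat(x, mu_q, sigma_q, n_samples, rng):
    # hat(L) = (1/N) sum_i [ log p(x, z_i) - log q(z_i | x) ],  z_i sampled from q
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(mu_q, sigma_q)
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)  # log p(z) + log p(x|z)
        log_q = log_normal(z, mu_q, sigma_q ** 2)
        total += log_joint - log_q
    return total / n_samples

rng = random.Random(0)
x = 1.5
# for this model the exact posterior is N(x/2, 1/2) and the marginal is N(0, 2)
log_px = log_normal(x, 0.0, 2.0)
tight = elbo_hat(x, x / 2, math.sqrt(0.5), 20000, rng)   # q equals the true posterior
loose = elbo_hat(x, 0.0, 1.0, 20000, rng)                # a worse q (the prior)
# with q equal to the posterior the bound is tight; otherwise it sits below log p(x)
```

When $q_{\phi}$ equals the true posterior, every sample of $\text{log } p_{\theta}(x, z_i) - \text{log } q_{\phi}(z_i \vert x)$ equals $\text{log } p_{\theta}(x)$ exactly, so the estimate has zero variance; a mismatched $q_{\phi}$ leaves a gap equal to the KL divergence.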

&lt;h1 id=&quot;reference&quot;&gt;Reference:&lt;/h1&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/1312.6114&quot;&gt;Auto-Encoding Variational Bayes, Diederik P. Kingma et al. ICLR 2014&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">This post is a short review of the Evidence Lower Bound (ELBO), which is the standard objective function optimized in Variational Inference.</summary></entry><entry><title type="html">Truncated Backpropagation through Time</title><link href="https://b-mu.github.io//jekyll/update/2021/12/23/truncated-bp-through-time.html" rel="alternate" type="text/html" title="Truncated Backpropagation through Time" /><published>2021-12-23T04:21:00+00:00</published><updated>2021-12-23T04:21:00+00:00</updated><id>https://b-mu.github.io//jekyll/update/2021/12/23/truncated-bp-through-time</id><content type="html" xml:base="https://b-mu.github.io//jekyll/update/2021/12/23/truncated-bp-through-time.html">&lt;p&gt;This post is a short review of the Backpropagation Through Time and Truncated Backpropagation Through Time algorithms with a naive RNN model.&lt;/p&gt;

&lt;h1 id=&quot;recurrent-neural-network&quot;&gt;Recurrent Neural Network&lt;/h1&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;motivation&lt;/strong&gt;
    &lt;ul&gt;
      &lt;li&gt;handle input sequences of varying length&lt;/li&gt;
      &lt;li&gt;want to share features learned across different positions of sequence data&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;forward propagation&lt;/strong&gt;:
  \[ \mathbf{a}^{(t)} = g_a(\mathbf{W_{aa}} \mathbf{a}^{(t - 1)} + \mathbf{W_{ax}} \mathbf{x}^{(t)} + \mathbf{b_a}), ~~~ \mathbf{y}^{(t)} = g_y(\mathbf{W_{ya}} \mathbf{a}^{(t)} + \mathbf{b_y}) \]
    &lt;ul&gt;
      &lt;li&gt;note:
        &lt;ul&gt;
          &lt;li&gt;this is a naive RNN model with the simplest architecture&lt;/li&gt;
          &lt;li&gt;the parameters are &lt;em&gt;shared&lt;/em&gt; across the time steps&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;backpropagation through time&lt;/strong&gt;
    &lt;ul&gt;
      &lt;li&gt;loss
  \[ \mathcal{L}^{(t)} (\mathbf{\hat{y}}^{(t)}, \mathbf{y}^{(t)}) = -\mathbf{y}^{(t)} \text{ log } \mathbf{\hat{y}}^{(t)} - (1 - \mathbf{y}^{(t)}) \text{ log } (1 - \mathbf{\hat{y}}^{(t)}), ~~~ \mathcal{L} = \sum_{t = 1}^T \mathcal{L}^{(t)} (\mathbf{\hat{y}}^{(t)}, \mathbf{y}^{(t)})\]&lt;/li&gt;
      &lt;li&gt;gradient descent on parameters &lt;img src=&quot;/assets/bptt.jpeg&quot; alt=&quot;BPTT&quot; /&gt;&lt;/li&gt;
      &lt;li&gt;heavy computational and memory cost:
        &lt;ul&gt;
          &lt;li&gt;need to store hidden states $\{\mathbf{a}^{(t)}\}_{t = 1}^T$&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
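&lt;p&gt;The forward propagation above can be illustrated with a minimal sketch in pure Python. Scalar states (instead of vectors and matrices) and the tanh/sigmoid choices for $g_a$ and $g_y$ are assumptions made for readability, not prescribed by the equations.&lt;/p&gt;

```python
# Hedged sketch of the naive RNN forward pass, with scalar states for
# readability; the weights and activation choices are illustrative.
import math

def rnn_forward(xs, w_aa, w_ax, b_a, w_ya, b_y, a0):
    # a(t) = tanh(w_aa * a(t-1) + w_ax * x(t) + b_a)
    # y(t) = sigmoid(w_ya * a(t) + b_y)
    a = a0
    states, outputs = [], []
    for x in xs:
        a = math.tanh(w_aa * a + w_ax * x + b_a)   # the same weights at every step
        y = 1.0 / (1.0 + math.exp(-(w_ya * a + b_y)))
        states.append(a)
        outputs.append(y)
    return states, outputs

states, outputs = rnn_forward([1.0, -0.5, 0.2], 0.5, 1.0, 0.0, 2.0, 0.0, 0.0)
# BPTT would keep every entry of `states` around for the backward pass,
# which is exactly the memory cost noted above.
```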

&lt;h1 id=&quot;truncated-backpropagation-through-time&quot;&gt;Truncated Backpropagation through Time&lt;/h1&gt;
&lt;ul&gt;
  &lt;li&gt;evenly split a long sequence into short subsequences: after every $k_1$ forward steps, perform one backward pass over the latest $k_2 ~ (\geq k_1)$ steps; repeat until the end of the sequence is reached &lt;img src=&quot;/assets/tbptt.jpeg&quot; alt=&quot;TBPTT&quot; /&gt;&lt;/li&gt;
  &lt;li&gt;a practical method to reduce computational and memory cost, but it loses long-term dependencies and yields a biased gradient estimate&lt;/li&gt;
&lt;/ul&gt;
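&lt;p&gt;The $(k_1, k_2)$ schedule described above can be sketched without any actual gradient computation. The sketch below only enumerates which steps each backward pass covers; the sequence length and the $k_1, k_2$ values are illustrative, and any remainder at the end of the sequence is ignored for simplicity.&lt;/p&gt;

```python
# Hedged sketch of the TBPTT(k1, k2) schedule only, no gradients:
# every k1 forward steps, a backward pass runs over the latest k2 steps.
def tbptt_schedule(seq_len, k1, k2):
    passes = []
    for t in range(1, seq_len + 1):
        if t % k1 == 0:
            start = max(1, t - k2 + 1)   # truncate: only the latest k2 steps
            passes.append((start, t))    # backward pass over steps start..t
    return passes

# e.g. a sequence of 10 steps with k1 = 3, k2 = 5
passes = tbptt_schedule(10, 3, 5)
# each tuple is the (first, last) step a backward pass touches; steps earlier
# than `first` never receive gradient, which is where long-term dependencies
# are lost and the bias in the gradient estimate comes from
```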

&lt;h1 id=&quot;anticipated-reweighted-truncated-backpropagation-artbp&quot;&gt;Anticipated Reweighted Truncated Backpropagation (ARTBP)&lt;/h1&gt;
&lt;p&gt;&lt;img src=&quot;/assets/artbp.png&quot; alt=&quot;TBP&quot; /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;TODO(bmu)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;reference&quot;&gt;Reference&lt;/h1&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.researchgate.net/publication/2343555_An_Efficient_Gradient-Based_Algorithm_for_On-Line_Training_of_Recurrent_Network_Trajectories&quot;&gt;An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories, Ronald J. Williams et al. 1990&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf&quot;&gt;Training Recurrent Neural Networks, Ilya Sutskever 2013&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/1705.08209&quot;&gt;Unbiasing Truncated Backpropagation Through Time, Corentin Tallec et al. arxiv 2017&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">This post is a short review of Backpropagation Through Time and Truncated Backpropagation Through Time algorithms with a naive RNN model.</summary></entry><entry><title type="html">How to enable MathJax in Jekyll minima theme</title><link href="https://b-mu.github.io//jekyll/update/2021/12/22/enable-mathjax-in-jekyll.html" rel="alternate" type="text/html" title="How to enable MathJax in Jekyll minima theme" /><published>2021-12-22T04:21:00+00:00</published><updated>2021-12-22T04:21:00+00:00</updated><id>https://b-mu.github.io//jekyll/update/2021/12/22/enable-mathjax-in-jekyll</id><content type="html" xml:base="https://b-mu.github.io//jekyll/update/2021/12/22/enable-mathjax-in-jekyll.html">&lt;p&gt;1: Find the minima bundle&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;bundle show minima
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;2: Inside the bundle (given by step 1), add the code block below to the end of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_layouts/default.html&lt;/code&gt; (i.e. after the outermost &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;html&amp;gt; ... &amp;lt;/html&amp;gt;&lt;/code&gt;)&lt;/p&gt;

&lt;div class=&quot;language-html highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;script &lt;/span&gt;&lt;span class=&quot;na&quot;&gt;type=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;text/x-mathjax-config&quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;MathJax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;Hub&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;Config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;extensions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;tex2jax.js&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;jax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;input/TeX&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;output/HTML-CSS&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;tex2jax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;inlineMath&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\\&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\\&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;displayMath&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;$$&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;$$&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\\&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\\&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;processEscapes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
    &lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;HTML-CSS&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;fonts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;TeX&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;script &lt;/span&gt;&lt;span class=&quot;na&quot;&gt;src=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;type=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;text/javascript&quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
  &lt;li&gt;note:
    &lt;ul&gt;
      &lt;li&gt;this step enables MathJax (with both inline and display mode)
        &lt;ul&gt;
          &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$$...$$&lt;/code&gt; is only rendered as display math if the lines above and below it are blank&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;the above code loads MathJax from the Content Delivery Network &lt;a href=&quot;https://cdnjs.com/&quot;&gt;cdnjs&lt;/a&gt;, which requires network access. Alternatively, download and install a local copy of MathJax on the server or hard disk.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3: Create a local copy of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_layouts&lt;/code&gt; and save the change&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cp -r {path to _layouts in the bundle} {path to repo}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;reference&quot;&gt;Reference:&lt;/h1&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.mathjax.org/en/v2.7-latest/configuration.html&quot;&gt;MathJax v2.7 docs: Loading and Configuring MathJax&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://zjuwhw.github.io/2017/06/04/MathJax.html&quot;&gt;Blog: Use MathJax to write Equations in Jekyll blogs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">1: Find the minima bundle bundle show minima</summary></entry><entry><title type="html">mac os 10.15 + bootcamp(win 10) + nvidia eGPU</title><link href="https://b-mu.github.io//jekyll/update/2020/09/15/mac-nvidia-egpu.html" rel="alternate" type="text/html" title="mac os 10.15 + bootcamp(win 10) + nvidia eGPU" /><published>2020-09-15T04:21:00+00:00</published><updated>2020-09-15T04:21:00+00:00</updated><id>https://b-mu.github.io//jekyll/update/2020/09/15/mac-nvidia-egpu</id><content type="html" xml:base="https://b-mu.github.io//jekyll/update/2020/09/15/mac-nvidia-egpu.html">&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;back up mac&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;bootcamp win 10
    &lt;ul&gt;
      &lt;li&gt;download the &lt;a href=&quot;https://www.microsoft.com/en-ca/software-download/&quot;&gt;win 10 iso&lt;/a&gt; (~5.7 GB)&lt;/li&gt;
      &lt;li&gt;launch bootcamp, select a partition size, install windows (&lt;a href=&quot;https://support.apple.com/en-ca/HT201468&quot;&gt;Apple’s instruction&lt;/a&gt;)&lt;/li&gt;
      &lt;li&gt;system will reboot in windows&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;nvidia drivers
    &lt;ul&gt;
      &lt;li&gt;plug in the gpu and turn on its power supply (the msi RTX2080 super needs both 8-pin power connectors)&lt;/li&gt;
      &lt;li&gt;install &lt;a href=&quot;https://www.nvidia.com/en-us/geforce/geforce-experience/&quot;&gt;geforce experience&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;check drivers update in geforce experience&lt;/li&gt;
      &lt;li&gt;if you need the nvidia control panel but it is missing, try the &lt;a href=&quot;https://www.youtube.com/watch?v=Ytnv8XJ_hV4&quot;&gt;standard driver instead of the DCH driver&lt;/a&gt; via the &lt;a href=&quot;https://www.nvidia.com/Download/Find.aspx&quot;&gt;advanced driver search&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;anaconda
    &lt;ul&gt;
      &lt;li&gt;install &lt;a href=&quot;https://www.anaconda.com/products/individual&quot;&gt;anaconda&lt;/a&gt;
        &lt;ul&gt;
          &lt;li&gt;tick “add anaconda to the Path variable” during installation&lt;/li&gt;
          &lt;li&gt;or do it manually: start -&amp;gt; type “env var” -&amp;gt; click “edit the system environment variables” -&amp;gt; click “environment variables” -&amp;gt; select “Path” under User variables and click edit -&amp;gt; add “C:\Users\[user_name]\anaconda3; C:\Users\[user_name]\anaconda3\Scripts;”&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;create env &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;conda create --name env_gpu&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;install tensorflow-gpu 1: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;conda install tensorflow-gpu=1.15&lt;/code&gt; (this automatically installs cuda 10.0 and cudnn 7.6, &lt;a href=&quot;https://www.tensorflow.org/install/source&quot;&gt;compatible&lt;/a&gt; with tensorflow 1.15)&lt;/li&gt;
      &lt;li&gt;check if gpu is visible: &lt;a href=&quot;https://stackoverflow.com/questions/38009682/how-to-tell-if-tensorflow-is-using-gpu-acceleration-from-inside-python-shell/38019608&quot;&gt;several methods&lt;/a&gt;, alternatively, run the command: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nvidia-smi&lt;/code&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;git-bash
    &lt;ul&gt;
      &lt;li&gt;install &lt;a href=&quot;https://git-scm.com/downloads&quot;&gt;git&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;if anaconda is successfully added to Path variable, then the command “conda activate env_gpu” should be recognized and the environment should be activated&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this point, python scripts should run in this environment. Since git-bash provides a bash shell, .sh scripts should also work.&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">back up mac</summary></entry></feed>