<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/stylesheet.xsl" type="text/xsl"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:podcast="https://podcastindex.org/namespace/1.0">
  <channel>
    <atom:link rel="self" type="application/rss+xml" href="https://feeds.transistor.fm/talkrl" title="MP3 Audio"/>
    <atom:link rel="hub" href="https://pubsubhubbub.appspot.com/"/>
    <podcast:podping usesPodping="true"/>
    <title>TalkRL: The Reinforcement Learning Podcast</title>
    <generator>Transistor (https://transistor.fm)</generator>
    <itunes:new-feed-url>https://feeds.transistor.fm/talkrl</itunes:new-feed-url>
    <description>TalkRL podcast is All Reinforcement Learning, All the Time.  
In-depth interviews with brilliant people at the forefront of RL research and practice. 
Guests from places like MILA, OpenAI, MIT, DeepMind, Berkeley, Amii, Oxford, Google Research, Brown, Waymo, Caltech, and Vector Institute. 
Hosted by Robin Ranjit Singh Chauhan.</description>
    <copyright>© 2026 Robin Ranjit Singh Chauhan</copyright>
    <podcast:guid>9df41ab7-ec6e-513e-ad8e-dba745580575</podcast:guid>
    <podcast:locked>yes</podcast:locked>
    <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    <podcast:trailer pubdate="Thu, 01 Aug 2019 12:00:00 -0700" url="https://media.transistor.fm/eb1eb0e8/e7088950.mp3" length="1634446" type="audio/mpeg">About TalkRL Podcast: All Reinforcement Learning, All the Time</podcast:trailer>
    <language>en</language>
    <pubDate>Sun, 19 Oct 2025 17:38:13 -0700</pubDate>
    <lastBuildDate>Fri, 06 Mar 2026 09:09:43 -0800</lastBuildDate>
    <link>https://www.talkrl.com</link>
    <image>
      <url>https://img.transistorcdn.com/JTeBOLVE8cxOHij8dslp7TQrxYBUFEBnjYRYRPw5_Ik/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9zaG93/LzIwNDcvMTcwNzk1/NDcxMS1hcnR3b3Jr/LmpwZw.jpg</url>
      <title>TalkRL: The Reinforcement Learning Podcast</title>
      <link>https://www.talkrl.com</link>
    </image>
    <itunes:category text="Technology"/>
    <itunes:type>episodic</itunes:type>
    <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
    <itunes:image href="https://img.transistorcdn.com/JTeBOLVE8cxOHij8dslp7TQrxYBUFEBnjYRYRPw5_Ik/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9zaG93/LzIwNDcvMTcwNzk1/NDcxMS1hcnR3b3Jr/LmpwZw.jpg"/>
    <itunes:summary>TalkRL podcast is All Reinforcement Learning, All the Time.  
In-depth interviews with brilliant people at the forefront of RL research and practice. 
Guests from places like MILA, OpenAI, MIT, DeepMind, Berkeley, Amii, Oxford, Google Research, Brown, Waymo, Caltech, and Vector Institute. 
Hosted by Robin Ranjit Singh Chauhan.</itunes:summary>
    <itunes:subtitle>TalkRL podcast is All Reinforcement Learning, All the Time.</itunes:subtitle>
    <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
    <itunes:owner>
      <itunes:name>Robin Ranjit Singh Chauhan</itunes:name>
    </itunes:owner>
    <itunes:complete>No</itunes:complete>
    <itunes:explicit>No</itunes:explicit>
    <item>
      <title>Joseph Modayil of Openmind Research Institute @ RLC 2025</title>
      <itunes:episode>74</itunes:episode>
      <podcast:episode>74</podcast:episode>
      <itunes:title>Joseph Modayil of Openmind Research Institute @ RLC 2025</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">7b3c2192-6171-4089-9f5d-ad4471d6de2e</guid>
      <link>https://share.transistor.fm/s/fdb5233c</link>
      <description>
        <![CDATA[<p>Joseph Modayil is the Founder, President &amp; Research Director of Openmind Research Institute.</p><p><strong>Featured References</strong></p><p><a href="https://www.openmindresearch.org/">Openmind Research Institute</a></p><p><a href="https://arxiv.org/abs/2208.11173">The Alberta Plan for AI Research</a><br>Richard S. Sutton, Michael Bowling, Patrick M. Pilarski</p><p><br><strong>Additional References</strong></p><ul><li><a href="https://scholar.google.co.in/citations?user=G3pvUNEAAAAJ&amp;hl=ja">Joseph Modayil on Google Scholar</a></li><li><a href="https://josephmodayil.com/">Joseph Modayil Homepage</a></li></ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Joseph Modayil is the Founder, President &amp; Research Director of Openmind Research Institute.</p><p><strong>Featured References</strong></p><p><a href="https://www.openmindresearch.org/">Openmind Research Institute</a></p><p><a href="https://arxiv.org/abs/2208.11173">The Alberta Plan for AI Research</a><br>Richard S. Sutton, Michael Bowling, Patrick M. Pilarski</p><p><br><strong>Additional References</strong></p><ul><li><a href="https://scholar.google.co.in/citations?user=G3pvUNEAAAAJ&amp;hl=ja">Joseph Modayil on Google Scholar</a></li><li><a href="https://josephmodayil.com/">Joseph Modayil Homepage</a></li></ul>]]>
      </content:encoded>
      <pubDate>Fri, 02 Jan 2026 21:00:00 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/fdb5233c/043d21d5.mp3" length="4323412" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/uTa44dUPdpLbEbwSbuOE7LDBTLchZkUaIE7bTAdXERY/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS80YWE0/YjQwYzc1MzA1ZGNj/YjIxNjhhZTBmMzJi/OGZlNC5qcGc.jpg"/>
      <itunes:duration>267</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Joseph Modayil is the Founder, President &amp; Research Director of Openmind Research Institute.</p><p><strong>Featured References</strong></p><p><a href="https://www.openmindresearch.org/">Openmind Research Institute</a></p><p><a href="https://arxiv.org/abs/2208.11173">The Alberta Plan for AI Research</a><br>Richard S. Sutton, Michael Bowling, Patrick M. Pilarski</p><p><br><strong>Additional References</strong></p><ul><li><a href="https://scholar.google.co.in/citations?user=G3pvUNEAAAAJ&amp;hl=ja">Joseph Modayil on Google Scholar</a></li><li><a href="https://josephmodayil.com/">Joseph Modayil Homepage</a></li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/fdb5233c/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/fdb5233c/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/fdb5233c/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/fdb5233c/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/fdb5233c/transcription" type="text/html"/>
    </item>
    <item>
      <title>Danijar Hafner on Dreamer v4</title>
      <itunes:episode>73</itunes:episode>
      <podcast:episode>73</podcast:episode>
      <itunes:title>Danijar Hafner on Dreamer v4</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">4bd16232-285d-418a-8b85-6efe49df3e35</guid>
      <link>https://share.transistor.fm/s/e440a692</link>
      <description>
        <![CDATA[<p>Danijar Hafner was a Research Scientist at Google DeepMind until recently.</p><p><br><strong>Featured References</strong></p><p><a href="https://arxiv.org/abs/2509.24527">Training Agents Inside of Scalable World Models</a> [ <a href="https://danijar.com/project/dreamer4/">blog</a> ]<br>Danijar Hafner, Wilson Yan, Timothy Lillicrap</p><p><a href="https://arxiv.org/abs/2410.12557">One Step Diffusion via Shortcut Models</a><br>Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel</p><p><a href="https://arxiv.org/abs/2009.01791">Action and Perception as Divergence Minimization</a> [ <a href="https://danijar.com/project/apd/">blog</a> ]<br>Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess</p><p><br><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/2301.04104v1">Mastering Diverse Domains through World Models</a> [ <a href="https://danijar.com/project/dreamerv3/">blog</a> ] DreamerV3; Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap</li><li><a href="https://arxiv.org/abs/2010.02193">Mastering Atari with Discrete World Models</a> [ <a href="https://danijar.com/project/dreamerv2/">blog</a> ] DreamerV2; Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba</li><li><a href="https://arxiv.org/abs/1912.01603">Dream to Control: Learning Behaviors by Latent Imagination</a> [ <a href="https://danijar.com/project/dreamer/">blog</a> ] Dreamer; Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi</li><li><a href="https://arxiv.org/abs/2206.11795">Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos</a> [ <a href="https://openai.com/research/vpt">Blog Post</a> ], Baker et al.</li></ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Danijar Hafner was a Research Scientist at Google DeepMind until recently.</p><p><br><strong>Featured References</strong></p><p><a href="https://arxiv.org/abs/2509.24527">Training Agents Inside of Scalable World Models</a> [ <a href="https://danijar.com/project/dreamer4/">blog</a> ]<br>Danijar Hafner, Wilson Yan, Timothy Lillicrap</p><p><a href="https://arxiv.org/abs/2410.12557">One Step Diffusion via Shortcut Models</a><br>Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel</p><p><a href="https://arxiv.org/abs/2009.01791">Action and Perception as Divergence Minimization</a> [ <a href="https://danijar.com/project/apd/">blog</a> ]<br>Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess</p><p><br><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/2301.04104v1">Mastering Diverse Domains through World Models</a> [ <a href="https://danijar.com/project/dreamerv3/">blog</a> ] DreamerV3; Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap</li><li><a href="https://arxiv.org/abs/2010.02193">Mastering Atari with Discrete World Models</a> [ <a href="https://danijar.com/project/dreamerv2/">blog</a> ] DreamerV2; Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba</li><li><a href="https://arxiv.org/abs/1912.01603">Dream to Control: Learning Behaviors by Latent Imagination</a> [ <a href="https://danijar.com/project/dreamer/">blog</a> ] Dreamer; Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi</li><li><a href="https://arxiv.org/abs/2206.11795">Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos</a> [ <a href="https://openai.com/research/vpt">Blog Post</a> ], Baker et al.</li></ul>]]>
      </content:encoded>
      <pubDate>Sun, 09 Nov 2025 23:00:00 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/e440a692/bfc0657e.mp3" length="81038948" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/W7TyxYCReaPtrEYqKmeXZ2nFoOtGKNG9QUm82FbQ7vQ/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9jODll/MTI1MjQyNGY4MDVl/NjdmNGIwN2NmMTE0/NTE5Yi5qcGc.jpg"/>
      <itunes:duration>6052</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Danijar Hafner was a Research Scientist at Google DeepMind until recently.</p><p><br><strong>Featured References</strong></p><p><a href="https://arxiv.org/abs/2509.24527">Training Agents Inside of Scalable World Models</a> [ <a href="https://danijar.com/project/dreamer4/">blog</a> ]<br>Danijar Hafner, Wilson Yan, Timothy Lillicrap</p><p><a href="https://arxiv.org/abs/2410.12557">One Step Diffusion via Shortcut Models</a><br>Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel</p><p><a href="https://arxiv.org/abs/2009.01791">Action and Perception as Divergence Minimization</a> [ <a href="https://danijar.com/project/apd/">blog</a> ]<br>Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess</p><p><br><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/2301.04104v1">Mastering Diverse Domains through World Models</a> [ <a href="https://danijar.com/project/dreamerv3/">blog</a> ] DreamerV3; Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap</li><li><a href="https://arxiv.org/abs/2010.02193">Mastering Atari with Discrete World Models</a> [ <a href="https://danijar.com/project/dreamerv2/">blog</a> ] DreamerV2; Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba</li><li><a href="https://arxiv.org/abs/1912.01603">Dream to Control: Learning Behaviors by Latent Imagination</a> [ <a href="https://danijar.com/project/dreamer/">blog</a> ] Dreamer; Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi</li><li><a href="https://arxiv.org/abs/2206.11795">Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos</a> [ <a href="https://openai.com/research/vpt">Blog Post</a> ], Baker et al.</li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/e440a692/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/e440a692/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/e440a692/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/e440a692/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/e440a692/transcription" type="text/html"/>
    </item>
    <item>
      <title>David Abel on the Science of Agency @ RLDM 2025</title>
      <itunes:episode>72</itunes:episode>
      <podcast:episode>72</podcast:episode>
      <itunes:title>David Abel on the Science of Agency @ RLDM 2025</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">471d1732-233a-49e1-8e08-70fba25946b4</guid>
      <link>https://share.transistor.fm/s/8fe37747</link>
      <description>
        <![CDATA[<p>David Abel is a Senior Research Scientist at DeepMind on the Agency team, and an Honorary Fellow at the University of Edinburgh. His research blends computer science and philosophy, exploring foundational questions about reinforcement learning, definitions, and the nature of agency.  </p><p><br><strong>Featured References  </strong></p><p><br><a href="https://arxiv.org/pdf/2505.10361">Plasticity as the Mirror of Empowerment</a>  <br> David Abel, Michael Bowling, André Barreto, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh  </p><p><br><a href="https://arxiv.org/pdf/2307.11046">A Definition of Continual RL</a>  <br> David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh  </p><p><br><a href="https://arxiv.org/pdf/2502.04403">Agency is Frame-Dependent</a>  <br> David Abel, André Barreto, Michael Bowling, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh  </p><p><br><a href="https://arxiv.org/abs/2111.00876">On the Expressivity of Markov Reward</a>  <br> David Abel, Will Dabney, Anna Harutyunyan, Mark Ho, Michael Littman, Doina Precup, Satinder Singh — Outstanding Paper Award, NeurIPS 2021  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://ieeexplore.ieee.org/abstract/document/1091610/similar#similar">Bidirectional Communication Theory</a> — Marko 1973  </li><li><a href="https://www.isiweb.ee.ethz.ch/archive/massey_pub/pdf/BI532.pdf">Causality, Feedback and Directed Information</a> — Massey 1990  </li><li><a href="https://openreview.net/forum?id=Sv7DazuCn8">The Big World Hypothesis</a> — Javed et al. 2024  </li><li><a href="https://www.nature.com/articles/s41586-024-07711-7">Loss of plasticity in deep continual learning</a> — Dohare et al. 2024  </li><li><a href="https://david-abel.github.io/tdorl.pdf">Three Dogmas of Reinforcement Learning</a> — Abel 2024  </li><li><a href="https://pubmed.ncbi.nlm.nih.gov/39054370/">Explaining dopamine through prediction errors and beyond</a> — Gershman et al. 2024  </li><li><a href="https://scholar.google.com/citations?user=lvBJlmwAAAAJ&amp;hl=en">David Abel Google Scholar</a>  </li><li><a href="https://david-abel.github.io/">David Abel personal website</a>  </li></ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>David Abel is a Senior Research Scientist at DeepMind on the Agency team, and an Honorary Fellow at the University of Edinburgh. His research blends computer science and philosophy, exploring foundational questions about reinforcement learning, definitions, and the nature of agency.  </p><p><br><strong>Featured References  </strong></p><p><br><a href="https://arxiv.org/pdf/2505.10361">Plasticity as the Mirror of Empowerment</a>  <br> David Abel, Michael Bowling, André Barreto, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh  </p><p><br><a href="https://arxiv.org/pdf/2307.11046">A Definition of Continual RL</a>  <br> David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh  </p><p><br><a href="https://arxiv.org/pdf/2502.04403">Agency is Frame-Dependent</a>  <br> David Abel, André Barreto, Michael Bowling, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh  </p><p><br><a href="https://arxiv.org/abs/2111.00876">On the Expressivity of Markov Reward</a>  <br> David Abel, Will Dabney, Anna Harutyunyan, Mark Ho, Michael Littman, Doina Precup, Satinder Singh — Outstanding Paper Award, NeurIPS 2021  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://ieeexplore.ieee.org/abstract/document/1091610/similar#similar">Bidirectional Communication Theory</a> — Marko 1973  </li><li><a href="https://www.isiweb.ee.ethz.ch/archive/massey_pub/pdf/BI532.pdf">Causality, Feedback and Directed Information</a> — Massey 1990  </li><li><a href="https://openreview.net/forum?id=Sv7DazuCn8">The Big World Hypothesis</a> — Javed et al. 2024  </li><li><a href="https://www.nature.com/articles/s41586-024-07711-7">Loss of plasticity in deep continual learning</a> — Dohare et al. 2024  </li><li><a href="https://david-abel.github.io/tdorl.pdf">Three Dogmas of Reinforcement Learning</a> — Abel 2024  </li><li><a href="https://pubmed.ncbi.nlm.nih.gov/39054370/">Explaining dopamine through prediction errors and beyond</a> — Gershman et al. 2024  </li><li><a href="https://scholar.google.com/citations?user=lvBJlmwAAAAJ&amp;hl=en">David Abel Google Scholar</a>  </li><li><a href="https://david-abel.github.io/">David Abel personal website</a>  </li></ul>]]>
      </content:encoded>
      <pubDate>Mon, 08 Sep 2025 10:34:40 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/8fe37747/d9dc25af.mp3" length="57349969" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/tRmL_xgpMx38zGGhUKeJFxdqLrZ9bcKPucXHN4Gdsmg/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8xMGQz/NGRkYzdkYzExOWUy/NGNmNDg1NDUxMDg0/MjBlYi5qcGVn.jpg"/>
      <itunes:duration>3582</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>David Abel is a Senior Research Scientist at DeepMind on the Agency team, and an Honorary Fellow at the University of Edinburgh. His research blends computer science and philosophy, exploring foundational questions about reinforcement learning, definitions, and the nature of agency.  </p><p><br><strong>Featured References  </strong></p><p><br><a href="https://arxiv.org/pdf/2505.10361">Plasticity as the Mirror of Empowerment</a>  <br> David Abel, Michael Bowling, André Barreto, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh  </p><p><br><a href="https://arxiv.org/pdf/2307.11046">A Definition of Continual RL</a>  <br> David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh  </p><p><br><a href="https://arxiv.org/pdf/2502.04403">Agency is Frame-Dependent</a>  <br> David Abel, André Barreto, Michael Bowling, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh  </p><p><br><a href="https://arxiv.org/abs/2111.00876">On the Expressivity of Markov Reward</a>  <br> David Abel, Will Dabney, Anna Harutyunyan, Mark Ho, Michael Littman, Doina Precup, Satinder Singh — Outstanding Paper Award, NeurIPS 2021  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://ieeexplore.ieee.org/abstract/document/1091610/similar#similar">Bidirectional Communication Theory</a> — Marko 1973  </li><li><a href="https://www.isiweb.ee.ethz.ch/archive/massey_pub/pdf/BI532.pdf">Causality, Feedback and Directed Information</a> — Massey 1990  </li><li><a href="https://openreview.net/forum?id=Sv7DazuCn8">The Big World Hypothesis</a> — Javed et al. 2024  </li><li><a href="https://www.nature.com/articles/s41586-024-07711-7">Loss of plasticity in deep continual learning</a> — Dohare et al. 2024  </li><li><a href="https://david-abel.github.io/tdorl.pdf">Three Dogmas of Reinforcement Learning</a> — Abel 2024  </li><li><a href="https://pubmed.ncbi.nlm.nih.gov/39054370/">Explaining dopamine through prediction errors and beyond</a> — Gershman et al. 2024  </li><li><a href="https://scholar.google.com/citations?user=lvBJlmwAAAAJ&amp;hl=en">David Abel Google Scholar</a>  </li><li><a href="https://david-abel.github.io/">David Abel personal website</a>  </li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/8fe37747/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/8fe37747/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/8fe37747/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/8fe37747/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/8fe37747/transcription" type="text/html"/>
    </item>
    <item>
      <title>Jake Beck, Alex Goldie, &amp; Cornelius Braun on Sutton's OaK, Metalearning, LLMs, Squirrels @ RLC 2025</title>
      <itunes:episode>71</itunes:episode>
      <podcast:episode>71</podcast:episode>
      <itunes:title>Jake Beck, Alex Goldie, &amp; Cornelius Braun on Sutton's OaK, Metalearning, LLMs, Squirrels @ RLC 2025</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">cf851a6b-aba3-4762-9457-585dbde39eec</guid>
      <link>https://share.transistor.fm/s/bfaf0f4b</link>
      <description>
        <![CDATA[<p>Recorded at the Reinforcement Learning Conference 2025 at the University of Alberta in Edmonton, Alberta, Canada.</p><p><strong>Featured References<br></strong><br><a href="https://www.youtube.com/live/XqYTQfQeMrE?t=22620s">Lecture on the OaK Architecture</a>, Rich Sutton</p><p><a href="http://www.incompleteideas.net/Talks/AlbertaPlan.pdf">Alberta Plan</a>, Rich Sutton with Mike Bowling and Patrick Pilarski</p><p><br><strong>Additional References</strong></p><ul><li><a href="https://scholar.google.ca/citations?user=PrS_dHMAAAAJ&amp;hl=en&amp;oi=sra">Jacob Beck</a> on Google Scholar</li><li><a href="https://scholar.google.com/citations?user=wogOjBsAAAAJ&amp;hl=en">Alex Goldie</a> on Google Scholar</li><li><a href="https://scholar.google.com/citations?user=Fh-XpPkAAAAJ&amp;hl=de">Cornelius Braun</a> on Google Scholar</li><li><a href="https://rl-conference.cc/">Reinforcement Learning Conference</a></li></ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Recorded at the Reinforcement Learning Conference 2025 at the University of Alberta in Edmonton, Alberta, Canada.</p><p><strong>Featured References<br></strong><br><a href="https://www.youtube.com/live/XqYTQfQeMrE?t=22620s">Lecture on the OaK Architecture</a>, Rich Sutton</p><p><a href="http://www.incompleteideas.net/Talks/AlbertaPlan.pdf">Alberta Plan</a>, Rich Sutton with Mike Bowling and Patrick Pilarski</p><p><br><strong>Additional References</strong></p><ul><li><a href="https://scholar.google.ca/citations?user=PrS_dHMAAAAJ&amp;hl=en&amp;oi=sra">Jacob Beck</a> on Google Scholar</li><li><a href="https://scholar.google.com/citations?user=wogOjBsAAAAJ&amp;hl=en">Alex Goldie</a> on Google Scholar</li><li><a href="https://scholar.google.com/citations?user=Fh-XpPkAAAAJ&amp;hl=de">Cornelius Braun</a> on Google Scholar</li><li><a href="https://rl-conference.cc/">Reinforcement Learning Conference</a></li></ul>]]>
      </content:encoded>
      <pubDate>Tue, 19 Aug 2025 00:24:47 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/bfaf0f4b/2fbeab26.mp3" length="11878711" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/QiLjtOmEWwyJY3823RvVwjxlGXzd6Csz3hosecuc66c/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9hOTk5/NWFiYzA0NzQzYTk1/MTE2YzhlNjEwZmQw/MWNmMy5qcGc.jpg"/>
      <itunes:duration>740</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Recorded at the Reinforcement Learning Conference 2025 at the University of Alberta in Edmonton, Alberta, Canada.</p><p><strong>Featured References<br></strong><br><a href="https://www.youtube.com/live/XqYTQfQeMrE?t=22620s">Lecture on the OaK Architecture</a>, Rich Sutton</p><p><a href="http://www.incompleteideas.net/Talks/AlbertaPlan.pdf">Alberta Plan</a>, Rich Sutton with Mike Bowling and Patrick Pilarski</p><p><br><strong>Additional References</strong></p><ul><li><a href="https://scholar.google.ca/citations?user=PrS_dHMAAAAJ&amp;hl=en&amp;oi=sra">Jacob Beck</a> on Google Scholar</li><li><a href="https://scholar.google.com/citations?user=wogOjBsAAAAJ&amp;hl=en">Alex Goldie</a> on Google Scholar</li><li><a href="https://scholar.google.com/citations?user=Fh-XpPkAAAAJ&amp;hl=de">Cornelius Braun</a> on Google Scholar</li><li><a href="https://rl-conference.cc/">Reinforcement Learning Conference</a></li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/bfaf0f4b/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/bfaf0f4b/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/bfaf0f4b/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/bfaf0f4b/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/bfaf0f4b/transcription" type="text/html"/>
    </item>
    <item>
      <title>Outstanding Paper Award Winners - 2/2 @ RLC 2025</title>
      <itunes:episode>70</itunes:episode>
      <podcast:episode>70</podcast:episode>
      <itunes:title>Outstanding Paper Award Winners - 2/2 @ RLC 2025</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">d3273e74-11b7-42a2-b499-586d53184af2</guid>
      <link>https://share.transistor.fm/s/d7b2c262</link>
      <description>
        <![CDATA[<p>We caught up with the <a href="https://rl-conference.cc/RLC2025Awards.html">RLC Outstanding Paper award winners</a> for your listening pleasure.</p><p>Recorded on location at the <a href="https://rl-conference.cc/">Reinforcement Learning Conference 2025</a> at the University of Alberta in Edmonton, Alberta, Canada, in August 2025.</p><p><strong>Featured References<br></strong><br><em>Empirical Reinforcement Learning Research<br></em><a href="https://openreview.net/forum?id=aeY0CAOnca">Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions</a><br>Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Biyik, Joseph J Lim</p><p><em>Applications of Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=x00VCsuHAb">WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies</a><br>William Solow, Sandhya Saisubramanian, Alan Fern</p><p><em>Emerging Topics in Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=XZBYLXNGjT">Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners</a><br>Calarina Muslimani, Kerrick Johnstonbaugh, Suyog Chandramouli, Serena Booth, W. Bradley Knox, Matthew E. Taylor</p><p><em>Scientific Understanding in Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=eBWwBIFV7T#discussion">Multi-Task Reinforcement Learning Enables Parameter Scaling</a><br>Reginald McLean, Evangelos Chatzaroulas, J K Terry, Isaac Woungang, Nariman Farsad, Pablo Samuel Castro</p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>We caught up with the <a href="https://rl-conference.cc/RLC2025Awards.html">RLC Outstanding Paper award winners</a> for your listening pleasure.</p><p>Recorded on location at the <a href="https://rl-conference.cc/">Reinforcement Learning Conference 2025</a> at the University of Alberta in Edmonton, Alberta, Canada, in August 2025.</p><p><strong>Featured References<br></strong><br><em>Empirical Reinforcement Learning Research<br></em><a href="https://openreview.net/forum?id=aeY0CAOnca">Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions</a><br>Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Biyik, Joseph J Lim</p><p><em>Applications of Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=x00VCsuHAb">WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies</a><br>William Solow, Sandhya Saisubramanian, Alan Fern</p><p><em>Emerging Topics in Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=XZBYLXNGjT">Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners</a><br>Calarina Muslimani, Kerrick Johnstonbaugh, Suyog Chandramouli, Serena Booth, W. Bradley Knox, Matthew E. Taylor</p><p><em>Scientific Understanding in Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=eBWwBIFV7T#discussion">Multi-Task Reinforcement Learning Enables Parameter Scaling</a><br>Reginald McLean, Evangelos Chatzaroulas, J K Terry, Isaac Woungang, Nariman Farsad, Pablo Samuel Castro</p>]]>
      </content:encoded>
      <pubDate>Sun, 17 Aug 2025 23:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/d7b2c262/c4f8c609.mp3" length="13776420" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/P2ZO9ByR22E87EK7anQhWhetIDQRAT5Y5HyxnbkxTck/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lMzkx/ODEwZmM1YjQ3MGIz/OGJhMDQ3MTE3OGE2/MTgxZi53ZWJw.jpg"/>
      <itunes:duration>858</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>We caught up with the <a href="https://rl-conference.cc/RLC2025Awards.html">RLC Outstanding Paper award winners</a> for your listening pleasure.</p><p>Recorded on location at the <a href="https://rl-conference.cc/">Reinforcement Learning Conference 2025</a> at the University of Alberta in Edmonton, Alberta, Canada, in August 2025.</p><p><strong>Featured References<br></strong><br><em>Empirical Reinforcement Learning Research<br></em><a href="https://openreview.net/forum?id=aeY0CAOnca">Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions</a><br>Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Biyik, Joseph J Lim</p><p><em>Applications of Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=x00VCsuHAb">WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies</a><br>William Solow, Sandhya Saisubramanian, Alan Fern</p><p><em>Emerging Topics in Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=XZBYLXNGjT">Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners</a><br>Calarina Muslimani, Kerrick Johnstonbaugh, Suyog Chandramouli, Serena Booth, W. Bradley Knox, Matthew E. Taylor</p><p><em>Scientific Understanding in Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=eBWwBIFV7T#discussion">Multi-Task Reinforcement Learning Enables Parameter Scaling</a><br>Reginald McLean, Evangelos Chatzaroulas, J K Terry, Isaac Woungang, Nariman Farsad, Pablo Samuel Castro</p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/d7b2c262/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/d7b2c262/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/d7b2c262/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/d7b2c262/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/d7b2c262/transcription" type="text/html"/>
    </item>
    <item>
      <title>Outstanding Paper Award Winners - 1/2 @ RLC 2025</title>
      <itunes:episode>69</itunes:episode>
      <podcast:episode>69</podcast:episode>
      <itunes:title>Outstanding Paper Award Winners - 1/2 @ RLC 2025</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">9c7b6740-13d8-4242-9e3d-0e767620f2f1</guid>
      <link>https://share.transistor.fm/s/4e5cf3a1</link>
      <description>
        <![CDATA[<p>We caught up with the <a href="https://rl-conference.cc/RLC2025Awards.html">RLC Outstanding Paper award winners</a> for your listening pleasure.</p><p>Recorded on location at the <a href="https://rl-conference.cc/">Reinforcement Learning Conference 2025</a> at the University of Alberta in Edmonton, Alberta, Canada, in August 2025.</p><p><strong>Featured References<br></strong><br><em>Scientific Understanding in Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=jKzQ6af2DU#discussion">How Should We Meta-Learn Reinforcement Learning Algorithms?</a><br>Alexander David Goldie, Zilin Wang, Jakob Nicolaus Foerster, Shimon Whiteson</p><p><em>Tooling, Environments, and Evaluation for Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=0LFJnnMKeT">Syllabus: Portable Curricula for Reinforcement Learning Agents</a><br>Ryan Sullivan, Ryan Pégoud, Ameen Ur Rehman, Xinchen Yang, Junyun Huang, Aayush Verma, Nistha Mitra, John P Dickerson</p><p><em>Resourcefulness in Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=qRyteMTgn0#discussion">PufferLib 2.0: Reinforcement Learning at 1M steps/s</a><br>Joseph Suarez</p><p><em>Theory of Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=LZAafvwVMa">Deep Reinforcement Learning with Gradient Eligibility Traces</a><br>Esraa Elelimy, Brett Daley, Andrew Patterson, Marlos C. Machado, Adam White, Martha White</p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>We caught up with the <a href="https://rl-conference.cc/RLC2025Awards.html">RLC Outstanding Paper award winners</a> for your listening pleasure.</p><p>Recorded on location at the <a href="https://rl-conference.cc/">Reinforcement Learning Conference 2025</a> at the University of Alberta in Edmonton, Alberta, Canada, in August 2025.</p><p><strong>Featured References<br></strong><br><em>Scientific Understanding in Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=jKzQ6af2DU#discussion">How Should We Meta-Learn Reinforcement Learning Algorithms?</a><br>Alexander David Goldie, Zilin Wang, Jakob Nicolaus Foerster, Shimon Whiteson</p><p><em>Tooling, Environments, and Evaluation for Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=0LFJnnMKeT">Syllabus: Portable Curricula for Reinforcement Learning Agents</a><br>Ryan Sullivan, Ryan Pégoud, Ameen Ur Rehman, Xinchen Yang, Junyun Huang, Aayush Verma, Nistha Mitra, John P Dickerson</p><p><em>Resourcefulness in Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=qRyteMTgn0#discussion">PufferLib 2.0: Reinforcement Learning at 1M steps/s</a><br>Joseph Suarez</p><p><em>Theory of Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=LZAafvwVMa">Deep Reinforcement Learning with Gradient Eligibility Traces</a><br>Esraa Elelimy, Brett Daley, Andrew Patterson, Marlos C. Machado, Adam White, Martha White</p>]]>
      </content:encoded>
      <pubDate>Fri, 15 Aug 2025 12:13:19 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/4e5cf3a1/144486fa.mp3" length="6548300" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/S105Lg5ztU1fGJcjgqgGMPDNEamspZabYlWfh4tulOU/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9kNDQy/YzNlODEyODZlNTAw/NTg5MGFjZjIwOTlk/NGMyMS53ZWJw.jpg"/>
      <itunes:duration>406</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>We caught up with the <a href="https://rl-conference.cc/RLC2025Awards.html">RLC Outstanding Paper award winners</a> for your listening pleasure.</p><p>Recorded on location at the <a href="https://rl-conference.cc/">Reinforcement Learning Conference 2025</a> at the University of Alberta in Edmonton, Alberta, Canada, in August 2025.</p><p><strong>Featured References<br></strong><br><em>Scientific Understanding in Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=jKzQ6af2DU#discussion">How Should We Meta-Learn Reinforcement Learning Algorithms?</a><br>Alexander David Goldie, Zilin Wang, Jakob Nicolaus Foerster, Shimon Whiteson</p><p><em>Tooling, Environments, and Evaluation for Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=0LFJnnMKeT">Syllabus: Portable Curricula for Reinforcement Learning Agents</a><br>Ryan Sullivan, Ryan Pégoud, Ameen Ur Rehman, Xinchen Yang, Junyun Huang, Aayush Verma, Nistha Mitra, John P Dickerson</p><p><em>Resourcefulness in Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=qRyteMTgn0#discussion">PufferLib 2.0: Reinforcement Learning at 1M steps/s</a><br>Joseph Suarez</p><p><em>Theory of Reinforcement Learning<br></em><a href="https://openreview.net/forum?id=LZAafvwVMa">Deep Reinforcement Learning with Gradient Eligibility Traces</a><br>Esraa Elelimy, Brett Daley, Andrew Patterson, Marlos C. Machado, Adam White, Martha White</p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/4e5cf3a1/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/4e5cf3a1/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/4e5cf3a1/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/4e5cf3a1/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/4e5cf3a1/transcription" type="text/html"/>
    </item>
    <item>
      <title>Thomas Akam on Model-based RL in the Brain</title>
      <itunes:episode>68</itunes:episode>
      <podcast:episode>68</podcast:episode>
      <itunes:title>Thomas Akam on Model-based RL in the Brain</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">47c019af-80c9-4c64-b19a-95103a9094c8</guid>
      <link>https://share.transistor.fm/s/e104ec35</link>
      <description>
        <![CDATA[<p>Prof. <a href="https://www.psy.ox.ac.uk/people/thomas-akam">Thomas Akam</a> is a neuroscientist at the Oxford University Department of Experimental Psychology. He is a Wellcome Career Development Fellow and Associate Professor at the University of Oxford, and leads the <a href="https://www.psy.ox.ac.uk/research/cognitive-circuits">Cognitive Circuits research group</a>.</p><p><strong>Featured References</strong></p><p><a href="https://github.com/ThomasAkam/talks/blob/main/2025-06-11_RLDM_tutorial.pdf">Brain Architecture for Adaptive Behaviour</a><br>Thomas Akam, RLDM 2025 Tutorial</p><p><strong>Additional References</strong></p><ul><li><a href="https://scholar.google.com/citations?user=b809FRsAAAAJ&amp;hl=en">Thomas Akam on Google Scholar</a></li><li><a href="https://github.com/pyPhotometry">pyPhotometry</a>: open-source, Python-based fiber photometry data acquisition</li><li><a href="https://github.com/pyControl">pyControl</a>: open-source, Python-based behavioural experiment control</li><li><a href="https://pubmed.ncbi.nlm.nih.gov/16286932/">Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control</a>, Nathaniel D Daw, Yael Niv, Peter Dayan, 2005</li><li><a href="https://psycnet.apa.org/record/1969-04876-001">Further analysis of the hippocampal amnesic syndrome: 14-year follow-up study of H. M.</a>, Milner, B., Corkin, S., &amp; Teuber, H. L., 1968</li><li><a href="https://www.science.org/doi/abs/10.1126/science.1159775">Internally generated cell assembly sequences in the rat hippocampus</a>, Pastalkova E, Itskov V, Amarasingham A, Buzsáki G. Science. 2008</li><li><a href="https://rldm.org/">Multi-disciplinary Conference on Reinforcement Learning and Decision Making 2025</a></li></ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Prof. <a href="https://www.psy.ox.ac.uk/people/thomas-akam">Thomas Akam</a> is a neuroscientist at the Oxford University Department of Experimental Psychology. He is a Wellcome Career Development Fellow and Associate Professor at the University of Oxford, and leads the <a href="https://www.psy.ox.ac.uk/research/cognitive-circuits">Cognitive Circuits research group</a>.</p><p><strong>Featured References</strong></p><p><a href="https://github.com/ThomasAkam/talks/blob/main/2025-06-11_RLDM_tutorial.pdf">Brain Architecture for Adaptive Behaviour</a><br>Thomas Akam, RLDM 2025 Tutorial</p><p><strong>Additional References</strong></p><ul><li><a href="https://scholar.google.com/citations?user=b809FRsAAAAJ&amp;hl=en">Thomas Akam on Google Scholar</a></li><li><a href="https://github.com/pyPhotometry">pyPhotometry</a>: open-source, Python-based fiber photometry data acquisition</li><li><a href="https://github.com/pyControl">pyControl</a>: open-source, Python-based behavioural experiment control</li><li><a href="https://pubmed.ncbi.nlm.nih.gov/16286932/">Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control</a>, Nathaniel D Daw, Yael Niv, Peter Dayan, 2005</li><li><a href="https://psycnet.apa.org/record/1969-04876-001">Further analysis of the hippocampal amnesic syndrome: 14-year follow-up study of H. M.</a>, Milner, B., Corkin, S., &amp; Teuber, H. L., 1968</li><li><a href="https://www.science.org/doi/abs/10.1126/science.1159775">Internally generated cell assembly sequences in the rat hippocampus</a>, Pastalkova E, Itskov V, Amarasingham A, Buzsáki G. Science. 2008</li><li><a href="https://rldm.org/">Multi-disciplinary Conference on Reinforcement Learning and Decision Making 2025</a></li></ul>]]>
      </content:encoded>
      <pubDate>Sun, 03 Aug 2025 23:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/e104ec35/59430419.mp3" length="50070300" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/WjuOIT7wH7PmqLAjUj8mpVMSVDJ1S-MqG5UwpC95sTw/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS85MTg0/OTRiZjY1NTM1MDRk/YzU1MTMwNjdlYzVm/MmI4Mi53ZWJw.jpg"/>
      <itunes:duration>3126</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Prof. <a href="https://www.psy.ox.ac.uk/people/thomas-akam">Thomas Akam</a> is a neuroscientist at the Oxford University Department of Experimental Psychology. He is a Wellcome Career Development Fellow and Associate Professor at the University of Oxford, and leads the <a href="https://www.psy.ox.ac.uk/research/cognitive-circuits">Cognitive Circuits research group</a>.</p><p><strong>Featured References</strong></p><p><a href="https://github.com/ThomasAkam/talks/blob/main/2025-06-11_RLDM_tutorial.pdf">Brain Architecture for Adaptive Behaviour</a><br>Thomas Akam, RLDM 2025 Tutorial</p><p><strong>Additional References</strong></p><ul><li><a href="https://scholar.google.com/citations?user=b809FRsAAAAJ&amp;hl=en">Thomas Akam on Google Scholar</a></li><li><a href="https://github.com/pyPhotometry">pyPhotometry</a>: open-source, Python-based fiber photometry data acquisition</li><li><a href="https://github.com/pyControl">pyControl</a>: open-source, Python-based behavioural experiment control</li><li><a href="https://pubmed.ncbi.nlm.nih.gov/16286932/">Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control</a>, Nathaniel D Daw, Yael Niv, Peter Dayan, 2005</li><li><a href="https://psycnet.apa.org/record/1969-04876-001">Further analysis of the hippocampal amnesic syndrome: 14-year follow-up study of H. M.</a>, Milner, B., Corkin, S., &amp; Teuber, H. L., 1968</li><li><a href="https://www.science.org/doi/abs/10.1126/science.1159775">Internally generated cell assembly sequences in the rat hippocampus</a>, Pastalkova E, Itskov V, Amarasingham A, Buzsáki G. Science. 2008</li><li><a href="https://rldm.org/">Multi-disciplinary Conference on Reinforcement Learning and Decision Making 2025</a></li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/e104ec35/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/e104ec35/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/e104ec35/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/e104ec35/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/e104ec35/transcription" type="text/html"/>
    </item>
    <item>
      <title>Stefano Albrecht on Multi-Agent RL @ RLDM 2025</title>
      <itunes:episode>67</itunes:episode>
      <podcast:episode>67</podcast:episode>
      <itunes:title>Stefano Albrecht on Multi-Agent RL @ RLDM 2025</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">cbb7cd5f-6010-45d8-b18d-f0d130ec0c5e</guid>
      <link>https://share.transistor.fm/s/c24331ed</link>
      <description>
        <![CDATA[<p><a href="https://agents-lab.org/stefano-albrecht/">Stefano V. Albrecht</a> was previously Associate Professor at the University of Edinburgh, and is currently serving as Director of AI at the startup <a href="https://www.deepflow.com/">Deepflow</a>. He is a Program Chair of <a href="https://rldm.org/">RLDM 2025</a> and is co-author of the MIT Press textbook "<a href="https://marl-book.com/">Multi-Agent Reinforcement Learning: Foundations and Modern Approaches</a>".</p><p><strong>Featured References</strong></p><p><a href="https://marl-book.com/">Multi-Agent Reinforcement Learning: Foundations and Modern Approaches</a><br>Stefano V. Albrecht, Filippos Christianos, Lukas Schäfer<br>MIT Press, 2024</p><p><a href="https://rldm.org/">RLDM 2025: Reinforcement Learning and Decision Making Conference</a><br>Dublin, Ireland</p><p><a href="https://github.com/uoe-agents/epymarl">EPyMARL: Extended Python MARL framework</a></p><p><a href="http://arxiv.org/abs/2006.07869">Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks</a><br>Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht</p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://agents-lab.org/stefano-albrecht/">Stefano V. Albrecht</a> was previously Associate Professor at the University of Edinburgh, and is currently serving as Director of AI at the startup <a href="https://www.deepflow.com/">Deepflow</a>. He is a Program Chair of <a href="https://rldm.org/">RLDM 2025</a> and is co-author of the MIT Press textbook "<a href="https://marl-book.com/">Multi-Agent Reinforcement Learning: Foundations and Modern Approaches</a>".</p><p><strong>Featured References</strong></p><p><a href="https://marl-book.com/">Multi-Agent Reinforcement Learning: Foundations and Modern Approaches</a><br>Stefano V. Albrecht, Filippos Christianos, Lukas Schäfer<br>MIT Press, 2024</p><p><a href="https://rldm.org/">RLDM 2025: Reinforcement Learning and Decision Making Conference</a><br>Dublin, Ireland</p><p><a href="https://github.com/uoe-agents/epymarl">EPyMARL: Extended Python MARL framework</a></p><p><a href="http://arxiv.org/abs/2006.07869">Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks</a><br>Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht</p>]]>
      </content:encoded>
      <pubDate>Tue, 22 Jul 2025 14:29:54 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/c24331ed/15c80345.mp3" length="30344159" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/bGjH1_L9l2Hyb79R6z-a-SlB121LS20JAtbojb0OmAA/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS84MzY0/NmExNzBiNWQwMmMw/NzU0ZGJhMjAzNjZi/ZDAzZS5qcGc.jpg"/>
      <itunes:duration>1894</itunes:duration>
      <itunes:summary>
        <![CDATA[<p><a href="https://agents-lab.org/stefano-albrecht/">Stefano V. Albrecht</a> was previously Associate Professor at the University of Edinburgh, and is currently serving as Director of AI at the startup <a href="https://www.deepflow.com/">Deepflow</a>. He is a Program Chair of <a href="https://rldm.org/">RLDM 2025</a> and is co-author of the MIT Press textbook "<a href="https://marl-book.com/">Multi-Agent Reinforcement Learning: Foundations and Modern Approaches</a>".</p><p><strong>Featured References</strong></p><p><a href="https://marl-book.com/">Multi-Agent Reinforcement Learning: Foundations and Modern Approaches</a><br>Stefano V. Albrecht, Filippos Christianos, Lukas Schäfer<br>MIT Press, 2024</p><p><a href="https://rldm.org/">RLDM 2025: Reinforcement Learning and Decision Making Conference</a><br>Dublin, Ireland</p><p><a href="https://github.com/uoe-agents/epymarl">EPyMARL: Extended Python MARL framework</a></p><p><a href="http://arxiv.org/abs/2006.07869">Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks</a><br>Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht</p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/c24331ed/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/c24331ed/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/c24331ed/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/c24331ed/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/c24331ed/transcription" type="text/html"/>
    </item>
    <item>
      <title>Satinder Singh: The Origin Story of RLDM @ RLDM 2025</title>
      <itunes:episode>66</itunes:episode>
      <podcast:episode>66</podcast:episode>
      <itunes:title>Satinder Singh: The Origin Story of RLDM @ RLDM 2025</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">9acfec53-991b-4861-80d3-e652c46f1176</guid>
      <link>https://share.transistor.fm/s/81c19aca</link>
      <description>
<![CDATA[<p>Professor Satinder Singh of Google DeepMind and U of Michigan is a co-founder of RLDM.  Here he narrates the origin story of the Reinforcement Learning and Decision Making meeting (not conference).</p><p>Recorded on location at Trinity College Dublin, Ireland, during RLDM 2025.</p><p><strong>Featured References<br></strong><br><a href="https://rldm.org/">RLDM 2025: Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM)</a><br>June 11-14, 2025 at Trinity College Dublin, Ireland</p><p><a href="https://scholar.google.com/citations?user=8RgDBoEAAAAJ&amp;hl=en">Satinder Singh</a> on Google Scholar</p>]]>
      </description>
      <content:encoded>
<![CDATA[<p>Professor Satinder Singh of Google DeepMind and U of Michigan is a co-founder of RLDM.  Here he narrates the origin story of the Reinforcement Learning and Decision Making meeting (not conference).</p><p>Recorded on location at Trinity College Dublin, Ireland, during RLDM 2025.</p><p><strong>Featured References<br></strong><br><a href="https://rldm.org/">RLDM 2025: Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM)</a><br>June 11-14, 2025 at Trinity College Dublin, Ireland</p><p><a href="https://scholar.google.com/citations?user=8RgDBoEAAAAJ&amp;hl=en">Satinder Singh</a> on Google Scholar</p>]]>
      </content:encoded>
      <pubDate>Wed, 25 Jun 2025 08:48:11 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/81c19aca/2c71c89b.mp3" length="5745881" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/XtLBOGHUAHecc6sfW2tl8BTf2PpJaMG6GFzMc7T01N4/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS83MjM2/MmQ3YjcxMWIwOWI5/ODIxOTgwOTM4M2Vj/NGM2Mi5qcGc.jpg"/>
      <itunes:duration>357</itunes:duration>
      <itunes:summary>
<![CDATA[<p>Professor Satinder Singh of Google DeepMind and U of Michigan is a co-founder of RLDM.  Here he narrates the origin story of the Reinforcement Learning and Decision Making meeting (not conference).</p><p>Recorded on location at Trinity College Dublin, Ireland, during RLDM 2025.</p><p><strong>Featured References<br></strong><br><a href="https://rldm.org/">RLDM 2025: Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM)</a><br>June 11-14, 2025 at Trinity College Dublin, Ireland</p><p><a href="https://scholar.google.com/citations?user=8RgDBoEAAAAJ&amp;hl=en">Satinder Singh</a> on Google Scholar</p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/81c19aca/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/81c19aca/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/81c19aca/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/81c19aca/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/81c19aca/transcription" type="text/html"/>
    </item>
    <item>
      <title>NeurIPS 2024 - Posters and Hallways 3</title>
      <itunes:episode>65</itunes:episode>
      <podcast:episode>65</podcast:episode>
      <itunes:title>NeurIPS 2024 - Posters and Hallways 3</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">7c5b05d2-68c5-4393-8620-c891f10a12b9</guid>
      <link>https://share.transistor.fm/s/255161e2</link>
      <description>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at NeurIPS 2024 in Vancouver, BC, Canada.   </p><p><strong>Featuring</strong>  </p><ul><li>Claire Bizon Monroc from Inria: <a href="https://papers.nips.cc/paper_files/paper/2024/hash/f0a4a0ecdc29a0087c0848948e2fce81-Abstract-Datasets_and_Benchmarks_Track.html">WFCRL: A Multi-Agent Reinforcement Learning Benchmark for Wind Farm Control</a>  </li><li>Andrew Wagenmaker from UC Berkeley: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/hash/8fa068ffe59817175d176bd75641fe16-Abstract-Conference.html">Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL</a>  </li><li>Harley Wiltzer from MILA: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/hash/b76bec34ef5e0c0ceedff6edfbefc9f5-Abstract-Conference.html">Foundations of Multivariate Distributional Reinforcement Learning</a>  </li><li>Vinzenz Thoma from ETH AI Center: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/hash/e66309ead63bc1410d2df261a28f602d-Abstract-Conference.html">Contextual Bilevel Reinforcement Learning for Incentive Alignment</a>  </li><li>Haozhe (Tony) Chen &amp; Ang (Leon) Li from Columbia: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/hash/a7f67788f7b4d77fa7cd6887de3dcbe7-Abstract-Datasets_and_Benchmarks_Track.html">QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers</a>  </li></ul>]]>
      </description>
      <content:encoded>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at NeurIPS 2024 in Vancouver, BC, Canada.   </p><p><strong>Featuring</strong>  </p><ul><li>Claire Bizon Monroc from Inria: <a href="https://papers.nips.cc/paper_files/paper/2024/hash/f0a4a0ecdc29a0087c0848948e2fce81-Abstract-Datasets_and_Benchmarks_Track.html">WFCRL: A Multi-Agent Reinforcement Learning Benchmark for Wind Farm Control</a>  </li><li>Andrew Wagenmaker from UC Berkeley: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/hash/8fa068ffe59817175d176bd75641fe16-Abstract-Conference.html">Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL</a>  </li><li>Harley Wiltzer from MILA: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/hash/b76bec34ef5e0c0ceedff6edfbefc9f5-Abstract-Conference.html">Foundations of Multivariate Distributional Reinforcement Learning</a>  </li><li>Vinzenz Thoma from ETH AI Center: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/hash/e66309ead63bc1410d2df261a28f602d-Abstract-Conference.html">Contextual Bilevel Reinforcement Learning for Incentive Alignment</a>  </li><li>Haozhe (Tony) Chen &amp; Ang (Leon) Li from Columbia: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/hash/a7f67788f7b4d77fa7cd6887de3dcbe7-Abstract-Datasets_and_Benchmarks_Track.html">QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers</a>  </li></ul>]]>
      </content:encoded>
      <pubDate>Sun, 09 Mar 2025 14:25:53 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/255161e2/93762dc6.mp3" length="9716741" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/cD7otNzkeH82X-0qnRf2RpQgKZoZajOnpOXOEULqcV4/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS85MDQz/MTRmOTVmMDdiMmE4/N2M4NDVjZjhiM2I2/MmIyMi53ZWJw.jpg"/>
      <itunes:duration>601</itunes:duration>
      <itunes:summary>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at NeurIPS 2024 in Vancouver, BC, Canada.   </p><p><strong>Featuring</strong>  </p><ul><li>Claire Bizon Monroc from Inria: <a href="https://papers.nips.cc/paper_files/paper/2024/hash/f0a4a0ecdc29a0087c0848948e2fce81-Abstract-Datasets_and_Benchmarks_Track.html">WFCRL: A Multi-Agent Reinforcement Learning Benchmark for Wind Farm Control</a>  </li><li>Andrew Wagenmaker from UC Berkeley: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/hash/8fa068ffe59817175d176bd75641fe16-Abstract-Conference.html">Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL</a>  </li><li>Harley Wiltzer from MILA: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/hash/b76bec34ef5e0c0ceedff6edfbefc9f5-Abstract-Conference.html">Foundations of Multivariate Distributional Reinforcement Learning</a>  </li><li>Vinzenz Thoma from ETH AI Center: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/hash/e66309ead63bc1410d2df261a28f602d-Abstract-Conference.html">Contextual Bilevel Reinforcement Learning for Incentive Alignment</a>  </li><li>Haozhe (Tony) Chen &amp; Ang (Leon) Li from Columbia: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/hash/a7f67788f7b4d77fa7cd6887de3dcbe7-Abstract-Datasets_and_Benchmarks_Track.html">QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers</a>  </li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/255161e2/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/255161e2/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/255161e2/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/255161e2/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/255161e2/transcription" type="text/html"/>
    </item>
    <item>
      <title>NeurIPS 2024 - Posters and Hallways 2</title>
      <itunes:episode>64</itunes:episode>
      <podcast:episode>64</podcast:episode>
      <itunes:title>NeurIPS 2024 - Posters and Hallways 2</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">cdb9a0f2-ff33-4f0a-a01e-984809f2f9e5</guid>
      <link>https://share.transistor.fm/s/ce850fa1</link>
      <description>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at NeurIPS 2024 in Vancouver, BC, Canada.   </p><p><strong>Featuring</strong>  </p><ul><li>Jonathan Cook from University of Oxford: <a href="https://arxiv.org/abs/2406.00392">Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning</a>  </li><li>Yifei Zhou from Berkeley AI Research: <a href="https://arxiv.org/abs/2406.11896">DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning</a>  </li><li>Rory Young from University of Glasgow: <a href="https://arxiv.org/abs/2410.10674">Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach</a>  </li><li>Glen Berseth from MILA: <a href="https://arxiv.org/abs/2409.04792">Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn</a>  </li><li>Alexander Rutherford from University of Oxford: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/file/5aee125f052c90e326dcf6f380df94f6-Paper-Datasets_and_Benchmarks_Track.pdf">JaxMARL: Multi-Agent RL Environments and Algorithms in JAX</a>  </li></ul>]]>
      </description>
      <content:encoded>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at NeurIPS 2024 in Vancouver, BC, Canada.   </p><p><strong>Featuring</strong>  </p><ul><li>Jonathan Cook from University of Oxford: <a href="https://arxiv.org/abs/2406.00392">Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning</a>  </li><li>Yifei Zhou from Berkeley AI Research: <a href="https://arxiv.org/abs/2406.11896">DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning</a>  </li><li>Rory Young from University of Glasgow: <a href="https://arxiv.org/abs/2410.10674">Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach</a>  </li><li>Glen Berseth from MILA: <a href="https://arxiv.org/abs/2409.04792">Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn</a>  </li><li>Alexander Rutherford from University of Oxford: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/file/5aee125f052c90e326dcf6f380df94f6-Paper-Datasets_and_Benchmarks_Track.pdf">JaxMARL: Multi-Agent RL Environments and Algorithms in JAX</a>  </li></ul>]]>
      </content:encoded>
      <pubDate>Tue, 04 Mar 2025 16:03:16 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/ce850fa1/38ea6fa4.mp3" length="8573844" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/3L53zQuWmiUPN6cnRlmAUykhcf4njhD4JkgkF9RkY-8/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8wMmE1/ZDEyZDY2NmU3Njc0/NjA2NTMzMDI5NTky/MjY1YS53ZWJw.jpg"/>
      <itunes:duration>528</itunes:duration>
      <itunes:summary>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at NeurIPS 2024 in Vancouver, BC, Canada.   </p><p><strong>Featuring</strong>  </p><ul><li>Jonathan Cook from University of Oxford: <a href="https://arxiv.org/abs/2406.00392">Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning</a>  </li><li>Yifei Zhou from Berkeley AI Research: <a href="https://arxiv.org/abs/2406.11896">DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning</a>  </li><li>Rory Young from University of Glasgow: <a href="https://arxiv.org/abs/2410.10674">Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach</a>  </li><li>Glen Berseth from MILA: <a href="https://arxiv.org/abs/2409.04792">Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn</a>  </li><li>Alexander Rutherford from University of Oxford: <a href="https://proceedings.neurips.cc/paper_files/paper/2024/file/5aee125f052c90e326dcf6f380df94f6-Paper-Datasets_and_Benchmarks_Track.pdf">JaxMARL: Multi-Agent RL Environments and Algorithms in JAX</a>  </li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/ce850fa1/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/ce850fa1/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/ce850fa1/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/ce850fa1/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/ce850fa1/transcription" type="text/html"/>
    </item>
    <item>
      <title>NeurIPS 2024 - Posters and Hallways 1</title>
      <itunes:episode>63</itunes:episode>
      <podcast:episode>63</podcast:episode>
      <itunes:title>NeurIPS 2024 - Posters and Hallways 1</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">5ac13395-4832-4111-b09e-00f589c21ed9</guid>
      <link>https://share.transistor.fm/s/2ee1f287</link>
      <description>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at NeurIPS 2024 in Vancouver, BC, Canada.   </p><p><strong>Featuring</strong>  </p><ul><li>Jiaheng Hu of University of Texas: <a href="https://openreview.net/forum?id=ePOBcWfNFC&amp;referrer=%5Bthe%20profile%20of%20Roberto%20Mart%C3%ADn-Mart%C3%ADn%5D(%2Fprofile%3Fid%3D~Roberto_Mart%C3%ADn-Mart%C3%ADn1)">Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning</a>  </li><li>Skander Moalla of EPFL: <a href="https://arxiv.org/abs/2405.00662">No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO</a>  </li><li>Adil Zouitine of IRT Saint Exupery/Hugging Face: <a href="https://arxiv.org/abs/2406.08395">Time-Constrained Robust MDPs</a>  </li><li>Soumyendu Sarkar of HP Labs: <a href="https://arxiv.org/abs/2408.07841">SustainDC: Benchmarking for Sustainable Data Center Control</a>  </li><li>Matteo Bettini of Cambridge University: <a href="https://arxiv.org/abs/2312.01472">BenchMARL: Benchmarking Multi-Agent Reinforcement Learning</a>  </li><li>Michael Bowling of U Alberta: <a href="https://openreview.net/forum?id=k6ZHvF1vkg">Beyond Optimism: Exploration With Partially Observable Rewards</a>  </li></ul>]]>
      </description>
      <content:encoded>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at NeurIPS 2024 in Vancouver, BC, Canada.   </p><p><strong>Featuring</strong>  </p><ul><li>Jiaheng Hu of University of Texas: <a href="https://openreview.net/forum?id=ePOBcWfNFC&amp;referrer=%5Bthe%20profile%20of%20Roberto%20Mart%C3%ADn-Mart%C3%ADn%5D(%2Fprofile%3Fid%3D~Roberto_Mart%C3%ADn-Mart%C3%ADn1)">Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning</a>  </li><li>Skander Moalla of EPFL: <a href="https://arxiv.org/abs/2405.00662">No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO</a>  </li><li>Adil Zouitine of IRT Saint Exupery/Hugging Face: <a href="https://arxiv.org/abs/2406.08395">Time-Constrained Robust MDPs</a>  </li><li>Soumyendu Sarkar of HP Labs: <a href="https://arxiv.org/abs/2408.07841">SustainDC: Benchmarking for Sustainable Data Center Control</a>  </li><li>Matteo Bettini of Cambridge University: <a href="https://arxiv.org/abs/2312.01472">BenchMARL: Benchmarking Multi-Agent Reinforcement Learning</a>  </li><li>Michael Bowling of U Alberta: <a href="https://openreview.net/forum?id=k6ZHvF1vkg">Beyond Optimism: Exploration With Partially Observable Rewards</a>  </li></ul>]]>
      </content:encoded>
      <pubDate>Sun, 02 Mar 2025 20:53:38 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/2ee1f287/388cfa92.mp3" length="9282702" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/7Nk3ubNy5AB4HQ9FFfHS2pYrU8QSZcBH_pnOrxo75Ig/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lZWM1/M2RjNjY4NmIwNzA0/OGQ0MDI2NDUwYjc5/NzYwZC53ZWJw.jpg"/>
      <itunes:duration>572</itunes:duration>
      <itunes:summary>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at NeurIPS 2024 in Vancouver, BC, Canada.   </p><p><strong>Featuring</strong>  </p><ul><li>Jiaheng Hu of University of Texas: <a href="https://openreview.net/forum?id=ePOBcWfNFC&amp;referrer=%5Bthe%20profile%20of%20Roberto%20Mart%C3%ADn-Mart%C3%ADn%5D(%2Fprofile%3Fid%3D~Roberto_Mart%C3%ADn-Mart%C3%ADn1)">Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning</a>  </li><li>Skander Moalla of EPFL: <a href="https://arxiv.org/abs/2405.00662">No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO</a>  </li><li>Adil Zouitine of IRT Saint Exupery/Hugging Face: <a href="https://arxiv.org/abs/2406.08395">Time-Constrained Robust MDPs</a>  </li><li>Soumyendu Sarkar of HP Labs: <a href="https://arxiv.org/abs/2408.07841">SustainDC: Benchmarking for Sustainable Data Center Control</a>  </li><li>Matteo Bettini of Cambridge University: <a href="https://arxiv.org/abs/2312.01472">BenchMARL: Benchmarking Multi-Agent Reinforcement Learning</a>  </li><li>Michael Bowling of U Alberta: <a href="https://openreview.net/forum?id=k6ZHvF1vkg">Beyond Optimism: Exploration With Partially Observable Rewards</a>  </li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/2ee1f287/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/2ee1f287/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/2ee1f287/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/2ee1f287/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/2ee1f287/transcription" type="text/html"/>
    </item>
    <item>
      <title>Abhishek Naik on Continuing RL &amp; Average Reward</title>
      <itunes:episode>62</itunes:episode>
      <podcast:episode>62</podcast:episode>
      <itunes:title>Abhishek Naik on Continuing RL &amp; Average Reward</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">15cf60ef-085b-4614-aac7-4e9c70572908</guid>
      <link>https://share.transistor.fm/s/778f5a6c</link>
      <description>
        <![CDATA[<p><a href="https://abhisheknaik96.github.io/">Abhishek Naik</a> was a student at University of Alberta and Alberta Machine Intelligence Institute, and he just finished his PhD in reinforcement learning, working with Rich Sutton.  Now he is a postdoc fellow at the National Research Council of Canada, where he does AI research on Space applications.  </p><p><strong>Featured References  </strong></p><p><a href="https://era.library.ualberta.ca/items/42307739-a774-4d6b-b1a3-de9fbc949575">Reinforcement Learning for Continuing Problems Using Average Reward</a> <br>Abhishek Naik Ph.D. dissertation 2024  </p><p><a href="https://arxiv.org/abs/2405.09999">Reward Centering</a> <br>Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton 2024   </p><p><a href="https://arxiv.org/abs/2006.16318">Learning and Planning in Average-Reward Markov Decision Processes</a> <br>Yi Wan, Abhishek Naik, Richard S. Sutton 2020  </p><p><a href="https://arxiv.org/abs/1910.02140">Discounted Reinforcement Learning Is Not an Optimization Problem</a> <br>Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton 2019  </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://www.nature.com/articles/s41593-024-01705-4">Explaining dopamine through prediction errors and beyond</a>, Gershman et al 2024 (proposes Differential-TD-like learning mechanism in the brain around Box 4)  </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://abhisheknaik96.github.io/">Abhishek Naik</a> was a student at University of Alberta and Alberta Machine Intelligence Institute, and he just finished his PhD in reinforcement learning, working with Rich Sutton.  Now he is a postdoc fellow at the National Research Council of Canada, where he does AI research on Space applications.  </p><p><strong>Featured References  </strong></p><p><a href="https://era.library.ualberta.ca/items/42307739-a774-4d6b-b1a3-de9fbc949575">Reinforcement Learning for Continuing Problems Using Average Reward</a> <br>Abhishek Naik Ph.D. dissertation 2024  </p><p><a href="https://arxiv.org/abs/2405.09999">Reward Centering</a> <br>Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton 2024   </p><p><a href="https://arxiv.org/abs/2006.16318">Learning and Planning in Average-Reward Markov Decision Processes</a> <br>Yi Wan, Abhishek Naik, Richard S. Sutton 2020  </p><p><a href="https://arxiv.org/abs/1910.02140">Discounted Reinforcement Learning Is Not an Optimization Problem</a> <br>Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton 2019  </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://www.nature.com/articles/s41593-024-01705-4">Explaining dopamine through prediction errors and beyond</a>, Gershman et al 2024 (proposes Differential-TD-like learning mechanism in the brain around Box 4)  </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Sun, 09 Feb 2025 20:49:32 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/778f5a6c/24e90c0b.mp3" length="78435185" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/CNFLn5sRRPIviBp3ciVwRQeml0_yQ4BiQmukfVtzcFI/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8wNjE1/ZGUwNjgwNWI0ZWE5/OGQyNDQ5MjE4NTU1/MDEzZS5qcGVn.jpg"/>
      <itunes:duration>4900</itunes:duration>
      <itunes:summary>
        <![CDATA[<p><a href="https://abhisheknaik96.github.io/">Abhishek Naik</a> was a student at University of Alberta and Alberta Machine Intelligence Institute, and he just finished his PhD in reinforcement learning, working with Rich Sutton.  Now he is a postdoc fellow at the National Research Council of Canada, where he does AI research on Space applications.  </p><p><strong>Featured References  </strong></p><p><a href="https://era.library.ualberta.ca/items/42307739-a774-4d6b-b1a3-de9fbc949575">Reinforcement Learning for Continuing Problems Using Average Reward</a> <br>Abhishek Naik Ph.D. dissertation 2024  </p><p><a href="https://arxiv.org/abs/2405.09999">Reward Centering</a> <br>Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton 2024   </p><p><a href="https://arxiv.org/abs/2006.16318">Learning and Planning in Average-Reward Markov Decision Processes</a> <br>Yi Wan, Abhishek Naik, Richard S. Sutton 2020  </p><p><a href="https://arxiv.org/abs/1910.02140">Discounted Reinforcement Learning Is Not an Optimization Problem</a> <br>Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton 2019  </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://www.nature.com/articles/s41593-024-01705-4">Explaining dopamine through prediction errors and beyond</a>, Gershman et al 2024 (proposes Differential-TD-like learning mechanism in the brain around Box 4)  </li></ul><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/778f5a6c/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/778f5a6c/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/778f5a6c/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/778f5a6c/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/778f5a6c/transcription" type="text/html"/>
    </item>
    <item>
<title>NeurIPS 2024 RL meetup Hot takes: What sucks about RL?</title>
      <itunes:episode>61</itunes:episode>
      <podcast:episode>61</podcast:episode>
<itunes:title>NeurIPS 2024 RL meetup Hot takes: What sucks about RL?</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">5a92049a-a1bb-4b94-b164-f73625edfbc2</guid>
      <link>https://share.transistor.fm/s/80102b36</link>
      <description>
<![CDATA[<p>What do RL researchers complain about after hours at the bar?  In this "Hot takes" episode, we find out!  </p><p>Recorded at The Pearl in downtown Vancouver, during the RL meetup after a day of NeurIPS 2024.  </p><p>Special thanks to "David Beckham" for the inspiration :)  </p>]]>
      </description>
      <content:encoded>
<![CDATA[<p>What do RL researchers complain about after hours at the bar?  In this "Hot takes" episode, we find out!  </p><p>Recorded at The Pearl in downtown Vancouver, during the RL meetup after a day of NeurIPS 2024.  </p><p>Special thanks to "David Beckham" for the inspiration :)  </p>]]>
      </content:encoded>
      <pubDate>Mon, 23 Dec 2024 00:12:15 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/80102b36/400aa2f9.mp3" length="17109628" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/JwzcFHXUjUkPZV9x-Qn1b42EpDIzqbnGKtm7DUapCgo/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8wNjI1/OWUzZmU5MWE5YzJl/M2Q3Njc2YzRmMDdk/OTUyYS5qcGc.jpg"/>
      <itunes:duration>1065</itunes:duration>
      <itunes:summary>
<![CDATA[<p>What do RL researchers complain about after hours at the bar?  In this "Hot takes" episode, we find out!  </p><p>Recorded at The Pearl in downtown Vancouver, during the RL meetup after a day of NeurIPS 2024.  </p><p>Special thanks to "David Beckham" for the inspiration :)  </p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/80102b36/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/80102b36/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/80102b36/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/80102b36/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/80102b36/transcription" type="text/html"/>
    </item>
    <item>
      <title>RLC 2024 - Posters and Hallways 5</title>
      <itunes:episode>60</itunes:episode>
      <podcast:episode>60</podcast:episode>
      <itunes:title>RLC 2024 - Posters and Hallways 5</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">9d9cc967-6e0e-41dc-bc50-7c4e0bb749e6</guid>
      <link>https://share.transistor.fm/s/c50a9c08</link>
      <description>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at <a href="https://rl-conference.cc/">RLC 2024</a> in Amherst, MA.   </p><p>Featuring:  </p><ul><li>0:01 <a href="https://scholar.google.com/citations?user=LFNtNMgAAAAJ&amp;hl=en">David Radke</a> of the Chicago Blackhawks (NHL) on RL for professional sports  </li><li>0:56 <a href="https://scholar.google.com/citations?hl=en&amp;user=9gV4DRsAAAAJ">Abhishek Naik</a> from the National Research Council on Continuing RL and Average Reward  </li><li>2:42 <a href="https://scholar.google.co.id/citations?user=ScR5fBYAAAAJ">Daphne Cornelisse</a> from NYU on Autonomous Driving and Multi-Agent RL  </li><li>8:58 <a href="https://scholar.google.co.id/citations?hl=tr&amp;user=e-GEqxoAAAAJ">Shray Bansal</a> from Georgia Tech on Cognitive Bias for Human-AI Ad Hoc Teamwork  </li><li>10:21 <a href="https://scholar.google.co.id/citations?user=e-GEqxoAAAAJ">Claas Voelcker</a> from University of Toronto on "Can we hop in general?"  </li><li>11:23 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=iqkTonoAAAAJ">Brent Venable</a> from the Institute for Human &amp; Machine Cognition on Cooperative information dissemination  <p><br></p></li></ul>]]>
      </description>
      <content:encoded>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at <a href="https://rl-conference.cc/">RLC 2024</a> in Amherst, MA.   </p><p>Featuring:  </p><ul><li>0:01 <a href="https://scholar.google.com/citations?user=LFNtNMgAAAAJ&amp;hl=en">David Radke</a> of the Chicago Blackhawks (NHL) on RL for professional sports  </li><li>0:56 <a href="https://scholar.google.com/citations?hl=en&amp;user=9gV4DRsAAAAJ">Abhishek Naik</a> from the National Research Council on Continuing RL and Average Reward  </li><li>2:42 <a href="https://scholar.google.co.id/citations?user=ScR5fBYAAAAJ">Daphne Cornelisse</a> from NYU on Autonomous Driving and Multi-Agent RL  </li><li>8:58 <a href="https://scholar.google.co.id/citations?hl=tr&amp;user=e-GEqxoAAAAJ">Shray Bansal</a> from Georgia Tech on Cognitive Bias for Human-AI Ad Hoc Teamwork  </li><li>10:21 <a href="https://scholar.google.co.id/citations?user=e-GEqxoAAAAJ">Claas Voelcker</a> from University of Toronto on "Can we hop in general?"  </li><li>11:23 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=iqkTonoAAAAJ">Brent Venable</a> from the Institute for Human &amp; Machine Cognition on Cooperative information dissemination  <p><br></p></li></ul>]]>
      </content:encoded>
      <pubDate>Fri, 20 Sep 2024 07:40:04 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/c50a9c08/5b83ed71.mp3" length="19225338" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/NY8XeR88y5FWmQQMGgOX1Bx-QnTTQz1cCepnJX81hC4/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS83OGUz/OTU4NmQ4MGMzYzZk/MmMyZDA4NjNkNGUy/ZjdiNy5wbmc.jpg"/>
      <itunes:duration>797</itunes:duration>
      <itunes:summary>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at <a href="https://rl-conference.cc/">RLC 2024</a> in Amherst, MA.   </p><p>Featuring:  </p><ul><li>0:01 <a href="https://scholar.google.com/citations?user=LFNtNMgAAAAJ&amp;hl=en">David Radke</a> of the Chicago Blackhawks (NHL) on RL for professional sports  </li><li>0:56 <a href="https://scholar.google.com/citations?hl=en&amp;user=9gV4DRsAAAAJ">Abhishek Naik</a> from the National Research Council on Continuing RL and Average Reward  </li><li>2:42 <a href="https://scholar.google.co.id/citations?user=ScR5fBYAAAAJ">Daphne Cornelisse</a> from NYU on Autonomous Driving and Multi-Agent RL  </li><li>8:58 <a href="https://scholar.google.co.id/citations?hl=tr&amp;user=e-GEqxoAAAAJ">Shray Bansal</a> from Georgia Tech on Cognitive Bias for Human-AI Ad Hoc Teamwork  </li><li>10:21 <a href="https://scholar.google.co.id/citations?user=e-GEqxoAAAAJ">Claas Voelcker</a> from University of Toronto on "Can we hop in general?"  </li><li>11:23 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=iqkTonoAAAAJ">Brent Venable</a> from the Institute for Human &amp; Machine Cognition on Cooperative information dissemination  <p><br></p></li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/c50a9c08/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/c50a9c08/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/c50a9c08/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/c50a9c08/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/c50a9c08/transcription" type="text/html"/>
    </item>
    <item>
      <title>RLC 2024 - Posters and Hallways 4</title>
      <itunes:episode>59</itunes:episode>
      <podcast:episode>59</podcast:episode>
      <itunes:title>RLC 2024 - Posters and Hallways 4</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">00f93e18-1666-401c-a0f0-c4f000fb228d</guid>
      <link>https://share.transistor.fm/s/2d4d125a</link>
      <description>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at <a href="https://rl-conference.cc/">RLC 2024</a> in Amherst, MA.   </p><p>Featuring:  </p><ul><li>0:01 <a href="https://scholar.google.ca/citations?user=lvBJlmwAAAAJ&amp;hl=en&amp;oi=ao">David Abel</a> from DeepMind on 3 Dogmas of RL  </li><li>0:55 <a href="https://scholar.google.com/citations?user=Q06Rh6oAAAAJ&amp;hl=en">Kevin Wang</a> from Brown on learning variable depth search for MCTS  </li><li>2:17 <a href="https://scholar.google.com/citations?user=HZ2INsEAAAAJ&amp;hl=en">Ashwin Kumar</a> from Washington University in St. Louis on fairness in resource allocation  </li><li>3:36 <a href="https://scholar.google.com/citations?user=Gjjj8IQAAAAJ&amp;hl=en">Prabhat Nagarajan</a> from U of Alberta on value overestimation  </li></ul>]]>
      </description>
      <content:encoded>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at <a href="https://rl-conference.cc/">RLC 2024</a> in Amherst, MA.   </p><p>Featuring:  </p><ul><li>0:01 <a href="https://scholar.google.ca/citations?user=lvBJlmwAAAAJ&amp;hl=en&amp;oi=ao">David Abel</a> from DeepMind on 3 Dogmas of RL  </li><li>0:55 <a href="https://scholar.google.com/citations?user=Q06Rh6oAAAAJ&amp;hl=en">Kevin Wang</a> from Brown on learning variable depth search for MCTS  </li><li>2:17 <a href="https://scholar.google.com/citations?user=HZ2INsEAAAAJ&amp;hl=en">Ashwin Kumar</a> from Washington University in St. Louis on fairness in resource allocation  </li><li>3:36 <a href="https://scholar.google.com/citations?user=Gjjj8IQAAAAJ&amp;hl=en">Prabhat Nagarajan</a> from U of Alberta on value overestimation  </li></ul>]]>
      </content:encoded>
      <pubDate>Wed, 18 Sep 2024 17:54:24 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/2d4d125a/cb809679.mp3" length="7103563" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/prcq9bik2ZDajWI86URVBHdrEXgjF_qznsXpGveWgMY/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9hZDBm/NmU1NGQ0YWM4NTM1/MmRkNTRkMzQxZDE5/M2JhYi5wbmc.jpg"/>
      <itunes:duration>292</itunes:duration>
      <itunes:summary>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at <a href="https://rl-conference.cc/">RLC 2024</a> in Amherst, MA.   </p><p>Featuring:  </p><ul><li>0:01 <a href="https://scholar.google.ca/citations?user=lvBJlmwAAAAJ&amp;hl=en&amp;oi=ao">David Abel</a> from DeepMind on 3 Dogmas of RL  </li><li>0:55 <a href="https://scholar.google.com/citations?user=Q06Rh6oAAAAJ&amp;hl=en">Kevin Wang</a> from Brown on learning variable depth search for MCTS  </li><li>2:17 <a href="https://scholar.google.com/citations?user=HZ2INsEAAAAJ&amp;hl=en">Ashwin Kumar</a> from Washington University in St. Louis on fairness in resource allocation  </li><li>3:36 <a href="https://scholar.google.com/citations?user=Gjjj8IQAAAAJ&amp;hl=en">Prabhat Nagarajan</a> from U of Alberta on value overestimation  </li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/2d4d125a/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/2d4d125a/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/2d4d125a/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/2d4d125a/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/2d4d125a/transcription" type="text/html"/>
    </item>
    <item>
      <title>RLC 2024 - Posters and Hallways 3</title>
      <itunes:episode>58</itunes:episode>
      <podcast:episode>58</podcast:episode>
      <itunes:title>RLC 2024 - Posters and Hallways 3</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">c3d58621-ef7d-4bed-a793-89f87fe4c021</guid>
      <link>https://share.transistor.fm/s/f4e655c3</link>
      <description>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at RLC 2024 in Amherst, MA.  </p><p>Featuring:  </p><ul><li>0:01 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=NCPKYKUAAAAJ">Kris De Asis</a> from Openmind on Time Discretization  </li><li>2:23 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=55WuTVQAAAAJ">Anna Hakhverdyan</a> from U of Alberta on Online Hyperparameters  </li><li>3:59 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=gzHbYVQAAAAJ">Dilip Arumugam</a> from Princeton on Information Theory and Exploration  </li><li>5:04 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=MeNbzgIAAAAJ">Micah Carroll</a> from UC Berkeley on Changing preferences and AI alignment  </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at RLC 2024 in Amherst, MA.  </p><p>Featuring:  </p><ul><li>0:01 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=NCPKYKUAAAAJ">Kris De Asis</a> from Openmind on Time Discretization  </li><li>2:23 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=55WuTVQAAAAJ">Anna Hakhverdyan</a> from U of Alberta on Online Hyperparameters  </li><li>3:59 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=gzHbYVQAAAAJ">Dilip Arumugam</a> from Princeton on Information Theory and Exploration  </li><li>5:04 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=MeNbzgIAAAAJ">Micah Carroll</a> from UC Berkeley on Changing preferences and AI alignment  </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Wed, 18 Sep 2024 07:16:11 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/f4e655c3/094ab9b0.mp3" length="9733292" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/JsNmzkrCcSe8GSeQ5Fgw_JGRVcF0-Rnt2rex2P3VIv4/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9mNWE0/MDdhN2NiNjg3OGNm/OWQxZGNhM2I1M2Fh/ZGViNC5wbmc.jpg"/>
      <itunes:duration>403</itunes:duration>
      <itunes:summary>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at RLC 2024 in Amherst, MA.  </p><p>Featuring:  </p><ul><li>0:01 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=NCPKYKUAAAAJ">Kris De Asis</a> from Openmind on Time Discretization  </li><li>2:23 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=55WuTVQAAAAJ">Anna Hakhverdyan</a> from U of Alberta on Online Hyperparameters  </li><li>3:59 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=gzHbYVQAAAAJ">Dilip Arumugam</a> from Princeton on Information Theory and Exploration  </li><li>5:04 <a href="https://scholar.google.co.id/citations?hl=en&amp;user=MeNbzgIAAAAJ">Micah Carroll</a> from UC Berkeley on Changing preferences and AI alignment  </li></ul><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/f4e655c3/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/f4e655c3/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/f4e655c3/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/f4e655c3/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/f4e655c3/transcription" type="text/html"/>
    </item>
    <item>
      <title>RLC 2024 - Posters and Hallways 2</title>
      <itunes:episode>57</itunes:episode>
      <podcast:episode>57</podcast:episode>
      <itunes:title>RLC 2024 - Posters and Hallways 2</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">f08e8c7c-1174-43f8-9b3d-340fe243d19d</guid>
      <link>https://share.transistor.fm/s/d257bea6</link>
      <description>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at RLC 2024 in Amherst, MA.  </p><p>Featuring:  </p><ul><li>0:01 Hector Kohler from Centre Inria de l'Université de Lille with "<a href="https://openreview.net/forum?id=zafp5CwoTq">Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning</a>"  </li><li>2:29 Quentin Delfosse from TU Darmstadt on "<a href="https://openreview.net/forum?id=t4BjjTfxFa">Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents</a>"  </li><li>4:15 Sonja Johnson-Yu from Harvard on "<a href="https://openreview.net/forum?id=FX7YtfEYj8">Understanding biological active sensing behaviors by interpreting learned artificial agent policies</a>"  </li><li>6:42 Jannis Blüml from TU Darmstadt on "<a href="https://openreview.net/forum?id=HVd5e0OS9R">OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments</a>"  </li><li>8:20 Cameron Allen from UC Berkeley on "<a href="https://openreview.net/forum?id=lkIRFglmTp">Resolving Partial Observability in Decision Processes via the Lambda Discrepancy</a>"  </li><li>9:48 James Staley from Tufts on "<a href="https://rlj.cs.umass.edu/2024/papers/Paper236.html">Agent-Centric Human Demonstrations Train World Models</a>"  </li><li>14:54 Jonathan Li from Rensselaer Polytechnic Institute  </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at RLC 2024 in Amherst, MA.  </p><p>Featuring:  </p><ul><li>0:01 Hector Kohler from Centre Inria de l'Université de Lille with "<a href="https://openreview.net/forum?id=zafp5CwoTq">Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning</a>"  </li><li>2:29 Quentin Delfosse from TU Darmstadt on "<a href="https://openreview.net/forum?id=t4BjjTfxFa">Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents</a>"  </li><li>4:15 Sonja Johnson-Yu from Harvard on "<a href="https://openreview.net/forum?id=FX7YtfEYj8">Understanding biological active sensing behaviors by interpreting learned artificial agent policies</a>"  </li><li>6:42 Jannis Blüml from TU Darmstadt on "<a href="https://openreview.net/forum?id=HVd5e0OS9R">OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments</a>"  </li><li>8:20 Cameron Allen from UC Berkeley on "<a href="https://openreview.net/forum?id=lkIRFglmTp">Resolving Partial Observability in Decision Processes via the Lambda Discrepancy</a>"  </li><li>9:48 James Staley from Tufts on "<a href="https://rlj.cs.umass.edu/2024/papers/Paper236.html">Agent-Centric Human Demonstrations Train World Models</a>"  </li><li>14:54 Jonathan Li from Rensselaer Polytechnic Institute  </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Sun, 15 Sep 2024 19:34:51 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/d257bea6/ca653dab.mp3" length="22960843" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/yzYFaOeuqRWEWwovivYrrDRnhp1fzbfTz0YHP7auk-0/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS83Y2Qz/YmE2ZmQ3MGJlOGQ4/MDQ2NzI3NDU2NDQz/ODA3MS5wbmc.jpg"/>
      <itunes:duration>952</itunes:duration>
      <itunes:summary>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at RLC 2024 in Amherst, MA.  </p><p>Featuring:  </p><ul><li>0:01 Hector Kohler from Centre Inria de l'Université de Lille with "<a href="https://openreview.net/forum?id=zafp5CwoTq">Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning</a>"  </li><li>2:29 Quentin Delfosse from TU Darmstadt on "<a href="https://openreview.net/forum?id=t4BjjTfxFa">Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents</a>"  </li><li>4:15 Sonja Johnson-Yu from Harvard on "<a href="https://openreview.net/forum?id=FX7YtfEYj8">Understanding biological active sensing behaviors by interpreting learned artificial agent policies</a>"  </li><li>6:42 Jannis Blüml from TU Darmstadt on "<a href="https://openreview.net/forum?id=HVd5e0OS9R">OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments</a>"  </li><li>8:20 Cameron Allen from UC Berkeley on "<a href="https://openreview.net/forum?id=lkIRFglmTp">Resolving Partial Observability in Decision Processes via the Lambda Discrepancy</a>"  </li><li>9:48 James Staley from Tufts on "<a href="https://rlj.cs.umass.edu/2024/papers/Paper236.html">Agent-Centric Human Demonstrations Train World Models</a>"  </li><li>14:54 Jonathan Li from Rensselaer Polytechnic Institute  </li></ul><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/d257bea6/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/d257bea6/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/d257bea6/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/d257bea6/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/d257bea6/transcription" type="text/html"/>
    </item>
    <item>
      <title>RLC 2024 - Posters and Hallways 1</title>
      <itunes:episode>56</itunes:episode>
      <podcast:episode>56</podcast:episode>
      <itunes:title>RLC 2024 - Posters and Hallways 1</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">3fd09f34-54c2-49b5-9431-fba1170d7ca4</guid>
      <link>https://share.transistor.fm/s/2601fd96</link>
      <description>
<![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at RLC 2024 in Amherst, MA.  </p><p>Featuring:  </p><ul><li>0:01 <a href="https://scholar.google.co.id/citations?user=zPJUEzsAAAAJ&amp;hl=tr">Ann Huang</a> from Harvard on <a href="https://openreview.net/forum?id=SbbpTtB6B4">Learning Dynamics and the Geometry of Neural Dynamics in Recurrent Neural Controllers</a>  </li><li>1:37 <a href="https://scholar.google.co.id/citations?hl=tr&amp;user=RN7YYrEAAAAJ">Jannis Blüml</a> from TU Darmstadt on <a href="https://openreview.net/forum?id=Th5OOmiHVo">HackAtari: Atari Learning Environments for Robust and Continual Reinforcement Learning</a>  </li><li>3:13 <a href="https://scholar.google.co.id/citations?hl=tr&amp;user=BIp5PGEAAAAJ">Benjamin Fuhrer</a> from NVIDIA on <a href="https://openreview.net/forum?id=MIJrYoharR">Gradient Boosting Reinforcement Learning</a>  </li><li>3:54 <a href="https://scholar.google.co.id/citations?hl=tr&amp;user=3hpaqHAAAAAJ">Paul Festor</a> from Imperial College London on <a href="https://openreview.net/forum?id=QnzNm9feL4&amp;referrer=%5Bthe%20profile%20of%20Paul%20Festor%5D(%2Fprofile%3Fid%3D~Paul_Festor1)">Evaluating the impact of explainable RL on physician decision-making in high-fidelity simulations: insights from eye-tracking metrics</a>  </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at RLC 2024 in Amherst MA.  </p><p>Featuring:  </p><ul><li>0:01 <a href="https://scholar.google.co.id/citations?user=zPJUEzsAAAAJ&amp;hl=tr">Ann Huang</a> from Harvard on <a href="https://openreview.net/forum?id=SbbpTtB6B4">Learning Dynamics and the Geometry of Neural Dynamics in Recurrent Neural Controllers</a>  </li><li>1:37 <a href="https://scholar.google.co.id/citations?hl=tr&amp;user=RN7YYrEAAAAJ">Jannis Blüml</a> from TU Darmstadt on <a href="https://openreview.net/forum?id=Th5OOmiHVo">HackAtari: Atari Learning Environments for Robust and Continual Reinforcement Learning</a>  </li><li>3:13 <a href="https://scholar.google.co.id/citations?hl=tr&amp;user=BIp5PGEAAAAJ">Benjamin Fuhrer</a> from NVIDIA on <a href="https://openreview.net/forum?id=MIJrYoharR">Gradient Boosting Reinforcement Learning</a>  </li><li>3:54 <a href="https://scholar.google.co.id/citations?hl=tr&amp;user=3hpaqHAAAAAJ">Paul Festor</a> from Imperial College London on <a href="https://openreview.net/forum?id=QnzNm9feL4&amp;referrer=%5Bthe%20profile%20of%20Paul%20Festor%5D(%2Fprofile%3Fid%3D~Paul_Festor1)">Evaluating the impact of explainable RL on physician decision-making in high-fidelity simulations: insights from eye-tracking metrics</a>  </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Tue, 10 Sep 2024 15:35:56 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/2601fd96/b4194b35.mp3" length="8355450" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/VT_EoafE8EmOHUK_qkKKK6wy4axxHXYWzH4PJ2uhgKk/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8yYmQ5/MDY0NjZiYTRiZDFi/YmIyM2VjY2ViMDVl/NjAwZS5wbmc.jpg"/>
      <itunes:duration>346</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Posters and Hallway episodes are short interviews and poster summaries.  Recorded at RLC 2024 in Amherst MA.  </p><p>Featuring:  </p><ul><li>0:01 <a href="https://scholar.google.co.id/citations?user=zPJUEzsAAAAJ&amp;hl=tr">Ann Huang</a> from Harvard on <a href="https://openreview.net/forum?id=SbbpTtB6B4">Learning Dynamics and the Geometry of Neural Dynamics in Recurrent Neural Controllers</a>  </li><li>1:37 <a href="https://scholar.google.co.id/citations?hl=tr&amp;user=RN7YYrEAAAAJ">Jannis Blüml</a> from TU Darmstadt on <a href="https://openreview.net/forum?id=Th5OOmiHVo">HackAtari: Atari Learning Environments for Robust and Continual Reinforcement Learning</a>  </li><li>3:13 <a href="https://scholar.google.co.id/citations?hl=tr&amp;user=BIp5PGEAAAAJ">Benjamin Fuhrer</a> from NVIDIA on <a href="https://openreview.net/forum?id=MIJrYoharR">Gradient Boosting Reinforcement Learning</a>  </li><li>3:54 <a href="https://scholar.google.co.id/citations?hl=tr&amp;user=3hpaqHAAAAAJ">Paul Festor</a> from Imperial College London on <a href="https://openreview.net/forum?id=QnzNm9feL4&amp;referrer=%5Bthe%20profile%20of%20Paul%20Festor%5D(%2Fprofile%3Fid%3D~Paul_Festor1)">Evaluating the impact of explainable RL on physician decision-making in high-fidelity simulations: insights from eye-tracking metrics</a>  </li></ul><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/2601fd96/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/2601fd96/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/2601fd96/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/2601fd96/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/2601fd96/transcription" type="text/html"/>
    </item>
    <item>
      <title>Finale Doshi-Velez on RL for Healthcare @ RLC 2024</title>
      <itunes:episode>55</itunes:episode>
      <podcast:episode>55</podcast:episode>
      <itunes:title>Finale Doshi-Velez on RL for Healthcare @ RLC 2024</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">3d4ce51a-d427-44d5-a007-25e960c0f0a3</guid>
      <link>https://share.transistor.fm/s/440e0810</link>
      <description>
        <![CDATA[<p><a href="https://finale.seas.harvard.edu/">Finale Doshi-Velez</a> is a Professor at the Harvard Paulson School of Engineering and Applied Sciences.  </p><p>This off-the-cuff interview was recorded at UMass Amherst during the workshop day of RL Conference on August 9th 2024.   </p><p><em>Host notes: I've been a fan of some of Prof Doshi-Velez' past work on clinical RL and hoped to feature her for some time now, so I jumped at the chance to get a few minutes of her thoughts -- even though you can tell I was not prepared and a bit flustered tbh.  Thanks to Prof Doshi-Velez for taking a moment for this, and I hope to cross paths in future for a more in depth interview. </em></p><p><strong>References  </strong></p><ul><li><a href="https://finale.seas.harvard.edu/">Finale Doshi-Velez</a> Homepage @ Harvard  </li><li><a href="https://scholar.google.ca/citations?user=hwQtFB0AAAAJ&amp;hl=en">Finale Doshi-Velez</a> on Google Scholar  </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://finale.seas.harvard.edu/">Finale Doshi-Velez</a> is a Professor at the Harvard Paulson School of Engineering and Applied Sciences.  </p><p>This off-the-cuff interview was recorded at UMass Amherst during the workshop day of RL Conference on August 9th 2024.   </p><p><em>Host notes: I've been a fan of some of Prof Doshi-Velez' past work on clinical RL and hoped to feature her for some time now, so I jumped at the chance to get a few minutes of her thoughts -- even though you can tell I was not prepared and a bit flustered tbh.  Thanks to Prof Doshi-Velez for taking a moment for this, and I hope to cross paths in future for a more in depth interview. </em></p><p><strong>References  </strong></p><ul><li><a href="https://finale.seas.harvard.edu/">Finale Doshi-Velez</a> Homepage @ Harvard  </li><li><a href="https://scholar.google.ca/citations?user=hwQtFB0AAAAJ&amp;hl=en">Finale Doshi-Velez</a> on Google Scholar  </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 02 Sep 2024 01:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/440e0810/839d5a22.mp3" length="10948299" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/DmHhXI5jC02P34s7U6wymdgCEFijC3KBZdrxEBsXx1k/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8zYmQw/YzU4MDMxOWE1MjY3/MGYyYjMyZTE4Njcx/NzM3Zi5qcGc.jpg"/>
      <itunes:duration>455</itunes:duration>
      <itunes:summary>
        <![CDATA[<p><a href="https://finale.seas.harvard.edu/">Finale Doshi-Velez</a> is a Professor at the Harvard Paulson School of Engineering and Applied Sciences.  </p><p>This off-the-cuff interview was recorded at UMass Amherst during the workshop day of RL Conference on August 9th 2024.   </p><p><em>Host notes: I've been a fan of some of Prof Doshi-Velez' past work on clinical RL and hoped to feature her for some time now, so I jumped at the chance to get a few minutes of her thoughts -- even though you can tell I was not prepared and a bit flustered tbh.  Thanks to Prof Doshi-Velez for taking a moment for this, and I hope to cross paths in future for a more in depth interview. </em></p><p><strong>References  </strong></p><ul><li><a href="https://finale.seas.harvard.edu/">Finale Doshi-Velez</a> Homepage @ Harvard  </li><li><a href="https://scholar.google.ca/citations?user=hwQtFB0AAAAJ&amp;hl=en">Finale Doshi-Velez</a> on Google Scholar  </li></ul><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/440e0810/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/440e0810/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/440e0810/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/440e0810/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/440e0810/transcription" type="text/html"/>
    </item>
    <item>
      <title>David Silver 2 - Discussion after Keynote @ RLC 2024</title>
      <itunes:episode>54</itunes:episode>
      <podcast:episode>54</podcast:episode>
      <itunes:title>David Silver 2 - Discussion after Keynote @ RLC 2024</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">6e34de39-a005-42ec-93b9-9567389ea8f7</guid>
      <link>https://share.transistor.fm/s/28980ea4</link>
      <description>
        <![CDATA[<p>Thanks to Professor Silver for permission to record this discussion after his RLC 2024 keynote lecture.   </p><p>Recorded at UMass Amherst during RLC 2024.</p><p><em>Due to the live recording environment, audio quality varies.  We publish this audio in its raw form to preserve the authenticity and immediacy of the discussion.   <br></em><br><strong>References  </strong></p><ul><li><a href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/">AlphaProof</a> announcement on DeepMind's blog</li><li><a href="https://arxiv.org/abs/2007.08794">Discovering Reinforcement Learning Algorithms</a>, Oh et al -- his keynote at RLC 2024 referred to a more recent, yet-to-be-published update to this work  </li><li><a href="https://rl-conference.cc/">Reinforcement Learning Conference</a> 2024  </li><li><a href="https://scholar.google.ca/citations?user=-8DNE4UAAAAJ&amp;hl=en">David Silver</a> on Google Scholar  </li></ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Thanks to Professor Silver for permission to record this discussion after his RLC 2024 keynote lecture.   </p><p>Recorded at UMass Amherst during RLC 2024.</p><p><em>Due to the live recording environment, audio quality varies.  We publish this audio in its raw form to preserve the authenticity and immediacy of the discussion.   <br></em><br><strong>References  </strong></p><ul><li><a href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/">AlphaProof</a> announcement on DeepMind's blog</li><li><a href="https://arxiv.org/abs/2007.08794">Discovering Reinforcement Learning Algorithms</a>, Oh et al -- his keynote at RLC 2024 referred to a more recent, yet-to-be-published update to this work  </li><li><a href="https://rl-conference.cc/">Reinforcement Learning Conference</a> 2024  </li><li><a href="https://scholar.google.ca/citations?user=-8DNE4UAAAAJ&amp;hl=en">David Silver</a> on Google Scholar  </li></ul>]]>
      </content:encoded>
      <pubDate>Wed, 28 Aug 2024 01:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/28980ea4/8fc4c0b3.mp3" length="23483271" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/m4JFezdkLAhOFHG9TMm2RjBkqVlrip2cqtO29V6m1XI/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8yODc4/NTliY2FiOWQyMzE3/YWNlMDMwNGI5MTVj/ODEyMi5qcGc.jpg"/>
      <itunes:duration>977</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Thanks to Professor Silver for permission to record this discussion after his RLC 2024 keynote lecture.   </p><p>Recorded at UMass Amherst during RLC 2024.</p><p><em>Due to the live recording environment, audio quality varies.  We publish this audio in its raw form to preserve the authenticity and immediacy of the discussion.   <br></em><br><strong>References  </strong></p><ul><li><a href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/">AlphaProof</a> announcement on DeepMind's blog</li><li><a href="https://arxiv.org/abs/2007.08794">Discovering Reinforcement Learning Algorithms</a>, Oh et al -- his keynote at RLC 2024 referred to a more recent, yet-to-be-published update to this work  </li><li><a href="https://rl-conference.cc/">Reinforcement Learning Conference</a> 2024  </li><li><a href="https://scholar.google.ca/citations?user=-8DNE4UAAAAJ&amp;hl=en">David Silver</a> on Google Scholar  </li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/28980ea4/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/28980ea4/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/28980ea4/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/28980ea4/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/28980ea4/transcription" type="text/html"/>
    </item>
    <item>
      <title>David Silver @ RLC 2024</title>
      <itunes:episode>53</itunes:episode>
      <podcast:episode>53</podcast:episode>
      <itunes:title>David Silver @ RLC 2024</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">a56bfa89-6e2b-45a9-bb39-90191d0ba22a</guid>
      <link>https://share.transistor.fm/s/2453a9a3</link>
      <description>
        <![CDATA[<p>David Silver is a principal research scientist at DeepMind and a professor at University College London.  </p><p>This interview was recorded at UMass Amherst during RLC 2024.   </p><p><strong>References  </strong></p><ul><li><a href="https://arxiv.org/abs/2007.08794">Discovering Reinforcement Learning Algorithms</a>, Oh et al -- his keynote at RLC 2024 referred to a more recent, yet-to-be-published update to this work  </li><li><a href="https://arxiv.org/abs/1712.01815">Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm</a>, Silver et al 2017 -- the AlphaZero algorithm was used in his recent work on AlphaProof  </li><li><a href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/">AlphaProof</a> on the DeepMind blog </li><li><a href="https://deepmind.google/technologies/alphafold/">AlphaFold</a> on the DeepMind blog </li><li><a href="https://rl-conference.cc/">Reinforcement Learning Conference</a> 2024  </li><li><a href="https://scholar.google.ca/citations?user=-8DNE4UAAAAJ&amp;hl=en">David Silver</a> on Google Scholar  </li></ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>David Silver is a principal research scientist at DeepMind and a professor at University College London.  </p><p>This interview was recorded at UMass Amherst during RLC 2024.   </p><p><strong>References  </strong></p><ul><li><a href="https://arxiv.org/abs/2007.08794">Discovering Reinforcement Learning Algorithms</a>, Oh et al -- his keynote at RLC 2024 referred to a more recent, yet-to-be-published update to this work  </li><li><a href="https://arxiv.org/abs/1712.01815">Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm</a>, Silver et al 2017 -- the AlphaZero algorithm was used in his recent work on AlphaProof  </li><li><a href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/">AlphaProof</a> on the DeepMind blog </li><li><a href="https://deepmind.google/technologies/alphafold/">AlphaFold</a> on the DeepMind blog </li><li><a href="https://rl-conference.cc/">Reinforcement Learning Conference</a> 2024  </li><li><a href="https://scholar.google.ca/citations?user=-8DNE4UAAAAJ&amp;hl=en">David Silver</a> on Google Scholar  </li></ul>]]>
      </content:encoded>
      <pubDate>Mon, 26 Aug 2024 07:56:06 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/2453a9a3/5ff4ce24.mp3" length="16531047" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/yT7vrsyizUewOkqbKnj3r4qDs5Q3-tOx2OJ5Ra-Qt6M/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8wNmYz/YWRjZDRkZTE1ZTA5/NjdjODQ3OWQ2YzUx/MjI0Mi5qcGc.jpg"/>
      <itunes:duration>687</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>David Silver is a principal research scientist at DeepMind and a professor at University College London.  </p><p>This interview was recorded at UMass Amherst during RLC 2024.   </p><p><strong>References  </strong></p><ul><li><a href="https://arxiv.org/abs/2007.08794">Discovering Reinforcement Learning Algorithms</a>, Oh et al -- his keynote at RLC 2024 referred to a more recent, yet-to-be-published update to this work  </li><li><a href="https://arxiv.org/abs/1712.01815">Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm</a>, Silver et al 2017 -- the AlphaZero algorithm was used in his recent work on AlphaProof  </li><li><a href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/">AlphaProof</a> on the DeepMind blog </li><li><a href="https://deepmind.google/technologies/alphafold/">AlphaFold</a> on the DeepMind blog </li><li><a href="https://rl-conference.cc/">Reinforcement Learning Conference</a> 2024  </li><li><a href="https://scholar.google.ca/citations?user=-8DNE4UAAAAJ&amp;hl=en">David Silver</a> on Google Scholar  </li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/2453a9a3/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/2453a9a3/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/2453a9a3/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/2453a9a3/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/2453a9a3/transcription" type="text/html"/>
    </item>
    <item>
      <title>Vincent Moens on TorchRL</title>
      <itunes:episode>52</itunes:episode>
      <podcast:episode>52</podcast:episode>
      <itunes:title>Vincent Moens on TorchRL</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">864eb2b3-a75a-4914-b93f-f325253f2df5</guid>
      <link>https://share.transistor.fm/s/f305f206</link>
      <description>
        <![CDATA[<p>Dr. Vincent Moens is an Applied Machine Learning Research Scientist at Meta, and an author of TorchRL and TensorDict for PyTorch.  </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/2306.00577">TorchRL: A data-driven decision-making library for PyTorch</a> <br>Albert Bou, Matteo Bettini, Sebastian Dittert, Vikash Kumar, Shagun Sodhani, Xiaomeng Yang, Gianni De Fabritiis, Vincent Moens  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://pytorch.org/rl/">TorchRL Documentation</a>  </li><li><a href="https://pytorch.org/tensordict/">TensorDict Documentation</a>  </li></ul><p><br></p>]]>
      </description>
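      <!-- A minimal sketch of TensorDict, the companion library discussed in this
           episode, assuming illustrative field names and shapes (not taken from the
           paper or the interview):

           import torch
           from tensordict import TensorDict

           # A TensorDict groups tensors that share leading batch dimensions.
           td = TensorDict({"obs": torch.zeros(32, 4)}, batch_size=[32])
           td["action"] = torch.randn(32, 2)  # new entries are checked against batch_size
           minibatch = td[:8]                 # indexing slices every entry at once
           print(minibatch.batch_size)        # torch.Size([8])

           Indexing and reshaping apply to the whole dictionary at once, which is why
           TorchRL passes TensorDicts between collectors, replay buffers, and losses. -->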
      <content:encoded>
        <![CDATA[<p>Dr. Vincent Moens is an Applied Machine Learning Research Scientist at Meta, and an author of TorchRL and TensorDict for PyTorch.  </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/2306.00577">TorchRL: A data-driven decision-making library for PyTorch</a> <br>Albert Bou, Matteo Bettini, Sebastian Dittert, Vikash Kumar, Shagun Sodhani, Xiaomeng Yang, Gianni De Fabritiis, Vincent Moens  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://pytorch.org/rl/">TorchRL Documentation</a>  </li><li><a href="https://pytorch.org/tensordict/">TensorDict Documentation</a>  </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 08 Apr 2024 12:45:12 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/f305f206/6e54a0bf.mp3" length="38661336" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/83OMBFTdbvFnoZ0fY1tWfsgZMeVESxfR-I5y7l0_MTI/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS80ZDQ5/MjllNzU4MzFlMGU2/NmRlMzE0ZjJhYzhm/NWU1ZS5qcGVn.jpg"/>
      <itunes:duration>2414</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Dr. Vincent Moens is an Applied Machine Learning Research Scientist at Meta, and an author of TorchRL and TensorDict for PyTorch.  </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/2306.00577">TorchRL: A data-driven decision-making library for PyTorch</a> <br>Albert Bou, Matteo Bettini, Sebastian Dittert, Vikash Kumar, Shagun Sodhani, Xiaomeng Yang, Gianni De Fabritiis, Vincent Moens  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://pytorch.org/rl/">TorchRL Documentation</a>  </li><li><a href="https://pytorch.org/tensordict/">TensorDict Documentation</a>  </li></ul><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/f305f206/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/f305f206/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/f305f206/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/f305f206/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/f305f206/transcription" type="text/html"/>
    </item>
    <item>
      <title>Arash Ahmadian on Rethinking RLHF</title>
      <itunes:episode>51</itunes:episode>
      <podcast:episode>51</podcast:episode>
      <itunes:title>Arash Ahmadian on Rethinking RLHF</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">439bda84-fc79-478d-af12-ff8a5c770b7a</guid>
      <link>https://share.transistor.fm/s/e54fabe1</link>
      <description>
        <![CDATA[<p>Arash Ahmadian is a researcher at Cohere and Cohere For AI focused on preference training of large language models. He’s also a researcher at the Vector Institute.</p><p><strong>Featured Reference</strong></p><p><a href="https://arxiv.org/abs/2402.14740">Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs</a></p><p>Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker</p><p><br></p><p><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/2401.10020">Self-Rewarding Language Models</a>, Yuan et al 2024 </li><li><a href="http://incompleteideas.net/book/the-book-2nd.html">Reinforcement Learning: An Introduction</a>, Sutton and Barto 2018</li><li><a href="https://www.cs.rhul.ac.uk/~chrisw/new_thesis.pdf">Learning from Delayed Rewards</a>, Chris Watkins 1989</li><li><a href="https://citeseerx.ist.psu.edu/document?repid=rep1&amp;type=pdf&amp;doi=e526a65b9ef5afb6639fd3a062f4045d24448232">Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning</a>, Williams 1992</li></ul>]]>
      </description>
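      <!-- A short sketch of the REINFORCE estimator (Williams 1992) that the featured
           paper revisits for RLHF; the tensor names and the constant baseline are
           illustrative assumptions, not the paper's exact setup:

           import torch

           def reinforce_loss(log_probs: torch.Tensor, returns: torch.Tensor,
                              baseline: float = 0.0) -> torch.Tensor:
               # Policy-gradient estimator: grad E[R] = E[(R - b) * grad log pi(a|s)].
               # Advantages are detached so gradients flow only through log_probs.
               advantages = (returns - baseline).detach()
               return -(advantages * log_probs).mean()

           Minimizing this loss over sampled log-probabilities and returns follows the
           REINFORCE gradient; RLHF variants swap returns for reward-model scores. -->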
      <content:encoded>
        <![CDATA[<p>Arash Ahmadian is a researcher at Cohere and Cohere For AI focused on preference training of large language models. He’s also a researcher at the Vector Institute.</p><p><strong>Featured Reference</strong></p><p><a href="https://arxiv.org/abs/2402.14740">Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs</a></p><p>Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker</p><p><br></p><p><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/2401.10020">Self-Rewarding Language Models</a>, Yuan et al 2024 </li><li><a href="http://incompleteideas.net/book/the-book-2nd.html">Reinforcement Learning: An Introduction</a>, Sutton and Barto 2018</li><li><a href="https://www.cs.rhul.ac.uk/~chrisw/new_thesis.pdf">Learning from Delayed Rewards</a>, Chris Watkins 1989</li><li><a href="https://citeseerx.ist.psu.edu/document?repid=rep1&amp;type=pdf&amp;doi=e526a65b9ef5afb6639fd3a062f4045d24448232">Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning</a>, Williams 1992</li></ul>]]>
      </content:encoded>
      <pubDate>Sun, 24 Mar 2024 23:46:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/e54fabe1/a1990c97.mp3" length="32190827" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/wWSRsWBqq29mfXzQXC65wTVCeuDrbdqQkah23bM8pbs/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzE4MDY5NzMv/MTcxMTIyNjA1Mi1h/cnR3b3JrLmpwZw.jpg"/>
      <itunes:duration>2010</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Arash Ahmadian is a researcher at Cohere and Cohere For AI focused on preference training of large language models. He’s also a researcher at the Vector Institute.</p><p><strong>Featured Reference</strong></p><p><a href="https://arxiv.org/abs/2402.14740">Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs</a></p><p>Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker</p><p><br></p><p><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/2401.10020">Self-Rewarding Language Models</a>, Yuan et al 2024 </li><li><a href="http://incompleteideas.net/book/the-book-2nd.html">Reinforcement Learning: An Introduction</a>, Sutton and Barto 2018</li><li><a href="https://www.cs.rhul.ac.uk/~chrisw/new_thesis.pdf">Learning from Delayed Rewards</a>, Chris Watkins 1989</li><li><a href="https://citeseerx.ist.psu.edu/document?repid=rep1&amp;type=pdf&amp;doi=e526a65b9ef5afb6639fd3a062f4045d24448232">Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning</a>, Williams 1992</li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/e54fabe1/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/e54fabe1/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/e54fabe1/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/e54fabe1/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/e54fabe1/transcription" type="text/html"/>
    </item>
    <item>
      <title>Glen Berseth on RL Conference</title>
      <itunes:episode>50</itunes:episode>
      <podcast:episode>50</podcast:episode>
      <itunes:title>Glen Berseth on RL Conference</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">90c4f549-2551-4ec9-bd1a-353e70234b36</guid>
      <link>https://share.transistor.fm/s/6be8918e</link>
      <description>
        <![CDATA[<p>Glen Berseth is an assistant professor at the Université de Montréal, a core academic member of the Mila - Quebec AI Institute, a Canada CIFAR AI Chair, a member of l'Institut Courtois, and co-director of the Robotics and Embodied AI Lab (REAL).  </p><p><strong>Featured Links  </strong></p><p><a href="https://rl-conference.cc/">Reinforcement Learning Conference</a>  </p><p><a href="https://arxiv.org/html/2401.11237v1">Closing the Gap between TD Learning and Supervised Learning--A Generalisation Point of View</a> <br>Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach<br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Glen Berseth is an assistant professor at the Université de Montréal, a core academic member of the Mila - Quebec AI Institute, a Canada CIFAR AI Chair, a member of l'Institut Courtois, and co-director of the Robotics and Embodied AI Lab (REAL).  </p><p><strong>Featured Links  </strong></p><p><a href="https://rl-conference.cc/">Reinforcement Learning Conference</a>  </p><p><a href="https://arxiv.org/html/2401.11237v1">Closing the Gap between TD Learning and Supervised Learning--A Generalisation Point of View</a> <br>Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach<br></p>]]>
      </content:encoded>
      <pubDate>Mon, 11 Mar 2024 09:00:33 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/6be8918e/6e963f3c.mp3" length="20830138" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/q62pg3WEQTJBl6YWnzhOKuapNrLHhYO3noyoJ6VwHpw/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzE3ODA1NTIv/MTcwOTkzODUzMS1h/cnR3b3JrLmpwZw.jpg"/>
      <itunes:duration>1298</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Glen Berseth is an assistant professor at the Université de Montréal, a core academic member of the Mila - Quebec AI Institute, a Canada CIFAR AI Chair, a member of l'Institut Courtois, and co-director of the Robotics and Embodied AI Lab (REAL).  </p><p><strong>Featured Links  </strong></p><p><a href="https://rl-conference.cc/">Reinforcement Learning Conference</a>  </p><p><a href="https://arxiv.org/html/2401.11237v1">Closing the Gap between TD Learning and Supervised Learning--A Generalisation Point of View</a> <br>Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach<br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/6be8918e/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/6be8918e/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/6be8918e/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/6be8918e/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/6be8918e/transcription" type="text/html"/>
    </item>
    <item>
      <title>Ian Osband</title>
      <itunes:episode>49</itunes:episode>
      <podcast:episode>49</podcast:episode>
      <itunes:title>Ian Osband</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">ed7f7018-fe41-4fb1-a70b-0173d802c0f7</guid>
      <link>https://share.transistor.fm/s/f818c3cb</link>
      <description>
        <![CDATA[<p>Ian Osband is a research scientist at OpenAI (previously DeepMind and Stanford) working on decision-making under uncertainty.  </p><p>We spoke about: </p><ul><li>Information theory and RL </li><li>Exploration, epistemic uncertainty, and joint predictions </li><li>Epistemic Neural Networks and scaling to LLMs </li></ul><p><br><strong>Featured References  </strong></p><p><a href="https://arxiv.org/abs/2103.04047">Reinforcement Learning, Bit by Bit</a>  <br>Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen  </p><p><a href="https://arxiv.org/abs/2107.09224">From Predictions to Decisions: The Importance of Joint Predictive Distributions</a> </p><p>Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy  </p><p><a href="https://arxiv.org/abs/2107.08924">Epistemic Neural Networks</a> </p><p>Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy  </p><p><a href="https://arxiv.org/abs/2302.09205">Approximate Thompson Sampling via Epistemic Neural Networks</a> </p><p>Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://www.youtube.com/watch?v=ck4GixLs4ZQ">Thesis defence</a>, Ian Osband </li><li><a href="https://iosband.github.io/research.html">Homepage</a>, Ian Osband </li><li><a href="https://www.youtube.com/watch?v=j8an0dKcX4A">Epistemic Neural Networks</a> at Stanford RL Forum </li><li><a href="https://arxiv.org/abs/1908.03568">Behaviour Suite for Reinforcement Learning</a>, Osband et al 2019 </li><li><a href="https://arxiv.org/abs/2402.00396">Efficient Exploration for LLMs</a>, Dwaracherla et al 2024 </li></ul>]]>
      </description>
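      <!-- A toy exact-Thompson-sampling step for a Bernoulli bandit, to ground the
           episode's themes; the featured papers approximate this kind of posterior
           sampling at scale with epistemic neural networks. Names are illustrative:

           import numpy as np

           def thompson_step(successes: np.ndarray, failures: np.ndarray,
                             rng: np.random.Generator) -> int:
               # Sample one plausible mean reward per arm from its Beta posterior,
               # then act greedily with respect to the sampled means.
               sampled_means = rng.beta(successes + 1, failures + 1)
               return int(np.argmax(sampled_means))

           arm = thompson_step(np.zeros(5), np.zeros(5), np.random.default_rng(0)) -->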
      <content:encoded>
        <![CDATA[<p>Ian Osband is a research scientist at OpenAI (previously DeepMind and Stanford) working on decision-making under uncertainty.  </p><p>We spoke about: </p><ul><li>Information theory and RL </li><li>Exploration, epistemic uncertainty, and joint predictions </li><li>Epistemic Neural Networks and scaling to LLMs </li></ul><p><br><strong>Featured References  </strong></p><p><a href="https://arxiv.org/abs/2103.04047">Reinforcement Learning, Bit by Bit</a>  <br>Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen  </p><p><a href="https://arxiv.org/abs/2107.09224">From Predictions to Decisions: The Importance of Joint Predictive Distributions</a> </p><p>Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy  </p><p><a href="https://arxiv.org/abs/2107.08924">Epistemic Neural Networks</a> </p><p>Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy  </p><p><a href="https://arxiv.org/abs/2302.09205">Approximate Thompson Sampling via Epistemic Neural Networks</a> </p><p>Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://www.youtube.com/watch?v=ck4GixLs4ZQ">Thesis defence</a>, Ian Osband </li><li><a href="https://iosband.github.io/research.html">Homepage</a>, Ian Osband </li><li><a href="https://www.youtube.com/watch?v=j8an0dKcX4A">Epistemic Neural Networks</a> at Stanford RL Forum </li><li><a href="https://arxiv.org/abs/1908.03568">Behaviour Suite for Reinforcement Learning</a>, Osband et al 2019 </li><li><a href="https://arxiv.org/abs/2402.00396">Efficient Exploration for LLMs</a>, Dwaracherla et al 2024 </li></ul>]]>
      </content:encoded>
      <pubDate>Thu, 07 Mar 2024 11:24:48 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/f818c3cb/b64a588e.mp3" length="65745777" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/dJIAcNBLu8rZLLSma1b9AzY9Gvo1-90eWinnt_dT58w/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzE3NDk0MTgv/MTcwODY0MTUyNi1h/cnR3b3JrLmpwZw.jpg"/>
      <itunes:duration>4106</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Ian Osband is a research scientist at OpenAI (previously DeepMind and Stanford) working on decision-making under uncertainty.  </p><p>We spoke about: </p><ul><li>Information theory and RL </li><li>Exploration, epistemic uncertainty, and joint predictions </li><li>Epistemic Neural Networks and scaling to LLMs </li></ul><p><br><strong>Featured References  </strong></p><p><a href="https://arxiv.org/abs/2103.04047">Reinforcement Learning, Bit by Bit</a>  <br>Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen  </p><p><a href="https://arxiv.org/abs/2107.09224">From Predictions to Decisions: The Importance of Joint Predictive Distributions</a> </p><p>Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy  </p><p><a href="https://arxiv.org/abs/2107.08924">Epistemic Neural Networks</a> </p><p>Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy  </p><p><a href="https://arxiv.org/abs/2302.09205">Approximate Thompson Sampling via Epistemic Neural Networks</a> </p><p>Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://www.youtube.com/watch?v=ck4GixLs4ZQ">Thesis defence</a>, Ian Osband </li><li><a href="https://iosband.github.io/research.html">Homepage</a>, Ian Osband </li><li><a href="https://www.youtube.com/watch?v=j8an0dKcX4A">Epistemic Neural Networks</a> at Stanford RL Forum </li><li><a href="https://arxiv.org/abs/1908.03568">Behaviour Suite for Reinforcement Learning</a>, Osband et al 2019 </li><li><a href="https://arxiv.org/abs/2402.00396">Efficient Exploration for LLMs</a>, Dwaracherla et al 2024 </li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/f818c3cb/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/f818c3cb/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/f818c3cb/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/f818c3cb/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/f818c3cb/transcription" type="text/html"/>
    </item>
    <item>
      <title>Sharath Chandra Raparthy</title>
      <itunes:episode>48</itunes:episode>
      <podcast:episode>48</podcast:episode>
      <itunes:title>Sharath Chandra Raparthy</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">a573770f-3180-4a02-a34d-45a1ecf838f0</guid>
      <link>https://share.transistor.fm/s/f4b1c7d2</link>
      <description>
        <![CDATA[<p>Sharath Chandra Raparthy on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!  </p><p>Sharath Chandra Raparthy is an AI Resident at FAIR at Meta, and did his Master's at Mila.  </p><p><br><strong>Featured Reference  <br></strong><br><a href="https://arxiv.org/abs/2312.03801">Generalization to New Sequential Decision Making Tasks with In-Context Learning   <br></a>Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu  <br> <br><strong>Additional References  </strong></p><ul><li><a href="https://sharathraparthy.github.io/">Sharath Chandra Raparthy</a> Homepage  </li><li><a href="https://arxiv.org/abs/2301.07608">Human-Timescale Adaptation in an Open-Ended Task Space</a>, Adaptive Agent Team 2023</li><li><a href="https://arxiv.org/abs/2205.05055">Data Distributional Properties Drive Emergent In-Context Learning in Transformers</a>, Chan et al 2022  </li><li><a href="https://arxiv.org/abs/2106.01345">Decision Transformer: Reinforcement Learning via Sequence Modeling</a>, Chen et al 2021</li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Sharath Chandra Raparthy on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!  </p><p>Sharath Chandra Raparthy is an AI Resident at FAIR at Meta, and did his Master's at Mila.  </p><p><br><strong>Featured Reference  <br></strong><br><a href="https://arxiv.org/abs/2312.03801">Generalization to New Sequential Decision Making Tasks with In-Context Learning   <br></a>Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu  <br> <br><strong>Additional References  </strong></p><ul><li><a href="https://sharathraparthy.github.io/">Sharath Chandra Raparthy</a> Homepage  </li><li><a href="https://arxiv.org/abs/2301.07608">Human-Timescale Adaptation in an Open-Ended Task Space</a>, Adaptive Agent Team 2023</li><li><a href="https://arxiv.org/abs/2205.05055">Data Distributional Properties Drive Emergent In-Context Learning in Transformers</a>, Chan et al 2022  </li><li><a href="https://arxiv.org/abs/2106.01345">Decision Transformer: Reinforcement Learning via Sequence Modeling</a>, Chen et al 2021</li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Sun, 11 Feb 2024 17:43:56 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/f4b1c7d2/002fe73e.mp3" length="39071725" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/5ZSTnzTODpVw-nIW60UJKkZGNqVjv6dv41WXx1tnaNw/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzE2OTcyMDkv/MTcwNTc5NTUwNi1h/cnR3b3JrLmpwZw.jpg"/>
      <itunes:duration>2441</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Sharath Chandra Raparthy on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!  </p><p>Sharath Chandra Raparthy is an AI Resident at FAIR at Meta, and did his Master's at Mila.  </p><p><br><strong>Featured Reference  <br></strong><br><a href="https://arxiv.org/abs/2312.03801">Generalization to New Sequential Decision Making Tasks with In-Context Learning   <br></a>Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu  <br> <br><strong>Additional References  </strong></p><ul><li><a href="https://sharathraparthy.github.io/">Sharath Chandra Raparthy</a> Homepage  </li><li><a href="https://arxiv.org/abs/2301.07608">Human-Timescale Adaptation in an Open-Ended Task Space</a>, Adaptive Agent Team 2023</li><li><a href="https://arxiv.org/abs/2205.05055">Data Distributional Properties Drive Emergent In-Context Learning in Transformers</a>, Chan et al 2022  </li><li><a href="https://arxiv.org/abs/2106.01345">Decision Transformer: Reinforcement Learning via Sequence Modeling</a>, Chen et al 2021</li></ul><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/f4b1c7d2/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/f4b1c7d2/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/f4b1c7d2/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/f4b1c7d2/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/f4b1c7d2/transcription" type="text/html"/>
    </item>
    <item>
      <title>Pierluca D'Oro and Martin Klissarov</title>
      <itunes:episode>47</itunes:episode>
      <podcast:episode>47</podcast:episode>
      <itunes:title>Pierluca D'Oro and Martin Klissarov</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">067040b6-a420-47b8-8a6f-b21c8e1e4c1b</guid>
      <link>https://share.transistor.fm/s/bc10889c</link>
      <description>
        <![CDATA[<p>Pierluca D'Oro and Martin Klissarov on Motif and RLAIF, Noisy Neighborhoods and Return Landscapes, and more!  </p><p>Pierluca D'Oro is a PhD student at Mila and a visiting researcher at Meta.</p><p><br>Martin Klissarov is a PhD student at Mila and McGill and a research scientist intern at Meta.  </p><p><br><strong>Featured References  </strong></p><p><a href="https://arxiv.org/abs/2310.00166"><strong>Motif: Intrinsic Motivation from Artificial Intelligence Feedback  <br></strong></a>Martin Klissarov*, Pierluca D'Oro*, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff  <br><br><a href="https://arxiv.org/abs/2309.14597"><strong>Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control  <br></strong></a>Nate Rahn*, Pierluca D'Oro*, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare  </p><p><a href="https://www.scienceofaiagents.com/p/to-keep-doing-rl-research-stop-calling"><strong>To keep doing RL research, stop calling yourself an RL researcher</strong></a> <br>Pierluca D'Oro </p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Pierluca D'Oro and Martin Klissarov on Motif and RLAIF, Noisy Neighborhoods and Return Landscapes, and more!  </p><p>Pierluca D'Oro is a PhD student at Mila and a visiting researcher at Meta.</p><p><br>Martin Klissarov is a PhD student at Mila and McGill and a research scientist intern at Meta.  </p><p><br><strong>Featured References  </strong></p><p><a href="https://arxiv.org/abs/2310.00166"><strong>Motif: Intrinsic Motivation from Artificial Intelligence Feedback  <br></strong></a>Martin Klissarov*, Pierluca D'Oro*, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff  <br><br><a href="https://arxiv.org/abs/2309.14597"><strong>Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control  <br></strong></a>Nate Rahn*, Pierluca D'Oro*, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare  </p><p><a href="https://www.scienceofaiagents.com/p/to-keep-doing-rl-research-stop-calling"><strong>To keep doing RL research, stop calling yourself an RL researcher</strong></a> <br>Pierluca D'Oro </p>]]>
      </content:encoded>
      <pubDate>Mon, 13 Nov 2023 09:32:13 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/bc10889c/c788513e.mp3" length="82713777" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/fFIztxVGViLUAlIbFIj-IOT1rYcdRArSeIOjxXjw7hM/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzE1OTY3NDAv/MTY5OTgzNTAzNy1h/cnR3b3JrLmpwZw.jpg"/>
      <itunes:duration>3444</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Pierluca D'Oro and Martin Klissarov on Motif and RLAIF, Noisy Neighborhoods and Return Landscapes, and more!  </p><p>Pierluca D'Oro is a PhD student at Mila and a visiting researcher at Meta.</p><p><br>Martin Klissarov is a PhD student at Mila and McGill and a research scientist intern at Meta.  </p><p><br><strong>Featured References  </strong></p><p><a href="https://arxiv.org/abs/2310.00166"><strong>Motif: Intrinsic Motivation from Artificial Intelligence Feedback  <br></strong></a>Martin Klissarov*, Pierluca D'Oro*, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff  <br><br><a href="https://arxiv.org/abs/2309.14597"><strong>Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control  <br></strong></a>Nate Rahn*, Pierluca D'Oro*, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare  </p><p><a href="https://www.scienceofaiagents.com/p/to-keep-doing-rl-research-stop-calling"><strong>To keep doing RL research, stop calling yourself an RL researcher</strong></a> <br>Pierluca D'Oro </p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/bc10889c/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/bc10889c/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/bc10889c/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/bc10889c/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/bc10889c/transcription" type="text/html"/>
    </item>
    <item>
      <title>Martin Riedmiller</title>
      <itunes:episode>46</itunes:episode>
      <podcast:episode>46</podcast:episode>
      <itunes:title>Martin Riedmiller</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">dc2566ce-6b2e-42ef-a01c-b769ad7c7268</guid>
      <link>https://share.transistor.fm/s/c2d12a9b</link>
      <description>
        <![CDATA[<p>Martin Riedmiller of Google DeepMind on controlling nuclear fusion plasma in a tokamak with RL, the original Deep Q-Network, Neural Fitted Q-Iteration, Collect and Infer, AGI for control systems, and tons more!  </p><p><br><a href="https://sites.google.com/view/riedmiller/home">Martin Riedmiller</a> is a research scientist and team lead at DeepMind.   </p><p><br><strong>Featured References   </strong></p><p><br><a href="https://www.nature.com/articles/s41586-021-04301-9">Magnetic control of tokamak plasmas through deep reinforcement learning  <br></a>Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de las Casas, Craig Donner, Leslie Fritz, Cristian Galperti, Andrea Huber, James Keeling, Maria Tsimpoukelli, Jackie Kay, Antoine Merle, Jean-Marc Moret, Seb Noury, Federico Pesamosca, David Pfau, Olivier Sauter, Cristian Sommariva, Stefano Coda, Basil Duval, Ambrogio Fasoli, Pushmeet Kohli, Koray Kavukcuoglu, Demis Hassabis &amp; Martin Riedmiller </p><p><br><a href="https://www.nature.com/articles/nature14236">Human-level control through deep reinforcement learning <br></a>Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis  <br> </p><p><a href="https://link.springer.com/content/pdf/10.1007/11564096_32.pdf">Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method</a> <br>Martin Riedmiller  </p>]]>
      </description>
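      <!-- A sketch of the regression targets at the heart of Neural Fitted Q-Iteration,
           one of the episode topics; the batch layout and network interface here are
           assumptions for illustration, not the paper's code:

           import torch

           def nfq_targets(q_net: torch.nn.Module, r: torch.Tensor, s_next: torch.Tensor,
                           done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
               # NFQ alternates between computing r + gamma * max_a' Q(s', a') over a
               # fixed batch of stored transitions and fitting q_net to those targets
               # by supervised regression (the "collect and infer" flavour of RL).
               with torch.no_grad():
                   next_value = q_net(s_next).max(dim=1).values
               return r + gamma * (1.0 - done) * next_value -->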
      <content:encoded>
        <![CDATA[<p>Martin Riedmiller of Google DeepMind on controlling nuclear fusion plasma in a tokamak with RL, the original Deep Q-Network, Neural Fitted Q-Iteration, Collect and Infer, AGI for control systems, and tons more!  </p><p><br><a href="https://sites.google.com/view/riedmiller/home">Martin Riedmiller</a> is a research scientist and team lead at DeepMind.   </p><p><br><strong>Featured References   </strong></p><p><br><a href="https://www.nature.com/articles/s41586-021-04301-9">Magnetic control of tokamak plasmas through deep reinforcement learning  <br></a>Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de las Casas, Craig Donner, Leslie Fritz, Cristian Galperti, Andrea Huber, James Keeling, Maria Tsimpoukelli, Jackie Kay, Antoine Merle, Jean-Marc Moret, Seb Noury, Federico Pesamosca, David Pfau, Olivier Sauter, Cristian Sommariva, Stefano Coda, Basil Duval, Ambrogio Fasoli, Pushmeet Kohli, Koray Kavukcuoglu, Demis Hassabis &amp; Martin Riedmiller </p><p><br><a href="https://www.nature.com/articles/nature14236">Human-level control through deep reinforcement learning <br></a>Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis  <br> </p><p><a href="https://link.springer.com/content/pdf/10.1007/11564096_32.pdf">Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method</a> <br>Martin Riedmiller  </p>]]>
      </content:encoded>
      <pubDate>Tue, 22 Aug 2023 09:18:44 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/c2d12a9b/60e5851a.mp3" length="106510693" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/Lik7s95kxE-p_zLchXWCfGOpVscNkzL-8fTS3qHIcSc/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzE0Njk3NTgv/MTY5MjcyMDk5OC1h/cnR3b3JrLmpwZw.jpg"/>
      <itunes:duration>4436</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Martin Riedmiller of Google DeepMind on controlling nuclear fusion plasma in a tokamak with RL, the original Deep Q-Network, Neural Fitted Q-Iteration, Collect and Infer, AGI for control systems, and tons more!  </p><p><br><a href="https://sites.google.com/view/riedmiller/home">Martin Riedmiller</a> is a research scientist and team lead at DeepMind.   </p><p><br><strong>Featured References   </strong></p><p><br><a href="https://www.nature.com/articles/s41586-021-04301-9">Magnetic control of tokamak plasmas through deep reinforcement learning  <br></a>Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de las Casas, Craig Donner, Leslie Fritz, Cristian Galperti, Andrea Huber, James Keeling, Maria Tsimpoukelli, Jackie Kay, Antoine Merle, Jean-Marc Moret, Seb Noury, Federico Pesamosca, David Pfau, Olivier Sauter, Cristian Sommariva, Stefano Coda, Basil Duval, Ambrogio Fasoli, Pushmeet Kohli, Koray Kavukcuoglu, Demis Hassabis &amp; Martin Riedmiller </p><p><br><a href="https://www.nature.com/articles/nature14236">Human-level control through deep reinforcement learning <br></a>Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis  <br> </p><p><a href="https://link.springer.com/content/pdf/10.1007/11564096_32.pdf">Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method</a> <br>Martin Riedmiller  </p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/c2d12a9b/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/c2d12a9b/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/c2d12a9b/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/c2d12a9b/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/c2d12a9b/transcription" type="text/html"/>
    </item>
    <item>
      <title>Max Schwarzer</title>
      <itunes:episode>45</itunes:episode>
      <podcast:episode>45</podcast:episode>
      <itunes:title>Max Schwarzer</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">3f19a6e0-bdf4-4d21-add5-45d5eef48592</guid>
      <link>https://share.transistor.fm/s/7c6ce232</link>
      <description>
        <![CDATA[<p>Max Schwarzer is a PhD student at Mila, advised by Aaron Courville and Marc Bellemare, and interested in RL scaling, representation learning for RL, and RL for science.  Max spent the last 1.5 years at Google Brain/DeepMind, and is now at Apple Machine Learning Research.   </p><p><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/2305.19452">Bigger, Better, Faster: Human-level Atari with human-level efficiency  <br></a>Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro  <br><strong><br></strong><a href="https://openreview.net/forum?id=OpC-9aBBVJe">Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier <br></a>Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville  <br> <br><a href="https://arxiv.org/abs/2205.07802">The Primacy Bias in Deep Reinforcement Learning <br></a>Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville  <br><strong><br></strong><br><strong>Additional References   </strong></p><ul><li><a href="https://arxiv.org/abs/1710.02298">Rainbow: Combining Improvements in Deep Reinforcement Learning</a>, Hessel et al 2017  </li><li><a href="https://arxiv.org/abs/1906.05243">When to use parametric models in reinforcement learning?</a> van Hasselt et al 2019 </li><li><a href="https://arxiv.org/abs/2007.05929">Data-Efficient Reinforcement Learning with Self-Predictive Representations</a>, Schwarzer et al 2020  </li><li><a href="https://arxiv.org/abs/2106.04799">Pretraining Representations for Data-Efficient Reinforcement Learning</a>, Schwarzer et al 2021  </li></ul><p><strong><br></strong><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Max Schwarzer is a PhD student at Mila, advised by Aaron Courville and Marc Bellemare, and interested in RL scaling, representation learning for RL, and RL for science.  Max spent the last 1.5 years at Google Brain/DeepMind, and is now at Apple Machine Learning Research.   </p><p><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/2305.19452">Bigger, Better, Faster: Human-level Atari with human-level efficiency  <br></a>Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro  <br><strong><br></strong><a href="https://openreview.net/forum?id=OpC-9aBBVJe">Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier <br></a>Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville  <br> <br><a href="https://arxiv.org/abs/2205.07802">The Primacy Bias in Deep Reinforcement Learning <br></a>Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville  <br><strong><br></strong><br><strong>Additional References   </strong></p><ul><li><a href="https://arxiv.org/abs/1710.02298">Rainbow: Combining Improvements in Deep Reinforcement Learning</a>, Hessel et al 2017  </li><li><a href="https://arxiv.org/abs/1906.05243">When to use parametric models in reinforcement learning?</a> van Hasselt et al 2019 </li><li><a href="https://arxiv.org/abs/2007.05929">Data-Efficient Reinforcement Learning with Self-Predictive Representations</a>, Schwarzer et al 2020  </li><li><a href="https://arxiv.org/abs/2106.04799">Pretraining Representations for Data-Efficient Reinforcement Learning</a>, Schwarzer et al 2021  </li></ul><p><strong><br></strong><br></p>]]>
      </content:encoded>
      <pubDate>Tue, 08 Aug 2023 13:22:18 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/7c6ce232/628dcf91.mp3" length="101251660" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/OZRm_AWzXgPOgxsCsBmIGLeJ3jmIWEO_TcQoBUQi5IE/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzE0MzEwMTUv/MTY5MTUxNjU4Mi1h/cnR3b3JrLmpwZw.jpg"/>
      <itunes:duration>4218</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Max Schwarzer is a PhD student at Mila, advised by Aaron Courville and Marc Bellemare, and interested in RL scaling, representation learning for RL, and RL for science.  Max spent the last 1.5 years at Google Brain/DeepMind, and is now at Apple Machine Learning Research.   </p><p><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/2305.19452">Bigger, Better, Faster: Human-level Atari with human-level efficiency  <br></a>Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro  <br><strong><br></strong><a href="https://openreview.net/forum?id=OpC-9aBBVJe">Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier <br></a>Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville  <br> <br><a href="https://arxiv.org/abs/2205.07802">The Primacy Bias in Deep Reinforcement Learning <br></a>Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville  <br><strong><br></strong><br><strong>Additional References   </strong></p><ul><li><a href="https://arxiv.org/abs/1710.02298">Rainbow: Combining Improvements in Deep Reinforcement Learning</a>, Hessel et al 2017  </li><li><a href="https://arxiv.org/abs/1906.05243">When to use parametric models in reinforcement learning?</a> van Hasselt et al 2019 </li><li><a href="https://arxiv.org/abs/2007.05929">Data-Efficient Reinforcement Learning with Self-Predictive Representations</a>, Schwarzer et al 2020  </li><li><a href="https://arxiv.org/abs/2106.04799">Pretraining Representations for Data-Efficient Reinforcement Learning</a>, Schwarzer et al 2021  </li></ul><p><strong><br></strong><br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/7c6ce232/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/7c6ce232/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/7c6ce232/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/7c6ce232/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/7c6ce232/transcription" type="text/html"/>
    </item>
    <item>
      <title>Julian Togelius</title>
      <itunes:episode>44</itunes:episode>
      <podcast:episode>44</podcast:episode>
      <itunes:title>Julian Togelius</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">d3d0285e-a825-4c16-85b1-555e6b80f356</guid>
      <link>https://share.transistor.fm/s/c3536a57</link>
      <description>
        <![CDATA[<p><a href="http://julian.togelius.com/">Julian Togelius</a> is an Associate Professor of Computer Science and Engineering at NYU, and Cofounder and research director at <a href="https://modl.ai/">modl.ai</a></p><p><br>  </p><p><strong>Featured References  </strong><br><a href="https://arxiv.org/abs/2304.06035">Choose Your Weapon: Survival Strategies for Depressed AI Academics</a></p><p>Julian Togelius, Georgios N. Yannakakis</p><p><br></p><p><a href="https://arxiv.org/abs/2206.13623%20">Learning Controllable 3D Level Generators</a></p><p>Zehua Jiang, Sam Earle, Michael Cerny Green, Julian Togelius</p><p><br></p><p><a href="https://arxiv.org/abs/2001.09212">PCGRL: Procedural Content Generation via Reinforcement Learning</a></p><p>Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius</p><p><br></p><p><a href="https://arxiv.org/abs/1806.10729%20">Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation</a></p><p>Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, Sebastian Risi</p><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="http://julian.togelius.com/">Julian Togelius</a> is an Associate Professor of Computer Science and Engineering at NYU, and Cofounder and research director at <a href="https://modl.ai/">modl.ai</a></p><p><br>  </p><p><strong>Featured References  </strong><br><a href="https://arxiv.org/abs/2304.06035">Choose Your Weapon: Survival Strategies for Depressed AI Academics</a></p><p>Julian Togelius, Georgios N. Yannakakis</p><p><br></p><p><a href="https://arxiv.org/abs/2206.13623%20">Learning Controllable 3D Level Generators</a></p><p>Zehua Jiang, Sam Earle, Michael Cerny Green, Julian Togelius</p><p><br></p><p><a href="https://arxiv.org/abs/2001.09212">PCGRL: Procedural Content Generation via Reinforcement Learning</a></p><p>Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius</p><p><br></p><p><a href="https://arxiv.org/abs/1806.10729%20">Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation</a></p><p>Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, Sebastian Risi</p><p><br></p>]]>
      </content:encoded>
      <pubDate>Tue, 25 Jul 2023 01:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/c3536a57/f8f803b5.mp3" length="57759553" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/eXBGKhY2kOW0tx6xWiKCmt2LiefkJ-VDMdFgUTA8dzQ/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzEzOTc0ODIv/MTY4NzczOTkwMC1h/cnR3b3JrLmpwZw.jpg"/>
      <itunes:duration>2404</itunes:duration>
      <itunes:summary>
        <![CDATA[<p><a href="http://julian.togelius.com/">Julian Togelius</a> is an Associate Professor of Computer Science and Engineering at NYU, and Cofounder and research director at <a href="https://modl.ai/">modl.ai</a></p><p><br>  </p><p><strong>Featured References  </strong><br><a href="https://arxiv.org/abs/2304.06035">Choose Your Weapon: Survival Strategies for Depressed AI Academics</a></p><p>Julian Togelius, Georgios N. Yannakakis</p><p><br></p><p><a href="https://arxiv.org/abs/2206.13623%20">Learning Controllable 3D Level Generators</a></p><p>Zehua Jiang, Sam Earle, Michael Cerny Green, Julian Togelius</p><p><br></p><p><a href="https://arxiv.org/abs/2001.09212">PCGRL: Procedural Content Generation via Reinforcement Learning</a></p><p>Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius</p><p><br></p><p><a href="https://arxiv.org/abs/1806.10729%20">Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation</a></p><p>Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, Sebastian Risi</p><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/c3536a57/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/c3536a57/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/c3536a57/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/c3536a57/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/c3536a57/transcription" type="text/html"/>
    </item>
    <item>
      <title>Jakob Foerster</title>
      <itunes:episode>43</itunes:episode>
      <podcast:episode>43</podcast:episode>
      <itunes:title>Jakob Foerster</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">62f855ed-e9c3-4454-96ac-c5ea15d7a2b7</guid>
      <link>https://share.transistor.fm/s/11ed7595</link>
      <description>
        <![CDATA[<p>Jakob Foerster on Multi-Agent learning, Cooperation vs Competition, Emergent Communication, Zero-shot coordination, Opponent Shaping, agents for Hanabi and Prisoner's Dilemma, and more.  </p><p><a href="https://www.jakobfoerster.com/">Jakob Foerster</a> is an Associate Professor at the University of Oxford.  </p><p><strong>Featured References  </strong></p><p><a href="https://arxiv.org/abs/1709.04326">Learning with Opponent-Learning Awareness <br></a>Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch  </p><p><a href="https://arxiv.org/abs/2205.01447">Model-Free Opponent Shaping</a> <br>Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster  </p><p><a href="https://arxiv.org/abs/2103.04000">Off-Belief Learning <br></a>Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster  <br><strong><br></strong><a href="https://arxiv.org/abs/1605.06676">Learning to Communicate with Deep Multi-Agent Reinforcement Learning <br></a>Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson  </p><p><a href="https://arxiv.org/abs/2211.11030">Adversarial Cheap Talk <br></a>Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster  <br><strong><br></strong><a href="https://arxiv.org/abs/2303.10733">Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning <br></a>Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://www.youtube.com/playlist?list=PLruBu5BI5n4ZMvdYXXZLl6w2_D6S3EU-j">Lectures by Jakob on YouTube</a> </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Jakob Foerster on Multi-Agent learning, Cooperation vs Competition, Emergent Communication, Zero-shot coordination, Opponent Shaping, agents for Hanabi and Prisoner's Dilemma, and more.  </p><p><a href="https://www.jakobfoerster.com/">Jakob Foerster</a> is an Associate Professor at the University of Oxford.  </p><p><strong>Featured References  </strong></p><p><a href="https://arxiv.org/abs/1709.04326">Learning with Opponent-Learning Awareness <br></a>Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch  </p><p><a href="https://arxiv.org/abs/2205.01447">Model-Free Opponent Shaping</a> <br>Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster  </p><p><a href="https://arxiv.org/abs/2103.04000">Off-Belief Learning <br></a>Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster  <br><strong><br></strong><a href="https://arxiv.org/abs/1605.06676">Learning to Communicate with Deep Multi-Agent Reinforcement Learning <br></a>Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson  </p><p><a href="https://arxiv.org/abs/2211.11030">Adversarial Cheap Talk <br></a>Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster  <br><strong><br></strong><a href="https://arxiv.org/abs/2303.10733">Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning <br></a>Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://www.youtube.com/playlist?list=PLruBu5BI5n4ZMvdYXXZLl6w2_D6S3EU-j">Lectures by Jakob on YouTube</a> </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Sun, 07 May 2023 23:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/11ed7595/0a33428b.mp3" length="45977185" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/8OvAx5veAQCvzSPHHRWXWoStIcL1jQPeSt2CHKVVojY/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzEyOTUzMTIv/MTY4MzQ5MTk4Ny1h/cnR3b3JrLmpwZw.jpg"/>
      <itunes:duration>3825</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Jakob Foerster on Multi-Agent learning, Cooperation vs Competition, Emergent Communication, Zero-shot coordination, Opponent Shaping, agents for Hanabi and Prisoner's Dilemma, and more.  </p><p><a href="https://www.jakobfoerster.com/">Jakob Foerster</a> is an Associate Professor at the University of Oxford.  </p><p><strong>Featured References  </strong></p><p><a href="https://arxiv.org/abs/1709.04326">Learning with Opponent-Learning Awareness <br></a>Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch  </p><p><a href="https://arxiv.org/abs/2205.01447">Model-Free Opponent Shaping</a> <br>Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster  </p><p><a href="https://arxiv.org/abs/2103.04000">Off-Belief Learning <br></a>Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster  <br><strong><br></strong><a href="https://arxiv.org/abs/1605.06676">Learning to Communicate with Deep Multi-Agent Reinforcement Learning <br></a>Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson  </p><p><a href="https://arxiv.org/abs/2211.11030">Adversarial Cheap Talk <br></a>Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster  <br><strong><br></strong><a href="https://arxiv.org/abs/2303.10733">Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning <br></a>Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://www.youtube.com/playlist?list=PLruBu5BI5n4ZMvdYXXZLl6w2_D6S3EU-j">Lectures by Jakob on YouTube</a> </li></ul><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/11ed7595/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/11ed7595/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/11ed7595/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/11ed7595/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/11ed7595/transcription" type="text/html"/>
    </item>
    <item>
      <title>Danijar Hafner 2</title>
      <itunes:episode>42</itunes:episode>
      <podcast:episode>42</podcast:episode>
      <itunes:title>Danijar Hafner 2</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">816cb689-de85-40c3-a898-0c0365430da9</guid>
      <link>https://share.transistor.fm/s/aa2b14d9</link>
      <description>
        <![CDATA[<p>Danijar Hafner on the DreamerV3 agent and world models, the Director agent and hierarchical RL, real-time RL on robots with DayDreamer, and his framework for unsupervised agent design! </p><p><a href="https://danijar.com/">Danijar Hafner</a> is a PhD candidate at the University of Toronto with Jimmy Ba, a visiting student at UC Berkeley with Pieter Abbeel, and an intern at DeepMind.  He was previously our guest on episode 11.  </p><p><br><strong>Featured References   <br></strong><br><a href="https://arxiv.org/abs/2301.04104v1">Mastering Diverse Domains through World Models</a> [ <a href="https://danijar.com/project/dreamerv3/">blog</a> ] DreamerV3 </p><p>Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap  </p><p><br><a href="https://arxiv.org/abs/2206.14176">DayDreamer: World Models for Physical Robot Learning</a> [ <a href="https://danijar.com/project/daydreamer/">blog</a> ]  <br>Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel  </p><p><a href="https://arxiv.org/abs/2206.04114">Deep Hierarchical Planning from Pixels</a> [ <a href="https://danijar.com/project/director/">blog</a> ]  <br>Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel   </p><p><a href="https://arxiv.org/abs/2009.01791">Action and Perception as Divergence Minimization</a> [ <a href="https://danijar.com/project/apd/">blog</a> ]  <br>Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://arxiv.org/abs/2010.02193">Mastering Atari with Discrete World Models</a> [ <a href="https://danijar.com/project/dreamerv2/">blog</a> ] DreamerV2 ; Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba  </li><li><a href="https://arxiv.org/abs/1912.01603">Dream to Control: Learning Behaviors by Latent Imagination</a> [ <a href="https://danijar.com/project/dreamer/">blog</a> ] Dreamer ; Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi  </li><li><a href="https://arxiv.org/abs/2005.05960">Planning to Explore via Self-Supervised World Models</a> ; Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak  </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Danijar Hafner on the DreamerV3 agent and world models, the Director agent and hierarchical RL, real-time RL on robots with DayDreamer, and his framework for unsupervised agent design! </p><p><a href="https://danijar.com/">Danijar Hafner</a> is a PhD candidate at the University of Toronto with Jimmy Ba, a visiting student at UC Berkeley with Pieter Abbeel, and an intern at DeepMind.  He was previously our guest on episode 11.  </p><p><br><strong>Featured References   <br></strong><br><a href="https://arxiv.org/abs/2301.04104v1">Mastering Diverse Domains through World Models</a> [ <a href="https://danijar.com/project/dreamerv3/">blog</a> ] DreamerV3 </p><p>Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap  </p><p><br><a href="https://arxiv.org/abs/2206.14176">DayDreamer: World Models for Physical Robot Learning</a> [ <a href="https://danijar.com/project/daydreamer/">blog</a> ]  <br>Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel  </p><p><a href="https://arxiv.org/abs/2206.04114">Deep Hierarchical Planning from Pixels</a> [ <a href="https://danijar.com/project/director/">blog</a> ]  <br>Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel   </p><p><a href="https://arxiv.org/abs/2009.01791">Action and Perception as Divergence Minimization</a> [ <a href="https://danijar.com/project/apd/">blog</a> ]  <br>Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://arxiv.org/abs/2010.02193">Mastering Atari with Discrete World Models</a> [ <a href="https://danijar.com/project/dreamerv2/">blog</a> ] DreamerV2 ; Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba  </li><li><a href="https://arxiv.org/abs/1912.01603">Dream to Control: Learning Behaviors by Latent Imagination</a> [ <a href="https://danijar.com/project/dreamer/">blog</a> ] Dreamer ; Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi  </li><li><a href="https://arxiv.org/abs/2005.05960">Planning to Explore via Self-Supervised World Models</a> ; Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak  </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Wed, 12 Apr 2023 01:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/aa2b14d9/fb56410f.mp3" length="43491890" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/W-LI9ACRw3oGu5wfAS_Vdnm4Y85BXLC-8_OPklI1UIQ/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8xMjFi/ZjYxY2YwODg4ODdm/OTIyZjFiOWJhYjEx/Y2UyMy53ZWJw.jpg"/>
      <itunes:duration>2715</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Danijar Hafner on the DreamerV3 agent and world models, the Director agent and hierarchical RL, real-time RL on robots with DayDreamer, and his framework for unsupervised agent design! </p><p><a href="https://danijar.com/">Danijar Hafner</a> is a PhD candidate at the University of Toronto with Jimmy Ba, a visiting student at UC Berkeley with Pieter Abbeel, and an intern at DeepMind.  He was previously our guest on episode 11.  </p><p><br><strong>Featured References   <br></strong><br><a href="https://arxiv.org/abs/2301.04104v1">Mastering Diverse Domains through World Models</a> [ <a href="https://danijar.com/project/dreamerv3/">blog</a> ] DreamerV3 </p><p>Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap  </p><p><br><a href="https://arxiv.org/abs/2206.14176">DayDreamer: World Models for Physical Robot Learning</a> [ <a href="https://danijar.com/project/daydreamer/">blog</a> ]  <br>Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel  </p><p><a href="https://arxiv.org/abs/2206.04114">Deep Hierarchical Planning from Pixels</a> [ <a href="https://danijar.com/project/director/">blog</a> ]  <br>Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel   </p><p><a href="https://arxiv.org/abs/2009.01791">Action and Perception as Divergence Minimization</a> [ <a href="https://danijar.com/project/apd/">blog</a> ]  <br>Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://arxiv.org/abs/2010.02193">Mastering Atari with Discrete World Models</a> [ <a href="https://danijar.com/project/dreamerv2/">blog</a> ] DreamerV2 ; Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba  </li><li><a href="https://arxiv.org/abs/1912.01603">Dream to Control: Learning Behaviors by Latent Imagination</a> [ <a href="https://danijar.com/project/dreamer/">blog</a> ] Dreamer ; Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi  </li><li><a href="https://arxiv.org/abs/2005.05960">Planning to Explore via Self-Supervised World Models</a> ; Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak  </li></ul><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/aa2b14d9/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>Jeff Clune</title>
      <itunes:episode>41</itunes:episode>
      <podcast:episode>41</podcast:episode>
      <itunes:title>Jeff Clune</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">6ab7ea99-e3cd-4736-9613-2a2cae98a233</guid>
      <link>https://share.transistor.fm/s/7690a12f</link>
      <description>
        <![CDATA[<p>AI-Generating Algorithms, learning to play Minecraft with Video PreTraining (VPT), Go-Explore for hard exploration, POET and open-endedness, AI-GAs and ChatGPT, AGI predictions, and lots more!  </p><p>Jeff Clune is an Associate Professor of Computer Science at the University of British Columbia, a Canada CIFAR AI Chair and Faculty Member at the Vector Institute, and a Senior Research Advisor at DeepMind.  </p><p><br><strong>Featured References  </strong></p><p><a href="https://arxiv.org/abs/2206.11795">Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos</a> [ <a href="https://openai.com/research/vpt">Blog Post</a> ] <br>Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune  </p><p><a href="https://www.nature.com/articles/nature14422">Robots that can adapt like animals</a> <br>Antoine Cully, Jeff Clune, Danesh Tarapore, Jean-Baptiste Mouret  <strong></strong></p><p><a href="https://arxiv.org/abs/1504.04909">Illuminating search spaces by mapping elites</a> <br>Jean-Baptiste Mouret, Jeff Clune  <br><strong><br></strong><a href="https://arxiv.org/abs/2003.08536">Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions <br></a>Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley  </p><p><a href="https://arxiv.org/abs/1901.01753">Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions</a> <br>Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley  </p><p><a href="https://www.nature.com/articles/s41586-020-03157-9">First return, then explore</a> <br>Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune <strong><br></strong><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>AI-Generating Algorithms, learning to play Minecraft with Video PreTraining (VPT), Go-Explore for hard exploration, POET and open-endedness, AI-GAs and ChatGPT, AGI predictions, and lots more!  </p><p>Jeff Clune is an Associate Professor of Computer Science at the University of British Columbia, a Canada CIFAR AI Chair and Faculty Member at the Vector Institute, and a Senior Research Advisor at DeepMind.  </p><p><br><strong>Featured References  </strong></p><p><a href="https://arxiv.org/abs/2206.11795">Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos</a> [ <a href="https://openai.com/research/vpt">Blog Post</a> ] <br>Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune  </p><p><a href="https://www.nature.com/articles/nature14422">Robots that can adapt like animals</a> <br>Antoine Cully, Jeff Clune, Danesh Tarapore, Jean-Baptiste Mouret  <strong></strong></p><p><a href="https://arxiv.org/abs/1504.04909">Illuminating search spaces by mapping elites</a> <br>Jean-Baptiste Mouret, Jeff Clune  <br><strong><br></strong><a href="https://arxiv.org/abs/2003.08536">Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions <br></a>Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley  </p><p><a href="https://arxiv.org/abs/1901.01753">Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions</a> <br>Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley  </p><p><a href="https://www.nature.com/articles/s41586-020-03157-9">First return, then explore</a> <br>Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune <strong><br></strong><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 27 Mar 2023 07:32:53 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/7690a12f/c23e8b03.mp3" length="51310017" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/nAYouzo0oE7eKoTH2NbvX8eG7UWCRtkWKmkoFZIcDiE/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzEyNTIxMDkv/MTY3OTEwMjg4Mi1h/cnR3b3JrLmpwZw.jpg"/>
      <itunes:duration>4271</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>AI-Generating Algorithms, learning to play Minecraft with Video PreTraining (VPT), Go-Explore for hard exploration, POET and open-endedness, AI-GAs and ChatGPT, AGI predictions, and lots more!  </p><p>Jeff Clune is an Associate Professor of Computer Science at the University of British Columbia, a Canada CIFAR AI Chair and Faculty Member at the Vector Institute, and a Senior Research Advisor at DeepMind.  </p><p><br><strong>Featured References  </strong></p><p><a href="https://arxiv.org/abs/2206.11795">Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos</a> [ <a href="https://openai.com/research/vpt">Blog Post</a> ] <br>Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune  </p><p><a href="https://www.nature.com/articles/nature14422">Robots that can adapt like animals</a> <br>Antoine Cully, Jeff Clune, Danesh Tarapore, Jean-Baptiste Mouret  <strong></strong></p><p><a href="https://arxiv.org/abs/1504.04909">Illuminating search spaces by mapping elites</a> <br>Jean-Baptiste Mouret, Jeff Clune  <br><strong><br></strong><a href="https://arxiv.org/abs/2003.08536">Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions <br></a>Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley  </p><p><a href="https://arxiv.org/abs/1901.01753">Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions</a> <br>Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley  </p><p><a href="https://www.nature.com/articles/s41586-020-03157-9">First return, then explore</a> <br>Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune <strong><br></strong><br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/7690a12f/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>Natasha Jaques 2</title>
      <itunes:episode>40</itunes:episode>
      <podcast:episode>40</podcast:episode>
      <itunes:title>Natasha Jaques 2</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">98a5077c-e977-426a-a508-3a99115d8ec8</guid>
      <link>https://share.transistor.fm/s/a18817da</link>
      <description>
        <![CDATA[<p>Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!  </p><p>Dr Natasha Jaques is a Senior Research Scientist at Google Brain. </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/1907.00456">Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog <br></a>Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard  <br><strong><br></strong><a href="https://arxiv.org/abs/1611.02796">Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control <br></a>Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck  <br><strong><br></strong><a href="https://arxiv.org/abs/2102.12">PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning <br></a>Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar  <strong></strong></p><p><a href="https://arxiv.org/abs/2208.04919">Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience <br></a>Marwa Abdulhai, Natasha Jaques, Sergey Levine  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://arxiv.org/abs/1909.08593">Fine-Tuning Language Models from Human Preferences</a>, Daniel M. Ziegler et al 2019  </li><li><a href="https://arxiv.org/abs/2009.01325">Learning to summarize from human feedback</a>, Nisan Stiennon et al 2020  </li><li><a href="https://arxiv.org/abs/2203.02155">Training language models to follow instructions with human feedback</a>, Long Ouyang et al 2022  </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!  </p><p>Dr Natasha Jaques is a Senior Research Scientist at Google Brain. </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/1907.00456">Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog <br></a>Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard  <br><strong><br></strong><a href="https://arxiv.org/abs/1611.02796">Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control <br></a>Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck  <br><strong><br></strong><a href="https://arxiv.org/abs/2102.12">PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning <br></a>Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar  <strong></strong></p><p><a href="https://arxiv.org/abs/2208.04919">Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience <br></a>Marwa Abdulhai, Natasha Jaques, Sergey Levine  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://arxiv.org/abs/1909.08593">Fine-Tuning Language Models from Human Preferences</a>, Daniel M. Ziegler et al 2019  </li><li><a href="https://arxiv.org/abs/2009.01325">Learning to summarize from human feedback</a>, Nisan Stiennon et al 2020  </li><li><a href="https://arxiv.org/abs/2203.02155">Training language models to follow instructions with human feedback</a>, Long Ouyang et al 2022  </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 13 Mar 2023 23:34:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/a18817da/6d4200e9.mp3" length="33169094" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/HxKPDB95beUYXQ2lORiH85GoIEDbiZOsrG2YgokCzmo/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzEyNDMxNjkv/MTY3ODY4MTk4NC1h/cnR3b3JrLmpwZw.jpg"/>
      <itunes:duration>2762</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!  </p><p>Dr Natasha Jaques is a Senior Research Scientist at Google Brain. </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/1907.00456">Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog <br></a>Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard  <br><strong><br></strong><a href="https://arxiv.org/abs/1611.02796">Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control <br></a>Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck  <br><strong><br></strong><a href="https://arxiv.org/abs/2102.12">PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning <br></a>Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar  <strong></strong></p><p><a href="https://arxiv.org/abs/2208.04919">Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience <br></a>Marwa Abdulhai, Natasha Jaques, Sergey Levine  </p><p><br><strong>Additional References  </strong></p><ul><li><a href="https://arxiv.org/abs/1909.08593">Fine-Tuning Language Models from Human Preferences</a>, Daniel M. Ziegler et al 2019  </li><li><a href="https://arxiv.org/abs/2009.01325">Learning to summarize from human feedback</a>, Nisan Stiennon et al 2020  </li><li><a href="https://arxiv.org/abs/2203.02155">Training language models to follow instructions with human feedback</a>, Long Ouyang et al 2022  </li></ul><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/a18817da/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>Jacob Beck and Risto Vuorio</title>
      <itunes:episode>39</itunes:episode>
      <podcast:episode>39</podcast:episode>
      <itunes:title>Jacob Beck and Risto Vuorio</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">a1467229-3ab7-449f-804d-5a263ea4e033</guid>
      <link>https://share.transistor.fm/s/764dcaaa</link>
      <description>
        <![CDATA[<p>Jacob Beck and Risto Vuorio on their recent Survey of Meta-Reinforcement Learning.  Jacob and Risto are PhD students in the Whiteson Research Lab at the University of Oxford.    </p><p><br><strong>Featured Reference   </strong></p><p><br><a href="https://arxiv.org/abs/2301.08028"><strong>A Survey of Meta-Reinforcement Learning<br></strong></a>Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson   </p><p><br><strong>Additional References  <br></strong><br></p><ul><li><a href="https://arxiv.org/abs/1910.08348">VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning</a>, Luisa Zintgraf et al  </li><li><a href="https://arxiv.org/abs/2301.04104v1">Mastering Diverse Domains through World Models</a> (DreamerV3), Hafner et al    </li><li><a href="https://arxiv.org/abs/1806.04640">Unsupervised Meta-Learning for Reinforcement Learning</a> (MAML), Gupta et al  </li><li><a href="https://arxiv.org/abs/2008.02790">Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices</a> (DREAM), Liu et al  </li><li><a href="https://arxiv.org/abs/1611.02779">RL2: Fast Reinforcement Learning via Slow Reinforcement Learning</a>, Duan et al  </li><li><a href="https://arxiv.org/abs/1611.05763">Learning to reinforcement learn</a>, Wang et al  </li></ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Jacob Beck and Risto Vuorio on their recent Survey of Meta-Reinforcement Learning.  Jacob and Risto are PhD students in the Whiteson Research Lab at the University of Oxford.    </p><p><br><strong>Featured Reference   </strong></p><p><br><a href="https://arxiv.org/abs/2301.08028"><strong>A Survey of Meta-Reinforcement Learning<br></strong></a>Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson   </p><p><br><strong>Additional References  <br></strong><br></p><ul><li><a href="https://arxiv.org/abs/1910.08348">VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning</a>, Luisa Zintgraf et al  </li><li><a href="https://arxiv.org/abs/2301.04104v1">Mastering Diverse Domains through World Models</a> (DreamerV3), Hafner et al    </li><li><a href="https://arxiv.org/abs/1806.04640">Unsupervised Meta-Learning for Reinforcement Learning</a> (MAML), Gupta et al  </li><li><a href="https://arxiv.org/abs/2008.02790">Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices</a> (DREAM), Liu et al  </li><li><a href="https://arxiv.org/abs/1611.02779">RL2: Fast Reinforcement Learning via Slow Reinforcement Learning</a>, Duan et al  </li><li><a href="https://arxiv.org/abs/1611.05763">Learning to reinforcement learn</a>, Wang et al  </li></ul>]]>
      </content:encoded>
      <pubDate>Tue, 07 Mar 2023 08:19:22 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/764dcaaa/00fe7176.mp3" length="48370569" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/ZXHqYUi1_4nqL4m2yudIAA-2qymWtP4jVkpLJOtJepQ/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzEyMzY1MDEv/MTY3ODIxODMzMS1h/cnR3b3JrLmpwZw.jpg"/>
      <itunes:duration>4025</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Jacob Beck and Risto Vuorio on their recent Survey of Meta-Reinforcement Learning.  Jacob and Risto are PhD students in the Whiteson Research Lab at the University of Oxford.    </p><p><br><strong>Featured Reference   </strong></p><p><br><a href="https://arxiv.org/abs/2301.08028"><strong>A Survey of Meta-Reinforcement Learning<br></strong></a>Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson   </p><p><br><strong>Additional References  <br></strong><br></p><ul><li><a href="https://arxiv.org/abs/1910.08348">VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning</a>, Luisa Zintgraf et al  </li><li><a href="https://arxiv.org/abs/2301.04104v1">Mastering Diverse Domains through World Models</a> (DreamerV3), Hafner et al    </li><li><a href="https://arxiv.org/abs/1806.04640">Unsupervised Meta-Learning for Reinforcement Learning</a> (MAML), Gupta et al  </li><li><a href="https://arxiv.org/abs/2008.02790">Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices</a> (DREAM), Liu et al  </li><li><a href="https://arxiv.org/abs/1611.02779">RL2: Fast Reinforcement Learning via Slow Reinforcement Learning</a>, Duan et al  </li><li><a href="https://arxiv.org/abs/1611.05763">Learning to reinforcement learn</a>, Wang et al  </li></ul>]]>
      </itunes:summary>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/764dcaaa/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>John Schulman</title>
      <itunes:episode>38</itunes:episode>
      <podcast:episode>38</podcast:episode>
      <itunes:title>John Schulman</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">8084b1ea-bc8a-4f8c-b81e-bdee03ded672</guid>
      <link>https://share.transistor.fm/s/2bfa4dc4</link>
      <description>
        <![CDATA[<p><a href="http://joschu.net/">John Schulman</a> is a cofounder of OpenAI, and currently a researcher and engineer at OpenAI.</p><p><br><strong>Featured References</strong></p><p><a href="https://arxiv.org/abs/2112.09332">WebGPT: Browser-assisted question-answering with human feedback</a><br>Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman</p><p><a href="https://arxiv.org/abs/2203.02155">Training language models to follow instructions with human feedback<br></a>Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe</p><p><strong>Additional References</strong></p><ul><li><a href="https://openai.com/blog/our-approach-to-alignment-research/">Our approach to alignment research</a>, OpenAI 2022</li><li><a href="https://arxiv.org/abs/2110.14168">Training Verifiers to Solve Math Word Problems</a>, Cobbe et al 2021</li><li><a href="https://www.youtube.com/watch?v=8EcdaCk9KaQ">UC Berkeley Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation</a>, John Schulman 2017</li><li><a href="https://arxiv.org/abs/1707.06347">Proximal Policy Optimization Algorithms</a>, Schulman 2017</li><li><a href="https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-217.html">Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs</a>, Schulman 2016</li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="http://joschu.net/">John Schulman</a> is a cofounder of OpenAI, and currently a researcher and engineer at OpenAI.</p><p><br><strong>Featured References</strong></p><p><a href="https://arxiv.org/abs/2112.09332">WebGPT: Browser-assisted question-answering with human feedback</a><br>Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman</p><p><a href="https://arxiv.org/abs/2203.02155">Training language models to follow instructions with human feedback<br></a>Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe</p><p><strong>Additional References</strong></p><ul><li><a href="https://openai.com/blog/our-approach-to-alignment-research/">Our approach to alignment research</a>, OpenAI 2022</li><li><a href="https://arxiv.org/abs/2110.14168">Training Verifiers to Solve Math Word Problems</a>, Cobbe et al 2021</li><li><a href="https://www.youtube.com/watch?v=8EcdaCk9KaQ">UC Berkeley Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation</a>, John Schulman 2017</li><li><a href="https://arxiv.org/abs/1707.06347">Proximal Policy Optimization Algorithms</a>, Schulman 2017</li><li><a href="https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-217.html">Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs</a>, Schulman 2016</li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Tue, 18 Oct 2022 01:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/2bfa4dc4/b29b0c00.mp3" length="32004375" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/2zpjk6F5CKd3wY3XtNHEXRXvM8MMzqqvYQr7-tl1rzY/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzEwNTc0MjYv/MTY2NTk3OTY5Ny1h/cnR3b3JrLmpwZw.jpg"/>
      <itunes:duration>2661</itunes:duration>
      <itunes:summary>John Schulman, OpenAI cofounder and researcher, inventor of PPO/TRPO talks RL from human feedback, tuning GPT-3 to follow instructions (InstructGPT) and answer long-form questions using the internet (WebGPT), AI alignment, AGI timelines, and more!</itunes:summary>
      <itunes:subtitle>John Schulman, OpenAI cofounder and researcher, inventor of PPO/TRPO talks RL from human feedback, tuning GPT-3 to follow instructions (InstructGPT) and answer long-form questions using the internet (WebGPT), AI alignment, AGI timelines, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/2bfa4dc4/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>Sven Mika</title>
      <itunes:episode>37</itunes:episode>
      <podcast:episode>37</podcast:episode>
      <itunes:title>Sven Mika</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">516fe124-63cb-4547-b918-731b708ccd16</guid>
      <link>https://share.transistor.fm/s/367e37c3</link>
      <description>
        <![CDATA[<p>Sven Mika is the Reinforcement Learning Team Lead at Anyscale, and lead committer of RLlib. He holds a PhD in biomathematics, bioinformatics, and computational biology from Witten/Herdecke University. </p><p><br><strong>Featured References</strong></p><p><a href="https://docs.ray.io/en/latest/rllib/index.html">RLlib Documentation: RLlib: Industry-Grade Reinforcement Learning<br></a><br><a href="https://docs.ray.io/en/latest/">Ray: Documentation</a></p><p><a href="https://arxiv.org/abs/1712.09381">RLlib: Abstractions for Distributed Reinforcement Learning</a><br>Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica</p><p><br><strong>Episode sponsor: </strong><a href="https://anyscale.com/"><strong>Anyscale</strong></a><strong><br></strong><br><a href="http://raysummit.org/">Ray Summit 2022</a> is coming to San Francisco on August 23-24.<br>Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.</p><p>Register at <a href="http://raysummit.org/">raysummit.org</a> and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.</p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Sven Mika is the Reinforcement Learning Team Lead at Anyscale, and lead committer of RLlib. He holds a PhD in biomathematics, bioinformatics, and computational biology from Witten/Herdecke University. </p><p><br><strong>Featured References</strong></p><p><a href="https://docs.ray.io/en/latest/rllib/index.html">RLlib Documentation: RLlib: Industry-Grade Reinforcement Learning<br></a><br><a href="https://docs.ray.io/en/latest/">Ray: Documentation</a></p><p><a href="https://arxiv.org/abs/1712.09381">RLlib: Abstractions for Distributed Reinforcement Learning</a><br>Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica</p><p><br><strong>Episode sponsor: </strong><a href="https://anyscale.com/"><strong>Anyscale</strong></a><strong><br></strong><br><a href="http://raysummit.org/">Ray Summit 2022</a> is coming to San Francisco on August 23-24.<br>Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.</p><p>Register at <a href="http://raysummit.org/">raysummit.org</a> and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.</p>]]>
      </content:encoded>
      <pubDate>Thu, 18 Aug 2022 22:11:33 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/367e37c3/0b59214b.mp3" length="29432860" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/CP2JpZVBKxEa6PMeYc0_dt7NwsdZySjrOkgyO399rzo/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzk5MzU0OC8x/NjYwODYxODgzLWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>2096</itunes:duration>
      <itunes:summary>Sven Mika of Anyscale on RLlib present and future, Ray and Ray Summit 2022, applied RL in Games / Finance / RecSys, and more!</itunes:summary>
      <itunes:subtitle>Sven Mika of Anyscale on RLlib present and future, Ray and Ray Summit 2022, applied RL in Games / Finance / RecSys, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/367e37c3/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/367e37c3/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/367e37c3/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/367e37c3/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/367e37c3/transcription" type="text/html"/>
    </item>
    <item>
      <title>Karol Hausman and Fei Xia</title>
      <itunes:episode>36</itunes:episode>
      <podcast:episode>36</podcast:episode>
      <itunes:title>Karol Hausman and Fei Xia</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">07343fc5-6d05-4100-9b58-1452141064f4</guid>
      <link>https://share.transistor.fm/s/a911824e</link>
      <description>
        <![CDATA[<p>Karol Hausman is a Senior Research Scientist at Google Brain and an Adjunct Professor at Stanford, working on robotics and machine learning. Karol is interested in enabling robots to acquire general-purpose skills with minimal supervision in real-world environments.</p><p>Fei Xia is a Research Scientist at Google Research. He is mostly interested in robot learning in complex and unstructured environments. He has previously approached this problem by learning in realistic and scalable simulation environments (GibsonEnv, iGibson). Most recently, he has been exploring the use of foundation models for these challenges.</p><p><strong>Featured References<br></strong><br><a href="https://arxiv.org/abs/2204.01691">Do As I Can, Not As I Say: Grounding Language in Robotic Affordances</a> [ <a href="https://say-can.github.io/">website</a> ] <br> Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan</p><p><a href="https://arxiv.org/abs/2207.05608">Inner Monologue: Embodied Reasoning through Planning with Language Models</a><br>Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter</p><p><strong>Additional References</strong></p><ul><li><a href="https://searchworks.stanford.edu/view/13972041">Large-scale simulation for embodied perception and robot learning</a>, Xia 2021</li><li><a href="https://arxiv.org/abs/1806.10293">QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation</a>, Kalashnikov et al 2018</li><li><a href="https://arxiv.org/abs/2104.08212">MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale</a>, Kalashnikov et al 2021</li><li><a href="https://arxiv.org/abs/2008.07792">ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation</a>, Xia et al 2020</li><li><a href="https://arxiv.org/abs/2104.07749">Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills</a>, Chebotar et al 2021</li><li><a href="https://arxiv.org/abs/2204.00598">Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language</a>, Zeng et al 2022</li></ul><p><br><strong>Episode sponsor: </strong><a href="https://anyscale.com/"><strong>Anyscale</strong></a><strong><br></strong><br><a href="http://raysummit.org/">Ray Summit 2022</a> is coming to San Francisco on August 23-24.<br>Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.</p><p>Register at <a href="http://raysummit.org/">raysummit.org</a> and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.</p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Karol Hausman is a Senior Research Scientist at Google Brain and an Adjunct Professor at Stanford, working on robotics and machine learning. Karol is interested in enabling robots to acquire general-purpose skills with minimal supervision in real-world environments.</p><p>Fei Xia is a Research Scientist at Google Research. He is mostly interested in robot learning in complex and unstructured environments. He has previously approached this problem by learning in realistic and scalable simulation environments (GibsonEnv, iGibson). Most recently, he has been exploring the use of foundation models for these challenges.</p><p><strong>Featured References<br></strong><br><a href="https://arxiv.org/abs/2204.01691">Do As I Can, Not As I Say: Grounding Language in Robotic Affordances</a> [ <a href="https://say-can.github.io/">website</a> ] <br> Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan</p><p><a href="https://arxiv.org/abs/2207.05608">Inner Monologue: Embodied Reasoning through Planning with Language Models</a><br>Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter</p><p><strong>Additional References</strong></p><ul><li><a href="https://searchworks.stanford.edu/view/13972041">Large-scale simulation for embodied perception and robot learning</a>, Xia 2021</li><li><a href="https://arxiv.org/abs/1806.10293">QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation</a>, Kalashnikov et al 2018</li><li><a href="https://arxiv.org/abs/2104.08212">MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale</a>, Kalashnikov et al 2021</li><li><a href="https://arxiv.org/abs/2008.07792">ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation</a>, Xia et al 2020</li><li><a href="https://arxiv.org/abs/2104.07749">Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills</a>, Chebotar et al 2021</li><li><a href="https://arxiv.org/abs/2204.00598">Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language</a>, Zeng et al 2022</li></ul><p><br><strong>Episode sponsor: </strong><a href="https://anyscale.com/"><strong>Anyscale</strong></a><strong><br></strong><br><a href="http://raysummit.org/">Ray Summit 2022</a> is coming to San Francisco on August 23-24.<br>Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.</p><p>Register at <a href="http://raysummit.org/">raysummit.org</a> and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.</p>]]>
      </content:encoded>
      <pubDate>Tue, 16 Aug 2022 12:05:30 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/a911824e/a83f417f.mp3" length="53113518" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/cTV3neF5HHZijEhqtxjw_bdldBzYqAyuLMsr6iPM19U/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzk5MDgxNy8x/NjYwNjY1MDUxLWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>3789</itunes:duration>
      <itunes:summary>Karol Hausman and Fei Xia of Google Research on newly updated (PaLM-)SayCan, Inner Monologue, robot learning, combining robotics with language models, and more!</itunes:summary>
      <itunes:subtitle>Karol Hausman and Fei Xia of Google Research on newly updated (PaLM-)SayCan, Inner Monologue, robot learning, combining robotics with language models, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/a911824e/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/a911824e/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/a911824e/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/a911824e/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/a911824e/transcription" type="text/html"/>
    </item>
    <item>
      <title>Sai Krishna Gottipati</title>
      <itunes:episode>35</itunes:episode>
      <podcast:episode>35</podcast:episode>
      <itunes:title>Sai Krishna Gottipati</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">d15f27de-681e-4030-81b7-737d57b85228</guid>
      <link>https://share.transistor.fm/s/b803a301</link>
      <description>
        <![CDATA[<p>Sai Krishna Gottipati is an RL Researcher at AI Redefined, working on RL, MARL, and human-in-the-loop learning.</p><p><strong>Featured References</strong></p><p><a href="https://arxiv.org/abs/2106.11345">Cogment: Open Source Framework For Distributed Multi-actor Training, Deployment &amp; Operations<strong><br></strong></a>AI Redefined, Sai Krishna Gottipati, Sagar Kurandwad, Clodéric Mars, Gregory Szriftgiser, François Chabot</p><p><strong>Do As You Teach: A Multi-Teacher Approach to Self-Play in Deep Reinforcement Learning<br></strong>Currently under review</p><p><a href="http://proceedings.mlr.press/v119/gottipati20a/gottipati20a.pdf">Learning to navigate the synthetically accessible chemical space using reinforcement learning<br></a>Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Karam J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, Yoshua Bengio</p><p><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/2101.04882">Asymmetric self-play for automatic goal discovery in robotic manipulation</a>, OpenAI et al 2021</li><li><a href="https://arxiv.org/abs/2103.03216">Continuous Coordination As a Realistic Scenario for Lifelong Learning</a>, Nekoei et al 2021</li></ul><p><strong>Episode sponsor: </strong><a href="https://anyscale.com/"><strong>Anyscale</strong></a><strong><br></strong><br><a href="http://raysummit.org">Ray Summit 2022</a> is coming to San Francisco on August 23-24.<br>Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.</p><p>Register at <a href="http://raysummit.org/">raysummit.org</a> and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.</p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Sai Krishna Gottipati is an RL Researcher at AI Redefined, working on RL, MARL, and human-in-the-loop learning.</p><p><strong>Featured References</strong></p><p><a href="https://arxiv.org/abs/2106.11345">Cogment: Open Source Framework For Distributed Multi-actor Training, Deployment &amp; Operations<strong><br></strong></a>AI Redefined, Sai Krishna Gottipati, Sagar Kurandwad, Clodéric Mars, Gregory Szriftgiser, François Chabot</p><p><strong>Do As You Teach: A Multi-Teacher Approach to Self-Play in Deep Reinforcement Learning<br></strong>Currently under review</p><p><a href="http://proceedings.mlr.press/v119/gottipati20a/gottipati20a.pdf">Learning to navigate the synthetically accessible chemical space using reinforcement learning<br></a>Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Karam J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, Yoshua Bengio</p><p><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/2101.04882">Asymmetric self-play for automatic goal discovery in robotic manipulation</a>, OpenAI et al 2021</li><li><a href="https://arxiv.org/abs/2103.03216">Continuous Coordination As a Realistic Scenario for Lifelong Learning</a>, Nekoei et al 2021</li></ul><p><strong>Episode sponsor: </strong><a href="https://anyscale.com/"><strong>Anyscale</strong></a><strong><br></strong><br><a href="http://raysummit.org">Ray Summit 2022</a> is coming to San Francisco on August 23-24.<br>Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.</p><p>Register at <a href="http://raysummit.org/">raysummit.org</a> and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.</p>]]>
      </content:encoded>
      <pubDate>Sun, 31 Jul 2022 19:41:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/b803a301/db38180a.mp3" length="49122326" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/_bIAc3G1dCWZbxaij83djobwCfW5hNtonuo9qmizGww/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzk2OTQxNC8x/NjU5MzIxNjc5LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>4091</itunes:duration>
      <itunes:summary>Sai Krishna Gottipati of AI Redefined on RL for synthesizable drug discovery, Multi-Teacher Self-Play, Cogment framework for realtime multi-actor RL, AI + Chess, and more!</itunes:summary>
      <itunes:subtitle>Sai Krishna Gottipati of AI Redefined on RL for synthesizable drug discovery, Multi-Teacher Self-Play, Cogment framework for realtime multi-actor RL, AI + Chess, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/b803a301/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/b803a301/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/b803a301/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/b803a301/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/b803a301/transcription" type="text/html"/>
    </item>
    <item>
      <title>Aravind Srinivas 2</title>
      <itunes:episode>34</itunes:episode>
      <podcast:episode>34</podcast:episode>
      <itunes:title>Aravind Srinivas 2</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">d8efecad-d868-497c-bfd2-a14a02f5bcdd</guid>
      <link>https://share.transistor.fm/s/cb13a30d</link>
      <description>
        <![CDATA[<p>Aravind Srinivas is back! He is now a Research Scientist at OpenAI.</p><p><strong>Featured References</strong></p><p><a href="https://arxiv.org/abs/2106.01345">Decision Transformer: Reinforcement Learning via Sequence Modeling</a><br>Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch</p><p><a href="https://arxiv.org/abs/2104.10157">VideoGPT: Video Generation using VQ-VAE and Transformers</a><br>Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas</p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Aravind Srinivas is back! He is now a Research Scientist at OpenAI.</p><p><strong>Featured References</strong></p><p><a href="https://arxiv.org/abs/2106.01345">Decision Transformer: Reinforcement Learning via Sequence Modeling</a><br>Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch</p><p><a href="https://arxiv.org/abs/2104.10157">VideoGPT: Video Generation using VQ-VAE and Transformers</a><br>Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas</p>]]>
      </content:encoded>
      <pubDate>Sun, 08 May 2022 21:41:04 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/cb13a30d/98e58583.mp3" length="42244463" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/CKVpGkAHOncL8Geldtww-TATXjcsoCjIBtXvaUFqBds/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzg3OTg5Mi8x/NjUyMTEwNDU0LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>3513</itunes:duration>
      <itunes:summary>Aravind Srinivas, Research Scientist at OpenAI, returns to talk Decision Transformer, VideoGPT, choosing problems, and explore vs exploit in research careers</itunes:summary>
      <itunes:subtitle>Aravind Srinivas, Research Scientist at OpenAI, returns to talk Decision Transformer, VideoGPT, choosing problems, and explore vs exploit in research careers</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Rohin Shah</title>
      <itunes:episode>33</itunes:episode>
      <podcast:episode>33</podcast:episode>
      <itunes:title>Rohin Shah</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">4a9ae02a-1ee4-45ec-9ebe-b50011c2d3a8</guid>
      <link>https://share.transistor.fm/s/5ba5d6af</link>
      <description>
        <![CDATA[<p>Dr. Rohin Shah is a Research Scientist at DeepMind, and the editor and main contributor of the Alignment Newsletter.</p><p><strong>Featured References</strong></p><p><a href="https://arxiv.org/abs/2107.01969">The MineRL BASALT Competition on Learning from Human Feedback</a><br>Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan</p><p><a href="https://arxiv.org/abs/1902.04198">Preferences Implicit in the State of the World</a><br>Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan</p><p><a href="https://openreview.net/forum?id=DFIoGDZejIB">Benefits of Assistance over Reward Learning</a> <br>Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell</p><p><a href="https://arxiv.org/abs/1910.05789">On the Utility of Learning about Humans for Human-AI Coordination<br></a>Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan</p><p><a href="https://arxiv.org/abs/2101.05507">Evaluating the Robustness of Collaborative Agents<br></a>Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah</p><p><br><strong>Additional References</strong></p><ul><li><a href="https://www.eacambridge.org/technical-alignment-curriculum">AGI Safety Fundamentals</a>, EA Cambridge</li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Dr. Rohin Shah is a Research Scientist at DeepMind, and the editor and main contributor of the Alignment Newsletter.</p><p><strong>Featured References</strong></p><p><a href="https://arxiv.org/abs/2107.01969">The MineRL BASALT Competition on Learning from Human Feedback</a><br>Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan</p><p><a href="https://arxiv.org/abs/1902.04198">Preferences Implicit in the State of the World</a><br>Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan</p><p><a href="https://openreview.net/forum?id=DFIoGDZejIB">Benefits of Assistance over Reward Learning</a> <br>Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell</p><p><a href="https://arxiv.org/abs/1910.05789">On the Utility of Learning about Humans for Human-AI Coordination<br></a>Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan</p><p><a href="https://arxiv.org/abs/2101.05507">Evaluating the Robustness of Collaborative Agents<br></a>Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah</p><p><br><strong>Additional References</strong></p><ul><li><a href="https://www.eacambridge.org/technical-alignment-curriculum">AGI Safety Fundamentals</a>, EA Cambridge</li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 11 Apr 2022 19:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/5ba5d6af/41bb0ae3.mp3" length="81641978" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/CXFzvE3_-lBMeUAb_mgJN28gy8lgczaz-Qh_-4y7L9M/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzg0MjgzNS8x/NjQ4MzU3Njg1LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>5824</itunes:duration>
      <itunes:summary>DeepMind Research Scientist Dr. Rohin Shah on Value Alignment, Learning from Human feedback, Assistance paradigm, the BASALT MineRL competition, his Alignment Newsletter, and more!</itunes:summary>
      <itunes:subtitle>DeepMind Research Scientist Dr. Rohin Shah on Value Alignment, Learning from Human feedback, Assistance paradigm, the BASALT MineRL competition, his Alignment Newsletter, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/5ba5d6af/transcript.vtt" type="text/vtt" rel="captions"/>
    </item>
    <item>
      <title>Robert Lange</title>
      <itunes:episode>31</itunes:episode>
      <podcast:episode>31</podcast:episode>
      <itunes:title>Robert Lange</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">1411e95d-30fa-41ba-928c-d87162090ac9</guid>
      <link>https://share.transistor.fm/s/935a12e6</link>
      <description>
        <![CDATA[<p><a href="https://roberttlange.github.io/">Robert Tjarko Lange</a> is a PhD student working at the Technical University Berlin.</p><p><strong>Featured References</strong></p><p><a href="https://arxiv.org/abs/2010.04466">Learning not to learn: Nature versus nurture in silico</a><br>Lange, R. T., &amp; Sprekeler, H. (2020)</p><p><a href="https://arxiv.org/abs/2105.01648">On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning<br></a>Vischer, M. A., Lange, R. T., &amp; Sprekeler, H. (2021). </p><p><a href="https://arxiv.org/abs/1907.12477">Semantic RL with Action Grammars: Data-Efficient Learning of Hierarchical Task Abstractions<br></a>Lange, R. T., &amp; Faisal, A. (2019).</p><p><a href="https://github.com/mle-infrastructure">MLE-Infrastructure on Github</a><br><strong><br></strong><br><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/1611.02779">RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning</a>, Duan et al 2016</li><li><a href="https://arxiv.org/abs/1611.05763">Learning to reinforcement learn</a>, Wang et al 2016</li><li><a href="https://arxiv.org/abs/2106.01345">Decision Transformer: Reinforcement Learning via Sequence Modeling</a>, Chen et al 2021</li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://roberttlange.github.io/">Robert Tjarko Lange</a> is a PhD student working at the Technical University Berlin.</p><p><strong>Featured References</strong></p><p><a href="https://arxiv.org/abs/2010.04466">Learning not to learn: Nature versus nurture in silico</a><br>Lange, R. T., &amp; Sprekeler, H. (2020)</p><p><a href="https://arxiv.org/abs/2105.01648">On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning<br></a>Vischer, M. A., Lange, R. T., &amp; Sprekeler, H. (2021). </p><p><a href="https://arxiv.org/abs/1907.12477">Semantic RL with Action Grammars: Data-Efficient Learning of Hierarchical Task Abstractions<br></a>Lange, R. T., &amp; Faisal, A. (2019).</p><p><a href="https://github.com/mle-infrastructure">MLE-Infrastructure on Github</a><br><strong><br></strong><br><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/1611.02779">RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning</a>, Duan et al 2016</li><li><a href="https://arxiv.org/abs/1611.05763">Learning to reinforcement learn</a>, Wang et al 2016</li><li><a href="https://arxiv.org/abs/2106.01345">Decision Transformer: Reinforcement Learning via Sequence Modeling</a>, Chen et al 2021</li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 20 Dec 2021 01:00:00 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/935a12e6/8a440dc1.mp3" length="51111130" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/o6vEYiEWJJtdrLvYg6u6XQG76SQ_DrLkGfNP6VceQa8/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzc1NDIwMS8x/NjM5NzIyNzc5LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>4257</itunes:duration>
      <itunes:summary>Robert Lange on learning vs hard-coding, meta-RL, Lottery Tickets and Minimal Task Representations, Action Grammars and more!</itunes:summary>
      <itunes:subtitle>Robert Lange on learning vs hard-coding, meta-RL, Lottery Tickets and Minimal Task Representations, Action Grammars and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>NeurIPS 2021 Political Economy of Reinforcement Learning Systems (PERLS) Workshop</title>
      <itunes:episode>30</itunes:episode>
      <podcast:episode>30</podcast:episode>
      <itunes:title>NeurIPS 2021 Political Economy of Reinforcement Learning Systems (PERLS) Workshop</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">54a58df0-1bbb-4841-9b1d-f335eb565ee2</guid>
      <link>https://share.transistor.fm/s/3d58a0b7</link>
      <description>
        <![CDATA[<p>We hear about the idea behind PERLS and why it's an important topic to discuss.</p><ul><li><a href="https://perls-workshop.github.io/">Political Economy of Reinforcement Learning (PERLS) Workshop at NeurIPS 2021</a> on Tues Dec 14th</li><li><a href="https://neurips.cc/">NeurIPS 2021</a></li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>We hear about the idea behind PERLS and why it's an important topic to discuss.</p><ul><li><a href="https://perls-workshop.github.io/">Political Economy of Reinforcement Learning (PERLS) Workshop at NeurIPS 2021</a> on Tues Dec 14th</li><li><a href="https://neurips.cc/">NeurIPS 2021</a></li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Thu, 18 Nov 2021 15:53:22 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/3d58a0b7/dc66961c.mp3" length="17447343" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/keXi0yQqgzprTPNOhXUQCh1276PQ0F4XxqUEfJg0Cjs/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzcyNzU4MS8x/NjM3NjA2MTMyLWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>1447</itunes:duration>
      <itunes:summary>Dr. Thomas Gilbert and Dr. Mark Nitzberg on the upcoming PERLS Workshop @ NeurIPS 2021</itunes:summary>
      <itunes:subtitle>Dr. Thomas Gilbert and Dr. Mark Nitzberg on the upcoming PERLS Workshop @ NeurIPS 2021</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Amy Zhang</title>
      <itunes:episode>29</itunes:episode>
      <podcast:episode>29</podcast:episode>
      <itunes:title>Amy Zhang</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">1017dad7-fe28-4d9f-a9e0-78aaf1335e13</guid>
      <link>https://share.transistor.fm/s/069ca161</link>
      <description>
        <![CDATA[<p><a href="https://amyzhang.github.io/">Amy Zhang</a> is a postdoctoral scholar at UC Berkeley and a research scientist at Facebook AI Research. She will be starting as an assistant professor at UT Austin in Spring 2023. </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/2003.06016">Invariant Causal Prediction for Block MDPs</a> <br>Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup </p><p><a href="http://proceedings.mlr.press/v139/sodhani21a/sodhani21a.pdf">Multi-Task Reinforcement Learning with Context-based Representations</a> <br>Shagun Sodhani, Amy Zhang, Joelle Pineau </p><p><a href="https://arxiv.org/abs/2104.10159">MBRL-Lib: A Modular Library for Model-based Reinforcement Learning</a> <br>Luis Pineda, Brandon Amos, Amy Zhang, Nathan O. Lambert, Roberto Calandra <strong></strong></p><p><br>Additional References </p><ul><li><a href="https://www.youtube.com/watch?v=akeUVn6WQoU%20%20">Amy Zhang - Exploring Context for Better Generalization in Reinforcement Learning @ UCL DARK</a> </li><li><a href="https://icml.cc/virtual/2020/poster/6475">ICML 2020 Poster session: Invariant Causal Prediction for Block MDPs</a> </li><li><a href="https://www.youtube.com/watch?v=FvQbrE3tyoE">Clare Lyle - Invariant Prediction for Generalization in Reinforcement Learning @ Simons Institute</a> </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://amyzhang.github.io/">Amy Zhang</a> is a postdoctoral scholar at UC Berkeley and a research scientist at Facebook AI Research. She will be starting as an assistant professor at UT Austin in Spring 2023. </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/2003.06016">Invariant Causal Prediction for Block MDPs</a> <br>Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup </p><p><a href="http://proceedings.mlr.press/v139/sodhani21a/sodhani21a.pdf">Multi-Task Reinforcement Learning with Context-based Representations</a> <br>Shagun Sodhani, Amy Zhang, Joelle Pineau </p><p><a href="https://arxiv.org/abs/2104.10159">MBRL-Lib: A Modular Library for Model-based Reinforcement Learning</a> <br>Luis Pineda, Brandon Amos, Amy Zhang, Nathan O. Lambert, Roberto Calandra <strong></strong></p><p><br>Additional References </p><ul><li><a href="https://www.youtube.com/watch?v=akeUVn6WQoU%20%20">Amy Zhang - Exploring Context for Better Generalization in Reinforcement Learning @ UCL DARK</a> </li><li><a href="https://icml.cc/virtual/2020/poster/6475">ICML 2020 Poster session: Invariant Causal Prediction for Block MDPs</a> </li><li><a href="https://www.youtube.com/watch?v=FvQbrE3tyoE">Clare Lyle - Invariant Prediction for Generalization in Reinforcement Learning @ Simons Institute</a> </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 27 Sep 2021 10:27:12 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/069ca161/19719c6c.mp3" length="58534392" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/j6dEpjvJFO-xRRsG5TD8Zb6ptHnz46Ir0g-ho6gurEQ/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzY0NjkxNy8x/NjMyNzY0MTg2LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>4175</itunes:duration>
      <itunes:summary>Amy Zhang shares her work on Invariant Causal Prediction for Block MDPs, Multi-Task Reinforcement Learning with Context-based Representations, and MBRL-Lib, offers insights on generalization in RL, and more!</itunes:summary>
      <itunes:subtitle>Amy Zhang shares her work on Invariant Causal Prediction for Block MDPs, Multi-Task Reinforcement Learning with Context-based Representations, and MBRL-Lib, offers insights on generalization in RL, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Xianyuan Zhan</title>
      <itunes:episode>28</itunes:episode>
      <podcast:episode>28</podcast:episode>
      <itunes:title>Xianyuan Zhan</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">e9f1b4f5-5faf-421a-92f8-c2f258b57b83</guid>
      <link>https://share.transistor.fm/s/69d3bac0</link>
      <description>
        <![CDATA[<p><a href="http://zhanxianyuan.xyz/">Xianyuan Zhan</a> is currently a research assistant professor at the Institute for AI Industry Research (AIR), Tsinghua University.  He received his Ph.D. degree at Purdue University. Before joining Tsinghua University, Dr. Zhan worked as a researcher at Microsoft Research Asia (MSRA) and a data scientist at JD Technology.  At JD Technology, he led the research that uses offline RL to optimize real-world industrial systems. </p><p><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/2102.11492%20">DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning</a><br>Xianyuan Zhan, Haoran Xu, Yue Zhang, Yusen Huo, Xiangyu Zhu, Honglei Yin, Yu Zheng <br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="http://zhanxianyuan.xyz/">Xianyuan Zhan</a> is currently a research assistant professor at the Institute for AI Industry Research (AIR), Tsinghua University.  He received his Ph.D. degree at Purdue University. Before joining Tsinghua University, Dr. Zhan worked as a researcher at Microsoft Research Asia (MSRA) and a data scientist at JD Technology.  At JD Technology, he led the research that uses offline RL to optimize real-world industrial systems. </p><p><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/2102.11492%20">DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning</a><br>Xianyuan Zhan, Haoran Xu, Yue Zhang, Yusen Huo, Xiangyu Zhu, Honglei Yin, Yu Zheng <br></p>]]>
      </content:encoded>
      <pubDate>Mon, 30 Aug 2021 13:31:25 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/69d3bac0/83392bc4.mp3" length="34947172" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/ylXDop4-C8UFvlh9b30d_ukUIhzZsv71Sh3k5XEwB3w/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzYzNjEzMi8x/NjMyNzk4NTIzLWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>2490</itunes:duration>
      <itunes:summary>Xianyuan Zhan on DeepThermal for controlling thermal power plants, the MORE algorithm for Model-based Offline RL, comparing AI in China and the US, and more! </itunes:summary>
      <itunes:subtitle>Xianyuan Zhan on DeepThermal for controlling thermal power plants, the MORE algorithm for Model-based Offline RL, comparing AI in China and the US, and more! </itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Eugene Vinitsky</title>
      <itunes:episode>27</itunes:episode>
      <podcast:episode>27</podcast:episode>
      <itunes:title>Eugene Vinitsky</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">6b1a596a-ec14-4ab3-8f25-4b7a597c404f</guid>
      <link>https://share.transistor.fm/s/1098a9d3</link>
      <description>
        <![CDATA[<p><a href="https://eugenevinitsky.github.io/">Eugene Vinitsky</a> is a PhD student at UC Berkeley advised by Alexandre Bayen. He has interned at Tesla and Deepmind.  </p><p><br><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/2106.09012">A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings <br></a>Eugene Vinitsky, Raphael Köster, John P. Agapiou, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, Joel Z. Leibo <strong></strong></p><p><a href="https://arxiv.org/abs/2011.00120">Optimizing Mixed Autonomy Traffic Flow With Decentralized Autonomous Vehicles and Multi-Agent RL <br></a>Eugene Vinitsky, Nathan Lichtle, Kanaad Parvate, Alexandre Bayen </p><p><a href="https://ieeexplore.ieee.org/abstract/document/8569615">Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion</a> <br>Eugene Vinitsky; Kanaad Parvate; Aboudy Kreidieh; Cathy Wu; Alexandre Bayen 2018 </p><p><a href="https://arxiv.org/abs/2103.01955">The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games <br></a>Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://www.eclipse.org/sumo/">SUMO: Simulation of Urban MObility</a> </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://eugenevinitsky.github.io/">Eugene Vinitsky</a> is a PhD student at UC Berkeley advised by Alexandre Bayen. He has interned at Tesla and Deepmind.  </p><p><br><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/2106.09012">A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings <br></a>Eugene Vinitsky, Raphael Köster, John P. Agapiou, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, Joel Z. Leibo <strong></strong></p><p><a href="https://arxiv.org/abs/2011.00120">Optimizing Mixed Autonomy Traffic Flow With Decentralized Autonomous Vehicles and Multi-Agent RL <br></a>Eugene Vinitsky, Nathan Lichtle, Kanaad Parvate, Alexandre Bayen </p><p><a href="https://ieeexplore.ieee.org/abstract/document/8569615">Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion</a> <br>Eugene Vinitsky; Kanaad Parvate; Aboudy Kreidieh; Cathy Wu; Alexandre Bayen 2018 </p><p><a href="https://arxiv.org/abs/2103.01955">The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games <br></a>Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://www.eclipse.org/sumo/">SUMO: Simulation of Urban MObility</a> </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Wed, 18 Aug 2021 08:22:13 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/1098a9d3/e5e88137.mp3" length="55554362" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:duration>3962</itunes:duration>
      <itunes:summary>Eugene Vinitsky of UC Berkeley on social norms and sanctions, traffic simulation, mixed-autonomy traffic, and more!</itunes:summary>
      <itunes:subtitle>Eugene Vinitsky of UC Berkeley on social norms and sanctions, traffic simulation, mixed-autonomy traffic, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Jess Whittlestone</title>
      <itunes:episode>26</itunes:episode>
      <podcast:episode>26</podcast:episode>
      <itunes:title>Jess Whittlestone</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">c0d33c35-c239-43ce-bdad-0be057a6bc2f</guid>
      <link>https://share.transistor.fm/s/6721bf11</link>
      <description>
        <![CDATA[<p><a href="https://jesswhittlestone.com/">Dr. Jess Whittlestone</a> is a Senior Research Fellow at the Centre for the Study of Existential Risk and the Leverhulme Centre for the Future of Intelligence, both at the University of Cambridge. </p><p><br><strong>Featured References </strong></p><p><a href="https://jair.org/index.php/jair/article/view/12360/26667">The Societal Implications of Deep Reinforcement Learning</a> <br>Jess Whittlestone, Kai Arulkumaran, Matthew Crosby </p><p><a href="https://www.ijimai.org/journal/bibcite/reference/2905">Artificial Canaries: Early Warning Signs for Anticipatory and Democratic Governance of AI</a> <br>Carla Zoe Cremer, Jess Whittlestone <br><strong><br></strong><br><strong>Additional References </strong></p><ul><li><a href="https://www.youtube.com/watch?v=PO8-fegV4X0">CogX: Cutting Edge: Understanding AI systems for a better AI policy</a>, featuring Jack Clark and Jess Whittlestone </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://jesswhittlestone.com/">Dr. Jess Whittlestone</a> is a Senior Research Fellow at the Centre for the Study of Existential Risk and the Leverhulme Centre for the Future of Intelligence, both at the University of Cambridge. </p><p><br><strong>Featured References </strong></p><p><a href="https://jair.org/index.php/jair/article/view/12360/26667">The Societal Implications of Deep Reinforcement Learning</a> <br>Jess Whittlestone, Kai Arulkumaran, Matthew Crosby </p><p><a href="https://www.ijimai.org/journal/bibcite/reference/2905">Artificial Canaries: Early Warning Signs for Anticipatory and Democratic Governance of AI</a> <br>Carla Zoe Cremer, Jess Whittlestone <br><strong><br></strong><br><strong>Additional References </strong></p><ul><li><a href="https://www.youtube.com/watch?v=PO8-fegV4X0">CogX: Cutting Edge: Understanding AI systems for a better AI policy</a>, featuring Jack Clark and Jess Whittlestone </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Tue, 20 Jul 2021 10:59:35 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/6721bf11/eec9b5fa.mp3" length="77026215" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:duration>5496</itunes:duration>
      <itunes:summary>Jess Whittlestone on societal implications of deep reinforcement Learning, AI policy, warning signs of transformative progress in AI, and more!</itunes:summary>
      <itunes:subtitle>Jess Whittlestone on societal implications of deep reinforcement Learning, AI policy, warning signs of transformative progress in AI, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Aleksandra Faust</title>
      <itunes:episode>25</itunes:episode>
      <podcast:episode>25</podcast:episode>
      <itunes:title>Aleksandra Faust</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">a2e4bf42-e057-4b86-8db8-277a65535a8f</guid>
      <link>https://share.transistor.fm/s/65d745d4</link>
      <description>
        <![CDATA[<p>Dr. Aleksandra Faust is a Staff Research Scientist and Reinforcement Learning research team co-founder at Google Brain Research.</p><p><strong>Featured References <br></strong><br><a href="https://www.cs.unm.edu/amprg/People/afaust/afaustThesis.pdf">Reinforcement Learning and Planning for Preference Balancing Tasks</a> <br>Faust 2014</p><p><a href="https://arxiv.org/abs/1809.10124">Learning Navigation Behaviors End-to-End with AutoRL</a> <br>Hao-Tien Lewis Chiang, Aleksandra Faust, Marek Fiser, Anthony Francis</p><p><a href="https://arxiv.org/abs/1905.07628">Evolving Rewards to Automate Reinforcement Learning</a> <br>Aleksandra Faust, Anthony Francis, Dar Mehta</p><p><a href="https://openreview.net/forum?id=0XXpJ4OtjW">Evolving Reinforcement Learning Algorithms</a> <br>John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust</p><p><br><a href="https://arxiv.org/abs/2103.01991">Adversarial Environment Generation for Learning to Navigate the Web</a> <br>Izzeddin Gur, Natasha Jaques, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust</p><p><br></p><p><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/2003.03384">AutoML-Zero: Evolving Machine Learning Algorithms From Scratch</a>, Esteban Real, Chen Liang, David R. So, Quoc V. Le</li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Dr. Aleksandra Faust is a Staff Research Scientist and Reinforcement Learning research team co-founder at Google Brain Research.</p><p><strong>Featured References <br></strong><br><a href="https://www.cs.unm.edu/amprg/People/afaust/afaustThesis.pdf">Reinforcement Learning and Planning for Preference Balancing Tasks</a> <br>Faust 2014</p><p><a href="https://arxiv.org/abs/1809.10124">Learning Navigation Behaviors End-to-End with AutoRL</a> <br>Hao-Tien Lewis Chiang, Aleksandra Faust, Marek Fiser, Anthony Francis</p><p><a href="https://arxiv.org/abs/1905.07628">Evolving Rewards to Automate Reinforcement Learning</a> <br>Aleksandra Faust, Anthony Francis, Dar Mehta</p><p><a href="https://openreview.net/forum?id=0XXpJ4OtjW">Evolving Reinforcement Learning Algorithms</a> <br>John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust</p><p><br><a href="https://arxiv.org/abs/2103.01991">Adversarial Environment Generation for Learning to Navigate the Web</a> <br>Izzeddin Gur, Natasha Jaques, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust</p><p><br></p><p><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/2003.03384">AutoML-Zero: Evolving Machine Learning Algorithms From Scratch</a>, Esteban Real, Chen Liang, David R. So, Quoc V. Le</li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Tue, 06 Jul 2021 03:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/65d745d4/dff855ef.mp3" length="45865626" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:duration>3270</itunes:duration>
      <itunes:summary>Aleksandra Faust of Google Brain Research on AutoRL, meta-RL, learning to learn &amp; learning to teach, curriculum learning, collaborations between senior and junior researchers, and more!</itunes:summary>
      <itunes:subtitle>Aleksandra Faust of Google Brain Research on AutoRL, meta-RL, learning to learn &amp; learning to teach, curriculum learning, collaborations between senior and junior researchers, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Sam Ritter</title>
      <itunes:episode>24</itunes:episode>
      <podcast:episode>24</podcast:episode>
      <itunes:title>Sam Ritter</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">f0cebf08-24de-4cae-9613-c5eddba85e2e</guid>
      <link>https://share.transistor.fm/s/89ba97c3</link>
      <description>
        <![CDATA[<p><a href="https://scholar.google.com/citations?user=dg7wnfAAAAAJ&amp;hl=fr">Sam Ritter</a> is a Research Scientist on the neuroscience team at DeepMind. </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/1803.10760">Unsupervised Predictive Memory in a Goal-Directed Agent</a> (MERLIN) <br>Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap </p><p><a href="https://arxiv.org/abs/1805.09692%20">Meta-RL without forgetting:  Been There, Done That: Meta-Learning with Episodic Recall</a> <br>Samuel Ritter, Jane X. Wang, Zeb Kurth-Nelson, Siddhant M. Jayakumar, Charles Blundell, Razvan Pascanu, Matthew Botvinick </p><p><a href="https://dataspace.princeton.edu/handle/88435/dsp01dz010s84f">Meta-Reinforcement Learning with Episodic Recall: An Integrative Theory of Reward-Driven Learning</a> <br>Samuel Ritter 2019 </p><p><a href="https://arxiv.org/abs/2006.03662%20">Meta-RL exploration and planning: Rapid Task-Solving in Novel Environments <br></a>Sam Ritter, Ryan Faulkner, Laurent Sartran, Adam Santoro, Matt Botvinick, David Raposo </p><p><a href="https://arxiv.org/abs/2102.12425">Synthetic Returns for Long-Term Credit Assignment</a> <br>David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song <br> </p><p><strong>Additional References </strong></p><ul><li><a href="https://www.youtube.com/watch?v=_qpHcmhX9HM">Sam Ritter: Meta-Learning to Make Smart Inferences from Small Data</a> , North Star AI 2019 </li><li><a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">The Bitter Lesson</a>, Rich Sutton 2019 </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://scholar.google.com/citations?user=dg7wnfAAAAAJ&amp;hl=fr">Sam Ritter</a> is a Research Scientist on the neuroscience team at DeepMind. </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/1803.10760">Unsupervised Predictive Memory in a Goal-Directed Agent</a> (MERLIN) <br>Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap </p><p><a href="https://arxiv.org/abs/1805.09692%20">Meta-RL without forgetting:  Been There, Done That: Meta-Learning with Episodic Recall</a> <br>Samuel Ritter, Jane X. Wang, Zeb Kurth-Nelson, Siddhant M. Jayakumar, Charles Blundell, Razvan Pascanu, Matthew Botvinick </p><p><a href="https://dataspace.princeton.edu/handle/88435/dsp01dz010s84f">Meta-Reinforcement Learning with Episodic Recall: An Integrative Theory of Reward-Driven Learning</a> <br>Samuel Ritter 2019 </p><p><a href="https://arxiv.org/abs/2006.03662%20">Meta-RL exploration and planning: Rapid Task-Solving in Novel Environments <br></a>Sam Ritter, Ryan Faulkner, Laurent Sartran, Adam Santoro, Matt Botvinick, David Raposo </p><p><a href="https://arxiv.org/abs/2102.12425">Synthetic Returns for Long-Term Credit Assignment</a> <br>David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song <br> </p><p><strong>Additional References </strong></p><ul><li><a href="https://www.youtube.com/watch?v=_qpHcmhX9HM">Sam Ritter: Meta-Learning to Make Smart Inferences from Small Data</a> , North Star AI 2019 </li><li><a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">The Bitter Lesson</a>, Rich Sutton 2019 </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 21 Jun 2021 03:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/89ba97c3/58aa34a6.mp3" length="84579328" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/YmV6R9kjs1sjfhetkOCoSfLv7IMdZZximru2yS-T2WA/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzUxNDAwMS8x/NjMzMTkxMDA3LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>6035</itunes:duration>
      <itunes:summary>Sam Ritter of DeepMind on Neuroscience and RL, Episodic Memory, Meta-RL, Synthetic Returns, the MERLIN agent, decoding brain activation, and more!</itunes:summary>
      <itunes:subtitle>Sam Ritter of DeepMind on Neuroscience and RL, Episodic Memory, Meta-RL, Synthetic Returns, the MERLIN agent, decoding brain activation, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Thomas Krendl Gilbert</title>
      <itunes:episode>23</itunes:episode>
      <podcast:episode>23</podcast:episode>
      <itunes:title>Thomas Krendl Gilbert</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">929c3dbc-984a-489b-8937-109f7ff85dad</guid>
      <link>https://share.transistor.fm/s/218b3d07</link>
      <description>
        <![CDATA[<p><a href="https://www.thomaskrendlgilbert.com/">Thomas Krendl Gilbert</a> is a PhD student at UC Berkeley’s <a href="https://humancompatible.ai/">Center for Human-Compatible AI</a>, specializing in Machine Ethics and Epistemology. </p><p><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/1911.09005">Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments <br></a>Roel Dobbe, Thomas Krendl Gilbert, Yonatan Mintz </p><p><a href="https://simons.berkeley.edu/news/mapping-political-economy-reinforcement-learning-systems-case-autonomous-vehicles%20">Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles</a> <br>Thomas Krendl Gilbert </p><p><a href="https://arxiv.org/abs/2102.04255">AI Development for the Public Interest: From Abstraction Traps to Sociotechnical Risks <br></a>McKane Andrus, Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert and Tom Zick </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://geesegraduates.org/2020/10/26/political-economy-of-reinforcement-learning/">Political Economy of Reinforcement Learning Systems (PERLS)</a> </li><li><a href="https://lpeproject.org/">The Law and Political Economy (LPE) Project</a> </li><li><a href="https://www.jair.org/index.php/jair/article/view/12360">The Societal Implications of Deep Reinforcement Learning</a>, Jess Whittlestone, Kai Arulkumaran, Matthew Crosby </li><li><a href="https://shows.acast.com/the-robot-brains/episodes/yann-lecun-on-how-he-brought-ai-to-facebook">Robot Brains Podcast: Yann LeCun explains why Facebook would crumble without AI</a> </li></ul><p><br></p><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://www.thomaskrendlgilbert.com/">Thomas Krendl Gilbert</a> is a PhD student at UC Berkeley’s <a href="https://humancompatible.ai/">Center for Human-Compatible AI</a>, specializing in Machine Ethics and Epistemology. </p><p><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/1911.09005">Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments <br></a>Roel Dobbe, Thomas Krendl Gilbert, Yonatan Mintz </p><p><a href="https://simons.berkeley.edu/news/mapping-political-economy-reinforcement-learning-systems-case-autonomous-vehicles%20">Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles</a> <br>Thomas Krendl Gilbert </p><p><a href="https://arxiv.org/abs/2102.04255">AI Development for the Public Interest: From Abstraction Traps to Sociotechnical Risks <br></a>McKane Andrus, Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert and Tom Zick </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://geesegraduates.org/2020/10/26/political-economy-of-reinforcement-learning/">Political Economy of Reinforcement Learning Systems (PERLS)</a> </li><li><a href="https://lpeproject.org/">The Law and Political Economy (LPE) Project</a> </li><li><a href="https://www.jair.org/index.php/jair/article/view/12360">The Societal Implications of Deep Reinforcement Learning</a>, Jess Whittlestone, Kai Arulkumaran, Matthew Crosby </li><li><a href="https://shows.acast.com/the-robot-brains/episodes/yann-lecun-on-how-he-brought-ai-to-facebook">Robot Brains Podcast: Yann LeCun explains why Facebook would crumble without AI</a> </li></ul><p><br></p><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 17 May 2021 04:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/218b3d07/27c40412.mp3" length="60759356" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/kyHhsPgQmALusWW-eAzUGGC3v9sm7qsnp_NI2pMkozY/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzUxMzk5OC8x/NjMyNzk5MTE0LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>4334</itunes:duration>
      <itunes:summary>Thomas Krendl Gilbert on the Political Economy of Reinforcement Learning Systems &amp; Autonomous Vehicles, Sociotechnical Commitments, AI Development for the Public Interest, and more!</itunes:summary>
      <itunes:subtitle>Thomas Krendl Gilbert on the Political Economy of Reinforcement Learning Systems &amp; Autonomous Vehicles, Sociotechnical Commitments, AI Development for the Public Interest, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Marc G. Bellemare</title>
      <itunes:episode>22</itunes:episode>
      <podcast:episode>22</podcast:episode>
      <itunes:title>Marc G. Bellemare</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">a373bda6-f952-4b79-8200-c1f8863cf411</guid>
      <link>https://share.transistor.fm/s/b3dfcd7d</link>
      <description>
        <![CDATA[<p><a href="http://www.marcgbellemare.info/">Professor Marc G. Bellemare</a> is a Research Scientist at Google Research (Brain team), An Adjunct Professor at McGill University, and a Canada CIFAR AI Chair. </p><p><strong>Featured References </strong></p><p><a href="https://jair.org/index.php/jair/article/view/10819">The Arcade Learning Environment: An Evaluation Platform for General Agents</a> <br>Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling <br><strong><br></strong><a href="https://www.nature.com/articles/nature14236">Human-level control through deep reinforcement learning</a> <br>Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg &amp; Demis Hassabis <br><strong><br></strong><a href="https://www.nature.com/articles/s41586-020-2939-8">Autonomous navigation of stratospheric balloons using reinforcement learning</a> <br>Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda &amp; Ziyu Wang </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://www.youtube.com/watch?v=5nFtBFD2Od8">CAIDA Talk: A tour of distributional reinforcement learning November 18, 2020 - Marc G. Bellemare</a> </li><li><a href="https://www.youtube.com/watch?v=PBdGge2ipCg">Amii AI Seminar Series:  Autonomous nav of stratospheric balloons using RL</a>, Marlos C. Machado </li><li><a href="https://www.youtube.com/watch?v=F-6sc88xPuA">UMD RLSS | Marc Bellemare | A History of Reinforcement Learning: Atari to Stratospheric Balloons</a> </li><li><a href="https://www.talkrl.com/episodes/marlos-machado">TalkRL: Marlos C. Machado</a>, Dr. Machado also spoke to us about various aspects of ALE and Project Loon in depth </li><li><a href="https://arxiv.org/abs/1902.06865">Hyperbolic discounting and learning over multiple horizons</a>, Fedus et al 2019 </li><li><a href="https://twitter.com/marcgbellemare">Marc G. Bellemare</a> on Twitter </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="http://www.marcgbellemare.info/">Professor Marc G. Bellemare</a> is a Research Scientist at Google Research (Brain team), An Adjunct Professor at McGill University, and a Canada CIFAR AI Chair. </p><p><strong>Featured References </strong></p><p><a href="https://jair.org/index.php/jair/article/view/10819">The Arcade Learning Environment: An Evaluation Platform for General Agents</a> <br>Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling <br><strong><br></strong><a href="https://www.nature.com/articles/nature14236">Human-level control through deep reinforcement learning</a> <br>Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg &amp; Demis Hassabis <br><strong><br></strong><a href="https://www.nature.com/articles/s41586-020-2939-8">Autonomous navigation of stratospheric balloons using reinforcement learning</a> <br>Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda &amp; Ziyu Wang </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://www.youtube.com/watch?v=5nFtBFD2Od8">CAIDA Talk: A tour of distributional reinforcement learning November 18, 2020 - Marc G. Bellemare</a> </li><li><a href="https://www.youtube.com/watch?v=PBdGge2ipCg">Amii AI Seminar Series:  Autonomous nav of stratospheric balloons using RL</a>, Marlos C. Machado </li><li><a href="https://www.youtube.com/watch?v=F-6sc88xPuA">UMD RLSS | Marc Bellemare | A History of Reinforcement Learning: Atari to Stratospheric Balloons</a> </li><li><a href="https://www.talkrl.com/episodes/marlos-machado">TalkRL: Marlos C. Machado</a>, Dr. Machado also spoke to us about various aspects of ALE and Project Loon in depth </li><li><a href="https://arxiv.org/abs/1902.06865">Hyperbolic discounting and learning over multiple horizons</a>, Fedus et al 2019 </li><li><a href="https://twitter.com/marcgbellemare">Marc G. Bellemare</a> on Twitter </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Wed, 12 May 2021 17:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/b3dfcd7d/c5dfe49e.mp3" length="48521177" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/lzuvuM4kTGKSmCB7eRBseITso-UGU_x26Qair30eVH4/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzUxMzk5OS8x/NjMzNTM2ODUxLWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>3460</itunes:duration>
      <itunes:summary>Marc G. Bellemare shares insight on his work including Deep Q-Networks, Distributional RL, Project Loon and RL in the Stratosphere, the origins of the Arcade Learning Environment, the future of Benchmarking in RL -- and more!</itunes:summary>
      <itunes:subtitle>Marc G. Bellemare shares insight on his work including Deep Q-Networks, Distributional RL, Project Loon and RL in the Stratosphere, the origins of the Arcade Learning Environment, the future of Benchmarking in RL -- and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Robert Osazuwa Ness</title>
      <itunes:episode>21</itunes:episode>
      <podcast:episode>21</podcast:episode>
      <itunes:title>Robert Osazuwa Ness</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">c4c3e0f8-e8a5-4860-96bd-a39cfb9dbed5</guid>
      <link>https://share.transistor.fm/s/20d8c879</link>
      <description>
        <![CDATA[<p><a href="https://twitter.com/osazuwa">Robert Osazuwa Ness</a> is an adjunct professor of computer science at Northeastern University, an ML Research Engineer at <a href="https://gamalon.com/">Gamalon</a>, and the founder of <a href="https://www.altdeep.ai/">AltDeep School of AI</a>.  He holds a PhD in statistics.  He studied at Johns Hopkins SAIS and then Purdue University. </p><p><br><strong>References </strong></p><ul><li><a href="https://www.altdeep.ai/">Altdeep School of AI</a>, Altdeep on <a href="https://www.twitch.tv/altdeepai">Twitch</a>, <a href="https://altdeep.substack.com/">Substack</a>, Robert Ness </li><li><a href="https://altdeep.ai/p/causal-ml-minicourse">Altdeep Causal Generative Machine Learning Minicourse</a>, Free course </li><li><a href="https://scholar.google.com/citations?user=8gWTOBAAAAAJ&amp;hl=en">Robert Osazuwa Ness on Google Scholar</a> </li><li><a href="https://gamalon.com/">Gamalon Inc</a> </li><li><a href="https://crl.causalai.net/">Causal Reinforcement Learning</a> talks, Elias Bareinboim </li><li><a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">The Bitter Lesson</a>, Rich Sutton 2019 </li><li><a href="https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.5466">The Need for Biases in Learning Generalizations</a>, Tom Mitchell 1980 </li><li><a href="https://arxiv.org/abs/1706.04317">Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics</a>, Kansky et al 2017 </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://twitter.com/osazuwa">Robert Osazuwa Ness</a> is an adjunct professor of computer science at Northeastern University, an ML Research Engineer at <a href="https://gamalon.com/">Gamalon</a>, and the founder of <a href="https://www.altdeep.ai/">AltDeep School of AI</a>.  He holds a PhD in statistics.  He studied at Johns Hopkins SAIS and then Purdue University. </p><p><br><strong>References </strong></p><ul><li><a href="https://www.altdeep.ai/">Altdeep School of AI</a>, Altdeep on <a href="https://www.twitch.tv/altdeepai">Twitch</a>, <a href="https://altdeep.substack.com/">Substack</a>, Robert Ness </li><li><a href="https://altdeep.ai/p/causal-ml-minicourse">Altdeep Causal Generative Machine Learning Minicourse</a>, Free course </li><li><a href="https://scholar.google.com/citations?user=8gWTOBAAAAAJ&amp;hl=en">Robert Osazuwa Ness on Google Scholar</a> </li><li><a href="https://gamalon.com/">Gamalon Inc</a> </li><li><a href="https://crl.causalai.net/">Causal Reinforcement Learning</a> talks, Elias Bareinboim </li><li><a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">The Bitter Lesson</a>, Rich Sutton 2019 </li><li><a href="https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.5466">The Need for Biases in Learning Generalizations</a>, Tom Mitchell 1980 </li><li><a href="https://arxiv.org/abs/1706.04317">Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics</a>, Kansky et al 2017 </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Sat, 08 May 2021 14:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/20d8c879/ed15d35b.mp3" length="66211304" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/Gz0oEV_8mpt48jKKS02Ou1LltCf0xmn0jmWWX9aAUUo/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzQ4MzQwNi8x/NjMyODYxMjg1LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>4723</itunes:duration>
      <itunes:summary>Dr. Robert Osazuwa Ness on Causal Inference, Probabilistic and Generative Models, Causality and RL, AltDeep School of AI, Pyro, and more!</itunes:summary>
      <itunes:subtitle>Dr. Robert Osazuwa Ness on Causal Inference, Probabilistic and Generative Models, Causality and RL, AltDeep School of AI, Pyro, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Marlos C. Machado</title>
      <itunes:episode>20</itunes:episode>
      <podcast:episode>20</podcast:episode>
      <itunes:title>Marlos C. Machado</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">8c928000-5072-4b30-82f1-ec69fdcfe452</guid>
      <link>https://share.transistor.fm/s/34025ece</link>
      <description>
        <![CDATA[<p>Dr. Marlos C. Machado is a research scientist at DeepMind and an adjunct professor at the University of Alberta. He holds a PhD from the University of Alberta, and an MSc and BSc from UFMG in Brazil. </p><p><br><strong>Featured References </strong></p><p><a href="https://jair.org/index.php/jair/article/view/11182">Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents <br></a>Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew J. Hausknecht, Michael Bowling </p><p><a href="https://openreview.net/pdf?id=qda7-sVg84">Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning</a> [ <a href="https://slideslive.com/38942373/contrastive-behavioral-similarity-embeddings-for-generalization-in-reinforcement-learning">video</a> ] <br>Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare </p><p><a href="https://era.library.ualberta.ca/items/581b87e0-a777-40a1-9776-f85a85864d6c/view/76e2f662-ec7f-4553-9392-3ba0dca44dc6/Marlos%20Cholodovskis%20Machado_Thesis.pdf">Efficient Exploration in Reinforcement Learning through Time-Based Representations</a> <br>Marlos C. Machado </p><p><a href="http://proceedings.mlr.press/v70/machado17a/machado17a.pdf">A Laplacian Framework for Option Discovery in Reinforcement Learning </a>[ <a href="https://vimeo.com/237274347">video</a> ] <br>Marlos C. Machado, Marc G. Bellemare, Michael H. Bowling </p><p><a href="https://openreview.net/forum?id=Bk8ZcAxR-">Eigenoption Discovery through the Deep Successor Representation</a> <br>Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell </p><p><a href="https://openreview.net/forum?id=SkeIyaVtwB">Exploration in Reinforcement Learning with Deep Covering Options</a> <br>Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Dimitri Konidaris </p><p><a href="https://www.nature.com/articles/s41586-020-2939-8">Autonomous navigation of stratospheric balloons using reinforcement learning</a> <br>Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda &amp; Ziyu Wang </p><p><a href="https://openreview.net/forum?id=HkGmDsR9YQ">Generalization and Regularization in DQN</a> <br>Jesse Farebrother, Marlos C. Machado, Michael Bowling </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://www.youtube.com/watch?v=PBdGge2ipCg">Amii AI Seminar Series: Marlos C. Machado - Autonomous navigation of stratospheric balloons using RL</a> </li><li><a href="http://www.ifaamas.org/Proceedings/aamas2016/pdfs/p485.pdf">State of the Art Control of Atari Games Using Shallow Reinforcement Learning</a>, Liang et al </li><li><a href="https://arxiv.org/abs/1606.05593">Introspective Agents: Confidence Measures for General Value Functions</a>, Sherstan et al </li></ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Dr. Marlos C. Machado is a research scientist at DeepMind and an adjunct professor at the University of Alberta. He holds a PhD from the University of Alberta, and an MSc and BSc from UFMG in Brazil. </p><p><br><strong>Featured References </strong></p><p><a href="https://jair.org/index.php/jair/article/view/11182">Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents <br></a>Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew J. Hausknecht, Michael Bowling </p><p><a href="https://openreview.net/pdf?id=qda7-sVg84">Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning</a> [ <a href="https://slideslive.com/38942373/contrastive-behavioral-similarity-embeddings-for-generalization-in-reinforcement-learning">video</a> ] <br>Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare </p><p><a href="https://era.library.ualberta.ca/items/581b87e0-a777-40a1-9776-f85a85864d6c/view/76e2f662-ec7f-4553-9392-3ba0dca44dc6/Marlos%20Cholodovskis%20Machado_Thesis.pdf">Efficient Exploration in Reinforcement Learning through Time-Based Representations</a> <br>Marlos C. Machado </p><p><a href="http://proceedings.mlr.press/v70/machado17a/machado17a.pdf">A Laplacian Framework for Option Discovery in Reinforcement Learning </a>[ <a href="https://vimeo.com/237274347">video</a> ] <br>Marlos C. Machado, Marc G. Bellemare, Michael H. Bowling </p><p><a href="https://openreview.net/forum?id=Bk8ZcAxR-">Eigenoption Discovery through the Deep Successor Representation</a> <br>Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell </p><p><a href="https://openreview.net/forum?id=SkeIyaVtwB">Exploration in Reinforcement Learning with Deep Covering Options</a> <br>Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Dimitri Konidaris </p><p><a href="https://www.nature.com/articles/s41586-020-2939-8">Autonomous navigation of stratospheric balloons using reinforcement learning</a> <br>Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda &amp; Ziyu Wang </p><p><a href="https://openreview.net/forum?id=HkGmDsR9YQ">Generalization and Regularization in DQN</a> <br>Jesse Farebrother, Marlos C. Machado, Michael Bowling </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://www.youtube.com/watch?v=PBdGge2ipCg">Amii AI Seminar Series: Marlos C. Machado - Autonomous navigation of stratospheric balloons using RL</a> </li><li><a href="http://www.ifaamas.org/Proceedings/aamas2016/pdfs/p485.pdf">State of the Art Control of Atari Games Using Shallow Reinforcement Learning</a>, Liang et al </li><li><a href="https://arxiv.org/abs/1606.05593">Introspective Agents: Confidence Measures for General Value Functions</a>, Sherstan et al </li></ul>]]>
      </content:encoded>
      <pubDate>Mon, 12 Apr 2021 07:50:51 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/34025ece/f14b3e7d.mp3" length="76961208" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/tDkZuBvc6jZn-6HuS4kC9x834BPb-4w-2Ha1xMp45rk/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzUxMzg5NC8x/NjMyNzc5NjU1LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>5491</itunes:duration>
      <itunes:summary>Marlos C. Machado on Arcade Learning Environment Evaluation, Generalization and Exploration in RL, Eigenoptions, Autonomous navigation of stratospheric balloons with RL, and more!</itunes:summary>
      <itunes:subtitle>Marlos C. Machado on Arcade Learning Environment Evaluation, Generalization and Exploration in RL, Eigenoptions, Autonomous navigation of stratospheric balloons with RL, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Nathan Lambert</title>
      <itunes:episode>19</itunes:episode>
      <podcast:episode>19</podcast:episode>
      <itunes:title>Nathan Lambert</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">fb59de70-033f-4518-b7a8-bac42890f06a</guid>
      <link>https://share.transistor.fm/s/ed96159d</link>
      <description>
        <![CDATA[<p><a href="https://www.natolambert.com/">Nathan Lambert</a> is a PhD Candidate at UC Berkeley. </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/2012.09156">Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning <br></a>Nathan O. Lambert, Albert Wilcox, Howard Zhang, Kristofer S. J. Pister, Roberto Calandra </p><p><a href="https://arxiv.org/abs/2002.04523">Objective Mismatch in Model-based Reinforcement Learning <br></a>Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra </p><p><a href="https://arxiv.org/abs/1901.03737">Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning <br></a>Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Roberto Calandra, Sergey Levine, Kristofer S.J. Pister </p><p><a href="https://arxiv.org/abs/2102.13651">On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning <br></a>Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://robotic.substack.com/">Nathan Lambert's blog</a> </li><li><a href="https://scholar.google.com/citations?user=O4jW7BsAAAAJ&amp;hl">Nathan Lambert</a> on Google scholar </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://www.natolambert.com/">Nathan Lambert</a> is a PhD Candidate at UC Berkeley. </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/2012.09156">Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning <br></a>Nathan O. Lambert, Albert Wilcox, Howard Zhang, Kristofer S. J. Pister, Roberto Calandra </p><p><a href="https://arxiv.org/abs/2002.04523">Objective Mismatch in Model-based Reinforcement Learning <br></a>Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra </p><p><a href="https://arxiv.org/abs/1901.03737">Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning <br></a>Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Roberto Calandra, Sergey Levine, Kristofer S.J. Pister </p><p><a href="https://arxiv.org/abs/2102.13651">On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning <br></a>Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://robotic.substack.com/">Nathan Lambert's blog</a> </li><li><a href="https://scholar.google.com/citations?user=O4jW7BsAAAAJ&amp;hl">Nathan Lambert</a> on Google scholar </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 22 Mar 2021 15:21:27 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/ed96159d/1693fc7e.mp3" length="42577778" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/xOj0j1AhOUDMOpiE3KZxmFHB0H7yhF-U1t4ozLeZmZc/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzQ5MDMyMi8x/NjMyODY5NzM2LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>3035</itunes:duration>
      <itunes:summary>Nathan Lambert on Model-based RL, Trajectory-based models, Quadrotor control, Hyperparameter Optimization for MBRL, RL vs PID control, and more!</itunes:summary>
      <itunes:subtitle>Nathan Lambert on Model-based RL, Trajectory-based models, Quadrotor control, Hyperparameter Optimization for MBRL, RL vs PID control, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Kai Arulkumaran</title>
      <itunes:episode>18</itunes:episode>
      <podcast:episode>18</podcast:episode>
      <itunes:title>Kai Arulkumaran</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">c2266c9e-301f-4f8d-9c1c-4753af257827</guid>
      <link>https://share.transistor.fm/s/422c77b9</link>
      <description>
        <![CDATA[<p>Kai Arulkumaran is a researcher at Araya in Tokyo. </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/1902.01724">AlphaStar: An Evolutionary Computation Perspective</a> <br>Kai Arulkumaran, Antoine Cully, Julian Togelius </p><p><a href="https://arxiv.org/abs/1912.08324">Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation</a> <br>Tianhong Dai, Kai Arulkumaran, Tamara Gerbert, Samyakh Tukra, Feryal Behbahani, Anil Anthony Bharath </p><p><a href="https://arxiv.org/abs/1912.02877">Training Agents using Upside-Down Reinforcement Learning</a> <br>Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber </p><p><strong><br>Additional References </strong></p><ul><li><a href="https://www.araya.org/en/">Araya</a> </li><li><a href="https://nnaisense.com/">NNAISENSE</a> </li><li><a href="https://scholar.google.com/citations?user=QKCypSoAAAAJ">Kai Arulkumaran</a> on Google Scholar </li><li><a href="https://github.com/Kaixhin/rlenvs">https://github.com/Kaixhin/rlenvs</a> </li><li><a href="https://github.com/Kaixhin/Atari">https://github.com/Kaixhin/Atari</a> </li><li><a href="https://github.com/Kaixhin/Rainbow">https://github.com/Kaixhin/Rainbow</a> </li><li>Tschiatschek, S., Arulkumaran, K., Stühmer, J. &amp; Hofmann, K. (2018). <a href="https://arxiv.org/abs/1805.09281">Variational Inference for Data-Efficient Model Learning in POMDPs</a>. arXiv:1805.09281. </li><li>Arulkumaran, K., Dilokthanakul, N., Shanahan, M. &amp; Bharath, A. A. (2016). <a href="https://arxiv.org/abs/1604.08153">Classifying Options for Deep Reinforcement Learning</a>. International Joint Conference on Artificial Intelligence, Deep Reinforcement Learning Workshop. </li><li>Garnelo, M., Arulkumaran, K. &amp; Shanahan, M. (2016). <a href="https://arxiv.org/abs/1609.05518">Towards Deep Symbolic Reinforcement Learning</a>. Annual Conference on Neural Information Processing Systems, Deep Reinforcement Learning Workshop. </li><li>Arulkumaran, K., Deisenroth, M. P., Brundage, M. &amp; Bharath, A. A. (2017). <a href="https://ieeexplore.ieee.org/abstract/document/8103164">Deep reinforcement learning: A brief survey</a>. IEEE Signal Processing Magazine. </li><li>Agostinelli, A., Arulkumaran, K., Sarrico, M., Richemond, P. &amp; Bharath, A. A. (2019). <a href="https://arxiv.org/abs/1911.09560">Memory-Efficient Episodic Control Reinforcement Learning with Dynamic Online k-means</a>. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning. </li><li>Sarrico, M., Arulkumaran, K., Agostinelli, A., Richemond, P. &amp; Bharath, A. A. (2019). <a href="https://arxiv.org/abs/1911.09615">Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control</a>. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning. </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Kai Arulkumaran is a researcher at Araya in Tokyo. </p><p><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/1902.01724">AlphaStar: An Evolutionary Computation Perspective</a> <br>Kai Arulkumaran, Antoine Cully, Julian Togelius </p><p><a href="https://arxiv.org/abs/1912.08324">Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation</a> <br>Tianhong Dai, Kai Arulkumaran, Tamara Gerbert, Samyakh Tukra, Feryal Behbahani, Anil Anthony Bharath </p><p><a href="https://arxiv.org/abs/1912.02877">Training Agents using Upside-Down Reinforcement Learning</a> <br>Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber </p><p><strong><br>Additional References </strong></p><ul><li><a href="https://www.araya.org/en/">Araya</a> </li><li><a href="https://nnaisense.com/">NNAISENSE</a> </li><li><a href="https://scholar.google.com/citations?user=QKCypSoAAAAJ">Kai Arulkumaran</a> on Google Scholar </li><li><a href="https://github.com/Kaixhin/rlenvs">https://github.com/Kaixhin/rlenvs</a> </li><li><a href="https://github.com/Kaixhin/Atari">https://github.com/Kaixhin/Atari</a> </li><li><a href="https://github.com/Kaixhin/Rainbow">https://github.com/Kaixhin/Rainbow</a> </li><li>Tschiatschek, S., Arulkumaran, K., Stühmer, J. &amp; Hofmann, K. (2018). <a href="https://arxiv.org/abs/1805.09281">Variational Inference for Data-Efficient Model Learning in POMDPs</a>. arXiv:1805.09281. </li><li>Arulkumaran, K., Dilokthanakul, N., Shanahan, M. &amp; Bharath, A. A. (2016). <a href="https://arxiv.org/abs/1604.08153">Classifying Options for Deep Reinforcement Learning</a>. International Joint Conference on Artificial Intelligence, Deep Reinforcement Learning Workshop. </li><li>Garnelo, M., Arulkumaran, K. &amp; Shanahan, M. (2016). <a href="https://arxiv.org/abs/1609.05518">Towards Deep Symbolic Reinforcement Learning</a>. Annual Conference on Neural Information Processing Systems, Deep Reinforcement Learning Workshop. </li><li>Arulkumaran, K., Deisenroth, M. P., Brundage, M. &amp; Bharath, A. A. (2017). <a href="https://ieeexplore.ieee.org/abstract/document/8103164">Deep reinforcement learning: A brief survey</a>. IEEE Signal Processing Magazine. </li><li>Agostinelli, A., Arulkumaran, K., Sarrico, M., Richemond, P. &amp; Bharath, A. A. (2019). <a href="https://arxiv.org/abs/1911.09560">Memory-Efficient Episodic Control Reinforcement Learning with Dynamic Online k-means</a>. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning. </li><li>Sarrico, M., Arulkumaran, K., Agostinelli, A., Richemond, P. &amp; Bharath, A. A. (2019). <a href="https://arxiv.org/abs/1911.09615">Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control</a>. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning. </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 15 Mar 2021 21:44:05 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/422c77b9/664e66c7.mp3" length="53760183" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/um1fxRPL6-IuhgIBxKOmCFiqQF35MVU-bNoLwzgiH2w/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzQ5MDMzMC8x/NjMyNzk4NjEyLWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>2786</itunes:duration>
      <itunes:summary>Kai Arulkumaran on AlphaStar and Evolutionary Computation, Domain Randomisation, Upside-Down Reinforcement Learning, Araya, NNAISENSE, and more!</itunes:summary>
      <itunes:subtitle>Kai Arulkumaran on AlphaStar and Evolutionary Computation, Domain Randomisation, Upside-Down Reinforcement Learning, Araya, NNAISENSE, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Michael Dennis</title>
      <itunes:episode>17</itunes:episode>
      <podcast:episode>17</podcast:episode>
      <itunes:title>Michael Dennis</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">5d4ca8bd-b9c6-4e2f-962e-7dd3bd940a6d</guid>
      <link>https://share.transistor.fm/s/ec36bb69</link>
      <description>
        <![CDATA[<p><a href="https://twitter.com/michaeld1729">Michael Dennis</a> is a PhD student at the <a href="https://humancompatible.ai/">Center for Human-Compatible AI</a> at UC Berkeley, supervised by <a href="http://people.eecs.berkeley.edu/~russell/">Professor Stuart Russell</a>. </p>I'm interested in robustness in RL and multi-agent RL, specifically as it applies to making the interaction between AI systems and society at large to be more beneficial.   <p><em>--Michael Dennis </em></p><p><br><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/2012.02096"><strong>Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design</strong></a><strong> </strong>[PAIRED] <strong><br></strong>Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine <br><a href="https://www.youtube.com/channel/UCI6dkF8eNrCz6XiBJlV9fmw/videos">Videos</a> <br> <strong><br></strong><a href="https://arxiv.org/abs/1905.10615"><strong>Adversarial Policies: Attacking Deep Reinforcement Learning</strong></a><strong> </strong></p><p>Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell <br><a href="https://adversarialpolicies.github.io/">Homepage and Videos</a> </p><p><a href="https://arxiv.org/abs/2101.10305"><strong>Accumulating Risk Capital Through Investing in Cooperation</strong></a><strong> <br></strong>Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell </p><p><br><a href="https://arxiv.org/abs/2006.13900"><strong>Quantifying Differences in Reward Functions</strong></a><strong> </strong>[EPIC] <strong><br></strong>Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://dl.acm.org/doi/10.1145/2716322">Safe Opponent Exploitation</a>, Sam Ganzfried And Tuomas Sandholm 2015 </li><li><a href="https://arxiv.org/abs/1810.08647">Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning</a>, Natasha Jaques et al 2019 </li><li><a href="https://arxiv.org/abs/1903.00742">Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research</a>, Leibo et al 2019 </li><li><a href="https://arxiv.org/abs/1912.01588">Leveraging Procedural Generation to Benchmark Reinforcement Learning</a>, Karl Cobbe et al 2019 </li><li><a href="https://arxiv.org/abs/1901.01753">Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions</a>, Wang et al 2019 </li><li><a href="https://proceedings.neurips.cc/paper/2020/hash/b607ba543ad05417b8507ee86c54fcb7-Abstract.html">Consequences of Misaligned AI</a>, Zhuang et al 2020 </li><li><a href="https://arxiv.org/abs/1902.09725">Conservative Agency via Attainable Utility Preservation</a>, Turner et al 2019 </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://twitter.com/michaeld1729">Michael Dennis</a> is a PhD student at the <a href="https://humancompatible.ai/">Center for Human-Compatible AI</a> at UC Berkeley, supervised by <a href="http://people.eecs.berkeley.edu/~russell/">Professor Stuart Russell</a>. </p>I'm interested in robustness in RL and multi-agent RL, specifically as it applies to making the interaction between AI systems and society at large to be more beneficial.   <p><em>--Michael Dennis </em></p><p><br><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/2012.02096"><strong>Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design</strong></a><strong> </strong>[PAIRED] <strong><br></strong>Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine <br><a href="https://www.youtube.com/channel/UCI6dkF8eNrCz6XiBJlV9fmw/videos">Videos</a> <br> <strong><br></strong><a href="https://arxiv.org/abs/1905.10615"><strong>Adversarial Policies: Attacking Deep Reinforcement Learning</strong></a><strong> </strong></p><p>Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell <br><a href="https://adversarialpolicies.github.io/">Homepage and Videos</a> </p><p><a href="https://arxiv.org/abs/2101.10305"><strong>Accumulating Risk Capital Through Investing in Cooperation</strong></a><strong> <br></strong>Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell </p><p><br><a href="https://arxiv.org/abs/2006.13900"><strong>Quantifying Differences in Reward Functions</strong></a><strong> </strong>[EPIC] <strong><br></strong>Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://dl.acm.org/doi/10.1145/2716322">Safe Opponent Exploitation</a>, Sam Ganzfried And Tuomas Sandholm 2015 </li><li><a href="https://arxiv.org/abs/1810.08647">Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning</a>, Natasha Jaques et al 2019 </li><li><a href="https://arxiv.org/abs/1903.00742">Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research</a>, Leibo et al 2019 </li><li><a href="https://arxiv.org/abs/1912.01588">Leveraging Procedural Generation to Benchmark Reinforcement Learning</a>, Karl Cobbe et al 2019 </li><li><a href="https://arxiv.org/abs/1901.01753">Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions</a>, Wang et al 2019 </li><li><a href="https://proceedings.neurips.cc/paper/2020/hash/b607ba543ad05417b8507ee86c54fcb7-Abstract.html">Consequences of Misaligned AI</a>, Zhuang et al 2020 </li><li><a href="https://arxiv.org/abs/1902.09725">Conservative Agency via Attainable Utility Preservation</a>, Turner et al 2019 </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 25 Jan 2021 21:27:00 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/ec36bb69/362214a8.mp3" length="51193523" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/BWUlB6CSClt2WtSNOOR9SMO36sJh8WV-U_nCduKyXiQ/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzQ0MTk4My8x/NjMzOTYwMjYxLWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>3650</itunes:duration>
      <itunes:summary>Michael Dennis on Human-Compatible AI, Game Theory, PAIRED, ARCTIC, EPIC, and lots more!</itunes:summary>
      <itunes:subtitle>Michael Dennis on Human-Compatible AI, Game Theory, PAIRED, ARCTIC, EPIC, and lots more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Roman Ring</title>
      <itunes:episode>16</itunes:episode>
      <podcast:episode>16</podcast:episode>
      <itunes:title>Roman Ring</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">04ff14c7-f393-4cb5-8b4b-b6ec41b21d5b</guid>
      <link>https://share.transistor.fm/s/4e4a1b63</link>
      <description>
        <![CDATA[<p><a href="http://inoryy.com/">Roman Ring</a> is a Research Engineer at DeepMind. </p><p><strong>Featured References </strong></p><p><a href="https://www.nature.com/articles/s41586-019-1724-z.epdf?author_access_token=lZH3nqPYtWJXfDA10W0CNNRgN0jAjWel9jnR3ZoTv0PSZcPzJFGNAZhOlk4deBCKzKm70KfinloafEF1bCCXL6IIHHgKaDkaTkBcTEv7aT-wqDoG1VeO9-wO3GEoAMF9bAOt7mJ0RWQnRVMbyfgH9A%3D%3D">Grandmaster level in StarCraft II using multi-agent reinforcement learning</a> <br>Vinyals et al, 2019 </p><p><a href="http://inoryy.com/files/ring_roman_bsc.pdf">Replicating DeepMind StarCraft II Reinforcement Learning Benchmark with Actor-Critic Methods</a> <br>Roman Ring, 2018 </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://arxiv.org/abs/1806.01830">Relational Deep Reinforcement Learning</a>,  Zambaldi et al 2018 </li><li><a href="https://arxiv.org/abs/1708.04782">StarCraft II: A New Challenge for Reinforcement Learning</a>, Vinyals et al 2017 </li><li><a href="https://arxiv.org/abs/1602.0495">Safe and Efficient Off-Policy Reinforcement Learning</a> [Retrace(<em>λ</em>)], Munos et al 2016 </li><li><a href="https://arxiv.org/abs/1611.01224">Sample Efficient Actor-Critic with Experience Replay</a> [ACER], Wang et al 2016 </li><li><a href="https://arxiv.org/abs/1802.01561">IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures</a> [IMPALA/V-trace], Espeholt et al 2018 </li></ul><p><br></p><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="http://inoryy.com/">Roman Ring</a> is a Research Engineer at DeepMind. </p><p><strong>Featured References </strong></p><p><a href="https://www.nature.com/articles/s41586-019-1724-z.epdf?author_access_token=lZH3nqPYtWJXfDA10W0CNNRgN0jAjWel9jnR3ZoTv0PSZcPzJFGNAZhOlk4deBCKzKm70KfinloafEF1bCCXL6IIHHgKaDkaTkBcTEv7aT-wqDoG1VeO9-wO3GEoAMF9bAOt7mJ0RWQnRVMbyfgH9A%3D%3D">Grandmaster level in StarCraft II using multi-agent reinforcement learning</a> <br>Vinyals et al, 2019 </p><p><a href="http://inoryy.com/files/ring_roman_bsc.pdf">Replicating DeepMind StarCraft II Reinforcement Learning Benchmark with Actor-Critic Methods</a> <br>Roman Ring, 2018 </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://arxiv.org/abs/1806.01830">Relational Deep Reinforcement Learning</a>,  Zambaldi et al 2018 </li><li><a href="https://arxiv.org/abs/1708.04782">StarCraft II: A New Challenge for Reinforcement Learning</a>, Vinyals et al 2017 </li><li><a href="https://arxiv.org/abs/1602.0495">Safe and Efficient Off-Policy Reinforcement Learning</a> [Retrace(<em>λ</em>)], Munos et al 2016 </li><li><a href="https://arxiv.org/abs/1611.01224">Sample Efficient Actor-Critic with Experience Replay</a> [ACER], Wang et al 2016 </li><li><a href="https://arxiv.org/abs/1802.01561">IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures</a> [IMPALA/V-trace], Espeholt et al 2018 </li></ul><p><br></p><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 11 Jan 2021 04:00:00 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/4e4a1b63/d12526e1.mp3" length="35684046" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/CGmZ6osV17Jk9znGV6SeX76BfLMVXGxKOaBhPJ0zygo/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzQzMTU5MC8x/NjMyNzc4MTUxLWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>2543</itunes:duration>
      <itunes:summary>Roman Ring discusses the Research Engineer role at DeepMind, StarCraft II, AlphaStar, his bachelor's thesis, JAX, Julia, IMPALA and more!</itunes:summary>
      <itunes:subtitle>Roman Ring discusses the Research Engineer role at DeepMind, StarCraft II, AlphaStar, his bachelor's thesis, JAX, Julia, IMPALA and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Shimon Whiteson</title>
      <itunes:episode>15</itunes:episode>
      <podcast:episode>15</podcast:episode>
      <itunes:title>Shimon Whiteson</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">8a383fde-da05-47e8-9410-37a4408617d9</guid>
      <link>https://share.transistor.fm/s/7fdfd811</link>
      <description>
        <![CDATA[<p><a href="https://www.cs.ox.ac.uk/people/shimon.whiteson/">Shimon Whiteson</a> is a Professor of Computer Science at Oxford University, the head of WhiRL, the Whiteson Research Lab at Oxford, and Head of Research at Waymo UK. </p><p><br><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/1910.08348">VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning <br></a>Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson <strong></strong></p><p><a href="https://arxiv.org/abs/2003.08839">Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning <br></a>Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://www.youtube.com/watch?v=W_9kcQmaWjo">Shimon Whiteson - Multi-agent RL</a>, MIT Embodied Intelligence Seminar </li><li><a href="https://arxiv.org/abs/1902.04043">The StarCraft Multi-Agent Challenge</a>, Samvelyan et al 2019 </li><li><a href="https://twkillian.github.io/papers/YaoKillianKonidarisFDV2018_ICML.pdf">Direct Policy Transfer with Hidden Parameter Markov Decision Processes</a>, Yao et al  2018 </li><li><a href="https://arxiv.org/abs/1706.05296">Value-Decomposition Networks For Cooperative Multi-Agent Learning</a>, Sunehag et al 2017 </li><li><a href="https://whirl.cs.ox.ac.uk/">Whiteson Research Lab</a> </li><li><a href="https://www.ox.ac.uk/news/2019-12-13-waymo-acquires-latent-logic-accelerate-progress-towards-safe-driverless-vehicles">Waymo acquires Latent Logic to accelerate progress towards safe, driverless vehicles</a>, Oxford News </li><li><a href="https://waymo.com/">Waymo</a> </li></ul><p><br></p><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://www.cs.ox.ac.uk/people/shimon.whiteson/">Shimon Whiteson</a> is a Professor of Computer Science at Oxford University, the head of WhiRL, the Whiteson Research Lab at Oxford, and Head of Research at Waymo UK. </p><p><br><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/1910.08348">VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning <br></a>Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson <strong></strong></p><p><a href="https://arxiv.org/abs/2003.08839">Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning <br></a>Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://www.youtube.com/watch?v=W_9kcQmaWjo">Shimon Whiteson - Multi-agent RL</a>, MIT Embodied Intelligence Seminar </li><li><a href="https://arxiv.org/abs/1902.04043">The StarCraft Multi-Agent Challenge</a>, Samvelyan et al 2019 </li><li><a href="https://twkillian.github.io/papers/YaoKillianKonidarisFDV2018_ICML.pdf">Direct Policy Transfer with Hidden Parameter Markov Decision Processes</a>, Yao et al  2018 </li><li><a href="https://arxiv.org/abs/1706.05296">Value-Decomposition Networks For Cooperative Multi-Agent Learning</a>, Sunehag et al 2017 </li><li><a href="https://whirl.cs.ox.ac.uk/">Whiteson Research Lab</a> </li><li><a href="https://www.ox.ac.uk/news/2019-12-13-waymo-acquires-latent-logic-accelerate-progress-towards-safe-driverless-vehicles">Waymo acquires Latent Logic to accelerate progress towards safe, driverless vehicles</a>, Oxford News </li><li><a href="https://waymo.com/">Waymo</a> </li></ul><p><br></p><p><br></p>]]>
      </content:encoded>
      <pubDate>Sun, 06 Dec 2020 13:00:00 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/7fdfd811/bd58f073.mp3" length="45093611" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:duration>3215</itunes:duration>
      <itunes:summary>Shimon Whiteson on his WhiRL lab, his work at Waymo UK, variBAD, QMIX, co-operative multi-agent RL, StarCraft Multi-Agent Challenge, advice to grad students, and much more!</itunes:summary>
      <itunes:subtitle>Shimon Whiteson on his WhiRL lab, his work at Waymo UK, variBAD, QMIX, co-operative multi-agent RL, StarCraft Multi-Agent Challenge, advice to grad students, and much more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Aravind Srinivas</title>
      <itunes:episode>14</itunes:episode>
      <podcast:episode>14</podcast:episode>
      <itunes:title>Aravind Srinivas</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">c592a3a9-4343-47c4-ad90-68ed088d3adf</guid>
      <link>https://share.transistor.fm/s/94195f7d</link>
      <description>
        <![CDATA[<p><a href="https://people.eecs.berkeley.edu/~aravind/">Aravind Srinivas</a> is a 3rd year PhD student at UC Berkeley advised by Prof. Abbeel. <br>He co-created and co-taught a <a href="https://sites.google.com/view/berkeley-cs294-158-sp19/home">grad course on Deep Unsupervised Learning</a> at Berkeley. </p><p><br><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/1905.09272">Data-Efficient Image Recognition with Contrastive Predictive Coding <br></a>Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord </p><p><a href="https://arxiv.org/abs/2004.04136">Contrastive Unsupervised Representations for Reinforcement Learning</a> <br>Aravind Srinivas, Michael Laskin, Pieter Abbeel </p><p><a href="https://arxiv.org/abs/2004.14990">Reinforcement Learning with Augmented Data <br></a>Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas </p><p><a href="https://arxiv.org/abs/2007.04938">SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning <br></a>Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://sites.google.com/view/berkeley-cs294-158-sp20/home">CS294-158-SP20 Deep Unsupervised Learning</a>, Berkeley </li><li><a href="https://arxiv.org/abs/2009.04416">Phasic Policy Gradient</a>, Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman </li><li><a href="https://arxiv.org/abs/2006.07733">Bootstrap your own latent: A new approach to self-supervised Learning</a> , Grill et al 2020 </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://people.eecs.berkeley.edu/~aravind/">Aravind Srinivas</a> is a 3rd year PhD student at UC Berkeley advised by Prof. Abbeel. <br>He co-created and co-taught a <a href="https://sites.google.com/view/berkeley-cs294-158-sp19/home">grad course on Deep Unsupervised Learning</a> at Berkeley. </p><p><br><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/1905.09272">Data-Efficient Image Recognition with Contrastive Predictive Coding <br></a>Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord </p><p><a href="https://arxiv.org/abs/2004.04136">Contrastive Unsupervised Representations for Reinforcement Learning</a> <br>Aravind Srinivas, Michael Laskin, Pieter Abbeel </p><p><a href="https://arxiv.org/abs/2004.14990">Reinforcement Learning with Augmented Data <br></a>Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas </p><p><a href="https://arxiv.org/abs/2007.04938">SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning <br></a>Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://sites.google.com/view/berkeley-cs294-158-sp20/home">CS294-158-SP20 Deep Unsupervised Learning</a>, Berkeley </li><li><a href="https://arxiv.org/abs/2009.04416">Phasic Policy Gradient</a>, Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman </li><li><a href="https://arxiv.org/abs/2006.07733">Bootstrap your own latent: A new approach to self-supervised Learning</a> , Grill et al 2020 </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Sun, 20 Sep 2020 20:46:33 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/94195f7d/54e5f696.mp3" length="71864156" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/pmo3645UPbg_4HxF5fFE0fRV1oLGVwxxdjOOzXpMkCg/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzM1NDY3MC8x/NjMyNzk5MjI0LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>5127</itunes:duration>
      <itunes:summary>Aravind Srinivas on his work including CPC v2, RAD, CURL, and SUNRISE, unsupervised learning, teaching a Berkeley course, and more!</itunes:summary>
      <itunes:subtitle>Aravind Srinivas on his work including CPC v2, RAD, CURL, and SUNRISE, unsupervised learning, teaching a Berkeley course, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Taylor Killian</title>
      <itunes:episode>13</itunes:episode>
      <podcast:episode>13</podcast:episode>
      <itunes:title>Taylor Killian</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">d7937f50-38f1-40ed-9ec1-80acf846e633</guid>
      <link>https://share.transistor.fm/s/b97a1d3f</link>
      <description>
        <![CDATA[<p><a href="https://twkillian.github.io/">Taylor Killian</a> is a Ph.D. student at the University of Toronto and the Vector Institute, and an Intern at Google Brain. </p><p><strong>Featured References <br></strong><br></p><p><a href="https://twkillian.github.io/papers/YaoKillianKonidarisFDV2018_ICML.pdf"><strong>Direct Policy Transfer with Hidden Parameter Markov Decision Processes</strong></a><strong> <br></strong>Yao, Killian, Konidaris, Doshi-Velez </p><p><a href="http://papers.nips.cc/paper/7205-robust-and-efficient-transfer-learning-with-hidden-parameter-markov-decision-processes.pdf"><strong>Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes</strong></a><strong> <br></strong>Killian, Daulton, Konidaris, Doshi-Velez </p><p><a href="https://arxiv.org/abs/1612.00475"><strong>Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes</strong></a><strong> <br></strong>Killian, Konidaris, Doshi-Velez </p><p><a href="https://arxiv.org/pdf/2006.11654.pdf"><strong>Counterfactually Guided Policy Transfer in Clinical Settings</strong></a><strong> <br></strong>Killian, Ghassemi, Joshi </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://arxiv.org/abs/1308.3513">Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations</a>, Doshi-Velez, Konidaris </li><li><a href="http://www.nature.com/articles/sdata201635">Mimic III, a freely accessible critical care database</a>. Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG </li><li><a href="https://www.nature.com/articles/s41591-018-0213-5">The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care</a>, Komorowski et al <a href="https://mimic.physionet.org/"><br></a><br></li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://twkillian.github.io/">Taylor Killian</a> is a Ph.D. student at the University of Toronto and the Vector Institute, and an Intern at Google Brain. </p><p><strong>Featured References <br></strong><br></p><p><a href="https://twkillian.github.io/papers/YaoKillianKonidarisFDV2018_ICML.pdf"><strong>Direct Policy Transfer with Hidden Parameter Markov Decision Processes</strong></a><strong> <br></strong>Yao, Killian, Konidaris, Doshi-Velez </p><p><a href="http://papers.nips.cc/paper/7205-robust-and-efficient-transfer-learning-with-hidden-parameter-markov-decision-processes.pdf"><strong>Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes</strong></a><strong> <br></strong>Killian, Daulton, Konidaris, Doshi-Velez </p><p><a href="https://arxiv.org/abs/1612.00475"><strong>Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes</strong></a><strong> <br></strong>Killian, Konidaris, Doshi-Velez </p><p><a href="https://arxiv.org/pdf/2006.11654.pdf"><strong>Counterfactually Guided Policy Transfer in Clinical Settings</strong></a><strong> <br></strong>Killian, Ghassemi, Joshi </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://arxiv.org/abs/1308.3513">Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations</a>, Doshi-Velez, Konidaris </li><li><a href="http://www.nature.com/articles/sdata201635">Mimic III, a freely accessible critical care database</a>. Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG </li><li><a href="https://www.nature.com/articles/s41591-018-0213-5">The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care</a>, Komorowski et al <a href="https://mimic.physionet.org/"><br></a><br></li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 17 Aug 2020 08:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/b97a1d3f/8c595b5d.mp3" length="75611309" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/9S2QkBtuXq4pkoeFjy0cgq6cSiwI-GsRzzhQza0F95k/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzMxODg2My8x/NjMyNzc4NjQ1LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>5395</itunes:duration>
      <itunes:summary>Taylor Killian on the latest in RL for Health, including Hidden Parameter MDPs, Mimic III and Sepsis, Counterfactually Guided Policy Transfer and lots more!</itunes:summary>
      <itunes:subtitle>Taylor Killian on the latest in RL for Health, including Hidden Parameter MDPs, Mimic III and Sepsis, Counterfactually Guided Policy Transfer and lots more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/b97a1d3f/transcription.vtt" type="text/vtt" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/b97a1d3f/transcription.srt" type="application/x-subrip" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/b97a1d3f/transcription.json" type="application/json" rel="captions"/>
      <podcast:transcript url="https://share.transistor.fm/s/b97a1d3f/transcription.txt" type="text/plain"/>
      <podcast:transcript url="https://share.transistor.fm/s/b97a1d3f/transcription" type="text/html"/>
    </item>
    <item>
      <title>Nan Jiang</title>
      <itunes:episode>12</itunes:episode>
      <podcast:episode>12</podcast:episode>
      <itunes:title>Nan Jiang</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">ea9496c9-f6b4-46c2-a811-a5518105e0e0</guid>
      <link>https://share.transistor.fm/s/fbdbe373</link>
      <description>
        <![CDATA[<p><a href="https://nanjiang.cs.illinois.edu/">Nan Jiang</a> is an Assistant Professor of Computer Science at University of Illinois.  He was a Postdoc Microsoft Research, and did his PhD at University of Michigan under Professor Satinder Singh. </p><p><br><strong>Featured References </strong></p><ul><li><a href="https://rltheorybook.github.io/"><strong>Reinforcement Learning: Theory and Algorithms</strong></a><strong> <br></strong>Alekh Agarwal Nan Jiang Sham M. Kakade <p></p></li><li><a href="https://arxiv.org/abs/1811.08540"><strong>Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches</strong></a><strong> <br></strong>Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford <p></p></li><li><a href="https://arxiv.org/abs/1905.00360"><strong>Information-Theoretic Considerations in Batch Reinforcement Learning</strong></a><strong> <br></strong>Jinglin Chen, Nan Jiang </li></ul><p> <br><strong>Additional References </strong></p><ul><li><a href="http://rbr.cs.umass.edu/aimath06/proceedings/P21.pdf">Towards a Unified Theory of State Abstraction for MDPs</a>, Lihong Li, Thomas J. Walsh, Michael L. Littman  </li><li><a href="https://arxiv.org/abs/1511.03722">Doubly Robust Off-policy Value Evaluation for Reinforcement Learning</a>, Nan Jiang, Lihong Li </li><li><a href="https://arxiv.org/abs/2002.02081">Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization</a>, Nan Jiang, Jiawei Huang </li><li><a href="https://arxiv.org/abs/1911.06854">Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning</a>, Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue </li></ul><p><br></p><p><strong>Errata </strong></p><ul><li>[Robin] I misspoke when I said in domain randomization we want the agent to "ignore" domain parameters.  What I should have said is, we want the agent to perform well within some range of domain parameters, it should be robust with respect to domain parameters. </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://nanjiang.cs.illinois.edu/">Nan Jiang</a> is an Assistant Professor of Computer Science at University of Illinois.  He was a Postdoc Microsoft Research, and did his PhD at University of Michigan under Professor Satinder Singh. </p><p><br><strong>Featured References </strong></p><ul><li><a href="https://rltheorybook.github.io/"><strong>Reinforcement Learning: Theory and Algorithms</strong></a><strong> <br></strong>Alekh Agarwal Nan Jiang Sham M. Kakade <p></p></li><li><a href="https://arxiv.org/abs/1811.08540"><strong>Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches</strong></a><strong> <br></strong>Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford <p></p></li><li><a href="https://arxiv.org/abs/1905.00360"><strong>Information-Theoretic Considerations in Batch Reinforcement Learning</strong></a><strong> <br></strong>Jinglin Chen, Nan Jiang </li></ul><p> <br><strong>Additional References </strong></p><ul><li><a href="http://rbr.cs.umass.edu/aimath06/proceedings/P21.pdf">Towards a Unified Theory of State Abstraction for MDPs</a>, Lihong Li, Thomas J. Walsh, Michael L. Littman  </li><li><a href="https://arxiv.org/abs/1511.03722">Doubly Robust Off-policy Value Evaluation for Reinforcement Learning</a>, Nan Jiang, Lihong Li </li><li><a href="https://arxiv.org/abs/2002.02081">Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization</a>, Nan Jiang, Jiawei Huang </li><li><a href="https://arxiv.org/abs/1911.06854">Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning</a>, Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue </li></ul><p><br></p><p><strong>Errata </strong></p><ul><li>[Robin] I misspoke when I said in domain randomization we want the agent to "ignore" domain parameters.  What I should have said is, we want the agent to perform well within some range of domain parameters, it should be robust with respect to domain parameters. </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 06 Jul 2020 08:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/fbdbe373/a07b5ff3.mp3" length="60370486" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/CsnZQyhZcDmEtcUJw-8EM9mklbz3fIL3aodsJqPUql0/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzI4Nzc2MC8x/NjMyNzgwNjIxLWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>4306</itunes:duration>
<itunes:summary>Nan Jiang takes us deep into Model-based vs Model-free RL, Sim vs Real, Evaluation &amp; Overfitting, RL Theory vs Practice and much more!</itunes:summary>
<itunes:subtitle>Nan Jiang takes us deep into Model-based vs Model-free RL, Sim vs Real, Evaluation &amp; Overfitting, RL Theory vs Practice and much more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Danijar Hafner</title>
      <itunes:episode>11</itunes:episode>
      <podcast:episode>11</podcast:episode>
      <itunes:title>Danijar Hafner</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">0c8065a8-25cf-4cfc-b7d8-7d1b50297ee3</guid>
      <link>https://share.transistor.fm/s/03a60878</link>
      <description>
        <![CDATA[<p><a href="https://danijar.com/">Danijar Hafner</a> is a PhD student at the University of Toronto, and a student researcher at Google Research, Brain Team and the Vector Institute.  He holds a Masters of Research from University College London. </p><p><strong>Featured References </strong></p><ul><li><a href="https://www.nature.com/articles/s41593-019-0520-2.epdf?shared_access_token=n1zyUZ6-ypeHWkeaEs1FPNRgN0jAjWel9jnR3ZoTv0N5dsTXXcjpcGP7i54eL_L9GTMgy1V6NUDPE4-SxE_8Ip1gIa5G35VU4LeqRZ56IGy5uMJKd6aUZ4JeYonqPfWkstTCNFgazGPl8xJGrQAvuw%3D%3D"><strong>A deep learning framework for neuroscience</strong></a><strong> </strong><br>Blake A. Richards, Timothy P. Lillicrap , Philippe Beaudoin, Yoshua Bengio, Rafal Bogacz, Amelia Christensen, Claudia Clopath, Rui Ponte Costa, Archy de Berker, Surya Ganguli, Colleen J. Gillon , Danijar Hafner, Adam Kepecs, Nikolaus Kriegeskorte, Peter Latham , Grace W. Lindsay, Kenneth D. Miller , Richard Naud , Christopher C. Pack, Panayiota Poirazi , Pieter Roelfsema , João Sacramento, Andrew Saxe, Benjamin Scellier, Anna C. Schapiro , Walter Senn, Greg Wayne, Daniel Yamins, Friedemann Zenke, Joel Zylberberg, Denis Therien, Konrad P. Kording </li><li><a href="https://arxiv.org/abs/1811.04551"><strong>Learning Latent Dynamics for Planning from Pixels</strong></a><strong> </strong><br>Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson </li><li><a href="https://arxiv.org/abs/1912.01603"><strong>Dream to Control: Learning Behaviors by Latent Imagination</strong></a><strong> </strong><br>Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi </li><li><a href="https://arxiv.org/abs/2005.05960"><strong>Planning to Explore via Self-Supervised World Models</strong></a><strong> </strong><br>Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak </li></ul><p><br></p><p><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/1911.08265">Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model</a> Schrittwieser et al </li><li><a href="https://arxiv.org/abs/1712.01815">Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm</a> Silver et al </li><li><a href="https://arxiv.org/abs/1906.09237">Shaping Belief States with Generative Environment Models for RL</a>  Gregor et al </li><li><a href="https://arxiv.org/abs/1810.12162">Model-Based Active Exploration</a> Shyam et al </li></ul><p> <br><strong>Errata </strong></p><ul><li>[Robin] Around 1:37 I say <em>"some ... world models get confused by random noise".</em> I meant "some curiosity formulations", not "world models" </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://danijar.com/">Danijar Hafner</a> is a PhD student at the University of Toronto, and a student researcher at Google Research, Brain Team and the Vector Institute.  He holds a Masters of Research from University College London. </p><p><strong>Featured References </strong></p><ul><li><a href="https://www.nature.com/articles/s41593-019-0520-2.epdf?shared_access_token=n1zyUZ6-ypeHWkeaEs1FPNRgN0jAjWel9jnR3ZoTv0N5dsTXXcjpcGP7i54eL_L9GTMgy1V6NUDPE4-SxE_8Ip1gIa5G35VU4LeqRZ56IGy5uMJKd6aUZ4JeYonqPfWkstTCNFgazGPl8xJGrQAvuw%3D%3D"><strong>A deep learning framework for neuroscience</strong></a><strong> </strong><br>Blake A. Richards, Timothy P. Lillicrap , Philippe Beaudoin, Yoshua Bengio, Rafal Bogacz, Amelia Christensen, Claudia Clopath, Rui Ponte Costa, Archy de Berker, Surya Ganguli, Colleen J. Gillon , Danijar Hafner, Adam Kepecs, Nikolaus Kriegeskorte, Peter Latham , Grace W. Lindsay, Kenneth D. Miller , Richard Naud , Christopher C. Pack, Panayiota Poirazi , Pieter Roelfsema , João Sacramento, Andrew Saxe, Benjamin Scellier, Anna C. Schapiro , Walter Senn, Greg Wayne, Daniel Yamins, Friedemann Zenke, Joel Zylberberg, Denis Therien, Konrad P. Kording </li><li><a href="https://arxiv.org/abs/1811.04551"><strong>Learning Latent Dynamics for Planning from Pixels</strong></a><strong> </strong><br>Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson </li><li><a href="https://arxiv.org/abs/1912.01603"><strong>Dream to Control: Learning Behaviors by Latent Imagination</strong></a><strong> </strong><br>Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi </li><li><a href="https://arxiv.org/abs/2005.05960"><strong>Planning to Explore via Self-Supervised World Models</strong></a><strong> </strong><br>Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak </li></ul><p><br></p><p><strong>Additional References</strong></p><ul><li><a href="https://arxiv.org/abs/1911.08265">Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model</a> Schrittwieser et al </li><li><a href="https://arxiv.org/abs/1712.01815">Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm</a> Silver et al </li><li><a href="https://arxiv.org/abs/1906.09237">Shaping Belief States with Generative Environment Models for RL</a>  Gregor et al </li><li><a href="https://arxiv.org/abs/1810.12162">Model-Based Active Exploration</a> Shyam et al </li></ul><p> <br><strong>Errata </strong></p><ul><li>[Robin] Around 1:37 I say <em>"some ... world models get confused by random noise".</em> I meant "some curiosity formulations", not "world models" </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Thu, 14 May 2020 03:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/03a60878/0d3bd30e.mp3" length="74857614" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/vitVVvMCJvThB3PtLFDctb82DZRQHz2aH502le2qMgo/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9kNzJh/OGYzMWUyYTFiMDNi/YTE3N2VjMDRlNjVk/ZWQzOC53ZWJw.jpg"/>
      <itunes:duration>7229</itunes:duration>
<itunes:summary>Danijar Hafner takes us on an odyssey through deep learning &amp; neuroscience, PlaNet, Dreamer, world models, latent dynamics, curious agents, and more!</itunes:summary>
<itunes:subtitle>Danijar Hafner takes us on an odyssey through deep learning &amp; neuroscience, PlaNet, Dreamer, world models, latent dynamics, curious agents, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
      <podcast:transcript url="https://share.transistor.fm/s/03a60878/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>Csaba Szepesvari</title>
      <itunes:episode>10</itunes:episode>
      <podcast:episode>10</podcast:episode>
      <itunes:title>Csaba Szepesvari</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">cfc35e8a-9c74-4257-bf67-ab18d1ae3dc3</guid>
      <link>https://share.transistor.fm/s/d60b55ce</link>
      <description>
        <![CDATA[<p><a href="https://sites.ualberta.ca/~szepesva/">Csaba Szepesvari</a> is: </p><ul><li>Head of the Foundations Team at DeepMind </li><li>Professor of Computer Science at the University of Alberta </li><li>Canada CIFAR AI Chair </li><li>Fellow at the Alberta Machine Intelligence Institute  </li><li>Co-Author of the book <a href="https://tor-lattimore.com/downloads/book/book.pdf">Bandit Algorithms</a> along with Tor Lattimore, and author of the book <a href="https://sites.ualberta.ca/~szepesva/RLBook.html">Algorithms for Reinforcement Learning</a> </li></ul><p><strong>References </strong></p><ul><li><a href="https://dl.acm.org/doi/10.1007/11871842_29">Bandit based monte-carlo planning</a>, Levente Kocsis, Csaba Szepesvári </li><li><a href="https://tor-lattimore.com/downloads/book/book.pdf">Bandit Algorithms</a>, Tor Lattimore, Csaba Szepesvári </li><li><a href="https://sites.ualberta.ca/~szepesva/RLBook.html">Algorithms for Reinforcement Learning</a>, Csaba Szepesvári </li><li><a href="https://arxiv.org/abs/1612.08810">The Predictron: End-To-End Learning and Planning</a>, David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris </li><li><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.1701">A Bayesian framework for reinforcement learning</a>, Strens </li><li><a href="https://openai.com/blog/solving-rubiks-cube/">Solving Rubik’s Cube with a Robot Hand</a> ; <a href="https://arxiv.org/abs/1910.07113">Paper</a>, OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang </li><li><a href="https://epubs.siam.org/doi/abs/10.1137/S0097539701398375">The Nonstochastic Multiarmed Bandit Problem</a>, Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire </li><li><a href="https://slideslive.com/38923183/deep-learning-with-bayesian-principles">Deep Learning with Bayesian Principles</a>, Mohammad Emtiyaz Khan </li><li><a href="https://arxiv.org/abs/1906.05433">Tackling climate change with Machine Learning</a> David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://sites.ualberta.ca/~szepesva/">Csaba Szepesvari</a> is: </p><ul><li>Head of the Foundations Team at DeepMind </li><li>Professor of Computer Science at the University of Alberta </li><li>Canada CIFAR AI Chair </li><li>Fellow at the Alberta Machine Intelligence Institute  </li><li>Co-Author of the book <a href="https://tor-lattimore.com/downloads/book/book.pdf">Bandit Algorithms</a> along with Tor Lattimore, and author of the book <a href="https://sites.ualberta.ca/~szepesva/RLBook.html">Algorithms for Reinforcement Learning</a> </li></ul><p><strong>References </strong></p><ul><li><a href="https://dl.acm.org/doi/10.1007/11871842_29">Bandit based monte-carlo planning</a>, Levente Kocsis, Csaba Szepesvári </li><li><a href="https://tor-lattimore.com/downloads/book/book.pdf">Bandit Algorithms</a>, Tor Lattimore, Csaba Szepesvári </li><li><a href="https://sites.ualberta.ca/~szepesva/RLBook.html">Algorithms for Reinforcement Learning</a>, Csaba Szepesvári </li><li><a href="https://arxiv.org/abs/1612.08810">The Predictron: End-To-End Learning and Planning</a>, David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris </li><li><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.1701">A Bayesian framework for reinforcement learning</a>, Strens </li><li><a href="https://openai.com/blog/solving-rubiks-cube/">Solving Rubik’s Cube with a Robot Hand</a> ; <a href="https://arxiv.org/abs/1910.07113">Paper</a>, OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang </li><li><a href="https://epubs.siam.org/doi/abs/10.1137/S0097539701398375">The Nonstochastic Multiarmed Bandit Problem</a>, Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire </li><li><a href="https://slideslive.com/38923183/deep-learning-with-bayesian-principles">Deep Learning with Bayesian Principles</a>, Mohammad Emtiyaz Khan </li><li><a href="https://arxiv.org/abs/1906.05433">Tackling climate change with Machine Learning</a> David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Sun, 05 Apr 2020 09:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/d60b55ce/f3d46a86.mp3" length="41002622" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/igyhAE99ObH-oBfGKoMk-lMtsaKa8_rKJbJp324udU0/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzIxNjU5My8x/NjMyODYxNTkyLWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>2922</itunes:duration>
      <itunes:summary>Csaba Szepesvari of DeepMind shares his views on Bandits, Adversaries, PUCT in AlphaGo / AlphaZero / MuZero, AGI and RL, what is timeless, and more!</itunes:summary>
      <itunes:subtitle>Csaba Szepesvari of DeepMind shares his views on Bandits, Adversaries, PUCT in AlphaGo / AlphaZero / MuZero, AGI and RL, what is timeless, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Ben Eysenbach</title>
      <itunes:episode>9</itunes:episode>
      <podcast:episode>9</podcast:episode>
      <itunes:title>Ben Eysenbach</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">836a3261-56b6-455b-a226-c6f7c9e24f6e</guid>
      <link>https://share.transistor.fm/s/c12f63c1</link>
      <description>
        <![CDATA[<p><a href="https://ben-eysenbach.github.io/">Ben Eysenbach</a> is a PhD student in the <a href="https://www.ml.cmu.edu/">Machine Learning Department</a> at Carnegie Mellon University.  He was a Resident at Google Brain, and studied math and computer science at MIT. He co-founded the <a href="https://sites.google.com/view/erl-2019/home">ICML Exploration in Reinforcement Learning workshop</a>.  </p><p><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/1802.06070">Diversity is All You Need: Learning Skills without a Reward Function</a> <br>Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine </p><p><a href="https://arxiv.org/abs/1906.05253">Search on the Replay Buffer: Bridging Planning and Reinforcement Learning <br></a>Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine </p><p><strong>Additional References </strong></p><ul><li><a href="https://arxiv.org/abs/1908.03568">Behaviour Suite for Reinforcement Learning</a>, Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt </li><li><a href="https://arxiv.org/abs/1903.01973">Learning Latent Plans from Play</a>, Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, Pierre Sermanet </li><li><a href="https://finale.seas.harvard.edu/">Finale Doshi-Velez</a> </li><li><a href="https://cs.stanford.edu/people/ebrun/">Emma Brunskill</a> </li><li><a href="https://www.nature.com/articles/s41586-020-1994-5">Closed-loop optimization of fast-charging protocols for batteries with machine learning</a>,  Peter Attia, Aditya Grover, Norman Jin, Kristen Severson, Todor Markov, Yang-Hung Liao, Michael Chen, Bryan Cheong, Nicholas Perkins, Zi Yang, Patrick Herring, Muratahan Aykol, Stephen Harris, Richard Braatz, Stefano Ermon, William Chueh </li><li><a href="https://cmudeeprl.github.io/703website/">CMU 10-703 Deep Reinforcement Learning</a>, Fall 2019, Carnegie Mellon University </li><li><a href="https://sites.google.com/view/erl-2019/home">ICML Exploration in Reinforcement Learning workshop</a> </li></ul><p><br></p><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://ben-eysenbach.github.io/">Ben Eysenbach</a> is a PhD student in the <a href="https://www.ml.cmu.edu/">Machine Learning Department</a> at Carnegie Mellon University.  He was a Resident at Google Brain, and studied math and computer science at MIT. He co-founded the <a href="https://sites.google.com/view/erl-2019/home">ICML Exploration in Reinforcement Learning workshop</a>.  </p><p><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/1802.06070">Diversity is All You Need: Learning Skills without a Reward Function</a> <br>Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine </p><p><a href="https://arxiv.org/abs/1906.05253">Search on the Replay Buffer: Bridging Planning and Reinforcement Learning <br></a>Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine </p><p><strong>Additional References </strong></p><ul><li><a href="https://arxiv.org/abs/1908.03568">Behaviour Suite for Reinforcement Learning</a>, Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt </li><li><a href="https://arxiv.org/abs/1903.01973">Learning Latent Plans from Play</a>, Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, Pierre Sermanet </li><li><a href="https://finale.seas.harvard.edu/">Finale Doshi-Velez</a> </li><li><a href="https://cs.stanford.edu/people/ebrun/">Emma Brunskill</a> </li><li><a href="https://www.nature.com/articles/s41586-020-1994-5">Closed-loop optimization of fast-charging protocols for batteries with machine learning</a>,  Peter Attia, Aditya Grover, Norman Jin, Kristen Severson, Todor Markov, Yang-Hung Liao, Michael Chen, Bryan Cheong, Nicholas Perkins, Zi Yang, Patrick Herring, Muratahan Aykol, Stephen Harris, Richard Braatz, Stefano Ermon, William Chueh </li><li><a href="https://cmudeeprl.github.io/703website/">CMU 10-703 Deep Reinforcement Learning</a>, Fall 2019, Carnegie Mellon University </li><li><a href="https://sites.google.com/view/erl-2019/home">ICML Exploration in Reinforcement Learning workshop</a> </li></ul><p><br></p><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 30 Mar 2020 09:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/c12f63c1/c2d1376a.mp3" length="41494501" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/iSIO39WMoeY50XL7-xI2FpPxVlQFCA0xOKQWYHwpOv4/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzIxNzAxMC8x/NjMyNzg0NDQxLWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>2958</itunes:duration>
      <itunes:summary>Ben Eysenbach schools us on human supervision, SORB, DIAYN, techniques for exploration, teaching RL, virtual conferences, and much more!</itunes:summary>
      <itunes:subtitle>Ben Eysenbach schools us on human supervision, SORB, DIAYN, techniques for exploration, teaching RL, virtual conferences, and much more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>NeurIPS 2019 Deep RL Workshop</title>
      <itunes:episode>8</itunes:episode>
      <podcast:episode>8</podcast:episode>
      <itunes:title>NeurIPS 2019 Deep RL Workshop</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">183ee11e-5dfb-4082-92f6-84cb4abfcf9a</guid>
      <link>https://share.transistor.fm/s/ece23fd8</link>
      <description>
<![CDATA[<p>Thank you to all the presenters that participated. I covered as many as I could given the time and crowds; if you were not included and wish to be, please email talkrl@pathwayi.com </p><p>More details on the official <a href="https://sites.google.com/view/deep-rl-workshop-neurips-2019/home">NeurIPS Deep RL Workshop site</a>. </p><ul><li>0:23  <a href="https://drive.google.com/file/d/1aUY63fjl7MxRRGT-PJwjTi1QFWCQcFoG/view?usp=drivesdk">Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms</a>; Matthia Sabatelli (University of Liège); Gilles Louppe (University of Liège); Pierre Geurts (University of Liège); Marco Wiering (University of Groningen) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1909.01779&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNEGuYUgp7Qw-WsM2a02HepP18hM9w">[external pdf link]</a> </li><li>4:16  <a href="https://drive.google.com/file/d/10oG4X8cpvVa3TapxhTRZPGYzWk6EAuKm/view?usp=drivesdk">Single Deep Counterfactual Regret Minimization</a>; Eric Steinberger (University of Cambridge). </li><li>5:38  <a href="https://drive.google.com/file/d/1P7BQqOTPGzPf_RPfxirMFGmnyaIVqkrf/view?usp=drivesdk">On the Convergence of Episodic Reinforcement Learning Algorithms at the Example of RUDDER</a>; Markus Holzleitner (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria); José Arjona-Medina (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria); Marius-Constantin Dinu (LIT AI Lab / University Linz); Sepp Hochreiter (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria). </li><li>9:33  <a href="https://drive.google.com/file/d/1IhARUnbaFkVswI-BtuNP67qQrVZyPns1/view?usp=drivesdk">Objective Mismatch in Model-based Reinforcement Learning</a>; Nathan Lambert (UC Berkeley); Brandon Amos (Facebook); Omry Yadan (Facebook); Roberto Calandra (Facebook). </li><li>10:51  <a href="https://drive.google.com/file/d/17EEcdMvR6HGK64W-2B_3rtDdl_1Lokjd/view?usp=drivesdk">Option Discovery using Deep Skill Chaining</a>; Akhil Bagaria (Brown University); George Konidaris (Brown University). </li><li>13:44  <a href="https://drive.google.com/file/d/18dLTjFt5fCXoYRjRSytunh876Y2oVFm4/view?usp=drivesdk">Blue River Controls: A toolkit for Reinforcement Learning Control Systems on Hardware</a>; Kirill Polzounov (University of Calgary); Ramitha Sundar (Blue River Technology); Lee Reden (Blue River Technology). </li><li>14:52  <a href="https://drive.google.com/file/d/1BZ2MlrBIS26TSEG4tYpc62XK63MG0jlm/view?usp=drivesdk">LeDeepChef: Deep Reinforcement Learning Agent for Families of Text-Based Games</a>; Leonard Adolphs (ETHZ); Thomas Hofmann (ETH Zurich). </li><li>16:30  <a href="https://drive.google.com/file/d/1uP1luedMimy2apnmf9Pa8XgpCSIM0Uln/view?usp=drivesdk">Accelerating Training in Pommerman with Imitation and Reinforcement Learning</a>; Hardik Meisheri (TCS Research); Omkar Shelke (TCS Research); Richa Verma (TCS Research); Harshad Khadilkar (TCS Research). 
</li><li>17:27  <a href="https://drive.google.com/file/d/10oS8c1VdtlykxvzgatOhggoBoLgxBbCt/view?usp=drivesdk">Dream to Control: Learning Behaviors by Latent Imagination</a>; Danijar Hafner (Google); Timothy Lillicrap (DeepMind); Jimmy Ba (University of Toronto); Mohammad Norouzi (Google Brain) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fpdf%2F1912.01603.pdf&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNFfHeKZ4JKokibBgUXB_YPibeLV7g">[external pdf link]</a>. </li><li>20:48  <a href="https://drive.google.com/file/d/1nUqS4qTqLtNo19rjVGW_CHLAGjTYeBlW/view?usp=drivesdk">Adaptive Temperature Tuning for Mellowmax in Deep Reinforcement Learning</a>; Seungchan Kim (Brown University); George Konidaris (Brown). </li><li>22:05  <a href="https://drive.google.com/file/d/13lNSV936kyX7uzpMB2K1uqfO4kHxLcdy/view?usp=drivesdk">Meta-learning curiosity algorithms</a>; Ferran Alet (MIT); Martin Schneider (MIT); Tomas Lozano-Perez (MIT); Leslie Kaelbling (MIT). </li><li>24:09  <a href="https://drive.google.com/file/d/1BRgvMln6xNETPcSlw1zBnK9tbPMyy8EZ/view?usp=drivesdk">Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards</a>; Xingyu Lu (Berkeley); Stas Tiomkin (BAIR, UC Berkeley); Pieter Abbeel (UC Berkeley). </li><li>25:44   <a href="https://drive.google.com/file/d/1V5EgqGy4LCyTnk-KoGeGJnNtu3Q6jpC3/view?usp=drivesdk">Swarm-inspired Reinforcement Learning via Collaborative Inter-agent Knowledge Distillation</a>; Zhang-Wei Hong (Preferred Networks); Prabhat Nagarajan (Preferred Networks); Guilherme Maeda (Preferred Networks). </li><li>26:35  <a href="https://drive.google.com/file/d/1W6yuUZM-v6o4VgUFayD3YoNg_-Lj156i/view?usp=drivesdk">Multiplayer AlphaZero</a>; Nicholas Petosa (Georgia Institute of Technology); Tucker Balch (Ga Tech) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1910.13012&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNH3945STOW-xNQKetB70So_We-QHg">[external pdf link]</a>. </li><li>27:43  <a href="https://drive.google.com/file/d/1T8JwuENMaDjGSxqr6M2xW67Y0a9ACFD7/view?usp=drivesdk">Prioritized Sequence Experience Replay</a>; Marc Brittain (Iowa State University); Joshua Bertram (Iowa State University); Xuxi Yang (Iowa State University); Peng Wei (Iowa State University) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1905.12726&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNFyBrotqKpnDtzwItikmFe42et9vA">[external pdf link]</a>. </li><li>29:14  <a href="https://drive.google.com/file/d/1i-iaBcEcrR9iScJhDew1TFcRUTXnwlga/view?usp=drivesdk">Recurrent neural-linear posterior sampling for non-stationary bandits</a>; Paulo Rauber (IDSIA); Aditya Ramesh (USI); Jürgen Schmidhuber (IDSIA - Lugano). </li><li>29:36  <a href="https://drive.google.com/file/d/1aLIB63k2OR8HkSY3MY0vXJWqXfzh1jhX/view?usp=drivesdk">Improving Evolutionary Strategies With Past Descent Directions</a>; Asier Mujika (ETH Zurich); Florian Meier (ETH Zurich); Marcelo Matheus Gauy (ETH Zurich); Angelika Steger (ETH Zurich) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1910.05268&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNEmCZ2eiAPlFfUw0VSwUap95Ak_eA">[external pdf link]</a>. 
</li><li>31:40  <a href="https://drive.google.com/file/d/1_IyylMBYnQfpS80dQ38lsSC7yC_oiedH/view?usp=drivesdk">ZPD Teaching Strategies for Deep Reinforcement Learning from Demonstrations</a>; Daniel Seita (University of California, Berkeley); David Chan (University of California, Berkeley); Roshan Rao (UC Berkeley); Chen Tang (UC Berkeley); Mandi Zhao (UC Berkeley); John Canny (UC Berkeley) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1910.12154&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNFDwzljC2GWQ-DhfSeTogUsDrX7xg">[external pdf link]</a>. </li><li>33:05  <a href="https://drive.google.com/file/d/1rMFuE7mNQQk2-ut7qcFhKolaOjRE-dfR/view?usp=drivesdk">Bottom-Up Meta-Policy Search</a>; Luckeciano Melo (Aeronautics Institute of Technology); Marcos Máximo (Aeronautics Institute of Technology); Adilson Cunha (Aeronautics Institute of Technology) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1910.10232&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNHiUlq_Q94qUONidmLRsGnOJAvRCg">[external pdf link]</a>. </li><li>33:37  <a href="https://drive.google.com/file/d/1NNrv7fj0R4XWc_u_D4fP6EX_N2N8R_U0/view?usp=sharing">MERL: Multi-Head Reinforcement Learning</a>; Yannis Flet-Berliac (University of Lille / Inria); Philippe Preux (INRIA) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1909.11939&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNHX_MOTNfoJ5xza-o7xAUbbsomUQw">[external pdf link]</a>. </li><li>35:30  <a href="https://drive.google.com/file/d/1Xuis8jh5R5vKzA_TLY3JvaUcKjJ2dLQ2/view?usp=drivesdk">Emergen...</a></li></ul>]]>
      </description>
      <content:encoded>
<![CDATA[<p>Thank you to all the presenters that participated. I covered as many as I could given the time and crowds; if you were not included and wish to be, please email talkrl@pathwayi.com </p><p>More details on the official <a href="https://sites.google.com/view/deep-rl-workshop-neurips-2019/home">NeurIPS Deep RL Workshop site</a>. </p><ul><li>0:23  <a href="https://drive.google.com/file/d/1aUY63fjl7MxRRGT-PJwjTi1QFWCQcFoG/view?usp=drivesdk">Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms</a>; Matthia Sabatelli (University of Liège); Gilles Louppe (University of Liège); Pierre Geurts (University of Liège); Marco Wiering (University of Groningen) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1909.01779&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNEGuYUgp7Qw-WsM2a02HepP18hM9w">[external pdf link]</a> </li><li>4:16  <a href="https://drive.google.com/file/d/10oG4X8cpvVa3TapxhTRZPGYzWk6EAuKm/view?usp=drivesdk">Single Deep Counterfactual Regret Minimization</a>; Eric Steinberger (University of Cambridge). </li><li>5:38  <a href="https://drive.google.com/file/d/1P7BQqOTPGzPf_RPfxirMFGmnyaIVqkrf/view?usp=drivesdk">On the Convergence of Episodic Reinforcement Learning Algorithms at the Example of RUDDER</a>; Markus Holzleitner (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria); José Arjona-Medina (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria); Marius-Constantin Dinu (LIT AI Lab / University Linz); Sepp Hochreiter (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria). </li><li>9:33  <a href="https://drive.google.com/file/d/1IhARUnbaFkVswI-BtuNP67qQrVZyPns1/view?usp=drivesdk">Objective Mismatch in Model-based Reinforcement Learning</a>; Nathan Lambert (UC Berkeley); Brandon Amos (Facebook); Omry Yadan (Facebook); Roberto Calandra (Facebook). </li><li>10:51  <a href="https://drive.google.com/file/d/17EEcdMvR6HGK64W-2B_3rtDdl_1Lokjd/view?usp=drivesdk">Option Discovery using Deep Skill Chaining</a>; Akhil Bagaria (Brown University); George Konidaris (Brown University). </li><li>13:44  <a href="https://drive.google.com/file/d/18dLTjFt5fCXoYRjRSytunh876Y2oVFm4/view?usp=drivesdk">Blue River Controls: A toolkit for Reinforcement Learning Control Systems on Hardware</a>; Kirill Polzounov (University of Calgary); Ramitha Sundar (Blue River Technology); Lee Reden (Blue River Technology). </li><li>14:52  <a href="https://drive.google.com/file/d/1BZ2MlrBIS26TSEG4tYpc62XK63MG0jlm/view?usp=drivesdk">LeDeepChef: Deep Reinforcement Learning Agent for Families of Text-Based Games</a>; Leonard Adolphs (ETHZ); Thomas Hofmann (ETH Zurich). </li><li>16:30  <a href="https://drive.google.com/file/d/1uP1luedMimy2apnmf9Pa8XgpCSIM0Uln/view?usp=drivesdk">Accelerating Training in Pommerman with Imitation and Reinforcement Learning</a>; Hardik Meisheri (TCS Research); Omkar Shelke (TCS Research); Richa Verma (TCS Research); Harshad Khadilkar (TCS Research). 
</li><li>17:27  <a href="https://drive.google.com/file/d/10oS8c1VdtlykxvzgatOhggoBoLgxBbCt/view?usp=drivesdk">Dream to Control: Learning Behaviors by Latent Imagination</a>; Danijar Hafner (Google); Timothy Lillicrap (DeepMind); Jimmy Ba (University of Toronto); Mohammad Norouzi (Google Brain) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fpdf%2F1912.01603.pdf&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNFfHeKZ4JKokibBgUXB_YPibeLV7g">[external pdf link]</a>. </li><li>20:48  <a href="https://drive.google.com/file/d/1nUqS4qTqLtNo19rjVGW_CHLAGjTYeBlW/view?usp=drivesdk">Adaptive Temperature Tuning for Mellowmax in Deep Reinforcement Learning</a>; Seungchan Kim (Brown University); George Konidaris (Brown). </li><li>22:05  <a href="https://drive.google.com/file/d/13lNSV936kyX7uzpMB2K1uqfO4kHxLcdy/view?usp=drivesdk">Meta-learning curiosity algorithms</a>; Ferran Alet (MIT); Martin Schneider (MIT); Tomas Lozano-Perez (MIT); Leslie Kaelbling (MIT). </li><li>24:09  <a href="https://drive.google.com/file/d/1BRgvMln6xNETPcSlw1zBnK9tbPMyy8EZ/view?usp=drivesdk">Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards</a>; Xingyu Lu (Berkeley); Stas Tiomkin (BAIR, UC Berkeley); Pieter Abbeel (UC Berkeley). </li><li>25:44   <a href="https://drive.google.com/file/d/1V5EgqGy4LCyTnk-KoGeGJnNtu3Q6jpC3/view?usp=drivesdk">Swarm-inspired Reinforcement Learning via Collaborative Inter-agent Knowledge Distillation</a>; Zhang-Wei Hong (Preferred Networks); Prabhat Nagarajan (Preferred Networks); Guilherme Maeda (Preferred Networks). </li><li>26:35  <a href="https://drive.google.com/file/d/1W6yuUZM-v6o4VgUFayD3YoNg_-Lj156i/view?usp=drivesdk">Multiplayer AlphaZero</a>; Nicholas Petosa (Georgia Institute of Technology); Tucker Balch (Ga Tech) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1910.13012&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNH3945STOW-xNQKetB70So_We-QHg">[external pdf link]</a>. </li><li>27:43  <a href="https://drive.google.com/file/d/1T8JwuENMaDjGSxqr6M2xW67Y0a9ACFD7/view?usp=drivesdk">Prioritized Sequence Experience Replay</a>; Marc Brittain (Iowa State University); Joshua Bertram (Iowa State University); Xuxi Yang (Iowa State University); Peng Wei (Iowa State University) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1905.12726&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNFyBrotqKpnDtzwItikmFe42et9vA">[external pdf link]</a>. </li><li>29:14  <a href="https://drive.google.com/file/d/1i-iaBcEcrR9iScJhDew1TFcRUTXnwlga/view?usp=drivesdk">Recurrent neural-linear posterior sampling for non-stationary bandits</a>; Paulo Rauber (IDSIA); Aditya Ramesh (USI); Jürgen Schmidhuber (IDSIA - Lugano). </li><li>29:36  <a href="https://drive.google.com/file/d/1aLIB63k2OR8HkSY3MY0vXJWqXfzh1jhX/view?usp=drivesdk">Improving Evolutionary Strategies With Past Descent Directions</a>; Asier Mujika (ETH Zurich); Florian Meier (ETH Zurich); Marcelo Matheus Gauy (ETH Zurich); Angelika Steger (ETH Zurich) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1910.05268&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNEmCZ2eiAPlFfUw0VSwUap95Ak_eA">[external pdf link]</a>. 
</li><li>31:40  <a href="https://drive.google.com/file/d/1_IyylMBYnQfpS80dQ38lsSC7yC_oiedH/view?usp=drivesdk">ZPD Teaching Strategies for Deep Reinforcement Learning from Demonstrations</a>; Daniel Seita (University of California, Berkeley); David Chan (University of California, Berkeley); Roshan Rao (UC Berkeley); Chen Tang (UC Berkeley); Mandi Zhao (UC Berkeley); John Canny (UC Berkeley) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1910.12154&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNFDwzljC2GWQ-DhfSeTogUsDrX7xg">[external pdf link]</a>. </li><li>33:05  <a href="https://drive.google.com/file/d/1rMFuE7mNQQk2-ut7qcFhKolaOjRE-dfR/view?usp=drivesdk">Bottom-Up Meta-Policy Search</a>; Luckeciano Melo (Aeronautics Institute of Technology); Marcos Máximo (Aeronautics Institute of Technology); Adilson Cunha (Aeronautics Institute of Technology) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1910.10232&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNHiUlq_Q94qUONidmLRsGnOJAvRCg">[external pdf link]</a>. </li><li>33:37  <a href="https://drive.google.com/file/d/1NNrv7fj0R4XWc_u_D4fP6EX_N2N8R_U0/view?usp=sharing">MERL: Multi-Head Reinforcement Learning</a>; Yannis Flet-Berliac (University of Lille / Inria); Philippe Preux (INRIA) <a href="https://www.google.com/url?q=https%3A%2F%2Farxiv.org%2Fabs%2F1909.11939&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNHX_MOTNfoJ5xza-o7xAUbbsomUQw">[external pdf link]</a>. </li><li>35:30  <a href="https://drive.google.com/file/d/1Xuis8jh5R5vKzA_TLY3JvaUcKjJ2dLQ2/view?usp=drivesdk">Emergen...</a></li></ul>]]>
      </content:encoded>
      <pubDate>Thu, 19 Dec 2019 23:00:00 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/ece23fd8/1ed846c6.mp3" length="47339283" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:duration>3378</itunes:duration>
      <itunes:summary>Hear directly from presenters at the NeurIPS 2019 Deep RL Workshop on their work!</itunes:summary>
      <itunes:subtitle>Hear directly from presenters at the NeurIPS 2019 Deep RL Workshop on their work!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Scott Fujimoto</title>
      <itunes:episode>7</itunes:episode>
      <podcast:episode>7</podcast:episode>
      <itunes:title>Scott Fujimoto</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">2d3f6f81-63b0-4f5f-976e-1be3782130c5</guid>
      <link>https://share.transistor.fm/s/a5c4b784</link>
      <description>
        <![CDATA[<p>Scott Fujimoto is a PhD student at McGill University and Mila. He is the author of TD3 as well as some of the recent developments in batch deep reinforcement learning.  </p><p><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/1802.09477">Addressing Function Approximation Error in Actor-Critic Methods</a> <br>Scott Fujimoto, Herke van Hoof, David Meger </p><p><a href="https://arxiv.org/abs/1812.02900">Off-Policy Deep Reinforcement Learning without Exploration</a> </p><p>Scott Fujimoto, David Meger, Doina Precup </p><p><a href="https://arxiv.org/abs/1910.01708">Benchmarking Batch Deep Reinforcement Learning Algorithms</a> </p><p>Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://arxiv.org/abs/1907.04543">Striving for Simplicity in Off-Policy Deep Reinforcement Learning</a> <br>Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi </li><li><a href="https://arxiv.org/abs/1801.01290">Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor</a> <br>Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine </li><li><a href="https://arxiv.org/abs/1907.00456">Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog <br></a>Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard </li><li><a href="https://arxiv.org/abs/1509.02971">Continuous control with deep reinforcement learning</a> <br>Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra </li><li><a href="https://arxiv.org/abs/1804.08617">Distributed Distributional Deterministic Policy Gradients</a> <br>Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Scott Fujimoto is a PhD student at McGill University and Mila. He is the author of TD3 as well as some of the recent developments in batch deep reinforcement learning.  </p><p><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/1802.09477">Addressing Function Approximation Error in Actor-Critic Methods</a> <br>Scott Fujimoto, Herke van Hoof, David Meger </p><p><a href="https://arxiv.org/abs/1812.02900">Off-Policy Deep Reinforcement Learning without Exploration</a> </p><p>Scott Fujimoto, David Meger, Doina Precup </p><p><a href="https://arxiv.org/abs/1910.01708">Benchmarking Batch Deep Reinforcement Learning Algorithms</a> </p><p>Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://arxiv.org/abs/1907.04543">Striving for Simplicity in Off-Policy Deep Reinforcement Learning</a> <br>Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi </li><li><a href="https://arxiv.org/abs/1801.01290">Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor</a> <br>Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine </li><li><a href="https://arxiv.org/abs/1907.00456">Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog <br></a>Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard </li><li><a href="https://arxiv.org/abs/1509.02971">Continuous control with deep reinforcement learning</a> <br>Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra </li><li><a href="https://arxiv.org/abs/1804.08617">Distributed Distributional Deterministic Policy Gradients</a> <br>Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 18 Nov 2019 22:00:00 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/a5c4b784/265f7db0.mp3" length="40645968" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/YWVK8RWGDDxcdFQ01cZbkIidwGu-0uIOTx8x0PmDmeU/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzEyMjk3My8x/NjMzMTUyMDk0LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>2897</itunes:duration>
      <itunes:summary>Scott Fujimoto expounds on his TD3 and BCQ algorithms, DDPG, Benchmarking Batch RL, and more!</itunes:summary>
      <itunes:subtitle>Scott Fujimoto expounds on his TD3 and BCQ algorithms, DDPG, Benchmarking Batch RL, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Jessica Hamrick</title>
      <itunes:episode>6</itunes:episode>
      <podcast:episode>6</podcast:episode>
      <itunes:title>Jessica Hamrick</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">e207d82b-f127-4b32-b609-c33e242934ab</guid>
      <link>https://share.transistor.fm/s/7b1b3d83</link>
      <description>
        <![CDATA[<p><a href="http://www.jesshamrick.com/">Dr. Jessica Hamrick</a> is a Research Scientist at DeepMind. She holds a PhD in Psychology from UC Berkeley. </p><p><br><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/1904.03177">Structured agents for physical construction</a> <br>Victor Bapst, Alvaro Sanchez-Gonzalez, Carl Doersch, Kimberly L. Stachenfeld, Pushmeet Kohli, Peter W. Battaglia, Jessica B. Hamrick </p><p><a href="https://www.sciencedirect.com/science/article/pii/S2352154618301670">Analogues of mental simulation and imagination in deep learning</a> </p><p>Jessica Hamrick </p><p><strong>Additional References </strong></p><ul><li><a href="https://arxiv.org/abs/1705.02670">Metacontrol for Adaptive Imagination-Based Optimization</a> <br>Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia </li><li><a href="https://arxiv.org/abs/1806.05780">Surprising Negative Results for Generative Adversarial Tree Search</a>  <br>Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Zachary C Lipton, Animashree Anandkumar </li><li><a href="https://escholarship.org/content/qt9tv951bd/qt9tv951bd.pdf">Metareasoning and Mental Simulation</a> <br>Jessica B. Hamrick </li><li><a href="https://arxiv.org/abs/1712.01815">Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm</a> <br>David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis </li><li><a href="https://arxiv.org/abs/1910.14361">Object-oriented state editing for HRL</a> <br>Victor Bapst, Alvaro Sanchez-Gonzalez, Omar Shams, Kimberly Stachenfeld, Peter W. Battaglia, Satinder Singh, Jessica B. Hamrick </li><li><a href="https://arxiv.org/abs/1703.01161">FeUdal Networks for Hierarchical Reinforcement Learning</a> <br>Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu </li><li><a href="http://mlg.eng.cam.ac.uk/pub/pdf/DeiRas11.pdf">PILCO: A Model-Based and Data-Efficient Approach to Policy Search</a> <br>Marc Peter Deisenroth, Carl Edward Rasmussen </li><li><a href="https://arxiv.org/abs/1807.10553">Blueberry Earth</a> <br>Anders Sandberg </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="http://www.jesshamrick.com/">Dr. Jessica Hamrick</a> is a Research Scientist at DeepMind. She holds a PhD in Psychology from UC Berkeley. </p><p><br><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/1904.03177">Structured agents for physical construction</a> <br>Victor Bapst, Alvaro Sanchez-Gonzalez, Carl Doersch, Kimberly L. Stachenfeld, Pushmeet Kohli, Peter W. Battaglia, Jessica B. Hamrick </p><p><a href="https://www.sciencedirect.com/science/article/pii/S2352154618301670">Analogues of mental simulation and imagination in deep learning</a> </p><p>Jessica Hamrick </p><p><strong>Additional References </strong></p><ul><li><a href="https://arxiv.org/abs/1705.02670">Metacontrol for Adaptive Imagination-Based Optimization</a> <br>Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia </li><li><a href="https://arxiv.org/abs/1806.05780">Surprising Negative Results for Generative Adversarial Tree Search</a>  <br>Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Zachary C Lipton, Animashree Anandkumar </li><li><a href="https://escholarship.org/content/qt9tv951bd/qt9tv951bd.pdf">Metareasoning and Mental Simulation</a> <br>Jessica B. Hamrick </li><li><a href="https://arxiv.org/abs/1712.01815">Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm</a> <br>David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis </li><li><a href="https://arxiv.org/abs/1910.14361">Object-oriented state editing for HRL</a> <br>Victor Bapst, Alvaro Sanchez-Gonzalez, Omar Shams, Kimberly Stachenfeld, Peter W. Battaglia, Satinder Singh, Jessica B. Hamrick </li><li><a href="https://arxiv.org/abs/1703.01161">FeUdal Networks for Hierarchical Reinforcement Learning</a> <br>Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu </li><li><a href="http://mlg.eng.cam.ac.uk/pub/pdf/DeiRas11.pdf">PILCO: A Model-Based and Data-Efficient Approach to Policy Search</a> <br>Marc Peter Deisenroth, Carl Edward Rasmussen </li><li><a href="https://arxiv.org/abs/1807.10553">Blueberry Earth</a> <br>Anders Sandberg </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Mon, 11 Nov 2019 21:00:00 -0800</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/7b1b3d83/6bf0a236.mp3" length="53603309" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/r6tcaNPjj8m3HGRc07iI0_9ic2LAkCcKozj8E8Qv6so/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzExNDU0Mi8x/NjMyODYxMTUzLWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>3823</itunes:duration>
      <itunes:summary>Jessica Hamrick sheds light on Model-based RL, Structured agents, Mental simulation, Metacontrol, Construction environments, Blueberries, and more!</itunes:summary>
      <itunes:subtitle>Jessica Hamrick sheds light on Model-based RL, Structured agents, Mental simulation, Metacontrol, Construction environments, Blueberries, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Pablo Samuel Castro</title>
      <itunes:episode>5</itunes:episode>
      <podcast:episode>5</podcast:episode>
      <itunes:title>Pablo Samuel Castro</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">6c6c8d9b-17c0-4f9e-846b-1e40b5369c50</guid>
      <link>https://share.transistor.fm/s/4fe97ed3</link>
      <description>
        <![CDATA[<p><a href="https://scholar.google.com/citations?user=jn5r6TsAAAAJ&amp;hl=en">Dr Pablo Samuel Castro</a> is a Staff Research Software Engineer at Google Brain.  He is the main author of the <a href="https://github.com/google/dopamine">Dopamine RL framework</a>. </p><p><br><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/1901.11084">A Comparative Analysis of Expected and Distributional Reinforcement Learning</a> </p><p>Clare Lyle, Pablo Samuel Castro, Marc G. Bellemare  </p><p><br><a href="https://arxiv.org/abs/1901.11530">A Geometric Perspective on Optimal Representations for Reinforcement Learning</a> </p><p>Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle </p><p><br><a href="https://arxiv.org/abs/1812.06110">Dopamine: A Research Framework for Deep Reinforcement Learning</a> <br>Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, Marc G. Bellemare </p><p><a href="https://github.com/google/dopamine">Dopamine RL framework</a> on github <br> </p><p><a href="https://github.com/tensorflow/agents">Tensorflow Agents</a> on github </p><p><strong>Additional References </strong></p><ul><li><a href="https://www.aaai.org/Papers/IJCAI/2007/IJCAI07-392.pdf">Using Linear Programming for Bayesian Exploration in Markov Decision Processes</a> <br>Pablo Samuel Castro, Doina Precup </li><li><a href="https://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/viewFile/1907/2148">Using bisimulation for policy transfer in MDPs</a> <br>Pablo Samuel Castro, Doina Precup </li><li><a href="https://arxiv.org/abs/1710.02298">Rainbow: Combining Improvements in Deep Reinforcement Learning <br></a>Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver </li><li><a href="https://arxiv.org/abs/1806.06923">Implicit Quantile Networks for Distributional Reinforcement Learning <br></a>Will Dabney, Georg Ostrovski, David Silver, Rémi Munos </li><li><a href="https://arxiv.org/abs/1707.06887">A Distributional Perspective on Reinforcement Learning</a> <br>Marc G. Bellemare, Will Dabney, Rémi Munos </li></ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://scholar.google.com/citations?user=jn5r6TsAAAAJ&amp;hl=en">Dr Pablo Samuel Castro</a> is a Staff Research Software Engineer at Google Brain.  He is the main author of the <a href="https://github.com/google/dopamine">Dopamine RL framework</a>. </p><p><br><strong>Featured References </strong></p><p><a href="https://arxiv.org/abs/1901.11084">A Comparative Analysis of Expected and Distributional Reinforcement Learning</a> </p><p>Clare Lyle, Pablo Samuel Castro, Marc G. Bellemare  </p><p><br><a href="https://arxiv.org/abs/1901.11530">A Geometric Perspective on Optimal Representations for Reinforcement Learning</a> </p><p>Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle </p><p><br><a href="https://arxiv.org/abs/1812.06110">Dopamine: A Research Framework for Deep Reinforcement Learning</a> <br>Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, Marc G. Bellemare </p><p><a href="https://github.com/google/dopamine">Dopamine RL framework</a> on github <br> </p><p><a href="https://github.com/tensorflow/agents">Tensorflow Agents</a> on github </p><p><strong>Additional References </strong></p><ul><li><a href="https://www.aaai.org/Papers/IJCAI/2007/IJCAI07-392.pdf">Using Linear Programming for Bayesian Exploration in Markov Decision Processes</a> <br>Pablo Samuel Castro, Doina Precup </li><li><a href="https://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/viewFile/1907/2148">Using bisimulation for policy transfer in MDPs</a> <br>Pablo Samuel Castro, Doina Precup </li><li><a href="https://arxiv.org/abs/1710.02298">Rainbow: Combining Improvements in Deep Reinforcement Learning <br></a>Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver </li><li><a href="https://arxiv.org/abs/1806.06923">Implicit Quantile Networks for Distributional Reinforcement Learning <br></a>Will Dabney, Georg Ostrovski, David Silver, Rémi Munos </li><li><a href="https://arxiv.org/abs/1707.06887">A Distributional Perspective on Reinforcement Learning</a> <br>Marc G. Bellemare, Will Dabney, Rémi Munos </li></ul>]]>
      </content:encoded>
      <pubDate>Wed, 09 Oct 2019 20:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/4fe97ed3/ee4518c9.mp3" length="47677289" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/ZkY9Fe73wmalsJx2I0LgpIbvvH0PzGfz5PknrqycUAU/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzExNDU0MS8x/NjMyODYxNDExLWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>3399</itunes:duration>
      <itunes:summary>Pablo Samuel Castro drops in and drops knowledge on distributional RL, bisimulation, the Dopamine RL Framework, TF-Agents, and much more!</itunes:summary>
      <itunes:subtitle>Pablo Samuel Castro drops in and drops knowledge on distributional RL, bisimulation, the Dopamine RL Framework, TF-Agents, and much more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Kamyar Azizzadenesheli</title>
      <itunes:episode>4</itunes:episode>
      <podcast:episode>4</podcast:episode>
      <itunes:title>Kamyar Azizzadenesheli</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">b817bbf1-41f1-4c6c-bb9a-15a85b2500b7</guid>
      <link>https://share.transistor.fm/s/107698de</link>
      <description>
        <![CDATA[<p><a href="https://sites.google.com/view/kazizzad">Dr. Kamyar Azizzadenesheli</a> is a post-doctorate scholar at Caltech.  His research interest is mainly in the area of Machine Learning, from theory to practice, with the main focus in Reinforcement Learning.  He will be joining Purdue University as an Assistant CS Professor in Fall 2020. </p><p><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/1802.04412">Efficient Exploration through Bayesian Deep Q-Networks <br></a>Kamyar Azizzadenesheli, Animashree Anandkumar </p><p><a href="https://arxiv.org/abs/1806.05780">Surprising Negative Results for Generative Adversarial Tree Search <br></a>Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Zachary C Lipton, Animashree Anandkumar </p><p><a href="https://openreview.net/forum?id=B1ghkEdzo4">Maybe a few considerations in Reinforcement Learning Research? <br></a>Kamyar Azizzadenesheli <br> </p><p><strong>Additional References </strong></p><ul><li><a href="https://arxiv.org/abs/1903.00374">Model-Based Reinforcement Learning for Atari</a>  <br>Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski </li><li><a href="http://www.jmlr.org/papers/volume11/jaksch10a/jaksch10a.pdf">Near-optimal Regret Bounds for Reinforcement Learning</a> <br>Thomas Jaksch, Ronald Ortner, Peter Auer </li><li><a href="https://ieeexplore.ieee.org/abstract/document/170605">Curious Model-Building Control Systems</a> <br>Jürgen Schmidhuber </li><li><a href="https://arxiv.org/abs/1710.02298">Rainbow: Combining Improvements in Deep Reinforcement Learning</a>  <br>Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver </li><li><a href="https://arxiv.org/abs/1706.04317">Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics</a> <br>Ken Kansky, Tom Silver, David A. Mély, Mohamed Eldawy, Miguel Lázaro-Gredilla, Xinghua Lou, Nimrod Dorfman, Szymon Sidor, Scott Phoenix, Dileep George </li><li><a href="https://arxiv.org/abs/1712.01815">Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm</a> <br>David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis </li></ul><p><br></p><p><br></p><p><br></p><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://sites.google.com/view/kazizzad">Dr. Kamyar Azizzadenesheli</a> is a post-doctorate scholar at Caltech.  His research interest is mainly in the area of Machine Learning, from theory to practice, with the main focus in Reinforcement Learning.  He will be joining Purdue University as an Assistant CS Professor in Fall 2020. </p><p><strong>Featured References <br></strong><br><a href="https://arxiv.org/abs/1802.04412">Efficient Exploration through Bayesian Deep Q-Networks <br></a>Kamyar Azizzadenesheli, Animashree Anandkumar </p><p><a href="https://arxiv.org/abs/1806.05780">Surprising Negative Results for Generative Adversarial Tree Search <br></a>Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Zachary C Lipton, Animashree Anandkumar </p><p><a href="https://openreview.net/forum?id=B1ghkEdzo4">Maybe a few considerations in Reinforcement Learning Research? <br></a>Kamyar Azizzadenesheli <br> </p><p><strong>Additional References </strong></p><ul><li><a href="https://arxiv.org/abs/1903.00374">Model-Based Reinforcement Learning for Atari</a>  <br>Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski </li><li><a href="http://www.jmlr.org/papers/volume11/jaksch10a/jaksch10a.pdf">Near-optimal Regret Bounds for Reinforcement Learning</a> <br>Thomas Jaksch, Ronald Ortner, Peter Auer </li><li><a href="https://ieeexplore.ieee.org/abstract/document/170605">Curious Model-Building Control Systems</a> <br>Jürgen Schmidhuber </li><li><a href="https://arxiv.org/abs/1710.02298">Rainbow: Combining Improvements in Deep Reinforcement Learning</a>  <br>Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver </li><li><a href="https://arxiv.org/abs/1706.04317">Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics</a> <br>Ken Kansky, Tom Silver, David A. Mély, Mohamed Eldawy, Miguel Lázaro-Gredilla, Xinghua Lou, Nimrod Dorfman, Szymon Sidor, Scott Phoenix, Dileep George </li><li><a href="https://arxiv.org/abs/1712.01815">Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm</a> <br>David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis </li></ul><p><br></p><p><br></p><p><br></p><p><br></p>]]>
      </content:encoded>
      <pubDate>Fri, 20 Sep 2019 18:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/107698de/de92b927.mp3" length="72129654" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:duration>5146</itunes:duration>
      <itunes:summary>Kamyar Azizzadenesheli brings us insight on Bayesian RL, Generative Adversarial Tree search, what goes into great RL papers, and much more!</itunes:summary>
      <itunes:subtitle>Kamyar Azizzadenesheli brings us insight on Bayesian RL, Generative Adversarial Tree search, what goes into great RL papers, and much more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Antonin Raffin and Ashley Hill</title>
      <itunes:episode>3</itunes:episode>
      <podcast:episode>3</podcast:episode>
      <itunes:title>Antonin Raffin and Ashley Hill</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">5389c54d-5292-4aa5-9832-a30ca9317a23</guid>
      <link>https://share.transistor.fm/s/b4a51a16</link>
      <description>
        <![CDATA[<p><a href="https://araffin.github.io/">Antonin Raffin</a> is a researcher at the <a href="https://www.dlr.de/EN/Home/home_node.html">German Aerospace Center (DLR)</a> in Munich, working in the Institute of Robotics and Mechatronics. His research is on using machine learning for controlling real robots (because simulation is not enough), with a particular interest for reinforcement learning. </p><p><br><a href="https://github.com/hill-a">Ashley Hill</a> is doing his thesis on improving control algorithms using machine learning for real time gain tuning. </p><p>He works mainly with neuroevolution, genetic algorithms, and of course reinforcement learning, applied to mobile robots.  He holds a masters degree in Machine learning, and a bachelors in Computer science from the Université Paris-Saclay. </p><p><strong>Featured References </strong></p><p><a href="https://github.com/hill-a/stable-baselines">stable-baselines</a> on github <br>Ashley Hill, Antonin Raffin primary authors. </p><p><a href="https://github.com/araffin/robotics-rl-srl">S-RL Toolbox</a> <br>Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat </p><p><a href="https://arxiv.org/abs/1901.08651">Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics</a> <br>Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://towardsdatascience.com/learning-to-drive-smoothly-in-minutes-450a7cdb35f4">Learning to Drive Smoothly in Minutes</a>, Antonin Raffin </li><li>Multimodal SRL (best paper at ICRA): <a href="https://arxiv.org/abs/1810.10191">Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal  Representations for Contact-Rich Tasks</a>,  Michelle A. Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg </li><li><a href="https://arxiv.org/abs/1907.02057">Benchmarking Model-Based Reinforcement Learning</a>, Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba </li><li><a href="https://tossingbot.cs.princeton.edu/">TossingBot: Learning to Throw Arbitrary Objects with Residual Physics</a> <br>Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser </li><li><a href="https://github.com/hill-a/stable-baselines/projects/1">Stable Baselines roadmap</a> </li><li><a href="https://github.com/openai/baselines/pull/481">OpenAI baselines stable-baselines github pull request</a> </li></ul><p><br></p><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://araffin.github.io/">Antonin Raffin</a> is a researcher at the <a href="https://www.dlr.de/EN/Home/home_node.html">German Aerospace Center (DLR)</a> in Munich, working in the Institute of Robotics and Mechatronics. His research is on using machine learning for controlling real robots (because simulation is not enough), with a particular interest for reinforcement learning. </p><p><br><a href="https://github.com/hill-a">Ashley Hill</a> is doing his thesis on improving control algorithms using machine learning for real time gain tuning. </p><p>He works mainly with neuroevolution, genetic algorithms, and of course reinforcement learning, applied to mobile robots.  He holds a masters degree in Machine learning, and a bachelors in Computer science from the Université Paris-Saclay. </p><p><strong>Featured References </strong></p><p><a href="https://github.com/hill-a/stable-baselines">stable-baselines</a> on github <br>Ashley Hill, Antonin Raffin primary authors. </p><p><a href="https://github.com/araffin/robotics-rl-srl">S-RL Toolbox</a> <br>Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat </p><p><a href="https://arxiv.org/abs/1901.08651">Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics</a> <br>Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat </p><p><br><strong>Additional References </strong></p><ul><li><a href="https://towardsdatascience.com/learning-to-drive-smoothly-in-minutes-450a7cdb35f4">Learning to Drive Smoothly in Minutes</a>, Antonin Raffin </li><li>Multimodal SRL (best paper at ICRA): <a href="https://arxiv.org/abs/1810.10191">Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal  Representations for Contact-Rich Tasks</a>,  Michelle A. Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg </li><li><a href="https://arxiv.org/abs/1907.02057">Benchmarking Model-Based Reinforcement Learning</a>, Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba </li><li><a href="https://tossingbot.cs.princeton.edu/">TossingBot: Learning to Throw Arbitrary Objects with Residual Physics</a> <br>Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser </li><li><a href="https://github.com/hill-a/stable-baselines/projects/1">Stable Baselines roadmap</a> </li><li><a href="https://github.com/openai/baselines/pull/481">OpenAI baselines stable-baselines github pull request</a> </li></ul><p><br></p><p><br></p>]]>
      </content:encoded>
      <pubDate>Wed, 04 Sep 2019 17:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/b4a51a16/6cd7e122.mp3" length="29230701" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/Xp_EPVn98GmleT8n-3g_nkYnWb4kf6aoWfSwxivl4uw/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzEwMTY0MS8x/NjMzMDI3OTA3LWFy/dHdvcmsuanBn.jpg"/>
      <itunes:duration>2082</itunes:duration>
      <itunes:summary>Antonin Raffin and Ashley Hill discuss Stable Baselines past, present and future, State Representation Learning, S-RL Toolbox, RL on real robots, big compute for RL and much more!</itunes:summary>
      <itunes:subtitle>Antonin Raffin and Ashley Hill discuss Stable Baselines past, present and future, State Representation Learning, S-RL Toolbox, RL on real robots, big compute for RL and much more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Michael Littman</title>
      <itunes:episode>2</itunes:episode>
      <podcast:episode>2</podcast:episode>
      <itunes:title>Michael Littman</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">7d9fd082-bed7-44a2-9fbb-50d4d3e61e86</guid>
      <link>https://share.transistor.fm/s/3d194836</link>
      <description>
        <![CDATA[<p><a href="https://en.wikipedia.org/wiki/Michael_L._Littman">Michael L Littman</a> is a <a href="http://cs.brown.edu/~mlittman/">professor of Computer Science at Brown University</a>.  He was <a href="http://cs.brown.edu/news/2019/06/17/michael-littman-has-been-named-acm-fellow/">elected ACM Fellow</a> in 2018 "For contributions to the design and analysis of sequential decision making algorithms in artificial intelligence". </p><p><b>Featured References </b></p><p><a href="https://www.semanticscholar.org/paper/Convergent-Actor-Critic-by-Humans-MacGlashan-Littman/7b33392ce631c5f2543a7a602585a4fb6e874935">Convergent Actor Critic by Humans <br></a>James MacGlashan, Michael L. Littman, David L. Roberts, Robert Tyler Loftin, Bei Peng, Matthew E. Taylor </p><p><a href="https://psyarxiv.com/3cd7r/">People teach with rewards and punishments as communication, not reinforcements</a> <br>Mark Ho, Fiery Cushman, Michael L. Littman, Joseph Austerweil </p><p><a href="https://arxiv.org/abs/1901.06085">Theory of Minds: Understanding Behavior in Groups Through Inverse Planning</a> <br>Michael Shum, Max Kleiman-Weiner, Michael L. Littman, Joshua B. Tenenbaum </p><p><a href="https://arxiv.org/abs/1809.10025">Personalized education at scale</a> <br>Saarinen, Cater, Littman <br></p><p><strong>Additional References </strong></p><ul><li>Michael Littman papers on <a href="https://scholar.google.com/citations?user=iRMZ2hoAAAAJ&amp;hl=en&amp;oi=sra">Google Scholar</a>, <a href="https://www.semanticscholar.org/author/Michael-L.-Littman/144885169">Semantic Scholar</a> </li><li><a href="https://www.udacity.com/course/reinforcement-learning--ud600">Reinforcement Learning</a> on Udacity, Charles Isbell, Michael Littman, Chris Pryby  </li><li><a href="https://www.udacity.com/course/machine-learning--ud262">Machine Learning</a> on Udacity, Michael Littman, Charles Isbell, Pushkar Kolhe  </li><li><a href="https://cling.csd.uwo.ca/cs346a/extra/tdgammon.pdf">Temporal Difference Learning and TD-Gammon</a>, Gerald Tesauro </li><li><a href="https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf">Playing Atari with Deep Reinforcement Learning</a>, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller </li><li><a href="https://www.aaai.org/ojs/index.php/aimagazine/article/view/2729">Ask Me Anything about MOOCs</a>, D Fisher, C Isbell, ML Littman, M Wollowski, et al </li><li><a href="http://rldm.org/">Reinforcement Learning and Decision Making (RLDM)</a> Conference </li><li><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.17.4565">Algorithms for Sequential Decision Making</a>, Michael Littman's Thesis </li><li><a href="https://www.youtube.com/watch?v=DQWI1kvmwRg">Machine Learning A Cappella - Overfitting Thriller!</a>, Michael Littman and Charles Isbell feat Infinite Harmony </li><li><a href="https://www.youtube.com/watch?v=eIH5ip9JLTo">Turbotax Ad 2016: Genius Anna/Michael Littman</a> </li></ul><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><a href="https://en.wikipedia.org/wiki/Michael_L._Littman">Michael L Littman</a> is a <a href="http://cs.brown.edu/~mlittman/">professor of Computer Science at Brown University</a>.  He was <a href="http://cs.brown.edu/news/2019/06/17/michael-littman-has-been-named-acm-fellow/">elected ACM Fellow</a> in 2018 "For contributions to the design and analysis of sequential decision making algorithms in artificial intelligence". </p><p><b>Featured References </b></p><p><a href="https://www.semanticscholar.org/paper/Convergent-Actor-Critic-by-Humans-MacGlashan-Littman/7b33392ce631c5f2543a7a602585a4fb6e874935">Convergent Actor Critic by Humans <br></a>James MacGlashan, Michael L. Littman, David L. Roberts, Robert Tyler Loftin, Bei Peng, Matthew E. Taylor </p><p><a href="https://psyarxiv.com/3cd7r/">People teach with rewards and punishments as communication, not reinforcements</a> <br>Mark Ho, Fiery Cushman, Michael L. Littman, Joseph Austerweil </p><p><a href="https://arxiv.org/abs/1901.06085">Theory of Minds: Understanding Behavior in Groups Through Inverse Planning</a> <br>Michael Shum, Max Kleiman-Weiner, Michael L. Littman, Joshua B. Tenenbaum </p><p><a href="https://arxiv.org/abs/1809.10025">Personalized education at scale</a> <br>Saarinen, Cater, Littman <br></p><p><strong>Additional References </strong></p><ul><li>Michael Littman papers on <a href="https://scholar.google.com/citations?user=iRMZ2hoAAAAJ&amp;hl=en&amp;oi=sra">Google Scholar</a>, <a href="https://www.semanticscholar.org/author/Michael-L.-Littman/144885169">Semantic Scholar</a> </li><li><a href="https://www.udacity.com/course/reinforcement-learning--ud600">Reinforcement Learning</a> on Udacity, Charles Isbell, Michael Littman, Chris Pryby  </li><li><a href="https://www.udacity.com/course/machine-learning--ud262">Machine Learning</a> on Udacity, Michael Littman, Charles Isbell, Pushkar Kolhe  </li><li><a href="https://cling.csd.uwo.ca/cs346a/extra/tdgammon.pdf">Temporal Difference Learning and TD-Gammon</a>, Gerald Tesauro </li><li><a href="https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf">Playing Atari with Deep Reinforcement Learning</a>, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller </li><li><a href="https://www.aaai.org/ojs/index.php/aimagazine/article/view/2729">Ask Me Anything about MOOCs</a>, D Fisher, C Isbell, ML Littman, M Wollowski, et al </li><li><a href="http://rldm.org/">Reinforcement Learning and Decision Making (RLDM)</a> Conference </li><li><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.17.4565">Algorithms for Sequential Decision Making</a>, Michael Littman's Thesis </li><li><a href="https://www.youtube.com/watch?v=DQWI1kvmwRg">Machine Learning A Cappella - Overfitting Thriller!</a>, Michael Littman and Charles Isbell feat Infinite Harmony </li><li><a href="https://www.youtube.com/watch?v=eIH5ip9JLTo">Turbotax Ad 2016: Genius Anna/Michael Littman</a> </li></ul><p><br></p>]]>
      </content:encoded>
      <pubDate>Fri, 23 Aug 2019 16:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/3d194836/e083eec8.mp3" length="60237650" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/VEapK1jwQCccKXwPmFGqR3m_MPzko2aQTb6EZFIu4P8/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzg3MTMwLzE2/MzI3OTg2ODYtYXJ0/d29yay5qcGc.jpg"/>
      <itunes:duration>4296</itunes:duration>
      <itunes:summary>ACM Fellow Professor Michael L Littman enlightens us on Human feedback in RL, his Udacity courses, Theory of Mind, organizing the RLDM Conference, RL past and present, Hollywood cameos, and much more!</itunes:summary>
      <itunes:subtitle>ACM Fellow Professor Michael L Littman enlightens us on Human feedback in RL, his Udacity courses, Theory of Mind, organizing the RLDM Conference, RL past and present, Hollywood cameos, and much more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>Natasha Jaques</title>
      <itunes:episode>1</itunes:episode>
      <podcast:episode>1</podcast:episode>
      <itunes:title>Natasha Jaques</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">f1505376-5d88-4fd7-9ca0-906d8f59b5a8</guid>
      <link>https://share.transistor.fm/s/87d55ca4</link>
      <description>
        <![CDATA[<p>Natasha Jaques is a PhD candidate at MIT working on affective and social intelligence.  She has interned with DeepMind and Google Brain, and was an OpenAI Scholars mentor.  Her paper “<a href="https://arxiv.org/abs/1810.08647">Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning</a>” received an honourable mention for best paper at ICML 2019. </p><p><b>Featured References </b></p><p><a href="https://arxiv.org/abs/1810.08647"><strong>Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning</strong></a><strong> <br></strong>Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas </p><p><a href="https://arxiv.org/abs/1906.05433"><strong>Tackling climate change with Machine Learning</strong></a><strong> <br></strong>David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio </p><p><br></p><p><strong>Additional References </strong></p><ul><li><a href="https://offset.media.mit.edu/">MIT Media Lab Flight Offsets</a>,  Caroline Jaffe, Juliana Cherston, Natasha Jaques </li><li><a href="https://arxiv.org/abs/1802.09640">Modeling Others using Oneself in Multi-Agent Reinforcement Learning</a>, <br>Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus </li><li><a href="https://arxiv.org/abs/1803.08884">Inequity aversion improves cooperation in intertemporal social dilemmas</a>,  <br>Edward Hughes, Joel Z. Leibo, Matthew G. Phillips, Karl Tuyls, Edgar A. Duéñez-Guzmán, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin R. McKee, Raphael Koster, Heather Roff, Thore Graepel </li><li><a href="https://github.com/eugenevinitsky/sequential_social_dilemma_games">Sequential Social Dilemma Games</a> on github<strong>, </strong>Eugene Vinitsky, Natasha Jaques  </li><li><a href="https://rohinshah.com/alignment-newsletter/">AI Alignment newsletter</a>, Rohin Shah </li><li><a href="https://arxiv.org/abs/1901.01753">Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions</a>, Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley </li><li><a href="http://cogprints.org/2694/1/SocialFunctionTxt.pdf">The social function of intellect</a>, Nicholas Humphrey </li><li><a href="https://arxiv.org/abs/1903.00742">Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research</a>, Joel Z. Leibo, Edward Hughes, Marc Lanctot, Thore Graepel </li><li><a href="http://karpathy.github.io/2019/04/25/recipe/">A Recipe for Training Neural Networks</a>, Andrej Karpathy </li><li><a href="https://www.cs.ubc.ca/~jaquesn/POMDPPaper.pdf">Emotionally Adaptive Intelligent Tutoring Systems using POMDPs</a>, Natasha Jaques </li><li><a href="https://www.ynharari.com/book/sapiens/">Sapiens</a>, Yuval Noah Harari <p></p></li></ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Natasha Jaques is a PhD candidate at MIT working on affective and social intelligence.  She has interned with DeepMind and Google Brain, and was an OpenAI Scholars mentor.  Her paper “<a href="https://arxiv.org/abs/1810.08647">Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning</a>” received an honourable mention for best paper at ICML 2019. </p><p><b>Featured References </b></p><p><a href="https://arxiv.org/abs/1810.08647"><strong>Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning</strong></a><strong> <br></strong>Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas </p><p><a href="https://arxiv.org/abs/1906.05433"><strong>Tackling climate change with Machine Learning</strong></a><strong> <br></strong>David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio </p><p><br></p><p><strong>Additional References </strong></p><ul><li><a href="https://offset.media.mit.edu/">MIT Media Lab Flight Offsets</a>,  Caroline Jaffe, Juliana Cherston, Natasha Jaques </li><li><a href="https://arxiv.org/abs/1802.09640">Modeling Others using Oneself in Multi-Agent Reinforcement Learning</a>, <br>Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus </li><li><a href="https://arxiv.org/abs/1803.08884">Inequity aversion improves cooperation in intertemporal social dilemmas</a>,  <br>Edward Hughes, Joel Z. Leibo, Matthew G. Phillips, Karl Tuyls, Edgar A. Duéñez-Guzmán, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin R. McKee, Raphael Koster, Heather Roff, Thore Graepel </li><li><a href="https://github.com/eugenevinitsky/sequential_social_dilemma_games">Sequential Social Dilemma Games</a> on github<strong>, </strong>Eugene Vinitsky, Natasha Jaques  </li><li><a href="https://rohinshah.com/alignment-newsletter/">AI Alignment newsletter</a>, Rohin Shah </li><li><a href="https://arxiv.org/abs/1901.01753">Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions</a>, Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley </li><li><a href="http://cogprints.org/2694/1/SocialFunctionTxt.pdf">The social function of intellect</a>, Nicholas Humphrey </li><li><a href="https://arxiv.org/abs/1903.00742">Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research</a>, Joel Z. Leibo, Edward Hughes, Marc Lanctot, Thore Graepel </li><li><a href="http://karpathy.github.io/2019/04/25/recipe/">A Recipe for Training Neural Networks</a>, Andrej Karpathy </li><li><a href="https://www.cs.ubc.ca/~jaquesn/POMDPPaper.pdf">Emotionally Adaptive Intelligent Tutoring Systems using POMDPs</a>, Natasha Jaques </li><li><a href="https://www.ynharari.com/book/sapiens/">Sapiens</a>, Yuval Noah Harari <p></p></li></ul>]]>
      </content:encoded>
      <pubDate>Fri, 09 Aug 2019 15:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/87d55ca4/7026c06d.mp3" length="30336775" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/V79omKhqm9z2B79b5WS_0yIy8our34tfSNTMqeqPekM/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lcGlz/b2RlLzg0NTM3LzE2/MzI4NjE3MTItYXJ0/d29yay5qcGc.jpg"/>
      <itunes:duration>3024</itunes:duration>
      <itunes:summary>Natasha Jaques talks about her PhD, her papers on Social Influence in Multi-Agent RL, ML &amp; Climate Change, Sequential Social Dilemmas, internships at DeepMind and Google Brain, Autocurricula, and more!</itunes:summary>
      <itunes:subtitle>Natasha Jaques talks about her PhD, her papers on Social Influence in Multi-Agent RL, ML &amp; Climate Change, Sequential Social Dilemmas, internships at DeepMind and Google Brain, Autocurricula, and more!</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
    <item>
      <title>About TalkRL Podcast: All Reinforcement Learning, All the Time</title>
      <itunes:title>About TalkRL Podcast: All Reinforcement Learning, All the Time</itunes:title>
      <itunes:episodeType>trailer</itunes:episodeType>
      <guid isPermaLink="false">2baf9ba0-a7d8-4f31-95e8-de6c0943964e</guid>
      <link>https://share.transistor.fm/s/eb1eb0e8</link>
      <description>
        <![CDATA[<p>August 2, 2019 </p><p><b>Transcript </b></p><p>The idea with TalkRL Podcast is to hear from brilliant folks from across the world of Reinforcement Learning, both research and applications.  As much as possible, I want to hear from them in their own language.  I try to get to know as much as I can about their work beforehand.  </p><p><br>And I’m not here to convert anyone; I want to reach people who are already into RL.  So we won’t stop to explain what a value function is, for example.  Though we also won’t assume everyone has read the very latest papers. </p><p><br></p><p>Why am I doing this? Because it’s a great way to learn from the most inspiring people in the field!  There’s so much happening in the universe of RL, and there’s tons of interesting angles and so many fascinating minds to learn from. </p><p>Now I know there is no shortage of books, papers, and lectures, but so much goes unsaid. </p><p>I mean, I guess if you work at MILA or AMII or Vector Institute, you might be having these conversations over coffee all the time, but I live in a little village in the woods in BC, so for me, these remote interviews are a great way to have these conversations, and I hope sharing with the community makes it more worthwhile for everyone. </p><p><br></p><p>In terms of format, the first 2 episodes were interviews in longer form, around an hour long.  Going forward, some may be a lot shorter; it depends on the guest. </p><p><br></p><p>If you want to be a guest or suggest a guest, go to <a href="https://www.talkrl.com/about">talkrl.com/about</a>, where you will find a link to a suggestion form. </p><p><br></p><p>Thanks for listening! </p><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>August 2, 2019 </p><p><b>Transcript </b></p><p>The idea with TalkRL Podcast is to hear from brilliant folks from across the world of Reinforcement Learning, both research and applications.  As much as possible, I want to hear from them in their own language.  I try to get to know as much as I can about their work beforehand.  </p><p><br>And I’m not here to convert anyone; I want to reach people who are already into RL.  So we won’t stop to explain what a value function is, for example.  Though we also won’t assume everyone has read the very latest papers. </p><p><br></p><p>Why am I doing this? Because it’s a great way to learn from the most inspiring people in the field!  There’s so much happening in the universe of RL, and there’s tons of interesting angles and so many fascinating minds to learn from. </p><p>Now I know there is no shortage of books, papers, and lectures, but so much goes unsaid. </p><p>I mean, I guess if you work at MILA or AMII or Vector Institute, you might be having these conversations over coffee all the time, but I live in a little village in the woods in BC, so for me, these remote interviews are a great way to have these conversations, and I hope sharing with the community makes it more worthwhile for everyone. </p><p><br></p><p>In terms of format, the first 2 episodes were interviews in longer form, around an hour long.  Going forward, some may be a lot shorter; it depends on the guest. </p><p><br></p><p>If you want to be a guest or suggest a guest, go to <a href="https://www.talkrl.com/about">talkrl.com/about</a>, where you will find a link to a suggestion form. </p><p><br></p><p>Thanks for listening! </p><p><br></p>]]>
      </content:encoded>
      <pubDate>Thu, 01 Aug 2019 12:00:00 -0700</pubDate>
      <author>Robin Ranjit Singh Chauhan</author>
      <enclosure url="https://media.transistor.fm/eb1eb0e8/e7088950.mp3" length="1634446" type="audio/mpeg"/>
      <itunes:author>Robin Ranjit Singh Chauhan</itunes:author>
      <itunes:duration>110</itunes:duration>
      <itunes:summary>Introducing TalkRL Podcast!  Also check out our website at talkRL.com</itunes:summary>
      <itunes:subtitle>Introducing TalkRL Podcast!  Also check out our website at talkRL.com</itunes:subtitle>
      <itunes:keywords>Reinforcement Learning, Machine Learning, Artificial Intelligence</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:person role="Host" href="https://www.robinc.net/" img="https://img.transistorcdn.com/uQaF4ejnqhaxlvKOFsaMZFK_IuVR5Y07Uvh2KpxKx_c/rs:fill:0:0:1/w:800/h:800/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9wZXJz/b24vOWI4Yjg4ZWYt/YmRmZi00ZWFkLTk4/YTgtMTRkZGFlZDFh/ZTJkLzE2Njc0MjY0/MDktaW1hZ2UuanBn.jpg">Robin Ranjit Singh Chauhan</podcast:person>
    </item>
  </channel>
</rss>
