Bartłomiej Cupiał
131 posts
Bartłomiej Cupiał
@CupiaBart
PhD Student @ University of Warsaw | @IDEAS_NCBR bartekcupial.github.io
Warsaw, Poland
Joined May 2019
635
Following
1,105
Followers

  • Pinned
    Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    So here's the story of, by far, the weirdest bug I've encountered in my CS career. Along with @maciejwolczyk, we've been training a neural network that learns how to play NetHack, an old roguelike game that looks like the one in the screenshot. Recently, something unexpected happened.
    2.2M
  • Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    Replying to @CupiaBart
    The moral is, if you encounter an unexpected bug, be sure to consult a lunar calendar. Big thanks to @JensTuyls for solving this for us!
    87K
  • Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    Replying to @CupiaBart
    So apparently NetHack has a mechanic that slightly changes how the game plays every time it's a full moon according to your system clock: nethackwiki.com/wiki/Time The player character is luckier, werewolves appear in their animal form, and the dogs howl ominously.
    86K
  • Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    Replying to @CupiaBart
    The next morning, I see a lot of messages on Slack. Jens replied "Oh yes, it's probably a full moon today." What.
    88K
  • Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    Replying to @CupiaBart
    It doesn't make the game harder, but the model hasn't seen full moon data in its training set, so the score drops. In this particular case, it drops from 5k points to 3k points. We override the time so it's not a full moon, we evaluate the model - and it's 5k points again.
    83K
  • Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    Replying to @CupiaBart
    I check a moon phase calendar, and yes, it's a full moon today. Hands shaking, I start a new NetHack game, and the message says "You are lucky! Full moon tonight." What.
    84K
  • Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    Replying to @CupiaBart
    By this point we've spent several hours on this, and it's 7 PM. I am starting to feel like a madman. I can't even watch a TV show without constantly thinking about the bug. Before going to sleep I decide to ask @JensTuyls, the author of the model, if he knows what might be broken.
    84K
  • Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    Replying to @CupiaBart
    Namely, the CUDA libraries that allow us to compute things quickly on GPU. So we suspect that maybe something about these libraries changed that degraded the model. Because what else could have? And yes, recently the version was changed from 11.8 to 12.4.
    90K
  • Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    Replying to @CupiaBart
    We use a model by @JensTuyls that clones expert behavior on NetHack, and we improve it using RL methods. That model gets 5000 points and we finetune it in the game so that the score improves. However, suddenly in a recent run, Jens' model only got 3000 points. Quite a drop.
    105K
  • Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    Replying to @CupiaBart
    Revert code a few weeks back? Still 3000 points. Luckily, the server we run our experiments on saves the files from the previous runs. We find the files corresponding to a run that previously got 5000 points, we re-run, and, well, it gets 3000. Nothing about the code changed.
    98K
  • Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    Replying to @CupiaBart
    The CUDA mismatch probably shouldn't impact the results in this particular way, but we see no other explanation. We override the version to 11.8 - we still get 3000 points. We build a new environment from scratch, for CUDA 12.4 - 3000 points. Welp.
    89K
  • Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    Replying to @CupiaBart
    We repeat the evaluation on a personal laptop. This is slow and expensive without the specialized hardware, but we make it work. Again, 3000 points. We disable multithreading, GPU, and some other things that have at least a conceivable chance of causing the problem - 3000 points.
    85K
  • Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    Replying to @CupiaBart
    We start suspecting our software stack. Thankfully, we use Singularity, which means that our whole environment is in a single, self-contained file. That file hasn't changed for a few months, so that shouldn't be the problem. However, the container loads one thing from the server.
    93K
  • Bartłomiej Cupiał
    @CupiaBart
    May 24, 2024
    Replying to @CupiaBart
    This problem is consistent across seeds, so it's not just a fluke. Well, we probably screwed up something in the code for loading the model in the recent commit. Let's revert, no biggie. Except that after reverting to a version of the code from a few days back, we still get 3000.
    101K
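
The full-moon mechanic in the thread depends only on the system date, so an evaluation script could check for it before reporting scores. Below is a minimal Python sketch of that check. It ports the calendar approximation as I recall it from NetHack's `hacklib.c` (`phase_of_the_moon`), so treat the constants as an assumption to verify against the game source. The helper name `is_full_moon` is hypothetical and not part of NetHack or of the authors' code.

```python
from datetime import date


def phase_of_the_moon(day: date) -> int:
    """Return a phase in 0..7, where 0 is a new moon and 4 is a full moon.

    Assumed port of NetHack's approximation; it is a calendar heuristic,
    not an astronomical calculation.
    """
    day_of_year = day.timetuple().tm_yday - 1  # C's tm_yday counts from 0
    golden = (day.year - 1900) % 19 + 1        # position in the 19-year cycle
    epact = (11 * golden + 18) % 30            # approximate moon age at Jan 1
    if (epact == 25 and golden > 11) or epact == 24:
        epact += 1
    return ((((day_of_year + epact) * 6 + 11) % 177) // 22) & 7


def is_full_moon(day: date) -> bool:
    """Hypothetical helper: True when the approximation reports a full moon."""
    return phase_of_the_moon(day) == 4


if __name__ == "__main__":
    today = date.today()
    if is_full_moon(today):
        print(f"{today}: full moon by this approximation; scores may shift")
    else:
        print(f"{today}: phase {phase_of_the_moon(today)}")
```

Under this approximation, `date(2024, 5, 23)` and `date(2024, 5, 24)` both land in the full-moon phase, which matches the dates of the thread. A guard like this only flags the condition; it does not change what the game itself reads from the clock.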