Description
Hello, I am a Colab Pro+ user (on Chrome).
For a while now I have been hitting "OSError: [Errno 107] Transport endpoint is not connected" while training a model. It happens almost once a day. How can I solve it?
While this is happening, training keeps running but no new model progress is saved, and compute units continue to be deducted. If I don't notice, the whole day's compute units are used up.
Uploaded video (click the picture to watch it)
Every day I start training after the reset, and the problem appears about ten hours in.
I'm going crazy: each epoch takes at least 3~4 hours!!
This has been happening every day since 2023/2/17, when Colab updated something.
The picture shows training progress at 8x%, which took 3 hours to reach (so at least 3 hours of training time is lost every day).
Mount method:
from google.colab import drive
drive.mount('/content/drive')
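In the meantime, a possible stopgap (not a fix for the disconnect itself) is to check that the mount still responds before each checkpoint save, and stop the run if it does not, so compute units are not spent on progress that cannot be saved. This is only a minimal sketch: `drive_is_alive` and `save_or_stop` are hypothetical helper names, and it assumes the error shows up as an `OSError` with errno `ENOTCONN` (107 on Linux) when listing the mount point.

```python
import errno
import os

MOUNT_POINT = "/content/drive"  # same path passed to drive.mount()


def drive_is_alive(path=MOUNT_POINT):
    """Return False if listing `path` fails with ENOTCONN (errno 107 on Linux)."""
    try:
        os.listdir(path)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return False
        raise  # any other error is left for the caller to handle


def save_or_stop(save_fn, path=MOUNT_POINT):
    """Hypothetical helper: call save_fn() only while the mount responds."""
    if not drive_is_alive(path):
        raise RuntimeError(
            "Drive mount lost; stopping so compute units are not wasted"
        )
    save_fn()
```

In a training loop this would wrap whatever call writes the checkpoint, for example `save_or_stop(lambda: model.save(ckpt_path))`, where `model.save` and `ckpt_path` stand in for your own saving code.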
