[DataLoader2] Saving and restoring initial seed generator#998
NivekT wants to merge 19 commits into gh/NivekT/105/base
ejguan
left a comment
I have two main comments:
- `SeedGenerator` is picklable. No need to add those supporting functions.
- IIUC, the initial state should be the state at the beginning of each epoch, not the state at the beginning of all epochs.
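To illustrate the first point, here is a minimal sketch of why a picklable generator needs no extra save/restore helpers. It uses `random.Random` purely as a stand-in (an assumption for illustration; it is not torchdata's `SeedGenerator`).

```python
import pickle
import random

# `random.Random` is only a stand-in for a picklable seed generator.
gen = random.Random(1234)
gen.random()  # advance the state so it is no longer the freshly seeded one

# A plain pickle round trip preserves the generator's current state.
restored = pickle.loads(pickle.dumps(gen))

# Both objects now produce the same stream of values.
same_stream = [gen.random() for _ in range(3)] == [restored.random() for _ in range(3)]
```

Because the whole object round-trips, it can be stored directly inside a state dict.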
Yea, that makes sense. I can take those out.
I'm expecting […] Does that look right to you?
I am not sure we need to differentiate between the initial seed and the current state of the seed generator. After the SG seeds the graph, the state of the SG is not going to evolve during iteration. All random states should be self-preserved by random DataPipes. During […]
Changes to `DataLoader2`:
- Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch.
- Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets `restore_initial_seed_generator=True`.
- Within `__iter__`, skipping the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored.

---

### Consideration
I decided to modify the existing APIs. Alternatively, we can create a new method.
The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. At the same time, we need to skip the logic that redoes seeding in `__iter__` (hence the new variable `_skip_iteration_seeding`).
I see 2 main scenarios:
1. Users want to restore DataPipe and ReadingService but not the initial state of the RNG
   - I think lots of current users (including some internal ones) are in this category.
   - This should work by default because `restore_initial_seed_generator=False` unless the user explicitly changes it.
2. Users actively want to restore DP, RS, and the initial state of the RNG
   - Users will need to set an extra variable to `True`, and we will make sure `_skip_iteration_seeding=True` so that no re-seeding happens in the first subsequent call of `__iter__`.

Finally, if users change their mind at any point (after restoring) and want to manually set `seed`, that `seed` will override any other behavior and will be used.
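The two scenarios and the manual-seed override can be sketched roughly as follows. This is a toy model under stated assumptions, not the torchdata code: `ToyLoader` is hypothetical, `random.Random` stands in for the seed generator, and the "graph" is reduced to a shuffled list. Only the names `restore_initial_seed_generator`, `_skip_iteration_seeding`, `_seed_generator`, and `initial_seed_generator` come from the description above.

```python
import copy
import random


class ToyLoader:
    """Hypothetical stand-in for DataLoader2 (illustration only)."""

    def __init__(self):
        self._seed_generator = random.Random()  # stand-in for the seed generator
        self._initial_seed_generator = None
        self._seed = None  # manual seed, if the user called seed()
        self._skip_iteration_seeding = False

    def seed(self, seed):
        self._seed = seed

    def __iter__(self):
        if self._seed is not None:
            # A manual seed always wins, even right after a restore.
            self._seed_generator = random.Random(self._seed)
            self._seed = None
        elif not self._skip_iteration_seeding:
            # Default path: re-seed from fresh entropy for this epoch.
            self._seed_generator = random.Random()
        # Skipping applies only to the first __iter__ after a restore.
        self._skip_iteration_seeding = False
        # Snapshot the generator at the beginning of the epoch.
        self._initial_seed_generator = copy.deepcopy(self._seed_generator)
        epoch_seed = self._seed_generator.getrandbits(64)
        order = list(range(8))
        random.Random(epoch_seed).shuffle(order)
        return iter(order)

    def state_dict(self):
        return {"initial_seed_generator": copy.deepcopy(self._initial_seed_generator)}

    def load_state_dict(self, state, restore_initial_seed_generator=False):
        if restore_initial_seed_generator:
            self._seed_generator = copy.deepcopy(state["initial_seed_generator"])
            self._skip_iteration_seeding = True


# Scenario 2: restore the start-of-epoch RNG and replay the same epoch.
dl = ToyLoader()
first_epoch = list(dl)
replayed = ToyLoader()
replayed.load_state_dict(dl.state_dict(), restore_initial_seed_generator=True)
replayed_epoch = list(replayed)
```

In this sketch the default call `load_state_dict(state)` leaves the RNG untouched (scenario 1), while calling `seed(...)` after a restore takes precedence over the restored generator.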
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Changes to `DataLoader2`:
- Modifying `state_dict` to store `randomness_state`, which includes:
  - `_seed: int`
  - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set
  - `_seed_generator` - the latest version at the time when `state_dict` is called
  - `_initial_seed_generator` - the version that is saved at the beginning of every epoch
- Modifying `from_state` and `load_state_dict` to restore `randomness_state`
- Adding a method `_restore_checkpoint_beginning_of_epoch`
  - This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning.

---

### Considerations
Storing the randomness states provides more flexibility for users to restore as they see fit. The decision to do that should not be controversial.
I decided to add a new method for checkpointing at the beginning of the epoch, to ensure that users are not confused about what randomness is restored by default.
The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch.
- Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint.
- Finally, if users change their mind at any point (after restoring) and want to manually set `seed`, that `seed` will override any other behavior and will be used.
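A rough sketch of this design, under stated assumptions: `ToyLoader` is hypothetical, `random.Random` stands in for the seed generator, and the dictionary layout of `randomness_state` is illustrative rather than the actual torchdata format. Only the four field names and `_restore_checkpoint_beginning_of_epoch` come from the description above.

```python
import copy
import random


class ToyLoader:
    """Hypothetical stand-in for DataLoader2 (illustration only)."""

    def __init__(self):
        self._seed = None
        self._reset_seed = False
        self._seed_generator = random.Random(0)  # stand-in for the seed generator
        self._initial_seed_generator = copy.deepcopy(self._seed_generator)

    def seed(self, seed):
        self._seed = seed
        self._reset_seed = True

    def __iter__(self):
        if self._reset_seed:
            self._seed_generator = random.Random(self._seed)
            self._reset_seed = False
        # Snapshot taken at the beginning of every epoch, before any draw.
        self._initial_seed_generator = copy.deepcopy(self._seed_generator)
        epoch_seed = self._seed_generator.getrandbits(64)
        order = list(range(8))
        random.Random(epoch_seed).shuffle(order)
        return iter(order)

    def state_dict(self):
        return {
            "randomness_state": {
                "_seed": self._seed,
                "_reset_seed": self._reset_seed,
                "_seed_generator": copy.deepcopy(self._seed_generator),
                "_initial_seed_generator": copy.deepcopy(self._initial_seed_generator),
            }
        }

    def load_state_dict(self, state):
        for name, value in state["randomness_state"].items():
            setattr(self, name, copy.deepcopy(value))

    @classmethod
    def from_state(cls, state):
        loader = cls()
        loader.load_state_dict(state)
        return loader

    def _restore_checkpoint_beginning_of_epoch(self):
        # Rewind to the generator saved at the start of the checkpointed epoch.
        self._seed_generator = copy.deepcopy(self._initial_seed_generator)


dl = ToyLoader()
epoch1 = list(dl)
state = dl.state_dict()  # checkpoint taken after epoch 1 started
epoch2 = list(dl)

resumed = ToyLoader.from_state(state)  # default: continue with the latest generator
replay = ToyLoader.from_state(state)
replay._restore_checkpoint_beginning_of_epoch()  # opt in: replay epoch 1
```

Here `list(resumed)` reproduces `epoch2`, while `list(replay)` reproduces `epoch1`, which is the distinction the separate method is meant to make explicit.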
I looked into whether we can refactor […]
ejguan
left a comment
Overall LGTM with a few comments below
Differential Revision: [D44390519](https://our.internmc.facebook.com/intern/diff/D44390519)
…tate for backward compatibility" Follow up to #998 for backward compatibility. Differential Revision: [D44747988](https://our.internmc.facebook.com/intern/diff/D44747988)
Summary:
Pull Request resolved: #1124

Reland of #998 with an added guard while loading randomness state in `DataLoader2` for backward compatibility.

Test Plan: Imported from OSS f425956975

Reviewed By: bearzx

Differential Revision: D44748514

Pulled By: NivekT

fbshipit-source-id: 8713592902b1e0680e46e4db4280c84c708dbf55
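The backward-compatibility guard mentioned in the reland could look something like the sketch below. This is an assumption about its shape, not the code from #1124: the helper name `load_randomness_state`, the `"randomness_state"` key, and the plain-dict representation of the loader's attributes are all hypothetical.

```python
def load_randomness_state(loader_attrs: dict, state: dict) -> bool:
    """Copy randomness fields into `loader_attrs` only if the checkpoint has them.

    Returns True when randomness state was restored, False for an older
    checkpoint that was saved before the field existed.
    """
    randomness_state = state.get("randomness_state")
    if randomness_state is None:
        # Older checkpoint: keep whatever randomness state the loader already has.
        return False
    loader_attrs.update(randomness_state)
    return True


attrs = {"_seed": None, "_reset_seed": False}
old_checkpoint = {"datapipe": b"..."}  # no randomness state saved
new_checkpoint = {"datapipe": b"...", "randomness_state": {"_seed": 3, "_reset_seed": True}}

restored_old = load_randomness_state(attrs, old_checkpoint)
restored_new = load_randomness_state(attrs, new_checkpoint)
```

The point of the guard is that checkpoints written before this change still load, just without any randomness being restored.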
Stack from ghstack:
- restore_iteration to ReadingService method for arbitrary checkpointing #1056