fix batch_matmul for invalid mma config for sm < 80 #227
xinli-git merged 2 commits into hidet-org:main
There is another minor annoyance with tuning this model. By default, hidet's parallel tuning/build uses the number of cores as the number of parallel workers. However, when the fused kernel is large, the compilation processes can use all the memory in the system and cause it to hang. Setting … I will make a fix in another PR.
Thanks @xinli-git, it looks good to me.
The option … For the different schedules of the same operator, we use parallel build by default. We can add a new option named … Besides this, we can also add a config that lets the user control how much memory is reserved for each job. The relevant code is in Lines 262 to 279 in fe8b65f. Currently, we allocate about 1.5 GiB for each job, which is sometimes not enough. We can add another option like `hidet.option.tuning_parallelism(max_parallel_jobs: Optional[int] = None, job_preserved_memory: Optional[float] = 1.5)`, in which, …
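As a rough illustration of the proposal above, the sketch below shows one way a worker count could be derived from a core count, an optional user cap, and a per-job memory reservation. The helper name `pick_num_workers` and its `available_memory_gib` and `num_cores` parameters are hypothetical; only `max_parallel_jobs` and `job_preserved_memory` come from the proposed option, and this is not hidet's actual implementation.

```python
import os
from typing import Optional


def pick_num_workers(
    available_memory_gib: float,
    num_cores: Optional[int] = None,
    max_parallel_jobs: Optional[int] = None,
    job_preserved_memory: float = 1.5,
) -> int:
    """Hypothetical helper: cap build workers by cores, a user limit, and memory."""
    if job_preserved_memory <= 0:
        raise ValueError("job_preserved_memory must be positive")
    cores = num_cores if num_cores is not None else (os.cpu_count() or 1)
    # How many jobs fit if each one reserves job_preserved_memory GiB.
    by_memory = int(available_memory_gib // job_preserved_memory)
    workers = min(cores, by_memory)
    if max_parallel_jobs is not None:
        workers = min(workers, max_parallel_jobs)
    # Always keep at least one worker so the build can make progress.
    return max(1, workers)
```

With 12 GiB free and 16 cores, the default 1.5 GiB reservation would limit the build to 8 workers; raising `job_preserved_memory` lowers that number further for large fused kernels.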
Thanks @yaoyaoding, I will work on this next :)
### PR Description

This PR addresses the issue outlined in [Issue #227](https://github.com/CentML/hidet/issues/227).

#### Problem:

After investigation, the problem arises because multiple threads were writing to the same `fused_ir` directory. This led to race conditions where one thread might attempt to delete a file that another thread had already removed. Moreover, all candidates were being saved into a single directory, resulting in overwriting of files across different threads.

#### Solution:

This commit introduces the following changes to resolve the issue:

1. **Separate Directories for Each Candidate**:
   - Each candidate now gets its own directory under `fuse_ir`, preventing conflicts and file overwriting.
2. **Subdirectories for IRs by Candidate**:
   - The `ir` files are now stored in separate subdirectories under `candidate/ir`, ensuring that files from different candidates are isolated.
3. **Test Case Added**:
   - A new test case was introduced to verify that the number of folders in `fuse_ir` matches the total number of candidates listed in `candidate.txt`.
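The per-candidate directory layout described in that commit can be sketched as follows. This is a minimal illustration, not the code from the commit: the function names, the `module.txt` file name, and the use of the candidate index as the directory name are all assumptions made for the example.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def dump_candidate(root: Path, candidate_id: int, ir_text: str) -> Path:
    """Write one candidate's IR under its own directory: <root>/<id>/ir/."""
    ir_dir = root / str(candidate_id) / "ir"
    # exist_ok avoids a race if the directory already exists.
    ir_dir.mkdir(parents=True, exist_ok=True)
    out = ir_dir / "module.txt"  # hypothetical file name
    out.write_text(ir_text)
    return out


def dump_all(root: Path, num_candidates: int) -> int:
    """Dump every candidate from worker threads; return the folder count."""
    root.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(dump_candidate, root, i, f"candidate {i}")
            for i in range(num_candidates)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return sum(1 for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(dump_all(Path(tmp) / "fuse_ir", 8))
```

Because no two threads share an output path, no thread can delete or overwrite another thread's files, and counting the folders gives the same kind of check the added test case performs.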
Some MMA configs are not valid in sm<80. Fix this by filtering out mma_configs before creating the tuning space.
This should address #225
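The filtering step can be pictured with a small sketch. The `MmaConfig` class, the `min_sm` field, and the config names below are hypothetical stand-ins rather than hidet's real data structures, and the capability values are illustrative; the PTX ISA documentation is the authority on which shapes each architecture supports.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MmaConfig:
    """Hypothetical stand-in for an MMA instruction configuration."""
    name: str
    min_sm: int  # lowest compute capability assumed to support this config


def valid_mma_configs(configs: List[MmaConfig], sm: int) -> List[MmaConfig]:
    """Keep only the configs that the target compute capability supports."""
    return [c for c in configs if sm >= c.min_sm]


# Illustrative values only; check the PTX ISA documentation for real requirements.
ALL_CONFIGS = [
    MmaConfig("m16n8k8_f16", min_sm=75),
    MmaConfig("m16n8k16_f16", min_sm=80),
    MmaConfig("m16n8k8_tf32", min_sm=80),
]
```

Building the tuning space from `valid_mma_configs(ALL_CONFIGS, sm)` instead of `ALL_CONFIGS` means no schedule is generated for an instruction the device cannot run.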
Also, I changed the warning for contiguous tensors to `warn_once` to reduce the amount of output printed to the console.
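For readers unfamiliar with the pattern, a warn-once helper can be as small as the sketch below. It only illustrates the idea and is an assumption about the general shape; hidet's own `warn_once` may be implemented differently.

```python
import warnings
from typing import Set

# Messages that have already been emitted in this process.
_seen_messages: Set[str] = set()


def warn_once(message: str) -> bool:
    """Emit `message` as a warning the first time only; return True if emitted."""
    if message in _seen_messages:
        return False
    _seen_messages.add(message)
    # stacklevel=2 attributes the warning to the caller of warn_once.
    warnings.warn(message, stacklevel=2)
    return True
```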