fix batch_matmul for invalid mma config for sm < 80 #227

Merged
xinli-git merged 2 commits into hidet-org:main from xinli-git:sd_dynamo_fix
May 15, 2023
Conversation

@xinli-git
Contributor

@xinli-git xinli-git commented May 15, 2023

Some MMA configs are not valid on sm < 80. Fix this by filtering out invalid mma_configs before creating the tuning space.
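The filtering described above can be sketched as follows. This is a minimal, hypothetical illustration: the `MmaConfig` class, its fields, and the `required_sm` attribute are assumptions for the sketch, not hidet's actual API.

```python
# Hypothetical sketch of filtering MMA configs by compute capability.
# The config class and its fields are illustrative, not hidet's real types.
from dataclasses import dataclass

@dataclass(frozen=True)
class MmaConfig:
    m: int
    n: int
    k: int
    required_sm: int  # minimum compute capability, e.g. 80 for sm_80

def filter_mma_configs(configs, sm):
    # Keep only configs supported by the target compute capability,
    # so the tuning space never contains invalid candidates.
    return [c for c in configs if c.required_sm <= sm]

configs = [MmaConfig(16, 8, 8, 70), MmaConfig(16, 8, 16, 80)]
print(len(filter_mma_configs(configs, sm=75)))  # only the sm_70 config survives: 1
```

Filtering at tuning-space construction time (rather than failing at compile time) keeps every candidate in the space buildable on the target device.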

This should address #225

Also, I changed the contiguous-tensor warning to warn_once to reduce the amount of context printed to the console.
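A warn-once helper can be implemented with a small cache on the message, as in this sketch (the helper name and caching approach are assumptions for illustration, not hidet's implementation):

```python
import functools
import warnings

@functools.lru_cache(maxsize=None)
def warn_once(message: str) -> None:
    # lru_cache guarantees each distinct message is emitted at most once,
    # so a warning inside a hot loop does not flood the console.
    warnings.warn(message)

for _ in range(3):
    warn_once('tensor is not contiguous; making a contiguous copy')  # emitted once
```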

@xinli-git xinli-git requested a review from yaoyaoding May 15, 2023 13:53
@xinli-git
Contributor Author

There is another minor annoyance with tuning this model. By default, hidet's parallel tuning/build uses the number of CPU cores as the number of parallel workers. However, when the fused kernel is large, the compilation processes can consume all of the system's memory and cause the system to hang.

Setting hidet.option.parallel_build(enabled=True) seems to have no effect.

I will make a fix in another PR.

@yaoyaoding
Member

Thanks @xinli-git, it looks good to me.

@yaoyaoding
Member

> There is another minor annoyance with tuning this model. By default, hidet's parallel tuning/build uses the number of CPU cores as the number of parallel workers. However, when the fused kernel is large, the compilation processes can consume all of the system's memory and cause the system to hang.
>
> Setting hidet.option.parallel_build(enabled=True) seems to have no effect.
>
> I will make a fix in another PR.

The option hidet.option.parallel_build() is used to control the parallelization of building operators in the flow graph.

For the different schedules of the same operator, we use parallel build by default. See the `parallel` argument in the following line:

```python
ir_modules, output_dir=tuning_dir, parallel=True, verbose=True
```

We can add a new option named hidet.option.parallel_tuning to control whether to parallelize the tuning process.

Besides this, we can also add a config to allow the user to control how much memory is allocated for each job. The relevant code is:

```python
# Set the affinity of the current process. Some packages such as numpy change the
# affinity of the current process, which might limit the parallelism of compilation.
os.sched_setaffinity(0, range(os.cpu_count()))
# The maximum number of processes is limited by the number of cores and available memory.
mem_for_worker = 1.5 * 1024 * 1024 * 1024  # 1.5 GiB
num_workers = min(max(int(psutil.virtual_memory().available // mem_for_worker), 1), psutil.cpu_count())
_lazy_initialize_cuda()
with multiprocessing.Pool(processes=num_workers) as pool:
    for build_result in tqdm(
        pool.imap(_build_ir_module_job, jobs),
        desc='Compiling',
        total=len(jobs),
        disable=not verbose,
        ncols=80,
    ):
        build_results.append(build_result)
```

Currently, we allocate about 1.5 GiB for each job; sometimes that is not enough. We could add another option like

```python
hidet.option.tuning_parallelism(max_parallel_jobs: Optional[int] = None, job_preserved_memory: Optional[float] = 1.5)
```

where:

  • max_parallel_jobs: the maximum number of parallel jobs to use. None indicates the number of vCPUs of the host machine.
  • job_preserved_memory: how many GiB to reserve for each job.
    The actual number of workers is calculated as min(max_parallel_jobs, available_mem / job_preserved_memory).

@xinli-git
Contributor Author

Thanks @yaoyaoding, I will work on this next :)

@xinli-git xinli-git merged commit 8a57169 into hidet-org:main May 15, 2023
@xinli-git xinli-git deleted the sd_dynamo_fix branch July 5, 2023 21:07
@xinli-git xinli-git restored the sd_dynamo_fix branch July 5, 2023 21:07
@xinli-git xinli-git deleted the sd_dynamo_fix branch July 5, 2023 21:07
vadiklyutiy pushed a commit that referenced this pull request Dec 19, 2024
### PR Description

This PR addresses the issue outlined in [Issue
#227](CentML/hidet#227).

#### Problem:
After investigation, the problem arises because multiple threads were
writing to the same `fused_ir` directory. This led to race conditions
where one thread might attempt to delete a file that another thread had
already removed. Moreover, all candidates were being saved into a single
directory, resulting in overwriting of files across different threads.

#### Solution:
This commit introduces the following changes to resolve the issue:

1. **Separate Directories for Each Candidate**:
   - Each candidate now gets its own directory under `fuse_ir`, preventing conflicts and file overwriting.
2. **Subdirectories for IRs by Candidate**:
   - The `ir` files are now stored in separate subdirectories under `candidate/ir`, ensuring that files from different candidates are isolated.
3. **Test Case Added**:
   - A new test case was introduced to verify that the number of folders in `fuse_ir` matches the total number of candidates listed in `candidate.txt`.
vadiklyutiy pushed a commit that referenced this pull request Dec 20, 2024
vadiklyutiy pushed a commit that referenced this pull request Dec 26, 2024