Add validation for pool names to prevent InvalidStatsNameException#59938
Conversation
…ol names can now only contain ASCII alphabets, numbers , underscores,dots,and dashes to ensure compatibility with stats naming requirments. fixes apache#59935
|
Not against this change, but it's important to note that this will be a breaking change. |
@BasPH Good point! This validation is intended to prevent the InvalidStatsNameException mentioned in #8696. To minimize breaking changes, we could:
Which approach would you prefer? Happy to adjust the PR accordingly. |
There was a problem hiding this comment.
I do not think it's a good idea to raise issue at stat's creation time. this will mean that when you create an invalid pool, things will start crashing soon after. That's quite wrong behaviour.
There are two solutions to that:
- prevent creating a pool with wrong name and migrate old pool names (that would be complex and quite impossible or super-difficult for users who currently have invalid pool name - because they would also have to migrate their dags and likely many other things
- slugify/normalize the name - i.e. for example - replace all the invalid characters in the pool with other (valid) character when raising the stats and raise a warning in logs explaining that the pool is reported in stats with different name and that the user should change pool names to be valid if they want to remove that warning
The 2) Is way more feasible and generally way better.
potiuk
left a comment
There was a problem hiding this comment.
I was supposed to request changes
Following @potiuk's feedback, changed approach from preventing invalid pool names to normalizing them for stats reporting. Changes: - Replaced validate_pool_name() with normalize_pool_name_for_stats() - Pool names with invalid characters are normalized (replaced with _) when emitting metrics - Logs warning when normalization occurs, suggesting pool rename - Removed validation from create_or_update_pool() - Updated scheduler_job_runner.py to use normalized names for stats - Removed validation tests - Updated news fragment This prevents InvalidStatsNameException without breaking existing pools with invalid names. Fixes apache#59935
@potiuk The static checks are auto-formatting whitespace/newlines but the core functionality is complete. The formatting will be applied automatically when merged |
No. The formatting should be applied by you when rebasing. You should use prek and install it locally to fix all static check issues |
ea8a80c to
b9bfe4c
Compare
Following @potiuk's feedback, changed approach from preventing invalid pool names to normalizing them for stats reporting. Changes: - Added normalize_pool_name_for_stats() function in pool.py - Pool names with invalid characters are normalized (replaced with _) when emitting metrics in scheduler_job_runner.py - Logs warning when normalization occurs - Removed validation tests - Removed accidental fix_quote.py file - Updated news fragment This prevents InvalidStatsNameException without breaking existing pools. Fixes apache#59935
|
@potiuk can you please have a look at my PR |
…pache#59938) * Add Validation for pool names to prevent InvalidStatsNameException Pool names can now only contain ASCII alphabets, numbers , underscores,dots,and dashes to ensure compatibility with stats naming requirments. fixes apache#59935 * Add unit Tests and news fragment for pool name validation * Fix test file with proper database session handling * Fix test assertions to use pytest.raises match parameter * Add newline at the end of pool.py file * Fix trailing whitespace and quote style in pool.py * Fix syntax error: use consistent double quotes in regex * Add missing newline and __init__.py file * Add missing newline and __init__.py with license * Fix __init__.py with proper license and single newline * Implement pool name normalization for stats reporting Following @potiuk's feedback, changed approach from preventing invalid pool names to normalizing them for stats reporting. Changes: - Replaced validate_pool_name() with normalize_pool_name_for_stats() - Pool names with invalid characters are normalized (replaced with _) when emitting metrics - Logs warning when normalization occurs, suggesting pool rename - Removed validation from create_or_update_pool() - Updated scheduler_job_runner.py to use normalized names for stats - Removed validation tests - Updated news fragment This prevents InvalidStatsNameException without breaking existing pools with invalid names. Fixes apache#59935 * Implement pool name normalization for stats reporting Following @potiuk's feedback, changed approach from preventing invalid pool names to normalizing them for stats reporting. Changes: - Added normalize_pool_name_for_stats() function in pool.py - Pool names with invalid characters are normalized (replaced with _) when emitting metrics in scheduler_job_runner.py - Logs warning when normalization occurs - Removed validation tests - Removed accidental fix_quote.py file - Updated news fragment This prevents InvalidStatsNameException without breaking existing pools. Fixes apache#59935 * Add missing 're' module import * Remove duplicate import * Apply CI formatting fixes - import order and blank lines * Fix formatting - add proper blank lines per ruff format requirements
These were added in apache#59938 and do nothing at all.
| # specific language governing permissions and limitations | ||
| # under the License. | ||
|
|
||
| from __future__ import annotations |
There was a problem hiding this comment.
Hi @kaxil!
The tests were included in my original PR commits (1edd0d3, 6ff4d64, 0025d3a) but looks like they were removed in #60548 as part of a cleanup.
I see @jedcunningham has opened #60553 to add them back with optimizations. Happy to help review or add more test coverage if needed!
Let me know if you'd like me to contribute additional tests. 👍
These were added in #59938 and do nothing at all.
These were added in apache#59938 and do nothing at all.
…pache#59938) * Add Validation for pool names to prevent InvalidStatsNameException Pool names can now only contain ASCII alphabets, numbers , underscores,dots,and dashes to ensure compatibility with stats naming requirments. fixes apache#59935 * Add unit Tests and news fragment for pool name validation * Fix test file with proper database session handling * Fix test assertions to use pytest.raises match parameter * Add newline at the end of pool.py file * Fix trailing whitespace and quote style in pool.py * Fix syntax error: use consistent double quotes in regex * Add missing newline and __init__.py file * Add missing newline and __init__.py with license * Fix __init__.py with proper license and single newline * Implement pool name normalization for stats reporting Following @potiuk's feedback, changed approach from preventing invalid pool names to normalizing them for stats reporting. Changes: - Replaced validate_pool_name() with normalize_pool_name_for_stats() - Pool names with invalid characters are normalized (replaced with _) when emitting metrics - Logs warning when normalization occurs, suggesting pool rename - Removed validation from create_or_update_pool() - Updated scheduler_job_runner.py to use normalized names for stats - Removed validation tests - Updated news fragment This prevents InvalidStatsNameException without breaking existing pools with invalid names. Fixes apache#59935 * Implement pool name normalization for stats reporting Following @potiuk's feedback, changed approach from preventing invalid pool names to normalizing them for stats reporting. Changes: - Added normalize_pool_name_for_stats() function in pool.py - Pool names with invalid characters are normalized (replaced with _) when emitting metrics in scheduler_job_runner.py - Logs warning when normalization occurs - Removed validation tests - Removed accidental fix_quote.py file - Updated news fragment This prevents InvalidStatsNameException without breaking existing pools. Fixes apache#59935 * Add missing 're' module import * Remove duplicate import * Apply CI formatting fixes - import order and blank lines * Fix formatting - add proper blank lines per ruff format requirements
These were added in apache#59938 and do nothing at all.
These were added in apache#59938 and do nothing at all.
…pache#59938) * Add Validation for pool names to prevent InvalidStatsNameException Pool names can now only contain ASCII alphabets, numbers , underscores,dots,and dashes to ensure compatibility with stats naming requirments. fixes apache#59935 * Add unit Tests and news fragment for pool name validation * Fix test file with proper database session handling * Fix test assertions to use pytest.raises match parameter * Add newline at the end of pool.py file * Fix trailing whitespace and quote style in pool.py * Fix syntax error: use consistent double quotes in regex * Add missing newline and __init__.py file * Add missing newline and __init__.py with license * Fix __init__.py with proper license and single newline * Implement pool name normalization for stats reporting Following @potiuk's feedback, changed approach from preventing invalid pool names to normalizing them for stats reporting. Changes: - Replaced validate_pool_name() with normalize_pool_name_for_stats() - Pool names with invalid characters are normalized (replaced with _) when emitting metrics - Logs warning when normalization occurs, suggesting pool rename - Removed validation from create_or_update_pool() - Updated scheduler_job_runner.py to use normalized names for stats - Removed validation tests - Updated news fragment This prevents InvalidStatsNameException without breaking existing pools with invalid names. Fixes apache#59935 * Implement pool name normalization for stats reporting Following @potiuk's feedback, changed approach from preventing invalid pool names to normalizing them for stats reporting. Changes: - Added normalize_pool_name_for_stats() function in pool.py - Pool names with invalid characters are normalized (replaced with _) when emitting metrics in scheduler_job_runner.py - Logs warning when normalization occurs - Removed validation tests - Removed accidental fix_quote.py file - Updated news fragment This prevents InvalidStatsNameException without breaking existing pools. Fixes apache#59935 * Add missing 're' module import * Remove duplicate import * Apply CI formatting fixes - import order and blank lines * Fix formatting - add proper blank lines per ruff format requirements
These were added in apache#59938 and do nothing at all.
…pache#59938) * Add Validation for pool names to prevent InvalidStatsNameException Pool names can now only contain ASCII alphabets, numbers , underscores,dots,and dashes to ensure compatibility with stats naming requirments. fixes apache#59935 * Add unit Tests and news fragment for pool name validation * Fix test file with proper database session handling * Fix test assertions to use pytest.raises match parameter * Add newline at the end of pool.py file * Fix trailing whitespace and quote style in pool.py * Fix syntax error: use consistent double quotes in regex * Add missing newline and __init__.py file * Add missing newline and __init__.py with license * Fix __init__.py with proper license and single newline * Implement pool name normalization for stats reporting Following @potiuk's feedback, changed approach from preventing invalid pool names to normalizing them for stats reporting. Changes: - Replaced validate_pool_name() with normalize_pool_name_for_stats() - Pool names with invalid characters are normalized (replaced with _) when emitting metrics - Logs warning when normalization occurs, suggesting pool rename - Removed validation from create_or_update_pool() - Updated scheduler_job_runner.py to use normalized names for stats - Removed validation tests - Updated news fragment This prevents InvalidStatsNameException without breaking existing pools with invalid names. Fixes apache#59935 * Implement pool name normalization for stats reporting Following @potiuk's feedback, changed approach from preventing invalid pool names to normalizing them for stats reporting. Changes: - Added normalize_pool_name_for_stats() function in pool.py - Pool names with invalid characters are normalized (replaced with _) when emitting metrics in scheduler_job_runner.py - Logs warning when normalization occurs - Removed validation tests - Removed accidental fix_quote.py file - Updated news fragment This prevents InvalidStatsNameException without breaking existing pools. Fixes apache#59935 * Add missing 're' module import * Remove duplicate import * Apply CI formatting fixes - import order and blank lines * Fix formatting - add proper blank lines per ruff format requirements
These were added in apache#59938 and do nothing at all.
Fixes #59935
Problem
Pool names can currently contain any characters (spaces, emojis, etc.), which causes
InvalidStatsNameExceptionwhen reporting metrics because stats names must only contain ASCII alphabets, numbers, underscores, dots, and dashes.Example error:
InvalidStatsNameException: The stat name (pool.running_slots.pool name with whitespace and emoji😎) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.
Solution
Added validation to pool names at creation/update time in the
create_or_update_pool()method to ensure they only contain valid characters that are compatible with stats naming requirements.Changes
validate_pool_name()function inairflow/models/pool.pyPool.create_or_update_pool()methodTesting
This change prevents the issue at the source by not allowing invalid pool names to be created in the first place. Existing pools with invalid names will continue to work, but new pools or updates to existing pools will be validated.
Manual testing can be done by attempting to create a pool with an invalid name and verifying that a
ValueErroris raised with a clear error message.