Limit searching of `charset` attribute in `meta` tag for HTML to first 1024 characters in webcmdlets
PR Summary
In the case of HTML content, the cmdlet searches the body for `<meta charset=...>` to determine which encoding to use. It does this with a regex, but it applies that regex to the entire body, which can be quite large. The `<meta>` tag belongs in the `<head>` element, which is expected at the top of the document, directly under `<html>`, so this change limits the search to the first 1024 characters. This is also consistent with the HTML specification, which requires the element containing the character encoding declaration to be serialized completely within the first 1024 bytes of the document. Because HTML is lenient, a document could contain enough HTML comments ahead of the `<meta>` tag to push it past this limit, but that seems unlikely. In that case the encoding defaults to UTF-8, which most websites use anyway.
Tested manually against the repro in the issue; the cmdlet now returns immediately after the payload is downloaded.
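The approach in the summary can be illustrated with a minimal Python sketch. It is not the PowerShell implementation (which is C#): the pattern `META_CHARSET`, the function `detect_charset`, and the lowercase normalization are hypothetical simplifications. It only shows the idea of running the `charset` regex over the first 1024 characters and falling back to UTF-8 when nothing matches there.

```python
import re

# Hypothetical, simplified pattern; NOT the regex PowerShell actually uses.
# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    r"""<meta\s[^>]*?charset\s*=\s*["']?(?P<charset>[\w.:-]+)""",
    re.IGNORECASE,
)

SEARCH_LIMIT = 1024        # only the start of the document is inspected
DEFAULT_CHARSET = "utf-8"  # fallback when no declaration is found in that prefix


def detect_charset(html: str, limit: int = SEARCH_LIMIT) -> str:
    """Return the charset declared in a <meta> tag within the first `limit`
    characters of `html`, or DEFAULT_CHARSET if there is none."""
    # Pattern.search(string, pos, endpos) treats the text as if it ended at
    # endpos, so the regex never scans the (possibly huge) rest of the body
    # and no substring copy is needed.
    match = META_CHARSET.search(html, 0, limit)
    return match.group("charset").lower() if match else DEFAULT_CHARSET


if __name__ == "__main__":
    head = '<html><head><meta charset="ISO-8859-1"></head><body>'
    print(detect_charset(head))    # iso-8859-1
    # A <meta> tag pushed past the limit (e.g. by long comments) is ignored.
    padded = "<!--" + "x" * 2000 + "-->" + head
    print(detect_charset(padded))  # utf-8
```

Bounding the search with `endpos` keeps the cost proportional to the limit rather than to the body size. In .NET, a comparable way to bound a match would be the `Regex.Match(string input, int beginning, int length)` overload, though this sketch does not claim that is what the PR uses.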
PR Context
Fix https://github.com/PowerShell/PowerShell/issues/17762
PR Checklist
- [x] PR has a meaningful title
- Use the present tense and imperative mood when describing your changes
- [x] Summarized changes
- [x] Make sure all `.h`, `.cpp`, `.cs`, `.ps1` and `.psm1` files have the correct copyright header
- [x] This PR is ready to merge and is not Work in Progress.
- If the PR is work in progress, please add the prefix `WIP:` or `[ WIP ]` to the beginning of the title (the `WIP` bot will keep its status check at `Pending` while the prefix is present) and remove the prefix when the PR is ready.
Breaking changes
- [x] None
- OR
- [ ] Experimental feature(s) needed
- [ ] Experimental feature name(s):
User-facing changes
- [x] Not Applicable
- OR
- [ ] Documentation needed
- [ ] Issue filed:
Testing - New and feature
- [x] N/A or can only be tested interactively
- OR
- [ ] Make sure you've added a new test if existing tests do not effectively test the code changed
Tooling
- [x] I have considered the user experience from a tooling perspective and don't believe tooling will be impacted.
- OR
- [ ] I have considered the user experience from a tooling perspective and opened an issue in the relevant tool repository. This may include:
- [ ] Impact on PowerShell Editor Services, which is used in the PowerShell extension for VSCode (which runs in a different PS Host).
- [ ] Issue filed:
- [ ] Impact on Completions (both in the console and in editors) - one of PowerShell's most powerful features.
- [ ] Issue filed:
- [ ] Impact on PSScriptAnalyzer (which provides linting & formatting in the editor extensions).
- [ ] Issue filed:
- [ ] Impact on EditorSyntax (which provides syntax highlighting within VSCode, GitHub, and many other editors).
- [ ] Issue filed:
This pull request has been automatically marked as Review Needed because there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author.
This PR has 2 quantified lines of changes. In general, a change size of up to 200 lines is ideal for the best PR experience!
Quantification details
Label : Extra Small
Size : +1 -1
Percentile : 0.8%
Total files changed: 1
Change summary by file extension:
.cs : +1 -1
Change counts above are quantified counts, based on the PullRequestQuantifier customizations.
Why proper sizing of changes matters
Optimal pull request sizes drive a more predictable PR flow, as they strike a balance between PR complexity and PR review overhead. PRs within the optimal size (typically small or medium-sized PRs) mean:
- Fast and predictable releases to production:
- Optimal size changes are more likely to be reviewed faster with fewer iterations.
- Similarity in low PR complexity drives similar review times.
- Review quality is likely higher as complexity is lower:
- Bugs are more likely to be detected.
- Code inconsistencies are more likely to be detected.
- Knowledge sharing is improved within the participants:
- Small portions can be assimilated better.
- Better engineering practices are exercised:
- Solving big problems by dividing them into well-contained, smaller problems.
- Exercising separation of concerns within the code changes.
What can I do to optimize my changes
- Use the PullRequestQuantifier to quantify your PR accurately
- Create a context profile for your repo using the context generator
- Exclude files that are not necessary to be reviewed or do not increase the review complexity. Example: autogenerated code, docs, project IDE setting files, binaries, etc. Check out the `Excluded` section from your `prquantifier.yaml` context profile.
- Understand your typical change complexity, and drive towards the desired complexity by adjusting the label mapping in your `prquantifier.yaml` context profile.
- Only use the labels that matter to you; see the context specification to customize your `prquantifier.yaml` context profile.
- Change your engineering behaviors
- For PRs that fall outside of the desired spectrum, review the details and check if:
- Your PR could be split in smaller, self-contained PRs instead
- Your PR only solves one particular issue. (For example, don't refactor and code new features in the same PR).
How to interpret the change counts in git diff output
- One line was added: `+1 -0`
- One line was deleted: `+0 -1`
- One line was modified: `+1 -1` (git diff doesn't know about modifications; it interprets that line as one addition plus one deletion)
- Change percentiles: change characteristics (addition, deletion, modification) of this PR in relation to all other PRs within the repository.
:tada: v7.4.0-preview.1 has been released, which incorporates this pull request. :tada: