Skip to content

[#33571] Minify the html output#3413

Closed
dgrammatiko wants to merge 2 commits intojoomla:stagingfrom
dgrammatiko:staging
Closed

[#33571] Minify the html output#3413
dgrammatiko wants to merge 2 commits intojoomla:stagingfrom
dgrammatiko:staging

Conversation

@dgrammatiko
Copy link
Copy Markdown
Contributor

When gzip is enabled and debug is disabled, we can uglify the output. This only removes newlines between " > < ". It's not very aggressive on whitespace as that requires handling of css and javascript comments. But it reduces by a fair amount the size of the page.
http://joomlacode.org/gf/project/joomla/tracker/?action=TrackerItemEdit&tracker_item_id=33571

@dgrammatiko dgrammatiko changed the title Minify the html output [#33571] Minify the html output Apr 7, 2014
@dgrammatiko
Copy link
Copy Markdown
Contributor Author

Just improved the regex to remove more extra whitespace safely. Please re-test!

When gzip is enabled and debug is disabled, we can uglify the output. This only removes tabs and newlines (easy on cpu)

Signed-off-by: dimitris <dimitris@dimitriss-MacBook-Pro.local>
@piotr-cz
Copy link
Copy Markdown
Contributor

I'm against this because:

  • white space has it's functionality, for example when using preformatted text in code examples
  • you should not use RegEx on HTML
  • I like to indent the output code (and I'm ASCII art junkie)

@brianteeman
Copy link
Copy Markdown
Contributor

I like to indent the output code (and I'm ASCII art junkie)

Its optional to turn this on - and is not enabled by default

@piotr-cz
Copy link
Copy Markdown
Contributor

@brianteeman by switching on output compression and debug off?

I'd be interested in the payload gain by comparing the output length before and after this PR. I'd say there's won't be much difference, as gzip should take care of compressing concurrent white space without affecting uncompressed output, but that's just a feeling.

@dgrammatiko
Copy link
Copy Markdown
Contributor Author

@piotr-cz payload and reduction in kb is relative to the DOM (the more elements the more time is needed and the bigger the savings)... In a relative small DOM the gain is around 10% and delay is not noticeable.
Also Safari, Chrome & Firefox have the ability to display source or DOM tree...
The first point is valid and this might be a bug 👍
The second point is funny

@piotr-cz
Copy link
Copy Markdown
Contributor

What I mean by the payload reduction of compressed output:

// Before
echo strlen(gzcompress('whitespace          here')); // 24
// After
echo strlen(gzcompress('whitespace here')); // 23

So this would make sense rather for uncompressed output

// Before
echo strlen('whitespace           here'); // 25
// After
echo strlen('whitespace here'); // 15

I wouldn't worry about rendering time of whitespaces, I'm sure rendering engines have taken care of this.

@dgrammatiko
Copy link
Copy Markdown
Contributor Author

@piotr-cz I was referring to server side payload, but also the browser side is valid :)

@dgrammatiko
Copy link
Copy Markdown
Contributor Author

@piotr-cz Also my answer on <pre> was wrong as i tested it, you can see it here:
https://www.dropbox.com/s/mczw6ua2y304bgn/Screenshot%202014-04-15%2017.15.00.png

@piotr-cz
Copy link
Copy Markdown
Contributor

@DGT41 You are right, this removes white space characters between tags and in tags when there's only white space.
As for the second point, I'd take it seriously.

Still I'm interested in comparing size of compressed payload. If it's in bits, I think its not worth it, at least because of additional CPU load.

@dgrammatiko
Copy link
Copy Markdown
Contributor Author

On one random page before removing whitespace =27.59kb after 25.86 (roughly 7~10%)
before: https://www.dropbox.com/s/z327fu54kspf4us/Screenshot%202014-04-15%2017.47.18.png
after: https://www.dropbox.com/s/xtt4seyxd092qmo/Screenshot%202014-04-15%2017.46.37.png
On my local env the cpu increase is unnoticeable, and the extra time is something that I cannot figure out how to measure...
@piotr-cz You are welcome to conduct some testing (i am also curious... )

@piotr-cz
Copy link
Copy Markdown
Contributor

Thanks, can you compare differences in encoded content, as this functionality won't trigger unless it's enabled.

@dgrammatiko
Copy link
Copy Markdown
Contributor Author

@piotr-cz For the images I removed in the first case the whole block of code just to make sure it won't fire. So yes both images are with gzip enabled but in the first image the minification code is not there

@piotr-cz
Copy link
Copy Markdown
Contributor

My results are:
3556 -> 3385 = 5.17%
4825 -> 4595 = 5.00%
4906 -> 4665 = 5.17%

I must admit it's more than I expected, but still I'm not sure if ~300byte savings is worth it. I guess it comes to whether template developer prefers spaces over tabs (me).

Let's see what others think.

@asika32764
Copy link
Copy Markdown
Contributor

I agree with @piotr-cz. Tab is important for some situation when we need to read source html.

And I think the white spaces not affect the page size much since we have gzip.

@dgrammatiko
Copy link
Copy Markdown
Contributor Author

@asika32764 You still can read the source html in any modern browser, just do inspect element (on something) instead of view page source. You get the DOM as a tree, human readable.
Also something, at least, over 5% is definitely something...

@brianteeman
Copy link
Copy Markdown
Contributor

Or you just dont enable it

@piotr-cz
Copy link
Copy Markdown
Contributor

@brianteeman There's no option to switch uglifying html output on/ off in this pull request.

@asika32764
Copy link
Copy Markdown
Contributor

Yes, most of time we use dev tools to see DOM tree, but DOM is different from source HTML, something we be changed by browser and browser's plugins. It's dynamic, but source HTML is real response from server.

I'm not against HTML minify, but maybe we should have a new option to turn it on/off. Or just let 3rd plugins do it.

@brianteeman
Copy link
Copy Markdown
Contributor

yes there is you just dont enable gzip (which currently doesnt work for
some unknown reason anyway if you want to post a link to facebook)

On 15 April 2014 18:40, Piotr notifications@github.com wrote:

@brianteeman https://github.com/brianteeman There's no option to switch
uglifying html output on/ off in this pull request.

Reply to this email directly or view it on GitHubhttps://github.com//pull/3413#issuecomment-40511244
.

Brian Teeman
Co-founder Joomla! and OpenSourceMatters Inc.
http://brian.teeman.net/

@asika32764
Copy link
Copy Markdown
Contributor

I'm not sure but it seems \s will remove \n from HTML, maybe some js script line without ; will broken.

@dgrammatiko
Copy link
Copy Markdown
Contributor Author

@asika32764 It WON'T affect the code inside <script> or <style> tags or to be more accurate it won't do anything if there is any character other than space tab or return, give it a try

@piotr-cz
Copy link
Copy Markdown
Contributor

@asika32764 It should not affect javascript in the way you think it will: it's removing series of white space characters (spaces and tabs) between > and < characters: $data = preg_replace('~>\s+<~', '><', $data);.

There is no /s modifier, so preg_replace won't traverse newlines either.

I can't think of any scenario out of my head now where it could really break code execution.

@asika32764
Copy link
Copy Markdown
Contributor

@DGT41 Sorry, I see your code avoid the string in tag.

So I think we should discuss about adding an option to switch minify or not.

For me, I will need gzip sometimes but I also need HTML indent and tab. This option can help me keep source HTML readable in gzip.

Something more accurate
@dgrammatiko
Copy link
Copy Markdown
Contributor Author

I like to follow the motto "less is more" and since the observation of page source is either on development (debug is usually enabled) or in special occurrences (just disable gzip) I find another switch is overkill. But if you feel that this is essential, I can add one under Server (near gzip). Also mind that the switch will also add another line (one more variable) in configuration.php...

@nternetinspired
Copy link
Copy Markdown
Contributor

Minifying html can affect the rendered layout, unlike minimising css of js for example, so I'd be wary of implementing such a feature. I'd also have concerns about adding yet another process that will increase the server load and therefore browser response times.

Personally, I think the risks outweigh the benefits here. If someone wanted to uglify the html in their own template overrides, that's easy enough to do and contains such changes exactly where they should be, the template.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants