Skip to content

Simplify TokenCollection#25

Merged
theseer merged 3 commits intotheseer:masterfrom
staabm:coll
Nov 6, 2025
Merged

Simplify TokenCollection#25
theseer merged 3 commits intotheseer:masterfrom
staabm:coll

Conversation

@staabm
Copy link
Contributor

@staabm staabm commented Nov 5, 2025

processing phpunit code coverage data ...

before this PR:

➜  slow-coverage-xml1 git:(main) ✗ hyperfine 'php coverage-xml.php'
Benchmark 1: php coverage-xml.php
  Time (mean ± σ):      9.792 s ±  0.048 s    [User: 8.407 s, System: 1.335 s]
  Range (min … max):    9.723 s …  9.849 s    10 runs

after this PR:

➜  slow-coverage-xml1 git:(main) ✗ hyperfine 'php coverage-xml.php'
Benchmark 1: php coverage-xml.php
  Time (mean ± σ):      9.565 s ±  0.066 s    [User: 8.187 s, System: 1.331 s]
  Range (min … max):    9.437 s …  9.655 s    10 runs

idea is to use a php-src builtin ArrayIterator instead of implementing the Iterator interface manually, for which the engine needs to constantly switch between calling native php-src code and userland functions


before this PR:

grafik

<?php declare(strict_types = 1);
namespace TheSeer\Tokenizer;

class TokenCollection implements \ArrayAccess, \Iterator, \Countable {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a BC break, so might only be accetable with a new major version

@theseer
Copy link
Owner

theseer commented Nov 6, 2025

I see the performance increase, but I feel pains with that change ;)

I'm not happy with having to have had to implement the Iterator myself but I hate (with a passion! ;-) ) the loss of type information when using ArrayIterator. The whole point of using a collection is to have clear types for in and out. With your change - again, I do understand it and the reasoning - we only have 50% of the type information.

Is there a work around for that?

@theseer
Copy link
Owner

theseer commented Nov 6, 2025

Side nodes: Your getIterator method does not have a return type and the formatting is wrong according to my code style.. :-p

@staabm
Copy link
Contributor Author

staabm commented Nov 6, 2025

Is there a work around for that?

sure, I have added necessary static analysis type annotations

@staabm
Copy link
Contributor Author

staabm commented Nov 6, 2025

Side nodes: Your getIterator method does not have a return type and the formatting is wrong according to my code style

fixed, thanks.

@staabm
Copy link
Contributor Author

staabm commented Nov 6, 2025

if you are fine with it, it would be great we could merge #24 before this one here (as we will get conflicts and this one here will be easier to resolve)

@staabm
Copy link
Contributor Author

staabm commented Nov 6, 2025

I not yet verified whether this changes break PHPUnit usescases.

I think we should add a compatibility check with PHPUnit versions in CI

@theseer
Copy link
Owner

theseer commented Nov 6, 2025

Is there a work around for that?

sure, I have added necessary static analysis type annotations

Thanks!

It still feels wrong to have a generic return type that basically has zero information on what is the Iterator going to iterate over on the language level. This is not what I understand under strictly typed ;)

But I guess with the annotation it is good enough to be considered a pragmatic solution.

@theseer
Copy link
Owner

theseer commented Nov 6, 2025

I'm pondering about the change in the XMLSerializer now: Your change, while I did not verify that, seem to be making it less efficient by performing a null check on every loop while my original implementation did not require this.

If we'd keep the ArrayAccess on the collection, this change would not be required. It would also remove the need to make $previousToken nullable.

What do you think?

@theseer
Copy link
Owner

theseer commented Nov 6, 2025

I'm pondering about the change in the XMLSerializer now: Your change, while I did not verify that, seem to be making it less efficient by performing a null check on every loop while my original implementation did not require this.

If we'd keep the ArrayAccess on the collection, this change would not be required. It would also remove the need to make $previousToken nullable.

What do you think?

We could actually also just use $iterator->current(), couldn't we? Instead of doing this in the loop...

@theseer
Copy link
Owner

theseer commented Nov 6, 2025

I'll merge this with my suggested change.

@theseer theseer merged commit a5108c1 into theseer:master Nov 6, 2025
10 checks passed
@staabm staabm deleted the coll branch November 6, 2025 21:11
@theseer
Copy link
Owner

theseer commented Nov 6, 2025

There are quite some side effects that I'm cleaning up now:

  • There are a lot of failing tests that used ArrayAccess. So this is certainly a BC break.
  • Removing ArrayAccess removed a full set of features, even though I'm not sure being able to modify the collection by offest was actually useful ;)
  • After the change, the CollectionException is superfluous.

@theseer
Copy link
Owner

theseer commented Nov 6, 2025

Also all my Tokenizer tests fail as I used a serialized collection to ensure it's not broken.

@theseer
Copy link
Owner

theseer commented Nov 6, 2025

Done.

@staabm
Copy link
Contributor Author

staabm commented Nov 6, 2025

Thanks for cleaning it up. Why did CI not fail on said tests?

@theseer
Copy link
Owner

theseer commented Nov 6, 2025

https://github.com/theseer/tokenizer/actions/runs/19149696040/job/54736453289

The tests fail - but github doesn't notice that as a problem. Interesting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants