Conversation
Art4
left a comment
There was a problem hiding this comment.
Internally $status_code is always considered as int. Changing it to int|null we would need more work to be able to react internally to this new case.
What is the benefit to the user of being able to explicitly use $simplepie->status_code() === null instead of $simplepie->status_code() === 0?
library/SimplePie.php
Outdated
| * @access private | ||
| */ | ||
| public $status_code = 0; | ||
| public $status_code = null; |
There was a problem hiding this comment.
Access to this property is documented as private, so calling $simplepie->status_code directly is not allowed (but not enforce because of PHP 5.6 compatibility). One has to use $simplepie->status_code() instead, so this changes has to be reflected there too.
There was a problem hiding this comment.
Also allowing $status_code to be null has side effects on the error message in
simplepie/library/SimplePie.php
Line 1765 in b59060e
the status code is "$copyStatusCode" with integer would be cast to the status code is "0", but if we allow also null this message would be cast to the status code is "". I don't find that helpful.
There was a problem hiding this comment.
Thanks, fixed. There were actually few more corner cases:
curl_getinfo($fp, CURLINFO_HTTP_CODE)can return0, but only when the request fails with an error such asCURLE_COULDNT_RESOLVE_HOST, in which case the constructor will fail. I added code that will turn this case intonull.- If it succeeds the status code will be taken from HTTP parser, which can actually return
0when web server decides to respond with it for some reason. I added a code that will fail for0status code (though other invalid HTTP response codes may still appear successful). - Successful local requests will have
$status_code === null.
b59060e to
2f0b8cf
Compare
The benefit is mainly being able to detect the unhandled case using static analysis more easily (the Resolved conflict using git format-patch -1
git reset --hard HEAD^
sed -i 's#library/SimplePie/#src/#g;s#library/#src/#g' *.patch
git rebase master
git am -3 *.patch |
|
I don't mind which way we go here, but would like to make sure the requirement fixed by @Alkarex is still met. |
2f0b8cf to
56f7305
Compare
This is to distinguish a status_code being set on the type level, making it harder to forget the case – static analysis would notify you. To make it more consistent, we also change the few cases that could make `status_code` contain a `0` to replace it with `null`: - `CURLINFO_HTTP_CODE` can contain `0`, but only when the request fails with an error such as `CURLE_COULDNT_RESOLVE_HOST`, in which case the File constructor will fail. - HTTP parser, which could return `0` when a web server decides to return it as a status code for some reason. I added a code that will fail for `0` status code (though other invalid HTTP response codes may still appear successful). - Successful local requests were already using `null` correctly. While at it, simplify few conditionals using de Morgan’s laws for easier reasoning. This is technically a BC break but so is simplepie#728 and `null` behaves just like `0` from the perspective of non-strict comparison so I would not expect issues in legacy applications.
56f7305 to
a9a27ab
Compare
That PR would indeed be a breaking change, and would require changing the contracts in the existing software using SimplePie. That was the reason for my fix, prior to which we had some warnings in my downstream use-case FreshRSS/FreshRSS#4299 (the status code is passed to a function expecting an integer to log the HTTP error). See also my response #728 (comment) I believe the most classic way to work with a status code is to test it against some ranges with a mathematical integer operation: for instance to check whether it is in the 2xx range. This might even be delegated to a function expecting an integer as input. A status code might also be saved to an (integer) variable for later processing or reference. 0 is a perfectly fine value that is commonly used as the default integer value and does not cause problem, to the contrary of any other non-integer type. 0 is actually explicitly documented in some specifications, e.g. the JavaScript Fetch API
In strongly typed languages such as Java or C#, 0 is also the default value of enums used for status codes. I am not aware of any other implementation, in whichever language, that is not sticking to integer (or enums implemented as integer) for an HTTP status code. P.S. cURL also returns zero for the HTTP status code when no response is received. |
|
Yes I agree @Alkarex, my suggestion was to propose a change that addressed both issues. I'm not sure that's possible? |
That is precisely the issue. In proper type-driven design, we want the invalid value to cause an issue as soon as possible to be able to detect it. Ideally, even making the illegal states unrepresentable.
C-family languages tend to not be fully strongly typed. Though, to be precise, this is more of a matter of structural vs. nominal typing. While C-family languages mostly use nominal typing, enums may be one exception – when they are integers internally but the type system does keep track which enum they belong to, they become fungible and you lose type safety. Actually, looking at Java, its type system does seem to keep this information – the values are not even comparable. C# appears to be in the same boat. Dart’s type system appears to be slightly weaker, as it allows the latter. And when PHP introduced proper enums, they also thankfully decided to not make enums follow C – even the backed enums are object instances, not
Java does indeed use int but the API is ancient so we cannot really expect best practices to be used. C#, on the other hand, uses enums for HTTP status code. Languages that have well-thought out type systems, like ML-style languages do, also not suffer from this – their enums (really tagged unions) tend to not expose their tags. Though, looking at Rust, the
Sure, but that is just a hack because the language’s weak and feature-poor type system offers little in the way of alternatives. The HTTP Semantics RFC 7231 explicitly says that “the status-code element is a three-digit integer code” so 0 is definitely not allowed.
I am aware. That is also a limitation of C, and I am turning it into |
For me, 0 is a valid value, which I do not want to cause any issue, and I am not more interested in detecting it than many of the other error codes for instance in the 4xx or 5xx range. It just so happens to indicate an error on the client side as opposed to the better documented errors at server or proxy levels.
A non-nullable enum defaulting to zero, as far as I can see.
JavaScript has both 0 and null and undefined, so undefined or null could have been alternatives. My point is that can I cannot find any other implementation that is not sticking to a non-nullable integer for an HTTP status code. |
Then my point about C-family languages not really being type safe holds. Yuck 🤮
Right, I think
Yeah, turns out existing languages tend walk past type-driven design (be it because their APIs were designed before that design principle entered public consciousness, their type systems are insufficiently expressive, they trade correctness for “simplicity”, or they just do not care). Out of the languages I checked (those mentioned above, plus Dart, which uses At this point, I spent more time on trying to clean up the Augean stables of SimplePie than I wanted to, and realized that the whole thing will need more principled approach. For example, we should replace the HTTP code with PSR-18 or something. Furthermore, there should be no need for SimplePie to expose internal details like HTTP status code. Given all that, I have decided to table this for now. |
Refs #520 |
This is to distinguish a status_code being set on the type level, making it harder to forget the case – static analysis would notify you.
This is technically a BC break but so is #728 and
nullbehaves just like0from the perspective of non-strict comparison so I would not expect issues in legacy applications.