Skip to content

Make status_code nullable#732

Closed
jtojnar wants to merge 1 commit intosimplepie:masterfrom
jtojnar:status-code-nullable
Closed

Make status_code nullable#732
jtojnar wants to merge 1 commit intosimplepie:masterfrom
jtojnar:status-code-nullable

Conversation

@jtojnar
Copy link
Member

@jtojnar jtojnar commented May 18, 2022

This is to distinguish a status_code being set on the type level, making it harder to forget the case – static analysis would notify you.

This is technically a BC break but so is #728 and null behaves just like 0 from the perspective of non-strict comparison so I would not expect issues in legacy applications.

Copy link
Contributor

@Art4 Art4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally $status_code is always considered as int. Changing it to int|null we would need more work to be able to react internally to this new case.

What is the benefit to the user of being able to explicitly use $simplepie->status_code() === null instead of $simplepie->status_code() === 0?

* @access private
*/
public $status_code = 0;
public $status_code = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Access to this property is documented as private, so calling $simplepie->status_code directly is not allowed (but not enforce because of PHP 5.6 compatibility). One has to use $simplepie->status_code() instead, so this changes has to be reflected there too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also allowing $status_code to be null has side effects on the error message in

$this->error = "A feed could not be found at `$this->feed_url`; the status code is `$copyStatusCode` and content-type is `$copyContentType`";

the status code is "$copyStatusCode" with integer would be cast to the status code is "0", but if we allow also null this message would be cast to the status code is "". I don't find that helpful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed. There were actually few more corner cases:

  • curl_getinfo($fp, CURLINFO_HTTP_CODE) can return 0, but only when the request fails with an error such as CURLE_COULDNT_RESOLVE_HOST, in which case the constructor will fail. I added code that will turn this case into null.
  • If it succeeds the status code will be taken from HTTP parser, which can actually return 0 when web server decides to respond with it for some reason. I added a code that will fail for 0 status code (though other invalid HTTP response codes may still appear successful).
  • Successful local requests will have $status_code === null.

@jtojnar jtojnar force-pushed the status-code-nullable branch from b59060e to 2f0b8cf Compare May 18, 2022 08:30
@jtojnar
Copy link
Member Author

jtojnar commented May 18, 2022

What is the benefit to the user of being able to explicitly use $simplepie->status_code() === null instead of $simplepie->status_code() === 0?

The benefit is mainly being able to detect the unhandled case using static analysis more easily (the nullability is trivially expressible in the type system, compared to a non-zero pre-condition).


Resolved conflict using

git format-patch -1
git reset --hard HEAD^
sed -i 's#library/SimplePie/#src/#g;s#library/#src/#g' *.patch
git rebase master
git am -3 *.patch

@mblaney
Copy link
Member

mblaney commented May 19, 2022

I don't mind which way we go here, but would like to make sure the requirement fixed by @Alkarex is still met.

@jtojnar jtojnar force-pushed the status-code-nullable branch from 2f0b8cf to 56f7305 Compare May 19, 2022 06:52
@pull-request-size pull-request-size bot added size/M and removed size/S labels May 19, 2022
This is to distinguish a status_code being set on the type level,
making it harder to forget the case – static analysis would notify you.

To make it more consistent, we also change the few cases that could make
`status_code` contain a `0` to replace it with `null`:

 - `CURLINFO_HTTP_CODE` can contain `0`, but only when the request fails
   with an error such as `CURLE_COULDNT_RESOLVE_HOST`, in which case
   the File constructor will fail.
 - HTTP parser, which could return `0` when a web server decides to
   return it as a status code for some reason.
   I added a code that will fail for `0` status code
   (though other invalid HTTP response codes may still appear successful).
 - Successful local requests were already using `null` correctly.

While at it, simplify few conditionals using de Morgan’s laws
for easier reasoning.

This is technically a BC break but so is simplepie#728
and `null` behaves just like `0` from the perspective of non-strict comparison
so I would not expect issues in legacy applications.
@jtojnar jtojnar force-pushed the status-code-nullable branch from 56f7305 to a9a27ab Compare May 19, 2022 06:53
@Alkarex
Copy link
Contributor

Alkarex commented May 19, 2022

I don't mind which way we go here, but would like to make sure the requirement fixed by @Alkarex is still met.

That PR would indeed be a breaking change, and would require changing the contracts in the existing software using SimplePie. That was the reason for my fix, prior to which we had some warnings in my downstream use-case FreshRSS/FreshRSS#4299 (the status code is passed to a function expecting an integer to log the HTTP error).

See also my response #728 (comment)

I believe the most classic way to work with a status code is to test it against some ranges with a mathematical integer operation: for instance to check whether it is in the 2xx range. This might even be delegated to a function expecting an integer as input. A status code might also be saved to an (integer) variable for later processing or reference.

0 is a perfectly fine value that is commonly used as the default integer value and does not cause problem, to the contrary of any other non-integer type.

0 is actually explicitly documented in some specifications, e.g. the JavaScript Fetch API

A status is an integer in the range 0 to 999, inclusive.

An opaque filtered response is is a filtered response whose type is "opaque", URL list is the empty list, status is 0, status message is the empty byte sequence, header list is empty, and body is null.

In strongly typed languages such as Java or C#, 0 is also the default value of enums used for status codes.

I am not aware of any other implementation, in whichever language, that is not sticking to integer (or enums implemented as integer) for an HTTP status code.

P.S. cURL also returns zero for the HTTP status code when no response is received.

@mblaney
Copy link
Member

mblaney commented May 19, 2022

Yes I agree @Alkarex, my suggestion was to propose a change that addressed both issues. I'm not sure that's possible?

@jtojnar
Copy link
Member Author

jtojnar commented May 19, 2022

0 is a perfectly fine value that is commonly used as the default integer value and does not cause problem, to the contrary of any other non-integer type.

That is precisely the issue. In proper type-driven design, we want the invalid value to cause an issue as soon as possible to be able to detect it. Ideally, even making the illegal states unrepresentable.

In strongly typed languages such as Java or C#, 0 is also the default value of enums used for status codes.

C-family languages tend to not be fully strongly typed. Though, to be precise, this is more of a matter of structural vs. nominal typing. While C-family languages mostly use nominal typing, enums may be one exception – when they are integers internally but the type system does keep track which enum they belong to, they become fungible and you lose type safety.

Actually, looking at Java, its type system does seem to keep this information – the values are not even comparable. C# appears to be in the same boat. Dart’s type system appears to be slightly weaker, as it allows the latter.

And when PHP introduced proper enums, they also thankfully decided to not make enums follow C – even the backed enums are object instances, not ints.

I am not aware of any other implementation, in whichever language, that is not sticking to integer (or enums implemented as integer) for an HTTP status code.

Java does indeed use int but the API is ancient so we cannot really expect best practices to be used. C#, on the other hand, uses enums for HTTP status code.

Languages that have well-thought out type systems, like ML-style languages do, also not suffer from this – their enums (really tagged unions) tend to not expose their tags.

Though, looking at Rust, the http crate actually uses a struct (≈ class), probably to make it extensible. And Haskell’s HTTP library uses a tuple of three ints – so perhaps not the best examples.

0 is actually explicitly documented in some specifications, e.g. the JavaScript Fetch API

Sure, but that is just a hack because the language’s weak and feature-poor type system offers little in the way of alternatives.

The HTTP Semantics RFC 7231 explicitly says that “the status-code element is a three-digit integer code” so 0 is definitely not allowed.

P.S. cURL also returns zero for the HTTP status code when no response is received.

I am aware. That is also a limitation of C, and I am turning it into null in this PR.

@Alkarex
Copy link
Contributor

Alkarex commented May 19, 2022

Yes I agree @Alkarex, my suggestion was to propose a change that addressed both issues. I'm not sure that's possible?

@mblaney #728 uses 0 for default/undefined status code, and this PR uses null, so it is a choice

@Alkarex
Copy link
Contributor

Alkarex commented May 19, 2022

0 is a perfectly fine value that is commonly used as the default integer value and does not cause problem, to the contrary of any other non-integer type.

That is precisely the issue. In proper type-driven design, we want the invalid value to cause an issue as soon as possible to be able to detect it. Ideally, even making the illegal states unrepresentable.

For me, 0 is a valid value, which I do not want to cause any issue, and I am not more interested in detecting it than many of the other error codes for instance in the 4xx or 5xx range. It just so happens to indicate an error on the client side as opposed to the better documented errors at server or proxy levels.

C#, on the other hand, uses enums for HTTP status code.

A non-nullable enum defaulting to zero, as far as I can see.

0 is actually explicitly documented in some specifications, e.g. the JavaScript Fetch API

Sure, but that is just a hack because the language’s weak and feature-poor type system offers little in the way of alternatives.

JavaScript has both 0 and null and undefined, so undefined or null could have been alternatives.

My point is that can I cannot find any other implementation that is not sticking to a non-nullable integer for an HTTP status code.

@jtojnar
Copy link
Member Author

jtojnar commented May 19, 2022

C#, on the other hand, uses enums for HTTP status code.

A non-nullable enum defaulting to zero, as far as I can see.

Then my point about C-family languages not really being type safe holds. Yuck 🤮

Sure, but that is just a hack because the language’s weak and feature-poor type system offers little in the way of alternatives.

JavaScript has both 0 and null and undefined, so undefined or null could have been alternatives.

Right, I think null or undefined would make more sense for no response (and thus no status code) being returned.

My point is that can I cannot find any other implementation that is not sticking to a non-nullable integer for an HTTP status code.

Yeah, turns out existing languages tend walk past type-driven design (be it because their APIs were designed before that design principle entered public consciousness, their type systems are insufficiently expressive, they trade correctness for “simplicity”, or they just do not care). Out of the languages I checked (those mentioned above, plus Dart, which uses int like JS fetch and Java), only Rust does disallow construction of invalid status codes at the type level. I had some hopes for C# but as you demonstrated, it’s no good at typing.


At this point, I spent more time on trying to clean up the Augean stables of SimplePie than I wanted to, and realized that the whole thing will need more principled approach. For example, we should replace the HTTP code with PSR-18 or something. Furthermore, there should be no need for SimplePie to expose internal details like HTTP status code.

Given all that, I have decided to table this for now.

@jtojnar jtojnar closed this May 19, 2022
@Art4
Copy link
Contributor

Art4 commented May 20, 2022

For example, we should replace the HTTP code with PSR-18 or something.

Refs #520

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants