Add an RFC describing SV boolean semantics by leonerd · Pull Request #3 · Perl/PPCs

leonerd · 2021-08-06T18:18:21Z

While this is still in progress, its ID number remains "TODO"; I will set it to the next free number just before merge.

iabyn · 2021-08-07T10:46:51Z

I think this is a terrible idea! We seem to be suddenly obsessed with ignoring perl's basic operating principle (that data is fluid and typeless - type interpretation being imposed by ops operating on the data) for the sake of the one use case of round trips for JSON and similar serialisers.

Such a boolean flag adds extra overhead to lots of hot places in the perl core - for example pp_add(), which currently has to handle all permutations of two args being (int/num/string/other), would have to cope with (bool/int/num/string/other) x 2.

leonerd · 2021-08-07T14:41:05Z

> I think this is a terrible idea! We seem to be suddenly obsessed with ignoring perl's basic operating principle (that data is fluid and typeless - type interpretation being imposed by ops operating on the data) for the sake of the one use case of round trips for JSON and similar serialisers.

This doesn't change the semantics of any existing operations: all values ("boolean-intent" or not) can still be tested for truth, can still be numified or stringified. Nothing at all changes there.

The only thing that is new is that now there is an additional question that can be asked ("is this value boolean-intent?"), a question which answers "yes" to the result of any boolean predicate test operator, or any SV transitively initialised from such.

> Such a boolean flag adds extra overhead to lots of hot places in the perl core - for example pp_add(), which currently has to handle all permutations of two args being (int/num/string/other), would have to cope with (bool/int/num/string/other) x 2

Indeed. For that and other reasons, I ended up not implementing it as a flag. Instead, in my branch I have a small optimisation in sv_setsv_flags which notices the special PV values stored by PL_sv_yes (the char array PL_Yes) and PL_sv_no (the char array PL_No) and copies those pointers, with SvLEN set to zero, into the destination. This has two upshots:

We save memory and CPU time by not needing to make string copies of boolean values: e.g.

my @arr;
push @arr, 1 == 1 for 1 .. 100000;

will now consume less memory and take less CPU time to run.

A pointer comparison on SvPVX(sv) is now sufficient to implement the SvISBOOL() test, which I have added as a function in Scalar::Util.

As far as I can tell, this optimisation is already beneficial in terms of CPU and memory savings, even if we ignore the new ability we gain by having SvISBOOL().

You can see my current progress at

https://github.com/leonerd/perl5/tree/stable-bool
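As a rough illustration of the pointer-sharing idea described in this comment, here is a toy model in C. It is not the code from the branch: `ToySV`, `toy_yes`, `toy_no`, `toy_copy` and `toy_is_bool` are all made-up names, and the `alloc == 0` convention only loosely mirrors the "SvLEN set to zero" trick. The sketch only shows why sharing the buffer pointer on copy turns the is-boolean test into a pointer comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the shared buffers PL_Yes and PL_No. */
static const char toy_yes[] = "1";
static const char toy_no[]  = "";

/* Toy scalar. alloc == 0 means "this buffer is shared, not owned". */
typedef struct {
    const char *pv;
    size_t      alloc;
} ToySV;

/* Boolean test by pointer identity, not by string contents. */
int toy_is_bool(const ToySV *sv) {
    return sv->pv == toy_yes || sv->pv == toy_no;
}

/* Drop any owned buffer and reset the scalar to empty. */
void toy_release(ToySV *sv) {
    if (sv->alloc) free((void *)sv->pv);
    sv->pv = NULL;
    sv->alloc = 0;
}

/* Store a private copy of an ordinary string. */
void toy_set_string(ToySV *sv, const char *s) {
    size_t n = strlen(s) + 1;
    char *buf = malloc(n);
    if (!buf) abort();
    memcpy(buf, s, n);      /* copy first, in case s aliases sv->pv */
    toy_release(sv);
    sv->pv = buf;
    sv->alloc = n;
}

/* Copy src into dst: share the pointer for booleans, duplicate otherwise. */
void toy_copy(ToySV *dst, const ToySV *src) {
    assert(dst && src && src->pv);
    if (dst == src) return;
    if (toy_is_bool(src)) {
        toy_release(dst);
        dst->pv = src->pv;  /* no allocation, no string copy */
        dst->alloc = 0;
    } else {
        toy_set_string(dst, src->pv);
    }
}
```

In this model a scalar that merely contains the string "1" is not a boolean, because its buffer is a private allocation rather than the shared `toy_yes` array; only copies that trace back to the shared buffers pass `toy_is_bool`.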

leonerd · 2021-08-07T14:56:04Z

On a broader note:

> I think this is a terrible idea! We seem to be suddenly obsessed

I don't know about "suddenly" there. It's been the case for a long time that Perl users have cared about things like boolean roundtrips in serialisers. The entire JSON module ecosystem (consisting of multiple modules, by multiple authors) invented a special-case hack for this problem - see the entire context and history around https://metacpan.org/pod/JSON::PP::Boolean

This suggested change is part of a broader set of changes (e.g. Nicholas Clark's PV vs IV flags adjustments) to add what I am desperately trying not to call a "type system" to Perl. "Type system" usually implies that the computer (either the compiler or the interpreter) will forbid certain operations that it doesn't think match up. That isn't what we're doing here. What we're adding is more of what I am trying to call "intention tracking": given a scalar value, can we know what the programmer intended it to mean? Sure, any value can be tested for boolean truth, for numerical value, or for string value, but what did the programmer really have in mind as "the" canonical shape of this data?

Looking at the wider picture, comparing a few other languages, Perl is starting to look somewhat lacking here:

$ python3 -c 'import json; print(json.dumps([1, "1", 1 == 1]))'
[1, "1", true]

$ nodejs -e 'console.log(JSON.stringify([1, "1", 1 == 1]))'
[1,"1",true]

$ perl -MJSON::MaybeUTF8=encode_json_utf8 -E 'say encode_json_utf8([1, "1", 1 == 1])'
[1,"1",1]

I'm not saying this one change alone is sufficient, but this is one small facet of a much larger issue. Without fixing the larger issue, it becomes harder for Perl to stand alongside these other languages in modern commercial settings. Interoperability of data is an important concern these days - gone is the time of little standalone awk-like scripts running on the local developer's machine.

There is no getting around it - we need abilities like this for Perl to remain relevant to the modern world.
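To make the serialiser comparison above concrete, here is a small sketch in C of an encoder that consults a value's intent. Everything in it is hypothetical: `ToyValue`, `ToyIntent`, `toy_encode_scalar` and `toy_encode_array` are illustrative names, the intent is stored as a plain enum rather than however perl would track it, and string escaping is omitted. It only shows that once "boolean intent" can be asked about, emitting `true` instead of `1` is a single extra branch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { TOY_INT, TOY_STR, TOY_BOOL } ToyIntent;

typedef struct {
    ToyIntent   intent;
    long        iv;   /* used for TOY_INT and TOY_BOOL */
    const char *pv;   /* used for TOY_STR */
} ToyValue;

/* Encode one scalar; returns what snprintf returns, or -1 on bad intent. */
int toy_encode_scalar(const ToyValue *v, char *out, size_t cap) {
    switch (v->intent) {
    case TOY_BOOL: return snprintf(out, cap, "%s", v->iv ? "true" : "false");
    case TOY_INT:  return snprintf(out, cap, "%ld", v->iv);
    case TOY_STR:  return snprintf(out, cap, "\"%s\"", v->pv); /* no escaping: sketch only */
    }
    return -1;
}

/* Encode a flat array; returns the length written, or -1 if out is too small. */
int toy_encode_array(const ToyValue *items, size_t n, char *out, size_t cap) {
    size_t used = 0;
    assert(items && out);
    if (cap < 3) return -1;
    out[used++] = '[';
    for (size_t i = 0; i < n; i++) {
        if (i > 0) {
            if (used + 1 >= cap) return -1;
            out[used++] = ',';
        }
        int w = toy_encode_scalar(&items[i], out + used, cap - used);
        if (w < 0 || (size_t)w >= cap - used) return -1;  /* error or truncated */
        used += (size_t)w;
    }
    if (used + 2 > cap) return -1;
    out[used++] = ']';
    out[used] = '\0';
    return (int)used;
}
```

Encoding the three values from the one-liners above (an integer 1, a string "1", and a boolean true) with this sketch yields `[1,"1",true]`, matching the Python and Node.js output rather than the Perl one.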

kraih · 2021-08-07T15:14:14Z

> On a broader note:
>
> > I think this is a terrible idea! We seem to be suddenly obsessed
>
> I don't know about "suddenly" there. It's been the case for a long time that Perl users have cared about things like boolean roundtrips in serialisers.

There's really nothing sudden about it. There are more cases where the community has been begging for slightly "stricter" types for many, many years. The old scalar value problem of characters vs bytes when dealing with I/O comes to mind. There's a reason utf8::is_utf8 is misused so much in CPAN modules. A tiny bit of additional type information goes a long way towards improving the user experience.

marcusramberg · 2021-08-09T15:05:05Z

Given that this doesn't have a negative performance impact, it seems like a great improvement for allowing Perl to inter-op more easily with modern devops stuff, which almost always involves JSON or YAML wrangling.

iabyn · 2021-08-09T18:36:40Z

On Sat, Aug 07, 2021 at 07:41:16AM -0700, Paul Evans wrote:

> As far as I can tell, this optimisation is already beneficial in terms of CPU and memory savings, even if we ignore the new ability we gain by having `SvISBOOL()`
>
> You can see my current progress at https://github.com/leonerd/perl5/tree/stable-bool

The trouble with this is that it is making a shared read-only string buffer effectively writeable. This SEGVs on your branch: my $x = 1; my $y = ($x == 1); $y =~ s/1/0/;

…

-- In England there is a special word which means the last sunshine of the summer. That word is "spring".

leonerd · 2021-08-09T21:19:35Z

> The trouble with this is that it is making a shared read-only string buffer effectively writeable. This SEGVs on your branch

Ah yes, indeed. :/ I wonder if setting the COW flag will solve this. Possibly, though it suggests there might be a possibility for other similar troubles.

In any case while I think it over I'll add some tests and a fix for this one at least.
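For readers following the SEGV report, the general shape of the problem and one generic remedy can be sketched in C. This is a toy model under stated assumptions, not the fix that went into the branch: `ToySV`, `toy_true`, `toy_make_writable` and `toy_subst_char` are invented names, and the "unshare before writing" step is ordinary copy-before-write rather than perl's actual COW machinery. The point is only that any in-place modification has to take a private copy first when the buffer is one of the shared read-only ones.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the shared, read-only PL_Yes buffer. */
static const char toy_yes[] = "1";

/* Toy scalar. alloc == 0 means the buffer is shared and must not be written. */
typedef struct {
    char  *pv;
    size_t alloc;
} ToySV;

/* Build a scalar that points at the shared "true" buffer. */
ToySV toy_true(void) {
    ToySV sv = { (char *)toy_yes, 0 };
    return sv;
}

/* Take a private, writable copy if the buffer is currently shared. */
void toy_make_writable(ToySV *sv) {
    assert(sv && sv->pv);
    if (sv->alloc != 0) return;          /* already owns its buffer */
    size_t n = strlen(sv->pv) + 1;
    char *buf = malloc(n);
    if (!buf) abort();
    memcpy(buf, sv->pv, n);
    sv->pv = buf;
    sv->alloc = n;
}

/* Replace the first occurrence of `from` with `to`; returns 1 if replaced. */
int toy_subst_char(ToySV *sv, char from, char to) {
    if (from == '\0') return 0;
    toy_make_writable(sv);               /* never write through a shared pointer */
    char *hit = strchr(sv->pv, from);
    if (!hit) return 0;
    *hit = to;
    return 1;
}
```

Without the `toy_make_writable` call, `toy_subst_char` would write into the `const` array, which is the toy equivalent of the crash reported above. With it, the substitution changes only the private copy, and the scalar stops pointing at the shared buffer.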

leonerd · 2021-08-09T23:18:05Z

OK, I think I have made some progress there. I've also made a (draft) PR for the branch, so discussions on the implementation can be had over there: Perl/perl5#19040

That leaves this thread free for the abstract intent of the idea, aside from the implementation.

leonerd · 2021-08-18T14:42:13Z

Implementation-wise, the code looks happy and solid, and so far seems to be waiting on this RFC process to continue before it gets merged.

How do we resolve this stalemate? Can we "accept" the RFC? Or failing that, do we just merge the PR that implements it into perl core and call it done?

tonycoz · 2021-08-25T00:58:24Z

rfcs/rfcTODO.md

+
+The core immortals `PL_sv_yes` and `PL_sv_no` will always respond true to `SvBOOLOK()`, and such a flag will be reliably copied by `sv_setsv()` and friends. The result here is that the result of boolean-returning ops will be `SvBOOLOK()` and this flag will remain with any copies of that value that get made - either over the arguments stack or stored in lexicals or elements of aggregate structures.
+
+These `SvBOOLOK()` values will still be subject to the usual semantics regarding macros like `SvPV` or `SvIV` - so numerically they will still be 1 or 0, and stringily they will still be "1" and "" (though see the Future Scope section below).


You might want to clarify here to cover @iabyn's comment, that the POK, IOK etc flags on the SV won't be changing, so code like pp_add doesn't need any changes.
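A toy illustration of that point, in C: if boolean values keep the same "integer OK" and "string OK" flags they always had, an add routine that dispatches on those flags needs no extra case. The names here (`ToySV`, `TOY_IOK`, `TOY_POK`, `toy_num`, `toy_add`, `toy_is_bool`) are hypothetical and only loosely modelled on the real flags; this is a sketch of the argument, not of pp_add.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy flags, loosely modelled on IOK ("integer value valid") and POK ("string valid"). */
enum { TOY_IOK = 1, TOY_POK = 2 };

/* Hypothetical stand-ins for the shared boolean buffers. */
static const char toy_yes[] = "1";
static const char toy_no[]  = "";

typedef struct {
    unsigned    flags;
    long        iv;
    const char *pv;
} ToySV;

/* The new question is answered separately from the flags. */
int toy_is_bool(const ToySV *sv) {
    return sv->pv == toy_yes || sv->pv == toy_no;
}

/* Numify using only the existing flags; there is no boolean branch here. */
long toy_num(const ToySV *sv) {
    assert(sv->flags & (TOY_IOK | TOY_POK));
    if (sv->flags & TOY_IOK) return sv->iv;
    return strtol(sv->pv, NULL, 10);
}

/* Addition sees booleans as ordinary values with IOK set. */
long toy_add(const ToySV *a, const ToySV *b) {
    return toy_num(a) + toy_num(b);
}
```

In this model `toy_add` is identical to what it would be without booleans; only code that explicitly calls `toy_is_bool` ever notices the difference.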

tonycoz · 2021-08-25T01:02:55Z

rfcs/rfcTODO.md

+
+Obviously such a solution is specific to JSON encoding and does not apply to, for example, message gateway between JSON and MsgPack, which would require some translation inbetween. A true in-core solution to this problem would have many benefits to interoperability of data handling between these various modules.
+
+((TODO: Add some comments about purely in-core handling as well that don't rely on serialisation))


Should this TODO be updated? (or removed)

duncand · 2021-09-04T18:10:43Z

With regard to the detail you are less sure about, the true/false keywords or is_boolean, I believe it would be best if they were all provided in the same place, considering that logically true and false are just nullary functions and is_boolean is a unary function.

If they will live with Scalar::Util then they should all be there, and one can use Scalar::Util 'false'; just as much as they can use Scalar::Util 'is_boolean';.

I believe having a use feature boolean would be an unnecessary dedicated feature, but if you did have it then it should provide the is_boolean along with the false and true.

But I believe the best solution to this will also be informed by the separate discussion going on in p5p/etc about having the new generic std/builtin/etc namespace, and if that is implemented then it is where all 3/etc of these should go, and otherwise if it isn't I would say put them all in Scalar::Util.

haarg · 2021-09-04T18:22:33Z

It seems reasonable to put true and false constants in the std:: namespace, and leave those out until that namespace is established. Once it exists, it's probably reasonable to bring in a few different functions from Scalar::Util.

Until then, I think an isbool function on its own in Scalar::Util is fine.

leonerd · 2021-09-04T19:11:08Z

@duncand Yes you're right about the std namespacing points. Much of the original design on this RFC predates the std namespace discussion, it might need adjusting in light of the outcomes from that.

kraih · 2021-09-04T20:05:15Z

If using true and false required an import, I would always use !!1 and !!0 instead.

duncand · 2021-09-04T20:26:18Z

@kraih That's fair. What I was specifically advocating was that we do NOT have use feature boolean; just to get true/false. If we have use feature boolean then it should ALSO provide the isbool() etc.

duncand · 2021-09-04T20:38:19Z

Actually I would argue that false/true specifically would be strong candidates for appearing automatically when one says use 5.36; etc. They have a much stronger case to being available with the minimum possible boilerplate of any kind. So better than use feature boolean or use Scalar::Util or std::false/std::true, although come to think of it std::false etc probably actually isn't that bad for being unambiguous.

Grinnz · 2021-09-04T22:01:25Z

Adding it as a feature is a prerequisite for "appearing automatically when one says use v5.36;". Unless you want to propose an additional mechanism by which that would enable keywords, which I'm sure will be part of the function namespace discussion.

leonerd force-pushed the rfc-sv-bool-type branch from d0a1d4e to 075fde9 Compare August 7, 2021 15:14

tonycoz reviewed Aug 25, 2021

haarg mentioned this pull request Sep 4, 2021

Stable tracking of "boolean intent" across SVs Perl/perl5#19040

Merged

leonerd force-pushed the rfc-sv-bool-type branch from 075fde9 to feea6f9 Compare October 16, 2021 14:47

Add RFC 0008: Stable SV Boolean Type

e6ecd22

leonerd force-pushed the rfc-sv-bool-type branch from feea6f9 to e6ecd22 Compare October 16, 2021 14:57

leonerd merged commit 0d61e8f into Perl:master Oct 16, 2021

