Skip to content

Add an RFC describing SV boolean semantics#3

Merged
leonerd merged 1 commit intoPerl:masterfrom
leonerd:rfc-sv-bool-type
Oct 16, 2021
Merged

Add an RFC describing SV boolean semantics#3
leonerd merged 1 commit intoPerl:masterfrom
leonerd:rfc-sv-bool-type

Conversation

@leonerd
Copy link
Contributor

@leonerd leonerd commented Aug 6, 2021

While this is still in progress its ID number remains "TODO"; I will set it to the next free number just before merge.

@iabyn
Copy link

iabyn commented Aug 7, 2021

I think this is terrible idea! We seem to be suddenly obsessed with ignoring perl's basic operating principle (that data is fluid and typeless - type interpretation being imposed by ops operating on the data) for the sake of the one use case of round trips for JSON and similar serialisers .

Such a boolean flag adds extra overhead to lots of hot places in the perl core - for example pp_add(), which currently has to handle all permutations of two args being (int/num/string/other), would have to cope with (bool/int/num/string/other) x 2

@leonerd
Copy link
Contributor Author

leonerd commented Aug 7, 2021

I think this is terrible idea! We seem to be suddenly obsessed with ignoring perl's basic operating principle (that data is fluid and typeless - type interpretation being imposed by ops operating on the data) for the sake of the one use case of round trips for JSON and similar serialisers .

This doesn't change the semantics of any existing operations: all values ("boolean-intent" or not) can still be tested for truth, can still be numified or stringified. Nothing at all changes there.

The only thing that is new is that now there is an additional question that can be asked ("is this value boolean-intent?"), a question which answers "yes" to the result of any boolean predicate test operator, or any SV transitively initialised from such.

Such a boolean flag adds extra overhead to lots of hot places in the perl core - for example pp_add(), which currently has to handle all permutations of two args being (int/num/string/other), would have to cope with (bool/int/num/string/other) x 2

Indeed - it turned out for that and other reasons I didn't implement it in a flag. Instead, in my branch I have a small optimisation in sv_setsv_flags which notices the special PV values stored by PL_sv_yes (the char array PL_Yes) and PL_sv_no (the char array PL_No) and copies those pointers, with SvLEN set to zero, into the destination. This has two upshots:

  1. We save memory and CPU time by not needing to make string copies of boolean values: e.g.
my @arr;
push @arr, 1 == 1 for 1 .. 100000;

will now consume less memory and take less CPU time to run.

  1. A pointer comparison on SvPVX(sv) is now sufficient to implement the SvISBOOL() test, which I have added as a function in Scalar::Util.

As far as I can tell, this optimisation is already beneficial in terms of CPU and memory savings, even if we ignore the new ability we gain by having SvISBOOL()

You can see my current progress at

https://github.com/leonerd/perl5/tree/stable-bool

@leonerd
Copy link
Contributor Author

leonerd commented Aug 7, 2021

On a broader note:

I think this is terrible idea! We seem to be suddenly obsessed

I don't know about "suddenly" there. It's been the case a long time that Perl users have cared about things like boolean roundtrips in serialisers. The entire JSON module ecosystem (consisting of multiple modules, by multiple authors) invented a special-case hack for this problem - basically see the entire context and history around https://metacpan.org/pod/JSON::PP::Boolean

This suggested change is part of a broader set of changes (e.g. Nicholas Clark's PV vs IV flags adjustments) to add what I am desperately trying not to call a "type system" to Perl. "Type System" usually brings to mind that the computer (either the compiler or the interpreter) will forbid certain operations that it doesn't think match up. That isn't what we're doing here. What we're doing is adding more what I am trying to call "intention tracking" - the idea that given a scalar value, can we know what the programmer intended it to mean? Sure any value can be tested for boolean truth, for numerical value, or stringy value, but what did the programmer really have in mind as "the" canonical shape of this data.

Looking at the wider picture, comparing a few other languages, Perl is starting to look somewhat lacking here:

$ python3 -c 'import json; print(json.dumps([1, "1", 1 == 1]))'
[1, "1", true]

$ nodejs -e 'console.log(JSON.stringify([1, "1", 1 == 1]))'
[1,"1",true]

$ perl -MJSON::MaybeUTF8=encode_json_utf8 -E 'say encode_json_utf8([1, "1", 1 == 1])'
[1,"1",1]

I'm not saying this one change alone is sufficient, but this is one small facet of a much larger issue. Without fixing the larger issue, it becomes harder for Perl to stand alongside these other languages in modern commercial settings. Interoperability of data is an important concern these days - gone is the time of little standalone awk-like scripts running on the local developer's machine.

There is no getting around it - we need abilities like this for Perl to remain relevant to the modern world.

@kraih
Copy link

kraih commented Aug 7, 2021

On a broader note:

I think this is terrible idea! We seem to be suddenly obsessed

I don't know about "suddenly" there. It's been the case a long time that Perl users have cared about things like boolean roundtrips in serialisers.

There's really nothing sudden about it. We have more cases where the community has been begging for a little "stricter" types for many many years. The old scalar value problem of characters vs bytes comes to mind, when dealing with I/O. There's a reason utf8::is_utf8 is misused so much in CPAN modules. A tiny bit of additional type information goes a long way towards improving the user experience.

@marcusramberg
Copy link

Given this doesn't have a negative performance impact, it seems like a great improvement for allowing Perl to inter-op more easily with modern devops stuff, which almost always involves json or yaml wrangling.

@iabyn
Copy link

iabyn commented Aug 9, 2021 via email

@leonerd
Copy link
Contributor Author

leonerd commented Aug 9, 2021

The trouble with this is that it is making a shared read-only string buffer effectively writeable. This SEGVs on your branch

Ahyes, indeed. :/ I wonder if setting the COW flag will solve this. Possibly, though it suggests there might be a possibility for other similar troubles.

In any case while I think it over I'll add some tests and a fix for this one at least.

@leonerd
Copy link
Contributor Author

leonerd commented Aug 9, 2021

OK, I think have made some progress there. I've also made a (draft) PR for the branch, so discussions on the implementation can be had over there: Perl/perl5#19040

That leaves this thread free for the abstract intent of the idea, aside from the impl.

@leonerd
Copy link
Contributor Author

leonerd commented Aug 18, 2021

Implementation-wise, the code looks happy and solid, and so far seems to be waiting on this RFC process to continue before it gets merged.

How do we resolve this stalemate? Can we "accept" the RFC? Or failing that, do we just merge the PR that implements it into perl core and call it done?

rfcs/rfcTODO.md Outdated

The core immortals `PL_sv_yes` and `PL_sv_no` will always respond true to `SvBOOLOK()`, and such a flag will be reliably copied by `sv_setsv()` and friends. The result here is that the result of boolean-returning ops will be `SvBOOLOK()` and this flag will remain with any copies of that value that get made - either over the arguments stack or stored in lexicals or elements of aggregate structures.

These `SvBOOLOK()` values will still be subject to the usual semantics regarding macros like `SvPV` or `SvIV` - so numerically they will still be 1 or 0, and stringily they will still be "1" and "" (though see the Future Scope section below).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to clarify here to cover @iabyn's comment, that the POK, IOK etc flags on the SV won't be changing, so code like pp_add doesn't need any changes.


Obviously such a solution is specific to JSON encoding and does not apply to, for example, message gateway between JSON and MsgPack, which would require some translation inbetween. A true in-core solution to this problem would have many benefits to interoperability of data handling between these various modules.

((TODO: Add some comments about purely in-core handling as well that don't rely on serialisation))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this TODO be updated? (or removed)

@duncand
Copy link

duncand commented Sep 4, 2021

With regard to the detail you are less sure about, the true/false keywords or is_boolean, I believe it would be best if they were all provided in the same place, considering that logically true and false are just nullary functions and is_boolean is a unary function.

If they will live with Scalar::Util then they should all be there, and one can use Scalar::Util 'false'; just as much as they can use Scalar::Util 'is_boolean';.

I believe having a use feature boolean would be an unnecessary dedicated feature, but if you did have it then it should provide the is_boolean along with the false and true.

But I believe the best solution to this will also be informed by the separate discussion going on in p5p/etc about having the new generic std/builtin/etc namespace, and if that is implemented then it is where all 3/etc of these should go, and otherwise if it isn't I would say put them all in Scalar::Util.

@haarg
Copy link
Contributor

haarg commented Sep 4, 2021

It seems reasonable to put true and false constants in the std:: namespace, and leave those out until that namespace is established. Once it exists, it's probably reasonable to bring in a few different functions from Scalar::Util.

Until then, I think an isbool function on its own in Scalar::Util is fine.

@leonerd
Copy link
Contributor Author

leonerd commented Sep 4, 2021

@duncand Yes you're right about the std namespacing points. Much of the original design on this RFC predates the std namespace discussion, it might need adjusting in light of the outcomes from that.

@kraih
Copy link

kraih commented Sep 4, 2021

If using true and false required an import, i would always use !!1 and !!0 instead.

@duncand
Copy link

duncand commented Sep 4, 2021

@kraih That's fair. What I was specifically advocating was that we do NOT have use feature boolean; just to get true/false. If we have use feature boolean then it should ALSO provide the isbool() etc.

@duncand
Copy link

duncand commented Sep 4, 2021

Actually I would argue that false/true specifically would be strong candidates for appearing automatically when one says use 5.36; etc. They have a much stronger case to being available with the minimum possible boilerplate of any kind. So better than use feature boolean or use Scalar::Util or std::false/std::true, although come to think of it std::false etc probably actually isn't that bad for being unambiguous.

@Grinnz
Copy link

Grinnz commented Sep 4, 2021

Adding it as a feature is a prerequisite for "appearing automatically when one says use v5.36;". Unless you want to propose an additional mechanism by which that would enable keywords, which I'm sure will be part of the function namespace discussion.

@leonerd leonerd merged commit 0d61e8f into Perl:master Oct 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants