Support custom default values for pointer fields.

zenhack commented

2018-05-24 00:59:38 +02:00

(Migrated from github.com)

Right now we don't support custom default values for pointer fields in structs; fetching the value from a null pointer will always return the default value for the type, even if the field overrides it.

I'm intentionally deferring this for now, since it's fiddly to get right, especially if we don't want to require mutability (the C++ implementation just modifies the message to add the value on first access). Default pointer values are arguably a misfeature, but we should support them eventually for compatibility.

Right now we don't support custom default values for pointer fields in structs; fetching the value from a null pointer will always return the default value for the type, even if the field overrides it. I'm intentionally deferring this for now, since it's fiddly to get right, especially if we don't want to require mutability (the C++ implementation just modifies the message to add the value on first access). Default pointer values are arguably a misfeature, but we should support them eventually for compatibility.

👍 1

zenhack commented

2018-06-07 00:15:39 +02:00

(Migrated from github.com)

I just pushed eb33f1f, which causes the schema compiler to just reject schema that use this feature, rather than generating code that does the wrong thing. Per the commit message, almost nobody is using these, so just not supporting them doesn't seem like a horrible strategy; maybe we'll support them someday, but I consider this very low priority.

I'm going to switch this from bug to enhancement, given that we're no longer generating incorrect code, but cleanly not supporting the feature.

I just pushed eb33f1f, which causes the schema compiler to just reject schema that use this feature, rather than generating code that does the wrong thing. Per the commit message, almost nobody is using these, so just not supporting them doesn't seem like a horrible strategy; maybe we'll support them someday, but I consider this very low priority. I'm going to switch this from bug to enhancement, given that we're no longer generating incorrect code, but cleanly not supporting the feature.

zenhack commented

2018-06-17 21:09:42 +02:00

(Migrated from github.com)

I have a thought on how to implement this:

For each default value that appears in a schema, we generate a constant of type ByteString, whose value is that of a capnp segment containing the value, where the first word is the landing pad of a far-pointer pointing to the value.
When a caller follows a default-pointer for the field, we:
- Convert the the byte string to a segment for the type we care about, using the segment type's fromByteString method. This was not possible with the old Blob interfaces; the new interfaces enable this.
- Importantly, before doing the above we should invoice the length of the ByteString, since in the case of a mutable message this will make a full copy. In the immutable case it doesn't really matter, since fromByteString is O(1) and makes no copy, but it doesn't hurt.
- We append the new segment to the message, and return pointer to the value.
- In the mutable case, we patch in a far pointer pointing to the landing pad.

Note that this means in the immutable case, the Untyped values would refer to a message other than the one used by the pointer they followed, since we can't add the segment to the message in-place. But I don't think this breaks anything. It may make sense to use the persistent-vector package instead of boxed vectors from vector in this case.

I have a thought on how to implement this: * For each default value that appears in a schema, we generate a constant of type `ByteString`, whose value is that of a capnp segment containing the value, where the first word is the landing pad of a far-pointer pointing to the value. * When a caller follows a default-pointer for the field, we: * Convert the the byte string to a segment for the type we care about, using the segment type's `fromByteString` method. This was not possible with the old `Blob` interfaces; the new interfaces enable this. * Importantly, before doing the above we should `invoice` the length of the `ByteString`, since in the case of a mutable message this will make a full copy. In the immutable case it doesn't really matter, since `fromByteString` is O(1) and makes no copy, but it doesn't hurt. * We append the new segment to the message, and return pointer to the value. * In the mutable case, we patch in a far pointer pointing to the landing pad. Note that this means in the immutable case, the `Untyped` values would refer to a message other than the one used by the pointer they followed, since we can't add the segment to the message in-place. But I don't think this breaks anything. It may make sense to use the `persistent-vector` package instead of boxed vectors from `vector` in this case.

zenhack commented

2018-07-15 21:54:41 +02:00

(Migrated from github.com)

Actually, pointing to a different message gets weird if someone tries to use the HasMessage class to fetch the underlying message. An alternative would be to point to the same message, but also add to the Struct/ListOf/etc types a direct pointer to the segment (which needn't be part of the message). This may also incidentally help with performance. But this way we can still return the correct message (which will be unmodified, and thus the null pointer that brought us to the default value will still have the correct value), and still get O(1) access for the value (as opposed to the O(n) copy needed by the mutable implementation and the implementations in other languages).

Actually, pointing to a different message gets weird if someone tries to use the `HasMessage` class to fetch the underlying message. An alternative would be to point to the same message, but also add to the Struct/ListOf/etc types a direct pointer to the segment (which needn't be part of the message). This may also incidentally help with performance. But this way we can still return the correct message (which will be unmodified, and thus the null pointer that brought us to the default value will still have the correct value), and still get O(1) access for the value (as opposed to the O(n) copy needed by the mutable implementation and the implementations in other languages).

zenhack commented

2018-09-05 06:52:00 +02:00

(Migrated from github.com)

capnp 0.7 adds a default value for a Text field in json.capnp, so we now can't upgrade without at least removing the hard-reject.

The default value is just "", which we're actually reading a null pointer as anyway.

capnp 0.7 adds a default value for a Text field in json.capnp, so we now can't upgrade without at least removing the hard-reject. The default value is just `""`, which we're actually reading a null pointer as anyway.

zenhack commented

2018-09-16 03:04:01 +02:00

(Migrated from github.com)

Started working on this on a branch.

zenhack commented

2018-09-18 20:23:54 +02:00

(Migrated from github.com)

Current status: I'm hitting a snarl re: how to actually implement the different behaviors described above. It's not possible to do so as an independent function, because of parametricity. We could add a function to the Message type class, but I find that very unsatisfying.

The root of the problem here is that the message types are on the wrong side of the expression problem; we really only intend to ever have ConstMsg and MutMsg s, but we've structured things in a way where it's easier to add new Message instances than new functions that work on all messages.

It would be possible to swap out the message type class and separate types with a GADT:

-- phantom types:
data Const
data Mut s

data Msg a where
    ConstMsg :: ... -> Msg Const
    MutMsg :: ... -> Msg (Mut s)

This results in a much cleaner implementation than anything else I can come up with. However making the change is going to touch a very large percentage of the code in the libraries and generated code. I don't think it will actually be hard to do, just tedious. It would be a nightmare without a type system, and I don't think I'd even consider it, but we've got one.

I want to sleep on this a few more nights before pulling the trigger, and I think I want to get the rpc stuff merged so I don't have to deal with conflicts everywhere -- so I'm tabling this issue for the time being.

Current status: I'm hitting a snarl re: how to actually implement the different behaviors described above. It's not possible to do so as an independent function, because of parametricity. We could add a function to the Message type class, but I find that very unsatisfying. The root of the problem here is that the message types are on the wrong side of the expression problem; we really only intend to ever have ConstMsg and MutMsg s, but we've structured things in a way where it's easier to add new `Message` instances than new functions that work on all messages. It would be possible to swap out the message type class and separate types with a GADT: ```haskell -- phantom types: data Const data Mut s data Msg a where ConstMsg :: ... -> Msg Const MutMsg :: ... -> Msg (Mut s) ``` This results in a much cleaner implementation than anything else I can come up with. However making the change is going to touch a very large percentage of the code in the libraries and generated code. I don't think it will actually be hard to do, just tedious. It would be a nightmare without a type system, and I don't think I'd even consider it, but we've got one. I want to sleep on this a few more nights before pulling the trigger, and I think I want to get the rpc stuff merged so I don't have to deal with conflicts everywhere -- so I'm tabling this issue for the time being.

zenhack commented

2018-09-18 20:56:20 +02:00

(Migrated from github.com)

We'd still need some kind of parametrization (probably still a type class) to deal with the fact that we've got different constraints on the monad for the instances; MutMsg current requires PrimMonad. We also would need to export the data constructors to be able to actually take advantage of this change. hmm...

zenhack commented

2018-11-08 01:53:45 +01:00

(Migrated from github.com)

I'm tempted to change the error that the code generator emits to a warning and generate constants, as an intermediate step; this would allow at least using schema that use this feature at all, which would unblock a few things.

For a longer term solution, the best thing I can come up with is to add a method to the Message type class like:

-- | @'getMaybeSetPtr' struct i def@ fetches the @i@th pointer from the struct's data section.
-- If said pointer is null, it returns @def@ instead. For mutable messages, @def@ will first
-- be thawed and copied into the message, then pointed to from the targeted pointer.
getMaybeSetPtr :: ReadCtx m msg => Struct msg -> Word16 -> Ptr ConstMsg -> m (Ptr msg)

I'm not in love with how the conditional logic for the two instances is in the type class's contract, but per the above the whole thing is a little weird given that we have a type class that is really only intended to ever have two instances. Everything else I've sketched is far more complex.

It still has the problem that in the ConstMsg case, it would no longer be pointing at the same message.

I'm tempted to change the error that the code generator emits to a warning and generate constants, as an intermediate step; this would allow at least using schema that use this feature at all, which would unblock a few things. For a longer term solution, the best thing I can come up with is to add a method to the `Message` type class like: ```haskell -- | @'getMaybeSetPtr' struct i def@ fetches the @i@th pointer from the struct's data section. -- If said pointer is null, it returns @def@ instead. For mutable messages, @def@ will first -- be thawed and copied into the message, then pointed to from the targeted pointer. getMaybeSetPtr :: ReadCtx m msg => Struct msg -> Word16 -> Ptr ConstMsg -> m (Ptr msg) ``` I'm not in love with how the conditional logic for the two instances is in the type class's contract, but per the above the whole thing is a little weird given that we have a type class that is really only intended to ever have two instances. Everything else I've sketched is far more complex. It still has the problem that in the ConstMsg case, it would no longer be pointing at the same message.

zenhack commented

2019-01-20 09:07:51 +01:00

(Migrated from github.com)

I actually counted, and between the 0.7 capnproto core schema and the current sandstorm schema (~9000 SLOC in .capnp schema), this feature is used exactly twice. In both cases, the field is of type Text and the custom default value is the empty string. We already decode null Texts as empty string; the only way to distinguish is to use the has_* functions.

I've pushed a commit that downgrades the error we were generating to a warning; it will still generate code now. It also points you to exactly which field is the problem.

I'm planning on closing this as WONTFIX; the only question left in my mind is whether we should specifically look for ... :Text = "" and not complain at all in those cases, or leave it as-is. Need to sleep on that.

I actually counted, and between the 0.7 capnproto core schema and the current sandstorm schema (~9000 SLOC in .capnp schema), this feature is used exactly twice. In both cases, the field is of type Text and the custom default value is the empty string. We already decode null Texts as empty string; the only way to distinguish is to use the `has_*` functions. I've pushed a commit that downgrades the error we were generating to a warning; it will still generate code now. It also points you to exactly which field is the problem. I'm planning on closing this as WONTFIX; the only question left in my mind is whether we should specifically look for `... :Text = ""` and not complain at all in those cases, or leave it as-is. Need to sleep on that.

Rows
Columns

Support custom default values for pointer fields. #28