Add aromaticity field + clear up some discrepancies#35
Conversation
spec.md
Outdated
| "formalChargeList": [ 0, 0, 0, 0 ], | ||
| "bondAtomList": [ 1, 0, 2, 1, 3, 2 ], | ||
| "bondOrderList": [ 1, 1, 2 ], | ||
| "bondAromaticity": [ 0, 0, 1 ], |
spec.md
Outdated
| | [groupList](#grouplist) | [Array](#types) | Y | | ||
| | [bondAtomList](#bondatomlist) | [Binary](#types) | | | ||
| | [bondOrderList](#bondorderlist) | [Binary](#types) | | | ||
| | [bondAromaticityrList](#bondaromaticitylist)| [Binary](#types) | | |
spec.md
Outdated
| "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4, 6, 5, 7, 5 ], | ||
| "bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ] | ||
| "bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ], | ||
| "bondAromaticity": [ 0, 0, 1, 0, 0, 1, 1 ] |
There was a problem hiding this comment.
I think we should change the example to reflect the phenylring in PHE, so the aromaticity example makes sense.
spec.md
Outdated
| "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4 ], | ||
| "bondOrderList": [ 1, 1, 2, 1, 1 ] | ||
| "bondOrderList": [ 1, 1, 2, 1, 1 ], | ||
| "bondAromaticity": [ 0, 0, 1, 0, 0 ] |
spec.md
Outdated
| "formalChargeList": [ 0, 0, 0, 0 ], | ||
| "bondAtomList": [ 1, 0, 2, 1, 3, 2 ], | ||
| "bondOrderList": [ 1, 1, 2 ], | ||
| "bondAromaticity": [ 0, 0, 1 ], |
spec.md
Outdated
| "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4, 6, 5, 7, 5 ], | ||
| "bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ] | ||
| "bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ], | ||
| "bondAromaticity": [ 0, 0, 1, 0, 0, 1, 1 ] |
spec.md
Outdated
| "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4 ], | ||
| "bondOrderList": [ 1, 1, 2, 1, 1 ] | ||
| "bondOrderList": [ 1, 1, 2, 1, 1 ], | ||
| "bondAromaticity": [ 0, 0, 1, 0, 0 ] |
spec.md
Outdated
| *Type*: [Binary](#types) data that decodes into an array of 8-bit signed integers. | ||
|
|
||
| *Description*: Array of bond orders for bonds in `bondAtomList`. Must be values between 1 and 4, defining single, double, triple, and quadruple bonds. | ||
| *Description*: Array of bond orders for bonds in `bondAtomList`. Must be values between 0 and 4, defining unknown, single, double, triple, and quadruple bonds. |
There was a problem hiding this comment.
How would you feel about -1 being the unknown value? We also use -1 for undefined secondary structure.
Motivation: One day we might discuss adding zero-order bonds.
There was a problem hiding this comment.
I'm all for consistency, so a -1 would be a good option.
|
If I understand you correctly, all 3 bond...List fields must be given if bonds are present? In that case we would need to increment the version to 2.0 since 1.0-compliant files would not be valid anymore. Alternatively, a default behavior for the case of missing |
|
@speleo3 Thank you for the comments! I think I fixed all of them. Sorry I missed so much in my first pass. @pwrose & @speleo3 I updated @gtauriello I suppose that's what's up for discussion. I tried be as specific as possible for this initial draft, and if everyone else thinks differently then we can change it! In my opinion it will be easier for everyone involved if all 3 fields exist when bonds are present. Otherwise I can see applications having io code that has a lot of branch points that need to be built to adhere to the spec. But that's just an opinion, and maybe some people that have deeper integration with mmtf in their applications could speak about some pros/cons from their ends. I guess this is V2 of the PR now, I made some updates with some more notes/examples. The most important part to me is to make sure that applications don't have to have to account for a bunch of different corner cases when doing io. But you can let me know if I overreached. Additionally: |
spec.md
Outdated
| | bondAtomList | [Array](#types) | Array of bonded atom indices, [Integers](#types) | | | ||
| | bondOrderList | [Array](#types) | Array of bond orders as [Integers](#types) between 1 and 4 | | | ||
| | bondAromaticityList | [Array](#types) | Array of bond aromaticity as [Integers](#types) between 0 and 4 | | | ||
| | bondAromaticityList | [Array](#types) | Array of bond aromaticity as [Integers](#types) -1, 1,2,3 and 3 | | |
There was a problem hiding this comment.
Should this read: Array of bond aromaticity as Integers 0 or 1
|
@danpf Sounds good to me. Thanks for the update. I agree that application developers shouldn't have to do many if-branches. For the decoders, on the other hand, we may still need/want(?) to support reading of 1.0-compliant files where we then need some default behavior to deal with missing aromaticities flags. Once the file is decoded, the user won't have to do any branching anymore, since the decoder can guarantee that all 3 fields have full data. But unless we drop backwards compatibility in our decoders, this implies having a default behavior for decoders to deal with the field missing from the file... |
|
-1 for requiring |
|
Agree with @speleo3 . I'd keep both order and aromaticity optional. There can be applications that have no idea about the bond types at all. |
|
I am unsure if the run-length encoding of bondAromaticityList is worth it. In the case where nothing is aromatic, sure, but then you can leave it out altogether. For the case of proteins the "runs" can be quite short, i.e. from one aromatic side chain to another. For secondary structure data run-length encoding was generally not worth it and bondAromaticityList seems similar. @danpf did you calculate if it is worth it? |
|
@arose which yeilds: I don't really mind either way, but if we're not going to encode but I would expect the vast majority of bonds in |
|
Also, I was just thinking... Maybe we shouldn't call it |
|
I made some changes to fit what was discussed. let me know your thoughts! |
right, ok then |
This is a first pass at adding the aromaticity field.
Please post any suggestions you might have!
This is meant to clear up:
#34
#33