Add aromaticity field + clear up some discrepancies by danpf · Pull Request #35 · rcsb/mmtf

danpf · 2018-08-31T03:50:48Z

This is a first pass at adding the aromaticity field.

Please post any suggestions you might have!

This is meant to clear up:
#34
#33

speleo3 · 2018-08-31T05:36:52Z

spec.md

        "formalChargeList": [ 0, 0, 0, 0 ],
        "bondAtomList": [ 1, 0, 2, 1, 3, 2 ],
        "bondOrderList": [ 1, 1, 2 ],
+        "bondAromaticity": [ 0, 0, 1 ],


missing List suffix?

speleo3 · 2018-08-31T05:37:15Z

spec.md

 | [groupList](#grouplist)                     | [Array](#types)     |    Y     |
 | [bondAtomList](#bondatomlist)               | [Binary](#types)    |          |
 | [bondOrderList](#bondorderlist)             | [Binary](#types)    |          |
+| [bondAromaticityrList](#bondaromaticitylist)| [Binary](#types)    |          |


typo (extra r)

speleo3 · 2018-08-31T05:37:43Z

spec.md

        "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4, 6, 5, 7, 5 ],
-        "bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ]
+        "bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ],
+        "bondAromaticity": [ 0, 0, 1, 0, 0, 1, 1 ]


missing List suffix?

I think we should change the example to reflect the phenyl ring in PHE, so the aromaticity example makes sense.
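The three lists are parallel: `bondAtomList` holds two atom indices per bond, while `bondOrderList` and the aromaticity list hold one value per bond. Below is a minimal sketch of a shape check a reader might apply to a decoded group. It assumes the `List`-suffixed name `bondAromaticityList` requested above and the "0 or 1" value range suggested later in this thread. The helper name `check_bond_lists` is hypothetical and not part of any MMTF library.

```python
from typing import Any, Dict


def check_bond_lists(group: Dict[str, Any]) -> None:
    """Raise ValueError if the per-bond lists of a decoded group disagree.

    Hypothetical helper. Assumes bondAtomList stores two atom indices per bond
    and bondOrderList / bondAromaticityList store one value per bond.
    """
    atoms = group.get("bondAtomList", [])
    if len(atoms) % 2 != 0:
        raise ValueError("bondAtomList must contain pairs of atom indices")
    n_bonds = len(atoms) // 2

    # Each optional per-bond list, if present, needs exactly one entry per bond.
    for key in ("bondOrderList", "bondAromaticityList"):
        values = group.get(key)
        if values is not None and len(values) != n_bonds:
            raise ValueError(f"{key} has {len(values)} entries, expected {n_bonds}")

    # Assumed value range: 0 (not aromatic) or 1 (aromatic).
    aromaticity = group.get("bondAromaticityList") or []
    if any(v not in (0, 1) for v in aromaticity):
        raise ValueError("bondAromaticityList values must be 0 or 1")
```

This only checks shape, not chemistry: the seven-bond example in the hunk above passes it whether or not its aromatic flags match the phenyl ring.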

speleo3 · 2018-08-31T05:38:18Z

spec.md

        "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4 ],
-        "bondOrderList": [ 1, 1, 2, 1, 1 ]
+        "bondOrderList": [ 1, 1, 2, 1, 1 ],
+        "bondAromaticity": [ 0, 0, 1, 0, 0 ]


missing List suffix?

speleo3 · 2018-08-31T05:39:17Z

spec.md

        "formalChargeList": [ 0, 0, 0, 0 ],
        "bondAtomList": [ 1, 0, 2, 1, 3, 2 ],
        "bondOrderList": [ 1, 1, 2 ],
+        "bondAromaticity": [ 0, 0, 1 ],


missing List suffix?

speleo3 · 2018-08-31T05:39:44Z

spec.md

        "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4, 6, 5, 7, 5 ],
-        "bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ]
+        "bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ],
+        "bondAromaticity": [ 0, 0, 1, 0, 0, 1, 1 ]


missing List suffix?

speleo3 · 2018-08-31T05:39:49Z

spec.md

        "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4 ],
-        "bondOrderList": [ 1, 1, 2, 1, 1 ]
+        "bondOrderList": [ 1, 1, 2, 1, 1 ],
+        "bondAromaticity": [ 0, 0, 1, 0, 0 ]


missing List suffix?

speleo3 · 2018-08-31T06:00:10Z

spec.md

 *Type*: [Binary](#types) data that decodes into an array of 8-bit signed integers.

-*Description*: Array of bond orders for bonds in `bondAtomList`. Must be values between 1 and 4, defining single, double, triple, and quadruple bonds.
+*Description*: Array of bond orders for bonds in `bondAtomList`. Must be values between 0 and 4, defining unknown, single, double, triple, and quadruple bonds.


How would you feel about -1 being the unknown value? We also use -1 for undefined secondary structure.
Motivation: One day we might discuss adding zero-order bonds.

I'm all for consistency, so a -1 would be a good option.
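Because `bondOrderList` decodes into 8-bit *signed* integers, -1 fits directly without an offset. Below is a minimal sketch of validating and packing the raw payload under the -1 proposal. It assumes the allowed set is {-1, 1, 2, 3, 4}, with 0 left free for possible zero-order bonds. The helper names are hypothetical, and the sketch deliberately skips the MMTF binary codec header.

```python
import struct
from typing import List

# Assumed allowed values from this thread: -1 = unknown, 1..4 = single..quadruple.
# 0 is left out so it stays available for possible zero-order bonds.
ALLOWED_BOND_ORDERS = {-1, 1, 2, 3, 4}


def pack_bond_orders(orders: List[int]) -> bytes:
    """Validate bond orders and pack them as signed 8-bit integers (hypothetical helper)."""
    bad = [o for o in orders if o not in ALLOWED_BOND_ORDERS]
    if bad:
        raise ValueError(f"invalid bond order(s): {bad}")
    # 'b' is a signed char, so -1 is stored as 0xff.
    return struct.pack(f">{len(orders)}b", *orders)


def unpack_bond_orders(data: bytes) -> List[int]:
    """Decode a raw signed 8-bit payload back into Python ints."""
    return list(struct.unpack(f">{len(data)}b", data))
```

For example, `pack_bond_orders([1, -1, 2])` yields `b'\x01\xff\x02'`, and unpacking that returns the original list.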

gtauriello · 2018-08-31T10:06:47Z

If I understand you correctly, all 3 bond...List fields must be given if bonds are present?

In that case we would need to increment the version to 2.0, since 1.0-compliant files would not be valid anymore. Alternatively, a default behavior for a missing bondAromaticityList could be defined (e.g. a list of the same length as bondOrderList with all values 0).
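A minimal sketch of that default, as a decoder might apply it to a 1.0-style group. It assumes a decoded dict keyed by the MMTF field names. The helper name `with_default_aromaticity` is hypothetical, and filling with 0 is only the default suggested in this comment, not settled spec behavior.

```python
from typing import Any, Dict


def with_default_aromaticity(group: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a decoded group in which bondAromaticityList is always present.

    Hypothetical helper. Assumed default: when the field is missing, mark every
    bond in bondOrderList as non-aromatic (0).
    """
    result = dict(group)  # shallow copy; the input dict is left unchanged
    if "bondAromaticityList" not in result:
        result["bondAromaticityList"] = [0] * len(result.get("bondOrderList", []))
    return result
```

With this in the decoder, downstream code can always index `bondAromaticityList` without branching on file version.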

danpf · 2018-08-31T17:02:57Z

@speleo3 Thank you for the comments! I think I fixed all of them. Sorry I missed so much in my first pass.

@pwrose & @speleo3 I updated the spec so that -1 is the new unknown bondOrder.

@gtauriello I suppose that's what's up for discussion. I tried to be as specific as possible for this initial draft, and if everyone else thinks differently, then we can change it!

In my opinion it will be easier for everyone involved if all 3 fields exist when bonds are present. Otherwise I can see applications needing I/O code with a lot of branch points just to adhere to the spec.

But that's just an opinion, and maybe some people that have deeper integration with mmtf in their applications could speak about some pros/cons from their ends.

I guess this is V2 of the PR now; I made some updates with more notes/examples. The most important part to me is making sure that applications don't have to account for a bunch of different corner cases when doing I/O. But let me know if I overreached.

Additionally:
Maintainers are welcome to push updates too, if you're so inclined. But I'd be happy to make the changes otherwise.

pwrose · 2018-08-31T18:31:14Z

spec.md

 | bondAtomList         | [Array](#types)   | Array of bonded atom indices, [Integers](#types)                 |          |
 | bondOrderList        | [Array](#types)   | Array of bond orders as [Integers](#types) between 1 and 4       |          |
-| bondAromaticityList  | [Array](#types)   | Array of bond aromaticity as [Integers](#types) between 0 and 4  |          |
+| bondAromaticityList  | [Array](#types)   | Array of bond aromaticity as [Integers](#types) -1, 1,2,3 and 3  |          |


Should this read: "Array of bond aromaticity as Integers 0 or 1"?

Fixed! Thanks!

gtauriello · 2018-09-03T13:43:49Z

@danpf Sounds good to me. Thanks for the update. I agree that application developers shouldn't have to do many if-branches.

For the decoders, on the other hand, we may still need (or want?) to support reading 1.0-compliant files, in which case we need some default behavior for missing aromaticity flags. Once the file is decoded, the user won't have to do any branching anymore, since the decoder can guarantee that all 3 fields have full data. But unless we drop backwards compatibility in our decoders, this implies defining a default behavior for decoders when the field is missing from the file.

speleo3 · 2018-09-03T14:11:44Z

-1 for requiring bondAromaticityList if bondAtomList exists. I don't mind the extra if-branch.

josemduarte · 2018-09-03T15:12:19Z

Agree with @speleo3 . I'd keep both order and aromaticity optional. There can be applications that have no idea about the bond types at all.

arose · 2018-09-04T20:37:41Z

I am unsure if the run-length encoding of bondAromaticityList is worth it. In the case where nothing is aromatic, sure, but then you can leave it out altogether. For the case of proteins the "runs" can be quite short, i.e. from one aromatic side chain to another. For secondary structure data run-length encoding was generally not worth it and bondAromaticityList seems similar. @danpf did you calculate if it is worth it?
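To put numbers on that question, here is a minimal sketch comparing payload sizes for one list. It assumes the un-encoded form costs one signed byte per bond and the run-length form costs two 32-bit integers (value, length) per run. Treat those byte costs as assumptions, and note that the sketch ignores any codec header. The function names are hypothetical.

```python
from typing import List, Tuple


def run_length_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def payload_sizes(values: List[int]) -> Tuple[int, int]:
    """Return (raw_bytes, rle_bytes) for a per-bond list, ignoring headers.

    raw: one signed 8-bit integer per bond.
    rle: two 32-bit integers per run (assumed storage cost).
    """
    raw_bytes = len(values)
    rle_bytes = len(run_length_encode(values)) * 2 * 4
    return raw_bytes, rle_bytes
```

Under these assumptions, `payload_sizes([0] * 100)` gives `(100, 8)` but `payload_sizes([0, 1] * 10)` gives `(20, 160)`. Run-length encoding only pays off when runs average more than 8 bonds.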

danpf · 2018-09-14T23:02:07Z

@arose
Since the main bondAromaticityList that would be encoded covers all inter-group bonds, I did:

num_ig_bonds = pdb.map(lambda t: len(t[1].bond_atom_list)/2).reduce(lambda a,b: a + b)

which yields 126575357.0, so un-encoded (one 8-bit value per bond) it would amount to:
126575357.0*8/8/1000/1000 = 126.6 MB
0.1266/9.8 ≈ 0.013, so it would increase the database size by roughly 1.3%?

I don't really mind either way, but if we're not going to encode bondAromaticityList, it doesn't really make sense to encode bondOrderList, which I think would have a similar impact on database size.

But I would expect the vast majority of bonds in bondAromaticityList to be peptide bonds between amino acid groups, so I think run-length encoding would probably cut that number down a lot.

danpf · 2018-09-14T23:04:45Z

Also, I was just thinking: maybe we shouldn't call it bondAromaticityList but bondResonanceList instead, since aromaticity implies a cyclic functional group. Then all peptide bonds could be represented as resonating single bonds.

danpf · 2018-09-14T23:41:20Z

I made some changes to fit what was discussed. Let me know your thoughts!

arose · 2018-10-13T20:55:35Z

But I would expect the vast majority of bonds in bondAromaticityList to be peptide bonds between amino acid groups, so I think run-length encoding would probably cut that number down a lot.

right, ok then

danpf added 14 commits August 15, 2018 06:23

Fix wording of bond optionality

36bf123

initial commit

93ed439

aromaticity explanation

daaa0e0

spec updated

5a9e40c

make required more often

2b336b6

Always add bonds

0a77afb

Stronger wording for bonds

3160484

Done with bondAromaticity

b314ef9

added decode type to spec

923bcbe

fixing ( )

bc4d84d

Another try at links

83faaca

fixing link again

16f9c71

Add required column for GrouData

fd873e3

add specify Group data

9182651

speleo3 reviewed Aug 31, 2018

danpf added 8 commits August 31, 2018 08:09

Fixed some comments

751be0c

Added PHE to example

bbb392b

single to double quotes

091cfbf

Missing brace

8313ea6

added example

d3f1ff6

another example

b164cfd

Specify bond orders

63a0af6

more specific

c3b1485

pwrose reviewed Aug 31, 2018

danpf added 2 commits August 31, 2018 11:44

Fix bad aromaticity/bondOrder

8bcce4c

more backticks

9eff21d

arose self-requested a review August 31, 2018 23:50

Aromaticity->resonance + review fixes

18eed75

arose merged commit 913a092 into rcsb:master Oct 13, 2018

gtauriello mentioned this pull request Oct 22, 2018

Clarify if bondAtomList and bondOrderList are optional for groups too #33

Closed

danpf mentioned this pull request Oct 22, 2018

Bond order and aromatics/resonance #34

Closed

6 participants