Skip to content

Add extra fields#36

Merged
arose merged 15 commits intorcsb:masterfrom
danpf:danpf/extraFields
Oct 22, 2018
Merged

Add extra fields#36
arose merged 15 commits intorcsb:masterfrom
danpf:danpf/extraFields

Conversation

@danpf
Copy link
Contributor

@danpf danpf commented Aug 31, 2018

This PR gives applications/users places to store their own data that's associate with their mmtf files.

Up for discussion:
Any Convention key:value pairs that you would like to see added.
Any Convention encode types you'd like to add
Do you like the format?

Other conventions to think about adding:

  • specific colors for different representations

If you are interested there is already a working implementation of this in:
https://github.com/danpf/mmtf-cpp/tree/rcsb_extraData

This is meant to clear up issue:
#32


*Type*: [Map](#types)

*Description*: A field meant to store information on specific bonds. Must have the same length as [numBonds](#numbonds).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it must be of same length as numBonds then it can be an array instead of a map, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I will think of a way to rephrase this, the idea is that you have keys with the values being arrays with length numBonds if that makes sense!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, a map where the keys are integers in [0, numBonds - 1].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note, using maps is more flexible but does not easily allow for the bespoke array compression techniques.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be like:

for something with 5 atoms
atomProperties: {
  "colorList": ["#ff0000", "#008000", "#ff0000", "#008000", "#ff0000"],
  "representationList": [0, 0, 0, 1, 1]
}

extraProperties: {
  "availableRepresentations": {0: "lines", 1: "spheres", 2: "surface", 3: "ball and stick", 4: "cartoon"}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

availableRepresentations in this case doesn't have to be 5, while colorList and representationList do have to have len(5)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arose You can also find a bigger example in the original discussion thread

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the examples it looks to me the type should be Array (or List) and not Map.

Copy link
Contributor Author

@danpf danpf Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make more sense now @arose ?

spec.md Outdated

*Description*: A field meant to store information on specific bonds. Must have the same length as [numBonds](#numbonds).

*Convention key:value pairs*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A description of what is meant exactly by 'convention' would be good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will also rephrase this, I think I will change this to reserved key:value pairs and add a little explanation at the of the extra data header.

The idea was to have some key: value pairs locked so that you could save things like camera positions/color schemes between applications. (ie everyone use cameraOrientation not app1_camera_orientation and app2_camera_orientation).


*Type*: [Map](#types)

*Description*: A field meant to store information on specific bonds. Must have the same length as [numBonds](#numbonds).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arose You can also find a bigger example in the original discussion thread

spec.md Outdated
*Type*: [Map](#types)

*Description*: A field meant to store any information at all. There are no length restrictions associated with this field. We
encourage you to apply our encoding techniques to your application data to reduce file sizes!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose to move this encouragement to the parent section, since it makes a lot of sense to use encoding techniques for any of the other extra data fields too (especially large ones like bondProperties, atomProperties and groupProperties).

@arose
Copy link
Contributor

arose commented Oct 13, 2018

I like it, I have one more request for the spec. I think the type of the 'second level' of the maps should be defined more clearly, i.e. Map<String, SECOND_LEVEL_TYPE>.

So atomProperties, bondProperties, groupProperties, chainProperties and modelProperties are maps of type Map<String, Array>. And the array must be of the same length as the corresponding num* filed.

And extraProperties is a Map<String, String|Float|Integer|Map|Array|Binary>

How about adding a note about more specific Map types to the types section and then define the map type exactly for all the *Properties fields?

@gtauriello
Copy link

I like the idea of specifying the second level type to be some type of list but I would have it as Array|Binary. Meaning that I wouldn't want to enforce in the specs whether extra fields should be encoded as binary objects or not.

@danpf
Copy link
Contributor Author

danpf commented Oct 15, 2018

@arose @gtauriello I think I addressed both of your concerns. Let me know if that is what you had in mind!

@gtauriello
Copy link

gtauriello commented Oct 15, 2018

Looks verbose but very clear. I did observe some issues when looking at the full use case example though (see comments below).

spec.md Outdated
...
"structureProperties": {
"foo_id": "ABC",
},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an inconsistency with the specs here. We don't have a structureProperties field. I guess that properties at the structure level should go into extraProperties?

At least, I don't see the use case for having a structureProperties field vs adding structure-level properties in extraProperties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops good point. Yes I'm ok with just wrapping it all into extraProperties. it would be sort of a useless field.

spec.md Outdated

Example

```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Visual detail: I think you can use ```python for a nice syntax highlighting here. I assume you removed the json highlighting since comments and ... would have looked bad there...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I will try that!

},
"extraProperties": {
"pymol_bondTypes": {0: "metal", 1: "single", 2: "double", 3: "triple", 4: "aromatic"}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note on type restrictions for extraProperties: the example here doesn't fit since the key is an Integer and not a String. So we should either drop any second level type restriction on extraProperties or change the example. I vote for the former...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the spec of Map<String, String|Float|Integer|Map|Array|Binary> sort of implies 'anything goes' but I can add a note to mention that there are no restrictions on the second level of extraProperties

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops you are right. I was looking at the third level here. Please ignore my comment above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh well i added an example anyway! I think it is ok to be extra verbose for now and then once we have some experience working with this we can make some changes.

@arose
Copy link
Contributor

arose commented Oct 17, 2018

lgtm, thanks @danpf! @gtauriello ok too? If so I will merge it.

@arose arose merged commit 1945970 into rcsb:master Oct 22, 2018
gtauriello pushed a commit to rcsb/mmtf-cpp that referenced this pull request Feb 18, 2019
* New fields as in rcsb/mmtf#36
* msgpack zone kept in StructureData to deal with extra data
* Updated MapDecoder to init from maps as well
* Extended tests and copy, =, ==, != functions for StructureData
gtauriello pushed a commit to gtauriello/mmtf-cpp that referenced this pull request Mar 1, 2019
* New fields as in rcsb/mmtf#36
* msgpack zone kept in StructureData to deal with extra data
* Updated MapDecoder to init from maps as well
* Extended tests and copy, =, ==, != functions for StructureData
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants