Tool that converts XML shadow config file to YAML.#803
Tool that converts XML shadow config file to YAML.#803sporksmith merged 36 commits intoshadow:devfrom
Conversation
|
@robgjansen @sporksmith should I keep the compatibility with Python2 ? |
|
No, I think we've already decided to no longer support python2, so as long as it supports python3 I think that's OK. |
|
Does the python2 linter test add value, even though we don't require it to pass? If not, perhaps it's time to remove it. What do you think @sporksmith ? |
Dropping it SGTM :) |
|
Great ! About the code, the output is not exactly like in your example @robgjansen , the format is like the yaml package provided. Would you want to improve it ? And if yes, how ? |
d8a919c to
73df065
Compare
robgjansen
left a comment
There was a problem hiding this comment.
We should also get @sporksmith's comments here, since I think he has more experience with YAML than me and probably also a stronger opinion about how we ought to format things to improve usability.
| process: | ||
| - arguments: tgen.server.graphml.xml | ||
| plugin: tgen | ||
| starttime: '1' | ||
| - id: client | ||
| process: | ||
| - arguments: tgen.client.graphml.xml | ||
| plugin: tgen | ||
| starttime: '2' | ||
| option: | ||
| stoptime: '3600' |
There was a problem hiding this comment.
This is sort of backwards from what I would expect. The numbers are given in quotes, and the strings are not. Is that a standard YAML thing?
There was a problem hiding this comment.
Bare (unquoted) strings are allowed in YAML, but I think can't begin with digits. It looks like the conversion is forcing everything to be a string, and leaving off the quotes where they aren't needed.
I agree it'd be better for it to make the integer-valued fields integers instead of forcing them to be strings.
There was a problem hiding this comment.
I fixed it. I'm not satisfied with the conversion function. Do you have a better way ?
| - graphml: "<?xml version=\"1.0\" encoding=\"utf-8\"?><graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\"\ | ||
| \ xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"\ | ||
| http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd\"\ | ||
| >\n <key attr.name=\"packetloss\" attr.type=\"double\" for=\"edge\" id=\"d6\"\ | ||
| \ />\n <key attr.name=\"latency\" attr.type=\"double\" for=\"edge\" id=\"d5\"\ | ||
| \ />\n <key attr.name=\"packetloss\" attr.type=\"double\" for=\"node\" id=\"\ | ||
| d4\" />\n <key attr.name=\"countrycode\" attr.type=\"string\" for=\"node\" id=\"\ | ||
| d3\" />\n <key attr.name=\"bandwidthdown\" attr.type=\"int\" for=\"node\" id=\"\ | ||
| d2\" />\n <key attr.name=\"bandwidthup\" attr.type=\"int\" for=\"node\" id=\"\ | ||
| d1\" />\n <key attr.name=\"ip\" attr.type=\"string\" for=\"node\" id=\"d0\" />\n\ | ||
| \ <graph edgedefault=\"undirected\">\n <node id=\"isp\">\n <data key=\"\ | ||
| d0\">0.0.0.0</data>\n <data key=\"d1\">2251</data>\n <data key=\"d2\"\ | ||
| >17038</data>\n <data key=\"d3\">US</data>\n <data key=\"d4\">0.0</data>\n\ | ||
| \ </node>\n <edge source=\"isp\" target=\"isp\">\n <data key=\"d5\"\ | ||
| >50.0</data>\n <data key=\"d6\">0.01</data>\n </edge>\n </graph>\n</graphml>\n" |
There was a problem hiding this comment.
This is MUCH harder to read than it was before. It's basically unreadable. Is there a way to automatically format this so it is pretty-printed?
There was a problem hiding this comment.
+1. In addition to keeping or improving the internal formatting of the XML from the source, I think we can get rid of the \ns and escaped quotes by using a multiline string. https://yaml-multiline.info/
There was a problem hiding this comment.
Good advice, it's much better now ! Thank you
src/tools/convert_xml_to_yaml.py
Outdated
| if len(sys.argv) != 3: | ||
| print(f'Usage: {sys.argv[0]} <xml_filename> <yaml_filename>', file=sys.stderr) | ||
| exit(1) | ||
|
|
||
| xml_filename = sys.argv[1] | ||
| yaml_filename = sys.argv[2] |
There was a problem hiding this comment.
Would it be useful to use the argparse package here to make it easier for people to use to convert their older files? Or is that a waste of time? (I find it useful whenever tools have help menus, which you get almost for free with argparse.)
sporksmith
left a comment
There was a problem hiding this comment.
I think we'd discussed creating a YAML->XML conversion tool first, so that the config can be written and maintained in YAML and then converted to shadow's current format "just in time"?
This tool could also be useful for rewriting old configs, but I think that might be getting ahead of ourselves a bit.
| process: | ||
| - arguments: tgen.server.graphml.xml | ||
| plugin: tgen | ||
| starttime: '1' | ||
| - id: client | ||
| process: | ||
| - arguments: tgen.client.graphml.xml | ||
| plugin: tgen | ||
| starttime: '2' | ||
| option: | ||
| stoptime: '3600' |
There was a problem hiding this comment.
Bare (unquoted) strings are allowed in YAML, but I think can't begin with digits. It looks like the conversion is forcing everything to be a string, and leaving off the quotes where they aren't needed.
I agree it'd be better for it to make the integer-valued fields integers instead of forcing them to be strings.
| - graphml: "<?xml version=\"1.0\" encoding=\"utf-8\"?><graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\"\ | ||
| \ xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"\ | ||
| http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd\"\ | ||
| >\n <key attr.name=\"packetloss\" attr.type=\"double\" for=\"edge\" id=\"d6\"\ | ||
| \ />\n <key attr.name=\"latency\" attr.type=\"double\" for=\"edge\" id=\"d5\"\ | ||
| \ />\n <key attr.name=\"packetloss\" attr.type=\"double\" for=\"node\" id=\"\ | ||
| d4\" />\n <key attr.name=\"countrycode\" attr.type=\"string\" for=\"node\" id=\"\ | ||
| d3\" />\n <key attr.name=\"bandwidthdown\" attr.type=\"int\" for=\"node\" id=\"\ | ||
| d2\" />\n <key attr.name=\"bandwidthup\" attr.type=\"int\" for=\"node\" id=\"\ | ||
| d1\" />\n <key attr.name=\"ip\" attr.type=\"string\" for=\"node\" id=\"d0\" />\n\ | ||
| \ <graph edgedefault=\"undirected\">\n <node id=\"isp\">\n <data key=\"\ | ||
| d0\">0.0.0.0</data>\n <data key=\"d1\">2251</data>\n <data key=\"d2\"\ | ||
| >17038</data>\n <data key=\"d3\">US</data>\n <data key=\"d4\">0.0</data>\n\ | ||
| \ </node>\n <edge source=\"isp\" target=\"isp\">\n <data key=\"d5\"\ | ||
| >50.0</data>\n <data key=\"d6\">0.01</data>\n </edge>\n </graph>\n</graphml>\n" |
There was a problem hiding this comment.
+1. In addition to keeping or improving the internal formatting of the XML from the source, I think we can get rid of the \ns and escaped quotes by using a multiline string. https://yaml-multiline.info/
In the same tool or another one ? |
There was a problem hiding this comment.
Sorry for the delay. Here is some more guidance. I hope this isn't too specific, but I wanted to give you a better idea of what I was thinking to help you make progress! :)
I think I would make both the YAML->XML and XML->YAML part of the same tool. You can use sub-commands for this in argparse using the add_subparsers functions.
There would be two sub-parsers for the two sub-commands: yaml2xml and xml2yaml, each would contain 1 positional argument, the input filename, and 1 optional argument, the output filename. You would invoke the tool like this:
convert yaml2xml shadow.config.yaml
convert xml2yaml shadow.config.xml
By default, the tool would do the requested conversion, strip off the filename suffix and append the the suffix corresponding to the new format. So this:
convert yaml2xml shadow.config.yaml
would write a file called shadow.config.xml.
But if the optional output filename was given:
convert yaml2xml shadow.config.yaml --output myconf.xml
then the tool would just write the file to myconf.xml instead of doing the suffix conversion.
Some things to consider:
- Be sure to check for errors, so that you gracefully handle the cases where the tool in invoked with a sub-command and file of the incorrect format (i.e.,
convert yaml2xml shadow.config.xmlorconvert xml2yaml shadow.config.yaml). - You should test some real
convert yaml2xml shadow.config.yamlcases to make sure that theshadow.config.xmlfile that is produced actually works in Shadow. (Run a Shadow experiment with the XML config file and make sure it works correctly.)
|
I'm assuming this isn't ready yet since it's missing the |
|
Yes ! I will ask you for a review when this will be done :) |
sporksmith
left a comment
There was a problem hiding this comment.
Submitting a place-holder review so I can get a clear signal when it's ready again. When it is, please use the 're-request' review button :)
|
@robgjansen @sporksmith could you please give me a feedback on the example files ? I don't want to miss something before cleaning the code and re requesting a review. |
sporksmith
left a comment
There was a problem hiding this comment.
Only reviewed the example xml and yaml files
| - arguments: tgen.client.graphml.xml | ||
| plugin: tgen | ||
| starttime: 2 | ||
| option: |
There was a problem hiding this comment.
Maybe call this 'options' in case we add more
There was a problem hiding this comment.
Same with 'host' and 'plugin' ?
| starttime: 2 | ||
| option: | ||
| stoptime: 3600 | ||
| plugin: |
There was a problem hiding this comment.
Would be nice if we could match the typical order we use in xml files. In particular would be good to have the topology and the plugin definitions before the hosts, since those are both referenced from within the host process objects.
I think either hard-coding an order matching our "typical" order in the examples, or preserving the order in whatever source xml file being translated would be fine; whichever is easier.
There was a problem hiding this comment.
Should be better :)
sporksmith
left a comment
There was a problem hiding this comment.
Thanks, Florian! The consistent ordering is definitely an improvement.
| </edge> | ||
| </graph> | ||
| </graphml> | ||
| plugin: |
There was a problem hiding this comment.
It's a little odd that we have list values with singular keys. e.g. plugin instead of plugins, host instead of hosts. The singular form makes sense in xml, where it's referring to a single item in that list, but is a little confusing in yaml, where it's referring to the whole list.
At the risk of making the mapping between xml and yaml a little less consistent, I'd lean towards remapping to/from the the singular/plural forms.
@robgjansen wdyt?
@florian-vuillemot how difficult would this be to do?
There was a problem hiding this comment.
I agree with @sporksmith: plurals for the yaml lists makes more sense to me too.
There was a problem hiding this comment.
Hum 3 choices:
- Hard code each key and the plural
- Add an 's' to each key automatically
- Use both methods if there is a key which plural is different from adding a simple 's' at the end (process => processes)
What do you prefer ?
There was a problem hiding this comment.
For the sake of simplicity and consistency, I'd lean towards just using a hard-coded explicit mapping for all of these; i.e. the first option.
There was a problem hiding this comment.
Where could I find all keys to map ?
There was a problem hiding this comment.
I think you got them all:
- plugin -> plugins
- host -> hosts
- process -> processes
Also, "node" was renamed to "host", and "application" was renamed to "process", but we still technically accept "node" and "application" as valid inputs. So I suppose you should add these too:
- node -> nodes
- application -> applications
There was a problem hiding this comment.
And if you're curious, the parsing code starting at root level is here:
shadow/src/main/core/support/configuration.c
Lines 733 to 754 in 9f7cd41
There was a problem hiding this comment.
I could also do:
node -> hosts
application -> processes
This could help to remove old input.
There was a problem hiding this comment.
Sure, OK. But then you would have to spell out an explicit reverse mapping from YAML -> XML to make sure that "hosts" in YAML maps to "host" in XML (and not "node" in XML). So I think you will have to actually type out the YAML_TO_XML table rather than doing this:
YAML_TO_XML = {v: k for k, v in XML_TO_YAML.items()}
|
If the examples files look good, could you please start to review the code ? :) |
|
Note that @sporksmith will review this but probably won't be able to until next week. |
sporksmith
left a comment
There was a problem hiding this comment.
Looking good! Could you add a test? Just validating that your example files get converted to the expected result is probably sufficient.
I'd put the test in the same directory as the tool; you might need to tweak some CMakeLists.txt files for it to get picked up in the automated testing.
|
6a8d8ac to
dca6326
Compare
robgjansen
left a comment
There was a problem hiding this comment.
This is great! Thanks for sticking with us and getting this passing on all of our supported platforms! 😄
#773
Example files included are here to simplify discussion on the script result