Skip to content

Tool that converts XML shadow config file to YAML.#803

Merged
sporksmith merged 36 commits intoshadow:devfrom
florian-vuillemot:feature/convert_xml_file_to_yaml
Jul 10, 2020
Merged

Tool that converts XML shadow config file to YAML.#803
sporksmith merged 36 commits intoshadow:devfrom
florian-vuillemot:feature/convert_xml_file_to_yaml

Conversation

@florian-vuillemot
Copy link
Copy Markdown
Contributor

#773
Example files included are here to simplify discussion on the script result

@florian-vuillemot
Copy link
Copy Markdown
Contributor Author

@robgjansen @sporksmith should I keep the compatibility with Python2 ?

@robgjansen
Copy link
Copy Markdown
Member

No, I think we've already decided to no longer support python2, so as long as it supports python3 I think that's OK.

@robgjansen
Copy link
Copy Markdown
Member

Does the python2 linter test add value, even though we don't require it to pass? If not, perhaps it's time to remove it. What do you think @sporksmith ?

@sporksmith
Copy link
Copy Markdown
Contributor

Does the python2 linter test add value, even though we don't require it to pass? If not, perhaps it's time to remove it. What do you think @sporksmith ?

Dropping it SGTM :)

@florian-vuillemot
Copy link
Copy Markdown
Contributor Author

Great ! About the code, the output is not exactly like in your example @robgjansen , the format is like the yaml package provided. Would you want to improve it ? And if yes, how ?

@florian-vuillemot florian-vuillemot force-pushed the feature/convert_xml_file_to_yaml branch from d8a919c to 73df065 Compare May 25, 2020 16:56
Copy link
Copy Markdown
Member

@robgjansen robgjansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also get @sporksmith's comments here, since I think he has more experience with YAML than me and probably also a stronger opinion about how we ought to format things to improve usability.

Comment on lines +3 to +13
process:
- arguments: tgen.server.graphml.xml
plugin: tgen
starttime: '1'
- id: client
process:
- arguments: tgen.client.graphml.xml
plugin: tgen
starttime: '2'
option:
stoptime: '3600'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of backwards from what I would expect. The numbers are given in quotes, and the strings are not. Is that a standard YAML thing?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bare (unquoted) strings are allowed in YAML, but I think can't begin with digits. It looks like the conversion is forcing everything to be a string, and leaving off the quotes where they aren't needed.

I agree it'd be better for it to make the integer-valued fields integers instead of forcing them to be strings.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed it. I'm not satisfied with the conversion function. Do you have a better way ?

Comment on lines +18 to +32
- graphml: "<?xml version=\"1.0\" encoding=\"utf-8\"?><graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\"\
\ xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"\
http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd\"\
>\n <key attr.name=\"packetloss\" attr.type=\"double\" for=\"edge\" id=\"d6\"\
\ />\n <key attr.name=\"latency\" attr.type=\"double\" for=\"edge\" id=\"d5\"\
\ />\n <key attr.name=\"packetloss\" attr.type=\"double\" for=\"node\" id=\"\
d4\" />\n <key attr.name=\"countrycode\" attr.type=\"string\" for=\"node\" id=\"\
d3\" />\n <key attr.name=\"bandwidthdown\" attr.type=\"int\" for=\"node\" id=\"\
d2\" />\n <key attr.name=\"bandwidthup\" attr.type=\"int\" for=\"node\" id=\"\
d1\" />\n <key attr.name=\"ip\" attr.type=\"string\" for=\"node\" id=\"d0\" />\n\
\ <graph edgedefault=\"undirected\">\n <node id=\"isp\">\n <data key=\"\
d0\">0.0.0.0</data>\n <data key=\"d1\">2251</data>\n <data key=\"d2\"\
>17038</data>\n <data key=\"d3\">US</data>\n <data key=\"d4\">0.0</data>\n\
\ </node>\n <edge source=\"isp\" target=\"isp\">\n <data key=\"d5\"\
>50.0</data>\n <data key=\"d6\">0.01</data>\n </edge>\n </graph>\n</graphml>\n"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is MUCH harder to read than it was before. It's basically unreadable. Is there a way to automatically format this so it is pretty-printed?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. In addition to keeping or improving the internal formatting of the XML from the source, I think we can get rid of the \ns and escaped quotes by using a multiline string. https://yaml-multiline.info/

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good advice, it's much better now ! Thank you

Comment on lines +60 to +65
if len(sys.argv) != 3:
print(f'Usage: {sys.argv[0]} <xml_filename> <yaml_filename>', file=sys.stderr)
exit(1)

xml_filename = sys.argv[1]
yaml_filename = sys.argv[2]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to use the argparse package here to make it easier for people to use to convert their older files? Or is that a waste of time? (I find it useful whenever tools have help menus, which you get almost for free with argparse.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Contributor

@sporksmith sporksmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd discussed creating a YAML->XML conversion tool first, so that the config can be written and maintained in YAML and then converted to shadow's current format "just in time"?

This tool could also be useful for rewriting old configs, but I think that might be getting ahead of ourselves a bit.

Comment on lines +3 to +13
process:
- arguments: tgen.server.graphml.xml
plugin: tgen
starttime: '1'
- id: client
process:
- arguments: tgen.client.graphml.xml
plugin: tgen
starttime: '2'
option:
stoptime: '3600'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bare (unquoted) strings are allowed in YAML, but I think can't begin with digits. It looks like the conversion is forcing everything to be a string, and leaving off the quotes where they aren't needed.

I agree it'd be better for it to make the integer-valued fields integers instead of forcing them to be strings.

Comment on lines +18 to +32
- graphml: "<?xml version=\"1.0\" encoding=\"utf-8\"?><graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\"\
\ xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"\
http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd\"\
>\n <key attr.name=\"packetloss\" attr.type=\"double\" for=\"edge\" id=\"d6\"\
\ />\n <key attr.name=\"latency\" attr.type=\"double\" for=\"edge\" id=\"d5\"\
\ />\n <key attr.name=\"packetloss\" attr.type=\"double\" for=\"node\" id=\"\
d4\" />\n <key attr.name=\"countrycode\" attr.type=\"string\" for=\"node\" id=\"\
d3\" />\n <key attr.name=\"bandwidthdown\" attr.type=\"int\" for=\"node\" id=\"\
d2\" />\n <key attr.name=\"bandwidthup\" attr.type=\"int\" for=\"node\" id=\"\
d1\" />\n <key attr.name=\"ip\" attr.type=\"string\" for=\"node\" id=\"d0\" />\n\
\ <graph edgedefault=\"undirected\">\n <node id=\"isp\">\n <data key=\"\
d0\">0.0.0.0</data>\n <data key=\"d1\">2251</data>\n <data key=\"d2\"\
>17038</data>\n <data key=\"d3\">US</data>\n <data key=\"d4\">0.0</data>\n\
\ </node>\n <edge source=\"isp\" target=\"isp\">\n <data key=\"d5\"\
>50.0</data>\n <data key=\"d6\">0.01</data>\n </edge>\n </graph>\n</graphml>\n"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. In addition to keeping or improving the internal formatting of the XML from the source, I think we can get rid of the \ns and escaped quotes by using a multiline string. https://yaml-multiline.info/

@florian-vuillemot
Copy link
Copy Markdown
Contributor Author

I think we'd discussed creating a YAML->XML conversion tool first, so that the config can be written and maintained in YAML and then converted to shadow's current format "just in time"?

This tool could also be useful for rewriting old configs, but I think that might be getting ahead of ourselves a bit.

In the same tool or another one ?

Copy link
Copy Markdown
Member

@robgjansen robgjansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. Here is some more guidance. I hope this isn't too specific, but I wanted to give you a better idea of what I was thinking to help you make progress! :)


I think I would make both the YAML->XML and XML->YAML part of the same tool. You can use sub-commands for this in argparse using the add_subparsers functions.

There would be two sub-parsers for the two sub-commands: yaml2xml and xml2yaml, each would contain 1 positional argument, the input filename, and 1 optional argument, the output filename. You would invoke the tool like this:

convert yaml2xml shadow.config.yaml
convert xml2yaml shadow.config.xml

By default, the tool would do the requested conversion, strip off the filename suffix and append the the suffix corresponding to the new format. So this:

convert yaml2xml shadow.config.yaml

would write a file called shadow.config.xml.

But if the optional output filename was given:

convert yaml2xml shadow.config.yaml --output myconf.xml

then the tool would just write the file to myconf.xml instead of doing the suffix conversion.

Some things to consider:

  • Be sure to check for errors, so that you gracefully handle the cases where the tool in invoked with a sub-command and file of the incorrect format (i.e., convert yaml2xml shadow.config.xml or convert xml2yaml shadow.config.yaml).
  • You should test some real convert yaml2xml shadow.config.yaml cases to make sure that the shadow.config.xml file that is produced actually works in Shadow. (Run a Shadow experiment with the XML config file and make sure it works correctly.)

@robgjansen robgjansen added Component: Tools Peripheral tools like parsing log files or visualizing results Status: In Progress Work is currently underway Type: Enhancement New functionality or improved design labels Jun 4, 2020
@robgjansen
Copy link
Copy Markdown
Member

I'm assuming this isn't ready yet since it's missing the yaml2xml mode that @sporksmith mentioned should be done first. Let me know if that is incorrect.

@florian-vuillemot
Copy link
Copy Markdown
Contributor Author

Yes ! I will ask you for a review when this will be done :)

Copy link
Copy Markdown
Contributor

@sporksmith sporksmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitting a place-holder review so I can get a clear signal when it's ready again. When it is, please use the 're-request' review button :)

@florian-vuillemot
Copy link
Copy Markdown
Contributor Author

@robgjansen @sporksmith could you please give me a feedback on the example files ? I don't want to miss something before cleaning the code and re requesting a review.

Copy link
Copy Markdown
Contributor

@sporksmith sporksmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reviewed the example xml and yaml files

- arguments: tgen.client.graphml.xml
plugin: tgen
starttime: 2
option:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this 'options' in case we add more

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with 'host' and 'plugin' ?

starttime: 2
option:
stoptime: 3600
plugin:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice if we could match the typical order we use in xml files. In particular would be good to have the topology and the plugin definitions before the hosts, since those are both referenced from within the host process objects.

I think either hard-coding an order matching our "typical" order in the examples, or preserving the order in whatever source xml file being translated would be fine; whichever is easier.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be better :)

Copy link
Copy Markdown
Contributor

@sporksmith sporksmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Florian! The consistent ordering is definitely an improvement.

</edge>
</graph>
</graphml>
plugin:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little odd that we have list values with singular keys. e.g. plugin instead of plugins, host instead of hosts. The singular form makes sense in xml, where it's referring to a single item in that list, but is a little confusing in yaml, where it's referring to the whole list.

At the risk of making the mapping between xml and yaml a little less consistent, I'd lean towards remapping to/from the the singular/plural forms.

@robgjansen wdyt?
@florian-vuillemot how difficult would this be to do?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @sporksmith: plurals for the yaml lists makes more sense to me too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum 3 choices:

  • Hard code each key and the plural
  • Add an 's' to each key automatically
  • Use both methods if there is a key which plural is different from adding a simple 's' at the end (process => processes)

What do you prefer ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of simplicity and consistency, I'd lean towards just using a hard-coded explicit mapping for all of these; i.e. the first option.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where could I find all keys to map ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you got them all:

  • plugin -> plugins
  • host -> hosts
  • process -> processes

Also, "node" was renamed to "host", and "application" was renamed to "process", but we still technically accept "node" and "application" as valid inputs. So I suppose you should add these too:

  • node -> nodes
  • application -> applications

Copy link
Copy Markdown
Member

@robgjansen robgjansen Jun 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if you're curious, the parsing code starting at root level is here:

/* check for root-level elements */
if (!g_ascii_strcasecmp(elementName, "host") || !g_ascii_strcasecmp(elementName, "node")) {
/* handle the attributes directly */
*error = _parser_handleHostAttributes(parser, attributeNames, attributeValues);
/* handle internal content in a sub parser */
g_markup_parse_context_push(context, &(parser->xmlHostParser), parser);
} else if (!g_ascii_strcasecmp(elementName, "plugin")) {
*error = _parser_handlePluginAttributes(parser, attributeNames, attributeValues);
} else if (!g_ascii_strcasecmp(elementName, "kill")) {
*error = _parser_handleKillAttributes(parser, attributeNames, attributeValues);
} else if (!g_ascii_strcasecmp(elementName, "topology")) {
/* handle the attributes directly */
*error = _parser_handleTopologyAttributes(parser, attributeNames, attributeValues);
/* handle content text in a sub parser */
g_markup_parse_context_push(context, &(parser->xmlTopologyParser), parser);
} else if (!g_ascii_strcasecmp(elementName, "shadow")) {
*error = _parser_handleShadowAttributes(parser, attributeNames, attributeValues);
} else {
*error = g_error_new(G_MARKUP_ERROR, G_MARKUP_ERROR_UNKNOWN_ELEMENT,
"unknown 'root' child starting element '%s'", elementName);
}
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could also do:
node -> hosts
application -> processes
This could help to remove old input.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, OK. But then you would have to spell out an explicit reverse mapping from YAML -> XML to make sure that "hosts" in YAML maps to "host" in XML (and not "node" in XML). So I think you will have to actually type out the YAML_TO_XML table rather than doing this:

YAML_TO_XML = {v: k for k, v in XML_TO_YAML.items()}

@florian-vuillemot
Copy link
Copy Markdown
Contributor Author

If the examples files look good, could you please start to review the code ? :)

@robgjansen
Copy link
Copy Markdown
Member

Note that @sporksmith will review this but probably won't be able to until next week.

@robgjansen robgjansen removed their request for review June 19, 2020 12:39
Copy link
Copy Markdown
Contributor

@sporksmith sporksmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Could you add a test? Just validating that your example files get converted to the expected result is probably sufficient.

I'd put the test in the same directory as the tool; you might need to tweak some CMakeLists.txt files for it to get picked up in the automated testing.

@florian-vuillemot
Copy link
Copy Markdown
Contributor Author

Looking good! Could you add a test? Just validating that your example files get converted to the expected result is probably sufficient.

I'd put the test in the same directory as the tool; you might need to tweak some CMakeLists.txt files for it to get picked up in the automated testing.

@florian-vuillemot florian-vuillemot force-pushed the feature/convert_xml_file_to_yaml branch from 6a8d8ac to dca6326 Compare July 10, 2020 14:58
Copy link
Copy Markdown
Member

@robgjansen robgjansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks for sticking with us and getting this passing on all of our supported platforms! 😄

@sporksmith sporksmith merged commit e950797 into shadow:dev Jul 10, 2020
@florian-vuillemot florian-vuillemot deleted the feature/convert_xml_file_to_yaml branch July 10, 2020 22:45
@robgjansen robgjansen mentioned this pull request Jul 10, 2020
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: Tools Peripheral tools like parsing log files or visualizing results Status: In Progress Work is currently underway Type: Enhancement New functionality or improved design

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants