Skip to content

Add tracing tools repo#748

Merged
wjwwood merged 1 commit intoros2:masterfrom
christophebedard:add-tracing-tools
Oct 16, 2019
Merged

Add tracing tools repo#748
wjwwood merged 1 commit intoros2:masterfrom
christophebedard:add-tracing-tools

Conversation

@christophebedard
Copy link
Copy Markdown
Member

@christophebedard christophebedard commented Jul 24, 2019

Signed-off-by: Christophe Bedard fixed-term.christophe.bourquebedard@de.bosch.com
Signed-off-by: Christophe Bedard bedard.christophe@gmail.com
Signed-off-by: Christophe Bedard christophe.bedard@apex.ai

Related to ros2/rcl#473
Related to ros2/rclcpp#789

With instrumentation, the tracetools package becomes a core dependency.

cc @iluetkeb

@Karsten1987
Copy link
Copy Markdown
Contributor

Are there any special flags which should be passed to cmake for the tracingtools? I've seen on the gitlab page some instructions of -DWITH_LTTNG=ON.

@iluetkeb
Copy link
Copy Markdown

iluetkeb commented Jul 25, 2019

That flag was for the ROS 1 version of tracetools. In this version, tracing is automatically enabled if LTTng is found by CMake. In the future it will be enabled automatically if LTTng is found. We have a patch for that, but it's not merged yet.

Anyway, what this means is that the tracepoints in rcl and rclcpp can call tracetools, but nothing happens. Then, somebody who wants to trace just installs LTTng, and then only has to recompile tracetools in the local workspace (as opposed to having to recompile everything).

In the future, we would like to discuss whether tracing could always be enabled (tracing is purposefully very little overhead), but that's a matter for latter.

Copy link
Copy Markdown
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past we've tried to avoid dependencies like this which are not widely used projects (unlike googletest or poco or uncrustify) and that we don't have control over. It's not for any reason like "not built here", but we just want to minimize issues when doing releases and to minimize things like nightly CI breaking due to changes in other projects.

We have exceptions for both cases, like Fast-RTPS (which we've had to fork at times in order to get a release done) and the fact that we use the latest version of some Python things via pip (which is sort of like master sometimes, lol) which has broken nightly CI in the past.

Personally, I have no problem introducing this dependency, though the cautious side of me would like to see us use released versions of this package only (rather than master) since it's out of our organization.

Also, having the ability to make it optional or to disable it in rcl and/or rclcpp would be nice. Even just calling a method in a shared library (e.g. libros2_tracing.so) can be a significant performance issue in the most high performance cases. I don't know if that means we need rcl/rclcpp/rcutils/rcpputils specific macros so we can completely compile these out or not.

@iluetkeb
Copy link
Copy Markdown

Regarding breaking CI: We can use a tag or a branch instead of master. If we use a tag, that means a lot of PR's for you guys, but it's totally safe, since you know it works. If we use a branch, it's a bit more risk but not much. Thus, I would prefer using a branch. Okay?

Regarding control: What would you need to feel like you're in control? Should we put this into rcl directly (it's not big, and all dependencies are optional)? Should we have the repo in the ros2 org?

Regarding performance: None of the tracepoints are in critical places that are called very often. No inner loops or something the like. In principle, if you have a timer or the like that's extremely short, the tracepoint could add a measureable (though still not significant) overhead. In general, though, performance is a subject we could certainly look at more, if you're worried about it.

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Jul 26, 2019

If we use a tag, that means a lot of PR's for you guys, but it's totally safe, since you know it works. If we use a branch, it's a bit more risk but not much. Thus, I would prefer using a branch. Okay?

The nice thing about a pr for each time a new version comes up is that it's an opportunity for us to run CI on the new state and compare it to the old. The goal is really to minimize downtime and "known flaky or failing tests" on our "master" so that other on going work is not disrupted.

I would personally prefer using tagged versions, especially since I don't expect this dependency to be changing frequently (at least not over the long term), but that's just my opinion. Maybe others on the team would feel differently. In contrast, we put up with the downsides of using master from Fast-RTPS because it is still changing so much and we often need intermediate fixes (fixes between releases).

Alternatively, if every change to your repository had to pass our CI before being merged to master, that would be ok too. But that puts a continuous burden on you guys and taxes our CI resources as well.

Regarding control: What would you need to feel like you're in control? Should we put this into rcl directly (it's not big, and all dependencies are optional)? Should we have the repo in the ros2 org?

This is really a non-issue for me. I'm totally fine for you guys to own it and host it. The point is that we may choose to fork it or pin the version to an older release when doing our own release if we run into issues. Whereas if it were one of our packages we would likely fix the problem and author a new release right then and there. But again this is only a problem if there is an issue and you guys cannot respond to the problems quickly, neither of which I anticipate over the long run.

My opinion is that nothing needs to be done here.

Regarding performance: None of the tracepoints are in critical places that are called very often. No inner loops or something the like. In principle, if you have a timer or the like that's extremely short, the tracepoint could add a measureable (though still not significant) overhead. In general, though, performance is a subject we could certainly look at more, if you're worried about it.

I definitely don't want to do premature optimization, and I think having the call out to your library is fine in most cases, even like our official Debian packages, but at the same time I think it should be possible for someone building ROS 2 from source to completely compile out the trace lines from rcl and rclcpp (and even avoid needing to build or link your library) if they decide they want to do so.

@iluetkeb
Copy link
Copy Markdown

iluetkeb commented Jul 29, 2019

Alright, we'll tag it and update this PR.

I definitely don't want to do premature optimization, and I think having the call out to your library is fine in most cases, even like our official Debian packages, but at the same time I think it should be possible for someone building ROS 2 from source to completely compile out the trace lines from rcl and rclcpp (and even avoid needing to build or link your library) if they decide they want to do so.

Disabling this completely would be relatively trivial, since "TRACEPOINT" is already a macro, so we could add a compile time setting to the tracetools and define it as empty.

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Jul 29, 2019

Yeah, making it completely optional is probably not necessary (at least until someone asks for it), but I do want to make sure that people compiling from source can completely compile it out, which it sounds like we could do in the header of ros2_tracing.

Copy link
Copy Markdown
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lgtm, but I do think we want input from others on the @ros2/team before going ahead with it.

@nuclearsandwich
Copy link
Copy Markdown
Member

Alright, we'll tag it and update this PR.

Is this still happening? If so I'll wait to add my ✔️ until then. And it would make sense to test the new package passes CI.

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Jul 29, 2019

Alright, we'll tag it and update this PR.

Is this still happening?

Oh yeah, that would be good to do first. Obviously passing CI too.

@wjwwood wjwwood self-requested a review July 29, 2019 17:59
Copy link
Copy Markdown
Member

@nuclearsandwich nuclearsandwich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 pending green CI for this package.

Copy link
Copy Markdown
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, lgtm with CI.

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Jul 30, 2019

Not sure if we need to merge this with the above mentioned pr's at the same time or not, but here's CI just up to the tracetools*/ros2trace packages to see how it looks so far:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@christophebedard
Copy link
Copy Markdown
Member Author

The failure on Windows should be fixed by ros2_tracing!71 (decoding issue with "ü"/0xfc in setup.py files). Updated to 0.2.1 😄

@iluetkeb
Copy link
Copy Markdown

iluetkeb commented Jul 31, 2019

btw... I think you may have a problem on your Windows build farm. The "setup.py" is utf-8 and contained an utf-8 special character, which works fine on Linux, but not on Windows. Interestingly, this only breaks when reading from stdin. Usually this means that some terminal does have a utf-8 encoding when you're expecting it to be.

Christophe found an interesting stackoverflow entry related to terminal encoding, which also specifies how you can use the terminal's encoding on stdin.

@dirk-thomas
Copy link
Copy Markdown
Member

I think you may have a problem on your Windows build farm.

I don't think the problem is on the buildfarm. It is reasonable to expect the build to pass independently what locale a machine has set. The logic extracting the metadata in from the setup.py file should work independent of the default encoding. colcon/colcon-python-setup-py#19 tries to achieve that:

  • Windows Build Status
    • While the build still fails it resolves the encoding problem.
    • The remaining failure might need to be addressed in tracetools.

Therefore I would suggest to revert the changes to replace the non-ASCII characters in your repo.

@iluetkeb
Copy link
Copy Markdown

iluetkeb commented Aug 1, 2019

The remaining failure might need to be addressed in tracetools.

Looks like it. However, I'm a bit puzzled: The library is correctly created as a dynamic library ("tracetools.dll") but then the build for the status tool tries to read "tracetools.lib", which would be the name if the library were static.

Edit: This is apparently normal. The lib file is also generated for dynamic libraries, except we had not yet added visibility markers, so nothing was exported. In that case, the .lib file is not generated.

Fixed now, we'll make a new tag soon.

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Aug 1, 2019

Yeah, as you said, usually this is due to lacking dll export/import attributes on your classes and functions.

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Aug 1, 2019

New CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@iluetkeb
Copy link
Copy Markdown

iluetkeb commented Aug 1, 2019

The unicode decode error is back...

We reverted our change on @dirk-thomas 's request, but apparently the fix in colcon isn't working?

@christophebedard
Copy link
Copy Markdown
Member Author

I think it's not using the right .repos file. It's testing b9ff095/0.2.0 (separate from the encoding issue).

@iluetkeb
Copy link
Copy Markdown

iluetkeb commented Aug 1, 2019

I think it's not using the right .repos file. It's testing b9ff095/0.2.0 (separate from the encoding issue).

Where did you see that @christophebedard ?

@christophebedard
Copy link
Copy Markdown
Member Author

I think it's not using the right .repos file. It's testing b9ff095/0.2.0 (separate from the encoding issue).

Where did you see that @christophebedard ?

The job's build information:

repos_url: https://raw.githubusercontent.com/ros2/ros2/b9ff095e2bef4cadfa5cec03b7ef49d56c530e7d/ros2.repos

@dirk-thomas
Copy link
Copy Markdown
Member

I think it's not using the right .repos file.

Correct, the passed hash is still pointing to the 0.2.0 release. I will trigger another build using the latest: https://raw.githubusercontent.com/ros2/ros2/f05f0d2183608d491d9fa709dd35589871e13657/ros2.repos

but apparently the fix in colcon isn't working?

The fix has only been merged into master but not yet released in a new package version. Therefore the job needs to be triggered with explicitly using master branches from colcon (rather than the released packages):

  • Windows Build Status

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Aug 1, 2019

Sorry, that's my bad, copied the repo's file from the old test without thinking about it.

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Aug 3, 2019

Looks like the Windows job has some compiler warnings still, but otherwise it's looking ok.

@christophebedard
Copy link
Copy Markdown
Member Author

Looks like the Windows job has some compiler warnings still, but otherwise it's looking ok.

Fixed with 0.2.3!

@dirk-thomas
Copy link
Copy Markdown
Member

@christophebedard Friendly ping. Is there an updated version we should test again?

@christophebedard
Copy link
Copy Markdown
Member Author

@dirk-thomas sure, could you test this one pointing to a specific commit with a fix? https://gitlab.com/micro-ROS/ros_tracing/ros2/raw/a4a32c7b653014a69686713a4b191550669f25b4/ros2.repos

I think @iluetkeb just got back from vacation, so we may be able to create a new release sometime soon.

@dirk-thomas
Copy link
Copy Markdown
Member

dirk-thomas commented Sep 6, 2019

could you test this one pointing to a specific commit with a fix?

  • Linux Build Status (known flaky test)
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status (same test failures are present on master)

@iluetkeb
Copy link
Copy Markdown

iluetkeb commented Sep 9, 2019

So, this is looking good, hmm?

I've reviewed Christophe's merge requests, all okay there as well.

@christophebedard
Copy link
Copy Markdown
Member Author

Updated to 0.2.7.

@dirk-thomas
Copy link
Copy Markdown
Member

Updated to 0.2.7.

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@iluetkeb
Copy link
Copy Markdown

At first glance the Windows test failure do not look related to our work. Or am I missing something?

@dirk-thomas
Copy link
Copy Markdown
Member

At first glance the Windows test failure do not look related to our work. Or am I missing something?

@cottsay I will let Scott comment if all those test failures are expected.

@cottsay
Copy link
Copy Markdown
Member

cottsay commented Sep 11, 2019

I will let Scott comment if all those test failures are expected.

Looks good to me.

@christophebedard
Copy link
Copy Markdown
Member Author

@wjwwood @nuclearsandwich @cottsay @dirk-thomas I've updated the version to 0.2.8 following the changes made for ros2/rclcpp#789 and I also squashed everything.

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Oct 16, 2019

I've updated the version to 0.2.8

Here's new CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Oct 16, 2019

I think this pr needs to be rebased, since we had to update the repos file already to pin the version of fastrtps...

Signed-off-by: Christophe Bedard <fixed-term.christophe.bourquebedard@de.bosch.com>
Signed-off-by: Christophe Bedard <bedard.christophe@gmail.com>
Signed-off-by: Christophe Bedard <christophe.bedard@apex.ai>
@christophebedard
Copy link
Copy Markdown
Member Author

I think this pr needs to be rebases, since we had to update the repos file already to pin the version of fastrtps...

Done!

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Oct 16, 2019

Ok, CI again:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@christophebedard
Copy link
Copy Markdown
Member Author

Seems like test failures are not related to this 😃

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Oct 16, 2019

@cottsay can you check the latest CI and confirm those are expected failures?

@cottsay
Copy link
Copy Markdown
Member

cottsay commented Oct 16, 2019

@cottsay can you check the latest CI and confirm those are expected failures?

I don't know if I'd use the term "expected", but we're seeing all of those in nightlies 😛

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Oct 16, 2019

Ok, I guess this is good to merge then.

@wjwwood wjwwood merged commit a5269a0 into ros2:master Oct 16, 2019
@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Oct 16, 2019

I think ros2/rcl#473 and ros2/rclcpp#789 are not needed to merge this pr, if anyone thinks that's not right, let me know.

@christophebedard christophebedard deleted the add-tracing-tools branch October 17, 2019 01:58
Jiusi-pys pushed a commit to Jiusi-pys/ros2 that referenced this pull request Jan 17, 2026
Signed-off-by: Christophe Bedard <fixed-term.christophe.bourquebedard@de.bosch.com>
Signed-off-by: Christophe Bedard <bedard.christophe@gmail.com>
Signed-off-by: Christophe Bedard <christophe.bedard@apex.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants