Print blob debug info during training if SolverParameter "debug_info" field is set #796
I decided that printing both the sum and the mean of absolute values is a bit visually overwhelming, and the sum isn't particularly useful, so in the last commit I changed it to print just the mean. Here's what it looks like now:

I think I'll merge this myself if nobody objects within a few hours; it's an invisible change unless you specifically enable it. The content and formatting of the output can be adjusted later if people dislike it. For example, people may want to see only the [Update] information. This can also be introduced later by adding finer-grained debug options to the SolverParameter proto message.
SGTM
Thanks for taking a look @Yangqing!
(Heavily inspired by the venerable cuda-convnet.)
This PR lets you add "debug_info: true" to your solver.prototxt files to compute and print debug information during the forward pass, backward pass, and parameter updates. The information is printed only on "display" iterations, i.e., when the training loss would normally be displayed. Specifically, for each blob it prints the L1 norm (sum of absolute values) followed by the mean absolute value in parentheses. This can help with debugging training problems: for example, if you see a nan/inf loss, it lets you figure out which layer produces the first nan/inf and when. Here's what the output looks like when you add "debug_info: true" to the end of lenet_consolidated_solver.prototxt:
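To make the statistics described above concrete, here is a minimal Python sketch. It is not Caffe's implementation, which lives in the C++ solver and net code. It assumes a hypothetical representation of each blob as a (layer, blob, values) tuple with the values flattened to floats. The helper names (is_display_iter, l1_norm, mean_abs, format_stats, first_non_finite) and the printed line layout are illustrative only and are not Caffe APIs.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical blob representation (not Caffe's Blob class):
# (layer name, blob name, values flattened to a list of floats).
Blob = Tuple[str, str, Sequence[float]]


def is_display_iter(iteration: int, display: int) -> bool:
    """Assumed rule: report only when the loss would be displayed,
    i.e. every `display` iterations; display <= 0 disables output."""
    return display > 0 and iteration % display == 0


def l1_norm(values: Sequence[float]) -> float:
    """Sum of absolute values; nan/inf propagate into the result."""
    return sum(abs(v) for v in values)


def mean_abs(values: Sequence[float]) -> float:
    """Mean absolute value (L1 norm divided by element count); 0.0 if empty."""
    return l1_norm(values) / len(values) if values else 0.0


def format_stats(phase: str, blobs: Sequence[Blob]) -> List[str]:
    """One line per blob: L1 norm, then mean |x| in parentheses.
    This layout is illustrative, not Caffe's actual log format."""
    return [
        f"[{phase}] {layer} {blob}: {l1_norm(vals):.6g} ({mean_abs(vals):.6g})"
        for layer, blob, vals in blobs
    ]


def first_non_finite(blobs: Sequence[Blob]) -> Optional[Tuple[str, str]]:
    """Return (layer, blob) of the first blob, in the given pass order,
    that contains a nan or inf; None if every value is finite."""
    for layer, blob, vals in blobs:
        if any(not math.isfinite(v) for v in vals):
            return layer, blob
    return None


# Toy forward pass in which "ip1" is the first layer to produce an inf.
forward: List[Blob] = [
    ("conv1", "conv1", [0.5, -1.5, 2.0]),
    ("ip1", "ip1", [1.0, float("inf")]),
    ("loss", "loss", [float("nan")]),
]

if is_display_iter(100, display=100):
    for line in format_stats("Forward", forward):
        print(line)
    print("first nan/inf:", first_non_finite(forward))  # ('ip1', 'ip1')
```

Scanning blobs in forward-pass order and stopping at the first non-finite value is what lets you locate the layer where a nan/inf first appears, rather than only noticing it once it has propagated into the final loss.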