Profiler RFC: Introducing new APIs #15069
Introducing New APIs
Motivation
MXNet comes with a profiler that lets users monitor the performance of their models along two metrics: time and memory consumption. Internally, operator calls, C API calls, and memory allocations/deallocations are represented as events. For function calls, we know the start and finish time of each event and therefore its duration. For memory operations, we know the time of the allocation/deallocation and the size of the memory chunk.

Currently, the profiler has a function called dumps() that returns the aggregate statistics, which include min, max, and average for entries in Device Memory, Operator, and C_API. The current return value is a string, and the data is presented as a table (refer to the screenshot above). However, while the table is nicely formatted, it is only meant to be read by humans and is not easily parseable by a program. So there is a need for an API that returns the same aggregate stats as a JSON string.
Specification
A new API, get_summary(), will be introduced. It will have two parameters:
- “sort_by”, which specifies the statistic by which the entries are sorted. It defaults to “avg”; valid options are [“min”, “max”, “avg”].
- “ascending”, which specifies the sort order. It defaults to False; valid options are [True, False].
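As a sketch of the proposed call (the API does not exist yet, so this is illustrative; parameter names and defaults are as specified above):
from mxnet import profiler

# Illustrative call of the proposed API: sort entries by their max statistic,
# largest first. Defaults are sort_by="avg" and ascending=False.
summary_json = profiler.get_summary(sort_by="max", ascending=False)
print(summary_json)  # a JSON string with the structure shown below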
Expected use cases of get_summary() include:
- If customers are more interested in some events or stats than the others, they can customize the data presentation to more efficiently monitor their models.
- Customers can easily pass the stats to automated performance tests or monitoring tools. They do not need to parse the table-like string returned by dumps().
- This new API will be immediately useful to a new operator-level benchmark tool that @sandeep-krishnamurthy will work on. cwiki: https://cwiki.apache.org/confluence/display/MXNET/MXNet+Operator+Benchmarks.
The structure of the JSON return value is shown below. It is a four-layer dictionary. The first layer splits into “Time”, “Memory”, and “Unit”. The second layer is the category that the operators/APIs fall into. The third layer is the individual operators/APIs. Finally, the fourth layer holds the stats. Note that the time unit is ms and the memory unit is byte.
{
  "Time": {
    "operator": {
      "mean": {
        "Total Count": 2,
        "Total Time": 0.0490,
        "Min Time": 0.0240,
        "Max Time": 0.0250,
        "Avg Time": 0.0245
      },
      ...
    },
    "MXNET_C_API": {
      "MXNDArrayWaitAll": {
        "Total Count": 1,
        "Total Time": 205.9560,
        "Min Time": 205.9560,
        "Max Time": 205.9560,
        "Avg Time": 205.9560
      },
      "MXNDArraySetGradState": {
        "Total Count": 8,
        "Total Time": 0.0050,
        "Min Time": 0.0000,
        "Max Time": 0.0010,
        "Avg Time": 0.0006
      },
      ...
    }
  },
  "Memory": {
    "Device Storage": {
      "Memory: cpu/0": {
        "Count": 1,
        "Max Usage": 109037988,
        "Min Usage": 0,
        "Avg Usage": 54518999
      },
      "Memory: gpu/0": {
        "Count": 1,
        "Max Usage": 109037988,
        "Min Usage": 0,
        "Avg Usage": 54518999
      }
    },
    "Pool Memory": {
      "Pool:gpu/0 Pool Free": {
        "Count": 1,
        "Max Usage": 1,
        "Min Usage": 2,
        "Avg Usage": 3
      },
      "Pool:gpu/0 Pool Used": {
        "Count": 1,
        "Max Usage": 0,
        "Min Usage": 1,
        "Avg Usage": 2
      },
      ...
    }
  },
  "Unit": {
    "Time": "ms",
    "Memory": "byte"
  }
}
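Because the return value is a plain JSON string, a program can load it with any JSON parser and walk the four layers directly. A minimal sketch, assuming the proposed get_summary() API and the structure above:
import json

from mxnet import profiler

stats = json.loads(profiler.get_summary())  # proposed API; returns a JSON string

# Walk the four layers: kind -> category -> operator/API -> stat
wait_all = stats["Time"]["MXNET_C_API"]["MXNDArrayWaitAll"]
print(wait_all["Avg Time"], stats["Unit"]["Time"])  # e.g. 205.956 ms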
Aside from get_summary(), we will also add another new API, reset(), which clears the aggregate statistics accumulated so far. A typical use case looks like:
import mxnet as mx
from mxnet import profiler

# we don't care what happened before this point
profiler.reset()
# model
profiler.set_state('run')
run_training_iteration(*next(itr))  # placeholder for one training step
mx.nd.waitall()                     # make sure all async work has finished
profiler.set_state('stop')
# end model
func(profiler.get_summary())        # func is a placeholder consumer of the stats
In a more complex case, suppose we want to use the same profiler to benchmark several sections of a model. We can then call get_summary() and reset() at the end of each section, or at the end of each loop iteration, neatly like:
# model section 1
profiler.set_state('run')
# model code here
profiler.set_state('stop')
print(profiler.get_summary())
profiler.reset()
# model section 2
profiler.set_state('run')
# model code here
profiler.set_state('stop')
func(profiler.get_summary())
profiler.reset()
OR
# loop through test functions
for f in benchmark_tests:
    profiler.set_state('run')
    f()
    mx.nd.waitall()
    profiler.set_state('stop')
    print(profiler.get_summary())
    profiler.reset()
Fixing the Output of Dumps()

Currently, the labeling in the table is slightly off. For memory-related entries the labels should read “Usage” rather than “Time”. The “Time (ms)” column also does not make sense for memory entries, so it should be removed for them.
The new table labeling should look like:
// For time entries
Name    Total Count    Total Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
// For memory entries
Name    Total Count    Min Usage (MB)    Max Usage (MB)    Avg Usage (MB)
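For illustration, a time entry would render under the fixed header roughly as follows (values taken from the JSON sample above; column widths approximate):
Name                 Total Count   Total Time (ms)   Min Time (ms)   Max Time (ms)   Avg Time (ms)
MXNDArrayWaitAll     1             205.9560          205.9560        205.9560        205.9560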
FAQ
- Why can't we use the current dumps() API?
We can use the current dumps() API and get essentially the same information, but then we have to manually parse the table, which is a poor user experience.
- Why add a new profiler API get_summary() in the back-end rather than a Python parser utility that returns JSON?
This way the new API can be exposed in different language bindings while guaranteeing that the returned data is consistent across them.
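As a concrete example of the first answer, an automated performance test can consume the JSON directly instead of parsing the table. A hypothetical sketch (the helper name and the 1.0 ms budget are made up for illustration):
import json

from mxnet import profiler

def check_c_api_latency(max_avg_ms=1.0):
    # Hypothetical check: fail if any C API call's average time exceeds the budget.
    stats = json.loads(profiler.get_summary())
    for name, entry in stats["Time"]["MXNET_C_API"].items():
        assert entry["Avg Time"] <= max_avg_ms, \
            "%s averaged %.4f ms, above the %.1f ms budget" % (name, entry["Avg Time"], max_avg_ms)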