HTTP filter fuzzer base class and POC buffer filter fuzzer#9400
HTTP filter fuzzer base class and POC buffer filter fuzzer#9400mattklein123 merged 37 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Asra Ali <asraa@google.com>
|
cc @nareddyt @htuch @mattklein123 |
|
|
||
| DEFINE_PROTO_FUZZER( | ||
| const test::extensions::filters::http::buffer::BufferFilterFuzzTestCase& input) { | ||
| BufferFilterFuzz filter(input.config()); |
There was a problem hiding this comment.
Another way is:
class DecoderFuzzer {
public:
DecoderFuzz(DecodeFilter* filer) : filter_(filter) {
filter_->setDcoderFilterCallbacks(calback_);
}
void fuzz() {
filter_->...
}
DocoderFilter* filter_{};
callback_;
};
Each custom filter not need to derive from it.
DEFINE_PROTO_FUZZER() {
filter = create custom filter;
fuzzer = DoecoerFuzzer(&filter);
fuzzer.fuzz(input.data());
}
There was a problem hiding this comment.
^ Ooh. I like this a lot more. It leaves the filter fuzz writer with less responsibility to look at the internal of the DecoderFuzzer class. I'll update the POC.
|
@asraa looks super cool. I think this is very worthwhile but it will get complicated since in the general case the filter fuzzer is going to have to replicate a lot of the logic that lives inside the HCM (might be a good opportunity to refactor some of it out). I will take a look at the GH issue that you referred to and we can discuss there. |
| // TODO: add end_stream handling and decodeTrailers | ||
| filter_->decodeHeaders(headers, false); | ||
| Buffer::OwnedImpl buffer(data.data()); | ||
| filter_->decodeData(buffer, true); |
There was a problem hiding this comment.
For decodeData, could we call it multiple times for the buffer, with the last call being end_stream=true? Some of our custom filters have logic to handle streaming grpc data, so a single call would not encapsulate this case.
There was a problem hiding this comment.
Yeah, that's a good point I asked about in the GH issue. We could make the inputs look more like an action stream with repeated Headers and Data, and call decode multiple times with the last call ending the stream. WDYT? It's useful to know that's good, I'll change at least the data to be repeated strings.
htuch
left a comment
There was a problem hiding this comment.
This is great. I think this is definitely the right direction, my comments are mostly trying to see how far we could take the removal of boiler plate to make it easier/cheaper to fuzz additional filters.
|
|
||
| import "validate/validate.proto"; | ||
|
|
||
| message BufferFilterFuzzTestCase { |
There was a problem hiding this comment.
Is it possible to use proto reflection and something like DynamicMessage with libproto-mutator? This could allow us to synthesize the HttpData + per-filter config type message on-the-fly, avoiding the need to have these boiler plate protos.
| : config_(std::make_shared<BufferFilterConfig>(proto_config)) { | ||
| // Initialize filter and setup callbacks. | ||
| filter_ = std::make_unique<BufferFilter>(config_); | ||
| filter_->setDecoderFilterCallbacks(callbacks_); |
There was a problem hiding this comment.
Can this be in the base somehow? Ideally what we are shooting for here is as few lines of boiler plate as possible for every new filter we want to fuzz. I would think it might even be possible to build an Uber Filter Fuzzer that uses the factories to find all the filters, consumes some fuzz data to pick the filter, synthesizes a reasonable config, and then runs through this process, without having any per-filter boiler plate. That way, every single filter gets fuzzed the moment it is linked into Envoy :)
There was a problem hiding this comment.
I would think it might even be possible to build an Uber Filter Fuzzer that uses the factories to find all the filters,
Moving in to the base is in the POC now.
I was working and researching a way to Uber Filter Fuzz, but am running in to a problem dissociating ActiveStream callbacks from the Filter instantiation. (There is a createFilter method that each Filter implements, and the appropriate one is called with createFilterFactoryFromProto. But these methods return a Http::FilterFactoryCb rather than an Http::Filter, so I can't dissociate the creation of the HTTP filter and an active stream. This may be related to the general issue of breaking down the complexity of HCM discussed here #9401. I could imagine the createFilter creating an Http::Filter, and wrapping the creation of the filter chain for a new stream separately.
There was a problem hiding this comment.
Just made a version of the UberFilterFuzzer that hacks the mock expectation to grab the filter when adding to the stream and executes the code on that. It runs at about 250 exec/sec, which is fast as I can seem to get it by reducing the number of mock expectations per call to 0 (and just one static one)
I hardcoded the filter for now, but TODO to make a directory of filter configs and use that.
Signed-off-by: Asra Ali <asraa@google.com>
|
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
| )EOF"; | ||
| // Create proto_config from the string. | ||
| envoy::config::filter::network::http_connection_manager::v2::HttpFilter proto_config; | ||
| TestUtility::loadFromYaml(config_string, proto_config); |
There was a problem hiding this comment.
I like the new approach to making this generic. What I think we can do to make this more robust is:
- Use some initial entropy to index into the list of available HTTP filters. There's probably no convenient integer index mechanism right now, but it should be easy to add to the registry.
- Once we have the filter factory, we can then ask it it
createEmptyConfigProto(). This will give us aProtobuf::Message. - With the message obtained in (2) (and/or its descriptor), we ask
libprotobuf-mutatorto populate with structurally valid content. - We fuzz with
UberFilterFuzzeras in this PR, that bit looks great.
There was a problem hiding this comment.
All set. Here are some things:
3. The mutator sometimes produces an invalid config, and so this code is in a try/catch block. Sometimes PGV constraints are violated, sometimes exceptions like LuaExceptions are thrown for bad code. I'm wondering if it's better to run this a few times rather than abort the fuzz test immediately, although I think it doesn't make a difference.
4. The class is starting to get complicated. I could move it out in to a library.
| end_stream = true; | ||
| } | ||
| Buffer::OwnedImpl buffer(data.data().Get(i)); | ||
| filter->decodeData(buffer, end_stream); |
There was a problem hiding this comment.
I think we need to consider the decode* return value, which indicates the stream status. If we're told to stop, we shouldn't provide more data, or we violate the filter's preconditions on these methods.
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
htuch
left a comment
There was a problem hiding this comment.
Cool, definitely getting closer to the generic push button filter fuzzer!
| protobuf_mutator::Mutator mutator; | ||
| try { | ||
| // Mutate and validate the chosen filter protobuf directly. | ||
| mutator.Mutate(proto_config.get(), 200); |
There was a problem hiding this comment.
I think we're just running the mutator here directly on an empty config. I wonder if instead we could make use of something in the input proto, so that the mutation is now a function of the selected filter's config type and some input entropy.
In an ideal world, there would be some type magic so that the outer LPM, driving the fuzz test, had some oracle to know the proto type that was going to be selected fro the test run. This magic doesn't exist, so I think the next best thing is for the outer LPM to produce some bit vector that we combine with the known proto type here with an inner LPM.
Maybe there is a smarter way to do this that allows the outer LPM to be the only LPM and/or allows it to do smarter cross-over and mutations, but I'm not seeing it.
There was a problem hiding this comment.
Instead of fuzzing a filter_index and mutating an empty config, I opted for fuzzing configs, and registering a post processor mutation that will always ensure that we at least consume valid named configs. There is no more "inner mutator".
I think it does get closer to producing both random and mostly valid configs (mostly by the hopes that the fuzzer can find the right ones).
It doesn't solve the problem of the typed_config containing an Any which may or may not be parsed into the selected filter config, or that the text proto representation of the Any is binary encoded and ugly. If we do use this approach, I think automagically parsing through unit tests to create the corpus entries is a good way to go.
There was a problem hiding this comment.
This makes sense, thanks. I think there are two things to think about here:
- Agree on the corpus and generation, we could get a good baseline by generating this from unit tests. I think we would want a framework that makes this basically automatic any time you write a unit test for an HTTP filter.
- The mutation of the Any seems to me to be an area that is still a bit hard to understand. LPM is going to see a sequence of bytes and do byte mutation rather than structured proto mutation I think. Could you tag the LPM author on this thread? Would be good to get their take on this.
There was a problem hiding this comment.
@vitalybuka Would love some feedback on this fuzzer. We have a proto fuzzer that takes as input a message that has an embedded google.protobuf.Any.
- The PostProcessorRegistration constrains the filter name to match a supported filter.
- The type of the
typed_configfield depends on the filter being instantiated. Because it is anAny, the value is a serialized message. Assuming LPM will perform byte mutation on this field, is there a better way to perform structured fuzzing on the value?
There was a problem hiding this comment.
Update: I added some valid typed_configs to represent the buffer_filter into a corpus directory, and some ASSERT(false) in the source code to check if we ever generate a config with certain values for fields, and running the fuzzer hundreds of thousands of runs doesn't catch the assert. I logged the config we produce when fuzzing, and too many aren't valid.
@htuch specifically, I crashed if we ever set max request bytes to 0, and restricted to only fuzzer buffer filter. Too many inputs turned into hidden_envoy_deprecated_config with deeper and deeper struct definitions.
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
| [](test::extensions::filters::http::FilterFuzzTestCase* input, unsigned int seed) { | ||
| // This ensures that the mutated configs all have valid filter names. | ||
| static const std::vector<absl::string_view> filter_names = Registry::FactoryRegistry< | ||
| Server::Configuration::NamedHttpFilterConfigFactory>::registeredNames(); |
There was a problem hiding this comment.
Looks like LPM is not very helpful with google.protobuf.Any
We should add support for that. google/libprotobuf-mutator#151
For now you can either keep this post processor assuming it will not conflict with future LPM support or you can just remove it assuming fuzzing performance will improve in future on it's own.
There was a problem hiding this comment.
Actually we need somehow to find supported types for any without changing the proto definition. I guess this callback can be useful for that. I don't know for sure yet how this is going to work on LPM side.
There was a problem hiding this comment.
Yeah, I would guess it would essentially lookup the proto in the descriptor pool with type URL, unpack to it, and then do whatever is normally done by LPM, then repack. But, that's without any knowledge of LPM details. I think this is what we need to make this work efficiently, I know @asraa has been doing some experiments to see what can be done with the existing mutator, so there might be something useful even without that.
There was a problem hiding this comment.
So the message with embedded Any must have (1) valid name, (2) appropriate type_url based on name, (3) Any of the matching type. With the existing mutator what I've done is:
- After setting valid name, will LPM eventually create a non-trivial Any and appropriate type_url? -> Did not seem to after millions of runs
- After setting valid name, and adding valid type_urls and serialized protos to a dictionary, will LPM combine appropriately? Nope.
- After setting valid name and matching type_url, and adding serialized protos to a dictionary, will LPM populate the
valuefield with the dictionary string? -> Yes! I got much more lucrative results.
So for now, I'm relying on dictionaries having full config strings pulled from unit test examples. I think we can iterate further later.
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
|
@asraa can you also update the comment at the top which still reads "Very wip.."? Thanks. |
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
|
@htuch as a reminder. |
|
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
|
@yanavlasov can you please take a look any time you have a few min? thank you! |
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
|
Master changes merged, applied changes to header map type specifications. @yanavlasov |
| "//test/mocks/buffer:buffer_mocks", | ||
| "//test/mocks/http:http_mocks", | ||
| "//test/mocks/server:server_mocks", | ||
| ] + envoy_all_extensions(), |
There was a problem hiding this comment.
Comment that this links all of innternal extensions.
| headers.setMethod("GET"); | ||
| } | ||
| if (headers.Host() == nullptr) { | ||
| headers.setHost("authority"); |
Signed-off-by: Asra Ali <asraa@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
Nice. Let's ship and iterate!
Generic HTTP filter fuzzer framework that aims to provide coverage over any registered HTTP filter.
This fuzz target fuzzes the decode methods of an HTTP Filter. The fuzzed data is:
HttpFilterA post-processor mutation sanitizes the configuration to have a registered
namepulled from theNamedHttpFilterConfigFactorywith a correspondingtype_url.The fuzzer then ingests these two inputs, creates the appropriate filter (if the serialized configuration can be parsed), and executes
decode(Headers/Data/Trailers)on the fuzzed data.Runs at about 250 exec/sec
Signed-off-by: Asra Ali asraa@google.com