Skip to content

HTTP filter fuzzer base class and POC buffer filter fuzzer#9400

Merged
mattklein123 merged 37 commits intoenvoyproxy:masterfrom
asraa:httpfilterbase
Feb 20, 2020
Merged

HTTP filter fuzzer base class and POC buffer filter fuzzer#9400
mattklein123 merged 37 commits intoenvoyproxy:masterfrom
asraa:httpfilterbase

Conversation

@asraa
Copy link
Copy Markdown
Contributor

@asraa asraa commented Dec 18, 2019

Generic HTTP filter fuzzer framework that aims to provide coverage over any registered HTTP filter.

This fuzz target fuzzes the decode methods of an HTTP Filter. The fuzzed data is:

  • A configuration for an HttpFilter
  • Untrusted HTTP data (headers, data, trailers)

A post-processor mutation sanitizes the configuration to have a registered name pulled from the NamedHttpFilterConfigFactory with a corresponding type_url.

The fuzzer then ingests these two inputs, creates the appropriate filter (if the serialized configuration can be parsed), and executes decode(Headers/Data/Trailers) on the fuzzed data.

Runs at about 250 exec/sec

Signed-off-by: Asra Ali asraa@google.com

Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Dec 18, 2019

cc @nareddyt @htuch @mattklein123
Would love some initial comments. This is just a skeleton / POC, and I'm opening up a GH Issue to discuss more the objectives and goals for these fuzzers.

@asraa asraa mentioned this pull request Dec 18, 2019

DEFINE_PROTO_FUZZER(
const test::extensions::filters::http::buffer::BufferFilterFuzzTestCase& input) {
BufferFilterFuzz filter(input.config());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way is:

class DecoderFuzzer {
public:
DecoderFuzz(DecodeFilter* filer) : filter_(filter) {
filter_->setDcoderFilterCallbacks(calback_);
}

void fuzz() {
filter_->...
}

DocoderFilter* filter_{};
callback_;
};

Each custom filter not need to derive from it.

DEFINE_PROTO_FUZZER() {
filter = create custom filter;
fuzzer = DoecoerFuzzer(&filter);
fuzzer.fuzz(input.data());
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ Ooh. I like this a lot more. It leaves the filter fuzz writer with less responsibility to look at the internal of the DecoderFuzzer class. I'll update the POC.

@mattklein123
Copy link
Copy Markdown
Member

@asraa looks super cool. I think this is very worthwhile but it will get complicated since in the general case the filter fuzzer is going to have to replicate a lot of the logic that lives inside the HCM (might be a good opportunity to refactor some of it out). I will take a look at the GH issue that you referred to and we can discuss there.

// TODO: add end_stream handling and decodeTrailers
filter_->decodeHeaders(headers, false);
Buffer::OwnedImpl buffer(data.data());
filter_->decodeData(buffer, true);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For decodeData, could we call it multiple times for the buffer, with the last call being end_stream=true? Some of our custom filters have logic to handle streaming grpc data, so a single call would not encapsulate this case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good point I asked about in the GH issue. We could make the inputs look more like an action stream with repeated Headers and Data, and call decode multiple times with the last call ending the stream. WDYT? It's useful to know that's good, I'll change at least the data to be repeated strings.

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. I think this is definitely the right direction, my comments are mostly trying to see how far we could take the removal of boiler plate to make it easier/cheaper to fuzz additional filters.


import "validate/validate.proto";

message BufferFilterFuzzTestCase {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use proto reflection and something like DynamicMessage with libproto-mutator? This could allow us to synthesize the HttpData + per-filter config type message on-the-fly, avoiding the need to have these boiler plate protos.

: config_(std::make_shared<BufferFilterConfig>(proto_config)) {
// Initialize filter and setup callbacks.
filter_ = std::make_unique<BufferFilter>(config_);
filter_->setDecoderFilterCallbacks(callbacks_);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be in the base somehow? Ideally what we are shooting for here is as few lines of boiler plate as possible for every new filter we want to fuzz. I would think it might even be possible to build an Uber Filter Fuzzer that uses the factories to find all the filters, consumes some fuzz data to pick the filter, synthesizes a reasonable config, and then runs through this process, without having any per-filter boiler plate. That way, every single filter gets fuzzed the moment it is linked into Envoy :)

Copy link
Copy Markdown
Contributor Author

@asraa asraa Dec 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think it might even be possible to build an Uber Filter Fuzzer that uses the factories to find all the filters,

Moving in to the base is in the POC now.

I was working and researching a way to Uber Filter Fuzz, but am running in to a problem dissociating ActiveStream callbacks from the Filter instantiation. (There is a createFilter method that each Filter implements, and the appropriate one is called with createFilterFactoryFromProto. But these methods return a Http::FilterFactoryCb rather than an Http::Filter, so I can't dissociate the creation of the HTTP filter and an active stream. This may be related to the general issue of breaking down the complexity of HCM discussed here #9401. I could imagine the createFilter creating an Http::Filter, and wrapping the creation of the filter chain for a new stream separately.

Copy link
Copy Markdown
Contributor Author

@asraa asraa Dec 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just made a version of the UberFilterFuzzer that hacks the mock expectation to grab the filter when adding to the stream and executes the code on that. It runs at about 250 exec/sec, which is fast as I can seem to get it by reducing the number of mock expectations per call to 0 (and just one static one)

I hardcoded the filter for now, but TODO to make a directory of filter configs and use that.

Signed-off-by: Asra Ali <asraa@google.com>
@stale
Copy link
Copy Markdown

stale bot commented Dec 27, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added stale stalebot believes this issue/PR has not been touched recently and removed stale stalebot believes this issue/PR has not been touched recently labels Dec 27, 2019
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
)EOF";
// Create proto_config from the string.
envoy::config::filter::network::http_connection_manager::v2::HttpFilter proto_config;
TestUtility::loadFromYaml(config_string, proto_config);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the new approach to making this generic. What I think we can do to make this more robust is:

  1. Use some initial entropy to index into the list of available HTTP filters. There's probably no convenient integer index mechanism right now, but it should be easy to add to the registry.
  2. Once we have the filter factory, we can then ask it it createEmptyConfigProto(). This will give us a Protobuf::Message.
  3. With the message obtained in (2) (and/or its descriptor), we ask libprotobuf-mutator to populate with structurally valid content.
  4. We fuzz with UberFilterFuzzer as in this PR, that bit looks great.

Copy link
Copy Markdown
Contributor Author

@asraa asraa Jan 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All set. Here are some things:
3. The mutator sometimes produces an invalid config, and so this code is in a try/catch block. Sometimes PGV constraints are violated, sometimes exceptions like LuaExceptions are thrown for bad code. I'm wondering if it's better to run this a few times rather than abort the fuzz test immediately, although I think it doesn't make a difference.
4. The class is starting to get complicated. I could move it out in to a library.

end_stream = true;
}
Buffer::OwnedImpl buffer(data.data().Get(i));
filter->decodeData(buffer, end_stream);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to consider the decode* return value, which indicates the stream status. If we're told to stop, we shouldn't provide more data, or we violate the filter's preconditions on these methods.

asraa added 5 commits January 2, 2020 12:19
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, definitely getting closer to the generic push button filter fuzzer!

protobuf_mutator::Mutator mutator;
try {
// Mutate and validate the chosen filter protobuf directly.
mutator.Mutate(proto_config.get(), 200);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're just running the mutator here directly on an empty config. I wonder if instead we could make use of something in the input proto, so that the mutation is now a function of the selected filter's config type and some input entropy.

In an ideal world, there would be some type magic so that the outer LPM, driving the fuzz test, had some oracle to know the proto type that was going to be selected fro the test run. This magic doesn't exist, so I think the next best thing is for the outer LPM to produce some bit vector that we combine with the known proto type here with an inner LPM.

Maybe there is a smarter way to do this that allows the outer LPM to be the only LPM and/or allows it to do smarter cross-over and mutations, but I'm not seeing it.

Copy link
Copy Markdown
Contributor Author

@asraa asraa Jan 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of fuzzing a filter_index and mutating an empty config, I opted for fuzzing configs, and registering a post processor mutation that will always ensure that we at least consume valid named configs. There is no more "inner mutator".

I think it does get closer to producing both random and mostly valid configs (mostly by the hopes that the fuzzer can find the right ones).

It doesn't solve the problem of the typed_config containing an Any which may or may not be parsed into the selected filter config, or that the text proto representation of the Any is binary encoded and ugly. If we do use this approach, I think automagically parsing through unit tests to create the corpus entries is a good way to go.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, thanks. I think there are two things to think about here:

  1. Agree on the corpus and generation, we could get a good baseline by generating this from unit tests. I think we would want a framework that makes this basically automatic any time you write a unit test for an HTTP filter.
  2. The mutation of the Any seems to me to be an area that is still a bit hard to understand. LPM is going to see a sequence of bytes and do byte mutation rather than structured proto mutation I think. Could you tag the LPM author on this thread? Would be good to get their take on this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vitalybuka Would love some feedback on this fuzzer. We have a proto fuzzer that takes as input a message that has an embedded google.protobuf.Any.

  • The PostProcessorRegistration constrains the filter name to match a supported filter.
  • The type of the typed_config field depends on the filter being instantiated. Because it is an Any, the value is a serialized message. Assuming LPM will perform byte mutation on this field, is there a better way to perform structured fuzzing on the value?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I added some valid typed_configs to represent the buffer_filter into a corpus directory, and some ASSERT(false) in the source code to check if we ever generate a config with certain values for fields, and running the fuzzer hundreds of thousands of runs doesn't catch the assert. I logged the config we produce when fuzzing, and too many aren't valid.

@htuch specifically, I crashed if we ever set max request bytes to 0, and restricted to only fuzzer buffer filter. Too many inputs turned into hidden_envoy_deprecated_config with deeper and deeper struct definitions.

asraa added 3 commits January 7, 2020 12:43
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
[](test::extensions::filters::http::FilterFuzzTestCase* input, unsigned int seed) {
// This ensures that the mutated configs all have valid filter names.
static const std::vector<absl::string_view> filter_names = Registry::FactoryRegistry<
Server::Configuration::NamedHttpFilterConfigFactory>::registeredNames();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like LPM is not very helpful with google.protobuf.Any
We should add support for that. google/libprotobuf-mutator#151
For now you can either keep this post processor assuming it will not conflict with future LPM support or you can just remove it assuming fuzzing performance will improve in future on it's own.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we need somehow to find supported types for any without changing the proto definition. I guess this callback can be useful for that. I don't know for sure yet how this is going to work on LPM side.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would guess it would essentially lookup the proto in the descriptor pool with type URL, unpack to it, and then do whatever is normally done by LPM, then repack. But, that's without any knowledge of LPM details. I think this is what we need to make this work efficiently, I know @asraa has been doing some experiments to see what can be done with the existing mutator, so there might be something useful even without that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the message with embedded Any must have (1) valid name, (2) appropriate type_url based on name, (3) Any of the matching type. With the existing mutator what I've done is:

  • After setting valid name, will LPM eventually create a non-trivial Any and appropriate type_url? -> Did not seem to after millions of runs
  • After setting valid name, and adding valid type_urls and serialized protos to a dictionary, will LPM combine appropriately? Nope.
  • After setting valid name and matching type_url, and adding serialized protos to a dictionary, will LPM populate the value field with the dictionary string? -> Yes! I got much more lucrative results.

So for now, I'm relying on dictionaries having full config strings pulled from unit test examples. I think we can iterate further later.

asraa added 2 commits January 14, 2020 13:01
Signed-off-by: Asra Ali <asraa@google.com>
@htuch htuch changed the title WIP: HTTP filter fuzzer base class and POC buffer filter fuzzer HTTP filter fuzzer base class and POC buffer filter fuzzer Jan 23, 2020
@htuch
Copy link
Copy Markdown
Member

htuch commented Jan 23, 2020

@asraa can you also update the comment at the top which still reads "Very wip.."? Thanks.

asraa added 7 commits January 23, 2020 10:54
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@htuch
Copy link
Copy Markdown
Member

htuch commented Feb 5, 2020

@htuch as a reminder.

@yanavlasov yanavlasov self-assigned this Feb 5, 2020
@stale
Copy link
Copy Markdown

stale bot commented Feb 12, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Feb 12, 2020
Signed-off-by: Asra Ali <asraa@google.com>
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Feb 13, 2020
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Feb 13, 2020

@yanavlasov can you please take a look any time you have a few min? thank you!

Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Feb 19, 2020

Master changes merged, applied changes to header map type specifications. @yanavlasov

"//test/mocks/buffer:buffer_mocks",
"//test/mocks/http:http_mocks",
"//test/mocks/server:server_mocks",
] + envoy_all_extensions(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment that this links all of innternal extensions.

headers.setMethod("GET");
}
if (headers.Host() == nullptr) {
headers.setHost("authority");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it foo.com?

Signed-off-by: Asra Ali <asraa@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Let's ship and iterate!

@mattklein123 mattklein123 merged commit fca0f0b into envoyproxy:master Feb 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants