Whenever you train a detector, your training areas are fed to a deep learning model one after the other. These models require input images of a fixed size; that is simply how they work during training. Let’s call this fixed-size image a “tile”. For us the default tile is a 256 x 256 sample of one of your training areas, which is quite standard in machine learning and computer vision. If a training area is larger than this, a single tile will only cover part of it. That is fine, because we select a tile from each of your training areas multiple times during training, randomly sampling the tile’s location each time; the idea is that, over enough training steps, the whole content of every training area gets seen.
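The random sampling described above can be sketched in a few lines. This is an illustrative sketch only (Picterra’s actual training pipeline is not public); `sample_tile` and its arguments are hypothetical names:

```python
import random

def sample_tile(area_w, area_h, tile=256):
    """Pick a random tile inside a training area of area_w x area_h pixels.

    Illustrative sketch: returns a (left, top, right, bottom) pixel box
    whose location is uniformly random within the training area.
    """
    if area_w < tile or area_h < tile:
        raise ValueError("training area is smaller than the tile size")
    x = random.randint(0, area_w - tile)
    y = random.randint(0, area_h - tile)
    return (x, y, x + tile, y + tile)

# Over many training steps, repeated random samples end up covering
# the whole training area:
boxes = [sample_tile(1000, 800) for _ in range(100)]
```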

Training tiles as seen in the training report
However, sometimes this tile size is simply too small. Say your object is too large to fit into a 256 x 256 tile, so the full object is never seen by your model in a single tile. It’s like trying to learn what a human face looks like when you’ve only ever seen a nose, an eye, or a mouth separately, but never all of them together. It can be done, but it makes the task harder. Previously, the way to get around this problem in Picterra was to use the detector resolution setting to scale down your images. That works fine in many cases, but scaling down also loses detail in your image. If that detail is important for producing correct detections (which depends on your use case), then scaling down the imagery may not be such a good idea. However, if we instead increase the tile size, we can fit the entire object in a single tile without losing detail, and that’s exactly what the tile size setting does: instead of scaling your imagery down by a factor of two, you can, for example, increase the tile size from 256 x 256 to 512 x 512.
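To make the tradeoff concrete, here is a small back-of-the-envelope sketch. The 10 cm/px resolution is a made-up example, not a Picterra default:

```python
def tile_footprint_m(tile_px, resolution_m_per_px):
    """Side length, in metres, of the ground area one tile covers."""
    return tile_px * resolution_m_per_px

# Hypothetical 10 cm/px imagery:
base = tile_footprint_m(256, 0.10)         # 25.6 m per side
downscaled = tile_footprint_m(256, 0.20)   # 51.2 m per side, but half the detail
bigger_tile = tile_footprint_m(512, 0.10)  # 51.2 m per side at full detail
```

Both options double the ground area a tile sees; only the larger tile keeps the original level of detail.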
Here is an example dataset where this setting helped improve the results, producing a full 5-point increase in performance (from 63% to 68%) on a difficult and complex problem.
Missing some paving on the left as well as a falsely detected rooftop. Improved results on the right (no rooftop detection at all).
Falsely detected road on the left, correct on the right.
Even though this detector is not looking for one specific object (it is a segmentation detector), the tile size still helps here, because “paving” is essentially defined as any surface that is not the main road for vehicles. That means the class covers many different appearances and textures. There are still rules about which materials are used where, so scaling the imagery down can be detrimental: you lose the detail of the materials’ texture.
However, a major factor in how we as humans determine whether something is pavement is not just its material but also its surrounding context.
We can tell something is a sidewalk because it’s a narrow linear element next to a wider linear element, the road. Since the context is important, having more of it is helpful. So how can we increase the context while still maintaining the texture detail? By increasing the tile size. This dataset in particular could also use more data, and perhaps multiple classes, to improve the score further, but that’s a different story.
It sounds great on paper, but there are a few drawbacks to be aware of. It will not always perform better. The fine details may not even be what the model needs to learn what your object is (in the end it’s not us who decides which details in the images to use during training; the model does that automatically). As mentioned earlier, it really depends on your use case. In addition, increasing the tile size increases the GPU memory your model needs: a bump from 256 x 256 to 512 x 512 means 4x the GPU memory, and your training will also take roughly 4x longer. In short, this setting can be quite helpful, but it’s important to experiment and play around with it to see if it can be of benefit to your detector.
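The 4x figure follows directly from the pixel count: doubling the tile’s side length quadruples the number of pixels per tile. A tiny helper makes that explicit (a rough proxy; actual GPU memory use also depends on the model architecture):

```python
def pixel_factor(new_tile, base_tile=256):
    """Relative growth in pixels per tile, a rough proxy for GPU memory."""
    return (new_tile / base_tile) ** 2

pixel_factor(512)   # 4.0 -> roughly 4x the memory and training time
pixel_factor(1024)  # 16.0 -> the cost grows quadratically with tile side
```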