Home
Sammie-Roto 2 is a free and open source application which allows you to load videos or images, and then use AI models to mask and track objects within the scene. It can currently do segmentation (create binary masks where every pixel is either masked or not masked) using Segment Anything Model 2. It can also do matting (create soft masks with areas of partial transparency) using the MatAnyone model. Finally, it can do object removal using MiniMax-Remover. There are also several options for cleaning up masked areas and edges, and a full-featured export dialog.
The models that Sammie-Roto uses are designed to run on a GPU. They can run on a CPU, but processing will be very slow, which may or may not be acceptable for your needs.
- An NVIDIA RTX GPU is recommended. 4GB of VRAM is sufficient to run the segmentation models. More VRAM may be necessary to run the matting model at high resolutions or to use object removal.
- AMD Radeon GPUs can currently only be used on Linux. Windows support for these GPUs will come whenever the PyTorch library supports ROCm on Windows. AMD has been promising this support for years, but it seems like it might actually be coming soon now.
- Apple Silicon Macs should be able to run it reasonably well.
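The hardware notes above can be checked with a small probe. This is a minimal sketch and not Sammie-Roto's own code: `pick_device` is a hypothetical helper name, and it assumes PyTorch may or may not be importable in the Python environment you run it from. It only uses `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(force_cpu: bool = False) -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    if force_cpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to probe.
        return "cpu"
    # ROCm builds of PyTorch also report AMD GPUs through the CUDA API.
    if torch.cuda.is_available():
        return "cuda"
    # Apple Silicon GPUs are exposed through the MPS backend.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

If this prints `cpu` on a machine with a supported GPU, the installed PyTorch build most likely lacks GPU support.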
- Download the latest version from releases.
- Extract the zip archive to a location that is convenient to access.
- Double-click 'install_dependencies.bat' and follow the prompt. Several gigabytes of dependencies and model files will be downloaded.
- Double-click 'run_sammie.bat' to launch the software.
- Everything is self-contained in the Sammie-Roto folder. If you want to remove the application, simply delete this folder. You can also move the folder.
- macOS users: Make sure Homebrew is installed.
- Ensure Python is installed (version 3.10 or higher, 3.12 recommended).
- Download the latest version from releases.
- Extract the zip archive to a location that is convenient to access.
- Open a terminal and navigate to the Sammie-Roto folder that you just extracted from the zip.
- Execute the following command: bash install_dependencies.sh then follow the prompt. Several gigabytes of dependencies and model files will be downloaded.
- macOS users: double-click "run_sammie.command" to launch the program.
- Linux users: run bash run_sammie.command, or execute the file however you prefer.

- The file menu at the top of the screen gives you access to load and export videos, as well as access to the settings.
- Below that is the viewport, where you can see a preview of the video. There are several different views that you can access from the "View" dropdown that is just to the bottom right of the viewport. Also below the viewport you will find a seek bar to access different frames of the video, as well as playback controls. Note the left and right keyframe buttons, which allow you to seek to frames where you have added points.
- The mouse scroll wheel / middle mouse button can be used to zoom and pan around the viewport.
- At the bottom-left, you will find the "Segmentation Point List" which keeps track of all points that you add to the video. You can select and delete individual points here. Selected points will be highlighted in the preview image.
- To the right of the point list, you will find the console, which provides status information.
- Finally, you will find a sidebar on the right side of the screen with tabs for Segmentation, Matting, and Object Removal. The segmentation tab is the primary tab where you can add points to the video to create masks.
- You can press F1 within the application to launch this help page. Ctrl+F1 will display a list of all the keyboard shortcuts.
Go to the File menu and select Load Video, then you can load a short video clip. Some sample videos that you can try out are in the "examples" folder. You are expected to trim the clip down to a single scene before loading it into Sammie-Roto. The entire video clip will be converted to a series of images, so attempting to load long videos will take a long time and use up a lot of disk space.
Once a video is loaded, you should be in the Segmentation-Edit view, and the segmentation tab should be selected in the right panel. You can left-click on the image to add points indicating areas that you want to be masked. You can right-click to indicate areas that should not be included in the mask. It is recommended to use as few points as possible.
If there are multiple objects in the video that you want to select, you can change the object id selection at the top of the segmentation tab to label each object individually. You can also write a name for each object, which may be helpful for keeping track of the different objects when exporting.
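As an illustration of how the point list can be thought of, here is a minimal sketch. It is not Sammie-Roto's internal data model: the `Point` class, `prompts_for`, and `keyframes` are hypothetical names. It assumes the common SAM convention where include points carry label 1 and exclude points carry label 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    frame: int       # frame index the click was made on
    object_id: int   # which object the click belongs to
    x: int
    y: int
    positive: bool   # True = left-click (include), False = right-click (exclude)


def prompts_for(points, frame, object_id):
    """Return (coords, labels) for one object on one frame."""
    selected = [p for p in points if p.frame == frame and p.object_id == object_id]
    coords = [(p.x, p.y) for p in selected]
    labels = [1 if p.positive else 0 for p in selected]
    return coords, labels


def keyframes(points):
    """Sorted frame numbers that carry at least one point."""
    return sorted({p.frame for p in points})


points = [
    Point(0, 1, 120, 80, True),    # include click on object 1
    Point(0, 1, 300, 40, False),   # exclude click on object 1
    Point(0, 2, 500, 220, True),   # include click on object 2
    Point(24, 1, 130, 90, True),   # correction on a later frame
]
print(prompts_for(points, 0, 1))  # ([(120, 80), (300, 40)], [1, 0])
print(keyframes(points))          # [0, 24]
```

The `keyframes` list corresponds to the frames that the left and right keyframe buttons would step between.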
The "track objects" button in the right panel will propagate your masks across all frames of the video, automatically tracking the objects that you selected. If the tracking goes off-course, you can go back and add masks on other frames of the video to help it stay on track, and run tracking again.
If you are working with animation, which typically only changes on every 2 or 3 frames, you can click the "Deduplicate masks" button after tracking has completed. This will make the masks much more stable on frames where the image doesn't change.
The run tracking and deduplicate masks buttons will display a green checkmark when they have been run across the entire video and everything is up to date. If you perform any actions to invalidate the data, such as by adding or deleting points, the green checkmarks will disappear.
If you switch to the Matting tab on the right side, you can run the MatAnyone model to create soft masks around objects. This works great for things like humans or animals where hair or fur creates areas of partial transparency. MatAnyone does its own object tracking, separate from the tracking that is performed in the segmentation tab. It just requires you to first use the segmentation tab to select an object on at least 1 frame to get it started. If you add points to multiple frames, then MatAnyone will reset its tracking from that frame forward, which can help correct issues with the tracking, but may also disturb the temporal consistency of the mask. MatAnyone is designed to only handle a single object at a time, so if you have multiple objects, they will be processed in sequence, multiplying the processing time.
The Object Removal tab gives you access to two different object removal methods. The primary method is MiniMax-Remover, a video diffusion model that attempts to remove any objects that are inside segmentation masks (which you tracked on the segmentation tab). An alternative method uses some algorithms from the OpenCV library, which may only give acceptable results for small objects against a fairly solid background. Object removal does not currently work on a per-object basis; it will simply attempt to remove all objects.
Starting from version 2.1, Sammie-Roto lets you add In and Out points to define a range of frames to process, rather than processing the entire video. This can be useful if you simply don't need to process a part of the video, or if you want to reprocess a small section rather than running tracking or matting across the entire video again. Both points are inclusive. The points can be added using the left and right brackets ([, ]) on the keyboard, and they can be removed with the shortcut "Ctrl+Shift+X".
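Because both points are inclusive, the number of processed frames is one more than the difference between them. The sketch below illustrates this. `frame_range` is a hypothetical helper, and it assumes frames are numbered from 0, with a missing In point meaning the first frame and a missing Out point meaning the last frame.

```python
def frame_range(total_frames, in_point=None, out_point=None):
    """Return the inclusive range of frame indices to process."""
    if total_frames <= 0:
        raise ValueError("the video has no frames")
    first = 0 if in_point is None else in_point
    last = total_frames - 1 if out_point is None else out_point
    if not (0 <= first <= last < total_frames):
        raise ValueError(f"invalid range {first}..{last} for {total_frames} frames")
    # range() excludes its end value, so add 1 to keep the Out point.
    return range(first, last + 1)


print(len(frame_range(100)))            # 100 (whole video)
print(len(frame_range(100, 10, 19)))    # 10 (frames 10 through 19)
print(list(frame_range(100, 98)))       # [98, 99]
```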
When you exit the application, Sammie-Roto will automatically save your session and reload it the next time you launch the application. If you want to save a project long-term, there are two different options available in the File menu.
- Save Points: This just saves a small file which contains a list of the points you have added to frames. Later, if you load the same input video again, you can simply load the points. You have to make sure to keep the original video file around.
- Save Project: This will save a file which contains all data regarding the current project. This includes image files of every frame and mask, along with settings. The resulting file can be quite large, but it is a simple and complete backup solution. It is NOT guaranteed to always be compatible with future versions of Sammie-Roto (for example, if certain models get changed or removed).
| Image | View Mode | Description |
|---|---|---|
| ![]() | Segmentation-Edit | This view is used for adding points to the image for segmentation. It gives additional options to show an outline or color overlay over the selected object. |
| ![]() | Segmentation-Matte | This view shows a black and white matte of the selected objects. There is an option to anti-alias the edges to make them smooth. |
| ![]() | Segmentation-BGcolor | This view composites your selected objects over a colored background. This view also provides the option to anti-alias the edges. |
| ![]() | Matting-Matte | This view displays a black and white matte of the matted objects. This requires having run matting from the matting tab. |
| ![]() | Matting-BGcolor | This view composites the matted objects over a colored background. This requires having run matting from the matting tab. |
| ![]() | ObjectRemoval | After running object removal, this shows the view with the objects removed. |
The postprocessing options allow you to tweak the masks created by the segmentation or matting models. You can double-click the label of any postprocessing option to reset it to the default value.
- Remove Holes: The segmentation model may sometimes leave small holes in parts of the mask. This setting can fill in those holes. Larger values fill larger holes.
- Remove Dots: The segmentation model may also sometimes add small dots outside of the main area of your selected object. This setting will remove those spots. Larger values will remove larger areas.
- Border Fix: Sometimes the segmentation model will leave a small area unmasked right next to the edge of the frame boundary. This setting will expand the mask towards the edge of the frame. The value indicates how many pixels away from the frame the mask can be before it is affected.
- Shrink/Grow: A straightforward setting that simply expands or contracts the mask.
- Gamma: This controls the brightness of the semi-transparent pixels. It is an exponential value, where values close to zero have a large effect, but values near the maximum only have a small effect.
- Shrink/Grow: A straightforward setting that simply expands or contracts the mask.
The export image option from the file menu will allow you to export the currently displayed frame as an image. It is a straightforward dialog where you simply select from the various display options and then click the Save As button. When the save as dialog appears, you can choose between PNG and JPEG formats.
The Export Video dialog offers many features to help you export the type of content you want and to also name your files in the way you want. It can appear complex at first, but it is fairly straightforward once you understand the options.

- First, you can either select an output folder to save your file, or check the box to output to the same folder as your input file.
- The filename field allows you to type in the filename that you want to save the file as. You can also add a variety of tags which can dynamically insert various types of information into the filename. Selecting a tag from the dropdown list will insert it at the cursor position. Note that you do NOT need to specify a file extension in the filename field, as that will be determined by the codec selection.

  | Tag | Description |
  |---|---|
  | {input_name} | Inserts the filename of your input video. |
  | {output_type} | Inserts the selected output type, such as "Segmentation-Matte" or "Matting-Greenscreen". |
  | {codec} | Inserts the selected codec. |
  | {object_id} | Inserts the object id number. |
  | {object_name} | Inserts the object name. Will default to "object_id" if a name is not specified for the object. |
  | {in_point} | Inserts the frame number of the in point, or "0" if an in point is not specified. |
  | {out_point} | Inserts the frame number of the out point, or the last frame number if an out point is not specified. |
  | {datetime} | Inserts a timestamp. |
  | {date} | Just the date part of the timestamp. |
  | {time} | Just the time part of the timestamp. |

- Below the filename field, you will see a preview of what the filename will look like.
- The codec selection lets you choose the file format to export to. Prores and FFV1 are video formats that support all output types. X264 and X265 are widely compatible with most software, but they do not support output types that require an alpha channel. Finally, the EXR option will export an EXR image sequence. EXR outputs each object as a different layer in the file.
- The Output Type lets you select the type of content that you want to export. This is similar to the view selection in the main window, but it additionally adds options for Segmentation-Alpha and Matting-Alpha, which will export your video with an alpha channel.
- Export Object lets you choose to export only a single object, or all of them combined together.
- Export Videos for Each Object will export an individual video for each object. For example, if you have 3 objects it will create 3 videos. This setting requires you to either have the {object_id} or {object_name} tag in your filename.
- The Save Settings button in the bottom left will save your current export settings so they can be reused next time.
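To make the filename tags concrete, here is a rough sketch of how such a template could be expanded. It is an illustration only, not the application's implementation. The function names, the timestamp formats, and the codec string in the example are assumptions. It also assumes that an unnamed object falls back to its id number for {object_name}.

```python
import re
from datetime import datetime

TAG_PATTERN = re.compile(r"\{(\w+)\}")


def tag_values(input_name, output_type, codec, object_id, object_name=None,
               in_point=None, out_point=None, last_frame=0, now=None):
    """Build the tag -> value mapping described in the tag table."""
    now = now or datetime.now()
    return {
        "input_name": input_name,
        "output_type": output_type,
        "codec": codec,
        "object_id": object_id,
        # Assumption: an unnamed object uses its id number as its name.
        "object_name": object_name or str(object_id),
        "in_point": 0 if in_point is None else in_point,
        "out_point": last_frame if out_point is None else out_point,
        # The timestamp formats below are assumptions, not the real ones.
        "datetime": now.strftime("%Y%m%d_%H%M%S"),
        "date": now.strftime("%Y%m%d"),
        "time": now.strftime("%H%M%S"),
    }


def expand_filename(template, values):
    """Replace each {tag} with its value; unknown tags are left untouched."""
    def substitute(match):
        tag = match.group(1)
        return str(values[tag]) if tag in values else match.group(0)
    return TAG_PATTERN.sub(substitute, template)


def per_object_template_ok(template):
    """Per-object export needs a tag that differs for each object."""
    return "{object_id}" in template or "{object_name}" in template


values = tag_values("clip01", "Matting-Alpha", "prores", object_id=2,
                    last_frame=119, now=datetime(2025, 1, 2, 3, 4, 5))
template = "{input_name}-{output_type}-{object_name}_{in_point}-{out_point}"
print(expand_filename(template, values))  # clip01-Matting-Alpha-2_0-119
print(per_object_template_ok(template))   # True
```

Without {object_id} or {object_name} in the template, every per-object export would resolve to the same filename, which is presumably why the dialog requires one of them.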
You can access the Settings dialog through File>Settings. The first tab in the Settings dialog is the General settings, and the second tab is where you can change some defaults. We will focus on the first tab here, since the second tab should be fairly self-explanatory.

First, you can select which SAM segmentation model you would like to use. The Base (standard) model is typically sufficient. If you want the absolute best quality segmentation, you can select the Large model, which can offer a slight quality improvement at the cost of running about 50% slower. The Efficient model uses a faster but noticeably lower quality model, which may be of use to users without hardware acceleration.
There is a checkbox to force the models to run on CPU. You should never check this unless there is some problem preventing Sammie-Roto from functioning with your GPU.
The Video Frame Extraction setting lets you choose if the application should extract video frames to the JPEG or PNG format when loading a video. JPEG is faster but introduces a small reduction in quality (the JPEGs are saved at 95% quality). PNG does not introduce any compression artifacts, but can be a little bit slower. The difference in speed is fairly minor, and probably wouldn't be noticed unless you are directly comparing them.
The Display Update Frequency lets you specify how often the preview image updates when you are tracking or running matting. The default setting is to display an update every 5 frames, but you can reduce this to have it display every frame if you like. The performance penalty of displaying frames can vary depending on what view you currently have selected, but it's usually not too significant unless you are using a greenscreen view or have the antialiasing option enabled for the view.
Finally, the Deduplication threshold is used when using the deduplicate masks function on the segmentation page. The default value of 0.8 seems to work well for most animated content, but if you need the effect to be stronger you can lower this value to 0.7 or 0.6.
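The sketch below shows one way a threshold like this could behave. It is a guess for illustration, not the algorithm Sammie-Roto uses: the similarity measure (one minus the normalized mean absolute pixel difference), the comparison against the previous frame, and the function names are all assumptions. It only captures the stated behaviour that a lower threshold treats more frames as duplicates.

```python
def similarity(a, b):
    """1.0 for identical frames, 0.0 for maximally different 8-bit frames."""
    total = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 - total / (255 * len(a))


def deduplicate_masks(frames, masks, threshold=0.8):
    """Reuse the previous output mask when a frame barely changed."""
    result = [masks[0]]
    for i in range(1, len(frames)):
        if similarity(frames[i - 1], frames[i]) >= threshold:
            result.append(result[-1])   # held frame: keep the earlier mask
        else:
            result.append(masks[i])     # new drawing: keep its own mask
    return result


# Four tiny grayscale "frames" as flat pixel lists; frames 0-1 and 2-3 are held.
frames = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [255, 255, 255, 255],
    [255, 255, 255, 250],
]
masks = ["m0", "m1", "m2", "m3"]
print(deduplicate_masks(frames, masks))  # ['m0', 'm0', 'm2', 'm2']
```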
Sammie-Roto allows passing a video file as an argument when starting the application. On Windows, you can call "run_sammie.bat videofile.mp4". On Linux you would use "run_sammie.command videofile.mp4" instead. I don't believe this works on Mac, but I'm not sure.
The matting tab has a dropdown where you can set the maximum internal resolution used by the MatAnyone matting model. This basically downscales your video before feeding it into MatAnyone, which can make it run faster and use less VRAM. Here are the approximate VRAM requirements at different resolutions:
- 480p: 3GB
- 720p: 4GB
- 1080p: 8GB
- 1440p: 12GB
- 2160p: unknown, probably 24GB
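The list above can be turned into a quick lookup for picking a starting resolution. This is a minimal sketch built only from the approximate figures listed here. `max_matting_resolution` is a hypothetical helper, and the 2160p entry is the same guess as in the list.

```python
# (resolution height, approximate VRAM in GB) taken from the list above.
# The 2160p figure is an unverified estimate.
VRAM_GB_BY_RESOLUTION = [(480, 3), (720, 4), (1080, 8), (1440, 12), (2160, 24)]


def max_matting_resolution(vram_gb):
    """Return the highest listed resolution that fits, or None if none does."""
    best = None
    for height, needed_gb in VRAM_GB_BY_RESOLUTION:
        if vram_gb >= needed_gb:
            best = height
    return best


print(max_matting_resolution(6))   # 720
print(max_matting_resolution(8))   # 1080
print(max_matting_resolution(2))   # None
```

Since the figures are approximate, treat the result as a starting point and step down one level if you hit an out of memory error.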
On the object removal tab, in the MiniMax-Remover mode, there is a dropdown to select the maximum resolution that the object removal is processed at. Higher resolutions will greatly increase both the VRAM required as well as the processing time. The length of the video also increases the resources required. Object removal works best for very short videos. The lowest quality of 352p can run in 6GB of VRAM on a short clip of 80 frames or less.
The VAE tiling checkbox can also help reduce VRAM usage specifically during the VAE encoding/decoding steps. If you get through all of the transformer steps and then it fails with an out of memory error during VAE decoding, try enabling this. Don't enable it if you don't have to, because it will slow things down quite a bit.
In the NVIDIA control panel, there is a setting for CUDA applications to fall back to system memory when you run out of VRAM. On some systems this setting may be enabled by default. You should set this to "prefer no sysmem fallback" in order to disable this feature. It is generally preferable to simply receive an out of memory error instead of letting it fall back to system memory, because running from system memory is MASSIVELY slower than running on your GPU.







