File Taint: Adding ARM for Linux and Wildcard for Files, with PANDA logging#1616
Merged
Pull request overview
This pull request adds ARM architecture support (32-bit and 64-bit) for Linux file tainting in the file_taint plugin and replaces exact filename matching with wildcard pattern matching using POSIX fnmatch.
Key Changes:
- Adds ARM register handling (R0 for 32-bit ARM, X0 for 64-bit ARM) to extract syscall return values for read operations
- Replaces substring-based filename matching with fnmatch-based wildcard pattern matching to support multiple files
- Includes numerous code style improvements (brace formatting consistency)
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| panda/plugins/file_taint/file_taint.cpp | Adds ARM support for Linux read syscalls, replaces filename matching with wildcard pattern matching using fnmatch, adds empty filename validation |
| panda/plugins/file_taint/README.md | Updates documentation to describe new wildcard matching behavior with examples and usage guidance |
| panda/plugins/taint2/taint_api.cpp | Code formatting improvements and adds debug print statement in taint2_label_ram |
| panda/plugins/taint2/taint2.cpp | Code style improvements (brace formatting) and adds informational print statements |
| panda/plugins/taint2/taint2_hypercalls.cpp | Code formatting improvement for multi-line function call |
| panda/debian/setup.sh | Adds TARGET_LIST build argument hardcoded to x86_64-softmmu |
Your checklist for this pull request
Detailed description
This PR completes two changes:
For 32-bit ARM, the system calls are identical to i386
https://github.com/panda-re/panda/blob/dev/panda/plugins/syscalls2/generated/syscalls_ext_typedefs_arm64.h#L1377-L1384
https://github.com/panda-re/panda/blob/dev/panda/plugins/syscalls2/generated/syscalls_ext_typedefs_arm.h#L1589-L1596
For 64-bit ARM, the system calls are identical to x86-64
https://github.com/panda-re/panda/blob/dev/panda/plugins/syscalls2/generated/syscalls_ext_typedefs_arm64.h#L1277-L1284
https://github.com/panda-re/panda/blob/dev/panda/plugins/syscalls2/generated/syscalls_ext_typedefs_arm64.h#L1373-L1380
I'm also assuming that I need to check register 0 on both ARM architectures (R0 on 32-bit, X0 on 64-bit) to get the number of bytes read.
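To make the register-0 assumption concrete, here is a minimal sketch of reading the `read` return value on both ARM variants. The `Arm32State` and `Arm64State` structs and the helper names are hypothetical stand-ins, not PANDA's actual CPU state types or plugin code. What the sketch relies on is the Linux convention that the syscall result comes back in R0 (32-bit) or X0 (64-bit), with failures reported as a negative errno value.

```cpp
#include <cstdint>

// Hypothetical stand-ins for guest CPU state; PANDA's real structures differ.
struct Arm32State { uint32_t regs[16]; };   // R0..R15
struct Arm64State { uint64_t xregs[32]; };  // X0..X30, plus one spare slot

// On 32-bit ARM the syscall result sits in R0. Sign-extend it so a
// negative errno value stays negative in the wider type.
inline int64_t read_retval_arm32(const Arm32State &cpu) {
    return static_cast<int32_t>(cpu.regs[0]);
}

// On 64-bit ARM the syscall result sits in X0.
inline int64_t read_retval_arm64(const Arm64State &cpu) {
    return static_cast<int64_t>(cpu.xregs[0]);
}

// Only a positive byte count leaves anything in the buffer to label.
inline bool has_bytes_to_taint(int64_t retval) {
    return retval > 0;
}
```

The sign extension on the 32-bit path is the detail most easily missed: without it, a failed `read` would look like a very large byte count.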
fnmatch seems to be the best fit for flexible file name matching, since it matches names the same way shells do.
https://man7.org/linux/man-pages/man3/fnmatch.3.html
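As a rough illustration of the difference between substring matching and `fnmatch`, the sketch below puts the two side by side. The helper names `substring_match` and `wildcard_match` are made up for this example, and passing `0` as the flags argument is an assumption; the plugin may use different flags.

```cpp
#include <fnmatch.h>
#include <string>

// Substring test, standing in for the older matching style.
inline bool substring_match(const std::string &target, const std::string &path) {
    return path.find(target) != std::string::npos;
}

// Shell-style wildcard test. fnmatch() returns 0 when the whole string
// matches the pattern. Flags of 0 are an assumption for this sketch.
inline bool wildcard_match(const std::string &pattern, const std::string &path) {
    return fnmatch(pattern.c_str(), path.c_str(), 0) == 0;
}
```

One consequence worth noting: `fnmatch` matches the entire string, so a bare fragment such as `small` no longer matches `./toy/inputs/small-1.bin`. A pattern such as `*small*` is needed to get the old substring behavior.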
...
Test plan
With LAVA, I will test running with a wildcard pattern such as ./toy/inputs/*, which should taint files such as ./toy/inputs/small-1.bin and ./toy/inputs/small-2.bin.
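A quick way to sanity-check which paths a pattern like this selects is sketched below. `select_matches` is a hypothetical helper, not part of the plugin, and the `flags` parameter is there only to show how `FNM_PATHNAME` changes the result; which flags the plugin actually passes is not stated here.

```cpp
#include <fnmatch.h>
#include <string>
#include <vector>

// Return the subset of `paths` that fnmatch() accepts for `pattern`.
// With flags == 0, '*' may also match '/' characters; with FNM_PATHNAME,
// '*' stops at directory separators.
inline std::vector<std::string> select_matches(const std::string &pattern,
                                               const std::vector<std::string> &paths,
                                               int flags = 0) {
    std::vector<std::string> selected;
    for (const std::string &path : paths) {
        if (fnmatch(pattern.c_str(), path.c_str(), flags) == 0) {
            selected.push_back(path);
        }
    }
    return selected;
}
```

For example, given the two files from the test plan plus a made-up nested file `./toy/inputs/sub/deep.bin`, the pattern `./toy/inputs/*` selects all three with flags `0`, but only the two top-level files with `FNM_PATHNAME`.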
I can confirm, based on LAVA logs, that two files, testbig.bin and testsmall.bin, were tainted using the wildcard. Additionally, it appears that taint2 works on both files. I added a debug print on label_ram JUST to make sure.
When testing originally, I saw the taint2 hypercall warning, hence that debug message, but I guess I was needlessly spooked.
bug_mining.log
Also, I added PANDA logging for any newly tainted file.

This is useful when your log captures multiple file taints, because you can figure out which taints belong to which files!
PSA: If you use Python to convert the PANDA log to JSON, you MUST use the updated version of PyPanda that will be built with this PR. You should then be able to see the FileMatchTaint instance.
You can find it in Python under pandare/plog_pb2.py by searching for "file_taint_match".
...
Closing issues
N/A
...