Commit d42b34a
fix: handle invalid UTF-8 in Ruby and Vue preprocessors (#19588)
## Summary
This PR fixes a panic that occurs when the Ruby or Vue preprocessors
encounter files with invalid UTF-8 bytes.
**The issue:**
- `ruby.rs:37` and `vue.rs:18` used
`std::str::from_utf8(content).unwrap()`
- This panics when processing files containing invalid UTF-8 bytes
**Error message:**
```
thread panicked at crates/oxide/src/extractor/pre_processors/ruby.rs:37:59:
called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 45, error_len: Some(1) }
```
**The fix:**
- Wrap UTF-8 conversion in `if let Ok(...)` to gracefully handle invalid
UTF-8
- Skip regex-based template extraction when UTF-8 conversion fails
- Allow byte-level processing to continue (in Ruby's case)
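
The three points above can be sketched roughly as follows. This is a minimal illustration, not the actual preprocessor code: `process` and `extract_templates` are hypothetical names, and the newline count merely stands in for whatever byte-level work the real Ruby preprocessor performs.

```rust
/// Hypothetical stand-in for the regex-based template extraction step.
/// It only ever receives valid UTF-8 text.
fn extract_templates(text: &str) -> Vec<String> {
    text.lines()
        .filter(|line| line.contains("class="))
        .map(|line| line.trim().to_string())
        .collect()
}

/// Hypothetical preprocessor entry point that does not panic on invalid UTF-8.
fn process(content: &[u8]) -> (Vec<String>, usize) {
    let mut templates = Vec::new();

    // Before the fix: `std::str::from_utf8(content).unwrap()` panicked here.
    // After the fix: string-based extraction is skipped when conversion fails.
    if let Ok(text) = std::str::from_utf8(content) {
        templates = extract_templates(text);
    }

    // Byte-level processing (a placeholder here) continues either way.
    let newline_count = content.iter().filter(|&&b| b == b'\n').count();

    (templates, newline_count)
}
```

With this shape, a file containing stray bytes such as `\xff` yields no extracted templates but still goes through the byte-level pass, instead of aborting the whole scan.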
Invalid UTF-8 input can occur in Rails projects when:
- Binary files are inadvertently scanned
- Files contain non-UTF-8 encodings
- Files are truncated at multi-byte character boundaries during parallel
processing
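
To make these cases concrete, the sketch below shows how `std::str::from_utf8` reports different kinds of invalid input. The helper name `utf8_failure` is an assumption introduced for illustration; `valid_up_to` and `error_len` are the standard `Utf8Error` accessors that also appear in the panic message above.

```rust
/// Hypothetical helper: returns `None` for valid UTF-8, otherwise
/// `(valid_up_to, error_len)` taken from the standard `Utf8Error`.
fn utf8_failure(bytes: &[u8]) -> Option<(usize, Option<usize>)> {
    std::str::from_utf8(bytes)
        .err()
        .map(|e| (e.valid_up_to(), e.error_len()))
}

/// Three representative inputs, as (label, bytes) pairs.
fn samples() -> Vec<(&'static str, Vec<u8>)> {
    vec![
        // Valid multi-byte text: "é" is encoded as 0xC3 0xA9.
        ("valid multibyte", "café".as_bytes().to_vec()),
        // Truncated in the middle of "é": the input ends after 0xC3.
        ("truncated", b"caf\xC3".to_vec()),
        // A byte that can never appear in UTF-8, as in a binary file.
        ("stray byte", b"ab\xFFcd".to_vec()),
    ]
}
```

For the truncated sample, `error_len` is `None` because the input ends in the middle of a character; for the stray byte it is `Some(1)`. The `error_len: Some(1)` in the panic message therefore corresponds to an unexpected byte inside the file rather than a character cut off at the end of the input.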
## Test plan
- [x] Added `test_invalid_utf8_does_not_panic` test for Ruby
preprocessor
- [x] Added `test_valid_utf8_with_multibyte_chars` test for Ruby
preprocessor
- [x] Added `test_invalid_utf8_does_not_panic` test for Vue preprocessor
- [x] All existing tests pass (`cargo test pre_processors` - 43 tests)
---------
Co-authored-by: Robin Malfait <malfait.robin@gmail.com>

1 parent 0612ddc commit d42b34a
3 files changed
Lines changed: 81 additions & 43 deletions