Exclude development scripts and fuzzing data #1319

Closed
weiznich wants to merge 1 commit into rust-lang:master from GiGainfosystems:exclude_unneccessary_stuff

@weiznich
Contributor

During a dependency review we noticed that several regex-* crates include various development scripts and sometimes fuzzing data. These development scripts shouldn't be there, as they might at some point become problematic. As of now they prevent any downstream user from enabling the [bans.build.interpreted] option of cargo deny. The fuzzing data are also not needed for anything.

I opted for an explicit include list rather than an exclude list to keep these files out of the published packages, so that everything that is included is a conscious choice.
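
For readers unfamiliar with the cargo deny option mentioned above, the fragment below is a minimal sketch of how a downstream project might enable it in its `deny.toml`. It is an illustration based on the cargo-deny configuration format, not something taken from this PR; check the key names and accepted values against the cargo-deny version in use.

```toml
# deny.toml of a hypothetical downstream project (illustration only)
[bans.build]
# Assumption: "deny" turns interpreted scripts (shell, Python, ...) found
# in a dependency's published package into a hard error.
interpreted = "deny"
```

With a setting like this, any script shipped inside a published crate can cause the check to fail, which is why it matters what ends up in the package.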
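
The difference between the two approaches can be sketched with a hypothetical manifest. The crate name and the patterns below are assumptions for illustration, not the contents of this PR.

```toml
# Hypothetical Cargo.toml, for illustration only.
[package]
name = "example-crate"  # hypothetical crate name
version = "0.1.0"
edition = "2021"

# Allow-list: only paths matching these gitignore-style patterns are
# packaged, so a newly added script is left out unless it is listed here.
include = [
    "Cargo.toml",
    "LICENSE-MIT",
    "LICENSE-APACHE",
    "README.md",
    "src/**/*.rs",
]

# Deny-list alternative. Cargo treats the two keys as mutually exclusive
# and `include` takes precedence, so this stays commented out. With a
# deny-list, every new unwanted file has to be added by hand.
# exclude = ["test", "tests/fuzz/testdata/**"]
```

Running `cargo package --list` prints the files that would be packaged, which is one way to check the effect of either list.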

@weiznich
Contributor Author

weiznich commented Feb 3, 2026

@BurntSushi Any chance you could take a look at this PR?

Cargo.toml Outdated
]

[workspace.package]
include = ["CHANGELOG.md", "Cargo.toml", "LICENSE-MIT", "LICENSE-APACHE", "README.md", "src/**/*.rs"]
Member

I don't think this is quite including everything we were including before (while excluding the bits you mention). For example, this doesn't include tests.

Also, please wrap lines to 79 columns (inclusive).
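
As a sketch of what both requests could look like together, the workspace-level list can be written one entry per line, which keeps every line well under 79 columns, with an extra pattern for the test sources. The `tests/**/*.rs` pattern is an assumption for illustration and is not necessarily what was merged.

```toml
# Sketch only: the pattern set is an assumption, not the merged list.
[workspace.package]
include = [
    "CHANGELOG.md",
    "Cargo.toml",
    "LICENSE-MIT",
    "LICENSE-APACHE",
    "README.md",
    "src/**/*.rs",
    "tests/**/*.rs",  # assumed pattern for the test sources
]

# In a member crate's own Cargo.toml, the list would then be inherited:
# [package]
# include.workspace = true
```

Patterns inherited this way are resolved relative to each member package's root rather than the workspace root.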

Contributor Author

Thanks for your response. I've updated the include list to cover tests and also most of the other files that were included before. The excluded files are:

  • Fuzzing artifacts (these were already partially removed from regex before this PR, but not from the other crates)
  • Test data (not sure about that one?)
  • Test scripts
  • General development environment setup (vim config, rustfmt.toml, Cross.toml)

Detailed changes:

Regex

Included files:

.cargo_vcs_info.json
CHANGELOG.md
Cargo.lock
Cargo.toml
Cargo.toml.orig
LICENSE-APACHE
LICENSE-MIT
README.md
UNICODE.md
bench/README.md
src/builders.rs
src/bytes.rs
src/error.rs
src/find_byte.rs
src/lib.rs
src/pattern.rs
src/regex/bytes.rs
src/regex/mod.rs
src/regex/string.rs
src/regexset/bytes.rs
src/regexset/mod.rs
src/regexset/string.rs
tests/lib.rs
tests/misc.rs
tests/regression.rs
tests/regression_fuzz.rs
tests/replace.rs
tests/searcher.rs
tests/suite_bytes.rs
tests/suite_bytes_set.rs
tests/suite_string.rs
tests/suite_string_set.rs

The following files were removed:

-.gitignore
-.vim/coc-settings.json
-Cross.toml
-rustfmt.toml
-test
-testdata/README.md
-testdata/anchored.toml
-testdata/bytes.toml
-testdata/crazy.toml
-testdata/crlf.toml
-testdata/earliest.toml
-testdata/empty.toml
-testdata/expensive.toml
-testdata/flags.toml
-testdata/fowler/basic.toml
-testdata/fowler/dat/README
-testdata/fowler/dat/basic.dat
-testdata/fowler/dat/nullsubexpr.dat
-testdata/fowler/dat/repetition.dat
-testdata/fowler/nullsubexpr.toml
-testdata/fowler/repetition.toml
-testdata/iter.toml
-testdata/leftmost-all.toml
-testdata/line-terminator.toml
-testdata/misc.toml
-testdata/multiline.toml
-testdata/no-unicode.toml
-testdata/overlapping.toml
-testdata/regex-lite.toml
-testdata/regression.toml
-testdata/set.toml
-testdata/substring.toml
-testdata/unicode.toml
-testdata/utf8.toml
-testdata/word-boundary-special.toml
-testdata/word-boundary.toml

Regex-automata

Included files:

.cargo_vcs_info.json
Cargo.lock
Cargo.toml
Cargo.toml.orig
LICENSE-APACHE
LICENSE-MIT
README.md
src/dfa/accel.rs
src/dfa/automaton.rs
src/dfa/dense.rs
src/dfa/determinize.rs
src/dfa/minimize.rs
src/dfa/mod.rs
src/dfa/onepass.rs
src/dfa/regex.rs
src/dfa/remapper.rs
src/dfa/search.rs
src/dfa/sparse.rs
src/dfa/special.rs
src/dfa/start.rs
src/hybrid/dfa.rs
src/hybrid/error.rs
src/hybrid/id.rs
src/hybrid/mod.rs
src/hybrid/regex.rs
src/hybrid/search.rs
src/lib.rs
src/macros.rs
src/meta/error.rs
src/meta/limited.rs
src/meta/literal.rs
src/meta/mod.rs
src/meta/regex.rs
src/meta/reverse_inner.rs
src/meta/stopat.rs
src/meta/strategy.rs
src/meta/wrappers.rs
src/nfa/mod.rs
src/nfa/thompson/backtrack.rs
src/nfa/thompson/builder.rs
src/nfa/thompson/compiler.rs
src/nfa/thompson/error.rs
src/nfa/thompson/literal_trie.rs
src/nfa/thompson/map.rs
src/nfa/thompson/mod.rs
src/nfa/thompson/nfa.rs
src/nfa/thompson/pikevm.rs
src/nfa/thompson/range_trie.rs
src/util/alphabet.rs
src/util/captures.rs
src/util/determinize/mod.rs
src/util/determinize/state.rs
src/util/empty.rs
src/util/escape.rs
src/util/int.rs
src/util/interpolate.rs
src/util/iter.rs
src/util/lazy.rs
src/util/look.rs
src/util/memchr.rs
src/util/mod.rs
src/util/pool.rs
src/util/prefilter/aho_corasick.rs
src/util/prefilter/byteset.rs
src/util/prefilter/memchr.rs
src/util/prefilter/memmem.rs
src/util/prefilter/mod.rs
src/util/prefilter/teddy.rs
src/util/primitives.rs
src/util/search.rs
src/util/sparse_set.rs
src/util/start.rs
src/util/syntax.rs
src/util/unicode_data/mod.rs
src/util/unicode_data/perl_word.rs
src/util/utf8.rs
src/util/wire.rs
tests/dfa/api.rs
tests/dfa/mod.rs
tests/dfa/onepass/mod.rs
tests/dfa/onepass/suite.rs
tests/dfa/regression.rs
tests/dfa/suite.rs
tests/fuzz/dense.rs
tests/fuzz/mod.rs
tests/fuzz/sparse.rs
tests/gen/dense/mod.rs
tests/gen/dense/multi_pattern_v2.rs
tests/gen/mod.rs
tests/gen/sparse/mod.rs
tests/gen/sparse/multi_pattern_v2.rs
tests/hybrid/api.rs
tests/hybrid/mod.rs
tests/hybrid/suite.rs
tests/lib.rs
tests/meta/mod.rs
tests/meta/suite.rs
tests/nfa/mod.rs
tests/nfa/thompson/backtrack/mod.rs
tests/nfa/thompson/backtrack/suite.rs
tests/nfa/thompson/mod.rs
tests/nfa/thompson/pikevm/mod.rs
tests/nfa/thompson/pikevm/suite.rs

Removed files:

-tests/fuzz/testdata/deserialize_dense_crash-9486fb7c8a93b12c12a62166b43d31640c0208a9
-tests/fuzz/testdata/deserialize_dense_minimized-from-9486fb7c8a93b12c12a62166b43d31640c0208a9
-tests/fuzz/testdata/deserialize_sparse_crash-0da59c0434eaf35e5a6b470fa9244bb79c72b000
-tests/fuzz/testdata/deserialize_sparse_crash-18cfc246f2ddfc3dfc92b0c7893178c7cf65efa9
-tests/fuzz/testdata/deserialize_sparse_crash-61fd8e3003bf9d99f6c1e5a8488727eefd234b98
-tests/fuzz/testdata/deserialize_sparse_crash-a1b839d899ced76d5d7d0f78f9edb7a421505838
-tests/fuzz/testdata/deserialize_sparse_crash-c383ae07ec5e191422eadc492117439011816570
-tests/fuzz/testdata/deserialize_sparse_crash-d07703ceb94b10dcd9e4acb809f2051420449e2b
-tests/fuzz/testdata/deserialize_sparse_crash-dbb8172d3984e7e7d03f4b5f8bb86ecd1460eff9
-tests/gen/README.md
-test
-tests/gen/dense/multi_pattern_v2_fwd.bigendian.dfa
-tests/gen/dense/multi_pattern_v2_fwd.littleendian.dfa
-tests/gen/dense/multi_pattern_v2_rev.bigendian.dfa
-tests/gen/dense/multi_pattern_v2_rev.littleendian.dfa
-tests/gen/sparse/multi_pattern_v2_fwd.bigendian.dfa
-tests/gen/sparse/multi_pattern_v2_fwd.littleendian.dfa
-tests/gen/sparse/multi_pattern_v2_rev.bigendian.dfa
-tests/gen/sparse/multi_pattern_v2_rev.littleendian.dfa

Regex-capi

Included files:

Cargo.lock
Cargo.toml
Cargo.toml.orig
LICENSE-APACHE
LICENSE-MIT
README.md
ctest/test.c
examples/iter.c
examples/sherlock.txt
include/rure.h
src/error.rs
src/lib.rs
src/macros.rs
src/rure.rs

The following files were removed:

-ctest/.gitignore
-ctest/compile
-examples/.gitignore
-examples/compile
-test

Regex-CLI

Included files:

.cargo_vcs_info.json
Cargo.lock
Cargo.toml
Cargo.toml.orig
LICENSE-APACHE
LICENSE-MIT
README.md
args/api.rs
args/backtrack.rs
args/common.rs
args/dfa.rs
args/flags.rs
args/haystack.rs
args/hybrid.rs
args/input.rs
args/lite.rs
args/meta.rs
args/mod.rs
args/onepass.rs
args/overlapping.rs
args/patterns.rs
args/pikevm.rs
args/syntax.rs
args/thompson.rs
cmd/compile_test.rs
cmd/debug/dfa.rs
cmd/debug/literal.rs
cmd/debug/mod.rs
cmd/find/capture/dfa.rs
cmd/find/capture/mod.rs
cmd/find/capture/nfa.rs
cmd/find/half/dfa.rs
cmd/find/half/mod.rs
cmd/find/match/dfa.rs
cmd/find/match/mod.rs
cmd/find/match/nfa.rs
cmd/find/mod.rs
cmd/find/which/dfa.rs
cmd/find/which/mod.rs
cmd/find/which/nfa.rs
cmd/generate/fowler.rs
cmd/generate/mod.rs
cmd/generate/serialize/dfa.rs
cmd/generate/serialize/mod.rs
cmd/generate/unicode.rs
cmd/mod.rs
logger.rs
main.rs
util.rs

Removed files: None

Regex-lite

Included files:

.cargo_vcs_info.json
Cargo.lock
Cargo.toml
Cargo.toml.orig
LICENSE-APACHE
LICENSE-MIT
README.md
src/error.rs
src/hir/mod.rs
src/hir/parse.rs
src/int.rs
src/interpolate.rs
src/lib.rs
src/nfa.rs
src/pikevm.rs
src/pool.rs
src/string.rs
src/utf8.rs

tests/lib.rs
tests/string.rs

Removed files:

-tests/fuzz/mod.rs
-tests/fuzz/testdata/crash-a886ce2b0d64963f1232f9b08b8c9ad4740c26f5
-tests/fuzz/testdata/minimized-from-298f84f9dbb2589cb9938a63334fa4083b609f34

Regex-syntax

Included files:

.cargo_vcs_info.json
Cargo.lock
Cargo.toml
Cargo.toml.orig
LICENSE-APACHE
LICENSE-MIT
README.md
benches/bench.rs
src/ast/mod.rs
src/ast/parse.rs
src/ast/print.rs
src/ast/visitor.rs
src/debug.rs
src/either.rs
src/error.rs
src/hir/interval.rs
src/hir/literal.rs
src/hir/mod.rs
src/hir/print.rs
src/hir/translate.rs
src/hir/visitor.rs
src/lib.rs
src/parser.rs
src/rank.rs
src/unicode.rs
src/unicode_tables/LICENSE-UNICODE
src/unicode_tables/age.rs
src/unicode_tables/case_folding_simple.rs
src/unicode_tables/general_category.rs
src/unicode_tables/grapheme_cluster_break.rs
src/unicode_tables/mod.rs
src/unicode_tables/perl_decimal.rs
src/unicode_tables/perl_space.rs
src/unicode_tables/perl_word.rs
src/unicode_tables/property_bool.rs
src/unicode_tables/property_names.rs
src/unicode_tables/property_values.rs
src/unicode_tables/script.rs
src/unicode_tables/script_extension.rs
src/unicode_tables/sentence_break.rs
src/unicode_tables/word_break.rs
src/utf8.rs
test

Removed files:

-benches/bench.rs
-test

Member

Ah yeah testdata should be included. I did that in #1332.

@weiznich force-pushed the exclude_unneccessary_stuff branch from c736e89 to f4fc618 on February 3, 2026 12:56
Member

@BurntSushi BurntSushi left a comment

Thanks!

BurntSushi pushed a commit that referenced this pull request Feb 3, 2026
During a dependency review we noticed that several regex-* crates
include various development scripts and sometimes fuzzing data. These
development scripts shouldn't be there, as they might at some point
become problematic. As of now they prevent any downstream user from
enabling the `[bans.build.interpreted]` option of cargo deny. The
fuzzing data are also not needed for anything.

I opted for an explicit include list rather than an exclude list to
keep these files out of the published packages, so that everything
that is included is a conscious choice.

Closes #1319

@BurntSushi
Member

I can't push to this PR, so I opened a new one with a changelog entry: #1332

@weiznich
Contributor Author

weiznich commented Feb 3, 2026

(For reference, I'm just writing a summary of what is now included and what is removed, so it might make sense to wait a few more minutes before merging, just to make sure everyone is aware of what happens.)

pull bot pushed a commit to dumpmemory/regex that referenced this pull request Feb 3, 2026