greedy capture with sed
I am trying to greedily capture text with sed.
For example, I have the string abbbc, and I want to capture all of the repeated b characters, so that my result is bbb.
Here's an attempt at a solution:
$ sed -n 's/.*\(b\+\).*/\1/p' <<< abbbc
b
As shown in the output of the command, the capture only obtains a single b rather than my desired result bbb.
I know I could prepend and append the "not b" pattern ([^b]) to my capture, which would give me the desired result:
$ sed -n 's/.*[^b]\(b\+\)[^b].*/\1/p' <<< abbbc
bbb
However, this solution is a bit inelegant, and it will get much more complicated when the pattern is less simple than a single repeated character. So I'm hoping there's another way to force the capture to be greedy.
1 answer
The b\+ part of the regex is already greedy; in sed, all repetitions are greedy. Your problem is that the initial .* is greedy too, and it comes first. It gobbles up the a and as many bs as it can while still leaving one b for b\+ to match, so the group captures only that last b. For this example, you can change the leading .* to [^b]*, which cannot consume any b:
$ sed -n 's/[^b]*\(b\+\).*/\1/p' <<< abbbc
bbb
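To see where the match is split, one can capture what the leading pattern consumes as well as the run of bs. This is only a sketch: show_split is a made-up helper name, and \+ assumes GNU sed, as in the commands above.

```shell
# show_split PATTERN STRING  (hypothetical helper, for illustration only)
# Prints [what the leading PATTERN consumed][what b\+ captured].
show_split() {
  printf '%s\n' "$2" | sed -n "s/\($1\)\(b\+\).*/[\1][\2]/p"
}

show_split '.*' abbbc      # [abb][b]
show_split '[^b]*' abbbc   # [a][bbb]
```

With .* in front, the first group takes abb and leaves a single b for the second; with [^b]* it stops at the a, so the whole run is captured.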
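For more complicated situations, sed is unlikely to cut it, since its regular expressions have no non-greedy repetition to hold the leading .* back. grep -o, which prints only the part of the line that matched, might be a more natural fit for what you're trying to do anyway: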
$ grep -o 'b\+' <<< abbbc
bbb
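One thing to keep in mind is that grep -o prints every match on its own line, so an input with several runs of b yields several lines. A minimal sketch, assuming GNU grep for \+; b_runs and first_b_run are hypothetical helper names, not part of any tool:

```shell
# b_runs STRING  (hypothetical helper): print every run of b, one per line.
b_runs() {
  printf '%s\n' "$1" | grep -o 'b\+'
}

# first_b_run STRING  (hypothetical helper): keep only the first run.
first_b_run() {
  b_runs "$1" | head -n 1
}

b_runs abbbcbbd        # bbb, then bb on the next line
first_b_run abbbcbbd   # bbb
```

If only one run is wanted, filtering the output with head (or tail) is one way to pick it.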
