greedy capture with sed


I am trying to greedily capture text with sed. For example, I have the string abbbc, and I want to capture all of the repeated b characters, so that my result is bbb. Here's an attempt at a solution:

$ sed -n 's/.*\(b\+\).*/\1/p' <<< abbbc
b

As shown in the output of the command, the capture only obtains a single b rather than my desired result bbb.

I know I could prepend and append the "not b" pattern ([^b]) to my capture, which would give me the desired result:

$ sed -n 's/.*[^b]\(b\+\)[^b].*/\1/p' <<< abbbc
bbb

However, this solution is a bit inelegant, and it will get much more complicated once the pattern is less simple than a single repeated character. So I'm hoping there's another way to force the capture to be greedy.

sed regex

posted 8 months ago

CC BY-SA 4.0

Trevor‭



1 answer


Marked as "Works for me" by Trevor on Jun 1, 2025 at 02:27.

The b\+ part of the regex is already greedy; in sed, all repetitions are greedy. Your problem is that the initial .* is also greedy and comes first, so it gobbles up the a and as many bs as it can, leaving b\+ only the single b it needs in order to match at all. For this example, you can change that part to [^b]*, which cannot consume any b:

$ sed -n 's/[^b]*\(b\+\).*/\1/p' <<< abbbc
bbb
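One way to see where the bs go is to capture the leading .* as well and print both groups. This is only a sketch: it assumes GNU sed (for \+), the function name show_split is made up for illustration, and printf is used instead of a here-string just so it also runs in plain sh.

```shell
# show_split is a hypothetical helper, not part of sed.
# It prints what the leading .* captured (group 1)
# and what b\+ captured (group 2), each in brackets.
show_split() {
  printf '%s\n' "$1" | sed -n 's/\(.*\)\(b\+\).*/[\1][\2]/p'
}

show_split abbbc   # prints [abb][b]
```

The first group takes abb, so only one b is left for the second.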

sed has no non-greedy quantifiers, so for more complicated situations it is unlikely to cut it. grep -o, which prints only the matched text, might be a more natural fit for what you're trying to do anyway.

$ grep -o 'b\+' <<< abbbc
bbb
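One difference to keep in mind: grep -o prints every match on its own line, while the sed substitution above keeps only the first run of bs. A small sketch, assuming GNU grep and GNU sed; the input abbbcbb and the function names all_runs and first_run are made up for illustration.

```shell
# Hypothetical helpers for comparing the two approaches.
# all_runs: every run of b, one per line (grep -o).
all_runs() {
  printf '%s\n' "$1" | grep -o 'b\+'
}

# first_run: only the first run of b (sed substitution).
first_run() {
  printf '%s\n' "$1" | sed -n 's/[^b]*\(b\+\).*/\1/p'
}

all_runs abbbcbb    # prints bbb, then bb
first_run abbbcbb   # prints bbb
```

Which behavior you want depends on whether a line can contain more than one match.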

posted 8 months ago

CC BY-SA 4.0

r~~‭
