I want to use this regex:
r"Summe\d+\W\d+"
to match this string:
150,90‡50,90‡8,13‡Summe50,90•50,90•8,13•Kreditkartenzahlung
but I only want to extract this specific part:
Summe50,90
With this regex I can select the entire string, but I'm not sure how to extract only the matching part.
Here is the function it's used in, where I'm trying to get the amount from a PDF:
import re

import PyPDF2
import requests

def get_amount(url):
    # Download the PDF and save it to disk
    data = requests.get(url)
    pdf_path = '/Users/derricdonehoo/code/derric-d/price-processor/exmpl.pdf'
    with open(pdf_path, 'wb') as f:
        f.write(data.content)

    # Extract the text of the first page (legacy PyPDF2 API, pre-3.0)
    with open(pdf_path, 'rb') as pdfFileObj:
        pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
        pageObj = pdfReader.getPage(0)
        text = pageObj.extractText().split()

    regex = re.compile(r"Summe\d+\W\d+")
    # filter() keeps every whole token in which the regex finds a match
    matches = list(filter(regex.search, text))
    print('\n'.join(matches))
As described above, I'd like guidance on how best to extract just the matching portion of this string, ideally in a way that works with a varying number of characters on either side, though that's not a priority.
Thanks!
Solution:
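Your regex is correct, but filter(regex.search, text) only uses the search result as a true/false test, so it keeps the whole list items that contain a match. To get just the matched text, you must take it from the match object after searching: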
import re

regex = re.compile(r"Summe\d+\W\d+")
text = ["150,90‡50,90‡8,13‡Summe50,90•50,90•8,13•Kreditkartenzahlung"]
matches = []
for t in text:
    m = regex.search(t)
    if m:
        # group(0) is only the matched substring, not the whole input string
        matches.append(m.group(0))
print(matches)  # ['Summe50,90']
search() returns a Match object on success and None on failure, and that object contains all the information about the match (the matched text, its start and end positions, and any groups). To get the whole match, call Match.group(), which is equivalent to Match.group(0).
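If you also want to skip the .split() step (which makes the amount of text on either side irrelevant) and pull out the number itself, a capture group helps. The following is a minimal sketch, assuming the page text is available as one string and the amount uses a decimal comma as in the example; extract_sums and extract_amount are hypothetical helper names, not part of any library. finditer yields a Match object for every hit anywhere in the string, and group(1) returns only the part inside the parentheses.

```python
import re

# Capture group (...) isolates the amount that follows "Summe".
# Assumption: the amount is digits, one non-word separator (e.g. a decimal comma), digits.
SUMME_RE = re.compile(r"Summe(\d+\W\d+)")


def extract_sums(page_text):
    """Hypothetical helper: return every 'SummeNN,NN' substring in page_text."""
    # finditer scans the whole string, so surrounding text of any length is fine
    return [m.group(0) for m in SUMME_RE.finditer(page_text)]


def extract_amount(page_text):
    """Hypothetical helper: return the first amount after 'Summe' as a float, or None."""
    m = SUMME_RE.search(page_text)
    if m is None:
        return None
    # Assumption: the separator is a decimal comma, so replace it with a dot
    return float(re.sub(r"\W", ".", m.group(1)))


text = "150,90‡50,90‡8,13‡Summe50,90•50,90•8,13•Kreditkartenzahlung"
print(extract_sums(text))    # ['Summe50,90']
print(extract_amount(text))  # 50.9
```

Inside get_amount, you could call these on pageObj.extractText() directly instead of splitting it into tokens first. Note that amounts with thousands separators (e.g. 1.234,56) would need a different pattern.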

