Amazon Standard Identification Number (ASIN)#104

Merged
bee-san merged 11 commits into bee-san:main from mimiflynn:main on Jul 7, 2021

Conversation

@mimiflynn
Contributor

No description provided.

@ghost left a comment

Also, please format the tests using black.

@mimiflynn
Contributor Author

I'm not quite sure what's happening: with the hardcoded boundaries, ^((?:[/dp/]|$)([A-Z0-9]{10}))$, the tests fail, and without them, (?:[/dp/]|$)([A-Z0-9]{10}), they pass. I'm looking into it more, but if you have any quick insights, I'd appreciate the feedback.

Thanks!
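The unanchored pattern above can be probed with a short Python sketch (the variable names are illustrative, not from the PR). It shows that [/dp/] is a character class matching a single '/', 'd' or 'p' rather than the literal segment /dp/, and that the |$ alternative can never be followed by ten more characters, so only the class branch ever contributes to a match.

```python
import re

URL = "http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C"

# [/dp/] is a character class: it matches ONE character, '/', 'd' or 'p'.
# The second '/' inside the brackets is redundant.
char_class = re.compile(r"(?:[/dp/]|$)([A-Z0-9]{10})")

# (?:/dp/) is a non-capturing group: it matches the literal text "/dp/".
literal_segment = re.compile(r"(?:/dp/)([A-Z0-9]{10})")

m1 = char_class.search(URL)
m2 = literal_segment.search(URL)

print(m1.group(0))  # /B0015T963C  (only the '/' right before the ASIN)
print(m2.group(0))  # /dp/B0015T963C
print(m1.group(1), m2.group(1))  # B0015T963C B0015T963C
```

Both patterns capture the same ten characters here; they differ only in how much of the URL prefix they consume.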

@amadejpapez
Collaborator

This is happening because with hardcoded boundaries the whole match needs to be in one line with nothing else around it.

http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C

Your regex only matches /dp/B0015T963C, so with the boundaries it fails because there is something before it. If you add only this part as a test, it will pass. 😄
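The effect of the boundaries can be reproduced with Python's re module, using the two patterns from the question above (the test strings other than the URL are illustrative). Without ^ and $, re.search returns the first matching substring anywhere in the input; with them, the match has to cover the whole input.

```python
import re

URL = "http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C"

without_boundaries = r"(?:[/dp/]|$)([A-Z0-9]{10})"
with_boundaries = r"^((?:[/dp/]|$)([A-Z0-9]{10}))$"

# Unanchored: re.search returns the first matching substring anywhere in the input.
found = re.search(without_boundaries, URL)
print(found.group(0))  # /B0015T963C

# Anchored: ^ and $ force the match to span the whole input, so the URL is rejected.
print(re.search(with_boundaries, URL))  # None

# The anchored pattern accepts exactly one of '/', 'd', 'p' plus ten characters,
# so it also rejects a bare ASIN; accepting one requires removing that prefix.
print(re.search(with_boundaries, "/B0015T963C") is not None)  # True
print(re.search(with_boundaries, "B0015T963C"))  # None
```

This is the difference between extracting a value from a longer string and checking whether a string is that value.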

@mimiflynn
Contributor Author

Oh, I see! While using regex101 I had lost sight of the main use of pyWhat: I was extracting ASINs from strings instead of checking whether a string is an ASIN. Updating now with a new regex that removes the URL-specific parts.

@bee-san
Owner

bee-san commented Jul 7, 2021

I think in boundaryless mode we do extract that string, but we need the regex to have boundaries in our database so we can take them away in boundaryless mode :)
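The idea of storing each pattern with its boundaries and deriving an unanchored version for boundaryless mode can be sketched as follows. This is a minimal illustration, assuming a hypothetical strip_boundaries helper and an illustrative stored pattern; neither is pyWhat's actual code nor the ASIN regex merged in this PR.

```python
import re
from typing import List

URL = "http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C"

# Illustrative stored entry only; this is NOT the ASIN regex merged in this PR.
STORED_PATTERN = r"^([A-Z0-9]{10})$"


def strip_boundaries(pattern: str) -> str:
    """Hypothetical helper: drop a leading '^' and a trailing '$' from a stored pattern."""
    # Assumes the anchors wrap the whole pattern and the final '$' is not escaped.
    if pattern.startswith("^"):
        pattern = pattern[1:]
    if pattern.endswith("$") and not pattern.endswith("\\$"):
        pattern = pattern[:-1]
    return pattern


def identify(text: str) -> bool:
    """Default mode: the entire input has to be the identifier."""
    return re.search(STORED_PATTERN, text) is not None


def extract(text: str) -> List[str]:
    """Boundaryless mode: same pattern, anchors removed, matched anywhere in the text."""
    return re.findall(strip_boundaries(STORED_PATTERN), text)


print(strip_boundaries(STORED_PATTERN))  # ([A-Z0-9]{10})
print(identify("B0015T963C"))  # True
print(identify(URL))  # False
print(extract(URL))  # ['B0015T963C']
```

Storing the anchored form keeps identification strict by default, while the derived unanchored form can still pull an ASIN out of a longer string such as the product URL.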

@bee-san bee-san requested a review from amadejpapez July 7, 2021 09:59
@bee-san bee-san merged commit 72a894d into bee-san:main Jul 7, 2021