Skip to content

Fix same link in different file error#26

Merged
bwplotka merged 2 commits intobwplotka:mainfrom
saswatamcode:same-link-fix
May 14, 2021
Merged

Fix same link in different file error#26
bwplotka merged 2 commits intobwplotka:mainfrom
saswatamcode:same-link-fix

Conversation

@saswatamcode
Copy link
Copy Markdown
Collaborator

Currently, if there are same links but in different files (come under same file glob passed to mdox) like here and here, mdox throws the error below,

remote link https://slack.cncf.io/: URL already visited

Even though the link is valid.
This PR fixes this and adds a test case for the same.

@saswatamcode saswatamcode requested a review from bwplotka May 13, 2021 13:12
@saswatamcode saswatamcode self-assigned this May 13, 2021
@saswatamcode
Copy link
Copy Markdown
Collaborator Author

Linter failure seems to be due to goreportcard.com being down.

@bwplotka
Copy link
Copy Markdown
Owner

You should be able to retry @saswatamcode since you are maintainer now!

@bwplotka
Copy link
Copy Markdown
Owner

Hm, do you know why this error was created in the first place? (:

@saswatamcode
Copy link
Copy Markdown
Collaborator Author

I think this error was created to improve performance and avoid duplicate requests in colly. But here it seems to cause this issue...is there a better way to handle this other than using AllowURLRevisit? (:

Also, goreportcard seems to still be down so check would fail if I rerun...

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Copy link
Copy Markdown
Owner

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, one suggestion, otherwise LGTM (:

v.c.OnRequest(func(request *colly.Request) {
v.rMu.Lock()
defer v.rMu.Unlock()
request.Ctx.Put("originalURL", request.URL.String())
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Suggestion: Handcrafted strings like this can be prone to errors. Use constant to reuse such key. Also it does not need to be string to be key. It's recommended to use empty struct (less mem allocated) e.g

var originalURLKey = struct{}{}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Get and Put methods seem to only take in strings as highlighted here. Should originalURLKey be passed in using fmt.Sprintf here? But that maybe wouldn’t help with memory as that would be the same as using a string…

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, this is something library specific not Go context.. ok, let's do constant at least

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants