This repository was archived by the owner on Jul 28, 2023. It is now read-only.

Script to get a post using REST API to test it with the vDOM script. #112

Merged
cbravobernal merged 7 commits into perf-and-stress-testing from perf-stress-testing/add-post-page-step on Dec 2, 2022

Conversation

@cbravobernal
Collaborator

@cbravobernal cbravobernal commented Nov 30, 2022

I created an initial draft of a script that reads a CSV file of domains and, for each domain, requests the REST API to get the URL of the first post found and appends it to another CSV.

I'm currently testing it and getting lots of 403 responses, as well as sites with the REST API disabled. I'll leave this PR as a draft until I have a success percentage that tells us whether this approach is worth considering.

There are also some pending tasks and other possibilities:

  • Check if we can use RSS to get the link of the post instead of the REST API.
  • Use a DB to be able to stop and continue later with the script execution.
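For reference, the per-domain REST lookup described above might be sketched as follows. This is not the PR's code: the helper names `postsEndpoint`, `firstPostLink` and `firstPostUrl` are hypothetical, and it assumes the WordPress core route `/wp-json/wp/v2/posts` with `per_page=1` (with the default date-descending order, that returns the most recent post) and Node 18+ for the global `fetch`.

```javascript
// Sketch only: helper names are hypothetical, not taken from the PR.
// Assumes a WordPress site exposing the core REST route /wp-json/wp/v2/posts.

// Build the REST URL that asks for a single post.
function postsEndpoint(domain) {
  const url = new URL('/wp-json/wp/v2/posts', `https://${domain}`);
  url.searchParams.set('per_page', '1');
  return url.toString();
}

// Pull the public permalink out of the JSON array the endpoint returns.
// Error bodies are objects, not arrays, so they yield null.
function firstPostLink(posts) {
  if (!Array.isArray(posts) || posts.length === 0) return null;
  return typeof posts[0].link === 'string' ? posts[0].link : null;
}

// Fetch the first post URL for one domain (Node 18+ global fetch).
async function firstPostUrl(domain) {
  const res = await fetch(postsEndpoint(domain));
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${domain}`);
  return firstPostLink(await res.json());
}
```

Non-2xx responses (such as the 403s mentioned above) make `firstPostUrl` throw, so the calling loop decides whether to skip the domain.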

@SantosGuillamot
Contributor

Thanks for opening this, @c4rl0sbr4v0!

Just to keep it in mind, some sites may have disabled access to the REST API and we will hit a "You are not allowed to access the REST API" (or something similar) message instead of the post.

@cbravobernal
Collaborator Author

cbravobernal commented Nov 30, 2022

> Just to keep it in mind, some sites may have disabled access to the REST API and we will hit a "You are not allowed to access the REST API" (or something similar) message instead of the post.

Yes! In that case, for now, we just catch the error, log it to the console, and skip the site. We can capture the status and the response message if needed. It's a first draft, just to keep a record and get an initial count of URLs.
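A sketch of that catch, log and skip step that also keeps the status and message for later counting. It assumes WordPress-style error bodies of the form `{ code, message, data: { status } }`; the `describeFailure` name and the tab-separated log format are hypothetical.

```javascript
// Sketch: turn a failed REST response into a one-line log entry so skipped
// sites can be counted later. Assumes WordPress-style error bodies
// { code, message, data: { status } }; anything else falls back to raw text.
function describeFailure(domain, status, bodyText) {
  // Default: the first 120 characters of whatever the server sent back.
  let reason = bodyText.slice(0, 120);
  try {
    const body = JSON.parse(bodyText);
    if (body && typeof body.message === 'string') {
      reason = body.code ? `${body.code}: ${body.message}` : body.message;
    }
  } catch {
    // Not JSON (e.g. an HTML error page); keep the truncated raw text.
  }
  return `${domain}\t${status}\t${reason}`;
}
```

In the request loop this could be called as `console.warn(describeFailure(domain, res.status, await res.text()))` before moving on to the next domain.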

@SantosGuillamot
Contributor

> Yes! In that case, for now, we just catch the error, log it to the console, and skip the site. We can capture the status and the response message if needed. It's a first draft, just to keep a record and get an initial count of URLs.

Okay, great! 🙂 For reference, we could encounter at least two different messages:

Comment on lines +27 to +31
if (rss) { // reset the output CSV for this run
  fs.writeFileSync('./benchmark/data/posts_rss.csv', '');
} else {
  fs.writeFileSync('./benchmark/data/posts.csv', '');
}
Collaborator Author

This is not needed, as appendFileSync creates the file if it does not exist.
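As noted, `fs.appendFileSync` creates the file if it doesn't exist. One behavioral difference worth knowing: without the initial `writeFileSync(..., '')`, rows from a previous run remain in an existing CSV. A minimal sketch, where the `appendRow` helper and its CSV quoting are hypothetical and a temporary directory stands in for `./benchmark/data`:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: append one CSV row, quoting the URL so commas survive.
function appendRow(file, domain, url) {
  const quoted = `"${String(url).replace(/"/g, '""')}"`;
  // appendFileSync creates the file on first use; later calls append to it.
  fs.appendFileSync(file, `${domain},${quoted}\n`);
}

// Demo against a throwaway directory instead of ./benchmark/data.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'posts-'));
const file = path.join(dir, 'posts.csv');
appendRow(file, 'example.com', 'https://example.com/hello-world/');
appendRow(file, 'example.org', 'https://example.org/?p=1');
console.log(fs.readFileSync(file, 'utf8'));
```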

@SantosGuillamot
Contributor

It just occurred to me that we may want to skip Cloudflare sites to avoid getting blocked. We could do something like this, I guess.
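One possible shape for such a check, not necessarily the snippet proposed here. It assumes that Cloudflare-proxied responses carry a `cf-ray` header and/or `server: cloudflare`; `isBehindCloudflare` and `shouldSkip` are hypothetical names, and the probe uses Node 18+'s global `fetch`.

```javascript
// Sketch: decide from response headers whether a site is proxied by Cloudflare,
// so the crawler can skip it. Assumption: Cloudflare responses carry a `cf-ray`
// header and/or `server: cloudflare`.
function isBehindCloudflare(headers) {
  // HTTP header names are case-insensitive, so normalize before checking.
  const normalized = {};
  for (const [name, value] of Object.entries(headers)) {
    normalized[name.toLowerCase()] = String(value).toLowerCase();
  }
  return 'cf-ray' in normalized || normalized.server === 'cloudflare';
}

// Possible use before the real request. A HEAD request keeps the probe cheap;
// fetch's Headers object is iterable, hence Object.fromEntries.
async function shouldSkip(domain) {
  const res = await fetch(`https://${domain}/`, { method: 'HEAD' });
  return isBehindCloudflare(Object.fromEntries(res.headers));
}
```

A HEAD probe is still a request, so this reduces, rather than removes, traffic to those sites.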

@cbravobernal cbravobernal marked this pull request as ready for review December 1, 2022 17:17
@cbravobernal
Collaborator Author

I think it's stable enough to be ready for review. I'm getting a post for about 50% of the domains with this script.
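To make a figure like "about 50%" reproducible between runs, the script could report the share of domains that produced a URL. A sketch assuming a hypothetical result shape `{ domain, url }` with `url === null` on failure:

```javascript
// Sketch: summarize a run as the percentage of domains that yielded a post URL.
// The { domain, url } shape is an assumption, not necessarily what the
// PR's script stores.
function successRate(results) {
  if (results.length === 0) return 0;
  const found = results.filter((r) => r.url !== null).length;
  // Percentage rounded to one decimal place.
  return Math.round((found / results.length) * 1000) / 10;
}
```

For example, `successRate([{ domain: 'a.com', url: 'https://a.com/p/' }, { domain: 'b.com', url: null }])` returns `50`.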

@SantosGuillamot
Contributor

I was testing it, and it looks great so far. I'm seeing many sites whose post URL contains lots of percent-encoded characters, something like this: /%d8%a3%d9%87%d9%84%d8%a7-%d8%a8%d8%a7%d9%84%d8%b9%d8%a7%d9%84%d9%85/.

Is that expected? More than 50% of the URLs I got look like that.

Contributor

@SantosGuillamot SantosGuillamot left a comment

I've added some minor comments. Apart from that, it looks great to me 🙂

I was also wondering whether it would make sense to try fetching the post through the REST API when RSS fails, mainly because I think you already had that logic. But maybe it's not worth it. I'll leave it up to you 🙂
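If the REST fallback turns out to be worth it, the ordering could be expressed as a list of lookups tried in turn. This sketch assumes the feed lives at `/feed/` (the usual WordPress location) and reads the first `<item>`'s `<link>` with a regex, which is a simplification rather than real XML parsing; `firstItemLink`, `firstPostUrlWithFallback` and `viaRss` are hypothetical names.

```javascript
// Sketch of "RSS first, REST API as fallback". The regex skips the channel's
// own <link> by requiring an <item> first; a real XML parser would be sturdier.
function firstItemLink(xml) {
  const match = /<item\b[\s\S]*?<link>\s*([^<\s]+)\s*<\/link>/.exec(xml);
  return match ? match[1] : null;
}

// Try each lookup in order and return the first URL found.
async function firstPostUrlWithFallback(domain, lookups) {
  for (const lookup of lookups) {
    try {
      const url = await lookup(domain);
      if (url) return url;
    } catch {
      // Fall through to the next strategy (e.g. 403 or feed missing).
    }
  }
  return null;
}

// Hypothetical RSS lookup (Node 18+ global fetch).
async function viaRss(domain) {
  const res = await fetch(`https://${domain}/feed/`);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return firstItemLink(await res.text());
}
```

Calling `firstPostUrlWithFallback(domain, [viaRss, viaRest])`, with `viaRest` being some REST-based lookup, only pays for the second request on domains where RSS failed, which is where the extra running time would come from.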

@cbravobernal
Collaborator Author

> I was also wondering whether it would make sense to try fetching the post through the REST API when RSS fails.

We can do it, but it would multiply the script's running time by roughly 1.8. To be honest, I don't know if it's worth it.

@cbravobernal
Collaborator Author

> I was testing it, and it looks great so far. I'm seeing many sites whose post URL contains lots of percent-encoded characters, something like this: /%d8%a3%d9%87%d9%84%d8%a7-%d8%a8%d8%a7%d9%84%d8%b9%d8%a7%d9%84%d9%85/.
> Is that expected? More than 50% of the URLs I got look like that.

Non-Latin characters (Arabic, Chinese, Japanese, etc.) percent-encoded in the URL!!

zoes.tw/2021/10/23/1017%e6%a1%83%e5%9c%92amour%e8%ad%89%e5%a9%9a-%e5%87%b1%e9%88%9e%e6%9c%b1%e7%91%9c-%e5%a9%9a%e7%a6%ae%e9%8c%84%e5%bd%b1-%e7%b2%be%e8%8f%af%e7%89%88/ turns out to be:
[Screenshot 2022-12-02 at 11 16 13]

Contributor

@SantosGuillamot SantosGuillamot left a comment

LGTM! 🙂

@cbravobernal cbravobernal merged commit c8aa7cd into perf-and-stress-testing Dec 2, 2022
@cbravobernal cbravobernal deleted the perf-stress-testing/add-post-page-step branch December 2, 2022 11:17

3 participants