Script to get a post using the REST API to test it with the vDOM script. #112
Thanks for opening this, @c4rl0sbr4v0! Just to keep in mind: some sites may have disabled access to the REST API, and we will hit a "You are not allowed to access the REST API" (or something similar) message instead of the post.
Yes! In that case, at the moment, we just skip the site: we catch the error and log it to the console. We can also get the status and the response message if needed. This is a first draft, just to keep a record and get an initial number of URLs.
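For reference, a minimal sketch of the skip-and-log behaviour described above. It is not the code from this PR: the function name `getFirstPostLink` and the injectable `fetchImpl` parameter are hypothetical, and it assumes the standard WordPress REST route `/wp-json/wp/v2/posts` is what gets queried.

```javascript
// Hypothetical helper: return the link of the most recent post of a site,
// or null when the site has to be skipped.
// `fetchImpl` is injectable so the function can be tried without network access.
async function getFirstPostLink(domain, fetchImpl = fetch) {
  const endpoint = `https://${domain}/wp-json/wp/v2/posts?per_page=1`;
  try {
    const response = await fetchImpl(endpoint);
    if (!response.ok) {
      // REST API errors usually come back as JSON with a `message` field;
      // fall back to an empty object if the body is not JSON.
      const body = await response.json().catch(() => ({}));
      console.warn(`Skipping ${domain}: ${response.status} ${body.message ?? ''}`);
      return null;
    }
    const posts = await response.json();
    return Array.isArray(posts) && posts.length > 0 ? posts[0].link : null;
  } catch (error) {
    // Network failures, timeouts, invalid JSON, and so on.
    console.warn(`Skipping ${domain}: ${error.message}`);
    return null;
  }
}
```

Returning `null` instead of throwing keeps the per-domain loop simple: the caller only has to check for a value before writing a row.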
Okay, great! 🙂 For reference, we could encounter at least two different messages:
benchmark/getPostUris.mjs
Outdated
    if (rss) {
      await fs.writeFileSync('./benchmark/data/posts_rss.csv', '');
    } else {
      await fs.writeFileSync('./benchmark/data/posts.csv', '');
    }
This is not needed, as appendFileSync creates the file if it does not exist.
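A small sketch to illustrate the point, using a temporary directory and a hypothetical `posts.csv` so it does not touch the real benchmark data:

```javascript
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';

// Work in a throwaway directory; the file name is only an example.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'posts-'));
const file = path.join(dir, 'posts.csv');

// The file does not exist yet: the first appendFileSync call creates it.
fs.appendFileSync(file, 'https://example.com/hello-world/\n');
fs.appendFileSync(file, 'https://example.org/sample-post/\n');

const lines = fs.readFileSync(file, 'utf8').trim().split('\n');
console.log(lines.length); // 2
```

Two things to keep in mind: `appendFileSync` creates the file but not its parent directory, and without the initial `writeFileSync(file, '')` rows from a previous run stay in the file, so the results of consecutive runs accumulate.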
I've just thought that we may want to skip Cloudflare sites if we don't want to get blocked. We could do something like this, I guess.
I think it is stable enough to be ready for review. I'm getting about 50% of the posts using this script.
I was testing it, and it looks great so far. I am encountering many sites that are registering a post with many encoded characters in the URL. Something like this: Is that expected? More than 50% of the URLs I got are like that.
SantosGuillamot left a comment
I've added some minor comments. Apart from that, it looks great to me 🙂
I was also wondering whether it would make sense to try fetching the post through the REST API when the RSS request fails, mainly because I think you already have that logic. But maybe it is not worth it. I leave it up to you 🙂
We can do it, but it would also multiply the script's running time by about 1.8x. I don't know if it is worth it, to be honest.

I created an initial draft of a script that reads a CSV file of domains and then, for each domain, makes a request to the REST API to get the URL of the first post found and appends it to another CSV.
I'm currently testing it and I'm getting tons of 403 responses and sites with the REST API not enabled. Until I have a success percentage that tells us whether this approach is worth considering, I will leave this PR as a draft.
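The flow described above could be sketched roughly as follows. The helper names (`parseDomains`, `firstPostEndpoint`, `toCsvRow`) and the one-domain-per-row CSV layout are assumptions for illustration, not the actual contents of the script; `per_page` and `_fields` are standard WordPress REST API query parameters.

```javascript
// Hypothetical helpers; names and CSV layout are assumptions.

// Parse a CSV of domains (first column), ignoring blank lines and whitespace.
function parseDomains(csvText) {
  return csvText
    .split(/\r?\n/)
    .map((line) => line.split(',')[0].trim())
    .filter((domain) => domain.length > 0);
}

// Build a REST API request that returns only the link of the most recent post.
function firstPostEndpoint(domain) {
  const url = new URL('/wp-json/wp/v2/posts', `https://${domain}`);
  url.searchParams.set('per_page', '1');
  url.searchParams.set('_fields', 'link');
  return url.toString();
}

// Quote both values so commas or quotes inside a URL do not break the row.
function toCsvRow(domain, link) {
  const quote = (value) => `"${String(value).replace(/"/g, '""')}"`;
  return `${quote(domain)},${quote(link)}\n`;
}
```

Each row produced by `toCsvRow` can then be appended to the output CSV once the request for that domain succeeds; domains that return 403 or have the REST API disabled simply produce no row.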
There are also some pending tasks and other possibilities: