Bluesky
Bluesky Historical Posts Data Collector
The Bluesky Historical Posts Data Collector retrieves historical posts that match a specified search query within a given date range, plus their corresponding replies (up to the account’s limit). It is ideal for studying communities and discourse around specific topics or hashtags on Bluesky.
Learn More
- To use this collector, you do not need to create a Bluesky account or apply for a separate API key.
- The search query is used solely to retrieve posts whose text, or the alt text of any attached media objects, matches your search criteria.
- The search query is case-insensitive (e.g., searching for “toronto” will match both “toronto” and “Toronto”).
- The search query supports the following operators:
- Double quotes (“phrase”) search for an exact phrase (e.g., the query “New York” returns posts mentioning the phrase New York).
- A vertical bar between keywords or phrases (keyword1 | keyword2) applies a Boolean OR (e.g., the query “New York” | Toronto returns posts mentioning either New York or Toronto).
- A minus sign in front of a keyword (-keyword) excludes posts containing that keyword (e.g., the query Canada -Toronto returns posts mentioning Canada but not Toronto). Note: Do not put a space between the minus sign and the keyword.
- A space between keywords (keyword1 keyword2) applies a Boolean AND (e.g., the query Canada Toronto returns posts mentioning both Canada and Toronto).
- The search excludes results from known active bot accounts, such as nowbreezing.ntw.app. (Note: We will expand the list of known bot accounts in the future.)
- This collector excludes posts from accounts that requested that their content only be shown to signed-in Bluesky users.
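To make the operator rules above concrete, here is a minimal Python sketch of a local matcher that applies the same semantics to a post's text: case-insensitive matching, quoted phrases, `|` for OR, a leading `-` for exclusion, and spaces for AND. It is only an illustration, not Communalytic's or Bluesky's search implementation. The function names are hypothetical, and the sketch assumes plain substring matching (so "toronto" would also match "torontonian") and that `|` joins the terms immediately on either side of it.

```python
import re

# A token is a quoted phrase (optionally preceded by "-"), a bare "|", or a bare word.
TOKEN_RE = re.compile(r'-?"[^"]*"|\||[^\s|]+')


def parse_query(query: str) -> list[list[tuple[bool, str]]]:
    """Split a query into AND-clauses; each clause lists its OR-alternatives.

    Each alternative is (negated, term); terms are lowercased so that
    matching is case-insensitive.
    """
    clauses: list[list[tuple[bool, str]]] = []
    join_with_previous = False
    for token in TOKEN_RE.findall(query):
        if token == "|":
            join_with_previous = True
            continue
        negated = token.startswith("-") and len(token) > 1
        if negated:
            token = token[1:]
        term = token.strip('"').lower()
        if join_with_previous and clauses:
            clauses[-1].append((negated, term))  # OR with the previous term
        else:
            clauses.append([(negated, term)])    # start a new AND-clause
        join_with_previous = False
    return clauses


def matches(query: str, text: str) -> bool:
    """Return True if `text` satisfies every AND-clause of `query`."""
    haystack = text.lower()
    for alternatives in parse_query(query):
        # A plain term must be present; a negated term must be absent.
        if not any((term in haystack) != negated for negated, term in alternatives):
            return False
    return True


print(matches('Canada -Toronto', 'Vancouver, Canada'))  # True
print(matches('Canada -Toronto', 'Toronto, Canada'))    # False
```

Splitting parsing from matching keeps the query structure (a list of AND-clauses, each a list of OR-alternatives) easy to inspect when a query returns unexpected results.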
Bluesky Thread Data Collector
The Bluesky Thread Data Collector retrieves replies to a specified post (up to the account’s limit). It is ideal for studying posts that have attracted high engagement on Bluesky.
Learn More
- To use this collector, you do not need to create a Bluesky account or apply for a separate API key; a Bluesky API key will automatically be generated during the collection process.
- Only publicly available replies are retrieved.
- This collector excludes posts from accounts that requested that their content only be shown to signed-in Bluesky users.
Bluesky User’s Timeline Data Collector
The Bluesky User’s Timeline Data Collector retrieves posts, reposts, and corresponding replies (up to the account’s limit) from a public account. It is ideal for analyzing the accounts of influencers, politicians, or brands on Bluesky.
Learn More
- To use this collector, you do not need to create a Bluesky account or apply for a separate API key; a Bluesky API key will automatically be generated during the collection process.
- The collector can retrieve posts from one or several accounts.
- If you wish to collect timelines from more than one user within the same dataset, use a comma to separate each account name.
- If you only need to collect original/parent posts without their replies, or you plan to ONLY perform a Civility, Sentiment, and/or Topic analysis on your dataset, we suggest leaving the “collect corresponding replies” box unchecked. This will significantly shorten your data collection time.
- However, if you plan to perform network analysis on your dataset using the Network Analyzer, be sure to check the “collect corresponding replies” box. The replies are used to build a communication network from your data.
- Only publicly available replies are retrieved.
- This collector excludes posts from accounts that requested that their content only be shown to signed-in Bluesky users.
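The note about the Network Analyzer says replies are used to build a communication network. A minimal Python sketch of that idea follows, assuming each post is a dictionary with an author field and, for replies, a reply-target field (named here after the `user_screen_name` and `in_reply_to_screen_name` columns listed under Import From File). Each reply becomes a directed edge from the replier to the account being replied to, weighted by how often that pair occurs. This illustrates the general technique only; it is not the Network Analyzer's actual method, and the handles are made up.

```python
from collections import Counter


def reply_edges(posts: list[dict]) -> Counter:
    """Count directed reply edges (replier -> recipient) across a list of posts.

    Original/parent posts have no reply target and contribute no edge, which is
    why a network cannot be built unless replies are collected.
    """
    edges: Counter = Counter()
    for post in posts:
        source = post.get("user_screen_name")
        target = post.get("in_reply_to_screen_name")
        if source and target:
            edges[(source, target)] += 1
    return edges


posts = [
    {"user_screen_name": "alice.bsky.social", "in_reply_to_screen_name": None},  # parent post
    {"user_screen_name": "bob.bsky.social", "in_reply_to_screen_name": "alice.bsky.social"},
    {"user_screen_name": "carol.bsky.social", "in_reply_to_screen_name": "alice.bsky.social"},
    {"user_screen_name": "bob.bsky.social", "in_reply_to_screen_name": "alice.bsky.social"},
]

for (source, target), weight in reply_edges(posts).items():
    print(f"{source} -> {target} (weight {weight})")
```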
Bluesky Profile Search Data Collector
The Bluesky Profile Search Collector retrieves user details for public accounts. This collector offers three distinct modes:
- Profile Search: Collect profile information from public accounts that match search keywords
- Followers Collection: Collect profiles of public accounts following a given user
- Follows Collection: Collect profiles of public accounts that a given user follows
Learn More
- To use this collector, you do not need to create a Bluesky account or apply for a separate API key; a Bluesky API key will automatically be generated during the collection process.
- When available, the collector retrieves full profile details, including post counts, follower counts, bio information, and engagement metrics.
- This collector excludes detailed profile information from accounts that requested that their content only be shown to signed-in Bluesky users.
Mastodon
Mastodon Recent Posts Data Collector
The Mastodon Recent Posts Data Collector retrieves recent posts and replies from the “Live Feeds” (aka the “Homepage”) of a specified public Mastodon server, plus corresponding replies (up to the account’s limit). It is ideal for studying communities and discourse on Mastodon.
Learn More
- You do not have to create a Mastodon account or apply for a separate Mastodon API key to use this collector.
- Visit https://mastodonservers.net/servers/top for a list of public Mastodon servers (aka instances).
- If you only need to collect original/parent posts without their replies, or you plan to ONLY perform a Civility, Sentiment, and/or Topic analysis on your dataset, we suggest leaving the “collect corresponding replies” box unchecked. This will significantly shorten your data collection time.
- However, if you plan to perform network analysis on your dataset using the Network Analyzer, be sure to check the “collect corresponding replies” box. The replies are used to build a communication network from your data.
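For readers curious where such data comes from: Mastodon exposes a server's public timeline (the “Live Feeds” view) through its REST API at `GET /api/v1/timelines/public`. The Python sketch below only builds the request URL with the standard library and sends nothing; it does not describe how Communalytic itself calls the API. `mastodon.social` is just an example server, and some servers may restrict this endpoint to signed-in users.

```python
from urllib.parse import urlencode


def public_timeline_url(server: str, local: bool = True, limit: int = 40) -> str:
    """Build the URL for a Mastodon server's public timeline.

    local=True asks for posts from accounts on that server only ("This server"
    in the Live Feeds view); local=False asks for the federated timeline.
    The API returns at most 40 posts per page, so larger limits are capped.
    """
    params = {"limit": str(min(limit, 40))}
    if local:
        params["local"] = "true"
    return f"https://{server}/api/v1/timelines/public?{urlencode(params)}"


print(public_timeline_url("mastodon.social"))
```

Older pages are fetched by passing the `max_id` of the oldest post received so far (or by following the `Link` response header), which is why collecting many posts takes repeated requests.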
Mastodon Hashtag Posts Data Collector
The Mastodon Hashtag Posts Data Collector retrieves recent posts and replies containing a specified hashtag from the “Live Feeds” (aka the “Homepage”) of a specified public Mastodon server, plus corresponding replies (up to the account’s limit). It is ideal for studying communities and discourse around a specific hashtag on Mastodon.
Learn More
- You do not have to create a Mastodon account or apply for a separate Mastodon API key to use this collector.
- Visit https://mastodonservers.net/servers/top for a list of public Mastodon servers (aka instances).
- If you only need to collect original/parent posts without their replies, or you plan to ONLY perform a Civility, Sentiment, and/or Topic analysis on your dataset, we suggest leaving the “collect corresponding replies” box unchecked. This will significantly shorten your data collection time.
- However, if you plan to perform network analysis on your dataset using the Network Analyzer, be sure to check the “collect corresponding replies” box. The replies are used to build a communication network from your data.
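Hashtag posts on a Mastodon server are exposed through the REST API at `GET /api/v1/timelines/tag/:hashtag`, where the hashtag appears in the path without its leading “#”. The Python sketch below builds that URL using only the standard library and does not send a request; the helper name is hypothetical, and it is not a description of Communalytic's internals.

```python
from urllib.parse import quote, urlencode


def hashtag_timeline_url(server: str, hashtag: str, local: bool = False, limit: int = 40) -> str:
    """Build the URL for posts tagged with `hashtag` on a Mastodon server.

    The leading "#" is dropped and the tag is percent-encoded so that
    non-ASCII hashtags remain valid in a URL. `limit` is capped at 40,
    the per-page maximum.
    """
    tag = quote(hashtag.lstrip("#"), safe="")
    params = {"limit": str(min(limit, 40))}
    if local:
        params["local"] = "true"  # only posts from accounts on this server
    return f"https://{server}/api/v1/timelines/tag/{tag}?{urlencode(params)}"


print(hashtag_timeline_url("mastodon.social", "#DataScience"))
```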
Mastodon User’s Timeline Data Collector
The Mastodon User’s Timeline Data Collector retrieves recent posts and reblogs/reposts from a public account. It is ideal for analyzing the accounts of influencers, politicians, or brands.
Learn More
- You do not have to create a Mastodon account or apply for a separate Mastodon API key to use this collector.
- This data collector includes only posts from public accounts, and only posts whose visibility is set to ‘public’.
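On Mastodon, each post (a “status” in the API) carries a `visibility` field whose value is “public”, “unlisted”, “private” (followers-only), or “direct”, and a reblog is a status whose `reblog` field holds the original post. The Python sketch below shows how a timeline could be filtered to public posts and split into original posts and reblogs. The statuses are hypothetical and heavily trimmed (real ones, as returned by `GET /api/v1/accounts/:id/statuses`, have many more fields), and this is not necessarily how Communalytic applies the rule.

```python
def public_only(statuses: list[dict]) -> list[dict]:
    """Keep only statuses whose `visibility` field is "public"."""
    return [s for s in statuses if s.get("visibility") == "public"]


def split_reblogs(statuses: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate original posts from reblogs; a reblog has a non-null `reblog` field."""
    originals = [s for s in statuses if s.get("reblog") is None]
    reblogs = [s for s in statuses if s.get("reblog") is not None]
    return originals, reblogs


# Hypothetical statuses, trimmed to the fields used above.
statuses = [
    {"id": "1", "visibility": "public", "reblog": None},
    {"id": "2", "visibility": "unlisted", "reblog": None},
    {"id": "3", "visibility": "public", "reblog": {"id": "99"}},
    {"id": "4", "visibility": "private", "reblog": None},
]

originals, reblogs = split_reblogs(public_only(statuses))
print([s["id"] for s in originals], [s["id"] for s in reblogs])  # ['1'] ['3']
```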
Reddit
Reddit Recent Posts Data Collector
The Reddit Recent Posts Data Collector retrieves recent submissions (i.e., thread-starting posts) and replies from a public subreddit. The collector starts by retrieving up to 200 ‘new’ or ‘hot’ submissions with an EDU account, or up to 900 ‘new’, ‘hot’, or ‘top’ submissions with a PRO account. The user can filter these submissions based on a specified search query. After retrieving the specified number of submissions, the collector retrieves the corresponding replies available at that time (up to the account’s limit). It is ideal for studying discourse on a subreddit.
Learn More
- Only publicly available posts, as retrieved through Reddit’s official API, are collected.
- To use this collector, users must create and link their Reddit account to their Communalytic account.
- The search query is used solely to retrieve relevant submissions (i.e., thread-starting posts). It does not impose any restrictions on comments made on submissions or replies to comments (including replies to replies).
- The search query supports various Boolean operators (AND, OR, NOT) and parentheses for grouping. For more details, see the Reddit documentation.
- The search query also supports the following filters: author:, flair:, site:, title:. For more details on how to use these filters, see the Reddit documentation.
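As an illustration of combining the Boolean operators and filters above, the hypothetical Python helper below assembles a query string from a list of alternative terms, excluded terms, and `author:`/`flair:`/`site:`/`title:` filters. It rests on two assumptions worth checking against the Reddit documentation: that space-separated parts are combined with AND, and that multi-word values should be wrapped in double quotes. It builds a string only and makes no API call.

```python
from typing import Optional

ALLOWED_FILTERS = {"author", "flair", "site", "title"}


def quote_if_needed(value: str) -> str:
    """Wrap multi-word values in double quotes so they are treated as one phrase."""
    return f'"{value}"' if " " in value else value


def build_reddit_query(any_of: list[str], exclude: Optional[list[str]] = None, **filters: str) -> str:
    """Compose: (term1 OR term2 ...) NOT excluded ... field:value ...

    Assumption: Reddit treats space-separated parts as a Boolean AND.
    """
    parts = []
    if any_of:
        parts.append("(" + " OR ".join(quote_if_needed(t) for t in any_of) + ")")
    for term in exclude or []:
        parts.append("NOT " + quote_if_needed(term))
    for field, value in filters.items():
        if field not in ALLOWED_FILTERS:
            raise ValueError(f"unsupported filter: {field}")
        parts.append(f"{field}:{quote_if_needed(value)}")
    return " ".join(parts)


print(build_reddit_query(["climate", "heat wave"], exclude=["meme"], flair="News"))
# (climate OR "heat wave") NOT meme flair:News
```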
Reddit Future Posts Data Collector
The Reddit Future Posts Data Collector retrieves submissions (i.e., thread-starting posts) and replies from a public subreddit for up to 7 consecutive days into the future (up to the account’s limit). Data collection begins by retrieving the 100 most recent submissions and then continues to gather new submissions (if available) until the specified end date. If the user opts in, the collector also retrieves any replies to these submissions at the end of the collection period. This collector is ideal for collecting future posts from a specific subreddit; for example, researchers can use it to gather data about a developing or newsworthy event.
Learn More
- To use this collector, users must create and link their Reddit account to their Communalytic account.
- Comments on Reddit submissions and replies to comments are only collected at the end of the specified data collection period. If a comment or reply is deleted by the moderator(s) or the poster before the end date of your data collection, it will not be included in the final dataset.
- This collector will attempt to collect any new submissions within the specified data collection period; however, some posts in high-volume subreddits (such as r/all) may be missed due to Reddit API limitations.
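The collection pattern described above (an initial batch of recent submissions, then repeated checks for new ones until an end date at most 7 days away) can be sketched in Python as two small pieces: validating the collection window and merging each polled batch without storing the same submission twice. This is a simplified illustration with hypothetical function names, not Communalytic's code. Deduplicating by submission ID is an assumption about how overlapping polls would be handled, and the API call that returns each batch is deliberately left out.

```python
from datetime import datetime, timedelta

MAX_WINDOW = timedelta(days=7)


def validate_window(start: datetime, end: datetime) -> None:
    """Reject periods whose end is not after the start or that exceed 7 days."""
    if end <= start:
        raise ValueError("end date must be after the start date")
    if end - start > MAX_WINDOW:
        raise ValueError("collection period cannot exceed 7 days")


def merge_batch(collected: dict[str, dict], batch: list[dict]) -> int:
    """Add submissions from one poll, skipping IDs already collected.

    Returns the number of genuinely new submissions.
    """
    added = 0
    for submission in batch:
        if submission["id"] not in collected:
            collected[submission["id"]] = submission
            added += 1
    return added


start = datetime(2024, 5, 1, 12, 0)
validate_window(start, start + timedelta(days=7))  # accepted: exactly 7 days

collected: dict[str, dict] = {}
first_poll = [{"id": "1a"}, {"id": "1b"}]
second_poll = [{"id": "1b"}, {"id": "1c"}]  # overlaps with the first poll
print(merge_batch(collected, first_poll), merge_batch(collected, second_poll))  # 2 1
```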
TikTok
TikTok Data Collectors
- Collect metadata from publicly available TikTok videos and associated comments via the TikTok Research API (for those with access).
- Import TikTok JSON files created with Zeeschuimer, a browser extension that captures metadata from TikTok videos you view on the web interface.
Telegram
Telegram Historical Posts Data Collector
The Telegram Historical Posts Data Collector retrieves posts (up to the account’s limit) from one or several public Telegram channels, groups or supergroups within a given period. It is ideal for studying Telegram communities and discourse.
Learn More
- To use this collector, users must apply for a free Telegram Developer Account and obtain a Telegram API Key. This is an involved process: you will need to create a Telegram developer account and use either a mobile or desktop Telegram client application. Please budget about an hour for the setup.
- Once you have an API key (aka Telegram Session Key), you must add it to your Communalytic “My Profile” page before you can start data collection.
- Communalytic will store your key for future use.
- Collecting data from multiple groups/channels simultaneously may exceed your daily Telegram API token limit.
X (Twitter)
X Recent Posts Data Collector
The X (Twitter) Recent Posts Data Collector retrieves posts from X published within the last seven days that match a specified search query. The collection size is limited by Communalytic’s account limit and Twitter API plan cap. It is ideal for studying X communities and discourse.
Learn More
- To use this collector, users must apply for an X Developer Account, obtain an X API key, and purchase a Twitter API plan.
- Once you have your Bearer token (issued alongside your X API key), you must add it to your Communalytic “My Profile” page before you can start data collection.
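The Bearer token is what authenticates requests to the X API v2. As a rough illustration, the Python sketch below prepares, but does not send, a request to the recent search endpoint (`GET /2/tweets/search/recent`), which covers posts from roughly the last seven days and returns between 10 and 100 posts per page. The helper name and the chosen `tweet.fields` are assumptions for illustration, `"YOUR_BEARER_TOKEN"` is a placeholder, and this is not how Communalytic necessarily issues its requests.

```python
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def recent_search_request(bearer_token: str, query: str, max_results: int = 100) -> Request:
    """Prepare (but do not send) a recent-search request authenticated with a Bearer token."""
    params = {
        "query": query,
        "max_results": str(max(10, min(max_results, 100))),  # API accepts 10 to 100
        "tweet.fields": "created_at,author_id,conversation_id",
    }
    return Request(
        f"{SEARCH_URL}?{urlencode(params)}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


req = recent_search_request("YOUR_BEARER_TOKEN", "#toronto -is:retweet")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step needs a valid token and an API plan that includes recent search.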
X Thread Data Collector
The X Thread Data Collector retrieves the most recent public replies (up to Communalytic’s account limit or Twitter API plan cap) to a public post on X published within the last seven days. It is ideal for collecting conversations around a post that has attracted many replies.
Learn More
- To use this collector, users must apply for an X Developer Account, obtain an X API key, and purchase a Twitter API plan.
- Once you have your Bearer token (issued alongside your X API key), you must add it to your Communalytic “My Profile” page before you can start data collection.
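One common way to collect replies to a post through the X API v2 is the `conversation_id:` search operator: every reply in a thread carries the ID of the post that started the conversation. The hypothetical Python helpers below extract a post ID from a post URL (or accept a bare ID) and build that query, which could then be sent to the recent search endpoint. The sketch assumes the given post is the one that started the thread; whether Communalytic uses this exact approach is not stated here.

```python
import re


def post_id_from(value: str) -> str:
    """Return the numeric post ID from a bare ID or a URL like .../status/<id>."""
    value = value.strip()
    if value.isdigit():
        return value
    match = re.search(r"/status/(\d+)", value)
    if not match:
        raise ValueError(f"cannot find a post ID in {value!r}")
    return match.group(1)


def thread_query(value: str) -> str:
    """Build a search query matching posts in the conversation started by this post."""
    return f"conversation_id:{post_id_from(value)}"


print(thread_query("https://x.com/someuser/status/1234567890?s=20"))  # conversation_id:1234567890
```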
YouTube
YouTube Video Comments Data Collector
The YouTube Video Comments Data Collector retrieves comments (up to the account’s limit) from a publicly accessible YouTube video. It is ideal for studying the discourse around a popular YouTube video.
Learn More
- To use this collector, users must apply for a free Google Developer Account and obtain a YouTube API key.
- The YouTube API is currently free, with a default quota set by Google of 10,000 units per day; a request for comments typically costs one unit and returns up to 100 records.
- Once you have a YouTube API key, you must add it to your Communalytic “My Profile” page before you can start data collection.
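To estimate whether a video's comments fit within the daily quota, the Python sketch below uses the figures given above (10,000 per day, up to 100 records per request) and assumes each comment request consumes one unit of that quota. Treat the result as a lower bound: videos with many replies may need extra requests beyond the top-level comment pages.

```python
import math

DAILY_QUOTA_UNITS = 10_000
RECORDS_PER_REQUEST = 100
UNITS_PER_REQUEST = 1  # assumption: cost of one comment-listing request


def requests_needed(total_comments: int) -> int:
    """Number of paginated requests needed to retrieve `total_comments` records."""
    return math.ceil(total_comments / RECORDS_PER_REQUEST)


def days_needed(total_comments: int) -> int:
    """Number of daily quotas consumed by those requests."""
    units = requests_needed(total_comments) * UNITS_PER_REQUEST
    return math.ceil(units / DAILY_QUOTA_UNITS)


print(requests_needed(250_000), days_needed(250_000))      # 2500 1
print(requests_needed(2_500_000), days_needed(2_500_000))  # 25000 3
```

In other words, one day's default quota covers roughly 1,000,000 records (10,000 requests × 100 records), so only exceptionally large comment sections would span more than one day.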
File Import
Import From File
The Import From File collector allows users to upload an existing social media or text-based dataset to Communalytic for analysis. The file must be saved in one of the supported formats: CSV, Excel, JSON, ZIP, GZ.
Learn More
- CSV files must be UTF-8 encoded.
- For larger files, you can compress the CSV/Excel file into a ZIP or GZ archive. (The ZIP or GZ file should contain only one CSV/Excel file.)
- If your dataset is from one of the listed social media platforms, use the provided templates to rename CSV columns to ensure that Communalytic can properly recognize the data fields.
- If you’re ONLY interested in using one of Communalytic’s textual analysis modules (e.g., Toxicity, Sentiment, and/or Topic Analyzer), a CSV/Excel file with a single column called ‘text’ will suffice. This is ideal for analyzing any type of textual data.
- If you’re interested in using some or ALL of the available data analysis modules in Communalytic (e.g., Toxicity Analyzer, Sentiment Analyzer, Topic Analyzer, Network Analyzer, Time Series, Word & Emoji Cloud, and Top Posters), your CSV file should include the following columns required by the modules you plan to use:
- text (required for Toxicity, Sentiment, and Topic Analyzer, and Word & Emoji Cloud)
- created_at (required for Time Series) [i.e., when the post was created; examples: 10/14/2022 19:03 OR 2020-03-13 23:15:56]
- user_screen_name (required for Top Posters and Network Analysis) [i.e., who created the post]
- in_reply_to_screen_name (required for Network Analysis) [i.e., the recipient of the post; for replies only]
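The Python sketch below uses only the standard library to prepare a file along the lines of the requirements above. It writes the listed columns to a UTF-8 CSV, compresses it into a GZ archive containing that single file, and reports which analysis modules the columns would support. The file names and sample rows are hypothetical, and the module mapping simply restates the column list above; Communalytic performs its own validation on upload.

```python
import csv
import gzip
import shutil

# Column requirements as listed above (module -> required columns).
MODULE_REQUIREMENTS = {
    "Toxicity / Sentiment / Topic Analyzer": {"text"},
    "Word & Emoji Cloud": {"text"},
    "Time Series": {"created_at"},
    "Top Posters": {"user_screen_name"},
    "Network Analyzer": {"user_screen_name", "in_reply_to_screen_name"},
}


def supported_modules(columns: list[str]) -> list[str]:
    """Return the modules whose required columns are all present."""
    available = set(columns)
    return [name for name, needed in MODULE_REQUIREMENTS.items() if needed <= available]


rows = [
    {"text": "Hello Toronto!", "created_at": "2020-03-13 23:15:56",
     "user_screen_name": "alice", "in_reply_to_screen_name": ""},  # original post
    {"text": "Hi Alice 👋", "created_at": "2020-03-13 23:20:02",
     "user_screen_name": "bob", "in_reply_to_screen_name": "alice"},  # reply
]
columns = list(rows[0])

# newline="" is what the csv module expects; encoding="utf-8" meets the UTF-8 requirement.
with open("dataset.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)

# Optional for large files: a GZ archive holding exactly one CSV file.
with open("dataset.csv", "rb") as src, gzip.open("dataset.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

print(supported_modules(columns))
```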