FAQ

Welcome to Communalytic! This page answers some frequently asked questions. As Communalytic is still under active development, the answers here are subject to change. We’ll update this FAQ periodically as we continue to release more features. Thank you for joining our research community!


Frequently Asked Questions

Overview

  • What is Communalytic?

    Communalytic is a no-code computational social science research tool for studying online communities and public discourse on social media. It is designed to provide researchers, journalists, and students with essential resources and infrastructure for conducting independent, public-interest research. It has a full suite of easy-to-use social media data collectors – no coding required. Users can bring their own data or use one of Communalytic’s various social media data collectors to gather data from platforms such as Bluesky, Mastodon, Reddit, Telegram, X (formerly Twitter), and YouTube.  

    In addition to data collectors, Communalytic also comes with a comprehensive suite of built-in data analyzers capable of processing both social and non-social media data, including a Civility Analyzer, a Sentiment Analyzer, a Topic Analyzer, and a Network Analyzer. Many of the analyzers are equipped with advanced AI features that assist researchers in conducting complex analyses, offering valuable insights and recommendations as needed. Researchers maintain complete control throughout the process, allowing them to add essential context and expertise and make all final decisions.

  • How should I cite Communalytic?

    If you are using Communalytic in an academic publication, please cite us as: 

    • Gruzd, A., & Mai, P. (<access year>). Communalytic: A no-code computational social science research tool for studying online communities and public discourse on social media. Available at https://Communalytic.org
  • Which version of Communalytic (EDU or PRO) should I use?

    There are two versions of Communalytic: EDU and PRO.

    • Communalytic EDU is designed to help students learn about social media data analytics and social network analysis.
    • Communalytic PRO is designed for researchers, journalists and other stakeholders and is ideal for large-scale research projects. It provides users with the resources and infrastructure necessary for conducting independent research in the public interest. 

    Each version is hosted on its own dedicated server and has its own account creation and sign-in processes. Users of Communalytic can share datasets with other users using the same version of Communalytic (i.e., EDU users with EDU users and PRO with PRO).

  • How many datasets can I have?

    • A Communalytic EDU account can collect and store ≤30k records shared across ≤3 datasets (i.e., you can have 1 dataset with ≤30k records or up to 3 datasets whose combined size does not exceed 30k records).
    • A Communalytic PRO account can collect and store ≤3M records shared across ≤30 datasets (each dataset can store up to 1M records).
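
    If you want to check in advance whether a planned set of datasets fits your account, the limits above can be written as a small Python check. This is a minimal sketch, not Communalytic's own quota logic: the numbers are copied from this answer, and `LIMITS` and `fits_quota` are hypothetical names.

```python
# Limits copied from this answer; LIMITS and fits_quota are hypothetical names.
LIMITS = {
    "EDU": {"max_datasets": 3, "max_total": 30_000, "max_per_dataset": 30_000},
    "PRO": {"max_datasets": 30, "max_total": 3_000_000, "max_per_dataset": 1_000_000},
}


def fits_quota(version: str, dataset_sizes: list[int]) -> bool:
    """Return True if datasets with these record counts fit in one EDU or PRO account."""
    limit = LIMITS[version]
    if len(dataset_sizes) > limit["max_datasets"]:
        return False
    if any(size > limit["max_per_dataset"] for size in dataset_sizes):
        return False
    return sum(dataset_sizes) <= limit["max_total"]


print(fits_quota("EDU", [10_000, 15_000, 5_000]))  # True: 3 datasets, 30k records in total
print(fits_quota("EDU", [20_000, 15_000]))         # False: 35k records in total
print(fits_quota("PRO", [1_000_000] * 3))          # True: 3M records in total
print(fits_quota("PRO", [1_200_000]))              # False: one dataset exceeds 1M
```
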
  • Can I add more datasets to my Communalytic account?

    • If you’re at your account limit, you can download your existing datasets and then delete them to free up space.
    • Alternatively, you can upgrade to Communalytic PRO, where you can collect and store ≤3M records shared across ≤30 datasets.

Data Collection

  • Do I need to apply to the platforms for permission to access their APIs?

    Communalytic provides stable and efficient data access via official APIs owned and maintained by various social media platforms; it does not “scrape the data”. Depending on the platform, users may need to request and/or pay separately for access. For example, accessing data from X currently requires a separate paid API plan. Please note that social media data and API access in Communalytic are granted solely at the discretion of their respective owners and may be revoked at any time; therefore, Communalytic cannot guarantee access to any specific platform. See our FAQ and Tutorials for platform-specific details and access instructions.

    1. Bluesky: You do not need to apply separately for API access. Bluesky automatically generates an API key upon request for each user/session.
    2. Mastodon: You do not need to create a Mastodon account or apply for a separate API key. Mastodon automatically generates an API key upon request for each user/session. 
    3. Reddit: Sign up for Reddit and link your Reddit account to Communalytic.
    4. Telegram: Request a free Telegram Developer Account.
    5. X (Twitter): Request a Twitter Developer Account and purchase a Twitter Basic (10k tweets/month) or Pro plan (1M tweets/month).
    6. YouTube: Request a free YouTube API key from Google. 
    7. Perspective API (for Civility Analysis): Request a free Perspective API key from Google.
  • What types of data are available via Communalytic EDU?

    Communalytic EDU is designed for teaching and learning and has a reduced feature set with less storage. EDU accounts can typically only collect and store up to 30k records, shared across a maximum of 3 datasets. However, due to limited computing resources and/or API restrictions, there may be additional platform-specific limits on how much data—such as posts and replies—can be collected. 

  • What types of data are available via Communalytic PRO?

    Communalytic PRO is designed for research and has a full feature set with more storage and higher platform-specific data limits. PRO accounts can collect and store up to 3M records, shared across a maximum of 30 datasets. However, due to limited computing resources and/or API restrictions, there may be additional platform-specific limits on how much data—such as posts and replies—can be collected. More information is available below.

  • Can I collect data that is private such as DMs or posts from private groups?

    No, Communalytic cannot collect data that is meant to be private, such as DMs, posts from private groups, or posts from accounts that are set to private.

    The developers of Communalytic are proponents of ethical computational social science research in the public interest. If you are working with social media data, we encourage you to review and follow the ethical and best-practice guidelines established by your institution, as well as the applicable laws in the jurisdiction where you reside.

    As a primer, please review “Ethical Decision-Making and Internet Research” published by the Association of Internet Researchers (AoIR).

  • Can I run multiple data collectors simultaneously?

    You can concurrently run one data collector for each available data source. This feature is ideal for studying dynamic, newsworthy events where you want to study how the same phenomenon or event plays out across multiple platforms during the same time period. 

Data Management

  • How long will you keep my datasets on your server?

    • For Communalytic EDU accounts, datasets are kept for 100 days from the end of their collection date. You will receive a notification 3 weeks before the expiration date and 3 days before your dataset is automatically deleted from our system.
    • For Communalytic PRO accounts, datasets are kept until the PRO account expires. You can extend your PRO account at any time in 3-, 6-, or 12-month increments via the My Profile menu. You will receive a notification 7 days before your PRO account’s expiration date. After your account has expired, you will have 14 days to upgrade it before your account and datasets are automatically removed from our system.
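
    The retention rules above come down to simple date arithmetic. The Python sketch below computes the notice and removal dates from a given date. It assumes day-level granularity and treats the EDU expiration date and the automatic deletion date as the same day, which this answer implies but does not state outright; the function names are hypothetical.

```python
from datetime import date, timedelta


def edu_schedule(collection_end: date) -> dict[str, date]:
    """EDU datasets: kept 100 days after collection ends, with two advance notices."""
    deletion = collection_end + timedelta(days=100)
    return {
        "first_notice": deletion - timedelta(weeks=3),  # 3 weeks before expiry
        "final_notice": deletion - timedelta(days=3),   # 3 days before deletion
        "deletion": deletion,
    }


def pro_schedule(account_expiry: date) -> dict[str, date]:
    """PRO accounts: notice 7 days before expiry, removal 14 days after it."""
    return {
        "notice": account_expiry - timedelta(days=7),
        "expiry": account_expiry,
        "removal": account_expiry + timedelta(days=14),  # unless the account is extended
    }


for label, day in edu_schedule(date(2024, 1, 1)).items():
    print(label, day.isoformat())
```
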
  • Can I move datasets between the EDU and the PRO version?

    • There is no direct or automatic option to achieve this. 
    • However, you can manually move a dataset from the EDU version to the PRO version (and vice versa) by downloading it as a CSV file first and then uploading it to another account. 
    • Please note that due to the EDU data cap, the transfer from the PRO to the EDU version is limited to datasets with ≤30K records.
  • Can I share my dataset with my collaborators?

    Users of Communalytic can share datasets with other users using the same version of Communalytic, i.e., EDU users with EDU users and PRO with PRO. 

    • You can share datasets that you have collected with collaborators from within Communalytic under the ‘My Datasets’ tab.

    • You can accept datasets that have been shared with you within Communalytic under the ‘Shared with Me’ tab. (Look for a jingling red bell.) 

  • Can I upload/import my own dataset?

    Yes, you can upload/import an existing dataset into Communalytic for analysis, subject to the following caps:

    • Communalytic EDU: File size: <10 MB; Dataset size: ≤30K records.
    • Communalytic PRO: File size: <100 MB; Dataset size: ≤1M records.

    For larger files, you can compress the CSV file into a ZIP or GZ archive. (The ZIP or GZ file should contain only one CSV file.)
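
    Compressing a CSV into a GZ archive can be scripted with Python's standard library, as in the sketch below. It assumes that the caps above apply to the size of the file you actually upload (compressed or not) and reads "MB" as 1,000,000 bytes; both are assumptions, and `gzip_csv` and `under_cap` are hypothetical helper names.

```python
import gzip
import os
import shutil

# Upload caps from the list above, assuming 1 MB = 1,000,000 bytes.
MAX_UPLOAD_BYTES = {"EDU": 10 * 1_000_000, "PRO": 100 * 1_000_000}


def gzip_csv(csv_path: str) -> str:
    """Compress one CSV file into <name>.csv.gz and return the new path."""
    gz_path = csv_path + ".gz"
    with open(csv_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


def under_cap(path: str, version: str) -> bool:
    """True if the file is strictly smaller than the cap for 'EDU' or 'PRO'."""
    return os.path.getsize(path) < MAX_UPLOAD_BYTES[version]
```

    For example, `under_cap(gzip_csv("posts.csv"), "EDU")` compresses a hypothetical `posts.csv` and reports whether the result is below the EDU cap. A .gz file wraps a single file, which matches the one-CSV-per-archive rule; a ZIP archive can hold several files, so make sure to add only one.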

    If your dataset is from one of the listed social media platforms, use the provided templates to rename the CSV columns so that Communalytic can properly recognize the data fields.

    If you’re ONLY interested in using one of Communalytic’s textual analysis modules (e.g., Toxicity, Sentiment and/or Topic Analyzer), a CSV file with a single column called ‘text’ will suffice. This is ideal for analyzing any type of textual data.

    If you’re interested in using some or all of the available data analysis modules in Communalytic (e.g., Toxicity Analyzer, Sentiment Analyzer, Topic Analyzer, Network Analyzer, Time Series, Word & Emoji Cloud and Top Posters), your CSV file should include some or all of the following columns:

    • text (Req. for Toxicity, Sentiment, Topic Analyzer, and Word/Emoji Cloud)
    • created_at (Req. for Time Series) [i.e., when the post was created; example: 10/14/2022 19:03 OR 2020-03-13 23:15:56]
    • user_screen_name (Req. for Top Posters and Network Analysis) [i.e., who created the post]
    • in_reply_to_screen_name (Req. for Network Analysis) [i.e., recipient of the post, for replies only]
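
    Before uploading, you can check which modules your columns will support. The Python sketch below encodes the column requirements listed above; the mapping and the `supported_modules` helper are illustrative only, and Communalytic's own validation may be stricter (for example, about date formats).

```python
import csv
import io

# Column requirements copied from the list above.
REQUIRED_COLUMNS = {
    "Toxicity Analyzer": {"text"},
    "Sentiment Analyzer": {"text"},
    "Topic Analyzer": {"text"},
    "Word & Emoji Cloud": {"text"},
    "Time Series": {"created_at"},
    "Top Posters": {"user_screen_name"},
    "Network Analysis": {"user_screen_name", "in_reply_to_screen_name"},
}


def supported_modules(csv_text: str) -> list[str]:
    """List the modules whose required columns all appear in the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    # Drop a UTF-8 byte-order mark (added by some spreadsheet tools) and stray spaces.
    columns = {name.lstrip("\ufeff").strip() for name in header}
    return [module for module, needed in REQUIRED_COLUMNS.items() if needed <= columns]


# Build a one-row example file with all four columns.
buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["text", "created_at", "user_screen_name", "in_reply_to_screen_name"]
)
writer.writeheader()
writer.writerow({
    "text": "Agreed!",
    "created_at": "2020-03-13 23:15:56",
    "user_screen_name": "alice",
    "in_reply_to_screen_name": "bob",
})
print(supported_modules(buf.getvalue()))  # all seven modules
print(supported_modules("text\nhello\n"))  # only the four text-based modules
```
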
  • How do I properly open a dataset (CSV) exported from Communalytic in Excel?

    If you have exported a dataset from Communalytic as a CSV file, and now wish to view and analyze it further in Excel, follow the steps in this tutorial to learn how to properly open it in Excel.

    Important: Do not double-click the CSV file to open it in Excel. Opening it this way will cause Excel to display emojis and other special characters incorrectly. It may also corrupt the fields that store unique identifiers for posts and users: these IDs are usually long sequences of digits, which Excel interprets as numbers and stores with at most 15 significant digits, silently replacing the remaining digits with zeros.
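
    The sketch below illustrates the problem with long identifiers. Excel keeps at most 15 significant digits of a number and changes any digits after the 15th to zeros, so two different 19-digit IDs can collapse into the same value. Python's `csv` module, by contrast, reads every field as text. The `post_id` column, the ID values, and the `excel_like` helper are made up for this example.

```python
import csv
import io


def excel_like(digits: str) -> str:
    """Mimic Excel's 15-significant-digit limit for a long run of digits."""
    if len(digits) <= 15:
        return digits
    return digits[:15] + "0" * (len(digits) - 15)


# Hypothetical export with two distinct 19-digit post IDs.
sample = "post_id,text\n1580000000000000123,first\n1580000000000000456,second\n"
rows = list(csv.DictReader(io.StringIO(sample)))

ids = [row["post_id"] for row in rows]  # read as strings, so nothing is lost
mangled = [excel_like(i) for i in ids]  # what a 15-digit numeric conversion keeps
print(ids)      # ['1580000000000000123', '1580000000000000456']
print(mangled)  # ['1580000000000000000', '1580000000000000000']
```
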

  • Can I download my datasets?

    • Yes, you can download your datasets as a CSV or Excel file along with all toxicity, prosocial and sentiment polarity scores. 
    • In addition, you can download the resulting network files as a GraphML file from the Network Analyzer, and the embeddings and clusters from the Topic Analyzer.
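
    If you work with a downloaded network outside Communalytic, note that a GraphML file is plain XML. The sketch below reads one with Python's standard library and counts how many edges touch each node (degree, ignoring direction). It assumes standard GraphML `<node id>` and `<edge source target>` elements; if you have networkx installed, `networkx.read_graphml(path)` loads the same file into a graph object instead. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Drop the XML namespace, e.g. '{http://graphml.graphdrawing.org/xmlns}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def degree_counts(path: str) -> Counter:
    """Count edge endpoints per node in a GraphML file (direction ignored)."""
    degrees = Counter()
    for element in ET.parse(path).getroot().iter():
        kind = local_name(element.tag)
        if kind == "node":
            degrees[element.get("id")] += 0  # register isolated nodes too
        elif kind == "edge":
            degrees[element.get("source")] += 1
            degrees[element.get("target")] += 1
    return degrees
```

    For example, `degree_counts("network.graphml").most_common(10)` lists the ten most connected accounts in a hypothetical downloaded file named `network.graphml`.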

Data Analysis and Visualization

  • Can I collect and analyze non-English posts?

    You can collect and analyze data in different languages. 

    1. The summary charts on the Dataset Overview Page (Time Series, Interactive Word Cloud, Emoji Cloud, …) are all language-agnostic.
    2. For the most current info on this topic, visit our Learn More about Communalytic Data Analyzers page.
  • What types of summary charts can Communalytic automatically generate about my dataset?

    Communalytic automatically generates various summary charts for each dataset, accessible in the Dataset Overview Module. These charts provide a quick preview of the dataset, allowing users to verify if their specified search criteria returned the desired results. All charts are available for download as PNG images or CSV data files. In addition, users also have the option to further explore and visualize the data from the summary charts in Plotly Chart Studio, a popular data visualization tool for structured data.  

    Charts in the Dataset Overview Module

    • Posts Per Day Chart
      • This chart shows the number of posts per day over time.
    • Word Cloud Chart
      • This chart shows the 100 most frequently used words based on your full dataset. It excludes numbers, URLs, and stop words in 15 different languages.
    • Emoji Cloud Chart
      • This chart shows the 100 most frequently used emojis based on your full dataset.
    • Top 10 Posters
      • This chart shows the Top 10 posters in your dataset. 
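
    To reproduce similar summaries from an exported or uploaded CSV, the Python sketch below counts posts per day, the most frequent words, and the most active posters. It is an approximation, not Communalytic's code: the stop-word list is a tiny English-only placeholder rather than the 15-language lists mentioned above, emojis are not handled, and "top posters" is assumed to mean the accounts with the most posts. It expects rows as dictionaries with the `text`, `created_at`, and `user_screen_name` columns from the upload section and accepts both example date formats given there.

```python
import re
from collections import Counter
from datetime import datetime

# The two created_at examples from the upload section: 10/14/2022 19:03 and 2020-03-13 23:15:56.
DATE_FORMATS = ("%m/%d/%Y %H:%M", "%Y-%m-%d %H:%M:%S")
# Placeholder stop words (English only); Communalytic's own lists are not reproduced here.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "in", "on", "for", "it"}
URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters only, so numbers are skipped


def parse_created_at(value: str) -> datetime:
    """Parse either supported created_at format, or raise ValueError."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized created_at value: {value!r}")


def posts_per_day(rows) -> Counter:
    """Posts Per Day: number of rows per calendar date."""
    return Counter(parse_created_at(row["created_at"]).date() for row in rows)


def top_words(rows, n: int = 100) -> list[tuple[str, int]]:
    """Word Cloud input: the n most frequent words, excluding URLs, numbers, and stop words."""
    counts = Counter()
    for row in rows:
        text = URL_RE.sub(" ", row["text"].lower())
        counts.update(w for w in WORD_RE.findall(text) if w not in STOP_WORDS)
    return counts.most_common(n)


def top_posters(rows, n: int = 10) -> list[tuple[str, int]]:
    """Top Posters: accounts ranked by number of posts (an assumed definition)."""
    return Counter(row["user_screen_name"] for row in rows).most_common(n)
```
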
  • How do I conduct toxicity and/or prosocial analysis with Communalytic’s Civility Analyzer?

    The Civility Analyzer identifies toxic and prosocial interactions in a dataset. Users can analyze their dataset using one of two machine learning models: Perspective API and Detoxify. The analyzer calculates toxicity scores (such as Toxicity, Insult, and Threat) and prosocial scores (like Compassion, Curiosity, and Respect) for each record in the dataset.  
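
    Communalytic sends your records to the model for you, but it can help to see what a Perspective API request looks like. The Python sketch below builds a request for the Toxicity, Insult, and Threat attributes and reads the 0-to-1 summary scores from the response, following Google's public `comments:analyze` request format. Only `analyze` touches the network, and it needs your own API key. The helper names are hypothetical, this is not how Communalytic itself is implemented, and prosocial attributes are left out.

```python
import json
import urllib.request

# Google's public Perspective API endpoint.
ENDPOINT = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
ATTRIBUTES = ("TOXICITY", "INSULT", "THREAT")


def build_request(text: str, lang: str = "en") -> dict:
    """JSON body asking for a score per attribute for one piece of text."""
    return {
        "comment": {"text": text},
        "languages": [lang],
        "requestedAttributes": {attr: {} for attr in ATTRIBUTES},
    }


def extract_scores(response: dict) -> dict[str, float]:
    """Pull the summary score (a 0-1 probability) for each returned attribute."""
    return {
        attr: data["summaryScore"]["value"]
        for attr, data in response.get("attributeScores", {}).items()
    }


def analyze(text: str, api_key: str) -> dict[str, float]:
    """Send one record to Perspective API (needs network access and a valid key)."""
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(build_request(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return extract_scores(json.load(resp))
```
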

  • How do I perform a sentiment analysis with Communalytic’s Sentiment Analyzer?

    The Sentiment Analyzer calculates sentiment polarity scores to determine whether text in a dataset expresses a positive, negative or neutral sentiment. Sentiment analysis is performed using the following libraries: VADER (supports English and Portuguese), TextBlob (English, French, German), and Dostoevsky (Russian). Records (e.g., social media posts, survey responses, interview transcripts) that are in English are analyzed using both VADER and TextBlob. Users can inspect conflicting polarity scores from the two libraries and decide which library is better suited (i.e., more accurate) for their dataset.
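
    Spotting disagreements between VADER and TextBlob comes down to turning each score into a label and comparing the labels. In the sketch below, the VADER compound score uses the ±0.05 cut-offs suggested in VADER's documentation, while treating any non-zero TextBlob polarity as positive or negative is our own assumption. The scores are invented for illustration; with the real libraries installed, they would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` (vaderSentiment) and `TextBlob(text).sentiment.polarity` (TextBlob).

```python
def vader_label(compound: float) -> str:
    """Label a VADER compound score using the ±0.05 cut-offs from VADER's README."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def textblob_label(polarity: float) -> str:
    """Label a TextBlob polarity (-1 to 1); the zero cut-off is an assumption."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def conflicts(records):
    """Yield (record_id, vader_label, textblob_label) wherever the two labels differ."""
    for record_id, compound, polarity in records:
        v_label, t_label = vader_label(compound), textblob_label(polarity)
        if v_label != t_label:
            yield record_id, v_label, t_label


# Invented scores for four hypothetical records: (id, VADER compound, TextBlob polarity).
scores = [("post-1", 0.64, 0.50), ("post-2", 0.43, -0.35), ("post-3", 0.0, 0.0), ("post-4", 0.02, 0.10)]
for record in conflicts(scores):
    print(record)
```
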

  • How do I perform a topic analysis with Communalytic’s Topic Analyzer?

    The Topic Analyzer discovers latent topics (i.e., abstract topics that may not be apparent from simply reading the text) based on the semantic similarity between posts (i.e., records) in a dataset. The analyzer transforms each text into an embedding (a numerical vector that a computer can compare with other vectors). Once transformed, texts are clustered based on their semantic similarity and visualized via an interactive 3D Semantic Similarity Map.
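
    To make the embed-then-cluster idea concrete, here is a deliberately simplified Python sketch. Communalytic's actual embedding model and clustering method are not described in this FAQ, so the sketch swaps in plain word-count vectors and a greedy cosine-similarity threshold; real embeddings would also group posts that share meaning without sharing words. The 0.2 threshold and the sample posts are arbitrary.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for an embedding: word counts. This captures word overlap, not meaning."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0 when either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(texts: list[str], threshold: float = 0.2) -> list[int]:
    """Greedy single-pass clustering: each text joins the first cluster whose
    seed text is at least `threshold` similar, otherwise it starts a new cluster."""
    seeds: list[Counter] = []
    labels: list[int] = []
    for text in texts:
        vector = embed(text)
        for index, seed in enumerate(seeds):
            if cosine(vector, seed) >= threshold:
                labels.append(index)
                break
        else:
            seeds.append(vector)
            labels.append(len(seeds) - 1)
    return labels


posts = [
    "vaccine rollout starts today",
    "vaccine rollout delayed again",
    "new stadium opens downtown",
    "stadium tickets sold out",
]
print(cluster(posts))  # [0, 0, 1, 1]
```
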

  • How do I perform a network analysis with Communalytic’s Network Analyzer, and what types of networks can it generate and visualize?

    The Network Analyzer generates and visualizes various types of networks in a dataset, including communication and link-sharing networks.

    Network Types Available in Communalytic:

    • Reply-To Network: Account-to-Account
      • This communication network shows who replied to whom.  
    • Repost Network: Account-to-Account
      • This communication network shows who reposted whom. 
    • Two-Mode Link Sharing Network: Account-to-Website
      • This ‘link sharing’ network shows which accounts in the dataset shared a link to the same website(s).
      • It’s ideal for detecting coordinated inauthentic behaviours by bot networks or organized groups of users. 
      • The resulting network data can be downloaded as a GraphML or Gephi file for use with other network analysis software packages.
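
    Communalytic builds these networks for you; the Python sketch below only illustrates how reply-to and link-sharing edges can be derived from the upload columns described earlier (`user_screen_name`, `in_reply_to_screen_name`, `text`) and written out as GraphML. Identifying websites by the domain of each URL found in the post text is our assumption, the repost network is omitted because the upload columns include no repost field, and all function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations
from urllib.parse import urlparse
from xml.sax.saxutils import quoteattr

URL_RE = re.compile(r"https?://\S+")


def reply_edges(rows) -> Counter:
    """Reply-To network: weighted edges (replier, replied-to account)."""
    return Counter(
        (row["user_screen_name"], row["in_reply_to_screen_name"])
        for row in rows
        if row.get("in_reply_to_screen_name")
    )


def link_sharing_edges(rows) -> Counter:
    """Two-mode network: weighted edges (account, website domain) from URLs in the text."""
    edges = Counter()
    for row in rows:
        for url in URL_RE.findall(row["text"]):
            domain = urlparse(url.rstrip(".,;:!?)")).netloc.lower().removeprefix("www.")
            if domain:
                edges[(row["user_screen_name"], domain)] += 1
    return edges


def co_sharing_pairs(link_edges) -> Counter:
    """Project the two-mode network: link two accounts once per domain they both shared."""
    accounts_by_domain: dict[str, set[str]] = {}
    for account, domain in link_edges:
        accounts_by_domain.setdefault(domain, set()).add(account)
    pairs = Counter()
    for accounts in accounts_by_domain.values():
        for a, b in combinations(sorted(accounts), 2):
            pairs[(a, b)] += 1
    return pairs


def to_graphml(edges: Counter) -> str:
    """Serialize weighted directed edges as a minimal GraphML document."""
    nodes = sorted({name for edge in edges for name in edge})
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
        '  <key id="w" for="edge" attr.name="weight" attr.type="int"/>',
        '  <graph id="G" edgedefault="directed">',
    ]
    lines += [f"    <node id={quoteattr(name)}/>" for name in nodes]
    for (source, target), weight in sorted(edges.items()):
        lines.append(
            f"    <edge source={quoteattr(source)} target={quoteattr(target)}>"
            f'<data key="w">{weight}</data></edge>'
        )
    lines += ["  </graph>", "</graphml>"]
    return "\n".join(lines)
```

    A pair returned by `co_sharing_pairs` means both accounts linked to at least one common website; a high count for the same pair across many domains is the kind of pattern worth inspecting for coordination.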