This corpus of questions and answers is made available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0). Inquiries about commercial licensing should be directed to Prof. Norman Sadeh (sadeh@cs.cmu.edu). If you use the corpus (i.e., the questions, the generated answers, or both) in a publication, you must cite the following paper:
Lea Duesterwald, Ian Yang, Norman Sadeh, "Can a Cybersecurity Question Answering Assistant Help Change User Behavior? An In Situ Study", The Symposium on Usable Security and Privacy (USEC 2025), Feb 2025.
The paper above is also essential for understanding the contents of the corpus and how the questions and answers were collected. The full paper is available at https://par.nsf.gov/servlets/purl/10598526
This repository contains data collected as part of an in situ user study examining the effectiveness of a cybersecurity question-answering assistant.
The repository includes two data files:
- security_qa_questions_and_answers.csv
This file contains 1,045 real-world cybersecurity questions asked by study participants. For each question, two answers generated by GPT-4 are provided:
- one answer generated with additional prompt engineering, and
- one answer generated without additional prompt engineering.
Each row corresponds to a single question and includes the following columns:
- question_asked
- answer_with_prompt_engineering
- answer_no_prompt_engineering
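A minimal sketch for loading this file with Python's standard `csv` module. It assumes the file is in the working directory, is UTF-8 encoded, and has a header row containing the three column names above; the function name `load_qa_pairs` is hypothetical. Opening the file with `newline=""` lets `csv` handle answers that span several lines.

```python
import csv

# Column names as documented above.
EXPECTED_COLUMNS = [
    "question_asked",
    "answer_with_prompt_engineering",
    "answer_no_prompt_engineering",
]


def load_qa_pairs(path="security_qa_questions_and_answers.csv"):
    """Return one dict per question, keyed by the documented column names.

    Assumes a UTF-8 file with a header row; adjust `encoding` if needed.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early if the header does not contain the expected columns.
        missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return list(reader)
```

If the file matches the description above, `len(load_qa_pairs())` should be 1,045, and each row gives both answers side by side for comparison.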
- security_qa_full_context_data.json
This file contains anonymized, participant-level longitudinal data from the study. Each top-level JSON entry corresponds to a single participant and consists of a list of sublists with the following structure:
- The first sublist contains that participant’s initial survey responses.
- Each subsequent sublist corresponds to one interaction with the assistant and includes:
  - the question asked,
  - the answer received,
  - the alternate answer that could have been received, and
  - the participant’s evening survey responses for that interaction.
Together, these entries provide the full trajectory of each participant’s interactions and evaluations throughout the study.
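A minimal sketch for walking this file in Python, under several assumptions that the description above leaves open: the top-level value may be either a list of participant entries or an object keyed by participant ID, each interaction sublist is assumed to hold its parts in the order listed (question, answer received, alternate answer, evening survey responses), and the evening responses may be stored either as one nested list or as the remaining items of the sublist. The names `Interaction`, `Participant`, `parse_participant`, and `load_participants` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Interaction:
    question: Any
    answer_received: Any
    alternate_answer: Any
    evening_survey: List[Any]


@dataclass
class Participant:
    participant_id: str
    initial_survey: Any
    interactions: List[Interaction]


def _evening_responses(sub):
    # Items after the alternate answer: unwrap them if they are stored as one
    # nested list, otherwise treat each remaining item as one response.
    tail = sub[3:]
    if len(tail) == 1 and isinstance(tail[0], list):
        return list(tail[0])
    return list(tail)


def parse_participant(pid, entry):
    # entry[0] is assumed to be the initial survey; every later sublist is
    # assumed to start with question, received answer, alternate answer.
    initial, *rest = entry
    interactions = [
        Interaction(sub[0], sub[1], sub[2], _evening_responses(sub)) for sub in rest
    ]
    return Participant(str(pid), initial, interactions)


def load_participants(path="security_qa_full_context_data.json"):
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # The top level may be a list of entries or an object keyed by participant.
    items = data.items() if isinstance(data, dict) else enumerate(data)
    return [parse_participant(pid, entry) for pid, entry in items]
```

For example, `sum(len(p.interactions) for p in load_participants())` counts the interactions recorded across all participants. If the real layout differs from these assumptions, only `parse_participant` and `_evening_responses` need to change.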
The full lists of initial and evening survey questions, in order, are provided below.
Initial survey questions:
Do you speak English?
What is your age in years? - Age:
In the past week, which of the following types of devices have you used at least once?
In the past week, which of the following types of devices, if any, that belongs to you did you allow someone else to use?
What gender do you identify with? - Selected Choice
What gender do you identify with? - Other (please specify) - Text
Please indicate how you identify yourself:
What best describes your employment status?
Have you ever worked in or studied in a computer-related field? (Computer Science, IT support, etc.)
What is the highest level of school you have completed or degree you have earned? - Selected Choice
What is the highest level of school you have completed or degree you have earned? - Other (please specify): - Text
Please indicate which other people, if any, live in your household. - Selected Choice
Please indicate which other people, if any, live in your household. - Other (please specify): - Text
What do you think is the likelihood of others observing your web browsing activity?
How concerned or unconcerned would you be if others observed your web browsing activity?
Rate your level of disagreement or agreement with the following statement: “I think I know how to use privacy tools to prevent others from observing my web browsing activity.”
How interested or uninterested would you be in learning to use privacy tools to prevent others from observing your web browsing activity?
How easy or difficult do you think it would be for you to use privacy tools to prevent others from observing your web browsing activity?
Rate your level of disagreement or agreement with the following statement: “If I were to start using privacy tools, in general I would prevent others from observing my web browsing activity.”
How concerned or unconcerned are you about the computer security risks of your daily internet use?
Rate your level of disagreement or agreement with the following statement: “I think I know how to use security tools to prevent computer security risks.”
How interested or uninterested would you be in learning better behaviors to protect yourself against security risks online?
How easy or difficult do you think it would be for you to change your practices to protect yourself from security risks online?
Rate your level of disagreement or agreement with the following statement: “Right now, my computer activity and internet use place me at a risk of security breaches.”
Rate your level of disagreement or agreement with the following statement: “If I had better advice on how, I would be able to use security tools and change my behavior to prevent security risks.”
Virtual Private Networks (VPNs) - Rate your knowledge:
Private Browsing - Rate your knowledge:
Cookies - Rate your knowledge:
Ad Blockers - Rate your knowledge:
Two-Factor Authentication (2FA) - Rate your knowledge:
Firewalls - Rate your knowledge:
Encryption - Rate your knowledge:
Antivirus Software - Rate your knowledge:
Browser Security Features (e.g., HTTPS) - Rate your knowledge:
Phishing - Rate your knowledge:
Scan your computer using antivirus software - Rate how often:
Use software security (e.g., firewall/VPN) - Rate how often:
Download things from the internet - Rate how often:
Download things from the internet without knowing exactly what you are downloading - Rate how often:
Clicking links or downloading attachments in emails not from people you know (e.g., advertisement emails or emails from unknown people) - Rate how often:
Mouse over links to see where they go before clicking them - Rate how often:
Back up information on an external hard drive or network - Rate how often:
Use strong passwords (including uppercase and lowercase letters and numbers and symbols) - Rate how often:
Provide any kind of personal information (e.g., age, name, financial information, etc.) to any kind of website. - Rate how often:
Use extra security (e.g., a VPN) when connected to public Wi-Fi - Rate how often:
Use encryption software to store sensitive information - Rate how often:
Hackers are more likely to target rich and important people - Rate your agreement:
Hackers target large businesses - Rate your agreement:
Hackers target home computer users - Rate your agreement:
Viruses create visible problems (I will always know when I have a virus) - Rate your agreement:
I can’t get a virus if I have anti-virus software installed - Rate your agreement:
The only way to get a virus is by downloading something or viewing ads - Rate your agreement:
I can avoid getting viruses by not downloading risky software - Rate your agreement:
I can get a virus from strange emails - Rate your agreement:
I can get a virus if I don’t pay attention to cookies - Rate your agreement:
Public Wi-Fi networks (e.g., coffee shops or airports) are just as safe as my home network - Rate your agreement:
Evening survey questions (answered for each interaction):
How many terms were there in the answer that were confusing or difficult to understand (1 = no terms, 3 = a couple, 5 = enough confusing terms that the answer was not understandable)
Overall, how well were you able to understand the answer? (1 = not at all, 5 = I completely understood the answer)
How easy was it to follow the advice given (if instructions were given)? (1 = impossible to follow, 5 = very easy to follow)
Did you follow the advice provided by the assistant?
Why or why not? Please explain.
Do you plan to follow the advice in the future?
On a scale of 1 - 4 how would you rate the answer you received to this answer? (1 = not helpful at all, 2 = not terribly helpful 3 = somewhat helpful, 4 = very helpful)
On a scale of 1 - 4 how would you rate the above alternate answer? (1 = not helpful at all, 2 = not terribly helpful 3 = somewhat helpful, 4 = very helpful)
If you had received the above alternate answer would you have found it more or less helpful?
If you had received the above alternate answer would you have been more or less likely to follow the advice provided (if applicable)?
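A minimal sketch for labeling one interaction's evening survey responses, assuming they appear in the same order as the ten evening questions above (with the alternate-answer rating counted as a single item) and that the two 1 to 4 helpfulness ratings are stored as integers or numeric strings. The field names and both functions are hypothetical; the length check is there to flag any mismatch with these assumptions rather than silently mislabel responses.

```python
# Hypothetical short names for the evening questions, in the order listed above.
EVENING_FIELDS = [
    "confusing_terms",                  # 1-5 scale
    "understanding",                    # 1-5 scale
    "ease_of_following",                # 1-5 scale
    "followed_advice",
    "followed_advice_reason",
    "plans_to_follow",
    "rating_received",                  # 1-4 scale
    "rating_alternate",                 # 1-4 scale
    "alternate_more_or_less_helpful",
    "alternate_more_or_less_likely_followed",
]


def label_evening_responses(responses):
    """Pair positional responses with the field names above."""
    if len(responses) != len(EVENING_FIELDS):
        raise ValueError(
            f"expected {len(EVENING_FIELDS)} responses, got {len(responses)}"
        )
    return dict(zip(EVENING_FIELDS, responses))


def rating_difference(labeled):
    """Received-answer rating minus alternate-answer rating (1-4 scale).

    Returns None if either rating is missing or not an integer value.
    """
    try:
        return int(labeled["rating_received"]) - int(labeled["rating_alternate"])
    except (TypeError, ValueError):
        return None
```

A positive difference means the participant rated the answer they received as more helpful than the alternate one.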
This research has been supported in part by grants from the National Science Foundation under the SaTC program (grant CNS-1914486) and under the REU program, the latter in part through CMU's RE-USE Program (NSF grant 2150217). Additional support was also provided by CMU's Block Center under its Responsible AI initiative.