Inspiration
Investigating war crimes usually involves analyzing large amounts of unstructured natural-language text, for instance first-hand victim or witness statements, news stories, and social media posts reporting on incidents. In addition, in the era of mass communication, hate speech has become a common precursor to atrocities and war crimes, so documenting instances of hate speech and incitement in both traditional and social media is a necessary component of digital open-source investigations into war crimes.
Natural-language understanding (NLU) is a powerful tool for extracting structured information from free-form texts such as victim and witness statements and law-enforcement and first-responder reports. This information can be used to build more effective search, discovery, and categorization tools for investigators, or as input data for other machine learning applications such as predicting the truthfulness of statements. In addition, the text of traditional news articles and transcripts of TV and radio broadcasts is usually copyrighted, which makes storage of this content by third parties problematic. Structured syntactic and semantic information extracted by natural-language understanding provides an alternative set of data that can be stored to facilitate search and analysis of hate speech without running afoul of copyright restrictions in different countries.
What it does
citizen5 is a secure, decentralized tool for open-source digital investigations into war crimes. It is designed to resist disinformation campaigns and attempts to discredit investigations by promoting transparency in the investigative process. All participants in an investigation are anonymous to each other by default and are identified only by a public/private key pair, which is used to sign and verify messages and submissions sent from the citizen5 client through the Nym network-level anonymizer. All reports, files, analyses, and other artifacts of investigations, including activity logs of investigators, are stored in a decentralized document database backed by IPFS storage, where they can be accessed and inspected by anyone and cannot be censored or blocked by the state.
citizen5 is envisioned as an open-source tool for implementing the Berkeley Protocol by providing investigators with a secure, decentralized, and transparent service for collecting, analyzing, and verifying evidence of war crimes.
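The signing and verification step described above can be sketched in Go as follows. This is a minimal illustration, not the citizen5 implementation: the text does not say which signature scheme is used, so Ed25519 from the Go standard library is an assumption, and the Submission type and function names are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// Submission is a hypothetical envelope for a signed message. The field
// names are illustrative and are not taken from the citizen5 source.
type Submission struct {
	PublicKey ed25519.PublicKey // the participant's only identifier
	Payload   []byte
	Signature []byte
}

// NewIdentity generates a fresh key pair for an anonymous participant.
// Ed25519 is an assumption; the original text names no algorithm.
func NewIdentity() (ed25519.PublicKey, ed25519.PrivateKey, error) {
	return ed25519.GenerateKey(rand.Reader)
}

// Sign wraps a payload in a Submission signed with the private key.
func Sign(priv ed25519.PrivateKey, payload []byte) Submission {
	pub := priv.Public().(ed25519.PublicKey)
	return Submission{
		PublicKey: pub,
		Payload:   payload,
		Signature: ed25519.Sign(priv, payload),
	}
}

// Verify reports whether the signature matches the payload and public key.
// The length check avoids the panic ed25519.Verify raises on malformed keys.
func Verify(s Submission) bool {
	if len(s.PublicKey) != ed25519.PublicKeySize {
		return false
	}
	return ed25519.Verify(s.PublicKey, s.Payload, s.Signature)
}

func main() {
	_, priv, err := NewIdentity()
	if err != nil {
		panic(err)
	}
	sub := Sign(priv, []byte("placeholder witness statement"))
	fmt.Println("signer:", hex.EncodeToString(sub.PublicKey)[:16])
	fmt.Println("valid:", Verify(sub))

	// Any change to the payload invalidates the signature.
	sub.Payload = []byte("tampered statement")
	fmt.Println("valid after tampering:", Verify(sub))
}
```

Because the public key is the only identifier, other participants can check that two submissions come from the same source without learning who that source is.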
citizen5 was created as a submission to the Kyiv Tech Summit hackathon. For this hackathon, I added the ability to analyze natural-language free text in statements, testimonies, and traditional and social media articles and posts using expert.ai. The expert.ai NLU APIs are used in the following ways:
- Detecting and categorizing hate speech: News articles, transcripts of TV and radio broadcasts, and other media are stored using metadata properties from the Dublin Core Metadata Element Set, together with extracted semantic information such as the presence and category of hate speech and the entities, names, or places mentioned. This facilitates searching, categorizing, and analyzing media items containing hate speech without requiring the rights to store the text.
- Detecting PII: Victim and witness testimonies may contain PII that could pose a security risk to individuals if publicized. Identifying testimonies or other texts that contain PII allows investigators both to extract this information and to anonymize the texts before they are made public.
- Detecting entities like people and place names: This facilitates better search and categorization of testimonies and statements.
- Extracting relations from statements and testimony: This facilitates semantic search over free-text accounts and statements, e.g. finding relations like "<x> killed prisoners".
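The idea of storing metadata and extracted relations instead of the source text can be sketched as below. This is only an illustration under stated assumptions: the MediaRecord and Relation types, their field names, and the FindRelations helper are hypothetical and do not reflect the citizen5 schema or the expert.ai response format. Only the Dublin Core element names (title, creator, publisher, date, identifier) come from the standard, and all sample values are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// Relation is a hypothetical subject-verb-object triple extracted from text.
type Relation struct {
	Subject, Verb, Object string
}

// MediaRecord holds metadata and extracted annotations, never the full text.
type MediaRecord struct {
	// A subset of Dublin Core elements.
	Title, Creator, Publisher, Date, Identifier string
	// Annotations produced by NLU analysis (shape is an assumption).
	HateCategories []string
	Entities       []string
	Relations      []Relation
}

// matches treats an empty pattern as a wildcard, like <x> in the text above.
func matches(value, pattern string) bool {
	return pattern == "" || strings.EqualFold(value, pattern)
}

// FindRelations returns every record with at least one relation whose
// subject, verb, and object match; empty arguments match anything.
func FindRelations(records []MediaRecord, subject, verb, object string) []MediaRecord {
	var out []MediaRecord
	for _, rec := range records {
		for _, rel := range rec.Relations {
			if matches(rel.Subject, subject) && matches(rel.Verb, verb) && matches(rel.Object, object) {
				out = append(out, rec)
				break // one matching relation is enough for this record
			}
		}
	}
	return out
}

func main() {
	records := []MediaRecord{
		{
			Title:      "Placeholder statement A",
			Identifier: "record-1",
			Relations:  []Relation{{Subject: "Unit A", Verb: "killed", Object: "prisoners"}},
		},
		{
			Title:      "Placeholder statement B",
			Identifier: "record-2",
			Relations:  []Relation{{Subject: "Unit B", Verb: "detained", Object: "civilians"}},
		},
	}
	// Equivalent to the query "<x> killed prisoners".
	for _, rec := range FindRelations(records, "", "killed", "prisoners") {
		fmt.Println(rec.Identifier, rec.Title)
	}
}
```

Exact string matching is used here for brevity; a real search would more likely match on normalized lemmas or concept identifiers returned by the NLU service.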
How we built it
- Written in Go.
- Uses the Nym network-level anonymizer service. citizen5 instances use the Nym websocket client to send and receive encrypted messages to one another anonymously. A citizen5 server instance runs as a Nym service provider.
- Uses the go-orbit-db P2P database.
- Provides a REST server. For data reporting and analysis apps, citizen5 provides an ordinary REST server that can be accessed by clients such as Python notebooks.
- Uses the expert.ai NLU APIs to add semantic information on hate speech, PII, entities, and other information extracted from witness and victim statements, first-responder reports, and other free-text sources.
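As a rough sketch of what the REST server mentioned above might look like, the example below serves analysis results as JSON using only the Go standard library. Everything specific here is an assumption for illustration: the Report shape, the /reports path, and the helper names are hypothetical and are not taken from the citizen5 code. The handler is exercised in-process with httptest so the example runs without opening a network port.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Report is a hypothetical summary of an analyzed statement.
type Report struct {
	ID       string   `json:"id"`
	Entities []string `json:"entities"`
	HasPII   bool     `json:"has_pii"`
}

// reportsHandler serves the given reports as JSON on GET requests and
// rejects every other method with 405.
func reportsHandler(reports []Report) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		body, err := json.Marshal(reports)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	})
}

// queryReports calls the handler in-process and decodes the response,
// standing in for a client such as a Python notebook.
func queryReports(h http.Handler, method string) ([]Report, int, error) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(method, "/reports", nil) // hypothetical path
	h.ServeHTTP(rec, req)
	if rec.Code != http.StatusOK {
		return nil, rec.Code, nil
	}
	var out []Report
	err := json.NewDecoder(rec.Body).Decode(&out)
	return out, rec.Code, err
}

func main() {
	h := reportsHandler([]Report{
		{ID: "report-1", Entities: []string{"Placeholder Town"}, HasPII: true},
	})
	reports, status, err := queryReports(h, http.MethodGet)
	if err != nil {
		panic(err)
	}
	fmt.Println("status:", status, "reports:", len(reports))
}
```

In a deployment the same handler could be registered on an http.ServeMux and served with http.ListenAndServe; any JSON-capable client could then read the results.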
Challenges we faced
I encountered two minor issues using the expert.ai API from Go, and I filed a PR and an issue report for them.
Things we learned
I learned a lot about working with the expert.ai APIs and connecting to OpenAPI spec REST servers from Go.
What's next for citizen5
I continue to work on citizen5 to implement the planned features according to the NymTech AnonDrop spec.
Built With
- expertai
- go
- ipfs
- nym

