This repository holds the public view of code and materials for the Privacy Fingerprint, a structured calculation of the privacy of unstructured text.
The NHS deals with huge amounts of sensitive, private data, both structured and unstructured. Patient records, staff records, treatments, etc. all contribute to this corpus. This data is often of value to researchers seeking new approaches to treatment, but sharing it with them is challenging. This work seeks to quantify the privacy risk of such data (or subsets thereof). A ‘privacy fingerprint’ or ‘privacy risk score’ is desired to help articulate and quantify privacy risks. Of note, this is NOT a tool to anonymise or de-identify records (there are existing vendors of tools that seek to do this). Instead it asks how that risk can be quantified in the first place, and how the risk score might change after applying a particular privacy-enhancing tool.
This goes beyond first-order PII. Identifying gender, age or even name is relatively straightforward, but details such as rare diseases, their symptoms, familial relationships, etc. may not be part of an existing ontology and may instead be entered as free text. A good example is mental health records, which are often written in a long narrative format rather than coded with existing SNOMED codes. The tool will need to judge privacy and derive a risk score based on factors such as these, and explain this score, perhaps through categories.
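As a purely illustrative sketch, and not this project's method, one simple way to think about a categorised risk score is uniqueness: a record that shares its combination of attributes with few others is easier to re-identify. The example below assumes the attributes have already been extracted from free text (that step is not shown), uses entirely fake records, and the names `uniqueness_risk` and `risk_by_category` are hypothetical.

```python
from collections import Counter

# Entirely fake records. We assume these attributes were already extracted
# from free text; the extraction step itself is out of scope here.
records = [
    {"gender": "F", "age_band": "30-39", "condition": "asthma"},
    {"gender": "F", "age_band": "30-39", "condition": "asthma"},
    {"gender": "M", "age_band": "30-39", "condition": "asthma"},
    {"gender": "M", "age_band": "60-69", "condition": "rare_condition_x"},
]


def uniqueness_risk(records, fields):
    """Return 1/k for each record, where k is the number of records that
    share the same combination of values for `fields` (hypothetical helper)."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return [1 / counts[k] for k in keys]


def risk_by_category(records):
    """Score each field on its own, so a score can be explained per category."""
    fields = list(records[0])
    return {f: uniqueness_risk(records, [f]) for f in fields}


# Risk from the full combination of attributes, then per single attribute.
overall = uniqueness_risk(records, ["gender", "age_band", "condition"])
per_category = risk_by_category(records)

for record, score in zip(records, overall):
    print(record, round(score, 2))
```

In this toy data the last record is unique on its condition alone, which is the kind of second-order detail that a check limited to name, age or gender would miss.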
Note: Only public or fake data are shared in this repository.
PROJECT ONGOING SO NO CODE AVAILABLE YET
- The main code is found in the root of the repository (see Usage below for more information)
- The accompanying report is also available in the reports folder
- More information about the code usage can be found in the model card
- {LIST OF MAIN PACKAGE VERSIONS}
To get a local copy up and running follow these simple steps.
To clone the repo:
git clone https://github.com/nhsx/privacyfingerprint
To create a suitable environment:
python -m venv _env
source _env/bin/activate
pip install -r requirements.txt
{ADDITIONAL TECHNICAL SUPPORT AND NEEDS}
{DESCRIPTION OF CODE}
{LIST AND DESCRIPTION OF OUTPUTS}
{NOTES ON REPRODUCIBILITY OF RESULTS}
{DESCRIPTION AND LINKS TO DATASETS}
{LINK TO FAKE DATA TO SUPPORT INITIAL CODE RUNS}
See the {LINK TO REPO ISSUES} for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidance.
Unless stated otherwise, the codebase is released under the MIT Licence. This covers both the codebase and any sample code in the documentation.
See LICENSE for more information.
The documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.
To find out more about the Digital Analytics and Research Team visit our project website or get in touch at england.tdau@nhs.net.