This codebase is a proof of concept and should only be used for demonstration purposes within a controlled environment. The components are not a live product and should not be deployed in a live or production environment.
We further recommend looking for the most recent versions of the individual components in their original repositories.
This discovery project investigates possible approaches to building a data set of NHS-focussed text sources for training and benchmarking NLP models in the NHS. You can read more about it in the blog here.
The aim was to test the thinking behind, and the feasibility of, such a solution by exploring:
- infrastructure, scalability and maintenance
- possible data sources, appropriate metadata, clinical input and required governance
- possible use cases of the outputs for training, benchmarking, validating and testing
This repository contains aspects of the tooling used during the discovery phase.
Note: No data, public or private, is shared in this repository.
- `appstack` folder contains scripts and configuration files to deploy the stack either on AWS Elastic Container Service or on a local system running Docker. Please refer to the folder README for further details.
- `doccano_autolabelling` folder contains a script to implement a trial autolabelling approach in the `doccano` deployment.
- `scrapers` folder contains a scraper framework as well as a number of implemented scrapers. Please refer to the folder README for further details.
- `user_stories` folder contains a copy of the user stories which were identified as part of this discovery work.
This repository is exploratory, pre-alpha code that has been developed for demonstration and evaluation purposes only. It is not to be used as a live service. No testing has been performed apart from ad-hoc trials and tests by its developers. No guarantees are made as to its performance.
Although the repository contains code to deploy the stack as a cloud app, no auto-scaling or redundancy mechanisms have been built. No security reviews have been performed and therefore no guarantees are made as to the security of this release.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidance.
Distributed under the MIT License. See LICENSE for more information.
To find out more about the Analytics Unit visit our project website or get in touch at england.tdau@nhs.net.