
pitthexai/HexAI-TJAtxt


HexAI-TJAtxt

A Textual Dataset to Advance Open Scientific Research in Total Joint Arthroplasty

Total joint arthroplasty (TJA) is the most common and fastest-growing inpatient surgical procedure among the elderly nationwide. With the growing number of TJA patients and ongoing advances in healthcare settings, an increasing number of scientific articles are published on a daily basis. These articles hold invaluable information on TJA, ranging from diagnosis, prevention, and treatment strategies to genetic variants and epidemiological factors. However, little has been done to computationally assemble a large-scale textual dataset from these articles and make it publicly available for open scientific research in TJA. Rapid computational text analytics over such a large body of scientific literature has great potential to discover novel knowledge, deepen our understanding of joint diseases, and improve the quality of care and clinical outcomes in TJA. The current dataset, entitled HexAI-TJAtxt, includes more than 61,936 scientific abstracts collected from PubMed using MeSH (Medical Subject Headings) terms within "MeSH Subheading" and "MeSH Major Topic", with publication dates from 01/01/2000 to 12/31/2022. The dataset is freely and publicly available here, and it will be updated bi-monthly with new abstracts published on PubMed.
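The README does not list the exact MeSH terms or the retrieval code. As a minimal sketch, assuming retrieval through NCBI's public E-utilities ESearch endpoint, the function below restricts terms to the MeSH Subheading (`[sh]`) and MeSH Major Topic (`[majr]`) fields and filters on publication date. The function name `build_esearch_url`, the OR-combination of terms, and the example terms are hypothetical placeholders, not the authors' actual search strategy.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(mesh_terms, min_date="2000/01/01", max_date="2022/12/31", retmax=10000):
    """Build a PubMed ESearch URL restricted by MeSH field and publication date.

    mesh_terms: list of (term, field_tag) pairs, e.g. ("surgery", "sh") for
    MeSH Subheading or ("Arthroplasty, Replacement, Knee", "majr") for
    MeSH Major Topic. These are hypothetical examples only.
    """
    # Quote each term, tag it with its field, and OR the terms together
    # (the combination logic is an assumption for illustration).
    query = " OR ".join(f'"{term}"[{tag}]' for term, tag in mesh_terms)
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",   # filter on publication date
        "mindate": min_date,
        "maxdate": max_date,
        "retmax": retmax,     # maximum number of PMIDs to return
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Example with a placeholder term:
# print(build_esearch_url([("Arthroplasty, Replacement, Knee", "majr")]))
```

PubMed's ESearch returns at most 10,000 IDs per query, so a collection of this size would likely have to be retrieved in several smaller date windows, with the abstracts themselves then fetched through EFetch.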

Value of the Data:

  • Research Advancement: The HexAI-TJAtxt dataset provides a comprehensive collection of scientific abstracts related to total joint arthroplasty, helping researchers, clinicians, health informaticians, and physicians explore the existing body of knowledge in the field and identify research gaps and areas for further study.

  • Extensive Coverage: The current textual dataset comprises over 61,936 scientific abstracts from PubMed, providing a comprehensive collection of research on total joint arthroplasty (TJA) from 2000 to 2022, with bi-monthly updates as new abstracts are published on PubMed.

  • Invaluable Information: Scientists from different disciplines can delve into this dataset to gain new insights and enhance their understanding of joint diseases, ultimately contributing to improved patient care and clinical outcomes in TJA.

  • Open Access for Scientific Research: Making this dataset publicly and freely available will foster open scientific research in the field of TJA.

  • Evidence-Based Medicine: The HexAI-TJAtxt dataset empowers researchers and clinicians to make evidence-based decisions, facilitating literature reviews, meta-analyses, and systematic reviews related to TJA.

  • Interdisciplinary Research: The HexAI-TJAtxt dataset encourages collaboration and knowledge exchange between researchers from different disciplines. Orthopedic surgeons, geneticists, epidemiologists, data scientists, AI scientists, and other experts can explore the dataset together, fostering interdisciplinary research and facilitating a holistic understanding of TJA.

  • Rapid Text Analytics: The HexAI-TJAtxt dataset offers an opportunity for computational text analytics on large-scale scientific literature. Researchers can employ natural language processing (NLP) techniques, machine learning algorithms, and other computational tools to extract valuable insights, discover patterns, and identify novel associations within the dataset in a timely fashion.

  • Future Dataset Expansion: The dataset will serve as a foundational data source for future dataset expansions, allowing for the inclusion of additional articles and updates to ensure the dataset remains up-to-date and representative of the research landscape in total joint arthroplasty.
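As a small illustration of the kind of text analytics described under "Rapid Text Analytics", the sketch below counts the most frequent terms across a list of abstracts. It assumes the abstracts have already been loaded as plain strings; the column that holds them in the released files is not documented here, and the tiny stop-word list and sample sentences are hypothetical.

```python
import re
from collections import Counter

# A deliberately small, illustrative stop-word list; real analyses would use a fuller one.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "with", "for",
              "was", "were", "after", "is", "on", "at"}


def top_terms(abstracts, n=10):
    """Return the n most frequent non-stop-word tokens across all abstracts."""
    counts = Counter()
    for text in abstracts:
        # Lowercase, then keep alphabetic tokens (hyphenated words stay whole).
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return counts.most_common(n)


# Hypothetical sample abstracts, not taken from the dataset.
sample = [
    "Periprosthetic joint infection after total knee arthroplasty.",
    "Outcomes of total hip arthroplasty in elderly patients.",
    "Infection risk and readmission after total joint arthroplasty.",
]
print(top_terms(sample, 2))
```

Simple frequency counts like this are only a starting point; the same loop can feed tokens into topic models or embedding-based methods instead of a `Counter`.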

The proposed pipeline to build the HexAI-TJAtxt textual dataset:


Using this proposed pipeline, the HexAI-TJAtxt dataset will be updated bi-monthly with new abstracts published on PubMed.
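The update step of the pipeline is not spelled out in this README. One plausible way to fold newly retrieved abstracts into the existing dataset is to deduplicate on the PubMed ID (PMID), letting a newer record replace an older copy of the same article. The sketch below assumes records are dictionaries with a `"pmid"` field; the function `merge_updates` and that field name are hypothetical.

```python
def merge_updates(existing, new_records, key="pmid"):
    """Merge newly retrieved records into an existing list, deduplicating by key.

    Records are dicts; `key` (assumed here to be a PubMed ID field named
    "pmid") identifies an article. When an article appears in both lists,
    the newer record wins, so corrected PubMed metadata replaces the old copy.
    Returns (merged_records, number_of_newly_added_articles).
    """
    # Dicts preserve insertion order, so existing articles keep their position.
    merged = {rec[key]: rec for rec in existing}
    added = 0
    for rec in new_records:
        if rec[key] not in merged:
            added += 1
        merged[rec[key]] = rec  # newer version overwrites the older one
    return list(merged.values()), added


# Hypothetical example: one revised article and one new article.
old = [{"pmid": "1", "title": "A"}, {"pmid": "2", "title": "B"}]
new = [{"pmid": "2", "title": "B (revised)"}, {"pmid": "3", "title": "C"}]
records, n_added = merge_updates(old, new)
print(n_added, [r["pmid"] for r in records])
```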

HexAI-TJAtxt dataset: [Last Update: August 15, 2023]

The HexAI-TJAtxt dataset can be downloaded in XLSX, CSV, and JSON formats:

HexAI-TJAtxt_Aug2023_XLSX.zip [Download]

HexAI-TJAtxt_Auge2023_CSV.zip [Download]

HexAI-TJAtxt_Aug2023_JSON.zip [Download]
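For readers who want to work with the CSV release directly, the sketch below reads the first `.csv` file inside a downloaded ZIP archive using only the Python standard library. It assumes the CSV is UTF-8 encoded with a header row; the column names are taken from that header and are not documented in this README. The function name `load_csv_from_zip` is hypothetical.

```python
import csv
import io
import zipfile


def load_csv_from_zip(zip_source):
    """Read every row of the first .csv member in a ZIP archive as dicts.

    zip_source may be a filesystem path (e.g. a local copy of the CSV
    archive listed above) or a binary file-like object. Assumes UTF-8 text.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Skip macOS resource-fork entries that some archivers add.
        csv_names = [
            name for name in archive.namelist()
            if name.lower().endswith(".csv") and not name.startswith("__MACOSX/")
        ]
        if not csv_names:
            raise ValueError("no .csv file found in archive")
        with archive.open(csv_names[0]) as raw:
            # utf-8-sig also strips a byte-order mark if one is present.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))
```

Each returned row is a dictionary keyed by the CSV header, so the abstract text can be pulled out by column name once the header has been inspected.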

Collaborators:

Acknowledgements

This work was supported in part by Oracle Cloud credits and related resources provided by Oracle for Research. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of Oracle for Research.

Publications:

The HexAI-TJAtxt dataset is fully described in the following paper, published in the journal Data in Brief. Any publication using the dataset should cite the following work:

[1] Amirian S, Ghazaleh H, Carlson LA, Gong M, Finger L, Plate JF, Tafti AP. HexAI-TJAtxt: A textual dataset to advance open scientific research in total joint arthroplasty. Data in Brief. 2023 Oct 31:109738. [Paper]
