Frequently Asked Questions
1. Does the OpenDeID corpus contain real patient identifiers?
No. The OpenDeID corpus contains no real patient identifiers. All identifiable information is replaced with surrogate data to protect patient confidentiality. More details on surrogate generation are available in the publications listed under question 4.
2. Can I use existing patient consent for data access?
No. The original patient consent does not cover distribution of the OpenDeID corpus. Researchers must submit a detailed scientific research protocol for review by a data access committee, and the committee must approve it before they can access the corpus. Patient consent forms and details on the consenting process can be found here.
3. Can I share the OpenDeID corpus with collaborators?
Sharing the OpenDeID corpus is strictly prohibited. Access is limited to researchers who have received approval from the data access committee.
4. How are the surrogates generated?
We encourage you to explore the following publications for a comprehensive understanding of surrogate generation:
Alla, N. L. V., Chen, A., Batongbacal, S., Nekkantti, C., Dai, H., & Jonnagaddala, J. (2021). Cohort selection for construction of a clinical natural language processing corpus. Computer Methods and Programs in Biomedicine Update, 1, 100024. https://doi.org/10.1016/j.cmpbup.2021.100024
Jonnagaddala, J., Chen, A., Batongbacal, S., & Nekkantti, C. (2021). The OpenDeID corpus for patient de-identification. Scientific Reports, 11(1). https://doi.org/10.1038/s41598-021-99554-9
Chen, A., Jonnagaddala, J., Nekkantti, C., & Liaw, S. (2019). Generation of Surrogates for De-Identification of Electronic Health Records. Studies in Health Technology and Informatics, 264, 70–73. https://doi.org/10.3233/shti190185
Liu, J., Gupta, S., Chen, A., Wang, C. K., Mishra, P., Dai, H. J., Wong, Z. S., & Jonnagaddala, J. (2023). OpenDeID Pipeline for Unstructured Electronic Health Record Text Notes Based on Rules and Transformers: Deidentification Algorithm Development and Validation Study. Journal of Medical Internet Research, 25, e48145. https://doi.org/10.2196/48145
5. How does the OpenDeID corpus ensure patient privacy?
The OpenDeID corpus protects patient privacy through manual annotation of identifiers. Every annotated identifier is replaced with surrogate information, so no identifiable patient data remains in the corpus.
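As a rough illustration of what replacing annotated identifiers with surrogates means, the following Python sketch substitutes each annotated span in a note with a placeholder value. It is not the OpenDeID method, which is described in the publications listed under question 4. The function name, the category labels, the surrogate values, and the example note are all invented for this illustration.

```python
from typing import List, Tuple

# Hypothetical surrogate values, one per identifier category.
# These are invented for illustration and do not come from the corpus.
SURROGATES = {"NAME": "Jane Citizen", "DATE": "01/01/2000", "ID": "0000000"}


def replace_with_surrogates(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace each annotated (start, end, category) span with a surrogate."""
    # Process spans from right to left so that earlier character offsets
    # remain valid after each replacement changes the text length.
    for start, end, category in sorted(spans, reverse=True):
        text = text[:start] + SURROGATES[category] + text[end:]
    return text


# Invented example note with two manually annotated identifier spans.
note = "Patient John Smith seen on 12/03/2015."
spans = [(8, 18, "NAME"), (27, 37, "DATE")]
deidentified = replace_with_surrogates(note, spans)
print(deidentified)
```

A real pipeline would typically choose surrogates that are realistic and consistent across a patient's records. This sketch uses one fixed value per category only to keep the example short.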
6. Does the OpenDeID corpus focus on a specific type of pathology report?
The OpenDeID corpus primarily consists of pathology reports related to cancer. This focus creates a valuable dataset for applications in oncology research and related medical fields.
7. Is ethics approval required to access the OpenDeID corpus? If so, what is the turnaround time?
Yes, obtaining ethics approval is mandatory before accessing the corpus. The approval process typically takes 2-3 weeks, although it may be completed in a shorter timeframe depending on the specific project details. Refer to the SREDH Consortium Governance for a more comprehensive explanation.