Sanitizing information to comply with data privacy standards
The goal of the anonymization routines is to anonymize and de-identify the protected health information (PHI) in clinical trial datasets according to the rules defined by HIPAA and by the company's legal compliance department.
Use Machine Learning to Automate Anonymization
The system identifies previously seen data and automatically reapplies the most recently applied rules, bypassing manual data classification and user review.
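A minimal sketch of rule reuse follows. It assumes rules are remembered per column name and that each rule is one of a few simple actions. `RuleMemory`, `ACTIONS`, and the action names are hypothetical; the document does not describe how rules are stored or keyed. Columns with no remembered rule are passed through unchanged and flagged for the normal classification and review path.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical anonymization actions; real rules would come from the
# HIPAA / legal-compliance rule set described above.
ACTIONS = {
    "mask": lambda v: "*" * len(v),
    "redact": lambda v: "[REDACTED]",
    "keep": lambda v: v,
}


@dataclass
class RuleMemory:
    """Hypothetical store of the last action applied to each column name."""

    last_rule: dict[str, str] = field(default_factory=dict)

    def record(self, column: str, action: str) -> None:
        # Overwrite any earlier rule so the most recent decision wins.
        self.last_rule[column.lower()] = action

    def apply(self, row: dict[str, str]) -> tuple[dict[str, str], list[str]]:
        """Apply remembered rules; return the new row and columns still needing review."""
        out: dict[str, str] = {}
        pending: list[str] = []
        for col, value in row.items():
            action = self.last_rule.get(col.lower())
            if action is None:
                pending.append(col)  # no prior rule: send to classification/review
                out[col] = value
            else:
                out[col] = ACTIONS[action](value)
        return out, pending


memory = RuleMemory()
memory.record("PatientName", "redact")
memory.record("SiteID", "keep")
row, pending = memory.apply({"PatientName": "Jane Doe", "SiteID": "S01", "Phone": "555-0100"})
print(row)      # {'PatientName': '[REDACTED]', 'SiteID': 'S01', 'Phone': '555-0100'}
print(pending)  # ['Phone']
```

Keying rules on lowercased column names is the simplest choice; matching on column content instead would let rules carry over even when names differ.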
We use NLP part-of-speech (POS) recognition and named entity extraction to annotate the unstructured data.
- NLP POS Recognition
- Named Entity Extraction
- Master Data Elements
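The annotate-then-replace flow can be sketched as below. This is a stand-in, not the document's pipeline: instead of a trained POS tagger and named entity recognizer, it uses regular expressions plus a hypothetical master-data list of known names. `MASTER_DATA`, `PATTERNS`, `annotate`, and `anonymize` are all illustrative; in practice a trained NER model would supply the entity spans.

```python
from __future__ import annotations

import re

# Hypothetical master data elements: known names always treated as PHI.
MASTER_DATA = {"PERSON": ["Jane Doe", "John Smith"]}

# Simple patterns standing in for a trained named entity recognizer.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def annotate(text: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, label) spans for every entity found in text."""
    spans = []
    for label, names in MASTER_DATA.items():
        for name in names:
            for m in re.finditer(re.escape(name), text):
                spans.append((m.start(), m.end(), label))
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


def anonymize(text: str) -> str:
    """Replace each annotated span with its entity label, e.g. [PERSON]."""
    out, cursor = [], 0
    for start, end, label in annotate(text):
        if start < cursor:  # skip spans overlapping one already replaced
            continue
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


print(anonymize("Jane Doe was seen on 2021-03-04; call 555-123-4567."))
# [PERSON] was seen on [DATE]; call [PHONE].
```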
Machine Learning Training for Document/Sentence Classification
How is Anonymization useful?
In this process, column values are compared across different tables and a hash code is generated for each column. Regardless of how a column is labeled in each table, if two columns share the same data, an algorithm generates a score from 0 to 1 based on how much of the data matches. The columns are then mapped to each other and their data is merged.
For example, different tables may label a column “col”, “column”, and “col1” even though the columns hold the same data. The data in each column is checked, a hash is generated for the column, a score between 0 and 1 is computed, and the matching columns are then mapped and merged.
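The column-matching step can be sketched as follows, under two assumptions. First, each distinct value is hashed (rather than producing one hash per column) so that partial overlap yields a fractional score. Second, the score is the Jaccard similarity of the hashed value sets; the document only says the score comes from "an algorithm". `find_matches` and its 0.8 threshold are hypothetical, and the merge step itself is left out.

```python
from __future__ import annotations

import hashlib
from typing import Iterable


def fingerprint(values: Iterable[object]) -> set[str]:
    """Hash each distinct value so columns can be compared without exposing raw data."""
    return {hashlib.sha256(str(v).encode("utf-8")).hexdigest() for v in values}


def match_score(a: Iterable[object], b: Iterable[object]) -> float:
    """Jaccard similarity of hashed value sets: 0 = no overlap, 1 = identical."""
    ha, hb = fingerprint(a), fingerprint(b)
    union = ha | hb
    return len(ha & hb) / len(union) if union else 0.0


def find_matches(tables: dict[str, dict[str, list]], threshold: float = 0.8):
    """Compare every column pair across different tables, ignoring column names."""
    cols = [(t, c, vals) for t, table in tables.items() for c, vals in table.items()]
    matches = []
    for i, (t1, c1, v1) in enumerate(cols):
        for t2, c2, v2 in cols[i + 1:]:
            if t1 == t2:
                continue  # only compare columns from different tables
            score = match_score(v1, v2)
            if score >= threshold:
                matches.append((f"{t1}.{c1}", f"{t2}.{c2}", round(score, 2)))
    return matches


tables = {
    "visits": {"col": ["P001", "P002", "P003"], "site": ["S1", "S1", "S2"]},
    "labs": {"column": ["P001", "P002", "P003"], "value": ["4.1", "5.0", "3.8"]},
    "consent": {"col1": ["P001", "P002", "P004"]},
}
print(find_matches(tables))
# [('visits.col', 'labs.column', 1.0)]
print(find_matches(tables, threshold=0.5))
# [('visits.col', 'labs.column', 1.0), ('visits.col', 'consent.col1', 0.5), ('labs.column', 'consent.col1', 0.5)]
```

Plain SHA-256 of low-entropy identifiers such as subject IDs can be reversed by brute force, so a keyed hash (for example `hmac.new` with a secret key) would be the safer choice when the hashes themselves leave the secure environment.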