Building a best-in-class automated de-identification tool for electronic health records through ensemble learning
Summary: The presence of personally identifiable information (PII) in natural language portions of electronic health records (EHRs) constrains their broad reuse. Despite continuous improvements in automated detection of PII, residual identifiers require manual validation and correction. Here, we describe an automated de-identification system that emp