Personally Identifiable Information (PII) extraction
Overview
Medical documents inherently contain Personally Identifiable Information (PII), which is crucial for associating records with individual profiles and building comprehensive longitudinal data. At the same time, securely redacting PII is vital for workflows such as data anonymization for AI model training and secure document sharing. EkaCare’s advanced PII extraction API empowers businesses to seamlessly identify and manage PII, enabling secure, efficient, and compliant handling of sensitive medical data.
EkaCare’s PII extraction solution uses its customised vision-LLM to accurately extract structured data such as name, age, gender and medical facility name. Designed specifically for the Indian healthcare ecosystem, our solution offers a high level of accuracy and does not require a human in the loop.
This service offers:
- Extraction of:
  - Patient Name
  - Patient Age
  - Patient Gender
  - Doctor Name
  - Mobile Numbers
  - Facility Names
  - Dates
- Ability to work with PDFs as well as scanned/clicked images of prescriptions
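As a rough illustration, the fields listed above could be held in a small record type on the client side. This is a sketch only: the class name `PIIRecord`, its attribute names and the sample values are hypothetical, and they are not the API's actual response schema.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class PIIRecord:
    """Hypothetical client-side container for extracted PII fields.

    Attribute names are assumptions made for illustration; they are
    not taken from the actual API response.
    """
    patient_name: Optional[str] = None
    patient_age: Optional[str] = None      # kept as text, e.g. "42 Y"
    patient_gender: Optional[str] = None
    doctor_name: Optional[str] = None
    facility_name: Optional[str] = None
    mobile_numbers: List[str] = field(default_factory=list)
    dates: List[str] = field(default_factory=list)

    def present_fields(self) -> List[str]:
        """Return the names of fields that actually hold a value."""
        return [name for name, value in asdict(self).items() if value]


# Made-up sample values, used only to exercise the class.
record = PIIRecord(patient_name="A. Kumar", patient_gender="M",
                   dates=["2024-01-15"])
print(record.present_fields())
```

Keeping every field optional reflects that a given document may not contain all of the listed items.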
Use Cases
- Improved medical profiling: Seamlessly associate medical documents with individual patient profiles along with other critical information such as age, gender and dates.
- PII Redaction: Ensure data anonymization for secure document sharing and AI training workflows.
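The redaction use case can be sketched as a simple masking step applied once the PII values are known. This is a minimal sketch under stated assumptions: it works on plain text that has already been extracted from a document, the function name `redact` and the label names are hypothetical, and it is not how the service itself performs redaction.

```python
import re
from typing import Dict


def redact(text: str, pii: Dict[str, str]) -> str:
    """Replace each known PII value in `text` with a [LABEL] tag.

    `pii` maps a label (e.g. "NAME") to the exact string to mask.
    Longer values are replaced first so that a short value that is a
    substring of a longer one does not break the longer match.
    """
    for label, value in sorted(pii.items(), key=lambda kv: -len(kv[1])):
        if not value:
            continue  # nothing to mask for this label
        pattern = re.compile(re.escape(value), flags=re.IGNORECASE)
        text = pattern.sub(lambda _m, tag=label: f"[{tag}]", text)
    return text


# Made-up sample text and values, used only for illustration.
sample = "Patient: Asha Rao, Mobile: 9876543210, seen by Dr. Mehta."
masked = redact(sample, {"NAME": "Asha Rao",
                         "MOBILE": "9876543210",
                         "DOCTOR": "Dr. Mehta"})
print(masked)
```

Exact-string masking only catches values as they were extracted; spelling variants or abbreviations of the same name would need additional handling.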
Technology Deep-Dive
Our PII extraction service is powered by our custom Large Language Models (LLMs), trained on millions of medical documents. These documents span diverse formats and contexts, with a particular focus on the Indian healthcare ecosystem.
Our rigorous training and fine-tuning process is designed for high accuracy while minimizing common pitfalls such as hallucinations, which often affect other state-of-the-art (SOTA) LLMs. The result is a highly reliable system, as demonstrated by the benchmarks in the next section.
Evaluation and Benchmarks
Our benchmark experiments, run on an evaluation dataset comprising thousands of documents, show that Eka’s model is more accurate than other SOTA models. Note that this evaluation dataset contains both PDFs and clicked images.
Task | Parrotlet-V (Eka Care’s LLM) | OpenAI GPT-4o | Claude Sonnet 3.5 | Qwen2-VL (7B) | Llama-3.2-Vision (11B) | Phi-3.5-vision (4.2B) |
---|---|---|---|---|---|---|
PII extraction | 0.915 | 0.884 | 0.824 | 0.719 | 0.541 | 0.585 |
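A field-level breakdown of these results is summarised below. Parrotlet-V scores highest on five of the seven fields, while GPT-4o scores highest on gender and document_date.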
Field Name | Parrotlet-V | GPT-4o | Claude Sonnet 3.5 |
---|---|---|---|
name | 0.954 | 0.929 | 0.93 |
age | 0.955 | 0.894 | 0.619 |
gender | 0.973 | 0.978 | 0.95 |
dob | 0.974 | 0.973 | 0.956 |
facility | 0.899 | 0.796 | 0.729 |
document_date | 0.947 | 0.955 | 0.951 |
doctor | 0.704 | 0.665 | 0.634 |
Average | 0.915 | 0.884 | 0.824 |
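The Average row is consistent with an unweighted mean of the seven per-field scores. The short check below recomputes it from the table values; that the published averages were computed exactly this way is an assumption.

```python
from statistics import mean

# Per-field scores copied from the table above, in row order:
# name, age, gender, dob, facility, document_date, doctor.
scores = {
    "Parrotlet-V": [0.954, 0.955, 0.973, 0.974, 0.899, 0.947, 0.704],
    "GPT-4o": [0.929, 0.894, 0.978, 0.973, 0.796, 0.955, 0.665],
    "Claude Sonnet 3.5": [0.93, 0.619, 0.95, 0.956, 0.729, 0.951, 0.634],
}

# Unweighted (macro) average across fields, rounded to three decimals.
averages = {model: round(mean(vals), 3) for model, vals in scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg}")
```

An unweighted mean treats every field equally, so a weak field such as doctor pulls the overall score down as much as a strong field lifts it.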
Try Out
Experience the power of EkaCare’s PII extraction with our developer-friendly API.
- Visit our API Documentation to get started.
- Upload a lab report or prescription and see our technology in action.
- Contact us for a custom demo tailored to your use case.
Ready to unlock the full potential of healthcare data? Get in Touch today.