Halodoc processes over 300k file uploads daily in healthcare workflows and built a multi-layer validation framework to block malicious files. The system uses Apache Tika for content-based MIME detection, Apache PDFBox for PDF structure scanning, Apache Commons CSV for formula injection detection, and Java ZipInputStream for archive inspection. Validation layers include filename checks, content-type detection via magic bytes, resource limit enforcement, and format-specific deep scanning for PDFs, images, CSVs, Office documents, and archives. The framework supports both synchronous and asynchronous upload flows, achieves p95 latency under 13ms via fail-fast ordering and type-aware routing, and includes structured rejection reasons to handle false positives in healthcare edge cases.
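The fail-fast ordering and type-aware routing described above can be sketched roughly as follows. This is a minimal illustration using only the Java standard library, not Halodoc's implementation: the class name `UploadValidator`, the 10 MB limit, the rejection codes, and the hand-rolled magic-byte table are all assumptions made for the example. A production system would delegate detection to Apache Tika and CSV parsing to Apache Commons CSV, as the summary states.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch: cheap checks run first, deep scans run only for the detected type.
final class UploadValidator {
    static final int MAX_BYTES = 10 * 1024 * 1024;            // assumed limit
    static final Pattern NUMBER = Pattern.compile("[+-]?\\d+(\\.\\d+)?");

    /** Returns empty when the file is accepted, otherwise a structured rejection code. */
    static Optional<String> validate(String filename, byte[] content) {
        // Layer 1: filename checks (cheapest, so they run first).
        if (filename == null || filename.isBlank() || filename.contains("..")
                || filename.contains("/") || filename.contains("\\")
                || filename.indexOf('\0') >= 0) {
            return Optional.of("FILENAME_INVALID");
        }
        // Layer 2: resource limits.
        if (content == null || content.length == 0 || content.length > MAX_BYTES) {
            return Optional.of("SIZE_LIMIT");
        }
        // Layer 3: content-based type detection, compared with the declared extension.
        int dot = filename.lastIndexOf('.');
        String ext = dot < 0 ? "" : filename.substring(dot + 1).toLowerCase();
        String magic = detectMagic(content);
        if (ext.equals("csv")) {
            // CSV has no signature; reject if a binary signature is present.
            if (!magic.equals("unknown")) return Optional.of("TYPE_MISMATCH");
            // Layer 4: type-aware routing to the CSV-specific scan.
            return hasFormulaCell(content) ? Optional.of("CSV_FORMULA") : Optional.empty();
        }
        if (!ext.equals(magic)) return Optional.of("TYPE_MISMATCH");
        return Optional.empty(); // PDF, image, and archive deep scans would be routed here.
    }

    /** Tiny stand-in for a real detector: recognizes three signatures only. */
    static String detectMagic(byte[] b) {
        if (startsWith(b, new byte[] {'%', 'P', 'D', 'F', '-'})) return "pdf";
        if (startsWith(b, new byte[] {(byte) 0x89, 'P', 'N', 'G'})) return "png";
        if (startsWith(b, new byte[] {'P', 'K', 3, 4})) return "zip";
        return "unknown";
    }

    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    /** Naive comma split (no quoted-comma handling); flags cells that start a formula. */
    static boolean hasFormulaCell(byte[] content) {
        String text = new String(content, StandardCharsets.UTF_8);
        for (String line : text.split("\\R")) {
            for (String cell : line.split(",")) {
                String c = cell.trim().replaceAll("^\"|\"$", "");
                if (c.isEmpty()) continue;
                boolean trigger = "=+-@".indexOf(c.charAt(0)) >= 0;
                // Plain signed numbers such as -1.5 are allowed, to limit false positives.
                if (trigger && !NUMBER.matcher(c).matches()) return true;
            }
        }
        return false;
    }
}
```

Returning a rejection code rather than a bare boolean mirrors the structured rejection reasons mentioned above: a caller can tell a spoofed extension (`TYPE_MISMATCH`) from a suspicious cell (`CSV_FORMULA`) and handle a false positive accordingly. The signed-number exemption is one example of such an edge case, since lab values like `-1.5` would otherwise trip a naive formula check.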
Table of contents
Introduction
Understanding the Threat Landscape
Our Multi-Layer Validation Framework
Real-World Validation Walkthrough
Add a False Positive Handling Section
Security Framework Integration Architecture
Performance at Scale
Key Security Principles We Follow
Tradeoffs and Design Considerations
Conclusion
Join Us
About Halodoc