A security researcher deployed an HTTP honeypot using the open-source Beelzebub framework, embedding reverse prompt injection payloads in HTML responses to detect autonomous AI agents performing offensive security operations. Within hours, the honeypot captured 58 requests over 19 minutes from a single Tor exit node exhibiting clear LLM-agent behavioral signatures: multi-tool switching between curl, Python, and browser user-agents; semantic extraction of fake credentials from HTML comments; adaptive burst attack patterns with characteristic 'sawtooth' timing; and strategy pivoting mid-session. The paper proposes a three-layer detection framework using semantic canaries, behavioral analysis, and active prompt injection, along with a set of Behavioral Indicators of Compromise (BIoCs) specific to LLM-based agents. The core insight is a paradigm inversion: prompt injection, typically an offensive technique against AI, becomes a defensive detection mechanism when deployed in deception environments.

13m read timeFrom itnext.io
Post cover image
Table of contents
3. Methodology3.1 Honeypot Platform3.2 Trap Design: Two-Layer DeceptionLayer 1: Semantic BaitLayer 2: Prompt Injection3.3 Response Headers as Fingerprinting Aids4. Results4.1 Overview4.2 Attack TimelineGet Mario Candela ’s stories in your inbox5. Behavioral Fingerprinting: AI Agent vs Human vs Traditional Scanner5.1 Comparative Analysis5.2 Proposed Behavioral IoCs for AI Agent Detection6. The Reverse Prompt Injection Detection FrameworkLayer 1: Semantic CanariesLayer 2: Behavioral AnalysisLayer 3: Active Prompt Injection7. Ethical Considerations9. Conclusion

Sort: