Testing infrastructure red teaming with abliterated models

A hands-on red-team experiment that deploys OpenClaw on Red Hat OpenShift on IBM Cloud. It uses an abliterated Qwen3.5-35B model (zero refusals) to test infrastructure defenses across three hardening tiers. Each tier faced the same 91 adversarial prompts, delivered through 15 custom garak probes spanning six attack categories. Key findings: SSH sandbox isolation eliminated credential exfiltration (50–67% → 0%), a NetworkPolicy blocked Kubernetes API escalation (40% → 0%), and a prompt injection classifier (protectai/deberta-v3-base-prompt-injection-v2) stopped encoding-bypass and privilege-escalation prompts. However, persistence and memory-poisoning attacks bypassed all three tiers and remain an unsolved problem. The post also covers a subtle NetworkPolicy DNS egress misconfiguration (targeting the DNS Service's ClusterIP rather than selecting the DNS pods by label) and the tradeoffs between seccomp RuntimeDefault and running as non-root when sshd runs in a privileged container.
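The DNS egress pitfall is easier to see with a toy model. This is a minimal sketch, not the article's actual policy. It assumes that NetworkPolicy is evaluated against the packet's destination after the service proxy has rewritten (DNAT) the Service ClusterIP to a backing pod IP, which is common in practice; the Kubernetes documentation leaves this ordering implementation-defined. The IP addresses, the `app: dns` label, and the helpers `dnat`, `allowed`, and `dns_query_allowed` are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Pod:
    ip: str
    labels: Dict[str, str]


@dataclass
class EgressRule:
    # Simplified model: a rule matches either by ipBlock CIDR or by pod labels.
    cidr: Optional[str] = None
    pod_labels: Dict[str, str] = field(default_factory=dict)
    port: int = 53


def dnat(dest_ip: str, services: Dict[str, str]) -> str:
    """Rewrite a Service ClusterIP to a backing pod IP, as a service proxy would."""
    return services.get(dest_ip, dest_ip)


def allowed(rule: EgressRule, dest_ip: str, port: int, pods: List[Pod]) -> bool:
    """Return True if the (post-DNAT) destination matches the egress rule."""
    if port != rule.port:
        return False
    if rule.cidr is not None:
        return ipaddress.ip_address(dest_ip) in ipaddress.ip_network(rule.cidr)
    # Label rule: destination must be a known pod carrying every selector label.
    return any(
        pod.ip == dest_ip and rule.pod_labels.items() <= pod.labels.items()
        for pod in pods
    )


# Hypothetical addresses and labels, for illustration only.
DNS_CLUSTER_IP = "172.30.0.10"
DNS_POD = Pod(ip="10.128.2.15", labels={"app": "dns"})
SERVICES = {DNS_CLUSTER_IP: DNS_POD.ip}

by_cluster_ip = EgressRule(cidr=DNS_CLUSTER_IP + "/32")
by_pod_label = EgressRule(pod_labels={"app": "dns"})


def dns_query_allowed(rule: EgressRule) -> bool:
    # The client sends to the ClusterIP, but the policy sees the pod IP.
    dest = dnat(DNS_CLUSTER_IP, SERVICES)
    return allowed(rule, dest, 53, [DNS_POD])


print(dns_query_allowed(by_cluster_ip))  # False: the ClusterIP rule never matches
print(dns_query_allowed(by_pod_label))   # True: the label rule matches the DNS pod
```

In a real NetworkPolicy manifest, the label-based variant corresponds to an egress `to` entry that pairs a `namespaceSelector` for the DNS namespace with a `podSelector` for the DNS pods, plus `ports` entries for UDP and TCP 53.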

#kubernetes #llm #openshift #ai-security #red-teaming

From developers.redhat.com • 14 min read

Table of contents

Tier 0: The baseline nobody should ship
Tier 1: The sandbox changes everything (almost)
Tier 2: Adding a prompt injection classifier
Seccomp RuntimeDefault vs. non-root: Same protection, different costs
What defense-in-depth actually looks like
Try it yourself
