Detectors

Prompt Attack Detector

The Prompt Attack Detector is a crucial security component that identifies and prevents malicious prompt injection attempts in AI interactions. It analyzes user inputs to detect common attack patterns such as:

  • Prompt leaking attempts
  • Role-playing exploits
  • System prompt injection
  • Jailbreak attempts
  • Delimiter manipulation

The detector uses advanced heuristics and machine learning models to evaluate input patterns, context, and semantic meaning to maintain the integrity of AI interactions while allowing legitimate user queries to pass through.
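
To make the heuristic side of this concrete, here is a minimal sketch of pattern-based screening. It is not ZenGuard's implementation: the pattern list and the function name `looks_like_prompt_attack` are hypothetical, and a production detector would combine rules like these with the machine learning models mentioned above.

```python
import re

# Hypothetical, deliberately small rule set; real attacks are far more varied.
ATTACK_PATTERNS = [
    # System prompt injection: asking the model to discard its instructions.
    re.compile(r"ignore (?:all |any )?(?:previous|prior|above) instructions", re.IGNORECASE),
    # Prompt leaking: asking the model to disclose its system prompt.
    re.compile(r"(?:reveal|show|print|repeat) (?:me )?(?:your|the) system prompt", re.IGNORECASE),
    # Role-playing / jailbreak: switching the model into an unrestricted persona.
    re.compile(r"you are now (?:in )?(?:DAN|developer mode)", re.IGNORECASE),
]


def looks_like_prompt_attack(text: str) -> bool:
    """Return True if any illustrative attack pattern occurs in the text."""
    return any(pattern.search(text) for pattern in ATTACK_PATTERNS)
```

Fixed patterns like these are easy to evade by rephrasing, which is why the detector described above also evaluates context and semantic meaning.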

PII Detector

ZenGuard Trust Layer helps maintain privacy and trust by automatically detecting and masking personally identifiable information (PII).

PII includes any data that can identify an individual, such as:

  • names
  • U.S. mailing addresses
  • phone numbers
  • email addresses
  • IP addresses
  • credit card numbers
  • International Bank Account Numbers (IBANs)
  • U.S. Social Security Numbers (SSNs)

Organizations handling PII are legally obligated to protect it under regulations like the General Data Protection Regulation (GDPR) in the EU and the Gramm-Leach-Bliley Act (GLBA) and Health Insurance Portability and Accountability Act (HIPAA) in the U.S.

In LLM applications, PII can surface in various ways: users might input their own or others' PII, retrieval-augmented generation (RAG) systems could inadvertently retrieve documents containing PII, or data policies might not account for sharing customer PII with third-party LLM providers. ZenGuard Trust Layer addresses these challenges by not training on any PII and by detecting specific entities to prevent data leakage.

For example, the PII detector identifies full names from diverse cultural backgrounds, accounting for common typos and punctuation errors. Similarly, detectors for phone numbers, email addresses, IP addresses, credit card numbers, IBANs, and SSNs are designed to identify valid formats while being resilient to common errors.
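
As a rough illustration of format-based detection and masking, the sketch below handles three of the entity types listed above. It is an assumption-laden example, not ZenGuard's detection logic: the regular expressions, the placeholder tokens, and the names `luhn_valid` and `mask_pii` are all hypothetical, and it makes no attempt at the typo resilience or name detection described above.

```python
import re

# Illustrative patterns only; real-world formats have many more variants.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def mask_pii(text: str) -> str:
    """Replace detected emails, SSNs, and Luhn-valid card numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = SSN_RE.sub("[SSN]", text)
    # Only mask digit runs that pass the checksum, to reduce false positives.
    return CARD_RE.sub(
        lambda m: "[CREDIT_CARD]" if luhn_valid(m.group()) else m.group(), text
    )
```

Validating the checksum before masking is one way to "identify valid formats": a 16-digit order number that fails the Luhn check is left untouched, while a well-formed card number is masked.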

Intended Use Detector

ZenGuard Trust Layer ensures that your AI agents are only asked relevant questions. For example, if your AI agent handles customer support, it should not talk about politics or give investment advice. The Intended Use Detector detects when the user's question is not relevant to the intended use of the AI agent. As a byproduct, this detector can also filter out harmful or offensive content.

Secrets Detector

ZenGuard Trust Layer detects whether a prompt contains known formats of API keys, passwords, and other secrets.

The main use case is to prevent accidental exposure of secrets in user prompts, since leaked secrets create significant security and liability risks.
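
A minimal sketch of what matching "known formats" can look like is shown below. The three patterns are publicly documented credential shapes chosen for illustration; the dictionary, its labels, and the function name `find_secrets` are hypothetical and say nothing about which formats ZenGuard actually covers.

```python
import re

# Illustrative subset of well-known secret formats (labels are made up here).
SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # GitHub personal access tokens: "ghp_" followed by 36 alphanumerics.
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    # PEM-encoded private key header.
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(prompt: str) -> list[str]:
    """Return the labels of all secret formats found in the prompt."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(prompt)]
```

A caller could block or redact the prompt whenever `find_secrets` returns a non-empty list. Free-form passwords have no fixed shape, so format matching alone would not catch them.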

Keywords Detector

ZenGuard Trust Layer can be configured to detect specific keywords. This is useful when you want to flag or filter out certain types of prompts.

For example, you can detect and get notified if the prompt is talking about your competitors, canceling a subscription, or asking for a refund.
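
The idea can be sketched as a case-insensitive, whole-word match over a configured list. This is only an illustration under assumed names (`build_keyword_matcher` and the sample keywords are hypothetical); it does not show how keywords are actually configured in ZenGuard.

```python
import re
from typing import Callable, Iterable, List


def build_keyword_matcher(keywords: Iterable[str]) -> Callable[[str], List[str]]:
    """Compile the keywords once and return a function that lists the ones found."""
    # Longest first, so a multi-word phrase wins over a shorter keyword it contains.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b",
        re.IGNORECASE,
    )

    def match(prompt: str) -> List[str]:
        # Lowercase and de-duplicate the hits, then sort for a stable result.
        return sorted({m.group().lower() for m in pattern.finditer(prompt)})

    return match


# Usage with made-up keywords.
match = build_keyword_matcher(["refund", "cancel my subscription"])
hits = match("I want a REFUND and to cancel my subscription")
```

The word boundaries keep "refund" from matching inside "refunded"; whether such variants should count is a configuration decision this sketch leaves out.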

Detectors Development

The ZenGuard AI team is continuously improving the detectors in three areas:

  • Accuracy - we keep refining the models and heuristics used in the detectors
  • Latency - we keep adding software and hardware optimizations so that detectors are as fast as possible
  • Customization - we are improving the detectors based on your feedback

If you think that a detector is missing or not working as expected, please let us know by contacting us at support@zenguard.ai.