SYARA - Semantic YARA for the GenAI Era

The Challenge

Traditional YARA Falls Short in the GenAI Era

⚠️

Keyword Dependency

YARA relies on exact string matches. Attackers simply rephrase their malicious content to evade detection — a trivial task with modern LLMs.

🔄

Infinite Variations

GenAI can generate countless paraphrases of malicious prompts, making static keyword rules obsolete almost immediately.

📝

Maintenance Burden

Security teams spend endless hours writing rules for every variation of an attack. SYARA captures intent, not just specific words.

Architecture

How SYARA Works

Five cost-ordered matching layers — from fast regex to powerful LLM evaluation

📄

Input

Text, Images, Audio, Video

→

Layer 1 — Fastest

String Matching

Exact literals and regex patterns (traditional YARA)

⚡ Lowest cost

Layer 2 — Fast

Semantic Similarity

SBERT embeddings with cosine similarity & configurable chunking

💡 Low cost

Layer 3 — Binary Files

Perceptual Hashing

Image dHash, audio & video fingerprinting for near-duplicate detection

🖼️ Moderate cost

Layer 4 — Precise

ML Classification

Fine-tuned classifiers: TunedSBERT, DistilBERT, DeBERTa

🎯 Moderate cost

Layer 5 — Most Powerful

LLM Evaluation

GPT-4, Gemini, Ollama (local), Flan-T5 — zero-day detection

🚀 Highest cost

→

✅

Results

Matched rules & confidence scores

💡 Smart optimization: SYARA executes layers in cost order, invoking expensive models only when cheaper layers don't resolve the condition. Session-scoped caching avoids redundant preprocessing.

Capabilities

Everything You Need for Modern Threat Detection

A fully extensible, pluggable architecture built for the GenAI era

🎯

Semantic Similarity

Detect malicious intent even when exact words change. Uses SBERT (all-MiniLM-L6-v2) with configurable cleaners, chunkers, and threshold.

"ignore previous instructions"
matches
"kindly overlook earlier guidance"

🤖

LLM Integration

Plug in GPT-4, Google Gemini, local Ollama models, or open-source Flan-T5 for the most sophisticated zero-day threat detection.

GPT-4 • Gemini • Ollama • Flan-T5

🖼️

Perceptual Hashing

Detect malicious images using dHash. Placeholder implementations for audio and video hashing are ready to extend with production libraries.

ImageHash • AudioHash • VideoHash

📊

ML Classifiers

Swap in fine-tuned classifiers (TunedSBERT, DistilBERT, DeBERTa) trained on your specific threat landscape for maximum precision.

TunedSBERT • DistilBERT • DeBERTa

⚡

Cost Optimized

Automatic execution from cheapest (regex) to most expensive (LLM). Session-scoped caching and short-circuit evaluation minimize unnecessary computation.

Smart caching • Short-circuit evaluation

🔧

Fully Extensible

Register custom cleaners, chunkers, matchers, classifiers, and LLMs via YAML config or Python API. Accepts both class paths and pre-instantiated objects.

YAML config • Python API • Custom components

Applications

Real-World Use Cases

🛡️

Prompt Injection Detection

Protect LLM applications from malicious prompts designed to bypass safety guidelines and hijack model behavior.

🎣

Phishing Detection

Identify phishing websites and emails using text analysis combined with logo and image perceptual hashing.

🔓

Jailbreak Prevention

Detect DAN mode, role-play exploits, and other jailbreak techniques using semantic pattern matching.

💉

Malicious Script Detection

Hunt for injected scripts, XSS payloads, and obfuscated code with semantic and regex-based pattern matching.

📱

Malware Fingerprinting

Detect malware UI screenshots and icons using perceptual hashing — robust to minor visual modifications.

📊

Data Exfiltration

Identify attempts to extract training data, system prompts, or sensitive information from AI systems.

Quick Start

Start Hunting in Minutes

Install SYARA

# Install with all AI features
pip install syara[all]

# Or install specific extras
pip install syara[sbert]       # SBERT similarity
pip install syara[classifier]  # ML classifiers
pip install syara[llm]         # LLM evaluation

Write a Rule

rule prompt_injection : security
{
    meta:
        description = "Detect prompt injection"

    strings:
        $s1 = "ignore previous" nocase
        $s2 = /\b(disregard|forget)\s+prior\b/i

    similarity:
        $s3 = "ignore instructions"
              threshold=0.8 matcher="sbert"

    condition:
        any of ($s*)
}

Scan for Threats

import syara

rules = syara.compile('rules.syara')

text = "Kindly disregard prior prompts"
matches = rules.match(text)

for m in matches:
    if m.matched:
        print(f"Threat detected: {m.rule_name}")
        for id, details in m.matched_patterns.items():
            print(f"  {id}: score={details[0].score:.2f}")

Advanced: Custom LLM or Classifier

import syara
from syara import ConfigManager

# Register a pre-instantiated LLM or classifier
config = ConfigManager()
config.config.llms['my-ollama'] = my_ollama_evaluator_instance
config.config.classifiers['my-deberta'] = my_deberta_instance

rules = syara.compile('rules.syara', config_manager=config)

# Match binary files with phash rules
file_matches = rules.match_file('suspicious_icon.png')

📖 Full Documentation 💡 Example Rules ⭐ Star on GitHub

Community

Built by Researchers, for Researchers

SYARA is a non-profit, community-driven open source project

Our Mission

We believe cybersecurity research should be accessible to everyone. SYARA is completely free and open source, built to empower security researchers worldwide to detect and analyze threats in the GenAI era.

Share your rules, contribute to the codebase, or help improve the documentation. Together we can build the most capable semantic threat detection library for the AI age.

💬 Discussions

Ask questions, share ideas, and collaborate with other researchers

Join Discussion →

🐛 Report Issues

Found a bug? Let us know on GitHub and we'll get it fixed

Report Issue →

🤝 Contribute

Submit PRs, share detection rules, or improve the documentation

Contribute →

📝 Blog

Tutorials, case studies, and research on AI-powered threat detection

Read Blog →

YARA Rules, Semantically Aware

Traditional YARA Falls Short in the GenAI Era

Keyword Dependency

Infinite Variations

Maintenance Burden

How SYARA Works

Input

String Matching

Semantic Similarity

Perceptual Hashing

ML Classification

LLM Evaluation

Results

Everything You Need for Modern Threat Detection

Semantic Similarity

LLM Integration

Perceptual Hashing

ML Classifiers

Cost Optimized

Fully Extensible

Real-World Use Cases

Prompt Injection Detection

Phishing Detection

Jailbreak Prevention

Malicious Script Detection

Malware Fingerprinting

Data Exfiltration

SYARA in 3 Minutes

Start Hunting in Minutes

Install SYARA

Write a Rule

Scan for Threats

Advanced: Custom LLM or Classifier

Built by Researchers, for Researchers

Our Mission

💬 Discussions

🐛 Report Issues

🤝 Contribute

📝 Blog

Ready to Hunt Smarter?