🔍 AI-Powered Threat Detection — v0.2.3

YARA Rules, Semantically Aware

SYARA extends traditional YARA with semantic similarity, ML classifiers, LLM evaluation, and perceptual hashing — detecting threats that change their words but never their intent.

5
Matching Layers
4+
LLM Integrations
MIT
Open Source
The Challenge

Traditional YARA Falls Short in the GenAI Era

⚠️

Keyword Dependency

YARA relies on exact string matches. Attackers simply rephrase their malicious content to evade detection — a trivial task with modern LLMs.

🔄

Infinite Variations

GenAI can generate countless paraphrases of malicious prompts, making static keyword rules obsolete almost immediately.

📝

Maintenance Burden

Security teams spend endless hours writing rules for every variation of an attack. SYARA captures intent, not just specific words.

Architecture

How SYARA Works

Five cost-ordered matching layers — from fast regex to powerful LLM evaluation

📄

Input

Text, Images, Audio, Video

Layer 1 — Fastest

String Matching

Exact literals and regex patterns (traditional YARA)

⚡ Lowest cost
Layer 2 — Fast

Semantic Similarity

SBERT embeddings with cosine similarity & configurable chunking

💡 Low cost
Layer 3 — Binary Files

Perceptual Hashing

Image dHash, audio & video fingerprinting for near-duplicate detection

🖼️ Moderate cost
Layer 4 — Precise

ML Classification

Fine-tuned classifiers: TunedSBERT, DistilBERT, DeBERTa

🎯 Moderate cost
Layer 5 — Most Powerful

LLM Evaluation

GPT-4, Gemini, Ollama (local), Flan-T5 — zero-day detection

🚀 Highest cost

Results

Matched rules & confidence scores

💡 Smart optimization: SYARA executes layers in cost order, invoking expensive models only when cheaper layers don't resolve the condition. Session-scoped caching avoids redundant preprocessing.

Capabilities

Everything You Need for Modern Threat Detection

A fully extensible, pluggable architecture built for the GenAI era

🎯

Semantic Similarity

Detect malicious intent even when exact words change. Uses SBERT (all-MiniLM-L6-v2) with configurable cleaners, chunkers, and threshold.

"ignore previous instructions"
matches
"kindly overlook earlier guidance"
🤖

LLM Integration

Plug in GPT-4, Google Gemini, local Ollama models, or open-source Flan-T5 for the most sophisticated zero-day threat detection.

GPT-4 • Gemini • Ollama • Flan-T5
🖼️

Perceptual Hashing

Detect malicious images using dHash. Placeholder implementations for audio and video hashing are ready to extend with production libraries.

ImageHash • AudioHash • VideoHash
📊

ML Classifiers

Swap in fine-tuned classifiers (TunedSBERT, DistilBERT, DeBERTa) trained on your specific threat landscape for maximum precision.

TunedSBERT • DistilBERT • DeBERTa

Cost Optimized

Automatic execution from cheapest (regex) to most expensive (LLM). Session-scoped caching and short-circuit evaluation minimize unnecessary computation.

Smart caching • Short-circuit evaluation
🔧

Fully Extensible

Register custom cleaners, chunkers, matchers, classifiers, and LLMs via YAML config or Python API. Accepts both class paths and pre-instantiated objects.

YAML config • Python API • Custom components
Applications

Real-World Use Cases

🛡️

Prompt Injection Detection

Protect LLM applications from malicious prompts designed to bypass safety guidelines and hijack model behavior.

🎣

Phishing Detection

Identify phishing websites and emails using text analysis combined with logo and image perceptual hashing.

🔓

Jailbreak Prevention

Detect DAN mode, role-play exploits, and other jailbreak techniques using semantic pattern matching.

💉

Malicious Script Detection

Hunt for injected scripts, XSS payloads, and obfuscated code with semantic and regex-based pattern matching.

📱

Malware Fingerprinting

Detect malware UI screenshots and icons using perceptual hashing — robust to minor visual modifications.

📊

Data Exfiltration

Identify attempts to extract training data, system prompts, or sensitive information from AI systems.

See It In Action

SYARA in 3 Minutes

Watch how easy it is to detect sophisticated threats

Tutorial Video — Coming Soon

Learn how to write your first SYARA rule and detect prompt injection attacks

Quick Start

Start Hunting in Minutes

1

Install SYARA

# Install with all AI features
pip install syara[all]

# Or install specific extras
pip install syara[sbert]       # SBERT similarity
pip install syara[classifier]  # ML classifiers
pip install syara[llm]         # LLM evaluation
2

Write a Rule

rule prompt_injection : security
{
    meta:
        description = "Detect prompt injection"

    strings:
        $s1 = "ignore previous" nocase
        $s2 = /\b(disregard|forget)\s+prior\b/i

    similarity:
        $s3 = "ignore instructions"
              threshold=0.8 matcher="sbert"

    condition:
        any of ($s*)
}
3

Scan for Threats

import syara

rules = syara.compile('rules.syara')

text = "Kindly disregard prior prompts"
matches = rules.match(text)

for m in matches:
    if m.matched:
        print(f"Threat detected: {m.rule_name}")
        for id, details in m.matched_patterns.items():
            print(f"  {id}: score={details[0].score:.2f}")
+

Advanced: Custom LLM or Classifier

import syara
from syara import ConfigManager

# Register a pre-instantiated LLM or classifier
config = ConfigManager()
config.config.llms['my-ollama'] = my_ollama_evaluator_instance
config.config.classifiers['my-deberta'] = my_deberta_instance

rules = syara.compile('rules.syara', config_manager=config)

# Match binary files with phash rules
file_matches = rules.match_file('suspicious_icon.png')
Community

Built by Researchers, for Researchers

SYARA is a non-profit, community-driven open source project

Our Mission

We believe cybersecurity research should be accessible to everyone. SYARA is completely free and open source, built to empower security researchers worldwide to detect and analyze threats in the GenAI era.

Share your rules, contribute to the codebase, or help improve the documentation. Together we can build the most capable semantic threat detection library for the AI age.

💬 Discussions

Ask questions, share ideas, and collaborate with other researchers

Join Discussion →

🐛 Report Issues

Found a bug? Let us know on GitHub and we'll get it fixed

Report Issue →

🤝 Contribute

Submit PRs, share detection rules, or improve the documentation

Contribute →

📝 Blog

Tutorials, case studies, and research on AI-powered threat detection

Read Blog →

Ready to Hunt Smarter?

Join security researchers using SYARA to detect next-generation threats