SYS: ONLINE | LAT: n/a | BUILD: b7508ca
mission_statement.md
──────────────────────────────────────────────────────────────────────

Watchful protectors
in the age of AI.

We are Shadow-LLM-Guardians — a working group of researchers, red teamers, and engineers cataloguing the failures of frontier AI systems. The archive is the first surface. The team is forming.

cat roadmap.txt

The plan, in three acts

[NOW]

The Archive

Every documented failure case — hallucinations, jailbreaks, prompt injections, agent loops, destructive actions, over-refusals, sycophancy. Reproducibility, threat model, and provenance attached to every entry. Citable by paper, by analyst, by anyone.
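One way to picture an entry is as a small record that cannot exist without those three attachments. The sketch below is illustrative only: the `ArchiveEntry` class, its field names, and the two tier labels are assumptions made for this example, not the archive's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical tier labels; the archive's real tiers may differ.
    REPRODUCED = "reproduced"
    NOT_YET_REPRODUCED = "not yet reproduced"


@dataclass(frozen=True)
class ArchiveEntry:
    title: str
    category: str          # e.g. "prompt injection"
    reproducibility: Tier
    threat_model: str      # who gets hurt, and how
    provenance: str        # where the observation came from

    def __post_init__(self):
        # Refuse entries that leave any required text field blank.
        for name in ("title", "category", "threat_model", "provenance"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")

    def citation(self) -> str:
        # Short, human-readable reference string for papers and reports.
        return f"{self.title} [{self.category}; {self.reproducibility.value}]"
```

Making the record immutable and validating it at construction means an entry with a blank threat model or missing provenance fails loudly instead of slipping into the archive.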

[NEXT]

Reproducers & Defenses

Open toolchains that re-execute submitted cases against current model versions. Regression dashboards. Defense recipes. A growing benchmark suite the next blue-team engineer can pull and run.
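As a rough sketch of what re-executing cases could look like, the snippet below runs each case's prompt through a model and compares two runs to flag regressions. Everything here is hypothetical: `Case`, `rerun`, and `regressions` are names invented for illustration, and the model is just any function from prompt to output, not a specific vendor API.

```python
from typing import Callable, Dict, List, NamedTuple


class Case(NamedTuple):
    case_id: str
    prompt: str
    # Returns True when the model output still shows the failure.
    still_fails: Callable[[str], bool]


def rerun(cases: List[Case], model: Callable[[str], str]) -> Dict[str, bool]:
    """Re-execute every case against `model`; map case id -> failure reproduced."""
    return {c.case_id: c.still_fails(model(c.prompt)) for c in cases}


def regressions(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Case ids that did not reproduce in `before` but do reproduce in `after`."""
    return sorted(
        cid for cid, fails in after.items()
        if fails and before.get(cid) is False
    )
```

A dashboard would then be a view over successive `rerun` results, one per model version, with `regressions` highlighting cases that came back.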

[LATER]

A Standing Red/Blue Team for the AI Age

Shadow-LLM-Guardians began as a domain registered in 2023. It will not stay an archive. The long game: a permanent, independent attack-defense capability for the systems the rest of the world depends on but rarely audits.

grep -E 'collect|reject' policy.md

Scope

// WHAT WE COLLECT
  • hallucinations (factual / citation / code)
  • jailbreaks (safety bypass)
  • prompt injection (direct / indirect)
  • agent loops (infinite or repetitive tool calls)
  • tool misuse (wrong args, destructive shell verbs)
  • over-refusals (false-positive safety filters)
  • sycophancy and validation creep
  • alignment failures (deceptive / power-seeking / manipulative)
  • destructive actions (rm, drop table, force-push, send-email)
  • multimodal failures (vision, audio mishandling)
  • the long tail of weird behavior that has no name yet
// WHAT WE DON'T
  • attack tutorials with no defensive value
  • zero-day exploits before responsible disclosure
  • content that targets named individuals
  • anything that would harm vulnerable people if amplified
  • hot takes without a reproducible artifact
  • model-bashing without a threat model

cat principles.md

Operating principles

01
Reproducible or labelled as not yet reproduced.

Every case states its reproducibility tier. We don't pretend a one-off observation is a benchmark, but we don't discard it either.

02
Threat model attached.

A failure with no realistic threat model is a curiosity. A failure with a clear who-gets-hurt-and-how is a research artifact. We push every entry toward the second.

03
Responsible disclosure first, public archive second.

If a case is a serious-harm zero-day, it goes to the vendor's disclosure channel first. The sanitized version is published here after the vendor's window closes. See DISCLOSURE.md.
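The ordering rule can be expressed as a simple date check. This is a minimal sketch under stated assumptions: the function names are hypothetical, the window length is a caller-supplied parameter because the real value is defined in DISCLOSURE.md and not here, and the window is treated as closed on the day it reaches `window_days` after the vendor report.

```python
from datetime import date, timedelta
from typing import Optional


def publishable_on(reported_to_vendor: date, window_days: int) -> date:
    """First day the sanitized case may appear in the public archive."""
    return reported_to_vendor + timedelta(days=window_days)


def may_publish(
    today: date,
    reported_to_vendor: Optional[date],
    window_days: int,
    serious_harm: bool,
) -> bool:
    # Cases that are not serious-harm zero-days skip the disclosure gate.
    if not serious_harm:
        return True
    # A serious case never reported to the vendor cannot be published yet.
    if reported_to_vendor is None:
        return False
    return today >= publishable_on(reported_to_vendor, window_days)
```

The check deliberately defaults to "do not publish" whenever a serious case has no recorded vendor report, so a missing date cannot be mistaken for an expired window.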

04
GitHub is the substrate.

Issues, comments, reactions. No private database. Every contributor's identity is a public GitHub profile. No anonymous spam, no shadow moderation.

join_us

The archive runs on contributors. File a case. Reproduce someone else's. Pull a defense recipe and harden a deployment. The next decade of AI safety doesn't get written in any single lab — it gets written across the long tail of people who looked carefully at what broke and wrote it down.