The AI Evaluation Crisis
AI safety and evaluation are hitting a mathematical and structural wall: the industry is realizing that current black-box testing and alignment methods are fundamentally insufficient for frontier models.
Trajectory
A growing body of research is showing that the current paradigm of AI safety, which relies on black-box evaluation and fail-open alignment, is mathematically flawed and easily bypassed. The shift toward formally stated limits, fail-closed architectures, and open-ended structural evaluations marks a reset in how the industry approaches model safety.
Timeline (9 events)
GLM 5 seems to have a "Claude" personality
The boundaries of model identity and safety evaluation are blurring. Frontier models are inferring highly accurate personal data from simple prompts, while open-weight models (like GLM 5) are absorbing the aligned personalities of proprietary models via synthetic training data, making true evaluation increasingly difficult.
What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data
Our First Proof submissions
As pure parameter scaling hits fundamental limits, the frontier is shifting toward formal verification (Lean 4) and rigorous mathematical proofs (OpenAI's First Proof submissions) to evaluate and advance reasoning capabilities beyond black-box prompting.
The Fundamental Limits of LLMs at Scale
Lean 4: How the theorem prover works and why it's the new competitive edge in AI
Towards Anytime-Valid Statistical Watermarking
A growing body of research is showing that the current paradigm of AI safety, which relies on black-box evaluation and fail-open alignment, is mathematically flawed and easily bypassed. The shift toward formally stated limits, fail-closed architectures, and open-ended structural evaluations marks a reset in how the industry approaches model safety.
AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games
Fail-Closed Alignment for Large Language Models
Fundamental Limits of Black-Box Safety Evaluation: Information-Theoretic and Computational Barriers from Latent Context Conditioning