Models & Frontiers

What the new models can actually do, how they were trained, and whether the benchmarks mean anything. Open source vs closed, and where the research is heading.

The GPU Bottleneck Isn't Compute Anymore
signals

NVIDIA's Blackwell GPUs doubled tensor core throughput but left shared memory and exponential units unchanged. FlashAttention-4 rearchitects attention kernels from scratch to work around this asymmetry, achieving 1,613 TFLOPs/s and up to 1.3x speedup over cuDNN on B200.

3 min read
MoE Training Just Got 4x Faster
signals

Grouter extracts routing structures from pre-trained MoE models and reuses them as fixed routers for new models. The result: 4.28x improvement in data utilization and up to 33.5% throughput acceleration.

3 min read
LLM-Powered Swarms and the 300x Overhead Nobody Wants to Talk About
signals

SwarmBench tested 13 LLMs on swarm coordination tasks. The results show catastrophic token overhead, roughly 300x the cost of classical coordination, and agent-to-agent communication that doesn't actually improve outcomes.

5 min read
Attention Heads Are the New Inference Budget
signals

Models that can technically process 128K tokens routinely fail on tasks requiring reasoning across 32K. That gap isn't a context window problem. It's an...

7 min read
MoE's Dirty Secret Is Load Balancing
signals

Every frontier lab now ships a sparse Mixture-of-Experts model. Google's Switch Transformer started the trend. DeepSeek-V3 proved it could scale....

6 min read
Synthetic Data Won't Save You From Model Collapse
guides

The AI industry is running out of internet. Every major lab has already scraped the same corpus, and the easy gains from scaling data are tapering. The...

14 min read
MoE Models Run 405B Parameters at 13B Cost
guides

When Mistral AI dropped Mixtral 8x7B in December 2023, claiming GPT-3.5-level performance at a fraction of the compute cost, the reaction split cleanly...

14 min read
The Inference Budget Just Got Interesting
signals

OpenAI's o1 made headlines for "thinking harder" during inference. But the real story isn't that a model can spend more tokens on reasoning: it's that...

7 min read
Mixture of Experts Explained: The Architecture Behind Every Frontier Model
guides

Every frontier model released in the last 18 months uses Mixture of Experts. DeepSeek-V3 activates just 37 billion of its 671 billion parameters per token. Understanding how MoE works isn't optional anymore.

10 min read
Inference-Time Compute Is Escaping the LLM Bubble
signals

Inference-time compute scaling lets models spend more tokens thinking before they answer, boosting accuracy without retraining. Now the technique is spreading beyond language models.

7 min read