MrCogito Research Log #1

I’m building a ConceptEncoder — a novel transformer architecture that replaces expensive O(n²) self-attention with cross-attention through a small set of learnable concept tokens. I’m sharing my research journey openly. Here’s what happened in February 2026.

What I tried: → Redesigned the diffusion decoder — removed O(n²) self-attention, added AdaLN-Zero timestep conditioning. ~4x speedup per training step. → Compared diffusion vs MLM training objectives head-to-head. Diffusion gave 2x better concept geometry (rank 10.1 vs 5/128) but near-random semantic quality (STS-B: 0.138). → Implemented ViaDecoder evaluation — a fairer way to evaluate concept encoders that preserves their internal structure. Consistent +0.65-2.3% improvement over the naive CLS-query approach. → Added VICReg regularization with warmup to combat concept collapse.

What surprised me: → Better geometry doesn’t automatically mean better semantics. You can spread concepts apart in space without them capturing different meanings.

What’s next: → TSDAE (denoising autoencoder) training — forces concepts to reconstruct from corrupted input → Prefix generation — the decoder must generate text the encoder never saw

Full project (open source): github.com/ksopyla/MrCogito

#AIResearch #OpenScience #ConceptEncoder #ResearchLog #MachineLearning