About:

Grigory Sapunov is a co-founder and CTO at Intento, with a keen interest in modern machine learning.

Interests:

Modern ML, Machine learning

Posts:
The post examines how well code agents understand software architecture, introducing a benchmark to evaluate their architectural beliefs and revealing significant model-dependent performance variations.
The post critically examines the Universal Reasoning Model (URM) and its performance compared to previous models, emphasizing architectural innovations and evaluation challenges.
The post explores how self-replicating programs emerge from simple interactions in computational environments, shedding light on the origins of life and the nature of computation.
The blog post discusses the NeurIPS 2025 Best Paper Awards, summarizing the award-winning and runner-up papers. It highlights key contributions such as the introduction of the INFINITY-CHAT dataset for evaluating output diversity ...
The article analyzes and compares encoder-decoder and decoder-only transformer architectures in the context of large language models (LLMs). It discusses the evolution of transformer models, highlighting the dominance of decoder-o...
The post discusses the Tiny Recursive Model (TRM), which simplifies the Hierarchical Reasoning Model (HRM) by reducing complexity while maintaining performance. It highlights the differences between TRM and traditional large langu...
The blog post discusses the Hierarchical Reasoning Model (HRM), a brain-inspired hierarchical architecture developed by researchers at Sapient Intelligence. The model features fast and slow networks, achieving high performance on ...
The blog post reviews two books about neutrinos and their historical context. The first book, 'Neutrino' by Frank Close, discusses the discovery of neutrinos, focusing on key figures like Ray Davis and Bruno Pontecorvo, and the ch...
The blog post discusses V-JEPA 2, an advanced self-supervised video model that builds a world model based on video data. It highlights the model's two-stage training process: the first stage focuses on learning robust visual repre...
The paper discusses a novel approach to deep learning architectures by introducing Tversky neural networks, which utilize a differentiable parameterization of Tversky similarity to better model human perception of similarity. The ...
The blog post discusses the challenges of memory optimization in large models, particularly during training and fine-tuning. It introduces a new method called NanoAdam, which focuses on updating a subset of parameters with small w...
Blaise Agüera y Arcas's 'What is Life?' examines the interplay between life and computation, highlighting the significance of symbiogenesis and replicators in evolution.
The post discusses the concept of stochastic activations in neural networks, particularly in the context of large language models (LLMs). It critiques traditional activation functions like ReLU and introduces new methods such as S...
2025 marked significant advancements in AI agents, revealing both their potential and their reliability challenges across various industries.
The post discusses the ICML 2025 Outstanding Paper Awards, highlighting the anxiety of 'paper FOMO' among researchers due to the overwhelming volume of significant machine learning research. It summarizes key papers awarded for th...
A reconstructed list of 27 essential deep learning papers, originally suggested by Ilya Sutskever to John Carmack, highlights key research in AI, though some topics are missing.
Optimistic predictions for 2026 include advancements in AI, robotics, and understanding animal communication, alongside the emergence of reliable AI agents for everyday tasks.
The author shares their experience with the Gemini 3.0 and its Pro Image model, 'Nano Banana Pro,' highlighting its potential to revolutionize infographic generation. They discuss the limitations of the NotebookLM podcast feature,...
The blog post discusses DolphinGemma, a collaborative project involving Google, Georgia Tech, and the Wild Dolphin Project, aimed at understanding dolphin communication through a model trained on dolphin sounds. The author highlig...
The author discusses the overwhelming volume of ML papers on arXiv and introduces ArXivIQ, a multi-agent AI pipeline that produces structured deep-dives aimed at 15-minute reads. The system is designed to help researchers cover mo...
The paper introduces the Darwin Gödel Machine (DGM), a self-improving AI system that iteratively refines its own codebase and validates modifications using coding benchmarks. It draws inspiration from Darwinian evolution and depar...
The paper 'Do Language Models Use Their Depth Efficiently?' by Róbert Csordás, Christopher D. Manning, and Christopher Potts at Stanford University challenges the belief that deeper Large Language Models (LLMs) enable more complex...
AlphaEvolve is a coding agent for scientific and algorithmic discovery that runs an evolutionary algorithm to evolve programs that improve performance metrics on a given task. It uses large language models to produce algorithms solv...
The text discusses the future of AI models and the potential of training them on large collections of educational and scientific literature. It highlights the challenges and opportunities in this area, including copyright issues, ...