About:

Sebastian Raschka, PhD, is the author of 'Ahead of AI,' a Substack publication specializing in machine learning and AI research. It is read by tens of thousands of researchers and practitioners who want to stay ahead in the ever-evolving field.

Recent articles:
A comprehensive overview of ten notable open-weight LLM releases from early 2026, focusing on their architectures, performance, and innovative features.
Inference scaling significantly improves the accuracy of large language models by optimizing computational resources during text generation, with various methods explored in detail.
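One simple inference-scaling method, majority voting (self-consistency), can be sketched in a few lines. This is a minimal illustration, not the article's implementation; the `sample_answer` stub below is a hypothetical stand-in for an actual LLM sampling call:

```python
import random
from collections import Counter

# Minimal sketch of majority voting (self-consistency), one inference-scaling
# method. `sample_answer` is a hypothetical stand-in for an LLM call.
def sample_answer(rng):
    # A noisy "model" that returns the correct answer 60% of the time.
    return "42" if rng.random() < 0.6 else rng.choice(["41", "43"])

def majority_vote(n_samples, seed=0):
    # Draw several candidate answers and keep the most frequent one;
    # spending more samples (compute) makes the final answer more reliable.
    rng = random.Random(seed)
    votes = Counter(sample_answer(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]
```

With enough samples, the correct answer dominates even though any individual sample is often wrong, which is the trade-off inference scaling exploits: more generation-time compute in exchange for higher accuracy.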
The article provides an in-depth analysis of the DeepSeek V3.2 model, highlighting its performance improvements over its predecessor, DeepSeek V3, and unique features such as the DeepSeek Sparse Attention (DSA) mechanism.
The article discusses the evolution of large language models (LLMs) beyond traditional autoregressive transformer architectures, highlighting alternative approaches such as linear attention hybrids and text diffusion models.
The article discusses evaluation methods for large language models (LLMs), focusing on four main approaches: multiple-choice benchmarks, verifiers, leaderboards, and LLM judges.
OpenAI has released two new open-weight large language models (LLMs), gpt-oss-120b and gpt-oss-20b, marking its first such release since GPT-2 in 2019. The article discusses the models' architecture and their optimizations for running locally.
The article provides a comprehensive analysis of the evolution of large language models (LLMs) from GPT-2 to the latest architectures such as DeepSeek-V3 and Llama 4, highlighting their structural similarities and advancements in attention mechanisms.
The author presents a curated list of research papers on large language models (LLMs) and their reasoning capabilities, organized by topic rather than date and divided into three categories, beginning with training strategies.
2025 saw major advancements in large language models, particularly with the introduction of reasoning models and RLVR, reshaping the landscape of AI development and applications.
The article provides a hands-on guide to understanding and implementing the Qwen3 architecture, a popular open-weight model family, and highlights the reasons for Qwen3's popularity, including its developer-friendly open-source license.
A curated list of LLM research articles from mid-2025, categorized for easy access, alongside the author's annual review on LLM progress and predictions.
The article explains the concept of KV caches in LLMs and how they work, both conceptually and in code, with a from-scratch, human-readable implementation. It also discusses the benefits and downsides of using KV caches.
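The caching idea can be illustrated with a toy single-head attention step (a minimal sketch; the article's from-scratch implementation may differ in detail): at each decoding step, the new token's key and value are appended to a cache, and attention runs over all cached entries, so earlier projections are never recomputed.

```python
import numpy as np

# Toy single-head attention step with a KV cache (illustrative sketch only).
def attend_step(q, k_new, v_new, cache):
    # Append this step's key/value so future steps can reuse them.
    cache["k"].append(k_new)
    cache["v"].append(v_new)
    K = np.stack(cache["k"])              # (t, d): keys of all tokens so far
    V = np.stack(cache["v"])              # (t, d): values of all tokens so far
    scores = K @ q / np.sqrt(q.shape[0])  # scaled dot-product scores, shape (t,)
    w = np.exp(scores - scores.max())
    w /= w.sum()                          # softmax over cached positions
    return w @ V                          # context vector, shape (d,)

rng = np.random.default_rng(0)
cache = {"k": [], "v": []}
d = 4
for _ in range(3):  # three decoding steps
    ctx = attend_step(rng.normal(size=d), rng.normal(size=d),
                      rng.normal(size=d), cache)
# The cache now holds one key/value pair per generated token.
```

The trade-off the article discusses is visible here: each step does only one new key/value projection, but the cache grows linearly with sequence length.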
The author shares detailed video content on how to code LLMs, which is one of the best ways to understand how they work. The videos are supplementary material for the Build a Large Language Model (From Scratch) book.
The article discusses recent developments in reasoning via reinforcement learning, focusing on the training methods used to develop and improve reasoning models, from RLHF basics to the reinforcement learning algorithms involved.
The text is an introduction to a new book on reasoning in large language models (LLMs). It provides an overview of reasoning methodologies and their application in LLMs, including inference-time scaling and reinforcement learning.
The text discusses recent research advancements in reasoning-optimized large language models (LLMs), focusing on inference-time compute scaling, and explores different methods for improving the reasoning capabilities of LLMs.
The article discusses the four main approaches to building reasoning models and enhancing LLMs with reasoning capabilities. It defines reasoning as the process of answering complex, multi-step questions and outlines methods for achieving it.
The text discusses noteworthy AI research papers of 2024, focusing on the second half of the year. It covers a variety of relevant topics, from mixture-of-experts models to new LLM scaling laws for precision.
The article discusses noteworthy AI research papers of 2024, focusing on LLM research. It covers a variety of topics, from mixture-of-experts models to new LLM scaling laws for precision.
The text is about the author's plans to publish a new article discussing all of their research highlights from 2024; due to an accident and serious injury, however, they are currently unable to work at a computer and finish it.
The text explains the concept of multimodal LLMs, which are large language models capable of processing multiple types of inputs, such as text, sound, images, and videos. It discusses two main approaches to building multimodal LLMs.
The article discusses the process of transforming pretrained large language models (LLMs) into strong text classifiers. It focuses on classification finetuning, which involves training a language model to recognize specific classes.
The text is a promotion for a 3-hour coding workshop on Large Language Models (LLMs) and includes a table of contents for the video presentation. The author also mentions a book and GitHub repository related to LLMs.
The text discusses the latest advancements in pre-training and post-training methodologies for large language models (LLMs). It reviews the development and training pipelines of four major new LLMs, including Alibaba's Qwen 2 and Apple Intelligence's foundation models.