About:

Sebastian Raschka, PhD, is the author of 'Ahead of AI,' a Substack publication specializing in machine learning and AI research. It is read by tens of thousands of researchers and practitioners who want to stay ahead in the ever-evolving field.

Recent articles:
A comprehensive overview of ten notable open-weight LLM releases from early 2026, focusing on their architectures, performance, and innovative features.
Inference scaling significantly improves the accuracy of large language models by optimizing computational resources during text generation, with various methods explored in detail.
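One simple inference-scaling method, majority voting (self-consistency), can be sketched in a few lines. This is a minimal illustration, not the article's implementation; the `sample_answer` stub below is a hypothetical stand-in for an actual LLM sampling call:

```python
import random
from collections import Counter

# Minimal sketch of majority voting (self-consistency), one inference-scaling
# method. `sample_answer` is a hypothetical stand-in for an LLM call.
def sample_answer(rng):
    # A noisy "model" that returns the correct answer 60% of the time.
    return "42" if rng.random() < 0.6 else rng.choice(["41", "43"])

def majority_vote(n_samples, seed=0):
    # Draw several candidate answers and keep the most frequent one;
    # spending more samples (compute) makes the final answer more reliable.
    rng = random.Random(seed)
    votes = Counter(sample_answer(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]
```

With enough samples, the correct answer dominates even though any individual sample is often wrong, which is the trade-off inference scaling exploits: more generation-time compute in exchange for higher accuracy.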
The article provides an in-depth analysis of the DeepSeek V3.2 model, highlighting its performance improvements over its predecessor, DeepSeek V3, and unique features such as the DeepSeek Sparse Attention (DSA) mechanism.
The article discusses the evolution of large language models (LLMs) beyond traditional autoregressive transformer architectures, highlighting alternative approaches such as linear attention hybrids and text diffusion models.
The article discusses evaluation methods for large language models (LLMs), focusing on four main approaches: multiple-choice benchmarks, verifiers, leaderboards, and LLM judges.
OpenAI has released two new open-weight large language models (LLMs), gpt-oss-120b and gpt-oss-20b, marking its first such release since GPT-2 in 2019. The article discusses the models' architecture and their optimizations for running locally.
The article provides a comprehensive analysis of the evolution of large language models (LLMs) from GPT-2 to the latest architectures such as DeepSeek-V3 and Llama 4, highlighting their structural similarities and advancements in attention mechanisms.
The author presents a curated list of research papers on large language models (LLMs) and their reasoning capabilities, organized by topic rather than date and divided into three categories, beginning with training strategies.
2025 saw major advancements in large language models, particularly with the introduction of reasoning models and RLVR, reshaping the landscape of AI development and applications.
The article provides a hands-on guide to understanding and implementing the Qwen3 architecture, a popular open-weight model family, and highlights the reasons for Qwen3's popularity, including its developer-friendly open-source license.
A curated list of LLM research articles from mid-2025, categorized for easy access, alongside the author's annual review on LLM progress and predictions.
The article explains the concept of KV caches in LLMs and how they work, both conceptually and in code, with a from-scratch, human-readable implementation. It also discusses the benefits and downsides of using KV caches.
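The caching idea can be illustrated with a toy single-head attention step (a minimal sketch; the article's from-scratch implementation may differ in detail): at each decoding step, the new token's key and value are appended to a cache, and attention runs over all cached entries, so earlier projections are never recomputed.

```python
import numpy as np

# Toy single-head attention step with a KV cache (illustrative sketch only).
def attend_step(q, k_new, v_new, cache):
    # Append this step's key/value so future steps can reuse them.
    cache["k"].append(k_new)
    cache["v"].append(v_new)
    K = np.stack(cache["k"])              # (t, d): keys of all tokens so far
    V = np.stack(cache["v"])              # (t, d): values of all tokens so far
    scores = K @ q / np.sqrt(q.shape[0])  # scaled dot-product scores, shape (t,)
    w = np.exp(scores - scores.max())
    w /= w.sum()                          # softmax over cached positions
    return w @ V                          # context vector, shape (d,)

rng = np.random.default_rng(0)
cache = {"k": [], "v": []}
d = 4
for _ in range(3):  # three decoding steps
    ctx = attend_step(rng.normal(size=d), rng.normal(size=d),
                      rng.normal(size=d), cache)
# The cache now holds one key/value pair per generated token.
```

The trade-off the article discusses is visible here: each step does only one new key/value projection, but the cache grows linearly with sequence length.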
The author shares detailed video content on how to code LLMs, which is one of the best ways to understand how they work. The videos are supplementary material for the Build a Large Language Model (From Scratch) book.
The article discusses recent developments in reasoning via reinforcement learning, focusing on the training methods used to develop and improve reasoning models, from RLHF basics to the reinforcement learning algorithms involved.
The text is an introduction to a new book on reasoning in large language models (LLMs). It provides an overview of reasoning methodologies and their application in LLMs, including inference-time scaling and reinforcement learning.
The text discusses recent research advancements in reasoning-optimized large language models (LLMs), focusing on inference-time compute scaling, and explores different methods for improving the reasoning capabilities of LLMs.
The article discusses the four main approaches to building reasoning models and enhancing LLMs with reasoning capabilities. It defines reasoning as the process of answering complex, multi-step questions and outlines methods for achieving it.
The text discusses noteworthy AI research papers of 2024, focusing on the second half of the year. It covers a variety of relevant topics, from mixture-of-experts models to new LLM scaling laws for precision.
The article discusses noteworthy AI research papers of 2024, focusing on LLM research. It covers a variety of topics, from mixture-of-experts models to new LLM scaling laws for precision.
The text is about the author's plans to publish a new article discussing all of their research highlights from 2024; due to an accident and serious injury, however, they are currently unable to work at a computer and finish it.
The text explains the concept of multimodal LLMs, which are large language models capable of processing multiple types of inputs, such as text, sound, images, and videos. It discusses two main approaches to building multimodal LLMs.
The article discusses the process of transforming pretrained large language models (LLMs) into strong text classifiers. It focuses on classification finetuning, which involves training a language model to recognize specific classes.
The text is a promotion for a 3-hour coding workshop on Large Language Models (LLMs) and includes a table of contents for the video presentation. The author also mentions a book and GitHub repository related to LLMs.
The text discusses the latest advancements in pre-training and post-training methodologies for large language models (LLMs). It reviews the development and training pipelines of four major new LLMs, including Alibaba's Qwen 2 and Apple Intelligence's foundation models.