About:

Sherman Chann is a developer interested in programming and cybersecurity; he organizes CTFs and enjoys programming puzzles.

Interests:

Programming, Cybersecurity, Open source contributions, Competitive programming

Outgoing Links:

Dan Luu
The post critiques the declining value of intellectual pursuits and explores the implications of AI on human activities and societal roles.
A candid reflection on a disheartening year, exploring health, career disillusionment, and the psychological impacts of technology and social isolation.

Time To Think

2025-02-03

The author discusses the struggle of thinking with more than 60 seconds of context, the degradation of their writing ability, and the impact of AI cognition speed. They reflect on the release of GPT-4 and the limitations of langua...

Are exploits free?

2026-02-13

...
The text discusses the cost of replicating the Google DeepMind paper Scaling Exponents Across Parameterizations and Optimizers. It details the experiments conducted, the cost of each experiment, and the problems with the expe...
The post covers DeepSeek Core Readings 0 - Coder: sourcing pretraining data from GitHub, constructing the pretraining dataset, and evaluating the model's performance. It also covers the architecture, training objective...
The post covers DeepSeek Core Readings 1 - LLM, which includes details about the pretraining, scaling laws, tokenization, model architecture, LR scheduler, infrastructure, scaling experiments, alignment, DPO, eva...
The text provides tips for remaining conscious and avoiding mental unawareness, including reading, surrounding yourself with conscious people, sleeping on the floor, disabling notifications, adding barriers to distractions, tracki...

2023

2023-12-31

The author reflects on the year 2023, which was materially and situationally better but subjectively worse in spirit and health, lists five resolutions for 2024, and discusses predictions and history. The author also provid...
The text discusses the mixture-of-experts paradigm and whether dense models might replace it. It covers issues with fine-tuning, VRAM limitations, and the potential benefits of MoE models for certain users. The author also pre...
The text explains the mechanism of token dropping in GPT-4 and the concept of Mixture-of-Experts (MoE) in detail. It discusses the routing strategies, cutoff dates for GPT-4, GPT-4 leaks, and the process of token choice. It also e...
The text discusses the non-deterministic behavior of GPT-4, attributing it to the Sparse MoE architecture. The author presents a hypothesis that batched inference in Sparse MoE models is the root cause of non-determinism in the GP...
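The mechanism behind these two summaries, capacity-limited expert routing with token dropping, is concrete enough to sketch. Below is a minimal, hypothetical illustration in plain NumPy (invented gate logits and capacity numbers; GPT-4's actual routing is not public) of top-2 token-choice routing with a hard per-expert capacity: the same two tokens can be assigned different experts, or dropped outright, depending on which other sequences share the batch. That batch-dependence is the non-determinism hypothesis in a nutshell.

```python
import numpy as np

def top2_route(logits, capacity):
    # Greedy top-2 token-choice routing with a hard per-expert capacity,
    # in the style of GShard/Switch-family MoE layers. A token loses any
    # assignment to an expert that is already full (it is "dropped").
    n_tokens, n_experts = logits.shape
    top2 = np.argsort(-logits, axis=1)[:, :2]   # each token's 2 preferred experts
    load = np.zeros(n_experts, dtype=int)
    assigned = []
    for t in range(n_tokens):                   # position within the batch matters
        kept = [int(e) for e in top2[t] if load[e] < capacity]
        for e in kept:
            load[e] += 1
        assigned.append(kept)
    return assigned

rng = np.random.default_rng(0)
n_experts, capacity = 4, 2

# The same two tokens ("our" sequence) placed in two different batches,
# using randomly generated gate logits as a stand-in for a real router.
seq = rng.normal(size=(2, n_experts))
batch_a = np.vstack([seq, rng.normal(size=(2, n_experts))])
batch_b = np.vstack([rng.normal(size=(6, n_experts)), seq])

print(top2_route(batch_a, capacity)[:2])   # our tokens routed first: get their picks
print(top2_route(batch_b, capacity)[-2:])  # routed last: experts already full
```

In the larger batch, the other sequences' tokens fill expert capacity first, so our tokens receive different (or no) expert assignments and hence different outputs. Under this hypothesis, an identical prompt yields different completions depending on what else happens to be batched alongside it at inference time.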

Blog Refurbishment

2022-08-13

The author discusses the process of refurbishing their blog, the challenges they faced, and the changes they made to the site. They also talk about their motivation for the redesign and their plans for future content.