About:

Tom Yeh is the author of 'AI by Hand ✍️', a Substack publication with tens of thousands of subscribers. The site features AI content that he curates and writes.

Posts:
SwiGLU is an advanced activation function that combines gating and amplification to enhance the performance of large language models in Transformer architectures.
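The gate-and-amplify idea behind SwiGLU can be sketched in a few lines. This is a minimal illustration, not code from the post; the function names and the column-list weight layout are assumptions made for the example.

```python
import math

def swish(z, beta=1.0):
    # Swish/SiLU: z * sigmoid(beta * z)
    return z / (1.0 + math.exp(-beta * z))

def swiglu(x, w, v):
    # SwiGLU: Swish(x @ W) multiplied elementwise by (x @ V).
    # x: input vector; w, v: weight matrices given as lists of columns
    # (an illustrative layout, assumed for this sketch).
    gate = [swish(sum(xi * wij for xi, wij in zip(x, col))) for col in w]
    value = [sum(xi * vij for xi, vij in zip(x, col)) for col in v]
    return [g * u for g, u in zip(gate, value)]
```

With identity weights, each output is the input amplified by its own Swish gate.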
The seminar explores the evolution of reinforcement learning techniques and features insights from Cameron R. Wolfe on its application in language models and research challenges.
AI's unique capabilities can transform work processes, offering new opportunities for collaboration and creativity, while emphasizing the need for human oversight.
Gated Linear Units (GLU) introduce a dynamic gating mechanism that enhances activation functions by deriving gates from input rather than using fixed functions.
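The input-derived gating that distinguishes GLU from a fixed activation can be shown directly. A minimal sketch, with an assumed columns-of-weights layout; names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def glu(x, w, v):
    # GLU: (x @ W) gated elementwise by sigmoid(x @ V).
    # The gate is computed from the input itself rather than
    # being a fixed function of the pre-activation.
    linear = [sum(xi * wij for xi, wij in zip(x, col)) for col in w]
    gate = [sigmoid(sum(xi * vij for xi, vij in zip(x, col))) for col in v]
    return [a * g for a, g in zip(linear, gate)]
```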
RMSNorm offers a more efficient alternative to LayerNorm by normalizing activations using root mean square, reducing computational overhead in large-scale models.
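RMSNorm's saving over LayerNorm is visible in the arithmetic: no mean or variance, just one root-mean-square. A minimal sketch (the `gain` and `eps` parameters are the usual conventions, assumed here):

```python
import math

def rms_norm(x, gain=None, eps=1e-8):
    # RMSNorm: divide each activation by the root mean square of the
    # vector. Unlike LayerNorm, there is no mean subtraction, which
    # reduces computation in large-scale models.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    if gain is None:
        gain = [1.0] * len(x)
    return [g * v / rms for g, v in zip(gain, x)]
```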
Layer normalization stabilizes neural network training by rescaling activations so no single feature dominates the computation, and it works reliably even with small mini-batches.
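Because LayerNorm statistics come from a single example's features, batch size never enters the formula. A minimal sketch of the per-example computation (learned scale and shift omitted for brevity):

```python
import math

def layer_norm(x, eps=1e-8):
    # LayerNorm: subtract the mean and divide by the standard deviation
    # computed across the features of one example, so the result does
    # not depend on the mini-batch at all.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]
```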
ELU improves deep neural networks by offering a smooth transition in the negative region, enhancing stability and preventing dead neurons compared to traditional activation functions.
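The smooth negative region that keeps ELU neurons alive is a one-line formula. A minimal sketch:

```python
import math

def elu(x, alpha=1.0):
    # ELU: identity for x > 0; a smooth exponential curve approaching
    # -alpha for x <= 0, so negative inputs still carry a gradient
    # instead of dying as they do with ReLU.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)
```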
A six-level framework for understanding Transformers is presented, guiding learners from basic concepts to coding through practical exercises and hands-on learning.
The post explains the Proximal Policy Optimization algorithm, focusing on its clipping mechanism to ensure stable policy updates in reinforcement learning.
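The clipping mechanism at the heart of PPO fits in two lines. A minimal sketch of the clipped surrogate objective for one sample (the `eps=0.2` default is the commonly used value, assumed here):

```python
def ppo_clipped_objective(ratio, advantage, eps=0.2):
    # PPO's clipped surrogate: take the minimum of the unclipped and
    # clipped ratio-weighted advantage. Clipping the probability ratio
    # to [1 - eps, 1 + eps] caps how far one update can move the policy.
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)
```

With a positive advantage, raising the ratio beyond 1 + eps yields no further gain; with a negative advantage, the minimum keeps the penalty from being clipped away.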
Batch normalization stabilizes and accelerates neural network training by normalizing activations within mini-batches to maintain consistent input ranges.
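In contrast to LayerNorm, batch normalization pools statistics across the mini-batch for each feature. A minimal training-mode sketch (learned scale/shift and running statistics for inference are omitted):

```python
import math

def batch_norm(batch, eps=1e-5):
    # BatchNorm: normalize each feature using the mean and variance
    # computed over the mini-batch, keeping inputs in a consistent range.
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(dims)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps)
             for j in range(dims)] for row in batch]
```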
Luke Yeh illustrates the workings of Generative Adversarial Networks (GANs) while highlighting the enduring value of human artistry over AI-generated creations.
The post emphasizes the need to observe AI advancements critically, featuring insights from Val Andrei Fajardo on building robust AI systems from foundational principles.
A father mentors his son in an AI project, creatively using hand-drawn illustrations to teach convolutional neural networks, highlighting the blend of art and technology.
GELU and Swish are activation functions that smooth negative values, with GELU using the Gaussian error function and Swish using the sigmoid function for gating.
Swish (SiLU) provides a smooth activation function that gradually scales inputs, contrasting with the abrupt transitions of ReLU.
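The two smoothing strategies in the blurbs above differ only in the gating curve: GELU uses the Gaussian CDF (via the error function), Swish uses the sigmoid. A minimal sketch of both:

```python
import math

def gelu(x):
    # GELU: x scaled by the Gaussian CDF of x, written with erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def swish(x, beta=1.0):
    # Swish (SiLU when beta = 1): x gated by its own sigmoid, giving a
    # smooth transition around zero instead of ReLU's hard kink.
    return x / (1.0 + math.exp(-beta * x))
```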
KL divergence quantifies the information loss when using a predicted probability distribution to approximate a true distribution, calculated through logarithmic differences and weighted sums.
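The "logarithmic differences and weighted sums" amount to one expression: the log-ratio log(p/q) averaged under the true distribution. A minimal sketch for discrete distributions:

```python
import math

def kl_divergence(p, q):
    # KL(P || Q): sum of p_i * log(p_i / q_i), the expected information
    # lost when q is used to approximate p. Terms with p_i = 0
    # contribute nothing and are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Identical distributions give zero; a confident true distribution scored by a uniform guess gives log 2 per bit of mismatch.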
The Tanh activation function optimizes neural network learning by centering signals around zero and allowing both positive and negative outputs, unlike the sigmoid function.
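The zero-centering contrast with the sigmoid is easy to see from the definition. A minimal sketch (Python's `math.tanh` exists; the explicit form is written out here for illustration):

```python
import math

def tanh(x):
    # Tanh: squashes inputs into (-1, 1), centered at zero, so both
    # positive and negative outputs occur; the sigmoid is confined
    # to (0, 1).
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))
```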
Understanding debouncing through OpenClaw helps students manage anxiety and focus on core computer science principles amidst technology hype.
Claw is an innovative open-source framework that combines advanced reasoning, memory, and safety features, gaining popularity in the developer community.
Cross Entropy Loss measures the accuracy of a model's predictions by comparing its predicted distribution to the true distribution from labeled data.
Entropy measures uncertainty in probability distributions, increasing with the number of likely outcomes and reaching its maximum when all outcomes are equally likely.
Binary cross entropy loss measures the accuracy of a model's predicted probabilities against actual outcomes in binary classification tasks.
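The three quantities in the blurbs above share one template: an expected log-probability, differing only in which distribution supplies the weights and which supplies the logs. A minimal sketch of all three:

```python
import math

def entropy(p):
    # Entropy: average surprise of a distribution; maximal when all
    # outcomes are equally likely.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p_true, q_pred):
    # Cross entropy: average surprise when outcomes drawn from the true
    # distribution are scored with the model's predicted distribution.
    return -sum(pi * math.log(qi) for pi, qi in zip(p_true, q_pred) if pi > 0)

def binary_cross_entropy(y, p):
    # BCE: cross entropy for a single 0/1 label y against the predicted
    # probability p of the positive class.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))
```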
L2 loss is a metric that measures the accuracy of a model's predictions by calculating the squared differences from the target values.
The L2 norm is a crucial mathematical concept for measuring vector magnitude and distance, with applications in AI and data analysis.
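The two L2 blurbs above are closely related: the loss sums squared differences, and the norm is the square root of summed squares. A minimal sketch of both:

```python
import math

def l2_loss(pred, target):
    # L2 (squared-error) loss: sum of squared differences between
    # predictions and targets.
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def l2_norm(v):
    # L2 norm: the Euclidean length of a vector, i.e. the square root
    # of the sum of squared components.
    return math.sqrt(sum(x * x for x in v))
```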