About:

Vu Trinh writes about data architecture and ETL scripts and has gained thousands of subscribers.

Interests:

Data architecture, ETL scripts

The article discusses the challenges of Apache Kafka's traditional architecture in modern cloud environments, particularly regarding cost inefficiencies and scaling issues. It introduces the concept of diskless Kafka, which utiliz...
The article discusses the emergence of vector databases, which are designed to manage data for AI workloads, particularly in the context of large language models (LLMs) like ChatGPT and Gemini. It explains the importance of vector...
YouTube engineers developed a CI/CD framework to address the unique challenges of data pipelines, enhancing data quality, testing efficiency, and team collaboration.
Delta Lake, Iceberg, and Hudi are crucial for ensuring ACID compliance in lakehouse architectures, addressing challenges in object storage systems.
The article provides an in-depth look at Apache Spark's shuffle process, its significance in data processing, and strategies for optimizing performance during wide-dependency transformations.
An analysis of Databricks' Unity Catalog, highlighting its role in lakehouse architecture, features, and the challenges it addresses in data management and security.
The article humorously outlines ineffective practices for building data pipelines, emphasizing the importance of complexity and speed over practicality. The author, Vu Trinh, shares insights from his experience as a data engineer,...
The article discusses the Log-Structured Merge-Tree (LSM-tree), a storage engine used in various OLTP and OLAP databases. It explains the architecture of LSM-trees, their advantages and disadvantages, and their increasing preferen...
Essential questions for constructing a data lakehouse include evaluating architecture, table formats, and query engines to meet specific business requirements.
A hands-on project using Apache Spark to process 20GB of data, focusing on practical learning and performance optimization techniques.
A comprehensive guide on AI agents, their components, and the importance of model selection for optimizing performance and cost in data engineering.
A structured roadmap for aspiring data engineers, emphasizing essential skills like data modeling, SQL, and Python, while sharing personal insights and experiences in the field.
Seven insights are shared to help readers quickly learn and understand OLAP systems, focusing on architectural concepts and metadata management.
An exploration of data architecture, its components, and the evolution of data management practices in modern enterprises.
An analysis of Databricks and Snowflake pricing models, highlighting their unique billing structures and best practices for managing cloud data warehouse costs.
The article delves into Apache Flink's architecture and features, emphasizing its low-latency performance and robust state management for stream processing applications.
The rise of single-node data processing engines like DuckDB and Polars marks a significant shift in data engineering, evolving from traditional cluster-based systems to more efficient, single-machine solutions.
The post delves into the semantic layer's importance in data engineering, highlighting its role in facilitating self-serve analytics and improving data accessibility for business users.
The article discusses a framework for building data pipelines, emphasizing the importance of asking the right questions to streamline the design and development process. It categorizes these questions into three sections: source, ...
A comprehensive breakdown of the pricing models for Microsoft Fabric, AWS Redshift, and Google BigQuery, aimed at simplifying cost management for cloud data warehouses.
Data engineers must understand the role of Large Language Models (LLMs) in data processing and their implications for effective data management and user needs.
A problem-driven approach to data engineering side projects enhances learning and job prospects by focusing on solving real business issues.
Nine lessons from six years in data engineering to help newcomers avoid pitfalls and understand the diverse responsibilities of the role.
Vietnam celebrates the New Year twice, with the Lunar New Year being more significant, while the author promotes a subscription discount for data engineering articles.