About:
Daniel is a data enthusiast with a background in agriculture, passionate about data engineering and related technologies.
The author expresses strong criticism of the concept of 'Medallion Architecture' in data modeling, particularly as promoted by Databricks. They argue that this approach is overly complicated and misleading, confusing many data eng...
The author discusses the challenges of inserting large datasets into a Postgres database using Spark's JDBC method, which is notably slow. After experimenting, the author finds that combining Python with Spark's multi-processing c...
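A minimal sketch of the pattern this post points toward, under my own assumptions: instead of Spark's row-oriented JDBC writer, stream pre-split CSV chunks into Postgres with `COPY`, fanning the chunks out over a process pool. The DSN, table name, and file names below are illustrative, not taken from the post.

```python
# Hedged sketch: parallel bulk loads into Postgres via COPY, as an
# alternative to slow row-by-row JDBC inserts. Assumes the dataset has
# already been split into CSV chunks on disk.
from multiprocessing import Pool


def copy_chunk(csv_path: str) -> str:
    """Bulk-load one CSV chunk into Postgres via COPY."""
    import psycopg2  # assumed driver; any client exposing COPY works

    conn = psycopg2.connect("dbname=analytics user=etl")  # hypothetical DSN
    try:
        # The connection context manager commits on success,
        # rolls back on error; close() still has to be explicit.
        with conn, conn.cursor() as cur, open(csv_path) as f:
            cur.copy_expert(
                "COPY events FROM STDIN WITH (FORMAT csv, HEADER true)", f
            )
    finally:
        conn.close()
    return csv_path


def load_in_parallel(chunks: list[str], workers: int = 4) -> list[str]:
    """Fan the COPY calls out over a process pool, one chunk per worker."""
    with Pool(processes=workers) as pool:
        return list(pool.imap_unordered(copy_chunk, chunks))
```

`COPY` bypasses per-row INSERT overhead entirely, and the pool keeps several Postgres sessions loading concurrently, which is where most of the speedup over a single JDBC writer comes from.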
The author shares their experience migrating hundreds of Delta Lake tables from partitioning to liquid clustering, highlighting the complexities and potential pitfalls involved in the process. They emphasize that while the change ...
DuckDB outperforms Polars in handling large datasets, showcasing better developer support and reliability in production environments.
The author discusses transitioning from Spark to Polars for handling distributed compute jobs, emphasizing the cost-effectiveness and simplicity of using single-node tools. They highlight the challenges of managing large datasets ...
The author expresses a strong opinion that data modeling is dead, criticizing the current generation of data engineers for their reliance on modern technologies like Data Lakes and Lake Houses, which have overshadowed traditional ...
The author reflects on the evolution of software engineering, contrasting the nostalgic past with the current landscape dominated by AI tools like Cursor. They discuss the importance of adapting to new technologies while emphasizi...
The author expresses surprise at the lack of awareness surrounding 'Lazy Execution' in data processing, particularly with tools like Polars, Daft, and DuckDB. They emphasize the importance of processing data in batches rather than...
The blog post discusses the challenges faced by data engineers, particularly in Spark performance tuning and optimizations. It emphasizes the importance of understanding DataFrame partitions and their impact on performance. The au...
A disciplined migration to Databricks emphasizes strong fundamentals, clear governance, and intentional design to ensure stability and control over the data architecture.
Declarative Pipelines represent the future of Spark by simplifying data engineering through structured frameworks that enhance reliability and maintainability.
The post discusses the shift from traditional database drivers to Apache Arrow Database Connectivity, emphasizing its efficiency and performance benefits in data handling.
The author expresses frustration with the overuse of Terraform and YAML in infrastructure management, arguing that it complicates development and debugging processes. They reminisce about simpler times when API calls and SDKs were...
AI will transform software engineering but is unlikely to replace engineers entirely, as human oversight and expertise remain essential.
In the Age of AI, maintaining clean and organized code remains crucial for software developers, especially when using tools like Polars for data pipelines.
Embrace Agentic AI as a tool for innovation, encouraging software developers to learn and adapt rather than fear technological change.
The blog post discusses the evolving landscape of AI and LLMs, emphasizing the importance of hands-on experience in coding with these technologies. It highlights the growing reliance on AI tools among junior developers, leading to...
The post discusses the comparison between DuckDB and Polars, emphasizing that there is no definitive answer to which is better as it depends on the context of use. DuckDB is described as an embedded analytical database suitable fo...
The author explores the challenges of loading CSV files with mismatched schemas using two data engineering tools: DuckDB and Polars. The post discusses how DuckDB handles schema mismatches by allowing for merging options, while Po...
The author discusses common errors encountered while using the Rust GOAT dataframe tool Polars, particularly focusing on schema mismatches when handling large CSV files. The text highlights the frustration of dealing with errors l...
Databricks' temporary tables simplify data pipeline management for SQL teams, offering a familiar structure that reduces clutter and eases migration from traditional data warehouses.
The author discusses the integration of Apache Iceberg with DuckDB, expressing frustration over the lack of write support in DuckDB's Iceberg extension. They share their experience testing this integration on the Databricks platfo...
The post advocates for using Lance as a simple and efficient vector database for storing embeddings in AI applications, emphasizing the need for software professionals to stay updated with new technologies.
The author discusses the increasing utility of the PyArrow Python package for data engineering tasks, particularly in data ingestion and handling large datasets in cloud storage. The text highlights PyArrow's capabilities in readi...