Giorgos Myrianthous

7K Followers


Published in

Towards Data Science

·Pinned

How to Skip Tasks in Airflow DAGs

Skipping tasks in Airflow DAGs based on specific conditions — Recently, I was attempting to add a new task in an existing Airflow DAG that would only run on specific days of the week. However, I was surprised to find that skipping tasks in Airflow isn’t as straightforward as I anticipated. In this article, I will demonstrate how to skip…

Python

9 min read


Published in

Towards Data Science

·Pinned

ETL vs ELT: What’s the Difference?

A comparison between ETL and ELT in the context of Data Engineering — ETL (Extract-Transform-Load) and ELT (Extract-Load-Transform) are two terms commonly used in the realm of Data Engineering and more specifically in the context of data ingestion and transformation. While these terms are often used interchangeably, they refer to slightly different concepts and have different implications for the design of a data…

Programming

8 min read


Published in

Towards Data Science

·Pinned

requirements.txt vs setup.py in Python

Understanding the purpose of requirements.txt, setup.py and setup.cfg in Python when developing and distributing packages: Managing dependencies in Python projects can be quite challenging, especially for people new to the language. When developing a new Python package, chances are you will also need to use other packages that help you write less code (in less time) so that you don’t have…

Python

7 min read


Published in

Towards Data Science

·Pinned

Kafka No Longer Requires ZooKeeper

Version 2.8.0 Gives You Early Access to ZooKeeper-Less Kafka: Apache Kafka 2.8.0 is finally out, and you can now get early access to KIP-500, which removes the Apache ZooKeeper dependency. Instead, Kafka now relies on an internal Raft quorum that can be activated through Kafka Raft metadata mode. …

Programming

5 min read


Published in

Towards Data Science

·Pinned

Speeding Up the Conversion Between PySpark and Pandas DataFrames

Save time when converting large Spark DataFrames to Pandas: Converting a PySpark DataFrame to Pandas is quite trivial thanks to the toPandas() method; however, this is probably one of the most costly operations and must be used sparingly, especially when dealing with a fairly large volume of data. Why is it so costly? Pandas DataFrames are stored in memory, which means that operations over them are faster…

Python

3 min read


Published in

Level Up Coding

·May 18

Fixing ImportError: urllib3 v2.0 only supports OpenSSL 1.1.1+

Fixing ImportError when importing Python packages that rely on urllib3 and OpenSSL: A growing number of Python developers have been complaining about an ImportError raised when attempting to import packages that depend on urllib3 and OpenSSL, such as the openai Python package. More specifically, the error indicates that the latest versions of urllib3 only support OpenSSL versions 1.1.1+: ImportError: urllib3 v2.0 only supports…

Python

3 min read


Published in

Towards Data Science

·May 15

How To List All BigQuery Datasets and Tables with Python

Programmatically list all datasets and tables using BigQuery API and Python — BigQuery is the managed Data Warehousing service on Google Cloud Platform that lets users store, manage and query data. A big part of Data and/or Analytics Engineering is the automation of certain tasks, including some interactions with BigQuery. …

Python

5 min read


Published in

Level Up Coding

·May 14

Fixing Error: Runtime Error could not find profile in dbt

How to provide a connection profile when running dbt commands: data build tool (dbt) is one of the hottest relatively recent additions to the modern data stack. dbt offers both a CLI tool (dbt-core) and a cloud tool (dbt Cloud), a paid service where you can host and run your dbt projects. When using the command-line interface…

Programming

5 min read


Published in

Level Up Coding

·May 11

How To Remove __pycache__ from Visual Studio Code

Hiding __pycache__ metadata from VS Code IDE — I recently switched from PyCharm to Visual Studio Code for my Python projects. However, I encountered an issue where VS Code displays __pycache__ files, which can clutter the File Explorer. This can be particularly frustrating for those working on large projects with multiple modules and folders given that cache files…

Python

3 min read


Published in

Towards Data Science

·May 9

What is pyproject.toml in Python

Managing Python project dependencies in pyproject.toml files: Dependency management in Python is tricky and sometimes frustrating work. Newcomers are usually tempted to install any dependency (i.e. package) they may find useful, even in a single virtual environment. This approach increases the chances of having conflicting package dependencies and ending up in the so-called dependency hell.

Python

5 min read

I write about Python, DataOps and MLOps

