Version 2.8.0 Gives You Early Access to Zookeeper-Less Kafka

Introduction

Apache Kafka 2.8.0 is finally out, and you can now get early access to KIP-500, which removes the Apache Zookeeper dependency. Instead, Kafka now relies on an internal Raft quorum that can be activated through Kafka Raft metadata mode. …


Save time when converting large Spark DataFrames to Pandas

Converting a PySpark DataFrame to pandas is straightforward thanks to the toPandas() method. However, it is one of the most costly operations and should be used sparingly, especially when dealing with fairly large volumes of data.

Why is it so costly?

Pandas DataFrames are stored in memory, which means that operations over them are faster…


Discussing different ways for dropping columns from DataFrames in PySpark

Introduction

Dropping columns from DataFrames is one of the most commonly performed tasks in PySpark. In today’s short guide, we’ll explore a few different ways for deleting columns from a PySpark DataFrame. Specifically, we’ll discuss how to

  • delete a single column
  • drop multiple columns
  • reverse the operation and instead, select the…


Exploring multiple ways for adding new columns to existing Spark DataFrames

Introduction

Adding new columns to PySpark DataFrames is probably one of the most common operations you need to perform as part of your day-to-day work.

In today’s short guide, we will discuss how to do so in many different ways. …


Discussing the trade-off between model accuracy and interpretability in Machine Learning

Introduction

In one of my previous articles, I discussed the difference between parametric and non-parametric methods in the context of Machine Learning.

Parametric methods make assumptions about the relationship between the data and the function to be estimated and thus they are generally inflexible. For example, we may assume that the…


Data Engineering

Storing, archiving, and managing data using Object Storage

Introduction

In the big data era, Object Storage architecture continues to gain traction among teams that wish to store, archive, and manage high volumes of data.

In today’s article, we are going to discuss the fundamental concepts around Object Storage architecture. …


Discussing the difference between parametric and non-parametric methods in the context of Machine Learning

Introduction

In one of my previous articles, I discussed the difference between prediction and inference in the context of Statistical Learning. Despite their main difference with respect to the end goal, in both approaches we need to estimate an unknown function f.

In other words, we need to learn a function…


Discussing the difference between prediction and inference in the context of Statistical Learning

Introduction

In simple terms, Statistical Learning refers to a collection of methods and approaches that can be applied to estimate an unknown function f.

For example, let’s suppose we have to work with some real estate data so that we can potentially find a relationship between the characteristics (i.e. the predictors…


Understanding how to create reproducible results when generating pseudo-random constructs with NumPy in Python

Introduction

Randomness is a fundamental mathematical concept that is commonly used in programming as well. Sometimes we need to introduce randomness when creating toy data, or when performing calculations that depend on some random event.

In today’s article…
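
One way to get reproducible pseudo-random results with NumPy is to fix the seed. The sketch below shows both the legacy global seed and the newer Generator API. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Legacy approach: seeding the global state replays the same sequence.
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)

# Generator API (NumPy 1.17+): two generators with the same seed
# produce identical draws without touching global state.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
draw_a = rng_a.integers(low=0, high=10, size=5)
draw_b = rng_b.integers(low=0, high=10, size=5)

same_legacy = np.array_equal(first, second)
same_generator = np.array_equal(draw_a, draw_b)
print(same_legacy, same_generator)
```

A local generator is usually easier to reason about than the global seed, because other code that draws random numbers cannot shift its sequence.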


Discussing when to use apply() or map() when applying functions to pandas columns and how to do it more efficiently

Introduction

Applying a function over pandas columns is a common approach to data transformation. In today’s short guide, we are going to discuss how to apply pre-defined or lambda functions over one or more columns in pandas DataFrames.

Additionally, we will discuss how to…
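
A minimal sketch of the distinction, assuming a toy DataFrame with hypothetical columns colA and colB: map() works element-wise on a single column, apply() with axis=1 works row by row across several columns, and a vectorised expression is typically the more efficient alternative when one exists.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({"colA": [1, 2, 3], "colB": [10, 20, 30]})

# Series.map applies a function element-wise to a single column.
df["colA_squared"] = df["colA"].map(lambda x: x ** 2)

# DataFrame.apply with axis=1 passes each row to the function,
# which allows combining several columns.
df["row_sum"] = df.apply(lambda row: row["colA"] + row["colB"], axis=1)

# Vectorised equivalent of the row-wise apply above; it avoids
# calling a Python function once per row.
df["row_sum_vectorised"] = df["colA"] + df["colB"]

print(df)
```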

Giorgos Myrianthous

Machine Learning Engineer | Python Developer | https://www.buymeacoffee.com/gmyrianthous
