Version 2.8.0 Gives You Early Access to Zookeeper-Less Kafka

Apache Kafka 2.8.0 is finally out, and you can now get early access to KIP-500, which removes the Apache ZooKeeper dependency. Instead, Kafka relies on an internal Raft quorum that can be activated through Kafka Raft metadata mode (KRaft). The new feature simplifies cluster administration and infrastructure management and marks a new era for Kafka itself.

Zookeeper-less Kafka

In this article we are going to discuss why the ZooKeeper dependency needed to be removed in the first place. Additionally, we will discuss how ZooKeeper has been replaced by KRaft mode as of version 2.8.0 …


Save time when converting large Spark DataFrames to Pandas

Converting a PySpark DataFrame to pandas is quite trivial thanks to the toPandas() method. However, it is probably one of the most costly operations and should be used sparingly, especially when dealing with fairly large volumes of data.

Why is it so costly?

Pandas DataFrames are stored in memory, which means that operations on them are faster to execute. However, their size is limited by the memory of a single machine.

On the other hand, Spark DataFrames are distributed across the nodes of the Spark cluster, which consists of at least one machine, and thus the size of the DataFrames is limited by the size of…


How to Fix SyntaxError: f-string expression part cannot include a backslash

In Python versions before 3.12, backslashes cannot appear inside the curly braces {} of an f-string. Doing so results in a SyntaxError:

>>> f'{\}'
SyntaxError: f-string expression part cannot include a backslash

This behaviour is specified in PEP 498 (Literal String Interpolation):

Backslashes may not appear inside the expression portions of f-strings, so you cannot use them, for example, to escape quotes inside f-strings

In the following sections we will explore a couple of options you can use to add backslashes (including newlines) to f-strings.
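As a minimal sketch of one common workaround (not necessarily the approach the article goes on to describe), the backslash can be bound to a variable first, so that the expression part of the f-string only references a name. The variable names and sample values here are made up for illustration.

```python
# Keep backslashes outside the braces by binding them to names first.
newline = "\n"
backslash = "\\"

names = ["Andrew", "Anna", "Tony"]  # hypothetical sample data

# The expression parts below reference variables only,
# so no backslash appears inside the curly braces.
listing = f"Names:{newline}{newline.join(names)}"
path = f"C:{backslash}Users{backslash}anna"

print(listing)
print(path)
```

This form works on any Python version that supports f-strings, because the restriction only applies to the text between the braces.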

Using backslashes in f-strings

As we already discussed, backslashes cannot be used directly in Python f-strings…


Learn how to share interactive tables with your Medium readers using Airtable

I mostly write articles about Programming and Machine Learning and thus I usually (have to) share data tables. Until recently, I hadn’t come up with a better approach than just writing the tables in markdown format.

| First Name  | Last Name | Age |
| ----------- | --------- |-----|
| Andrew | Brown | 24 |
| Anna | Fox | 26 |
| Tony | Felix | 31 |
| Anthony | Anderson | 26 |

In today’s article we are going to explore how to share interactive data tables in Medium articles (or even elsewhere).

Introducing Airtable

Airtable is…


How to compare whether two variables point to the same object in memory

Python comes with two operators that can be used to compare objects, namely == (which is fairly common in most modern programming languages) and is. It may sometimes be tricky to decide which of the two you should use, especially if you are not familiar with Python’s dynamic typing model.

In today’s article we are going to discuss the purpose of both == and is operators in Python when it comes to comparing two variables or objects. Additionally, we’ll go through an example to showcase when to use one over the other.
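As a quick illustration of the distinction (a minimal sketch with made-up variable names): == compares the values of two objects, while is checks whether two names refer to the very same object in memory.

```python
a = [1, 2, 3]
b = [1, 2, 3]  # a separate list that happens to hold the same values
c = a          # a second name bound to the same list object as a

print(a == b)  # True: the two lists have equal contents
print(a is b)  # False: they are two distinct objects
print(a is c)  # True: both names refer to one object

# Mutating the object through one name is visible through the other.
c.append(4)
print(a)       # [1, 2, 3, 4]
```

In practice, is tends to be reserved for identity checks against singletons such as None, and == for comparing values.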

Objects, variables and references in Python

Every time we assign object values to a…


A step by step guide for making your Python package available on pip

You wrote a new Python package that solves a specific problem and it’s now time to share it with the wider Python community. To do so, you need to upload the package to a central repository that can be accessed by developers across the globe.

In today’s article we are going to discuss how PyPI lets developers share packages with other people who may wish to use that particular functionality in their own application. Additionally, we are going to introduce a step-by-step guide to help you upload your Python package to PyPI so that it is available to every Python…


What’s the difference between loc[] and iloc[] in Python and Pandas

Indexing and slicing pandas DataFrames in Python may sometimes be tricky. The two most commonly used properties when it comes to slicing are iloc and loc.

In today’s article we are going to discuss the difference between these two properties. We’ll also go through a couple of examples to make sure you understand when to use one over the other.

First, let’s create a pandas DataFrame that we’ll use as an example to demonstrate a few concepts.

import pandas as pd

df = pd.DataFrame(
index=[4, 6, 2, 1],
columns=['a', 'b', 'c'],
data=[[1, 2, 3], [4, 5, 6], [7…
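Because loc selects by label and iloc selects by integer position, the same number can pick out different rows. The sketch below reuses the index and column names from the example above, but the cell values are made up for illustration, since the original data is truncated.

```python
import pandas as pd

# Same index and columns as above; the cell values are hypothetical.
df = pd.DataFrame(
    index=[4, 6, 2, 1],
    columns=["a", "b", "c"],
    data=[[10, 20, 30], [40, 50, 60], [70, 80, 90], [100, 110, 120]],
)

# loc looks up the row whose index *label* is 1 (the last row here).
value_by_label = df.loc[1, "a"]

# iloc looks up the row at integer *position* 1 (the second row here).
value_by_position = df.iloc[1, 0]

print(value_by_label)     # 100
print(value_by_position)  # 40
```

With a default RangeIndex the two often coincide, which is why the difference is easy to miss until the index is reordered or customised.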


What’s the difference between logging.warning() and logging.warn() in Python?

Logging is an important aspect of development: useful logs can provide developers with the information they need when debugging or even just running an application.

Python comes with the logging module, which defines functions and classes that can be used when implementing a logging system. logging supports all the usual levels (e.g. info and warning); however, there has been some confusion about warning logs. There are currently two functions that implement this functionality, namely logging.warning() and logging.warn().

In today’s article we are going to discuss the difference between the two functions and which of the two you should use.
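For orientation, here is a minimal sketch of emitting a warning-level record with warning(), the spelling documented in the standard library (warn() has been marked as deprecated since Python 3.3). The logger name, message and in-memory stream are assumptions chosen so the example is self-contained.

```python
import io
import logging

# Capture output in memory so the example does not depend on the console.
stream = io.StringIO()

logger = logging.getLogger("demo")  # hypothetical logger name
logger.setLevel(logging.WARNING)
logger.propagate = False            # keep records away from the root logger

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger.addHandler(handler)

logger.warning("disk space is low")

print(stream.getvalue(), end="")    # WARNING:disk space is low
```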

logging.warn()

Until Python…


How to display all columns of Pandas DataFrames in the same line

When we have to work with large pandas DataFrames that have many columns and rows, it’s important to be able to display them in a readable format. This is useful when debugging your code, too.

By default, only a subset of the columns is displayed on standard output when a DataFrame with a fairly large number of columns is printed. The displayed columns may even be printed across multiple lines.

In today’s article we are going to explore how to configure the required pandas options that will allow us to “pretty-print” pandas DataFrames.
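As a rough sketch of what such configuration can look like, the snippet below uses two existing pandas display options, display.max_columns and display.expand_frame_repr, inside a temporary option context. The wide DataFrame is made up for illustration, and the article may go on to recommend a different set of options.

```python
import pandas as pd

# Hypothetical wide DataFrame: 30 columns, 3 rows.
df = pd.DataFrame({f"col_{i}": range(3) for i in range(30)})

# option_context applies the settings only inside the with-block.
with pd.option_context(
    "display.max_columns", None,         # do not truncate columns
    "display.expand_frame_repr", False,  # do not wrap across multiple lines
):
    text = repr(df)

print(text)
```

To change the behaviour for a whole session rather than a single block, the same option names can be passed to pd.set_option().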

The problem

Suppose…


Exploring some of the most commonly used bash commands

It is very important for Data Scientists to have a basic understanding of Bash and its commands. Often accessed through what is called the terminal, console or command line, Bash is a Unix shell that can help you navigate within your machine and perform certain tasks.

In today’s article, we are going to explore a few of the most commonly used bash commands that every Data Scientist must know.

ls

The ls (list) command is used to list directories or files. By default (i.e. running ls with no options at all) the command will return the directories and files of the current directory…

Giorgos Myrianthous

Machine Learning Engineer | Python Developer
