Version 2.8.0 Gives You Early Access to Zookeeper-Less Kafka

Photo by Christian Lambert on Unsplash

Apache Kafka 2.8.0 is finally out, and you can now get early access to KIP-500, which removes the Apache ZooKeeper dependency. Instead, Kafka now relies on an internal Raft quorum that can be activated through Kafka Raft metadata mode. The new feature simplifies cluster administration and infrastructure management and marks a new era for Kafka itself.

Zookeeper-less Kafka

In this article, we are going to discuss why the ZooKeeper dependency needed to be removed in the first place. Additionally, we will discuss how ZooKeeper has been replaced by KRaft mode as of version 2.8.0 …

Save time when converting large Spark DataFrames to Pandas

Photo by Noah Bogaard on

Converting a PySpark DataFrame to Pandas is quite trivial thanks to the toPandas() method. However, this is probably one of the most costly operations and must be used sparingly, especially when dealing with fairly large volumes of data.

Why is it so costly?

Pandas DataFrames are stored in memory, which means that operations on them are faster to execute. However, their size is limited by the memory of a single machine.

On the other hand, Spark DataFrames are distributed across the nodes of the Spark cluster, which consists of at least one machine, and thus the size of the DataFrames is limited by the size of…

12 Tricks for Smarter Google Searching


Millions of people use Google to search for and access information, but only a few take advantage of its full potential. Most people will type a bunch of keywords and let Google figure it out.

If you don’t know, Google it.

There are actually a few operators you can use in your search queries in order to narrow down the search results. In this article we’ll explore the most popular Google Search operators that can help you use Google more efficiently and get better results.

1. Search for an exact match

Step up your Python knowledge with 4 new statements

Photo by Allef Vinicius on Unsplash

Python 3.9 has been around since late 2020 and it comes with a bunch of new syntax features and improvements. In this article, we are going to explore a few nice additions which were packed into this version. We will also discuss how to upgrade to Python 3.9 in case you want to.

1. Merging Dictionaries

As of Python 3.9, you can use the | operator to merge two or more dictionaries together. For duplicate keys the rightmost dictionary takes precedence. This syntactic sugar is part of PEP-584.
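As a minimal sketch of the merge behaviour described above (the dictionary names and values are made up for illustration, and Python 3.9 or later is assumed):

```python
# Two hypothetical configuration dictionaries with one overlapping key.
defaults = {"host": "localhost", "port": 9092}
overrides = {"port": 9093, "acks": "all"}

# The | operator builds a new dictionary; for the duplicate key "port",
# the value from the rightmost operand wins.
merged = defaults | overrides
print(merged)  # {'host': 'localhost', 'port': 9093, 'acks': 'all'}

# Neither operand is modified by the merge.
print(defaults)  # {'host': 'localhost', 'port': 9092}
```

PEP 584 also defines an in-place variant, `|=`, which updates the left-hand dictionary instead of creating a new one.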

An Overlooked Extension Data Type in Python

Photo by Sebastian Pociecha on Unsplash

Tuples are among the most fundamental and widely used data structures in Python. What most people don’t know — or usually forget about — is that the language comes with an extension type called Named Tuple that is built on top of the core tuple type.

In this article, we are going to explore named tuples — a rarely used collection that enhances standard tuples. We will discuss their syntax, how to use them in your code and most importantly when.
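To give a rough idea of the syntax, here is a small sketch using `collections.namedtuple` from the standard library. The `Point` type and its fields are hypothetical names chosen for illustration only.

```python
from collections import namedtuple

# Define a tuple subclass whose positions also have names.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=1, y=2)

# Fields can be read by name or by position, like a regular tuple.
print(p.x, p[1])

# Named tuples still unpack like plain tuples.
x, y = p

# They are immutable; _replace returns a modified copy instead.
moved = p._replace(x=10)

# _asdict converts the named tuple into a dictionary.
as_dict = p._asdict()
```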

What are Named Tuples and When to Use Them

Dictionaries offer key lookup and are a perfect fit for cases where we need to create key-value data structures using mnemonic…

Amazing Python snippets that won’t take readability away

Buzz Lightyear toy
Photo by Adam Nemeroff on Unsplash.

Python is a general-purpose programming language that can be used to build projects of any size and gives developers the means to write logical and clear code, even for large-scale projects.

The design philosophy of Python facilitates code readability by enforcing the use of indentation in order to define code blocks explicitly. Just having well-indented code, though, doesn’t necessarily mean that your code is also clear and well-written.

“Readability counts.” — The Zen of Python

One of the biggest advantages of Python compared to most other general-purpose programming languages is that it is less verbose. Sacrificing code readability for…

Machine Learning, Software Engineering

Building resilient, atomic and versioned data lake operations

Photo by Hannes Egler on Unsplash

Version Control Systems, such as Git, are essential tools for versioning and archiving source code. Version Control helps you keep track of the changes in the code. When a change is made, an error could be introduced, too, but with source control tools, developers can roll back to a working state and compare it against the non-working piece of code. This minimizes the disruption to other team members that are probably working with the code and helps them collaborate efficiently.

Apart from code, data changes too.

Usually, Data Scientists need to access a range of datasets to complete a specific…

Understand the conditional loop statements in Python

Black and white art installation
Photo by Denis Gažík on Unsplash.

for loops are definitely among the most commonly used statements. Python comes with a rarely used piece of syntactic sugar that allows else clauses to be used with loop statements.

“Loop statements may have an else clause.” — Python docs

In this article, we are going to discuss the ability of loop statements to have else clauses. Additionally, we will explore how this particular syntactic sugar can be used in common programming constructs and make our code more readable and even more Pythonic.

What Does for/else Do?

In for/else statements, the else clause is executed upon the exhaustion of the iterable (i.e. when the…
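A small sketch may help illustrate this behaviour. The function name and inputs below are hypothetical; the point is only that the else block runs when the loop finishes without hitting break.

```python
def find_first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            result = n
            break  # leaving the loop early skips the else clause
    else:
        # Reached only when the loop ran to completion without a break.
        result = None
    return result


print(find_first_negative([3, -1, 4]))
print(find_first_negative([1, 2, 3]))
```

Without the else clause, this pattern usually needs an extra flag variable to record whether the loop found a match.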

Dynamically adjusting the width of Excel column names when using pandas.ExcelWriter and Python

Photo by Mika Baumeister on Unsplash

One of the most frustrating things you may have to deal with when generating an Excel file using Python is ending up with numerous columns you are unable to read because they are too narrow. Ideally, you should deliver spreadsheets in which all columns are properly formatted and readable.

In this article, we are going to explore quick and easy ways of:

  • Dynamically adjusting all column widths based on the length of the column name
  • Adjusting a specific column by using its name
  • Adjusting a specific column by using its index
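The first approach in the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not necessarily the method the article uses: it assumes the XlsxWriter engine is installed, and the helper names `column_widths` and `write_autosized`, as well as the `padding` value, are hypothetical.

```python
def column_widths(column_names, padding=2):
    """Return a width (in characters) for each column, based on its name."""
    return [len(str(name)) + padding for name in column_names]


def write_autosized(df, path, sheet_name="Sheet1"):
    """Write df to an Excel file and widen each column to fit its header."""
    # Imported lazily so column_widths can be used without pandas installed.
    import pandas as pd

    # Assumption: the XlsxWriter engine is available in the environment.
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        worksheet = writer.sheets[sheet_name]
        for idx, width in enumerate(column_widths(df.columns)):
            # XlsxWriter's set_column takes (first_col, last_col, width).
            worksheet.set_column(idx, idx, width)


print(column_widths(["id", "customer_name"]))
```

To adjust a single column by name instead, one option is to look up its position with `df.columns.get_loc(name)` and pass that index to `set_column`. A column addressed by index can be passed to `set_column` directly.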

It’s always a good feeling to give back

People joining hands
Photo by Hannah Busing on Unsplash.

Stack Overflow is the leading community for developers where people are able to ask and answer programming-related questions. More than 21 million questions have been asked, more than 31 million answers have been provided, and more than 80 million comments have been made! I have to admit that most of the posts are pretty bad, but there are definitely tons of answers that are amazingly useful, well-written, and justified.

Apart from the very basic questions asking how to print a string with Python, Java, or Go, there are also numerous unpopular questions to which you’ll find some useful answers. …

Giorgos Myrianthous

Machine Learning Engineer | Python Developer
