The root of language’s flexibility

Photo by Samuel-Elias Nadler on unsplash.com

If you come from compiled or statically typed languages such as Java, C or C++, you might find the way Python works a bit confusing. For instance, when we assign a value to a variable (say a = 1), how the heck does Python know that a is an integer?

In statically typed languages, the types of variables are determined at compile time. In most languages that support this static typing model, programmers must specify the type of each variable. …
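As a minimal sketch of the contrast drawn above (the variable names are illustrative only), the snippet below shows that in Python the type belongs to the object a name refers to, not to the name itself, so the same name can be rebound to a value of a different type at runtime:

```python
# No type declaration is needed: the name `a` simply refers to an int object.
a = 1
type_before = type(a).__name__

# Rebinding the same name to a string is legal; the type is looked up
# on the object at runtime rather than fixed at compile time.
a = "one"
type_after = type(a).__name__

print(type_before, type_after)
```

In a statically typed language the second assignment would typically be rejected by the compiler.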


What’s the difference between distinct() and dropDuplicates() in Spark?

Photo by Juliana on unsplash.com

The Spark DataFrame API comes with two functions for removing duplicates from a DataFrame: distinct() and dropDuplicates(). Even though both methods do pretty much the same job, there is one difference between them that is quite important in some use cases.

In this article, we are going to explore how both of these functions work and what their main difference is. Additionally, we will discuss when to use one over the other.

Note that the examples that we’ll use to explore these methods have been constructed using the Python API…


Learn how to read that Bitcoin chart

Photo by Austin Distel on unsplash.com

The candlestick chart is probably one of the most common charts that traders use to inform their decisions. Traders usually prefer candlestick charts over other forms such as bar charts, as they offer a clearer visual picture of how the price changed over a time window.

Candlestick charts are seen almost everywhere due to the increasing popularity of cryptocurrencies and stock trading. In this article, we are going to explore the key components of a candlestick chart and what they indicate. …


What are pre-commit hooks and how they can benefit your Python projects

Photo by Vishal Jadhav on unsplash.com

Like most Version Control Systems (VCS), Git comes with the ability to run scripts and trigger actions at certain points. These scripts are called hooks and they can reside on either the client or the server side. Server-side hooks are usually triggered before or after a push is received, while client-side hooks are typically triggered before committing or merging. There are numerous hook types that can be used to enforce specific actions that serve your workflow.

In this article, we are going to explore pre-commit hooks, which are actions executed on the client side. We will discuss what purposes they serve…


All you need to know about classmethod and staticmethod

Photo by Chris Liverani on unsplash.com

Apart from instance methods, which are the most common class members in object-oriented programming, classes in Python can also have static and class methods. The language comes with two decorators, namely @staticmethod and @classmethod, that allow us to define such members in classes. It is important to understand these concepts, as they will help you write object-oriented Python that is clearer and better structured, which in turn makes maintenance much easier.

In this article, we are going to explore what each of the two does, how to create them and also in…
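To make the two decorators mentioned above concrete, here is a minimal sketch. The Temperature class and its method names are hypothetical and chosen only for illustration. A class method receives the class itself as its first argument (handy for alternative constructors), while a static method receives neither the instance nor the class:

```python
class Temperature:
    """Hypothetical example class used only to illustrate the decorators."""

    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # `cls` is the class itself, so this acts as an alternative constructor.
        return cls(cls.to_celsius(fahrenheit))

    @staticmethod
    def to_celsius(fahrenheit):
        # No `self` or `cls`: a plain helper that lives in the class namespace.
        return (fahrenheit - 32) * 5 / 9

    def describe(self):
        # A regular instance method, which receives the instance as `self`.
        return f"{self.celsius:.1f} C"


reading = Temperature.from_fahrenheit(212)
print(reading.describe())
```

Both decorated methods can be called on the class without creating an instance first, which is what distinguishes them from describe().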


A deep dive into indexing and slicing over ordered collections

Photo by Ryoji Iwata on unsplash.com

In Python, the elements of ordered sequences like strings or lists can be accessed individually through their indices, by providing the numerical index of the element we wish to extract from the sequence. Additionally, Python supports slicing, a feature that lets us extract a subset of the original sequence object.

In this article, we are going to explore how both indexing and slicing work, and how they can be used in order to write cleaner and more Pythonic code.

Indexing

As in most programming languages, offsets in Python start at position 0 and end at position N-1…
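The indexing and slicing behaviour described above can be sketched with a short list. The list contents and variable names are illustrative only:

```python
letters = ["a", "b", "c", "d", "e"]
n = len(letters)

# Indexing: valid offsets run from 0 up to N-1.
first = letters[0]
last = letters[n - 1]

# Slicing: start is inclusive, stop is exclusive, and the result is a new list.
middle = letters[1:4]

# An optional third value sets the step between selected elements.
every_other = letters[::2]

print(first, last, middle, every_other)
```

Slicing also works on strings, where it returns a new string rather than a list.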


Quickly merge branches using the command line or GitHub Desktop

Photo by dorota dylka on Unsplash.

In most development teams, every time a new feature or bug fix needs to be implemented, developers create a feature branch from the development branch. While an individual engineer is working on the feature implementation, other tickets are also progressing on different feature branches that can also be merged to the development one.

In most cases, branches being merged shouldn’t really affect the implementation of your own ticket. However, there are certain scenarios in which you might want to merge another feature branch into your own branch. …


How to print huge PySpark DataFrames

Photo by Mika Baumeister on unsplash.com

In the big data era, it is quite common to have dataframes that consist of hundreds or even thousands of columns. And in such cases, even printing them out can sometimes be tricky as you somehow need to ensure that the data is presented in a clear but also efficient way.

In this article, I am going to explore the three basic ways one can follow in order to display a PySpark dataframe in a table format. …


All you need to know about sets in Python

Photo by Maxwell Nelson on unsplash.com

A Python set is a collection type introduced back in version 2.4. It is one of the most powerful data structures of the language as its characteristics can prove useful and practical in numerous use cases. In this article, we’ll have a quick look at the theory behind sets and later on will discuss the most common set operations as well as a few use-cases where sets come in handy.

A set is a mutable and unordered collection of hashable objects (typically immutable ones, such as numbers, strings or tuples of hashable items) with no duplicates. …
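A minimal sketch of these properties follows, with illustrative values. Duplicates are dropped, the set can be modified in place, and only hashable objects are accepted as elements:

```python
# The duplicate "python" is stored only once.
languages = {"python", "scala", "python"}

# Sets are mutable, so elements can be added after creation.
languages.add("java")

# A list is not hashable, so it cannot become a set element.
try:
    languages.add(["go", "rust"])
    list_accepted = True
except TypeError:
    list_accepted = False

# A tuple of strings is hashable, so it is accepted.
languages.add(("go", "rust"))

print(len(languages), list_accepted)
```

Because sets are unordered, the iteration order of languages should not be relied upon.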


5 of the most exciting features of the new release of Apache Spark 3.0


A new major release of Apache Spark was made available on the 10th of June 2020. Version 3.0, the result of more than 3,400 tickets, builds on top of version 2.x and comes with new functionality, bug fixes and performance improvements.

10 years after its initial release as an open source project, Apache Spark has become one of the core technologies of the Big Data era. …

Giorgos Myrianthous

Python | Data | ML
