This first Machine Learning tutorial covers the complete data pre-processing workflow for building Machine Learning models.

Throughout this tutorial we’ll cover pre-processing for machine learning: data transformation, feature selection, dimensionality reduction, and sampling. In a later tutorial, we will apply this process with various algorithms to help you understand what Machine Learning is and how to use it with the Python language.
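To give a taste of what one of these pre-processing steps looks like in practice, here is a minimal sketch of feature standardization (zero mean, unit variance) using NumPy; the array values are made up for illustration:

```python
import numpy as np

# Toy feature matrix: 4 samples, 2 features on very different scales (made-up values).
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 500.0]])

# Standardize each column: subtract the column mean, divide by its standard deviation.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

print(X_scaled.mean(axis=0))  # each column now has mean ~0
print(X_scaled.std(axis=0))   # and standard deviation 1
```

Many algorithms (k-NN, SVMs, gradient-based models) behave better when features share a common scale, which is why this transformation appears so early in the pipeline.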

See the Jupyter Notebook for the concepts we’ll cover on building machine learning models, and my LinkedIn profile for other Data Science articles and tutorials.

First of all, we need to define the business problem. After all…

The purpose of this tutorial is to build graphs that assist in the data science process. We can employ visualizations during exploratory analysis, before or after processing data, and construct statistical graphs to analyze datasets, identify relationships between variables, or verify how the data is distributed.

We can do all this with Matplotlib; however, when it comes to statistical graphs, there is a library that is much better and much easier: Seaborn. Even so, knowing how to create a visualization, regardless of the tool, is of fundamental importance.

Visit the Jupyter Notebook to see all the concepts that we…


A complete Exploratory Data Analysis guide with Python | LinkedIn

In this tutorial we’ll explore the rental dataset, perform transformations, and reorganize the data as if we were actually preparing it for modeling and building models.

Visit the Jupyter Notebook to see the concepts that will be covered about Exploratory Data Analysis. Note: important **functions**, **outputs**, and **terms** are in **bold** to facilitate understanding — at least mine.

It is common to receive data for a problem and need to analyze and **explore** it, look for **relationships**, see how the variables are **organized**, and decide whether or not we need to transform…
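A first pass at this kind of exploration usually starts with a few standard Pandas calls; a minimal sketch on a made-up DataFrame (the column names are hypothetical, not the rental dataset’s):

```python
import pandas as pd

# Hypothetical data standing in for a rental dataset.
df = pd.DataFrame({
    "rent": [1200, 950, 1800, 700, 1500],
    "rooms": [3, 2, 4, 1, 3],
    "city": ["A", "B", "A", "C", "B"],
})

df.info()                          # column types and non-null counts
print(df.describe())               # summary statistics for numeric columns
print(df["city"].value_counts())   # distribution of a categorical variable
print(df.corr(numeric_only=True))  # relationships between numeric variables
```

These four calls answer the questions raised above: how the variables are organized, how they relate, and whether any look like they need transformation.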

In the last tutorial on useful operations in Pandas, we always worked with a single object: a Series, a DataFrame, or a NumPy Array. What if we need to work with more than one object?

Visit the Jupyter Notebook to see the concepts that we will cover about SQL Join in Pandas. Note: important **functions**, **outputs**, and **terms** are in **bold** to facilitate understanding — at least mine.

The first step is to import Pandas so we can use its methods and attributes, and the NumPy package to create the Arrays:

```python
import pandas as pd
import numpy as np
```
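With the imports in place, the core tool for SQL-style joins is `pd.merge`; a minimal sketch with two made-up DataFrames showing the inner and left join variants:

```python
import pandas as pd

# Two hypothetical tables sharing a key column.
employees = pd.DataFrame({"name": ["Ana", "Bob", "Caio"],
                          "dept_id": [1, 2, 3]})
departments = pd.DataFrame({"dept_id": [1, 2, 4],
                            "dept": ["Sales", "IT", "HR"]})

# INNER JOIN: only keys present in both tables survive.
inner = pd.merge(employees, departments, on="dept_id", how="inner")

# LEFT JOIN: keep every employee; missing departments become NaN.
left = pd.merge(employees, departments, on="dept_id", how="left")

print(inner)  # Ana/Sales and Bob/IT
print(left)   # Caio appears with a NaN dept
```

The `how` parameter mirrors SQL's join keywords (`inner`, `left`, `right`, `outer`), which is what makes the mapping from SQL to Pandas so direct.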

…

This article will cover one of the most advanced and most widely used algorithms in analytical applications. It is an extensive subject, as there are several algorithms and various techniques for working with decision trees.

On the other hand, these algorithms are among the most powerful in Machine Learning and are easy to interpret. So let’s start by defining what decision trees are and how machine learning algorithms represent them.

For decision tree learning models, we will study several algorithms, such as C4.5, C5.0, CART, and ID3. …
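To make the idea concrete before diving into those algorithms, here is a minimal sketch of the core step CART-style trees repeat at every node: choosing the split that most reduces Gini impurity. The dataset and the single-feature threshold search are simplified illustrations, not a full tree builder:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Try a threshold at each distinct value; return the threshold with
    the lowest weighted Gini impurity across the two resulting sides."""
    best_t, best_score = None, float("inf")
    for v in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= v]
        right = [y for x, y in zip(values, labels) if x > v]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = v, score
    return best_t, best_score

# Toy 1-D dataset: small values are class 0, large values are class 1.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
threshold, score = best_split(xs, ys)
print(threshold, score)  # 3.0 separates the classes perfectly (score 0.0)
```

A real tree builder applies this search recursively over every feature, stopping on depth or purity criteria; ID3 and C4.5 use entropy-based gain instead of Gini, but the structure of the search is the same.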

This article addresses a very common case study that demonstrates some specific and relevant aspects of data analysis. We will see how to work with categorical variables and which transformations we must apply to balance them, ending with the construction of a predictive Machine Learning model.

We’ll work with a dataset that represents information from a credit card company. The dataset contains various information about customers who purchased cards from the carrier. …
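One transformation this kind of study relies on, encoding categorical variables as numeric dummy columns, can be sketched with `pd.get_dummies`; the column names below are hypothetical stand-ins, not the actual dataset’s:

```python
import pandas as pd

# Hypothetical slice of a credit-card customer table.
df = pd.DataFrame({
    "income": [42000, 58000, 31000],
    "card_type": ["gold", "platinum", "gold"],
})

# One-hot encode the categorical column; drop_first avoids a redundant
# indicator (n categories need only n-1 dummy columns).
encoded = pd.get_dummies(df, columns=["card_type"], drop_first=True)
print(encoded)
```

Most Machine Learning algorithms only accept numeric input, so this step sits between exploration and model training in nearly every pipeline with categorical data.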

Every AWS cloud computing environment is based on Unix operating systems and their variants, such as the Linux operating system.

Most of the machines we can create in the AWS environment are Linux machines. Most of the services we’ll use on AWS, whether a web server, database, Big Data, or Machine Learning, will be servers in a Linux operating environment. Although AWS allows you to build Windows machines, much of the infrastructure likely runs some Linux operating system.

Linux is a Unix-based operating system, built on a new kernel that made it possible to develop a fully open-source operating system — the primary…

Today there is no excuse for not learning the Linux operating system. We do not need to touch our traditional Windows machine: our applications, internet banking, and the other routine programs we use on Windows all remain untouched, since we work only in the virtual environment.

Knowing a little about virtualization is critical because AWS is built on it. So, in this tutorial, we create an instance.

Let’s do some basic operations on our EC2 Virtual Machine. …
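The basic operations meant here are ordinary Linux commands, the same whether you run them locally or over SSH on the instance; a few illustrative ones (the key file and hostname in the ssh line are placeholders, not real values):

```shell
# Connect to the instance (placeholder key file and public DNS):
# ssh -i my-key.pem ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com

# Once inside, inspect the machine:
uname -a    # kernel name and version
whoami      # current user
df -h       # disk usage, human-readable
uptime      # time since boot and load averages
```

These are the first commands worth running on any fresh instance, since they confirm which image booted and how much disk you actually have.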

The Naive Bayes algorithm is named after Bayes’ theorem of probability. The algorithm calculates the probability that an unknown sample belongs to each possible class and predicts the most likely one.

This type of prediction is called statistical classification because it is based entirely on probabilities. The classifier is also called naïve because it assumes that the value of an attribute for a given class is independent of the values of the other attributes, which simplifies the calculations involved.
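The independence assumption makes the arithmetic simple enough to sketch from scratch; a minimal categorical Naive Bayes on a made-up weather-style dataset (all names and values are illustrative):

```python
from collections import Counter, defaultdict

# Toy training data: (outlook, windy) -> play?
data = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rain", "no"), "yes"),
    (("rain", "yes"), "no"),
    (("sunny", "no"), "yes"),
    (("rain", "no"), "yes"),
]

classes = Counter(label for _, label in data)
# Count attribute values per (feature index, class) pair.
counts = defaultdict(Counter)
for features, label in data:
    for i, value in enumerate(features):
        counts[(i, label)][value] += 1

def predict(features):
    """Score each class as P(class) * product of P(feature_i | class),
    then return the highest-scoring class (the 'naive' independence step)."""
    best, best_p = None, -1.0
    total = sum(classes.values())
    for label, n in classes.items():
        p = n / total  # prior
        for i, value in enumerate(features):
            p *= counts[(i, label)][value] / n  # likelihood, independence assumed
        if p > best_p:
            best, best_p = label, p
    return best

print(predict(("sunny", "no")))   # -> "yes"
print(predict(("rain", "yes")))   # -> "no"
```

A production implementation would add Laplace smoothing so an unseen attribute value does not zero out a whole class, but the multiplication of per-attribute probabilities above is exactly the simplification the independence assumption buys.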

The classification consists of finding, through machine learning, a model or function that describes different data classes. The purpose of…

São Paulo — Composing a repository of books (I bought), courses (I took), authors (I follow) & blogs (the direct ones) for my own understanding.