This first Machine Learning tutorial covers, in detail, the complete data pre-processing workflow used when building Machine Learning models.
Throughout this tutorial, we'll cover pre-processing for machine learning in four areas: data transformation, selection, dimensionality reduction, and sampling. In a future tutorial, we will apply this process with various algorithms to help you understand what Machine Learning is and how to use it with Python.
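The four pre-processing steps named above can be sketched with NumPy alone. This is a minimal illustration on synthetic data, not the tutorial's own pipeline: the function names (`standardize`, `select_by_variance`, `pca_reduce`, `sample_rows`) are hypothetical, and variance thresholding and PCA are just one possible choice for selection and dimensionality reduction.

```python
import numpy as np

def standardize(X):
    """Transformation: scale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std

def select_by_variance(X, threshold=0.0):
    """Selection: keep only the columns whose variance exceeds the threshold."""
    keep = X.var(axis=0) > threshold
    return X[:, keep], keep

def pca_reduce(X, n_components):
    """Dimensionality reduction: project centered data onto its top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def sample_rows(X, n, seed=0):
    """Sampling: draw n rows at random, without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n, replace=False)
    return X[idx]

# Hypothetical synthetic data: 200 samples, 5 features, the last one constant
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
X[:, 4] = 3.0  # a constant column carries no information

X_sel, kept = select_by_variance(X)      # drops the constant column
X_std = standardize(X_sel)               # zero mean, unit variance per column
X_pca = pca_reduce(X_std, n_components=2)
X_sample = sample_rows(X_pca, n=50)
print(kept)
print(X_sample.shape)
```

Selection runs first here so that uninformative columns never reach the scaling and projection steps; in a real project the order and the techniques depend on the data and the model.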
First of all, we need to define the business problem. After all…
The purpose of this tutorial is to build graphs that support the data science process. We can use visualizations during exploratory analysis, before or after processing the data: building statistical graphs to analyze datasets, identifying relationships between variables, or checking how the data is distributed.
We can do all this with Matplotlib; however, for statistical graphs there is a library that is much better and much easier to use: Seaborn. Still, knowing how to create a visualization, whatever the tool, is of fundamental importance.
Visit the Jupyter Notebook to see all the concepts that we…
In the last tutorial on useful operations in Pandas, we always worked with a single object: a Series, a DataFrame, or a NumPy array. What if we need to work with more than one object?
Open the Jupyter Notebook to see the concepts covered on SQL joins in Pandas. Note: functions, outputs, and important terms are in bold to make them easier to follow, at least for me.
The first step is to import Pandas so we can use its packages, methods, and attributes, and NumPy so we can create arrays:
import pandas as pd
import numpy as np
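Since the goal is to combine more than one object, here is a minimal sketch of SQL-style joins with `pd.merge`. The tables and column names are hypothetical, made up for illustration; the notebook's own examples may differ.

```python
import pandas as pd
import numpy as np

# Hypothetical example tables: customers and their orders
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana", "Bruno", "Carla"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 4],
    "amount": np.array([250.0, 80.0, 120.0, 60.0]),
})

# INNER JOIN: only customers with at least one matching order
inner = pd.merge(customers, orders, on="customer_id", how="inner")

# LEFT JOIN: every customer, with NaN where no order matches
left = pd.merge(customers, orders, on="customer_id", how="left")

# FULL OUTER JOIN: keys from both sides; indicator=True adds a `_merge`
# column saying whether each row came from "left_only", "right_only", or "both"
outer = pd.merge(customers, orders, on="customer_id", how="outer", indicator=True)

print(inner)
print(left)
print(outer[["customer_id", "name", "order_id", "_merge"]])
```

Customer 3 has no orders, so it only appears in the left and outer joins; order 13 points to a customer that doesn't exist, so it only appears in the outer join.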
In this tutorial, we'll explore the rental dataset, perform transformations, and reorganize the data as if we were actually preparing it for modeling and building models.
We commonly receive data to solve a problem and need to analyze and explore it: look for relationships, see how the variables are organized, and decide whether or not they need to be transformed…
This article covers decision trees, one of the most advanced and most widely used families of algorithms in analytical applications. It is an extensive subject, as there are several algorithms and various techniques for working with decision trees.
On the other hand, these algorithms are among the most powerful in Machine Learning, and they are easy to interpret. So, let's start by defining what decision trees are and how they are represented by machine learning algorithms.
For decision tree learning models, we will study algorithms such as C4.5, C5.0, CART, and ID3. …
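The algorithms listed above differ mainly in how they score a candidate split. As a hedged sketch, not taken from the article, the snippet below computes the impurity measures behind them: entropy and information gain, as used by ID3 (C4.5 normalizes the gain into a gain ratio), and Gini impurity, the usual CART criterion. The toy data and function names are hypothetical.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gini(labels):
    """Gini impurity of a list of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())

def information_gain(labels, feature):
    """Entropy reduction obtained by splitting `labels` on the values of `feature`."""
    total = len(labels)
    weighted = 0.0
    for value in set(feature):
        subset = [y for y, x in zip(labels, feature) if x == value]
        weighted += len(subset) / total * entropy(subset)
    return entropy(labels) - weighted

# Hypothetical toy data: does a customer buy, given whether it is the weekend?
buys    = ["yes", "yes", "no",  "no",  "yes", "no"]
weekend = [True,  True,  False, False, True,  True]

print(entropy(buys))                   # 1.0 (3 "yes", 3 "no")
print(gini(buys))                      # 0.5
print(information_gain(buys, weekend))
```

A tree learner evaluates every candidate attribute this way and splits on the one with the highest gain (or the lowest weighted Gini impurity), then repeats the process on each branch.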
This article addresses a very common case study that demonstrates some specific and relevant aspects of data analysis. We will see how to work with categorical variables and which transformations to apply to the variables to balance the data, ending with the construction of a predictive Machine Learning model.
We'll work with a dataset from a credit card company. It contains various information about customers who acquired cards from the company. …
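The exact transformations used in the article aren't shown in this excerpt. As a minimal sketch under that caveat, here is one common way to handle both concerns with pandas: one-hot encoding a categorical column with `pd.get_dummies` and balancing the target by random oversampling. The column names and values are hypothetical, and oversampling is only one balancing technique among several (undersampling and SMOTE are others).

```python
import pandas as pd

# Hypothetical toy data standing in for the credit card dataset
df = pd.DataFrame({
    "card_type": ["gold", "silver", "gold", "platinum", "silver", "silver"],
    "income":    [5200, 3100, 4800, 9000, 2500, 2900],
    "default":   [0, 0, 0, 0, 1, 0],
})

# One-hot encode the categorical column so a model can consume it
encoded = pd.get_dummies(df, columns=["card_type"], dtype=int)

# Random oversampling: resample the minority class, with replacement,
# until both classes have the same number of rows
counts = encoded["default"].value_counts()
minority = counts.idxmin()
extra = encoded[encoded["default"] == minority].sample(
    n=counts.max() - counts.min(), replace=True, random_state=0
)
balanced = pd.concat([encoded, extra], ignore_index=True)

print(encoded.columns.tolist())
print(balanced["default"].value_counts())
```

In practice, oversampling should be applied only to the training split, after separating the test data, so that duplicated rows don't leak into the evaluation.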
Much of the AWS cloud computing environment is based on Unix-like operating systems, such as Linux.
Most of the machines we can create in AWS are Linux machines, and most of the services we'll use on AWS, whether a web server, a database, Big Data, or Machine Learning, run on Linux servers. Although AWS lets you build Windows machines, much of the infrastructure you work with will likely include Linux systems.
Linux is a Unix-like operating system; its kernel made it possible to develop a fully open-source operating system, the primary…
Today there is no excuse for not learning Linux: we don't need to touch our regular Windows machine with its applications, internet banking access, and other routine programs. All of that stays untouched, and we work only in the virtual environment.
Knowing a little about virtualization is critical because AWS relies on it. That is why we created an instance in this tutorial.
Let's do some basic operations on our EC2 virtual machine. …
The Naive Bayes algorithm is named after Bayes' theorem of probability. The algorithm calculates the probability that an unknown sample belongs to each possible class and predicts the most likely class.
This type of prediction is called statistical classification because it is based entirely on probabilities. The classifier is also called naïve because it assumes that, within a given class, the value of each attribute is independent of the values of the other attributes, which simplifies the calculations involved.
Classification consists of finding, through machine learning, a model or function that describes different data classes. The purpose of…
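To make the computation concrete, here is a minimal from-scratch sketch for categorical attributes, not the article's own code. It estimates the prior P(c) and the per-attribute likelihoods P(x_i | c) by counting, multiplies them under the naïve independence assumption, and predicts the class with the highest score. The function names, the Laplace smoothing parameter `alpha`, and the toy weather data are assumptions added for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(y)
    # value_counts[c][i][v]: number of class-c samples whose attribute i equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(X, y):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
    # Number of distinct values per attribute, needed for Laplace smoothing
    n_values = [len(set(column)) for column in zip(*X)]
    return class_counts, value_counts, n_values

def predict(x, model, alpha=1.0):
    """Pick the class maximizing P(c) * prod_i P(x_i | c)."""
    class_counts, value_counts, n_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for i, value in enumerate(x):
            # Laplace-smoothed estimate of P(x_i = value | c)
            score *= (value_counts[c][i][value] + alpha) / (n_c + alpha * n_values[i])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data: (outlook, temperature) -> play?
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
     ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool")]
y = ["no", "no", "yes", "yes", "yes", "yes"]

model = train_naive_bayes(X, y)
print(predict(("sunny", "hot"), model))   # no
print(predict(("rainy", "cool"), model))  # yes
```

Smoothing keeps a value never seen with a class (such as "sunny" for "yes") from zeroing out the whole product. With many attributes, summing log-probabilities instead of multiplying avoids numerical underflow.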
São Paulo — Composing a repository of books (I bought), courses (I took), authors (I follow) & blogs (the direct ones) for my own understanding.