Cryptocurrencies

Using Unsupervised Machine Learning to Discover Unknown Patterns

Crypto_Currency

Image source: Getty Images

Background

Overview of Analysis

This project consists of four technical analysis deliverables.

Purpose

Stakeholders are interested in offering a new cryptocurrency investment portfolio to their customers. The company, however, is lost in the vast universe of cryptocurrencies. We’ll create a report that covers which cryptocurrencies are on the trading market and how they could be grouped to create a classification system for this new investment.

The data we will be working with is not ideal, so it will need to be processed to fit the machine learning models. Since there is no known output for what we are looking for, we will use unsupervised learning. To group the cryptocurrencies, we decided on a clustering algorithm. We’ll use data visualizations to share our findings.

Resources

Data source:

Software:


Methodology

D1: Preprocessing the Data for PCA

Using Pandas, we’ll preprocess the dataset in order to perform PCA in Deliverable 2.
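As a minimal sketch of what this preprocessing might look like: the tiny dataset below is made up, and the column names (IsTrading, CoinName, Algorithm, ProofType, TotalCoinsMined, TotalCoinSupply) and the exact steps are assumptions for illustration, not a record of the project's actual data or code.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up sample standing in for the real dataset (column names are assumed).
crypto_df = pd.DataFrame(
    {
        "CoinName": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta"],
        "Algorithm": ["SHA-256", "Scrypt", "X11", "Scrypt", "SHA-256", "X11"],
        "IsTrading": [True, True, False, True, True, True],
        "ProofType": ["PoW", "PoW/PoS", "PoW", "PoS", "PoW", "PoS"],
        "TotalCoinsMined": [1.0e6, 5.0e7, 2.0e5, None, 0.0, 3.0e6],
        "TotalCoinSupply": [2.1e7, 8.4e7, 2.2e7, 1.0e6, 5.0e6, 1.0e7],
    },
    index=["ALP", "BET", "GAM", "DEL", "EPS", "ZET"],
)

# Keep only the coins that are currently trading, then drop the flag column.
df = crypto_df[crypto_df["IsTrading"]].drop(columns="IsTrading")

# Remove rows with missing values and coins that have not been mined.
df = df.dropna()
df = df[df["TotalCoinsMined"] > 0]

# Set the coin names aside; they are labels, not features.
names_df = df[["CoinName"]]

# One-hot encode the text columns so every feature is numeric.
X = pd.get_dummies(
    df.drop(columns="CoinName"),
    columns=["Algorithm", "ProofType"],
    dtype=float,
)

# Standardize so each feature has zero mean and unit variance before PCA.
X_scaled = StandardScaler().fit_transform(X)
```

Standardizing matters here because PCA is sensitive to scale: without it, large-valued columns such as coin supply would dominate the principal components.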


D2: Reducing Data Dimensions Using PCA

Using the Principal Component Analysis (PCA) algorithm, we’ll reduce the dimensions of the X DataFrame to three principal components and place these dimensions in a new DataFrame.
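The reduction step could be sketched roughly as follows. The random matrix stands in for the scaled feature data, and the index labels and the column names "PC 1" to "PC 3" are illustrative assumptions; only the use of PCA with three components comes from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Placeholder for the standardized features: 20 coins, 6 features (made up).
rng = np.random.default_rng(0)
X_scaled = rng.normal(size=(20, 6))
index = [f"COIN{i}" for i in range(20)]  # hypothetical coin tickers

# Project the features onto the three directions of greatest variance.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_scaled)

# Keep the original index so rows can be matched back to their coins.
pcs_df = pd.DataFrame(X_pca, columns=["PC 1", "PC 2", "PC 3"], index=index)

# Fraction of the total variance retained by the three components.
explained = pca.explained_variance_ratio_.sum()
```

Checking `explained` is a quick way to judge how much information the three components keep before clustering on them.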


D3: Clustering Cryptocurrencies Using K-means

Using the K-means algorithm, we’ll create an elbow curve using hvPlot to find the best value for K from the pcs_df DataFrame created in Deliverable 2. Then, we’ll run the K-means algorithm to predict the K clusters for the cryptocurrencies’ data.
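One way to sketch the elbow search and the final clustering is shown below. The three synthetic blobs replace the real pcs_df, and the range of K values, the random seeds, and the choice of K = 3 are assumptions that fit this toy data only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for pcs_df: three well-separated blobs of 30 points each.
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0], [8, 8, 8], [-8, 8, 0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 3)) for c in centers])
pcs_df = pd.DataFrame(points, columns=["PC 1", "PC 2", "PC 3"])

# Fit K-means for each candidate K and record the inertia
# (sum of squared distances from each point to its cluster center).
k_values = list(range(1, 11))
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(pcs_df)
    inertia.append(model.inertia_)

elbow_df = pd.DataFrame({"k": k_values, "inertia": inertia})

# With hvPlot installed (import hvplot.pandas), the curve could be drawn with:
# elbow_df.hvplot.line(x="k", y="inertia", xticks=k_values, title="Elbow Curve")

# Refit with the K read off the elbow (3 for this toy data) and predict.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
predictions = model.fit_predict(pcs_df)
```

The "elbow" is the K after which inertia stops dropping sharply; adding more clusters beyond that point buys little extra compactness.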


D4: Visualizing Cryptocurrencies Results

Using scatter plots built with Plotly Express and hvPlot, we’ll visualize the distinct groups that correspond to the three principal components we created in Deliverable 2. Then, we’ll create a table with all the currently tradable cryptocurrencies using the hvplot.table() function.
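A rough sketch of how the plotting data might be assembled follows. All values are made up, and the "Class" column name, the `make_3d_scatter` helper, and the hover fields are hypothetical choices for illustration. Plotly is imported inside the helper so the data preparation runs even where Plotly is not installed.

```python
import pandas as pd

# Made-up inputs standing in for the cleaned data, names, and PCA output.
index = ["ALP", "BET", "GAM", "DEL"]
crypto_df = pd.DataFrame(
    {
        "Algorithm": ["SHA-256", "Scrypt", "X11", "Scrypt"],
        "TotalCoinsMined": [1.0e6, 5.0e7, 2.0e5, 3.0e6],
    },
    index=index,
)
names_df = pd.DataFrame({"CoinName": ["Alpha", "Beta", "Gamma", "Delta"]}, index=index)
pcs_df = pd.DataFrame(
    [[0.1, 1.2, -0.3], [2.0, -0.5, 0.4], [0.2, 1.1, -0.2], [-1.5, 0.3, 2.2]],
    columns=["PC 1", "PC 2", "PC 3"],
    index=index,
)
predictions = [0, 1, 0, 2]  # hypothetical K-means labels

# Join features and principal components on the shared index,
# then attach the coin names and the predicted cluster.
clustered_df = pd.concat([crypto_df, pcs_df], axis=1)
clustered_df["CoinName"] = names_df["CoinName"]
clustered_df["Class"] = predictions


def make_3d_scatter(df):
    """Build a 3D scatter of the principal components, colored by cluster."""
    import plotly.express as px  # lazy import: only needed when plotting

    return px.scatter_3d(
        df,
        x="PC 1",
        y="PC 2",
        z="PC 3",
        color="Class",
        symbol="Class",
        hover_name="CoinName",
        hover_data=["Algorithm"],
    )


# Number of tradable cryptocurrencies left after cleaning.
tradable_count = len(clustered_df)

# With hvPlot installed (import hvplot.pandas), a sortable table could be:
# clustered_df.hvplot.table(columns=["CoinName", "Algorithm", "Class"], sortable=True)
```

Calling `make_3d_scatter(clustered_df).show()` would open the interactive figure, with the coin name and algorithm shown on hover.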


Results:

D1: Preprocessing the Data for PCA

The following five preprocessing steps have been performed on the crypto_df DataFrame:

The final DataFrame is shown below in Figure 1.1.

X_scaled

Figure (1.1) X_scaled DataFrame: the X DataFrame has been standardized using the StandardScaler fit_transform() function.


D2: Reducing Data Dimensions Using PCA

The final DataFrame is shown below in Figure 1.2.

X_pca_df

Figure (1.2) X_pca_df DataFrame


D3: Clustering Cryptocurrencies Using K-means

The K-means algorithm is used to cluster the cryptocurrencies using the PCA data, where the following steps have been completed:

Elbow_curve

Figure (1.3) Elbow curve


K_Means_algorithm

Figure (1.4) K-Means Algorithm: used to cluster the cryptocurrencies.


clustered_df

Figure (1.5) Clustered_df DataFrame.


D4: Visualizing Cryptocurrencies Results

3D_scatter

Figure (1.6) 3D Scatter plot


3D_scatter

Figure (1.7) 3D Scatter plot with CoinName and Algorithm on hover


hvplot_table

Figure (1.8) hvplot table


total_number

Figure (1.9) Total number of tradable cryptocurrencies


total_number

Figure (1.10) DataFrame that has the scaled data with the clustered_df DataFrame index.


hvplot_scatter_plot

Figure (1.11) hvplot scatter plot


Summary

In this project, we worked primarily with the K-means algorithm, a widely used unsupervised algorithm that groups similar data into clusters. We built on this by first applying principal component analysis (PCA), which speeds up the process by reducing the many features of the DataFrame to a few principal components.

Then, using the K-means algorithm, we created an elbow curve with hvPlot to find the best value for K, and ran the K-means algorithm to predict the K clusters for the cryptocurrencies’ data.

Finally, we created scatter plots with Plotly Express and hvPlot to visualize the distinct groups that correspond to the three principal components. We then created a table with all the currently tradable cryptocurrencies using the hvplot.table() function.

The ultimate goal of these visualizations is to present the data as an interactive story that is easy to understand and that provides the correct information to help the stakeholders in the decision-making process.
