R/Power BI – Principal Component Analysis on Algerian insurance market

In this tutorial, you will learn how to use PCA to summarize data with many variables and how to create visualizations that display the result in Power BI.

In a previous post, we saw how to extract the Algerian insurance market data from the internet using the PDF connector of Power BI and, in another post, how to use an R script to extract data from a website. We will now see how to use an R script in Power BI to perform a principal component analysis on that data, and how to plot the resulting PCA in Power BI using the ggbiplot package.

To follow this tutorial, you must have R installed on your computer, and the packages devtools and ggbiplot installed in your R environment (ggbiplot is installed with the command install_github("vqv/ggbiplot")).
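The installation can be sketched as follows. This is a minimal example that assumes an internet connection and a CRAN mirror already configured; the requireNamespace() checks are an addition of mine so that the packages are not reinstalled on every run.

```r
# Install devtools from CRAN if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install ggbiplot from its GitHub repository (vqv/ggbiplot)
if (!requireNamespace("ggbiplot", quietly = TRUE)) {
  devtools::install_github("vqv/ggbiplot")
}
```

Run this once in your usual R console, not inside the Power BI visual, so that the packages are already in place when Power BI calls R.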

We follow these steps:

1. We import the data from the UAR’s website as described in this tutorial. In this example, I will use the data for 2016 and 2017, for the non-life companies only and excluding the specialized companies (CCR, SGCI and CAGEX). Hereafter, the list of variables that we will use:

2. We click on the R visual and add the column of the observations first (in this case “Compagnies”), then the column that gives the group to which each observation belongs (here “Statut”), and finally the variables that we use to compare our observations (in this example I use 4 variables, all numerical). We also add the filters “exercice” (because there are 2 periods) and “statut” (to compare observations within the same group). As you can see in this image, a first script is generated automatically; this script creates a data frame named dataset from the selected columns.

3. We load the libraries that ggbiplot relies on with library(): ggplot2, plyr, scales, grid and ggbiplot itself (here with lib.loc="~/R/win-library/3.4"). Then we perform a PCA on the numerical variables, centered and scaled, with the script line:
Market.pca <- prcomp(dataset[,c(3:ncol(dataset))], center=TRUE, scale.=TRUE)
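To see what this line does outside Power BI, here is a small sketch using a synthetic stand-in for dataset. The column names Var1 to Var4, the company names and all the values are invented for illustration; only the layout (observations first, group second, numerical variables from column 3 onward) follows the steps above.

```r
# Synthetic stand-in for the `dataset` data frame that Power BI passes to the R visual.
# Var1..Var4, the company names and every value are invented for illustration.
set.seed(42)
dataset <- data.frame(
  Compagnies = paste0("Company_", LETTERS[1:8]),
  Statut     = rep(c("Public", "Private"), each = 4),
  Var1 = rnorm(8, mean = 100, sd = 20),
  Var2 = rnorm(8, mean = 50,  sd = 10),
  Var3 = rnorm(8, mean = 10,  sd = 3),
  Var4 = rnorm(8, mean = 5,   sd = 1),
  stringsAsFactors = FALSE
)

# Keep only the numerical variables (columns 3 onward) and check their type,
# because prcomp() fails on character or factor columns
numeric_vars <- dataset[, 3:ncol(dataset)]
stopifnot(all(sapply(numeric_vars, is.numeric)))

# Center and scale each variable so that no variable dominates because of its unit
Market.pca <- prcomp(numeric_vars, center = TRUE, scale. = TRUE)

print(Market.pca$rotation)  # loadings: contribution of each variable to each component
print(head(Market.pca$x))   # scores: coordinates of each observation on the components
```

Scaling matters here because the variables of an insurance market are usually expressed in very different magnitudes; without scale.=TRUE the largest one would drive the first component.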

4. Now, we add our plot with ggbiplot(Market.pca, labels=dataset$Compagnies). Here we ask R to plot the variables and the observations in the plane formed by the first 2 principal components. In this picture, we filter “exercice” to 2017, and we notice that the 1st and 2nd PCs together explain 76% of the variance.
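The share of variance carried by the first two components can also be computed directly from the prcomp object. The sketch below uses R's built-in USArrests data (4 numerical variables) as a stand-in, since the insurance data is not reproduced here; the names pca, var_explained and cum_two are mine.

```r
# USArrests ships with R and is used here only as a stand-in with 4 numerical variables
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# The variance of each component is the square of its standard deviation
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Share of the total variance carried by the first two components
cum_two <- sum(var_explained[1:2])
cat(sprintf("PC1 + PC2 explain %.1f%% of the variance\n", 100 * cum_two))

# summary() reports the same proportions, rounded
print(summary(pca)$importance)
```

The same percentages appear on the axis labels of the ggbiplot output, so this is mainly useful when you want the figure as a number.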

5. It is interesting to add ellipses that group the observations by their “statut”. We tell R that our groups are in “Statut” with Market.Statut = dataset$Statut, then we use the line of code:
ggbiplot (Market.pca, ellipse = TRUE, labels=dataset$Compagnies, groups = Market.Statut)

We can clearly see the difference between public and private insurers.

6. To remove the arrows of the variables and keep only the observations, we set var.axes=FALSE: ggbiplot (Market.pca, ellipse = TRUE, var.axes=FALSE, labels=dataset$Compagnies, groups = Market.Statut)
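Putting the steps above together, the whole script of the R visual could look like the following sketch. The fallback data frame (invented names Var1 to Var4 and random values) and the requireNamespace() guard are assumptions of mine so that the script also runs outside Power BI; inside Power BI, dataset is provided by the visual itself.

```r
# In Power BI, `dataset` is created automatically from the fields added to the R visual.
# The fallback below is synthetic (invented names and values) so the sketch runs elsewhere too.
if (!exists("dataset")) {
  set.seed(1)
  dataset <- data.frame(
    Compagnies = paste0("Company_", LETTERS[1:10]),
    Statut     = rep(c("Public", "Private"), each = 5),
    Var1 = rnorm(10), Var2 = rnorm(10), Var3 = rnorm(10), Var4 = rnorm(10)
  )
}

# PCA on the numerical variables (columns 3 onward), centered and scaled
Market.pca <- prcomp(dataset[, 3:ncol(dataset)], center = TRUE, scale. = TRUE)

# Group of each observation, used to colour the points and draw the ellipses
Market.Statut <- dataset$Statut

if (requireNamespace("ggbiplot", quietly = TRUE)) {
  library(ggbiplot)
  # Set var.axes = TRUE to bring back the arrows of the variables
  p <- ggbiplot(Market.pca, ellipse = TRUE, var.axes = FALSE,
                labels = dataset$Compagnies, groups = Market.Statut)
  print(p)  # explicit print so the plot is drawn however the script is run
} else {
  message("ggbiplot is not installed; skipping the plot")
}
```

Because Power BI rebuilds dataset each time a filter changes, the PCA is recomputed on the filtered rows, which is why the plot reacts to “exercice” and “statut”.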

And that’s it 👍

As you can see, using an R script to analyze your data in Power BI is very intuitive. The main advantage of R visuals in PBI is the filters: with them, you can easily play with your visualizations.

I hope you enjoyed this tutorial. If you want me to share the PBI file with you, just ask for it in the comments below.

If you want to learn more about how to perform analyses with Power BI and R, I strongly recommend the white paper written by Leila Etaati; you can download it here. To learn more about the package “ggbiplot”, you can read a nice course by DataCamp here.

Keep learning and see you next time 😄
