We will cover the following tasks in 22 minutes:
In this task, I am going to outline the goals of the project. I will go over some basics with you, including a little bit about the Rhyme interface, and we will run a few code chunks. At the end, I will share a little bit about myself.
How many are Poisonous
Whenever you are working with a new dataset, it is always wise to do some exploratory analysis. Often, I find that I want to jump right into the algorithm. Perhaps you do too? But that’s not the best way to approach a problem; instead, you should always start with exploratory data analysis. We will do some exploratory analysis in this task in order to better understand the data, starting with how many of the observations are poisonous. Of course, we could spend a whole project on exploratory analysis (perhaps in another project). For the current project, however, we will only do a quick analysis.
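As a rough illustration of this kind of quick look, here is a minimal sketch in base R. The data frame `mushrooms_toy` and its columns `poisonous` and `odor` are made up for this example; they are assumptions, not the project's actual dataset or column names.

```r
# Hypothetical toy data (an assumption for illustration only).
mushrooms_toy <- data.frame(
  poisonous = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  odor      = c("f", "n", "f", "n", "a", "p", "n", "l"),
  stringsAsFactors = TRUE
)

# How many observations fall in each class?
class_counts <- table(mushrooms_toy$poisonous)
print(class_counts)

# The same counts expressed as proportions.
class_share <- prop.table(class_counts)
print(round(class_share, 3))

# Cross-tabulate one attribute against the class label.
odor_by_class <- table(odor = mushrooms_toy$odor,
                       poisonous = mushrooms_toy$poisonous)
print(odor_by_class)
```

Checking the class balance first is useful because a heavily lopsided dataset changes how you should read accuracy later on.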
Training and Test Split
It is important to break your dataset into two sets: one for training the model and another for testing the trained model. To do this, you typically draw a random sample of rows for each set. Once the model is trained, we can use the test set (data the model hasn’t been trained on) to see how well our model performs. I will show you one technique that you can use for this important step.
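One common way to make such a split, sketched here in base R, is to sample row indices at random. This is not necessarily the technique used in the project; the built-in `iris` data, the 80/20 ratio, and the seed value are all assumptions chosen for illustration.

```r
# Fix the random seed so the split is reproducible.
set.seed(42)

# Use a built-in dataset as a stand-in for the project data.
n <- nrow(iris)

# Randomly pick 80% of the row indices for training (ratio is an assumption).
train_idx <- sample(seq_len(n), size = floor(0.8 * n))

# Rows that were picked go to training; the rest go to testing.
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

cat("Training rows:", nrow(train_set), "\n")
cat("Test rows:    ", nrow(test_set), "\n")
```

Because the test rows are selected with a negative index, no row can end up in both sets.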
Training the Model
In this task, we are going to train the model. There are hundreds of different machine learning algorithms, but most of them follow the same overall structure. Pay attention to the function calls and especially the tilde since this approach will be used in many other machine learning models and algorithms.
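To show what the tilde does, here is a small sketch using base R's `glm()` as a stand-in. It is not the model trained in this project; the `mtcars` data and the chosen columns are assumptions for illustration. The formula reads as "predict the left-hand side from the right-hand side".

```r
# The formula: outcome on the left of the tilde, predictors on the right.
# Here: predict transmission type (am) from weight (wt) and horsepower (hp).
model_formula <- am ~ wt + hp

# Fit a logistic regression with that formula (a stand-in model).
model <- glm(model_formula, data = mtcars, family = binomial)

# Inspect the fitted coefficients: one intercept plus one per predictor.
print(coef(model))

# A dot on the right-hand side means "all other columns in the data".
all_vars <- all.vars(am ~ ., data = mtcars)
print(all_vars)
```

Many R modeling functions accept the same `formula` and `data` pair, which is why the pattern is worth learning once.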
In this task, I am going to show you some of the cool features of the FFTrees package. I will show you a confusion matrix, which can be used to quantify the performance of the model. I will also show you how to extract the most important attributes, which is useful in a real-world scenario when you only have a couple of data points available to classify a case.
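To make the idea of a confusion matrix concrete, here is a minimal base R sketch that builds one by hand. The `actual` and `predicted` vectors are invented for illustration and do not come from the project's model or from FFTrees output.

```r
# Hypothetical true labels and model predictions (TRUE = poisonous).
actual    <- c(TRUE, TRUE, TRUE,  FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
predicted <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE,  FALSE, TRUE, FALSE, FALSE)

# Rows are predictions, columns are the truth.
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Pull out the four cells by name.
tp <- cm["TRUE",  "TRUE"]   # poisonous, predicted poisonous
fn <- cm["FALSE", "TRUE"]   # poisonous, predicted safe
fp <- cm["TRUE",  "FALSE"]  # safe, predicted poisonous
tn <- cm["FALSE", "FALSE"]  # safe, predicted safe

# Summary measures derived from the matrix.
accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)  # share of poisonous cases caught
specificity <- tn / (tn + fp)  # share of safe cases correctly cleared

cat("Accuracy:   ", accuracy, "\n")
cat("Sensitivity:", sensitivity, "\n")
cat("Specificity:", specificity, "\n")
```

For a problem like poisonous versus safe, sensitivity deserves special attention, since a missed poisonous case is the costlier mistake.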
In this task, I am going to use the trained model on the test set to see how well it performs. After that, I will close the project with a few comments about the project and what we learned from it.
About the Host (Chris Shockley)
I am an R enthusiast, hiker, and amateur astronomer. My favorite hike is located in Mt. Rainier National Park, my favorite Deep Sky Object is Albireo, and my favorite R package is dplyr (since I use it every day). I have a dog named Coog (Lhasa Apso). I work as a Data Analyst/Financial Analyst for a Metals Co. located in Seattle, WA, and I have been in my current position for 5 years. I work in SQL, R, R Shiny, and QGIS. Because I have traveled the roads you are on, I believe I will be an asset and will add value to your programming repertoire. We will walk through multiple examples and get to know each other through the process. Don't take my word for it, though. Come on in and take a Project or two. Regards, Chris Shockley