Scraping Data with Rvest: Data Harvesting

Hadley Wickham’s rvest package is your go-to for scraping web data in R. In this tutorial I will show you how to scrape a single page and a table, and then we will pick up the pace and scrape multiple fields and multiple pages at once. This skill set will help you pull data from various sources. That is when the magic can happen: once you can pull this data, you can use R to draw insights or simply display data that was previously difficult to get at. Come on in and I’ll show you how.


Duration: 36 minutes

Rating: 5.0 / 5

Task List


We will cover the following tasks in 36 minutes:


Introduction

Cloudy Nights is a forum I visit for astronomy. There are conversation boards on various topics, but mostly I use the Classifieds section, which lists telescope equipment for sale. In this chapter I will show you how I drew some interesting insights from the Classifieds. After that I will give you a quick bio. And then we’ll get going.


IMDb Simple Scrape

In this lesson we will scrape a single rating from the IMDb website. Though simple, this lesson lays the groundwork for the later chapters and gives you a quick understanding of the code. You will want to stay for this one.
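
Here is a minimal sketch of the idea, not the exact code from the lesson. To keep it runnable offline, it parses a small inline HTML snippet instead of a live page; the markup and the `.rating-value` class are made up for illustration, so the selector for a real page will differ.

```r
library(rvest)

# Stand-in for a downloaded page. On a live site you would call
# read_html() with the page URL instead. The markup below is hypothetical.
page <- minimal_html('
  <div class="title">Example Movie</div>
  <span class="rating-value">8.1</span>
')

rating <- page %>%
  html_element(".rating-value") %>%  # first node matching the CSS selector
  html_text2() %>%                   # pull out its text
  as.numeric()                       # turn the text into a number

rating
```

The same three steps (read the page, select a node, extract its text) carry over to the later lessons.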


Super Bowl Data

It is football season after all, and to some that’s the only season, though I’m a baseball fan. I digress. In this lesson, let’s scrape historical data on Super Bowl winners from Wikipedia. After we do this you will know how to scrape tables from other Wikipedia pages too. There is a trick to it. I’ll show you. Hint: We don’t use CSS Selectors.
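
One way to pull a table without hand-picking CSS selectors is rvest's `html_table()`, sketched below. This is my assumption about the kind of trick involved, not necessarily the one used in the lesson. The table is a made-up inline stand-in with placeholder teams, not real Super Bowl data.

```r
library(rvest)

# Hypothetical stand-in for a Wikipedia page that contains one table.
page <- minimal_html('
  <table class="wikitable">
    <tr><th>Game</th><th>Winner</th><th>Points</th></tr>
    <tr><td>Game 1</td><td>Team A</td><td>35</td></tr>
    <tr><td>Game 2</td><td>Team B</td><td>33</td></tr>
  </table>
')

# Called on a whole document, html_table() returns a list with one
# data frame (tibble) per <table> on the page.
tables <- html_table(page)

# Pick the table you want by position.
winners <- tables[[1]]
winners
```

On a real page you would usually inspect `length(tables)` first and try a few positions until you find the table you are after.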


American Ninja Warrior Parsing URLs

This is a fun show: American Ninja Warrior. Jessie Graff. She’s awesome. Anyhow.
In this section and the remaining chapters we’re going to parse the data for a few seasons of the TV show.
Our scraping is getting more and more complex. By the way, web scraping doesn’t have a one-size-fits-all method. I’m exposing you to different methods on purpose, and that will serve you well going forward. It’s like adding tools to a tool belt.
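
As a rough sketch of what working with URLs for several seasons can look like: build one URL per season from a shared pattern, and pull the season number back out when you need it. The `example.com` address and the `season-` pattern are hypothetical; the real site's URLs will look different.

```r
# Hypothetical URL pattern, used only for illustration.
base_url <- "https://example.com/ninja-warrior/season-"
seasons  <- 8:10

# One URL per season.
urls <- paste0(base_url, seasons)
urls

# Going the other way: recover the season number from each URL.
season_from_url <- as.integer(sub(".*season-([0-9]+)$", "\\1", urls))
season_from_url
```

Each element of `urls` could then be passed to `read_html()` in turn.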


American Ninja Warrior (CSS Selectors)

In this lesson we’re going to pull the selectors for the data we want to extract. Knowing how to do this with the SelectorGadget extension for Chrome is a must. I will go over the gadget, how to find the CSS selector and where to put it in the code.


American Ninja Warrior (Rvest)

In this lesson we’re going to set up our code using rvest. Once you learn how to set this code chunk up, your scraping days will be much easier in the future. I’ll also give you a little tidbit on Messier, a famous astronomer, as we’re typing it out.
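
A code chunk for pulling several fields at once might look something like the sketch below. The HTML, the class names and the competitor names are all invented so the example runs offline; in practice the selectors would come from SelectorGadget.

```r
library(rvest)

# Hypothetical markup: one <li> per competitor, with two fields each.
page <- minimal_html('
  <ul>
    <li class="competitor">
      <span class="name">Competitor A</span>
      <span class="result">Finished</span>
    </li>
    <li class="competitor">
      <span class="name">Competitor B</span>
      <span class="result">Fell</span>
    </li>
  </ul>
')

# Select every competitor row first...
rows <- html_elements(page, ".competitor")

# ...then pull one value per row for each field, so the columns line up.
results <- data.frame(
  name   = rows %>% html_element(".name") %>% html_text2(),
  result = rows %>% html_element(".result") %>% html_text2(),
  stringsAsFactors = FALSE
)
results
```

Selecting the rows first and the fields second keeps the columns the same length even when a field is missing from one row (you get an `NA` instead of a shifted column).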


American Ninja Warrior (Function)

This is advanced, as I wrap our rvest code into a function. If you don’t know about functions, it’s ok. Just follow along and take a course on functions later. If you do know about functions, even better. This is a time saver, especially for scripts that you plan on running more than once. Automation baby!
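
To give a feel for it, here is a small sketch of wrapping scraping code in a function and running it once per season. The function name `scrape_season`, the markup and the class names are hypothetical, and the pages are inline stand-ins rather than downloads.

```r
library(rvest)

# Hypothetical helper: takes a parsed page and a season number and
# returns one row per competitor found on that page.
scrape_season <- function(page, season) {
  rows <- html_elements(page, ".competitor")
  data.frame(
    season = rep(season, length(rows)),
    name   = html_text2(html_element(rows, ".name")),
    stringsAsFactors = FALSE
  )
}

# Stand-ins for two downloaded season pages.
pages <- list(
  minimal_html('<p class="competitor"><span class="name">Competitor A</span></p>'),
  minimal_html('
    <p class="competitor"><span class="name">Competitor B</span></p>
    <p class="competitor"><span class="name">Competitor C</span></p>
  ')
)
seasons <- c(8, 9)

# Run the function once per page and stack the results.
all_seasons <- do.call(rbind, Map(scrape_season, pages, seasons))
all_seasons
```

Once the logic lives in one function, adding another season means adding one more page to the list rather than copying the whole chunk.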


Final Thoughts

Final thoughts on the session and your future scraping projects. Thank you for taking my course. If you liked it, please email me or, better yet, take another one. You did well on this course. It’s a nice skill set. Go out there and pull down some data.


Chris Shockley

About the Host (Chris Shockley)


I am an R enthusiast, hiker, and amateur astronomer. My favorite hike is located in Mt. Rainier National Park, my favorite Deep Sky Object is Albireo, and my favorite R package is dplyr (since I use it every day). I have a dog named Coog (Lhasa Apso), and I love talking about Coog (as you'll find out soon). I work as a Data Analyst/Financial Analyst for a Metals Co. located in Seattle, WA. I have been in my current position for 4 years. I work primarily in R for Analytics and R Shiny for Deployment. I have built numerous apps over the past few years. My hope is that I can help you in learning R. Yes, the Rhyme Interface will help you. But you also must take what you learn and practice, practice, practice. So... Let's get after it. See you soon.



Frequently Asked Questions


In Rhyme, all projects are completely hands-on. You don't just passively watch someone else. You use the software directly while following the host's (Chris Shockley) instructions. Using the software is the only way to achieve mastery. With the "Live Guide" option, you can ask for help and get an immediate response.
Nothing! Just join through your web browser. Your host (Chris Shockley) has already installed all required software and configured all data.
You can go to https://rhyme.com/for-companies, sign up for free, and follow this visual guide How to use Rhyme to create your own sessions. If you have custom needs or company-specific environment, please email us at help@rhyme.com
Absolutely. We offer Rhyme for workgroups as well as larger departments and companies. Universities, academies, and bootcamps can also buy Rhyme for their settings. You can select sessions and trainings that are mission critical for you, and you can also author your own to reflect your own needs and tech environments. Please email us at help@rhyme.com
Please email us at help@rhyme.com and we'll respond to you within one business day.
