API Setup and Tweepy
1/20/22
Welcome back to the Odyssey! Today’s post is a bit more informational: I want to talk about using the Twitter API to scrape tweets with Python.
Twitter API
Social media platforms know how useful they are for researchers and offer APIs that let you interact with their apps. Twitter in particular is very widely used by researchers because almost every tweet is public and its search functionality allows for very specific queries. This makes knowing how to scrape tweets a pretty useful skill to have. To get started, go to Twitter’s developer portal and create a new app. After you finish the form to create the app, you will receive a pair of consumer keys (a key and a secret) and a pair of access keys (a token and a secret). These keys are necessary to use the API later in Python, so save them somewhere safe.
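One way to keep the keys handy without pasting them into a script is to store them as environment variables and read them in Python. Here is a minimal sketch of that idea. The variable names and the load_keys helper are my own placeholders, not anything Twitter or Tweepy requires.

```python
import os

# Hypothetical environment variable names; rename them to match your setup.
KEY_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_keys(env=os.environ):
    """Return the four keys as a dict, or raise if any are missing."""
    missing = [name for name in KEY_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Twitter credentials: " + ", ".join(missing))
    return {name: env[name] for name in KEY_NAMES}
```

Calling load_keys() at the top of a script fails early with a readable message if one of the keys was never set.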
The Twitter API allows you to access Twitter through various endpoints, which are essentially functions that let you read information from Twitter or post to it. Two of the most useful are the search endpoint, which returns the tweets matching a query, and the timeline endpoint, which returns the tweets from a specified user. I will primarily be using the search endpoint, but the timeline endpoint is a handy alternative in case I just want to scrape replies to posts by news agencies. You can check out the other endpoints here.
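To make the idea of an endpoint a bit more concrete, here is a rough sketch that builds the request URLs for the search and timeline endpoints. It assumes version 2 of the Twitter API, and it only builds the URLs without sending anything. The helper names and the user ID are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: the v2 REST API, whose paths all start with this base URL.
BASE_URL = "https://api.twitter.com/2"


def search_url(query, max_results=10):
    """URL for the recent search endpoint: tweets matching a query."""
    params = urlencode({"query": query, "max_results": max_results})
    return f"{BASE_URL}/tweets/search/recent?{params}"


def timeline_url(user_id, max_results=10):
    """URL for the timeline endpoint: tweets posted by one user."""
    params = urlencode({"max_results": max_results})
    return f"{BASE_URL}/users/{user_id}/tweets?{params}"


print(search_url("covid"))
print(timeline_url("12345"))  # "12345" is a placeholder user ID
```

A real request also needs the keys from the developer portal attached to it, which is the part a library like Tweepy takes care of.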
Tweepy
Tweepy is a library that allows you to access the Twitter API through Python. Using it is pretty straightforward, and if you have any experience with Python you shouldn’t really run into any problems. You can install it with pip: ‘pip install tweepy’.
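Here is a minimal sketch of what a search with Tweepy can look like. It assumes Tweepy 4.x and its Client class for version 2 of the API, authenticated with the four keys from the developer portal. The helper names (build_query, make_client, search_recent) are my own, and the query is just an example, so treat this as one possible setup rather than the only way to do it.

```python
def build_query(keywords, lang="en", include_retweets=False):
    """Combine keywords into a single Twitter search query string."""
    # Quote multi-word phrases so they are matched exactly.
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    query = "(" + " OR ".join(terms) + f") lang:{lang}"
    if not include_retweets:
        query += " -is:retweet"
    return query


def make_client(consumer_key, consumer_secret, access_token, access_token_secret):
    """Create a Tweepy client from the four keys."""
    # Imported here so build_query can be used without Tweepy installed.
    import tweepy

    return tweepy.Client(
        consumer_key=consumer_key,
        consumer_secret=consumer_secret,
        access_token=access_token,
        access_token_secret=access_token_secret,
    )


def search_recent(client, query, max_results=10):
    """Return the text of recent tweets matching the query."""
    # user_auth=True tells Tweepy to sign the request with the four keys.
    response = client.search_recent_tweets(
        query=query, max_results=max_results, user_auth=True
    )
    # response.data is None when nothing matched.
    return [tweet.text for tweet in (response.data or [])]
```

With real keys in hand, the usage would be something like client = make_client(...) followed by search_recent(client, build_query(["climate change", "weather"])).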
That’s about it. It’s pretty straightforward to scrape tweets from Twitter using the API. That being said, I actually can’t scrape any tweets currently since I only have standard access. Twitter offers three levels of access (standard, elevated, and academic research), and to use the exhaustive search endpoint you need the academic research level. Additionally, the standard access level only allows each Twitter app to pull at most 500,000 tweets per month, at a rate of 300 requests per 15 minutes. Assuming each request returns 10 tweets, it would take me about 41 hours to scrape 500,000 tweets, and that still won’t be enough for all of my models. Luckily, you can apply for the higher access levels, so hopefully by the time I post my next update, I’ll have academic research access.
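For anyone who wants to check the 41-hour figure, here is the arithmetic as a small sketch. The 10 tweets per request is my assumption about the page size; the cap and the rate limit are the numbers quoted above. The function name is just for illustration.

```python
def hours_to_collect(total_tweets, tweets_per_request,
                     requests_per_window=300, window_minutes=15):
    """Estimate the hours needed to pull total_tweets under a rate limit."""
    requests_needed = total_tweets / tweets_per_request
    windows_needed = requests_needed / requests_per_window
    return windows_needed * window_minutes / 60


# 500,000 tweets at 10 per request is 50,000 requests,
# or about 167 windows of 15 minutes each.
print(round(hours_to_collect(500_000, 10), 1))  # 41.7
```

A larger page size would shrink the estimate proportionally, so the monthly cap matters more than the request rate.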