Brick Walls

14 Mar

2/16/22

Welcome back to the odyssey! I finally have an update on the project. Unfortunately, it wasn’t what I was hoping for.

The Problem: Twitter API

Over the past two weeks, I have kept hitting brick walls while trying to scrape the Covid tweets. Using the Twitter API was a struggle from the very beginning. Immediately after I created my account, I received the following email:

I was confused about what I had done to deserve a suspension and appealed it through Twitter’s support system. Four days later my account was reinstated. Apparently, Twitter’s automated system had mistakenly flagged my project for abusive API use:

The next problem was just around the corner. As I mentioned in my previous posts, standard access to the Twitter API is very constraining and doesn’t have the functionality I need to scrape tweets for this project. Twitter does have an application process that allows you to request either the Elevated or the Academic Research access level. I applied for Elevated access first, which was granted only after I explained several times what my project was and what I intended to do with the API. However, I quickly realized that Elevated access adds barely any functionality, and that the exhaustive search endpoint I need is only available at the Academic Research access level. I went through the application process once again to try to get Academic Research access. This time, though, I was rejected, and no number of follow-up emails could change Twitter’s mind. It turns out that the Academic Research API is not available to undergraduate students who aren’t officially affiliated with a research department.

The Solution: Snscrape

I reached the conclusion that I wouldn’t be able to use Twitter’s API, which left me in a bit of a pickle because I wasn’t sure how else I could get tweets. I discussed some possible alternatives to Twitter (e.g. Facebook) in one of my earlier posts, but I didn’t want to give up so easily. After doing some research, I stumbled across open-source web scrapers, which provide source code that lets you extract data from web pages. I landed on snscrape, an open-source scraper that specializes solely in social networking services. It provides a module specifically for Twitter that lets me enter a search query and extract all the tweets matching it. This is the same functionality that I would have gotten from the Twitter API’s exhaustive search endpoint.
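To make the idea of "a search query in, matching tweets out" more concrete, here is a minimal sketch of how such a scrape could look in Python. It is not the code used for this project: the keywords, the date window, and the helper names build_query and scrape are all made up for illustration. The sketch assumes snscrape is installed (pip install snscrape) and relies on its Twitter module's TwitterSearchScraper class, whose get_items() method yields tweets one at a time.

```python
from datetime import date
from itertools import islice


def build_query(keywords, since, until, lang="en"):
    """Combine keywords and a date window into a Twitter search query string.

    Hypothetical helper: it only assembles standard Twitter search operators
    (OR, since:, until:, lang:). Multi-word keywords are quoted as phrases.
    """
    terms = " OR ".join(f'"{k}"' if " " in k else k for k in keywords)
    return f"({terms}) since:{since.isoformat()} until:{until.isoformat()} lang:{lang}"


def scrape(query, limit=100):
    """Yield up to `limit` tweets for `query`. Needs snscrape and network access."""
    # Imported here so build_query can be used without snscrape installed.
    import snscrape.modules.twitter as sntwitter

    scraper = sntwitter.TwitterSearchScraper(query)
    # get_items() is a lazy generator; islice stops after `limit` results.
    yield from islice(scraper.get_items(), limit)


# Example keywords and dates are placeholders, not the project's real query.
query = build_query(["covid", "coronavirus"], date(2020, 3, 1), date(2020, 3, 8))
print(query)  # (covid OR coronavirus) since:2020-03-01 until:2020-03-08 lang:en
```

Looping over scrape(query, limit=10) would then return tweet objects with fields such as date and url. Keeping the query construction separate from the scraping call makes it easy to check the search string by hand on Twitter's own search page before starting a long run.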

I think this experience highlights an important part of getting involved with AI. These projects are very complex, and there will always be issues. If you treat these setbacks as crises that can only hold you back, you will not be successful. If you instead treat each setback as an opportunity to find a better way forward, you will find that the problems you run into are a blessing and not a curse. For me specifically, using snscrape will now allow me to scrape more tweets than I would have been able to with Twitter’s API while keeping the exact same functionality. In fact, I have already run a successful test scrape, so stay tuned for the results soon!

Benjamin Pusch
