Happy Holidays!

1/1/22


Welcome back to the Odyssey! Today I want to talk about the specific project I have decided to work on, how I arrived at that decision, and the beauty of AI’s interdisciplinary nature.


I hope you’re enjoying your winter holidays! I had a fantastic time celebrating Christmas with my extended family here in Germany. It’s the first time that I have been able to see them since 2019 (thanks, Covid), which made the celebration even more special. Covid will actually play a significant role in today’s post, because a particularly fiery discussion at dinner a couple of days ago is one of the reasons why I am changing my original plan for this project.

Originally, I wanted to expand on KhudaBukhsh’s paper by researching how the growth of Telegram, an encrypted social media platform similar to WhatsApp, has led to increased polarization in Brazil ahead of their elections this coming October. While I still want to work on this project, after looking into it I realized that I would need a stepping stone to get there. My lack of familiarity with Brazilian Portuguese and Brazilian culture, coupled with my lack of experience with machine translation models, makes it a difficult project to start with. I plan on starting the Brazilian Election project this summer, so stay tuned!

As an alternative, I will be researching the community perception of contentious issues in Germany. I will be using another one of Ashiqur R. KhudaBukhsh’s papers, namely Mining Insights from Large-scale Corpora Using Fine-tuned Language Models. The overarching focus is the same: to use AI, in this case language models, as a scalpel with which to study social behavior and societal issues. Specifically, I will be using Google’s BERT language model to track how Covid has impacted the perception of vaccines in Germany, as well as gendered differences in how German politicians are perceived. Since I am culturally German and fluent in the language, this project provides a better entrance into research than my previous idea. I will go into more depth about the paper and the specific project in my next post.

You might be wondering why I am focusing on a different language at all, as I could just as easily have done a similar project in English, where I would have had the added benefit of access to far more resources. There are two main reasons why I chose Germany:

The first reason is personal. While I am culturally German, I have never had the chance to intertwine my culture and heritage with my academic pursuits. Despite how important being German is to me, I rarely engage with German issues, as most, if not all, of my information about these problems comes through conversations with my parents. The other night, I was embarrassed by my lack of knowledge when an argument broke out about Covid and Annalena Baerbock, a German politician, and I had nothing to contribute.

I knew then that my research was the perfect opportunity to study these issues, allowing me to further my passion whilst exploring a cornerstone of my identity. This interdisciplinary nature is something I find incredibly beautiful about AI, because it allows you to pursue diverse interests simultaneously. No matter what your passion is, I guarantee you that there is some way to intertwine AI with it. This flexibility also lets you work with people who might have completely different interests, making it one of the most collaborative and fascinating areas to work in.

The second reason leans more to the ethical side. I want to focus on a language other than English because of the linguistic inequality in Natural Language Processing. I have discussed the ethical challenges that the AI field faces before, but one problem I haven’t touched on yet is the lack of diversity in the development of AI systems. Specifically, I am going to focus on the fact that AI research is concentrated in rich countries and regions (North America, Western Europe, and East Asia).

While this can be said of academic research in any field, the impact of concentrated development is especially severe in machine learning and AI due to the field’s data-centric approach. The massive amounts of data these models require tend to be at least somewhat unique to the country they were created in, making it infeasible for less wealthy countries to adopt these models due to cultural, societal, and physical differences.

Take, for example, autonomous driving, where in recent years Tesla has outperformed competitors due to the vast amount of data available to them. However, if a company in another country, say Colombia, wants to train its own autonomous vehicles, Tesla’s data would not be sufficient and might lead to dangerous errors due to differences in the appearance of roads, street signs, and the natural environment.

It can easily be argued that this problem is most severe in Natural Language Processing and language modeling. Stark linguistic differences make it difficult to create standardized benchmarks across languages, preventing researchers from accurately comparing model architectures trained on different languages. This pushes researchers toward the most widely studied language (English), incentivizing the majority of datasets and future benchmarks to be created in that language and creating unequal access to a technology with immense potential to improve our society. In case you’re curious, this journal article discusses the issue in more depth.

Tune in for the next post, where I will discuss the perception of Covid vaccines and the gendered perception of politicians in Germany in more depth!
