Albert Chan’s Expanded Definition of Machine Learning

Introduction

The purpose of this 750-1000-Word Expanded Definition is to explore the definition of the term “machine learning” with regard to the scientific community and society. I will analyze how the term is used in three sources: a scholarly article on fairness, a news article on education, and a study on machine translation. My working definition follows afterwards.

Definitions

In the article “A Snapshot of the Frontiers of Fairness in Machine Learning” by Alexandra Chouldechova and Aaron Roth, machine learning is described through its applications: “Machine learning is no longer just the engine behind ad placements and spam filters; it is now used to filter loan applicants, deploy police officers, and inform bail and parole decisions, among other things” (Chouldechova & Roth, 2020, p. 82). To Chouldechova and Roth, machine learning is a process that has evolved from automating routine tasks to automating complex, high-stakes decisions.

On the other hand, the New York Times article “The Machines Are Learning, and So Are the Students” by Craig S. Smith frames the term differently: “Machine-learning-powered systems not only track students’ progress, spot weaknesses and deliver content according to their needs, but will soon incorporate humanlike interfaces that students will be able to converse with as they would a teacher” (Smith, 2019). Smith thus treats machine learning as a means to an end, that end being to help students learn better.

As for the article “On the features of translationese” by Vered Volansky, Noam Ordan, and Shuly Wintner, the definition of machine learning is a technical one: “In supervised machine-learning, a classifier is trained on labeled examples the classification of which is known a priori. The current task is a binary one, namely there are only two classes: O and T” (Volansky, Ordan, & Wintner, 2015, p. 103). To Volansky et al., machine learning is an assisting tool in the study of translation. Here “supervised” refers to training on examples whose correct labels are already known, after which the classifier sorts new texts into one of the two classes.
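To make the idea of a supervised binary classifier concrete, here is a minimal sketch in Python. It is not the method Volansky et al. used. It is a simple nearest-centroid classifier, and the two numeric features and all of their values are made up for illustration. Only the class labels "O" and "T" come from the quote.

```python
# Toy supervised binary classifier (nearest centroid).
# Assumption: each text is described by two hypothetical numeric features;
# the numbers below are invented and do not come from the article.

def train(examples):
    """Learn one centroid per class from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    # The centroid is the average feature vector of each class.
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(label):
        return sum((c - f) ** 2 for c, f in zip(centroids[label], features))
    return min(centroids, key=squared_distance)

# Labeled examples: the class of each one is known in advance.
labeled = [
    ([0.62, 21.0], "O"),
    ([0.58, 19.5], "O"),
    ([0.45, 27.0], "T"),
    ([0.48, 25.5], "T"),
]

model = train(labeled)
prediction = classify(model, [0.47, 26.0])  # a new, unlabeled example
```

The "supervision" is the list of labeled examples: the classifier never sees a rule for telling the classes apart, only examples whose answers are already known.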

Context

The context of all three articles is quite simple. None of them defines machine learning explicitly, so the quotes I have used above are the passages most relevant to the term. I will build on them here.

The first source is a scholarly article investigating how machine learning can be made “fair”, or, better put, “objective”: “With a few exceptions, the vast majority of work to date on fairness in machine learning has focused on the task of batch classification” (Chouldechova & Roth, 2020, p. 84). For better or for worse, the quote tells us that fairness has typically been studied through batch classification. Batch classification, in this context, means sorting data according to user-defined characteristics and then judging the result against a user-defined notion of fairness. Machine learning automates this process and even “learns” how to apply it to other types of data. But the claim of fairness in such a method is questionable, since humans are the ones defining fairness. Because humans have inherent biases, fairness is difficult to judge.

The second source is a news article about technology in education, specifically machine learning and how beneficial it is for teachers: “The system also gathers data over time that allows teachers to see where a class is having trouble or compare one class’s performance with another” (Smith, 2019). For teachers, this system is a way to track a student’s progress or performance without having to analyze the underlying data by hand.

For the last article, the context is machine translation. The term should bring to mind well-known web-based translation services such as Google Translate, NiuTrans, Sogou, and DeepL.

Working Definition

I am majoring in Computer Systems on the IT Operations track, and I have a hobby in translation with the assistance of machine translation. So, my working definition for machine learning is “the application of gathering vast amounts of data, categorizing the data, sorting them out, and analyzing data to find out the psyche of people.” For example, given a group of 100 people answering a set of questions, the data collected is first categorized by gender or whichever category is set. Then, the answers gathered are sorted into correct and incorrect based on the generally accepted answer. Finally, the data is analyzed to produce, for each category, the percentage of respondents who answered each question correctly. With that, the machine has a sample of what to expect if someone of category x answers the same set of questions. Done on a macro scale, the machine will be able to predict what a population’s answer could be.
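The steps in this example can be sketched in a few lines of Python. This is only an illustration of my working definition under simple assumptions: the categories "A" and "B", the question "Q1", the responses, and the 50% cutoff used for prediction are all made up.

```python
from collections import defaultdict

def percent_correct_by_category(responses):
    """Compute the percentage of correct answers per (category, question).

    responses: list of (category, question, is_correct) tuples.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, question, is_correct in responses:
        totals[(category, question)] += 1
        if is_correct:
            correct[(category, question)] += 1
    return {key: 100.0 * correct[key] / totals[key] for key in totals}

def predict_correct(rates, category, question):
    """Guess that a new respondent answers correctly if most of their category did.

    The 50% cutoff is an arbitrary choice for this sketch.
    """
    return rates[(category, question)] > 50.0

# Hypothetical, already-sorted answers (True = matches the accepted answer).
responses = [
    ("A", "Q1", True), ("A", "Q1", True), ("A", "Q1", False), ("A", "Q1", True),
    ("B", "Q1", False), ("B", "Q1", True), ("B", "Q1", False), ("B", "Q1", False),
]

rates = percent_correct_by_category(responses)
guess_for_a = predict_correct(rates, "A", "Q1")
guess_for_b = predict_correct(rates, "B", "Q1")
```

With these made-up responses, three of the four people in category A answered correctly and one of the four in category B did, so the sketch expects a new respondent from A to answer correctly and one from B not to.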

References

Smith, C. S. (2019, December 18). The Machines Are Learning, and So Are the Students. The New York Times. https://www.nytimes.com/2019/12/18/education/artificial-intelligence-tutors-teachers.html

Volansky, V., Ordan, N., & Wintner, S. (2015). On the features of translationese. Digital Scholarship in the Humanities, 30(1), 98–118. https://doi.org/10.1093/llc/fqt031

Chouldechova, A., & Roth, A. (2020). A Snapshot of the Frontiers of Fairness in Machine Learning: A group of industry, academic, and government experts convene in Philadelphia to explore the roots of algorithmic bias. Communications of the ACM, 63(5), 82–89. https://doi.org/10.1145/3376898

Summary of Ling, Balci et al.’s “A First Look at Zoombombing”

TO: Professor Ellis
FROM: Albert Chan
DATE: Sept. 22, 2020
SUBJECT: 500-Word Summary

The purpose of this 500-Word Summary is to condense the contents of “A First Look at Zoombombing” by Ling, C., Balci, U., et al. The article analyzes why and how zoombombing (henceforth known as zbing) occurs, then suggests a simple solution to the issue of zbing.

The article starts out by identifying various virtual conferencing tools before mentioning the recent series of zbing attacks. Then, there is a discussion of best practices to prevent zbing, though there is not enough information on how the attacks are carried out (e.g., whether via brute force or with the help of an insider). There is also a cursory introduction to a later analysis of two social media platforms (Twitter and 4chan) and to research on how to identify which postings of meeting credentials are “asking” attackers to zoombomb (henceforth known as zb) a meeting room. Research shows that most (above 50%) postings on both social media platforms are indeed “asking” attackers to zb their meeting room. Something to note is that nothing in the article is censored, because everything is available online.

According to Ling, Balci, et al., zbing is “composed of four phases…empirical evidence reported by previous research that studied coordinated online aggression, trolling, and harassment on other social media platforms” (p. 2). The four phases of the threat model are as follows: call for attack, coordination, delivery, and harm. The names are largely self-explanatory.

Later on, the article identifies the ten most used online conferencing tools and presents a chart of data on them (e.g., whether they are free, how much it costs to upgrade, and the year of release). Zoom was established in 2011 but rose to prominence and gained infamy during the pandemic, which gave rise to the term zbing. Eight of the ten popular online meeting services are free to use. All of the services take a “you know meeting ID, you know the way in” approach. Fewer than half of the services provide security.

Twitter and 4chan are selected as the social media platforms to analyze. The data consists of live threads with Zoom meeting IDs (4chan) and posts with meeting IDs (Twitter), with an API used to collect the Twitter posts.

Next comes an introduction to how the researchers separated zbing posts from non-zbing posts by organizing a codebook. There are most likely still some false positives and false negatives in the end. On 4chan, about 50% of the posted Zoom and Google Meet links and messages are from people asking to be zb-ed. On Twitter, a much smaller percentage of people ask for attackers. It should be noted that the majority, if not all, of the Google Hangouts and Skype links are posted with good intentions. Using the codebook, the researchers identify, for each post asking to be attacked, the time, whether the poster is an insider, and other attributes. They also identify and separate terms, themes, identity, and contact information.

The proposed solution to zbing is to create a unique meeting link for each participant.
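As a rough illustration of what unique links per participant could look like, here is a short Python sketch. It is not how Zoom or any other real service works. The base URL, the `t` query parameter, and the function names are all hypothetical. The only real library call is `secrets.token_urlsafe` from the Python standard library, which generates a random, hard-to-guess token.

```python
import secrets

# Hypothetical service URL, used only for illustration.
BASE_URL = "https://meet.example.com/join"

def issue_links(meeting_id, participants):
    """Give every participant their own link with a random token.

    Returns (token_owner, links): token -> participant, participant -> link.
    """
    token_owner = {}
    links = {}
    for name in participants:
        token = secrets.token_urlsafe(16)  # random token that is hard to guess
        token_owner[token] = name
        links[name] = f"{BASE_URL}/{meeting_id}?t={token}"
    return token_owner, links

def admit(token_owner, token):
    """Return the participant a token was issued to, or None if it is unknown."""
    return token_owner.get(token)

token_owner, links = issue_links("12345", ["alice", "bob"])
```

In this sketch, knowing the meeting ID alone is no longer enough to get in, and because each token maps back to one participant, a link that gets posted publicly could be traced to the person it was issued to.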

References

Ling, C., Balci, U., Blackburn, J., & Stringhini, G. (2020). A First Look at Zoombombing. Computers and Society, 1(1), 1-14. https://arxiv.org/pdf/2009.03822.pdf