By James Vincent
A new report from Reuters reveals that contract workers are looking at private posts on Facebook and Instagram in order to label them for AI systems. Like many tech companies, Facebook uses machine learning and AI to sort content on its platforms. But in order to do this, the software needs to be trained to identify different types of content. To train these algorithms, they have to analyze sample data, all of which needs to be categorized and labeled by humans, a process known as “data annotation.”
Reuters’ report focuses on Indian outsourcing firm WiPro, which has employed up to 260 workers to annotate posts according to five categories. These include the content of the post (is it a selfie, for example, or a picture of food); the occasion (is it for a birthday or a wedding); and the author’s intent (are they making a joke, trying to inspire others, or organizing a party).
Employees at WiPro have to sort a range of content from Facebook and Instagram, including status updates, videos, photos, shared links, and Stories. Each piece of content is checked by two workers for accuracy, and workers annotate roughly 700 items each day. Facebook confirmed to Reuters that the content being examined by WiPro’s workers includes private posts shared to a select number of friends, and that the data sometimes includes users’ names and other sensitive information. Facebook says it has 200 such content-labeling projects worldwide, employing thousands of people in total. “It’s a core part of what you need,” Facebook’s Nipun Mathur, director of product management for AI, told Reuters. “I don’t see the need going away.” Such data annotation projects are key to developing AI, and have become a little like call center work outsourced to countries where human labor is cheaper.
In China, for example, huge offices of people label images from self-driving cars in order to train the cars’ software to identify cyclists and pedestrians. Most internet users have performed this sort of work without even knowing. Google’s CAPTCHA system, which asks you to identify objects in pictures to “prove” you’re human, is used to digitize info and train AI. This sort of work is necessary, but troubling when the data in question is private. Recent investigations have highlighted how teams of workers label sensitive information collected by Amazon Echo devices and Ring security cameras. When you talk to Alexa, you don’t imagine someone else will listen to your conversation, but that’s exactly what can happen. The issue is even more troubling when the work is outsourced to companies that might have lower standards of security and privacy than big tech firms. Facebook says its legal and privacy teams approve all data-labeling efforts, and the company told Reuters that it recently introduced an auditing system “to ensure that privacy expectations are being followed and parameters in place are working as expected.”
However, the company could still be infringing the European Union’s recent GDPR regulations, which set strict limits on how companies can collect and use personal data. Facebook says the data labeled by human workers is used to train a number of machine learning systems. Their uses include recommending content in the company’s Marketplace shopping feature; describing photos and videos for visually impaired users; and sorting posts so certain adverts don’t appear alongside political or adult content.