Data workers and AI: moving from 'invisibility' to better working conditions

1 Apr 2025 - Who are the data workers behind AI, what are their working conditions, and how can their working conditions be improved? In a webinar hosted by WageIndicator, the lens was turned on the workers who make AI work, and who still lack a fair deal - and the attention they need.

AI data workers paid in beans, asked to review surgery videos with no medical experience, or forced to listen to snippets of conversations on Alexa over and over again.

These are just a few examples of what working for AI looks like. Far from being truly 'artificial', AI relies on the work of a hidden global workforce, annotating data, labelling images and training systems to work perfectly.

When it comes to the labour rights of AI data workers, there are a number of issues that remain unaddressed and prevent them from being visible. Who are these workers? What are they looking for when they apply for AI jobs? How do they contribute to the AI supply chain? What can be done to improve their working conditions?

To answer these questions, WageIndicator hosted the online event The Ghost Workers: Do You Know Who's Behind Your AI? Some takeaways from the event are below. Martijn Arets, independent expert in the gig economy and member of the WageIndicator gig team, organised and moderated the event.

 

Antonio Casilli: A world map of data work

When data workers are based in Madagascar, and the AI company is in Europe

Where are AI data workers most commonly located, and how is the global AI supply chain distributed across the globe?

To set the scene: Antonio Casilli, author of 'Waiting for Robots' and co-founder of DiPLab, has long researched data workers from a global perspective to get a fuller picture of the AI supply chain and of what data workers do.

Let's take smart speakers like Alexa and Cortana as an example: "During our interviews, we heard the story of a data worker who was paid very little to listen to snippets of conversations on Cortana and check that Cortana had transcribed them correctly. This is just one example of how simple, but also repetitive, these activities can be for data workers." At the same time, such activities can be very challenging, as the worker receives a lot of stimulus in a short period of time.

And as Casilli explained, "It gets more complicated when we look at the supply chain. We may have a company based in the US that subcontracts the recruitment of these data workers to a Chinese platform. The Chinese platform subcontracts to a Japanese platform, and the Japanese platform subcontracts to a Spanish platform, which ends up recruiting people in France. The people in France produce the data and they send it back to the US."

The geographical distribution of the AI supply chain creates many complex situations to manage.

“Another case study we looked at was from a company that claims to make smart surveillance cameras that are used in European supermarkets. These cameras are very efficient because they can not only detect people, but also interpret their behaviour. What kind of complex algorithm would you say is behind this AI solution? Actually, a group of 120 people in Antananarivo, Madagascar, work day and night in a shared house that has been turned into a data factory. All young, well-educated workers who could have found better jobs in other situations, and who were aware of the fact that they were producing a kind of AI. They were AI.”

Most AI companies are officially based in the Global North, but when it comes to where the data is produced, the vast majority is produced in countries such as Indonesia, the Philippines, India, Nepal and Bangladesh. “And that data circulates through the UK and eventually ends up in the US, where it’s used to create the AI solutions we all love. In the case of the Madagascan workers, the company that supposedly makes the smart cameras sends data from Europe to Madagascar, and the people in Madagascar annotate the data, or at least look at it, but there is no actual annotation. They are the algorithm. What’s worse, they are paid the equivalent of 100 euros in local currency every month. And some of them are paid in food: one kilogram of sugar, one kilogram of beans, five kilograms of rice, depending on their performance.”

This is one of the main flows that Casilli and his team, and other research groups, have been investigating. Others include a remarkable internal flow in China, linking the coastal towns with inland ones where GDP is lower and the data work is performed; the Latin American flow to the US and Europe; and, finally, Africa.

From where they are to who they are: Data Labellers Association and Claartje ter Hoeven for a realistic portrait of data workers around the world

Who are data workers? They are workers with very different needs, aspirations, and backgrounds, far from the indistinguishable workforce they appear to be because of their invisibility and their status as 'ghost workers'.

To shed light on who data workers are and what kind of issues they face, WageIndicator invited Joan Kinyua, President of the Data Labellers Association (DLA) and a former data annotator; Ephantus Kanyugi, Vice President of the DLA; and Claartje ter Hoeven, professor at Utrecht University, who is working on the ERC project 'The Ghostworker's Well-Being: An Integrated Framework' on the conditions and well-being of AI data workers in Europe.

"This is not an emerging field," Joan Kinyua insisted. "I started working as a data labeler in 2017, and my job was basically to work on the datasets we received from Europe for data annotation. I can say for sure that I participated in the project that Antonio Casilli mentioned, because we watched videos of someone picking something from a supermarket, and then we had to label the same thing. How they put it in the cart or how they returned it, that was all up to us to annotate”.

When it was still an emerging field, working with data seemed like a great opportunity to many: "Data work is a very informal job, that doesn't require any form of education, and companies often claim to pay the same salary as a house manager or a nanny. I had a lot of faith in data labelling, I always believed that pay was going to improve, workers' rights were going to improve, but five years later things were only getting worse, that's why we started the Data Labellers Association".

The DLA continues its mission to advocate for fair treatment, mental health support, and dignified pay for data workers, who are recognised as critical contributors to the advancement of AI.

Awareness is key: "It is exploitation, nothing less, especially for those workers who don't know they're being exploited. We learned from our experience, gathered knowledge from other people in the field, and started the association.” But many other data workers are not fortunate enough to benefit from such awareness.

Ephantus Kanyugi, Vice President of the association, also brought his experience to the table: "I've worked on the same project as Joan, but also on projects to label corpses, pornography, and you have to take it all because there is no structured work, no fixed hours, no overtime pay. Workers often get used to it, but through conversations and interactions we started to understand that there was something that could be done about it through policy and advocacy”.

It’s all about word of mouth: “In the first week of the association's existence, without any active campaigns, we managed to attract around 400 members. It's a very closed community. So far, we are in the process of starting campaigns, but at the moment we have about 700 members”.

For their part, by interviewing more than 5,000 data workers in 27 EU countries on platforms such as Amazon Mechanical Turk, Microworkers, Clickworker and Appen, Claartje ter Hoeven and her team were able to shed light on different categories or portraits of data workers.

“There's a large group of workers who go into this work for a while, see what it's like and then leave. That's the majority. For another group, working with data is a second job that they do while benefiting from the social security of their main job. Some of them try to make a living from this work, but it's a very small percentage, because in most of the research countries, data work is almost always below the poverty line. Finally, there are those who enjoy working with data and do it for fun.”

Some of the key findings of this research include how data workers feel about their jobs, as explained by Claartje ter Hoeven: “Some think that working alone at home, without colleagues or human bosses, is not so bad because they have already been through a lot in their lives and working with data makes things easier. At the same time, when we met some of them in Rotterdam to make a documentary, they actually enjoyed sitting together for two days and sharing their experiences, even the strangest ones, like sharing their body measurements or taking pictures of themselves in summer clothes”.

Claartje ter Hoeven (Utrecht University) and her research team reveal the hidden world of Europe's data or ghost workers. She spoke to Martijn Arets about her project for WageIndicator's The Gig Work Podcast. Listen to the episode Ghostwork: the invisible world of work behind AI.

 

Data workers also have to deal with the fact that "the law is usually very slow to catch up, and a lot of companies try to take advantage of that," Joan Kinyua added during the roundtable. Tasks are often becoming more complicated and workers are being given more detailed and 'micro' tasks than before.

"So the level of work before was just drag this, add that, draw a person, draw a tree, draw a house, that's it. But then it goes into detail. Is this person working? Is this person a child? Is this person an adult? Are they sniffing? If it is a car, is the indicator on the left or right? Is it on? Is it off? In every frame. From something you can do in two hours to something you need 20 hours to do".

Ephantus Kanyugi added: "We usually work 20 hours, 18 to 20 hours, six days a week, and we get paid like 10 or 20 dollars for the whole week." "And after all those hours, you can't even get paid if you don't pass the quality check," concluded Joan.

Sometimes projects are not what they look like when companies advertise ("They claimed to have a job offer for data annotation, but they asked for pictures of children doing cartwheels or like very queer positions"), or conditions change depending on the country where workers apply.

 

Claartje ter Hoeven shared a conversation with a worker “who was based in Africa, but somehow with a VPN he made it look like he was based in Europe. The tasks were the same, the difference was the pay”.

It's no secret that the amount of work required to get paid can differ considerably depending on where the workers are located: as Ephantus Kanyugi explained, "When people in Africa have access to the exact same work, which is the exact same thing, we find that you have to hit higher targets, so it's just that we don't have time off. Once I had to hit a 95% target to get paid, but that was not the same for someone overseas. And then someone overseas gets paid, let's say, $13 an hour, while I might only get, let's say, $20 after a whole week. That's fraud."

 

High levels of surveillance and poor mental health are other factors contributing to worsening working conditions.

As underlined by Joan Kinyua, “They usually like the monitoring applications installed in our cameras because they just want to make sure you haven't given your account to anyone.”

And a healthy mental space is usually lacking: “Most of us live in one-room houses, and when you're given a project with porn content, you might be forced to work on that with your children or your spouse around".

 

It’s (also) a question of data quality

Why might this affect the quality of AI? What are the liability implications for AI companies? 

Data work is not really an emerging sector. Rather than asking for new legislation, it might make sense to enforce existing laws.

"Every worker deserves decent work," continued Claartje ter Hoeven. "Decent work should be provided on the basis of the International Labour Organization's five different pillars of decent work. And I don't think this kind of work meets any of them. First and foremost for workers' rights, but also for the quality of data work, because we just know from organisational psychology and organisational studies that if people don't have decent working conditions, the quality of work will suffer."

On the quality of data work: "As one of the chat participants said, there are people who have to annotate medical data without any medical knowledge. If we're talking about digitising records, I don't see that as a big problem. But if you had to annotate clips from doctors' surgeries, that could be problematic for the products that are delivered. And that could be a problem in terms of liability, because the data won't match the targets”.

Antonio Casilli added a reflection on the quality of data work and corporate responsibility: "The reason we don't see these workers slacking off on these tasks is that the platforms are organised in such a way that if the workers don't perform to the best of their abilities and skills, their accuracy rate drops and their tasks are denied, which means they are denied compensation. Data workers can now evaluate surgeries without any medical knowledge, because that's something that's now possible. Because AI training is based on large amounts of data, and companies want to have a normal distribution of answers to the tasks. If they have a normal distribution, they can focus on the average and exclude the rest. Which is a risk in itself. That quality is difficult for companies themselves to judge. The only thing they can do is train and retrain and retrain and retrain. So this cycle never stops. This work will never disappear if the situation stays like this."

 

It's not all new: the parallel with some 'traditional' sectors such as garments in Indonesia

As data work is not an emerging field, it shares some characteristics with other 'traditional' sectors where workers lack social security and fair work.

Lydia Hamid, Project Coordinator at Gajimu (WageIndicator's Indonesia team), gave the example of the garment sector in Indonesia, "a really fragmented landscape, with lots of subcontractors, where workers don't know who is responsible for who and who to go to if they have a question or a complaint."

Over the past six years, Lydia and her team have worked with union partners on a data academy and Makin Terang projects focusing on data transparency to improve working conditions in the sector.

What can we learn from other sectors? Are there any similarities?

"The garment sector is a very important part of our economy: it's the second largest contributor to our GDP. In 2023, we had around 19 million workers in the manufacturing sector, and of those, around 20% were employed in the textile, clothing and footwear sector."

These findings present a number of challenges: "This industry in Indonesia consists of a mix of higher tier factories and also lower tier factories, many of which are located in rural areas. Indonesia has a different minimum wage for each region, so these lower tier factories tend to prioritise cost reduction and also high volume production. This includes hiring workers from rural areas to pay them lower wages, and they can also relocate their factories to take advantage of cheaper labour in other regions, similar to what AI companies from the Global North are doing. What's more, many of them operate as sub-contractors for the larger and higher-tier factories, and workers don't know where to go with complaints about their working conditions. They remain largely disconnected from the larger supply chain, and labour practices often go unchecked. Poor working conditions go hand in hand with a lack of compliance with labour standards, including safety measures and maternity protection."

Lydia also mentioned the case of home-based workers: "They make a significant contribution to garment production but they often go unnoticed and they face similar challenges as data workers like underpayment and lack of job security. They may even involve their family, such as children, in the work because of the need to meet tight deadlines".

What are the goals of the project to bring improvements in this context? "We want to increase the transparency of the data and the supply chain by targeting more than 40 percent of factories in the Open Apparel Registry, now known as Open Supply Hub. The project aims to ensure that garment workers have fair access to quality information about their labour."

Martijn Arets adds a conclusion: “The case of the garment industry showed that there are lots of similarities between industries and that the data work sector is not a unique sector that needs to wait for new regulations: enforcing existing labour law would make a big difference. There are some differences to mention: in the garment industry unions were fragmented, but well organised within the sector, whereas in the data work sector unions are not active. This could be because in many countries unions are ‘politicised’ and follow the tech-optimist frame, where the government is happy to welcome big tech companies to their countries without asking questions about the quality of the jobs they provide.”

 

Solutions: Contracts to honour, but also corporate liability, collective action, emotional support, and new tools

Are there any solutions on the horizon? The speakers were asked to answer this question in the final part of the webinar, moderated by Fiona Dragstra.

"The first step is to make these workers aware of their rights," she said, opening the debate. "This is not a new area, and labour laws apply to people in these countries, especially for what we've discussed today: if you are a worker with a contract or even a different type of contract in Madagascar, the laws of Madagascar apply to you. So the labour laws should apply”.

Much of the problem is related to the variety of contracts that are available, even in the same countries. 

“The contracts that these workers have are very different," explains Antonio Casilli. "Sometimes we meet people who have a contract for one month. Sometimes, in the same country, in the same company, we meet people who are hired for two days. One day, five days. And sometimes we meet a lot of people who are paid by the task. This is probably the case that has been studied the most, because it started with Mechanical Turk, which was probably the first platform. Platforms work with an anything-goes philosophy: basically, they will adopt any kind of contractual arrangement that is most advantageous to them.

So sometimes we can use the law, if we have contracts, to get the contracts honoured; sometimes we have to find more, let's say, innovative ways of tackling the problem; and sometimes we have to come up with new repertoires of contention, as we call them in sociology, which are the tools that people engaged in collective action have at their disposal.”

 

Corporate liability is another crucial aspect and very difficult to implement because, as one of the participants stressed, platforms have plenty of people available to replace those who leave. But they remain liable for these workers.

Antonio Casilli continued: “We need to make sure that AI companies are accountable for what happens all along the supply chain. What happens in Kenya is relevant to someone in Texas or California. How much or how little people are paid in Indonesia is relevant to the quality of the AI being produced somewhere in Amsterdam. We have some tools, particularly in Europe. We have the GDPR, which is about regulating private data. We have the Platform Work Directive, which was passed last year and can be applied to data workers. And then we have, I think most importantly, the Due Diligence Directive, which was also passed last year. And this is basically a directive that says that if you are a European company of a certain size and you want to outsource a certain business process, you are responsible for the human conditions and the working conditions and the environmental conditions of what happens all along the supply chain.”

 

Collective action is still the way forward for data workers to stay together, expose unfair treatment and get involved in policy and advocacy, as the Data Labellers Association proves. Joan Kinyua and Ephantus Kanyugi shared some of the key steps that have been taken: "We have been very fortunate to meet some key stakeholders in the industry who are putting us in touch with stakeholders in government. This will allow us to present our case to policy makers and hopefully enable regulations that will protect workers”.

The other aspect that supports data workers is community building. Practical solutions can include "offices where people can go and work in a community for a few days and share what they do. We're so used to working in our homes and being isolated, we've never really met each other, so that kind of interaction and bringing people together could actually help.”

Also in the audience was Krista Polovski, who shared her experience as a data worker, active in worker advocacy, and an organiser of Turkopticon, which was born in 2019 as a review website to give Amazon Mechanical Turk workers a space to share information about bad requesters and tasks. It has since evolved into a worker-led nonprofit that advocates for workers and their rights.

"We are part of a coalition that is working really hard on policy. Many data workers are independent contractors, they don't have an office that they go to, they don't have health insurance, but they think they're lucky that they get even the little bit of work that they get. And so trying to get the workers to be motivated to want to fight for better treatment is sometimes a very big struggle."

“We have managed to get some contacts within Amazon to encourage them to look at banned and suspended accounts and keep the lines of communication open.”

Last but not least, the mental health of these workers needs to be taken into account. Joan continued: “Due to the nature of work, where people work very long hours, sitting at a desk, without any interaction with other people, workers develop mental problems, leading to a lot of anxiety and depression. Also, imagine looking at pornography for 20 hours a day for almost a week...". Mental health is crucial and "we are planning workshops on this".

 

Key Takeaways

  1. Data workers' labour issues are often overlooked, but the global reach of the supply chain, with data workers in Madagascar working for European companies, and worsening working conditions make this debate more necessary than ever, especially since this is a long-lived sector.
  2. As in other sectors, data workers have very different needs, aspirations, and backgrounds. Common issues include low pay, tight deadlines, high levels of surveillance and poor mental health due to isolation.
  3. Because the sector is long-lived, its workers don't need to be treated differently just because the work involves AI; they need to see their working conditions and pay improve with the support of legislation, advocacy, and aware clients and media.
  4. It's also time to start talking more and more about data work and pay. Data workers may have different contracts, but that calls for deeper conversations. We should not forget about pay just because they are 'atypical' workers.
  5. Community building can help raise awareness and make data workers feel less alone, but it should go hand in hand with law enforcement and new tools to protect and support them.

 

The next WageIndicator webinar in the series on the platform economy and the impact of AI on work is scheduled for the 31st of October, 2025: register for Bargaining with the Algorithm: The Future of Work and Collective Agreements, the online event to better understand the role of collective agreements in a world of AI.

 
