Data from the South, AI in the North: An Uneven Distribution of Value
- Natasha Karanja |
- March 27, 2025 |
- Artificial Intelligence
Introduction
Artificial intelligence (AI) can be viewed as both a global commodity and an emerging supply chain, comprising a variety of interdependent components for its functionality.1 Akin to traditional goods such as clothing or footwear, AI systems necessitate natural resources, labour, infrastructure, capital, and logistics for their production.2 However, the activities surrounding AI significantly diverge from conventional manufacturing value chains.3 This divergence stems from the technological complexities and the unique nature of digital inputs that AI demands.4 Comprehending the foundations of these production practices is crucial for understanding the multifaceted nature of AI’s supply chain.
Microsoft, Amazon, Alphabet, and Meta, key players among technology multinational enterprises (MNEs), are deeply involved in Large Language Model (LLM) development.5 LLM-based products such as ChatGPT and Gemini perform tasks that involve analytical and evaluative skills. This capability depends on human feedback loops, which involve supervised fine-tuning and reinforcement learning from human feedback.6 These processes rely on digital tasks such as image tagging, data annotation, content moderation and audio transcription.7 The tasks involve human labour outsourced to external entities, frequently labelled as data training contractors or data processing companies.8 These labourers are the “AI data workers” who work behind the scenes of AI development, and their performance of these digital tasks powers AI products.9
A geographical mapping shows that most requesters of AI data work are located in advanced economies in the Global North (home to many of the tech MNEs), whilst the AI data workers are situated in the Global South.10 This is evident in examples such as OpenAI’s collaboration with Sama, a data training firm based in Kenya, for the creation of ChatGPT.11 Building on this observation, it is crucial to acknowledge that these patterns of outsourcing and data contracting often lead to the exploitation of workers, effectively commodifying their labour.12 This dynamic, in which data requesters in advanced economies benefit from the labour of AI data workers in the Global South, highlights a concerning trend in the AI supply chain.13 For example, reports have highlighted difficult working conditions at AI data processing companies such as Sama.14 These reports allege that Kenyan data annotators and content moderators were paid substandard wages, were not paid for overtime work, experienced significant trauma and were “victimised” for attempting to form a trade union.15
This blog lays the foundation for a research study that CIPIT is undertaking to assess how adaptable and effective national and international policies are in regulating working conditions and ensuring ethical labour practices within the AI data supply chain, particularly in the Global South. It is therefore an introductory, summarised overview of the AI supply chain landscape, with a focus on the emergence of digital platforms and labour commodification within the Global South.
The AI Supply Chain: A New Global Commodity
A robust and effective supply chain, incorporating research, data collection, algorithm creation, rigorous testing, and scalable infrastructure, is essential to the knowledge-driven production of AI solutions prior to their deployment.16 Within this supply chain, AI relies on humans not only to design and implement highly advanced algorithms (qualified AI software engineers) but also, at a foundational level, to “produce, enrich and curate” data (“AI data workers” or “micro workers”).17 These workers’ roles are diverse and include, but are not limited to, data annotators, content moderators, and image taggers.18 Their tasks contribute significantly to the AI production pipeline, where they facilitate the creation of the high-quality training datasets that are critical for the performance of AI models.19 Such tasks include spotting unsuitable online content, categorizing images, transcribing text, and translating segments of text.20 These tasks produce raw data that serves as the foundation for AI technologies, because the resulting datasets are meant to be extensive and to capture the complexities of real-world scenarios.21 For instance, when training AI models for image recognition or natural language processing, the data utilized must reflect the richness of the human experience, thus requiring continuous input from a multitude of AI data workers.22 Additionally, tasks such as data annotation label large volumes of data, allowing machine learning algorithms to learn from the ‘human-like experiences’ that drive the operational capabilities of AI systems.23
The human-driven component of AI training is the backbone of the development process: the nuanced understanding and judgment that AI data workers provide ensure that AI systems are capable of performing complex interactions and analysis that reflect human cognition.24 Their data contributions therefore directly affect the outcomes of AI applications, as poorly annotated data can lead to flawed AI outputs.
Geographical Disparities: A North-South Divide in the AI Labor Force
When assessing the capacities of AI companies to employ AI data workers, we see that there is a gap in resources and expertise.25 AI data work necessitates large-scale operations, as it is highly labour-intensive and costly.26 Given this, the majority of AI companies outsource AI data work through channels such as business process outsourcing (BPO) companies and digital labour platforms.27 This involves the use of third-party entities, also known as data training contractors or data processing companies, to handle and manage the data work of AI development.28 AI data work is undervalued within the AI pipeline, and this is mirrored in the labour process, where AI data workers’ employment relations and working conditions are below par.29 Working conditions for AI data workers are particularly alarming, with statistics revealing low wages and inadequate protections.30 Reports indicate that many microworkers, who play crucial roles in data annotation and training AI systems, earn between $1 and $3 per hour, far below living wage standards in their respective countries.31 For instance, a research study highlights that microworkers on digital platforms receive minimal compensation for extensive hours of labour, which greatly undermines their economic security.32 Furthermore, a significant proportion of these workers are not afforded basic benefits such as health insurance, job security, or opportunities for advancement, leaving them in a perpetual state of vulnerability.33 The geographical mapping clarifies the labour dynamics further, as most AI data workers are situated in the Global South while the BPOs and digital platforms are in the Global North.34
This geographical disparity mimics power dynamics that reproduce historical inequalities.35 It creates an environment in which labour structures can exploit disparities rooted in corporate interests and geographical influence, fostering a digital economy rife with inequalities that require urgent ethical and policy interventions to dismantle.
Conclusion
This blog has summarised insights into AI data workers and their role within the AI development pipeline. We see the AI industry’s reliance on precarious forms of employment involving complex corporate structures, with AI companies outsourcing their data work to third-party entities. This obscures who is responsible for protecting the rights of the workers who are vital to AI development. Recognising this gap, the upcoming research study will explore in depth the working conditions and wage disparities that characterize this field, the ethical implications of sustaining such a labour structure, and how policy can enable collaborative efforts to ensure fair labour practices that prioritize the well-being of AI data workers and equitable recognition of their contributions to AI development.
1 Crawford K, Atlas of AI: Power, Politics and the Planetary Costs of Artificial Intelligence (Yale University Press 2021) 8; “AI is both embodied and material”.
2 Larsson A & Hatzigeorgiou A, [Forthcoming] The Future of Labour: How Disruption, Technology and Practice will Change the Way We Work (1st edn, Routledge 2025); Anwar M A, Value Chains of AI: Data training firms, platforms, and workers, 4.
3 ibid.
4 ibid.
5 ibid, 5.
6 Liu H, Sferrazza C & Abbeel P, Chain of Hindsight Aligns Language Models with Feedback [2023] arXiv:2302.02676v8 [cs.LG].
7 Berg J, Furrer M, Harmon E, Rani U & Silberman M S, Digital Labour Platforms and the Future of Work: Towards decent work in the online world [2018] International Labour Organisation <https://www.ilo.org/sites/default/files/wcmsp5/groups/public/%40dgreports/%40dcomm/%40publ/documents/publication/wcms_645337.pdf> 7.
8 Muldoon J, Cant C, Graham M & Spilda U F, The Poverty of Ethical AI: Impact Sourcing and AI Supply Chains [2023] AI & Society.
9 ibid.
10 Posada J, The Coloniality of Data Work: Power and Inequality in Outsourced Data Production for Machine Learning (DPhil Thesis, University of Toronto 2022) 65.
11 Perrigo B, Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic [2023] <https://time.com/6247678/openai-chatgpt-kenya-workers> last accessed 15th March 2025.
12 Posada (n10).
13 ibid.
14 Perrigo (n11); OpenAI had outsourced data annotation to workers in Kenya, who were paid less than $2 per hour to label content that included hate speech and images of violence and sexual abuse … GPAI 2023, Fairwork AI Ratings: The Workers Behind AI at Sama [2023] Global Partnership on AI, Oxford, United Kingdom, 12; workers noted that they worked as much as 60 hours or more per week, despite Kenyan Labour Law limiting the working week to a maximum of 58 hours.
15 ibid.
16 Castillo D P A, Artificial Intelligence, labour and society; Casilli A A, End-to-end ethical AI. Taking into account the social and natural environment of automation (ETUI, Brussels 2024) 85.
17 Tubaro P, Casilli A A & Coville M, The trainer, the verifier, the imitator: Three ways in which human platform workers support artificial intelligence [2020] Big Data & Society, 7(1).
18 Miceli M & Posada J, The Data-Production Dispositif [2022] Proc. ACM Hum.-Comput. Interact., Vol. 6, No. CSCW2, Article 460, 4; data work outsourced through crowdsourcing platforms and specialized business process outsourcing (BPO) companies.
19 ibid; Data work is the human labour necessary for data production, it involves the collection, curation, classification, labelling, and verification of data.
20 Anwar M A & Graham M, The Digital Continent: Placing Africa in Planetary Networks of Work; Digital Taylorism, Freedom, Flexibility, Precarity and Vulnerability (Oxford Academic 2022) 109.
21 Tubaro (n17).
22 Liu (n6).
23 Wang D, Prabhat S & Sambasivan N, Whose AI Dream? In search of the aspiration in data annotation [2022] CHI ’22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, Article No 582, 2.
24 Muldoon J, Cant C, Wu B & Graham M, A typology of artificial intelligence data work [2024] Big Data & Society, 11(1) 4.
25 ibid, 1.
26 ibid.
27 ibid.
28 Posada (n10).
29 ibid.
30 Ngene G, Blanket condemnation of Africa’s microwork industry could hurt it more than grow it: unpacking a recent critique [2022] Jobtech Alliance <https://jobtechalliance.com/blanket-condemnation-of-africas-microwork-industry-could-hurt-it-more-than-grow-it-unpacking-a-recent-critique/> last accessed 18th March 2025; While workers in Kenya earn as low as $1.46 per hour, their counterparts in North America or Europe would be earning over $10 per hour.
31 ibid.
32 Berg (n7).
33 ibid, 18.
34 Castillo (n16) 87.
35 ibid.