The Role of Copyright Law in Text and Data Mining Research


Evaluations of the copyright ecosystem on the African continent have identified a perceived imbalance: researchers face barriers in accessing materials protected by copyright.1 It has been argued that African copyright regimes do not accommodate the public interest because they do not support research by various stakeholders, particularly in the new era of Artificial Intelligence research. For this reason, one African scholar has described African copyright regimes as “not fit for purpose”.2 Most African copyright laws provide a closed list of copyright exceptions, none of which applies specifically to Text and Data Mining research.3 This article defines Text and Data Mining (TDM) research, outlines its benefits, and examines the role of copyright in TDM research in the Kenyan context vis-à-vis the approaches of other countries.

Defining Text and Data Mining (TDM) Research

Text and Data Mining research involves the “automated processing (machine reading) of large volumes of text and data to uncover new knowledge or insights”.4 It usually requires copying large volumes of material, extracting the relevant data and recombining it to identify patterns.5 It is worth noting that the research practices used for TDM are “constantly evolving” owing to widespread access to massive networked computing power and the ever-increasing availability of digital data sets.6 TDM is considered promising because its applications cut across many heterogeneous fields.7

The starting point of the TDM process is accessing the information to be mined, which may include “published or unpublished articles, books, pictures, web pages, data sets, etc.”8 The information is then copied into a “corpus” that is needed for the research.9 Next, the information is mined through analytical processing that relies on computers and algorithms.10 Lastly, the results of the mining process are distributed, which involves disclosing all or part of the information used.11
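To make these four stages concrete, and to show where copying takes place, the following is a minimal Python sketch under stated assumptions: the two in-memory "sources" and the functions build_corpus, mine and report are hypothetical illustrations, and simple word counting stands in for the far more sophisticated analytical processing that real TDM projects use.

```python
import re
from collections import Counter

# Stage 1 (access): hypothetical in-memory sources standing in for the
# articles, web pages or data sets a researcher has access to.
SOURCES = {
    "article-a": "Data mining uncovers patterns in text. Text mining is machine reading.",
    "article-b": "Machine reading of large volumes of text can reveal new insights.",
}


def build_corpus(sources):
    """Stage 2 (copy): reproduce every source in full into a research corpus."""
    return [text for text in sources.values()]


def mine(corpus, top_n=3):
    """Stage 3 (mine): count word frequencies across the whole corpus."""
    counts = Counter()
    for document in corpus:
        counts.update(re.findall(r"[a-z]+", document.lower()))
    return counts.most_common(top_n)


def report(results):
    """Stage 4 (distribute): share derived statistics rather than the works."""
    return [f"{term}: {count}" for term, count in results]


if __name__ == "__main__":
    for line in report(mine(build_corpus(SOURCES))):
        print(line)
```

In this sketch, stage 2 copies each work in full into the corpus, which is the step most likely to engage the right of reproduction, while stage 4 shares only derived statistics. Real projects may also disclose parts of the underlying material when distributing their results.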

The Uses and Benefits of TDM

TDM is used in both commercial and non-commercial contexts. TDM projects such as Blue Dot helped detect the Coronavirus outbreak early and supported the advancement of vaccine research.12 Examples from non-commercial settings include the biomedical field, where TDM has “increased efficiencies and the speed of biomedical discovery”.13 Another example is the use of TDM in criminology to help law enforcement officers detect crimes and identify their correlation with offenders.14 In commercial settings, TDM is used in the banking sector to identify and calculate risk in processes such as credit-risk assessment,15 and in the marketing industry to track how information spreads and to ‘improve content relevance’ for various target audiences.16 However, despite these benefits, there are barriers that ‘inhibit’ the development and application of TDM. These barriers include legal uncertainty about how TDM is treated under copyright law.

The Role of Copyright in Text and Data Mining (TDM) Research

As mentioned earlier, TDM research involves an element of copying; hence, it may infringe the rights of reproduction and distribution.17 The text or data used in TDM research often falls under intellectual property rights (IPRs) protection, including both copyright and sui generis database rights.18 However, some of this material may fall outside the scope of IPRs because it lacks originality or is in the public domain.19 Therefore, making a copy of a protected work in the course of TDM research may “trigger copyright infringement.”20 Infringement here may range from unauthorised “reproduction, translation, adaption and arrangement and other alteration” of protected works.21 Thus, using TDM technology for research usually requires either the express consent of the copyright owners or an exception in copyright law that permits TDM.

TDM and Copyright Exceptions

Countries around the world have differing copyright regimes governing research in general and TDM research specifically. Some countries have reformed their copyright laws to include specific exceptions that enable TDM, although the extent to which these exceptions enable TDM varies. Most countries have research exceptions that permit only restricted uses of limited kinds of works by defined users, which limits their application to TDM.22 We highlight these exceptions below.

  1. Specific Copyright Exceptions for TDM

Some countries have copyright exceptions that relate specifically to TDM; key examples are the copyright provisions of Japan and Singapore. Japan’s exception is broader, as it covers “exploitation for using the work in a data analysis.”23 The exception is not limited to non-commercial purposes, and it applies to both lawfully and unlawfully published works.24 Singapore takes a slightly more restrictive approach, as the exception is limited by purpose, covering “copying or communicating for computational data analysis.”25 It applies to the reproduction and communication rights for any use, including commercial use; however, it is restricted to lawfully accessed works.26 In Europe, European Union Member States are permitted to adopt an open exception for research.27 Estonia, for example, permits the “processing of an object of rights for the purposes of text and data mining and provided that such use does not have a commercial objective”.28 Germany similarly has a specific TDM exception for scientific research that is restricted to non-commercial use.29 Notably, copyright exceptions applicable to TDM research are often restricted rather than fully open.

  2. TDM and Fair Use vs Fair Dealing

Copyright laws also contain general exceptions that permit reproduction and other forms of alteration. These include the fair use and fair dealing exceptions, which can be applied to TDM research. The United States30 is an example of a country that adopts the fair use approach. The fair use doctrine is considered an open, general and flexible exception because it can apply to any use, work or user, and for any purpose, subject to a four-factor proportionality test.31 The test considers the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used and, lastly, the effect of the use upon the potential market.32 Such an approach helps create a conducive environment for TDM because the exception is open and general, allowing TDM to fit within the statutory interpretation of what constitutes fair use.

In Kenya, fair dealing is provided for under section 26 of the Copyright Act. Fair dealing is an exception to copyright infringement that permits specific uses of copyright-protected works without authorisation from the copyright owner. The closed list of exempted uses is found in the Second Schedule to the Copyright Act of Kenya, 2001 (as amended in 2022). Notably, the Act provides an exception for the purposes of “scientific research or private use”.33 Whether TDM falls under this exemption is, however, unclear, because the exception is ambiguous and narrowly constructed: the Act does not make clear what constitutes “scientific research” or “private use”. This lack of clarity gives rights holders extensive control over the use of their works while limiting the dissemination of information that would serve the public interest.34 Consequently, TDM researchers are constrained by an unclear copyright regime, which may hinder their ability to inform public policy efficiently.


This article has outlined the role and use of TDM research in the modern day, specifically the benefits that arise from the process and the varying copyright regimes applicable to TDM research. This discussion falls in line with the “Right to Research in International Copyright Law” discourse among scholars, who aim to define and implement rights to research within international copyright law and policy.35 CIPIT is currently conducting a study to analyse the role of copyright within Kenya in relation to promoting a conducive environment for TDM research. We look forward to sharing our findings on this research soon.

1 Oriakhogba D O, The Right to Research in Africa: Making African Copyright Whole [2022] PIJIP Research Paper no 78, 1. <> last accessed 22nd November 2022.

2 ibid 5.

3 ibid.

4 Rosati E, Copyright as an obstacle or an enabler? A European perspective on text and data mining and its role in the development of AI creativity [2019] Asia Pacific Law Review 27:2, 200.

5 UK Intellectual Property Office, Supporting Document: Text Mining and Data Analytics in Call for Evidence Responses [2011] <> last accessed 23rd October 2022.

6 Rosati (n4) 198.

7 ibid 201.

8 Flynn S, Schirru L, Palmedo M & Izquierdo A, Research Exceptions in Comparative [2022] PIJIP/TLS Research Paper no 75, 6.

9 ibid.

10 ibid.

11 ibid.

12 Prosser M, ‘How AI Helped Predict the Coronavirus Outbreak before It Happened’, Singularity Hub 5 (2020); Knight W, Researchers Will Deploy AI to Better Understand Coronavirus (Wired, 2020); Blue Dot, <> last accessed 15th September 2022.

13 Rosati (n6).

14 ibid; Kenyan case studies include Wainana S, Karomo J, Kyalo R & Mutai N, Using Data Mining Techniques and R Software to Analyze Crime Data in Kenya [2020] IJDSA Vol 6 no 1.

15 ibid; Kenyan case studies include Miller M & Nyauncho E, Effective Data Mining and Analysis for SME Banking [2015] FSD.

16 Content and Influencer Marketing <> last accessed 25th October 2022.

17 Geiger C, Frosio G & Bulayenko O, The Exception for Text and Data Mining (TDM) in the Proposed Directive on Copyright in the Digital Single Market – Legal Aspects [2018] CIIPS Research Paper No 2018-02, 6.

18 ibid.

19 ibid.

20 ibid 7.

21 ibid.

22 Flynn (n8) 4.

23 Copyright Act, 1970 (Act No. 48 of May 6, 1970, as amended up to Act No. 72 of July 13, 2018) (Japan): Article 30-4.

24 Flynn (n22) 29.

25 ibid; Copyright Act 2021 (Revised Edition 2020, Act No. 22 of 2021) (Singapore).

26 ibid 29-30.

27 Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society, 34.

28 Estonia Copyright Act, 2017 (consolidated text of February 1, 2017), 19.

29 Germany Act on Copyright and Related Rights, 1965 (Copyright Act, as amended up to Act of September 1, 2017) “Section 60d. Text and data mining. (1) In order to enable the automatic analysis of large numbers of works (source material) for scientific research, it shall be permissible”

30 17 U.S. Code § 107: Limitations on exclusive rights: Fair use.

31 Flynn (n24) 5.

32 ibid.

33 Copyright Act Cap 130, Sec 26(3), Second Schedule 1(a) (Kenya).

34 Armstrong C, De Beer J, Kawooya D, Prabhala A & Schonwetter T, Access to Knowledge in Africa: The Role of Copyright (UCT Press, IDRC 2010).

35 Program on Information Justice and Intellectual Property, ‘Right to Research in International Copyright Law’, < > last accessed 22nd November 2022.
