Fair use is defined in Section 107 of the Copyright Act of 1976, which I’ll quote verbatim below:
Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include—
1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
2. the nature of the copyrighted work;
3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
4. the effect of the use upon the potential market for or value of the copyrighted work.
The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.
Fair use is a balancing test which requires weighing all four factors. In practice, factors (4) and (1) tend to be the most important, so I’ll discuss those first. Factor (2) tends to be the least important, and I’ll briefly discuss it afterwards. Factor (3) is somewhat technical to answer in full generality, so I’ll discuss it last.
None of the four factors seem to weigh in favor of ChatGPT being a fair use of its training data. That being said, none of the arguments here are fundamentally specific to ChatGPT either, and similar arguments could be made for many generative AI products in a wide variety of domains.
Suchir Balaji
This is an interesting analysis by a former OpenAI researcher who left the company and publicly spoke out against its business practices, going as far as giving an interview to The New York Times. The Times sued OpenAI (and Microsoft) for copyright infringement last year, so naturally it would want to distribute Balaji’s views. Moreover, in November he became a potential witness in that lawsuit after the Times’ attorneys named him in court filings as having material helpful to their case, along with at least twelve other people, including past or present OpenAI employees.