Publish, or Perish?

When you think about the “copyright wars,” you typically think of the major record companies and Hollywood studios battling digital platform providers and developers over alleged “piracy” and the purported “value gap.” But some of the hottest engagements these days are happening in the world of publishing, including book, news media and scholarly journal publishing. And the roster of antagonists includes governments, courts, libraries and public advocates.

The U.S. Supreme Court this week handed down a ruling in a closely watched case concerning an annotated compendium of the Georgia state laws compiled by LexisNexis on behalf of the state legislature.

While the statutes themselves are in the public domain under a long-standing legal principle known as the “government edicts doctrine,” the legislature maintained that the annotations it commissioned (at no direct cost to the state) represent original works of authorship covered by copyright. Since the work was performed by LexisNexis under a work-for-hire agreement, the legislature claimed the copyright in the annotated presentation for itself. In return, it granted LexisNexis an exclusive license to sell the annotated edition commercially, which is the only form in which a complete corpus of state laws is made available.

When the non-profit public access organization Public Resource scanned the annotated edition and posted it online for free, the state legislature sued it for copyright infringement.

By a 5-4 majority that crossed ideological lines, the Court held that, since the work of annotating the laws was done under a work-for-hire agreement, the legislators themselves were technically the authors of the annotations. And insofar as they were acting in their official capacity as legislators, any work of authorship they produced would fall under the government edicts doctrine and be ineligible for copyright.

Where there is no copyright, there could be no infringement by Public Resource.

But where there is no copyright, there also could be no valid assignment of the exclusive right to distribute the annotated volumes to LexisNexis.

In its brief to the Court, the legislature argued that without a licensable copyright, it could not induce private parties such as LexisNexis to help it produce affordable editions of its annotated code, ultimately harming the public. Writing for the majority, however, Chief Justice John Roberts said that was a matter of public policy better addressed to Congress than to the courts.

“That appeal to copyright policy, however, is addressed to the wrong forum,” Roberts wrote. “As Georgia acknowledges, ‘it is generally for Congress, not the courts, to decide how best to pursue the Copyright Clause’s objectives.’”

Twenty-two states, two U.S. territories and the District of Columbia rely on similar arrangements with commercial publishers to produce annotated statute books. All of those arrangements are now potentially invalid, as could be thousands of other copyrights and licensing arrangements maintained by the several states. Which means this week’s ruling won’t be the last shot fired in this skirmish.

Open Sesame

We also haven’t heard the last shot in the long-simmering battle over applying open-access publishing policies to publicly funded research.

In 2018, a coalition of scientific research funding organizations announced an ambitious plan, known as Plan S, to require recipients of their grants to make any resulting papers on their findings freely available for anyone to read, download, translate or otherwise re-use, rather than publishing them in subscription- or fee-based journals.

After much hue and cry from commercial academic publishers, the original 2020 target date for Plan S to take effect was pushed back by a year to give publishers more time to adjust their businesses.

But the issue hasn’t gone away. And in February of this year, the White House Office of Science and Technology Policy (OSTP) issued a request for comments on a proposal to implement a similar policy in the U.S. for federally financed research.

Those public comments are not yet available, but in December, when word of a possible executive order imposing the policy began to circulate, a group of publishers led by the Association of American Publishers (AAP) wrote a letter to the White House strongly objecting to any such policy.

“We have learned that the Administration may be preparing to step into the private marketplace and force the immediate free distribution of journal articles financed and published by organizations in the private sector, including many non-profits,” the group wrote. Such a policy, they said, would effectively nationalize American intellectual property and “force us to give it away to the rest of the world for free.”

The issue is an odd one for the Trump White House to be pushing, given its general disdain for the fruits of scientific research. And in fact, the initiative within OSTP dates to the Obama Administration. The fact that it has been able to continue, however, demonstrates the momentum gathered behind the open-access movement.

Much of the raw data that results from publicly financed research is already made freely available, in fact, either voluntarily by the researchers themselves or the institutions they work for, or as a matter of policy by the funding organizations.

It is the scholarly papers analyzing those data, which historically have been published in commercial, peer-reviewed journals, that are at issue.

Subscriptions to those journals are often very expensive, especially institutional subscriptions that allow access to all qualified users. Tension between publishers and budget-strapped public and academic libraries is a matter of long standing. But as digital technology has democratized access to information generally, the question of access to publicly funded scientific information has become a question of public policy.

There is a certain intuitive fairness to the idea that publicly funded research ought to be publicly available. We paid for it, after all.

On the other hand, organizing and managing the peer-review process, and identifying, curating and archiving the most credible and highest quality information — in a word, editing — costs money. And if commercial publishers are not able to profit from performing those functions they are unlikely to perform them.

Someone else, perhaps even the public that funded the research in the first place, would need to bear the cost of performing those functions.

Training Camps

Commercial publishers these days also find themselves on the front lines of a growing conflict over the use of copyrighted works to train artificial intelligence algorithms.

As discussed here before, both the U.S. Patent & Trademark Office (USPTO) and the World Intellectual Property Organization (WIPO) have launched inquiries into the intellectual property implications of A.I. technology.

Much of the public discussion of the issue has focused on whether works of authorship or invention produced by A.I. systems should be eligible for intellectual property protection, and if so, in whom or what should any such copyright or patent vest.

But the most contentious debates emerging from the USPTO and WIPO inquiries have been around the datasets used to train A.I. systems.

Artificial intelligence applications such as machine learning must be fed huge amounts of data from which the algorithm can decipher patterns, relationships among data points, and statistical correlations. In many such applications, the data being fed in is contained in works that are under copyright.

As with most computer operations, a certain amount of reproduction is involved. The works must be reproduced in a machine-readable form before they can be input, and then get reproduced again in the computer’s random access memory as it processes those machine-readable inputs.

According to the AAP, such “wholesale, un-permissioned reproduction of copyrighted works in which data subsists, even for the purpose of machine learning, is likely to be infringing.” As for any data contained within those works, “the scope and terms of such use can best be set out in a licensing agreement between the parties.”

Other rights groups argue that any output of an A.I. system represents a derivative work created from the copyrighted inputs and that the act of creating it should be licensed.

A.I. developers, however, maintain that their algorithms are merely extracting data about the works, not any part of the works themselves, and that data, like facts, cannot be copyrighted. As for any reproduction involved, it is merely functional and of no commercial significance, and should therefore be allowed under the fair use or fair dealing doctrines.

In one sense, the dispute over A.I. training datasets is simply another flare-up in the age-old conflict between rights owners and technology developers over fair use/fair dealing.

What the broader debate over the copyright implications of A.I. would seem to share with the Georgia case and the drive for open access, however, is an evolving sense that the public domain — the commons on which future authors and investigators must draw — is in danger of becoming impoverished, and that its legal scope and contours need to be revised.

Copyright owners have been generally successful over the past three decades, legally and politically, at putting limits on the public domain. Decades have been added to the term of copyright, preventing works from entering the public domain for more than a century in some cases, and the publication of public information is increasingly channeled through private hands in the name of budgetary efficiency.

This week’s Supreme Court ruling and the possible imposition of an open-access policy for publicly funded research cut against those gains, and could herald a broader shift in the wind.