Publish, or Perish?

When you think about the “copyright wars,” you typically think of the major record companies and Hollywood studios battling digital platform providers and developers over alleged “piracy” and the purported “value gap.” But some of the hottest engagements these days are happening in the world of publishing, including book, news media and scholarly journal publishing. And the roster of antagonists includes governments, courts, libraries and public advocates.

The U.S. Supreme Court this week handed down a ruling in a closely watched case concerning an annotated compendium of the Georgia state laws compiled by LexisNexis on behalf of the state legislature at no direct cost to the state treasury.

While the statutes themselves are in the public domain under a long-standing legal principle known as the “government edicts doctrine,” the legislature maintained that the annotations represent original works of authorship covered by copyright. Under its work-for-hire agreement with LexisNexis, the legislature claimed the copyright in the annotated presentation for itself and granted LexisNexis an exclusive license to sell the annotated edition, the only form in which a complete corpus of state laws is made available.

When the non-profit public access organization Public Resource scanned the annotated edition and posted it for free, the state legislature sued it for copyright infringement.

By a 5-4 majority that crossed ideological lines, the Court held that since the work of annotating the laws was done by LexisNexis under a work-for-hire agreement, the legislators themselves were technically the authors of that work. Insofar as they were acting in their official capacity as legislators, however, any work of authorship they produced would fall under the government edicts doctrine and be ineligible for copyright.

In its brief to the Court, the legislature argued that without a licensable copyright, it could not induce private parties such as LexisNexis to help it produce affordable editions of its annotated code, ultimately harming the public. Writing for the majority, however, Chief Justice John Roberts said that was a matter of public policy better addressed to Congress than to the courts.

“That appeal to copyright policy, however, is addressed to the wrong forum,” Roberts wrote. “As Georgia acknowledges, ‘it is generally for Congress, not the courts, to decide how best to pursue the Copyright Clause’s objectives.’”

Twenty-two states, two U.S. territories and the District of Columbia rely on similar arrangements with commercial publishers to produce annotated statute books. All of those arrangements are now potentially invalid, as could be thousands of other copyrights and licensing arrangements maintained by the several states. Which means this week’s ruling won’t be the last shot fired in this skirmish.

Open Sesame

We also haven’t heard the last shot in the long-simmering battle over applying open-access publishing policies to publicly funded research.

In 2018, a coalition of scientific research funding organizations announced an ambitious plan to require recipients of their grants to make any resulting papers on their findings freely available for anyone to read, download, translate or otherwise re-use, rather than publishing them in subscription- or fee-based journals.

After much hue and cry from commercial academic publishers, the original 2020 target date for the initiative, known as Plan S, to take effect was pushed back by a year to give publishers more time to adjust their business.

But the issue hasn’t gone away. And in February of this year, the White House Office of Science and Technology Policy (OSTP) issued a request for comments on a proposal to implement a similar policy in the U.S. for federally financed research.

Those public comments are not yet available, but in December, when word of a possible executive order imposing the policy began to circulate, a group of publishers led by the Association of American Publishers (AAP) wrote a letter to the White House strongly objecting to any such policy.

“We have learned that the Administration may be preparing to step into the private marketplace and force the immediate free distribution of journal articles financed and published by organizations in the private sector, including many non-profits,” the group wrote. Such a policy, they said, would effectively nationalize American intellectual property and “force us to give it away to the rest of the world for free.”

The issue is an odd one for the Trump White House to be pushing, given its general disdain for the fruits of scientific research. And in fact the initiative within OSTP dates to the Obama Administration. The fact that it has been able to continue, however, demonstrates the momentum behind the open-access movement.

Much of the raw data that results from publicly financed research is already made freely available, either voluntarily by the researchers themselves or the institutions they work for, or as a matter of policy by the funding organizations.

It is the scholarly papers analyzing those data, which historically have been published in commercial, peer-reviewed journals, that are at issue.

Subscriptions to those journals are often very expensive, especially institutional subscriptions that allow access to qualified users. Tension between publishers and budget-strapped public and academic libraries is a matter of long standing.

But as digital technology has democratized access to information generally, the question of access to publicly funded scientific information has become a matter of public policy.

There is a certain intuitive fairness to the idea that publicly funded research ought to be publicly available. We paid for it, after all.

On the other hand, organizing and managing the peer-review process, and identifying, curating and archiving the most credible and highest-quality information (in a word, editing) cost money. And if commercial publishers are not able to profit from performing those functions, they are unlikely to perform them.

Someone else, perhaps even the public that funded the research in the first place, would need to bear the cost of performing those functions.

Training Camps

Commercial publishers these days also find themselves on the front lines of a growing conflict over the use of copyrighted works to train artificial intelligence algorithms.

As discussed here before, both the U.S. Patent & Trademark Office (USPTO) and the World Intellectual Property Organization (WIPO) have launched inquiries into the intellectual property implications of A.I. technology.

Much of the public discussion of the issue has focused on whether works of authorship or invention produced by A.I. systems should be eligible for intellectual property protection, and if so, in whom or what should any such copyright or patent vest.

But the most contentious debates emerging from the USPTO and WIPO inquiries have been around the datasets used to train A.I. systems.

Artificial intelligence applications such as machine learning must be fed huge amounts of data from which the algorithm can decipher patterns, relationships among data points, and statistical correlations. In many such applications, the data being fed in is contained in works that are under copyright.

As with most computer operations, a certain amount of reproduction is involved. The works must be reproduced in a machine-readable form before they can be input, and then get reproduced again in the computer’s random access memory as it processes those machine-readable inputs.

According to the AAP, such “wholesale, un-permissioned reproduction of copyrighted works in which data subsists, even for the purpose of machine learning, is likely to be infringing.” As for any data contained within those works, “the scope and terms of such use can best be set out in a licensing agreement between the parties.”

Other rights groups argue that any output of an A.I. system represents a derivative work descended from the copyrighted inputs and that the act of creating it should be licensed.

A.I. developers, however, maintain that their algorithms are merely extracting data about the works, not data from within the works, and that data, like facts, cannot be copyrighted. As for any reproduction involved, they argue, it is merely functional and of no commercial significance, and should therefore be allowed under the fair use or fair dealing doctrines.

In one sense, the dispute over A.I. training datasets is simply another flare-up in the age-old conflict between rights owners and technology developers over fair use/fair dealing.

What the broader debate over the copyright implications of A.I. would seem to share with the Georgia case and the drive for open access, however, is an evolving sense that the public domain is becoming impoverished, and that its legal scope and contours need to be revised.

Copyright owners have been generally successful over the past three decades, legally and politically, at putting limits on the public domain, from adding decades to the term of copyright to channeling the publication of public information through private hands.

This week’s Supreme Court ruling and the imposition of an open-access policy for publicly funded research cut against those gains, and could herald a broader shift in the wind.