Anthropic, like artificial intelligence itself, is inevitable. They’re one of the best-funded AI companies out there, with a billion plus from Amazon, Google, and even Sam Bankman-Fried. Amazingly, while SBF himself appears likely headed to jail for fifty years or so, it may turn out that FTX and Alameda are actually solvent as a result of investments in Anthropic!
On the other hand, $150,000 in statutory damages per infringement, multiplied by virtually every song that has ever been written, equals, what, a gazillion or so?
There have been other cases, but this one is huge. We all know about the Sarah Silverman-led case against OpenAI (maker of ChatGPT) and Meta, brought because their large language models may have been trained on copyright-protected materials scraped from the internet. Getty Images is going after Stability AI, which makes image- and music-generating tools. The thing is, as wrong as it may sound the first time you hear it, using copyright-protected material to train a bot that might later write a story or music that’s in some way based on what the bot learned from all that input is not clearly illegal.
I mean, the complaint says otherwise, but it’s one of the big questions at hand. While it’s still a question, these companies are doing it as fast as they can, and the rationale probably ranges from “Authors Guild v. Google went a long way toward making us believe this is all fair use” to “Screw it, whatever penalties we ever get hit with, it’s worth it. Forgiveness is way better than permission. The ship has sailed.”
Whether it’s Silverman, Anthropic, or the Ghostwriter guy wearing the bedsheet with the sunglasses making fake Drake tracks and trying to get them nominated for Grammys, there’s nothing clearly wrong about the training part. So any argument that begins with “it’s illegal to make a copy, and when you make a token for your LLM, that’s a copy, so it’s infringement” begs the question. And is training your large language model on copyright-protected music an “unauthorized use” for which you need a license? Making derivative works is the exclusive right of a copyright holder, but every piece of music I’ve ever written is in some measure influenced by every piece I’ve ever studied or even known, and I’d be risking absurdity to call it derivative on that account. But this is just obviously a lot more immediate. Now, if a tool like Anthropic’s “Claude” is asked to “write me a song about the day Buddy Holly died, with guitar chords and everything,” and its output shows that it was clearly considering Don McLean’s “American Pie” as it composed a new-ish song, isn’t that necessarily infringement? (I got that example from the complaint, btw.)
When I asked Claude to write that song a few times myself, it wrote a lot of original but not very good lyrics (still super impressive, you understand). The thing was, it sure had a hard time getting completely away from the idea that “February made me shiver” and that this was the “day the music died.” These couple of phrases made it into Claude’s lyrics each of the four times I asked it to write a song. But had they not, I wouldn’t have mapped anything else musicologically relevant about it to “American Pie.” And one could argue that while “the day the music died” and “February made me shiver” both point to a very famous song, it’s not at all clear that either of those phrases is protectable. I asked it to write chords too, but it didn’t choose anything from “American Pie.” The complaint provides several examples. “A Change Is Gonna Come,” “God Only Knows,” “What a Wonderful World,” “Gimme Shelter,” “American Pie,” “Sweet Home Alabama,” “Every Breath You Take” (Sting’s not litigious, don’t worry about it), “Life Is a Highway,” “Somewhere Only We Know,” “Halo,” “Moves Like Jagger,” and “Uptown Funk” all made the list, but I haven’t looked beyond the Don McLean. He was the first concert I ever saw, by the way. My first idol.
Not all uses are infringing. I mentioned “fair use” earlier. That’s a doctrine that allows some uses of copyrighted material without the permission of the copyright holder. When the biggest developers like Microsoft and Google make the news by offering indemnification to their users, protecting them from infringement claims, “fair use” is one of the ideas they most depend upon. The fair use factors are brief and simple but open to plenty of interpretation. The statute names purposes such as criticism, comment, news reporting, teaching, scholarship, and research as examples of uses that can qualify. To decide whether any particular use actually does, we consider these factors:
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
The stickiest claim is probably that Claude will print the lyrics to songs upon request. I have noticed that ChatGPT either refuses or is extremely reluctant to reprint copyright-protected materials, though it’s fun to try to trick it into giving away what it knows but has been told not to share.
If I ask Claude to “tell me the words to Freebird!” it obliges. “These are the complete lyrics to ‘Freebird’ by Lynyrd Skynyrd.” And at the end it offers, “The iconic southern rock ballad is about being free, restless and unable to commit to settling down. The soaring guitar solo at the end cemented ‘Free Bird’ as one of Lynyrd Skynyrd’s most popular and enduring songs.”
They appear to have taken an anti-Oxford-comma position that I ain’t crazy about. But a blurb something like this is conspicuously included at the foot of every lyric I’ve asked for. Is that criticism, commentary, and teaching such that the whole thing has a fair use defense? I’d say they think so.