By Preston Gralla
Contributing Editor, Computerworld
Artificial intelligence (AI) is suddenly the darling of the tech world, thanks to ChatGPT, an AI chatbot that can do things such as carry on conversations and write essays and articles with what some people believe is human-like skill. In its first five days, more than a million people signed up to try it. The New York Times hails its “brilliance and weirdness” and says it inspires both awe and fear.
For all the glitz and hype surrounding ChatGPT, what it’s doing now are essentially stunts — a way to get as much attention as possible. The future of AI isn’t in writing articles about Beyoncé in the style of Charles Dickens, or any of the other oddball things people use ChatGPT for. Instead, AI will be primarily a business tool, reaping billions of dollars for companies that use it for tasks like improving internet searches, writing software code, discovering and fixing inefficiencies in a company’s business, and extracting useful, actionable information from massive amounts of data.
But there’s a dirty little secret at the core of AI — intellectual property theft. To do its work, AI needs to constantly ingest data, lots of it. Think of it as the monster plant Audrey II in Little Shop of Horrors, constantly crying out “Feed me!” Detractors say AI is violating intellectual property laws by hoovering up information without getting the rights to it, and that things will only get worse from here.
An intellectual property lawsuit against Microsoft may determine the future of AI. It charges that Microsoft, its code-repository subsidiary GitHub, and OpenAI, the maker of ChatGPT, have illegally used code created by others to build and train Copilot, a service that uses AI to write software. (Microsoft has invested $1 billion in OpenAI.)
The future of AI may well hinge on the suit’s outcome.
To understand the lawsuit, you first need to understand Microsoft’s big bet on AI. Microsoft CEO Satya Nadella believes AI will transform Microsoft much as the cloud did. He recently said: “I think the next phase — if you say mobile and cloud was the last paradigm — is going to be AI.”
AI will be used in every part of the company, from the cloud to Bing search, and even Windows itself. At last week’s CES conference, he explained, “Artificial Intelligence is going to reinvent how you do everything on Windows, quite literally.” He gave only a few details, such as a natural language interface, but there’s no doubt a lot of work is going on that he isn’t yet ready to reveal.
The combination of AI and the cloud is perhaps the company’s most important focus. ChatGPT is powered by Microsoft’s Azure cloud technology and was trained using Azure’s AI supercomputing infrastructure. Microsoft will be selling Azure-based AI capabilities to businesses, which means companies will be able to take advantage of AI without building any infrastructure at all — they can simply use Microsoft’s cloud-based AI. The product, called Azure OpenAI Service, is already available for preview.
Microsoft is also planning to integrate ChatGPT’s AI with Bing search to power better results and to deliver the information people are looking for directly, rather than just pointing them to web pages.
Now we come to a service that led to the lawsuit against Microsoft: Copilot. It’s an AI coding assistant that spits out code for creating basic software functions, freeing developers to focus on more complicated, higher-level programming tasks. Programmers need only to tell Copilot what they want built, and Copilot creates ready-to-use code they can paste into their work.
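To make that workflow concrete, here is a minimal sketch in Python. The prompt and the completion are hypothetical, illustrative of the kind of exchange such an assistant handles; this is not actual Copilot output:

```python
# Hypothetical prompt a developer might type as a comment:
#   "Write a function that returns the n-th Fibonacci number."
#
# An AI coding assistant such as Copilot would then suggest
# ready-to-use code along these lines:

def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number (0-indexed), computed iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fibonacci(10))  # prints 55
```

The developer reviews the suggestion and pastes it into their project, freeing them to spend time on higher-level design instead of boilerplate routines.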
That’s just for today, though. With enough training, it wouldn’t be surprising if non-programmers could eventually build simple apps without writing a line of code, and perhaps in time more complicated apps as well.
To work its magic, Copilot needs to be trained on tremendous amounts of code. The Microsoft-owned GitHub open-source code repository, which runs Copilot, says it is “trained on billions of lines of public code.” The way in which Copilot gets that open source code is at the heart of the suit.
Open-source code isn’t in the public domain. It’s often copyrighted. It can be used without being paid for, but only if the person using it agrees to the software’s licensing terms. There are a variety of different open-source licenses with different terms. For example, a license might require that any software built based on the open-source code must include the name of the creator of the original code and a copyright notice.
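The widely used MIT license is a concrete example of such a condition. It grants free use but requires that attribution travel with the code; abridged, the operative text reads:

```text
Copyright (c) <year> <copyright holders>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software [...], subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
```

Other licenses, such as the GPL, go further, requiring that derivative works be released under the same license.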
Copilot doesn’t adhere to those licenses. Microsoft claims it doesn’t need to. GitHub’s CEO, Nat Friedman, claims Copilot can use any open source code for training, regardless of the licenses, because it falls under “fair use” in copyright law. Many other AI companies and researchers claim the same thing.
Matthew Butterick, a programmer, writer, and lawyer, disagrees. He and the Joseph Saveri law firm have filed a class-action suit against Microsoft, GitHub, and OpenAI, claiming they “profit from the work of open-source programmers by violating the conditions of their open-source licenses.” In plain English, the suit says Microsoft and the others are software pirates, stealing the intellectual property of those who created the code used to train Copilot. (Butterick says that sometimes when someone asks Copilot to write software, the resulting code is an exact copy of open-source code on which Copilot was trained.)
Butterick warns that Copilot is just the camel’s nose under the tent, and much more massive intellectual property theft is on the way — not just theft of code, but of images, writing, and data of any sort. He told The New York Times, “The ambitions of Microsoft and OpenAI go way beyond GitHub and Copilot. They want to train on any data anywhere, for free, without consent, forever.”
He’s right. AI image generators such as DALL-E 2, which is run by OpenAI, already train on images found on the web.
One of the great ironies of Microsoft’s use of open-source software and reliance on a “fair use” argument is that for years the company fought against open source as if it were the devil itself. In 2000, Microsoft’s then-CEO Steve Ballmer said that the open-source Linux operating system has “the characteristics of communism.” A year later he doubled down, calling it a “cancer.”
To a great extent, this suit will determine the future of AI. If Microsoft wins, it’ll be full speed ahead for AI, which will be allowed to gobble up and train on code, images, articles, and data created by others. Butterick’s warning about AI using any data for free anytime, anywhere would almost certainly come true. If Microsoft loses, those who create AI will have to tread far more carefully, possibly slowing down AI creation, but respecting the intellectual property of artists, writers, programmers, and others.
As for me, I’m on the side of those who create, and against Microsoft. It’s already a tough enough life for those who try to earn a living based on their intellectual and artistic abilities. Woody Guthrie wrote in his song “Pretty Boy Floyd”:
“Yes, as through this world I’ve wandered
I’ve seen lots of funny men;
Some will rob you with a six-gun,
And some with a fountain pen.”
If Microsoft wins the suit, it’s not a six-gun or a pen that will rob the creative people among us — it’ll be a disembodied AI stealing at the behest of a trillion-dollar company and an AI industry that will be worth countless more trillions than that.
Preston Gralla is a contributing editor for Computerworld and the author of more than 45 books, including Windows 8 Hacks (O’Reilly, 2012) and How the Internet Works (Que, 2006).
Copyright © 2023 IDG Communications, Inc.