Scraping the data barrel to come under the legal lens with new privacy law violation challenge

In February, IBA weighed in on the topic du jour (or year!) of the rise of ChatGPT, topping out with record user growth, as well as charting the counterpunch from Google with its Bard chatbot. Putting aside the hype and hysteria around what generative AI can and can’t do, we looked at the possibility of ChatGPT and similar applications coming under the litigatory spotlight – not just for their output, but for their input – and it is this that has now sparked a firestorm over privacy and misinformation.

Just a few months on and OpenAI, the Microsoft-backed company behind ChatGPT, has now been accused of violating privacy laws by secretly scraping 300 billion words from the internet, tapping “books, articles, websites, and posts — including personal information obtained without consent.”

Scraping the data barrel

In the AI arms race, data scraping is a vital tool to train models quickly and at scale. It’s certainly a key issue across web and social media channels right now. Google is constantly using various methods to prevent scraping of its search results, including IP blocking, captchas, and rate limiting.

Only this week, Elon Musk announced major changes to how many Tweets can be viewed per user, citing AI data scraping as a key concern – having already joined other tech leaders earlier in the year asking generative AI developers to freeze training and development amid an “out of control arms race”.

Press the panic button?

And tech leaders aren’t the only ones asking for a pause on generative AI progression. Bloomberg reports from the OpenAI lawsuit that: “While seeking to represent the massive class of allegedly harmed individuals, and requesting monetary damages to be determined at trial, the plaintiffs are also asking the court to temporarily freeze commercial access to and further development of OpenAI’s products.”

Sanity check

While the legal and ethical debates rage on around its data input, IBA has been sticking to what we know. Our CEO Judith Ingleton-Beer has published articles commenting on the pros and cons of using ChatGPT for B2B comms, including where we humans still have the upper hand for creativity and content creation and, unlike ChatGPT, can tell the difference between fact and fiction! Check out some of her thoughts in PR News, eWeek, The MarketingInsider, and more.

Read on for our original AI copyright blog from February. We’d hate to say we told you so!

Not another blog on ChatGPT, we hear you exclaim!

But we aren’t going to lecture you on how marketers can best use ChatGPT and incorporate it into their campaigns. This type of content is ten-a-penny at the moment – and probably written using ChatGPT!

As one commentator mentioned on my LinkedIn feed recently – “it never takes long for marketers to break the shiny new toys.”

There is a far bigger evolving market backdrop around ChatGPT and the growing number of competitive AI solutions. The future of Generative Pre-trained Transformer and Large Language Model AI tech is becoming a new technology battlefront.

The AI arms race heats up between tech giants

After ChatGPT was launched by San Francisco-based startup OpenAI in November 2022, it soon came to light that the ‘start-up’ did indeed start up as a not-for-profit organization with a star-studded list of benefactors, from Tesla and SpaceX CEO Elon Musk to venture capitalist Peter Thiel and up-and-coming entrepreneur Sam Altman, who became the CEO of OpenAI in 2019. That was the year that Microsoft made its first $1 billion investment in the company, just as Bing was bottoming out as a search engine and was the constant butt of Google jokes. As Microsoft invests a further $10 billion in OpenAI, Google is in the process of hitting back with its Google Bard AI research assistant. But Bard got off to a pretty expensive false start when it wiped more than $120 billion off Google parent company Alphabet’s market value by giving a misleading answer to a question about a NASA telescope.

Far be it from IBA to weigh in on a brewing battle royal that’s way above most people’s pay grade. But there may well be a battle coming closer to home – as the ownership and use of ChatGPT’s content output comes under the legal lens.

The troops are assembling and IP policy is caught flat-footed

IP policy has been caught off guard, so fast has this huge step change in technology capability arrived. And we aren’t just talking about educational institutions struggling to manage plagiarism in student-submitted work. The U.S. Copyright Office has gone on record to state its key focus over the next year will be addressing legal gray areas that surround copyright protections and artificial intelligence.

And this begins at the input stage, not the output! ChatGPT and similar software use existing text, images and code to create “new” work – and the technology has to get its ideas from somewhere. That means trawling the web to “train” and “learn” from existing content. No surprise there are already lawsuits being filed against OpenAI and similar companies, which argue that AI engines are illegally using other people’s work to build their platforms and products in the first place.

Let’s assume that this is clarified from a legal perspective. Then we come to the output stage. The question is, does the copyright lie with the creator of the AI technology, or the company or person who used the tool to create the “new” content? And this is all before any resulting content is even pitched to the media.

Ethics of AI-based content – so who does own the copyright?

We already have IP and even ethical issues around content-based PR campaigns. Generally, the copyright of a piece of content belongs to the company that created the material; for example, the copyright for a bylined article from a key subject matter expert is held by the company they represent. But the IBA team is no stranger to the already existing quirks of copyright law once a piece has been submitted to a publication – something that can change depending on the region of a media outlet or even its own editorial policies, and that will be clearly outlined in any author agreements signed.

But what about AI-driven content? Thankfully the IBA team hasn’t been overrun by the infinite digital monkeys, bashing out the works of Shakespeare, and we’re always clear on copyright issues with clients, the ethics of pay for play and with magazine editorial copyright policies. But it is something that is definitely up for debate in the PR world – which is why it was interesting to see the PR Council (PRC) questioned on exactly this topic recently by PR News.

PR Council weighs in

The PRC position is very much aligned with the existing understanding of PR copyright, as these recent quotes from Kim Sample, PRC president, and incoming board chair Ellen Ryan Mardiks show:

Sample: We’ve always focused on ethics and standards. We just did a session on ChatGPT, which was highly attended…and we’re following up with a session with legal counsel.

And we’re going to be talking with the board about when do we issue guidance and standards on using generative AI.

We’re saying to members, “Play with it. [AI is] a great thing if it gets rid of some of the mundane tasks.” But [eventually] we will be at a place where we’re going to require disclosure.

Ryan Mardiks: You know about paid spokespeople needing to disclose. All that is transparency. It’s not the exact same thing [as AI disclosure], but it’s adjacent…this is content meant to communicate not just facts and figures, but ideas and messaging and values…so, we’re going to have to be thoughtful about it…we will need to have industry standards.

But we need to lead versus the other way around. So…deploy it, smartly and thoughtfully and hopefully…we don’t lose the intellectual art that needs to be deployed in writing. If we lose that, all it becomes is prompts. We will have lost something important for consumers.

Questions need to be answered

Generative AI is throwing up some very intriguing IP questions both in legal circles and the PR and marketing sector – both in terms of how it learns and how its output is used. For now, it seems best to err on the side of caution for use in outward-facing content.

In the meantime, perhaps PR and marketing professionals should heed the advice of our last ChatGPT blog on what it can’t do, and use it as an opportunity to sharpen the skills that AI and bots lack: imagination & creativity, strategic & critical thinking, and emotional intelligence.

And also get out of the rut of writing marketing speak and start to write with style and flair and not like a Google Translate of a good idea!

Jamie Kightley is Head of Client Services at IBA International
