As Meta pushes deeper into generative AI, the company is weighing whether it needs to pay for better and more timely training data to improve its AI tools. The news industry is among the leading candidates for supplying that data. Internal discussions at Meta, formerly known as Facebook, indicate that the company is considering new paid agreements with news publishers to gain deeper access to news articles, photos, and videos. According to insiders, such deals could be pivotal in making Meta’s AI tools more effective and more competitive in a growing market dominated by high-caliber AI search tools and chatbots.
Meta has not yet formally approached any news outlets about licensing content. If it does, any agreements for data access would be distinct from its previous arrangements, under which Meta paid publishers simply to host content links on its platforms. Such a move would mark a significant shift for Meta, especially considering that just last year the company cut its News division’s budget by $2 billion as part of a broader strategy to move away from its earlier engagements with the news industry.
Meta CEO Mark Zuckerberg has previously boasted that the company’s internal data, used to train its Llama large language model, dwarfs even Common Crawl, a very large set of web data that many AI models rely on for training. However, relying on proprietary data may not be a foolproof strategy. If Meta chooses, or is compelled, to depend more heavily on its own data, it risks falling behind rivals such as Google and OpenAI, whose outputs could prove more advanced and accurate.
The landscape for generative AI changed dramatically almost two years ago with the debut of the ChatGPT chatbot, which brought the technology into the public eye. In response, numerous news outlets and websites began blocking automated bots from Common Crawl and OpenAI that had been scraping their content for free. Without unrestricted access to current news content, Meta AI’s responses to queries about recent events could become increasingly limited, outdated, or inaccurate. Competing tech giants have already begun negotiating deals with news publishers to secure the fresh, credible content they need to train their AI models.
The US Copyright Office is also weighing new regulations on generative AI, adding another layer of uncertainty. Many news publishers are reportedly open to licensing deals, reasoning that some compensation is better than none at all. Will Meta choose to invest in high-quality training data by partnering with news publishers, or will it continue to rely on its own data and risk lagging behind its competitors? The answer could shape how well Meta’s AI tools keep pace with those of its rivals.