Meta, formerly known as Facebook, is pushing deeper into generative AI, and the effort has sparked internal discussions about striking new paid deals with news publishers to improve the quality and timeliness of its training data. The company is weighing whether it needs better access to news, photo, and video content to strengthen its generative AI tools, such as Meta AI, and keep them competitive with other AI search tools and chatbots.
While Meta has not officially begun talks with news outlets about licensing or accessing content, the idea of forming new partnerships to obtain data for model training is gaining traction within the company. Such deals would be distinct from prior agreements in which Meta paid publishers to display links to their content on its platforms. Exploring paid relationships with news publishers would mark a significant departure from Meta’s previous strategies, which included dismantling a $2 billion budget for its News division over the past 18 months, as reported by Business Insider.
Meta’s CEO, Mark Zuckerberg, has asserted that the company possesses a robust dataset for training its Llama large language model, one larger than Common Crawl, a widely used repository of web data for AI training. However, if Meta relies more heavily on its proprietary data, it risks falling behind competitors like Google and OpenAI, which have forged partnerships with news organizations to access a broader range of content for training their AI models.
The rise of generative AI has prompted news outlets and websites to take measures against automated bots scraping their content for free. Without unrestricted access to news publisher content, Meta may struggle to deliver timely and accurate responses to user queries on current events. Constraints on accessing real-time data could lead Meta AI to relay outdated or inaccurate information to users.
Access to diverse and up-to-date training data is crucial for improving generative AI models. Many news publishers are open to licensing agreements with tech companies like Meta, recognizing the value of having their content used for AI model training. As generative AI continues to evolve rapidly, Meta’s deliberations on engaging with news publishers for better data access underscore how much quality training data matters to advancing the technology.