X Has Decided to Use Public Data for AI Training

October 23, 2023

By Luca Fanicchia

X, which many still remember as Twitter, has recently made headlines with its decision to utilize user data from its social platform for training artificial intelligence models. This move was confirmed through X’s updated privacy policy, which explicitly states the company’s intent to harness publicly shared information for the advancement of its AI endeavors.

While the announcement has garnered attention from various quarters, Elon Musk, the driving force behind X and xAI, clarified that only public data would be tapped into, ensuring that direct messages and other private interactions remain untouched. This development has ignited discussions about the balance between technological innovation and user privacy, especially in an era where data is often termed the ‘new oil’.

As we delve deeper into this topic, we aim to provide readers with a comprehensive understanding of the implications and the broader context of this decision.

Announcement Details

X’s recent update to its privacy policy has brought to light its intentions to collect not just biometric data but also details pertaining to users’ job and education history. This revelation, initially spotted by Bloomberg, signifies a broader strategy by X to harness user data. A deeper dive into the policy uncovers X’s plan to employ both the collected information and other publicly accessible data to aid in the training of its machine learning and AI models.

This change in policy was highlighted by Alex Ivanovs of Stackdiary, known for his knack for spotting significant updates in tech companies’ terms of service. The specific change appears in section 2.1 of the policy, which states: “We may use the information we collect and publicly available information to help train our machine learning or artificial intelligence models for the purposes outlined in this policy.”

Musk’s Ambitions

Elon Musk, the owner of X, has been vocal about his ambitions in the AI sector, particularly with his other venture, xAI. Observers, including Ivanovs, speculate that Musk’s overarching plan might be to leverage X as a primary data source for xAI. The theory gains traction in light of Musk’s encouragement of journalists to publish their work on X, which could help amass a richer dataset for AI training.

Musk has been clear on the matter: in his words, xAI intends to use “public tweets” for AI model training. He has also criticized other tech giants, accusing them of using Twitter data for their AI models, and has threatened legal action against Microsoft over its alleged unauthorized use of Twitter data.

Public Reaction

X’s plans to collect biometric details along with job and education data from its users have stirred considerable discussion in the tech community. First reported by Bloomberg, the news has been met with a mix of intrigue and concern. The updated privacy policy confirmed the reports, stating that the company may use the information it collects, together with publicly available information, to train its AI models. Alex Ivanovs of Stackdiary highlighted the same clause.

While the privacy policy is explicit about the company’s intent to harness both collected and publicly available data for training machine learning algorithms, Elon Musk has been quick to address concerns. He emphasized that only public data would be accessed, ensuring that private interactions, such as direct messages, remain untouched. However, with X no longer maintaining a dedicated press arm, obtaining further clarity on the specifics of data usage has become challenging.

The Word To the CEO

Interestingly, while X itself hasn’t publicly expressed any AI aspirations, Elon Musk certainly has. His recent venture, xAI, has the ambitious goal of “understanding the true nature of the universe.” The close association between X and xAI, as indicated on xAI’s homepage, has led many to speculate on the potential synergies and shared objectives between the two entities.

Another angle to consider is Musk’s recent announcement about competing with LinkedIn. Labeling the professional networking site as “cringe,” Musk hinted at a “cool” version by X. This could potentially explain the rationale behind collecting job and education histories from users.

Lastly, there’s the financial aspect. With advertising revenue at X reportedly under pressure, monetizing user data could be a lucrative avenue. There’s no concrete evidence to support this theory, but selling user data isn’t uncommon in the social media realm. Historically, Twitter used the data it collected primarily for its own benefit rather than for third parties.

Comparison with Other Tech Giants

Google: Google has made adjustments to its privacy policy that could allow the use of public data for AI training. There’s an indication that Google might be using data for training its AI, but whether they specifically use private Gmail data for this purpose remains unknown. As for Google Docs, Google claims not to use any content from users for any purpose other than to provide the Document AI service. Google’s updated Privacy Policy as of July 5, 2023, allows the company to retain anything posted publicly by users to train its AI models for products, including Bard.

Meta (formerly Facebook): Meta stores and serves training data from Tectonic, which is Meta’s exabyte-scale file system. This system serves as a storage infrastructure for their AI training models. Meta has released a significant new AI model called Llama 2. However, the company did not disclose the specific training data used for this model.

Microsoft: Microsoft has introduced a feature called Copilot in its Office applications like Word, PowerPoint, Excel, OneNote, and Outlook. This feature aims to enhance productivity, assist users in starting their documents and presentations more quickly, and provide insights from data or emails. When introducing the Microsoft 365 Copilot, Microsoft emphasized its adherence to AI principles and the Responsible AI Standard. They have a multidisciplinary team that reviews AI systems for potential harms and mitigations.

Specific tagged data, like employment preferences and job search activity, is invaluable for AI training. Such data provides context, allowing AI models to understand user behavior, preferences, and patterns. This, in turn, enables the creation of more personalized and relevant user experiences. For instance, an AI system trained on job search data can provide tailored job recommendations to users, enhancing the platform’s utility and user engagement.

Implications for the Future

X’s decision to use public data for training its AI models is indicative of a broader trend in the tech industry. As data becomes increasingly valuable, companies are looking for innovative ways to harness its potential. By utilizing user-generated content, tech giants like X are in a unique position to access vast amounts of diverse data, which can significantly enhance the accuracy and capabilities of AI models. This move could potentially set a precedent for other social media platforms and tech companies, leading to a more data-driven approach in AI development. However, this also raises questions about user privacy and the ethical implications of using personal data for commercial purposes.

Given Musk’s track record with companies like Tesla and SpaceX, it wouldn’t be surprising if he has grand plans for xAI and X in the AI domain. However, as X’s press communication remains limited, obtaining concrete details about these plans continues to be a challenge.

Conclusion

Striking a balance between AI advancement and personal data protection is a challenge the tech industry continues to grapple with. The potential of AI to process vast amounts of data and derive meaningful insights is undeniable, but that progress depends on collecting and processing ever more data, much of it sensitive personal information. This reliance carries an inherent risk of misuse or unauthorized access, leading to breaches of privacy.

Moreover, AI algorithms can perpetuate biases present in the training data, resulting in discriminatory outcomes. The lack of transparency in some AI models adds to the concerns about user autonomy and the protection of individual rights. As AI adoption grows, it becomes imperative to strike a balance between leveraging the power of AI for innovation and protecting user data privacy. Organizations must prioritize transparency and user consent, and they must adhere to data protection regulations.

In the context of X and its association with xAI, the decision to use public data for AI training could have far-reaching implications. While it promises advancements in AI capabilities, it also underscores the importance of user trust and the need for stringent data protection measures.