Public records data must be off-limits for AI

Companies are exploring innovative ways to collect data to feed data-hungry artificial intelligence systems and to create new applications. Some don't have to look far: enterprises collect data from public records, which are shared on the internet, and analyze it.

There are many reasons why public records should be kept out of AI systems, and lawmakers must consider them and act quickly before such actions wreak havoc.

First, public records are neither unbiased nor representative, so a system trained on such data is unlikely to produce unbiased results.

In some cases, such as court records, the data may not even be true. Perjury laws are rudimentary and rarely enforced. In family law especially, once-close couples can become each other's worst enemies. Fighting spouses may share each other's most private secrets and even lie about them, and financial disclosures are common in these records.

AI can be used to combine court documents with other public information to build psychometric and financial risk profiles of trial parties. If such analyses are sold to potential employers, landlords and other service providers, they could unfairly jeopardize the chances of qualified applicants.

Scammers who get hold of the financial side of such profiling can easily victimize the parties involved. Repressive governments can use it to endanger their citizens.

Laws to date have attempted to address current problems with technology; future impacts must also be considered. Disinterested technology visionaries are needed to help draft laws regulating AI.

As the saying goes, data is the new oil. It is important that laws effectively regulate the collection, use and storage of data in its various forms. With the emergence of quantum computing, generative AI and sophisticated hackers, data privacy and security will face increasing challenges.

I closed my account with AT&T in 2019, but I still received a notice from the company in April of this year that some of my personal information had been leaked, putting me at risk of identity theft. The law should not allow companies to keep personally identifiable information for such a long period — over four years in this particular case.

In my big data courses, I teach my graduate students that anonymization alone is not enough to protect privacy. Even if an AI model were trained only on anonymized data from public records, it could still leak sensitive information when that data is compared with stolen personally identifiable information. And it is not hard to train language models to detect such information in unstructured text.
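To illustrate the kind of linkage attack described above, here is a minimal sketch, using entirely made-up records and field names, of how "anonymized" records can be re-identified by joining them with leaked data on quasi-identifiers such as ZIP code and birth year:

```python
# Minimal sketch of a linkage (re-identification) attack.
# All records and field names here are hypothetical.

# "Anonymized" public-record data: names removed, but
# quasi-identifiers (ZIP code, birth year) remain.
anonymized = [
    {"zip": "95112", "birth_year": 1984, "case_type": "family"},
    {"zip": "95014", "birth_year": 1990, "case_type": "civil"},
]

# Personally identifiable information leaked in an unrelated breach.
leaked_pii = [
    {"name": "A. Doe", "zip": "95112", "birth_year": 1984},
    {"name": "B. Roe", "zip": "94040", "birth_year": 1975},
]

def link(anon, pii, keys=("zip", "birth_year")):
    """Join the two datasets on shared quasi-identifiers."""
    matches = []
    for a in anon:
        for p in pii:
            if all(a[k] == p[k] for k in keys):
                matches.append({**p, **a})  # re-identified record
    return matches

reidentified = link(anonymized, leaked_pii)
print(reidentified)
# The single overlap on (zip, birth_year) links "A. Doe"
# to a family-court record despite the "anonymization".
```

The attack needs no AI at all, only a join on a few shared attributes; AI models trained to extract such attributes from unstructured text simply make it easier to run at scale.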

There have been several instances of AI models revealing personally identifiable information. Researchers have shown that completely removing sensitive information from large language models like ChatGPT is not easy.

The problem is compounded by the persistence of internet data: once data is online, it remains there indefinitely, with lasting consequences for privacy and security, an issue exacerbated by the rise of AI models.

Even when used for altruistic purposes, AI models are largely black boxes, making it difficult to explain the rationale behind their decisions, a requirement for most uses of government data. Individuals have little control over data fed into AI models, not even to correct inaccuracies.

It is therefore imperative that governments not only restrict the use of public records in training AI models, but also limit their own collection and retention of personally identifiable information.

Vishnu S. Pendyala, PhD, MBA (Finance), teaches machine learning and other data science courses at San Jose State University and is a Public Voices Fellow for the OpEd Project.
