Connecting the Dots: Data scraping, then and now

By JOHN BOS

Published: 07-22-2023 8:40 PM

The Federal Trade Commission has launched an investigation of ChatGPT maker OpenAI for potential violations of consumer protection laws. Consumers: that’s us. The FTC has asked OpenAI to provide details of all complaints the company has received from users regarding “false, misleading, disparaging, or harmful” statements put out by OpenAI.

The FTC wants to know whether OpenAI engaged in unfair or deceptive practices relating to risks of harm to consumers, including reputational harm. The agency has asked detailed questions about how OpenAI obtains its data, how it trains its models, the processes it uses for human feedback, risk assessment and mitigation, and its mechanisms for privacy protection.

“How OpenAI obtains its data” has deep roots in the long history of efforts to find out anything and everything about you and me. Data scraping is the new phrase in town … or at least in Silicon Valley. Data scraping, also known as web scraping, is the process of digitally extracting data from websites or other online sources. The extracted data can include text, images, links, tables, and other information, some of which you may be OK with sharing and some of which you absolutely are not.

Thinking back, something I do a lot these days, it occurs to me that my first awareness of data scraping was in 1942 when I was 6 years old. I can visualize my mom sitting on our telephone bench seat talking on our party line. The party line telephone was an early telecommunications arrangement beginning in the late 1800s in which several telephone users were connected to a common line. Party lines provided no privacy in communication. They were frequently a source of gossip, as well as a means of quickly alerting entire neighborhoods of emergencies such as fires.

Complaints about one party monopolizing a multi-party line were a staple for telephone companies, and eavesdropping on calls remained an ongoing concern. As late as 1956, Southern Bell officials refused a request from a public utilities commissioner in Jackson, Mississippi, to segregate party telephone lines by race.

But early eavesdropping cannot compare to today’s AI-powered digital intrusions into our private shopping, medical and political “communications” (like emails to legislators and donations to political causes).

While there are some similarities between the days of early eavesdropping on old party line telephone systems 80 years ago and today’s electronically elusive data scraping, they are not exactly the same.

Data scraping refers to the automated extraction of data from websites or online platforms. It involves using automated software programs to collect information from websites, often without the explicit consent of the website owners. The collected data can be used for various purposes, such as market research, data analysis, building applications … or for illegal purposes.

Eavesdropping on old party line telephones, on the other hand, refers to the act of listening in on telephone conversations without the knowledge of the parties involved. On those telephone lines shared by multiple households, individuals could listen to conversations by simply picking up their own phone receiver. This practice was generally seen as intrusive and a breach of privacy.

Both practices involve accessing information without explicit permission. However, there are fundamental differences between these two practices.

Technology: Data scraping is a digital practice carried out by automated software that interacts with websites and online platforms. Eavesdropping on party line telephone systems was a personal process that simply required picking up a phone receiver. Sometimes you could hear the “click” of a phone being picked up. With data scraping, there is no way to tell when someone is gleaning information you deem private.

Intent and purpose: Data scraping can be done for legitimate purposes such as data analysis, market research, weather forecasting and building applications. Data scraping itself is not inherently malicious, but it is being abused. As usual, it is the people who are illegally gathering your data who are the violators of your privacy. Back in the day, eavesdropping on party line telephone conversations was also regarded as an invasion of privacy.

The Soviet Union’s Sputnik was the cornerstone of today’s ginormous communications iceberg, 90% of which rests unseen below the waterline in an ocean of information overload. With vast amounts of information available at our fingertips, it has become overwhelming to filter through the noise and identify reliable sources.

Bottom line: In the context of consumer protection, today’s AI-based data scraping technology can produce errors, exhibit biases, and violate personal data privacy.

Greenfield resident John Bos is trying, in his own way with words, to make sense of our disintegrating democracy in “Connecting the Dots,” which appears in the Recorder every other Saturday. He is also a contributing writer for Green Energy Times. Questions and comments are invited at john01370@gmail.com.
