The Legal and Ethical Implications of Web Scraping for Marketing
Although widely used by many businesses, web scraping is a controversial topic. For some, it's synonymous with gathering and misusing personal data. Others consider data scraping an effective Big Data analytics tool. New laws and regulations like the CPRA (California Privacy Rights Act, 2023) define the rules for gathering and using online data.
At the same time, businesses state which parts of their sites automated tools may access in robots.txt files and provide APIs for more transparency. If you choose to scrape online data and use it for marketing, it's essential to adhere to these rules to remain within ethical boundaries. Internet users don't take kindly to being bombarded with irrelevant ads without consent, and doing so could damage your business reputation. Here's what you need to consider before data scraping:
A) Public VS Private Data
One of the most important issues is the difference between public and private data. When you visit a website like Booking.com, you see public information such as hotel prices, availability, and conduct policies. Gathering such data is usually legal unless the website clearly states otherwise, for example in its robots.txt file or terms of use. Moreover, even if a website attempts to deny access, the data itself remains public. Gathering it anyway, however, might land you in a lawsuit, as in the Ryanair v PR Aviation case.
Gathering private data is an entirely different thing. If a website asks you to log in with a username and password to access specific information, using scrapers to gather it can get you in serious legal trouble. For example, you cannot scrape user profiles that are available only to registered users. The same logic applies to user reviews, government databases, pictures, social media posts, and similar content.
Ignoring these boundaries and using private data for marketing is both unlawful and extremely distasteful. Internet users consider it a violation of their privacy. Meanwhile, other businesses deem it anticompetitive behavior, and, in the end, no one wins.
B) Statistical VS Personal Data
No less important is the difference between statistical and personal data. As you might have guessed, scraping the latter crosses ethical boundaries and could result in legal trouble. One of the biggest data-gathering scandals of the previous decade, Cambridge Analytica, relates to personal data gathering. The company exploited weaknesses in the Facebook platform to gather data from up to 87 million user profiles.
The data was later used to target those users with political ads skillfully crafted to shift their political affiliations towards the Republicans in the 2016 US presidential election. Two whistleblowers disclosed these shady practices, starting an avalanche that didn't stop for several years, and the ramifications of this deeply unethical case are still felt today. Cambridge Analytica filed for bankruptcy shortly after.
Although scraping personal data for marketing can improve your product placement, the risks far outweigh the benefits. Companies that get caught using personally identifiable information (PII) face heavy fines for breaking GDPR (General Data Protection Regulation) and CCPA (California Consumer Privacy Act) rules. And the damage to the company's reputation can be severe. Here's a list of what's considered PII:
a) Date of birth
b) First and last name
c) Address
d) Phone, social security, and passport numbers
e) Credit card number
f) Medical data
g) Marital status
h) Employment details
i) Email address
j) IP address
Some businesses find ways to utilize such data by anonymizing it. Before using it in analytical tools and marketing campaigns, they strip the data of any personal details that could identify a specific person. The process must be thorough: if the remaining data can still be combined into a real person's profile, you will be in breach of the law. Generally, it is advisable to refrain from gathering personal data at all.
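As a rough illustration of the stripping step described above, here is a minimal Python sketch. The field names in PII_FIELDS and the sample record are hypothetical, chosen only to mirror the PII list above; a real dataset will use different names. Note that dropping direct identifiers is only a first step: the fields that remain can sometimes still be combined to single out a person, so this sketch alone should not be treated as full anonymization.

```python
# Hypothetical field names that mirror the PII list above.
PII_FIELDS = {
    "date_of_birth", "first_name", "last_name", "address", "phone",
    "social_security_number", "passport_number", "credit_card_number",
    "medical_data", "marital_status", "employment", "email", "ip_address",
}


def strip_pii(record: dict) -> dict:
    """Return a copy of the record without any field listed in PII_FIELDS."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}


# Hypothetical sample record, for illustration only.
raw_record = {
    "first_name": "Jane",
    "email": "jane@example.com",
    "ip_address": "203.0.113.7",
    "country": "DE",
    "basket_total": 42.5,
}

# Only the non-identifying fields (country, basket_total) remain.
print(strip_pii(raw_record))
```

The function returns a new dictionary instead of modifying the input, so the raw record can be discarded separately once the stripped copy has been stored.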
C) Copyrighted Material
Lastly, you should always respect copyrighted material and other intellectual property. Protected business content includes:
a) Marketing strategies
b) Design layouts
c) Proprietary software
d) Patents
e) Trade secrets
f) Business plans
g) Customer records
h) Logos
i) Pictures
Scraping such data is extremely anticompetitive and will most likely result in a cease and desist letter, followed by a lawsuit if you continue. Furthermore, you cannot gather such data even if it is publicly available. For example, businesses display their company logos where they are most visible, but copying them in any way can infringe the owner's rights.
D) How to Ethically Scrape Web Data?
A few straightforward rules will keep you within ethical lines while scraping online data. Firstly, follow the rules discussed above and ensure the data you gather is public, statistical, and not copyrighted. Inspect the robots.txt file to learn the website's policies. Sometimes a website prohibits gathering even public statistical data for marketing, although restricting access to publicly available data is legally contested, as the hiQ v. LinkedIn case showed.
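The robots.txt inspection mentioned above can be automated. The sketch below uses Python's standard urllib.robotparser module. The robots.txt content, the bot name "example-marketing-bot", and the example.com URLs are made up for illustration; in practice you would load the file from the site you intend to scrape.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; a real one is served at <site>/robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Crawl-delay: 5
"""

USER_AGENT = "example-marketing-bot"  # hypothetical bot name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str) -> bool:
    """Check whether our user agent may fetch the given URL."""
    return parser.can_fetch(USER_AGENT, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)
print(is_allowed(parser, "https://example.com/hotels/prices"))    # public listing page
print(is_allowed(parser, "https://example.com/account/profile"))  # disallowed path
print(parser.crawl_delay(USER_AGENT))  # requested delay in seconds, or None if unset
```

To read a live file, RobotFileParser also offers set_url() followed by read(). Keep in mind that robots.txt covers crawler access only, so the website's terms of use still need a separate look.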
It's essential to respect a website's performance when scraping data. Scrapers make hundreds, if not thousands, of requests, which could overload the website and make it crash. Ensure that your scraper uses secure proxies, stays within the website's request limits, and does not slow down its performance. Remember that website owners have numerous tools to inspect incoming traffic, and if they notice your operations are hurting their own, they will take action against you.
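One simple way to stay within a website's request limits is to enforce a minimum pause between consecutive requests. The following is a minimal sketch, not a complete scraper: the Throttle class and the fetch_all helper are hypothetical names, and the two-second interval is an arbitrary assumption that you should replace with whatever limit the website publishes.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests to one website."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_request = None

    def wait(self):
        """Pause if the previous request was too recent; return the seconds slept."""
        slept = 0.0
        if self._last_request is not None:
            remaining = self.min_interval - (self._clock() - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last_request = self._clock()
        return slept


def fetch_all(urls, fetch, throttle):
    """Call fetch(url) for each URL, pausing between calls as the throttle requires."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

A call such as fetch_all(urls, my_fetch_function, Throttle(2.0)) would then issue at most one request every two seconds, where my_fetch_function stands for whichever download routine your scraper uses.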
Web scraping is also not the only way to retrieve Big Data. Some business models rely on information sharing. For example, weather forecast agencies want their research exposed as much as possible, and news sites happily integrate such content into their structure because it's useful and attracts more users.
In this case, you can use APIs (application programming interfaces). An API is a software interface designed to share data between two consenting parties. You can discuss the information-sharing rules with the data owner, and they can add you to their list of API users.
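To give a sense of what consuming such an API might look like, here is a small Python sketch built on the standard library. Everything specific in it is an assumption made for illustration: the endpoint api.example.com/v1/forecast, the query parameters, and the bearer-token authentication are hypothetical, and a real provider's documentation will define its own.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE_URL = "https://api.example.com/v1/forecast"  # hypothetical endpoint


def build_forecast_request(city: str, api_key: str) -> Request:
    """Build an authenticated GET request; parameter names are assumptions."""
    query = urlencode({"city": city, "units": "metric"})
    return Request(
        f"{API_BASE_URL}?{query}",
        headers={
            "Authorization": f"Bearer {api_key}",  # key issued by the data owner
            "Accept": "application/json",
        },
    )


def fetch_forecast(city: str, api_key: str) -> dict:
    """Send the request and decode the JSON body of the response."""
    with urlopen(build_forecast_request(city, api_key), timeout=10) as response:
        return json.load(response)
```

Because the data owner issues the key and documents the endpoint, both sides know exactly what is being shared, which is the transparency advantage APIs have over scraping.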
Final Words
In reality, many businesses scrape data, Big Tech companies included. Google, for example, has paid fines for privacy violations rather than part ways with personal data. Don't take them as a good example. If you follow the ethical practices discussed here, you can still reap the benefits for your marketing campaigns without damaging your reputation or spending extra on lawsuits.