By Ron Kol, CTO at Bright Data
Web data collection, perhaps better known as web scraping, has become a ‘go-to’ strategic move for brands across multiple industries. It is helping companies, businesses, and organisations remain relevant and competitive, and even thrive, in their market or industry. However, as with all emerging technologies, there has been an air of suspicion around data collection, and this continues to be the case as the sector grows in importance for brands in e-commerce, travel, cyber security, and more.
Web data collection has become a vital commodity, especially in the past year as the world reacted to unprecedented events. Many organisations learned firsthand that web data is really the only data that reflects our market in real time – no other report or analysis can do so. But this great resource is not available to everyone in the same way. To put it simply, the web is not transparent, and due to companies’ interests in their competitors, some public information that should be available to all is blocked. This is where web data collection comes into play – and in full force.
While it’s clear to see the business case for web data collection, many still feel the sector operates in a grey area. Therefore, especially as the industry continues to thrive, it’s important to dispel some misconceptions that have been associated with web data collection in recent years.
Let’s start with the no. 1 myth:
#1: Web data collection (or web scraping) is illegal
Web data collection on its own is not illegal. In fact, US courts have repeatedly sided with data collection companies. Take, for example, the cases of hiQ Labs, Inc. v. LinkedIn Corp. and Genius Media Group Inc. v. Google LLC and LyricFind. However, legal web data collection depends on following comprehensive compliance processes.
Startups, SMEs, and large organisations all engage in online data gathering to track competitors’ trends, conduct market research, and run in-depth analytics on their data. The overall intention is to discover new opportunities for innovation and growth and to ensure that an organisation does not miss out on them.
As with all processes, it is vital that businesses follow compliance regulations, and if their data collection is outsourced, they must always work with their data collection provider to ensure that the operation is legal and ethical. To avoid any doubt, businesses should work with providers to understand what can and can’t be collected both from a legal and ethical standpoint. There is still a lack of regulation in this area, which places a moral burden on all parties involved to ensure that their data collection is within the existing guidelines and is used for the greater good. If not, their plans must be re-evaluated. Failure to do so would be unethical and could mean laws are being broken.
#2: Web data collection hurts businesses and destroys their competitive edge
Quite the opposite is true here. Web data collection provides you with the transparency you need when accessing the internet. It allows all market players to openly compete by simply providing them with accurate market research information. For example, if Company A wishes to set its own pricing strategy in motion, it needs to be aware of the special offers and pricing of one of its main competitors; let’s call it Company B.
In the old days, Company A would send out ‘mystery shoppers’ who would manually take note of Company B’s offerings and pricing. Company A would then adjust its own offers accordingly to make them more attractive to consumers. Today, our shopping ecosystem has clearly gone digital, and these ‘mystery shoppers’ have simply shifted into online data collection, which provides a company with the information it needs to decide on its pricing strategy or special offers. Online data collection ensures that companies can effectively compete and continue to attract their target consumer base.
Businesses benefit from the ability to openly compete, and consumers benefit from better offers, cheaper pricing, and an improved shopping experience. Web data collection therefore drives forward an openly competitive market and promotes overall information transparency.
#3: Web data collection may be legal but it’s not ethical
This depends on the web data collection provider. Data, including the public web data discussed here, must be treated with the utmost sensitivity, integrity, and professionalism. Doing it right means following international regulations and clear ethical guidelines that preserve our digital ecosystem and users’ data privacy. If you do that, you are ensuring that your data collection is both legal and ethical.
Public online data collection simply provides you with the same internet transparency that an average user enjoys. There are obvious risks and critical requirements you must address to confirm that you are conducting your data gathering in an ethical manner. These requirements are not optional or a ‘nice to have’ addition to your company policy; they are a critical necessity that all operators must abide by – without exception.
#4: Most data sources are considered private
This is incorrect, as the vast majority of web-based data is public. Internet growth statistics from Statista show that 4.66 billion people were using the internet as of January 2021. That’s close to 60% of the world population. The majority of the world’s data has been generated within the last two years alone, and it is estimated that close to 70% of the data being generated is public. Of the data generated, humans are responsible for close to 60%, while the other 40% is generated by machines. Although these statistics only give us a rough indication, the trend is clear to see.
When it comes to data collection or web scraping, providers can only collect what is publicly available. To simplify this even further, that would be anything you or I could find on the internet through a regular browser without having to log in to view content. If you have to log in, then it is off-limits.
#5: Only ‘shady’ companies engage in online data collection
Almost all companies of all sizes, from Fortune 500 firms to startups and SMEs, gather and utilise public online data. The only difference is in the type of data they require and how frequently they need it. In today’s real-time economy, companies can’t thrive without being able to see the full market reality, and to do that you need access to the largest data source. When our reality is mostly led by digital innovation, it is no surprise that public web data has become the ‘no brainer’ solution.
As I am CTO of the business leading the data collection domain, it may come as no surprise that I am fighting its corner. However, for this industry to thrive, we must answer to our harshest critics and ensure that our own company and those looking to collect data aren’t tempted towards illegal or unethical activities in the absence of strict regulations.
With any emerging technology, especially within the data space, there is always going to be analysis that explores the purpose and legalities of web data collection. Web data can and should be used for the greater good, allowing businesses to thrive from the latest, publicly available online insights. When analysing data collection providers, it’s important to understand what is being collected and how it is being collected. With so many leading brands dependent on data insights, web data collection will continue to be a fast-growing industry, and it’s up to everyone in this community to drive legal and ethical compliance.