
Web3 scraping and why it matters

By CrypptoCat | SKCrypto | 4 Aug 2022


Many of you probably know, or have at least heard, about web scraping. Even if you haven't, you have almost certainly done it yourself: copying and pasting from a web page is, in fact, a manual form of it.

Technically, this is the process of collecting data from web pages. I won't dive into the details; for those who want them, there is an article on Wikipedia. I'm going to talk about the general idea of the technology itself and the possibilities of its application in Web3.

TL;DR: go directly to the Conclusion :)

Brief history and some examples

Scraping appeared almost simultaneously with the World Wide Web, when the first web bots were created in the early 90s. At first they were used to measure the size of the web, and later to index its pages.

In the early 2000s, the first Web APIs were developed. They gave access to certain public data. Today, tens of thousands of products offer such solutions. Many Internet resources and databases themselves make it possible to parse their data, whether they are news sites, digital archives or government statistics.

But even where no such interface exists, you can create a bot that collects public data from the source you need. Why would you?

Scraping bots are useful. For example, imagine that you own an online store for auto products. As its owner, you may be interested in tracking your competitors' prices. This can be done manually, by checking how they change every day. But you can also use a parser that delivers this data automatically, in the format you need. With its help, your content manager can also pull competitors' product descriptions, including SEO texts. How ethical is that? It is an open question, and in some countries it is regulated by law, but the tools themselves exist and are being actively developed.
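To make the price-tracking example more concrete, here is a minimal sketch of such a parser using only Python's standard library. The page markup, the class names `name` and `price`, and the helper `scrape_prices` are all assumptions made up for illustration; a real competitor's page would first have to be downloaded and would need its own selectors.

```python
from html.parser import HTMLParser

# Hypothetical markup of a competitor's catalogue page (an assumption for
# illustration; a real page would be downloaded first and look different).
SAMPLE_PAGE = """
<ul>
  <li class="product"><span class="name">Oil filter</span><span class="price">$12.50</span></li>
  <li class="product"><span class="name">Brake pads</span><span class="price">$48.00</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collects (name, price) pairs from <span class="name"> / <span class="price">."""

    def __init__(self):
        super().__init__()
        self.products = []   # finished (name, price) tuples
        self._field = None   # which span we are currently inside, if any
        self._name = None    # product name still waiting for its price

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self._name = text
        elif self._field == "price" and self._name is not None:
            # Strip the currency sign and store the price as a number.
            self.products.append((self._name, float(text.lstrip("$"))))
            self._name = None


def scrape_prices(html):
    """Return a list of (product name, price) tuples found in the HTML."""
    parser = PriceParser()
    parser.feed(html)
    return parser.products


if __name__ == "__main__":
    for name, price in scrape_prices(SAMPLE_PAGE):
        print(f"{name}: {price:.2f}")
```

Run daily and saved to a file, output like this is enough to see how a competitor's prices move over time.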

Another example. Let's say you're a brand manager or marketer. One of your tasks might be to keep track of brand mentions. At a superficial level this can be done with tools such as Google Trends, or with more advanced ones like Youscan or Mention. The advanced ones usually cost a lot of money and deliver the results as a report once a day. Such services use similar technology, although they do not aim specifically at harvesting (collecting data and downloading it); they simply track mentions and present a report with links to them.

Machine learning and artificial intelligence are used to generate more advanced reports. Thanks to them, the report gains graphs that analyze the tone of each mention, so you can track whether someone on the Internet spoke well or badly about your brand.

But natural language processing is still developing as a technology, so you can't trust such reports completely. Most often they track "positive" and "negative" words but do not understand irony, for example. If you describe some service as "wonderful" and rate it 2 out of 10, such a robot will break. That's a joke, of course. But more fine-tuning is needed here: parsing such reviews in isolation from other data makes no sense.
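The "wonderful, 2 out of 10" problem is easy to reproduce. Below is a deliberately naive sketch of keyword-based tone detection; the word lists, the function names and the 40% rating threshold are all assumptions chosen for illustration, not how any particular monitoring service works.

```python
import re

# Tiny hypothetical word lists; real tools use far larger vocabularies or models.
POSITIVE = {"wonderful", "great", "good", "excellent"}
NEGATIVE = {"terrible", "bad", "awful", "poor"}


def keyword_sentiment(text):
    """Label text by counting positive and negative words only."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_with_rating(text, rating, scale=10):
    """Cross-check the text label against the numeric rating.

    Assumed rule: a 'positive' text with a rating below 40% of the scale
    is flagged as 'mixed' instead of being trusted.
    """
    label = keyword_sentiment(text)
    if label == "positive" and rating / scale < 0.4:
        return "mixed"
    return label


if __name__ == "__main__":
    review = "What a wonderful service"
    print(keyword_sentiment(review))         # text alone
    print(sentiment_with_rating(review, 2))  # text plus the 2/10 rating
```

Looking at the text alone, the ironic review counts as praise; only combining it with the rating reveals the contradiction, which is the kind of extra data the paragraph above refers to.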

Web3 scraping

You must have heard about the idea of a new Web3. If not, you can read in detail about what it is in this article. Put briefly and simply, it is a new round in the development of the World Wide Web. It will be based on the principles of decentralization and privacy, thanks to blockchain and a tokenized economy.

But let's talk about something else now. 2020. Quarantine. Social media wars were raging between infectious disease specialists, virologists, and everyone for and against vaccination. Each side gave its arguments, citing its own sources. What were these sources? Where did the information that later went viral originally come from?

You have probably already guessed that scraping can help answer these questions. But there is one problem. Because of the sheer amount of data on the network, it is incredibly difficult to analyze. Getting all of it in real time is simply impossible. This is not the same as scraping data from a single source.

And this is where an idea for solving it was born. The French company ExordeLabs, a team of programmers and data analysts, came up with the idea of using blockchain and the Web3 principles mentioned above for data scraping. Thanks to the decentralization of the network, tens of thousands of nodes will work simultaneously as data validators, instead of separately launched bots.

But what will make them work together and perform data mining tasks? Here we recall another principle of Web3, the tokenized economy. Validators will receive EXD tokens for their work. The protocol will be protected from attacks by a consensus mechanism and will be self-managed by a decentralized autonomous organization (DAO), where token holders will have the right to vote.

Because this is a blockchain with a transparent transaction history, the protocol will not be able to hide any outcomes from the search results, which means it will not be subject to censorship.

Thanks to all this, the developers at Exorde are trying to build real-time data collection! All the same use cases described above, but with almost instantaneous results. Imagine that when UST started to lose its peg to the dollar, you had a tool at hand that could collect all the public information about it in any language you need. Maybe there were hints that it was time to short? Or to close a position?
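To give a feel for how many nodes might agree on one result and be paid for it, here is a generic majority-vote sketch. It is not Exorde's actual consensus mechanism, which the article does not describe; the function names, the node names, the 50% quorum and the one-token reward are all hypothetical.

```python
from collections import Counter


def majority_label(votes, quorum=0.5):
    """Return the label that more than `quorum` of the nodes agree on, else None.

    `votes` maps a node name to the label that node assigned to a scraped item.
    """
    if not votes:
        return None
    label, count = Counter(votes.values()).most_common(1)[0]
    return label if count / len(votes) > quorum else None


def reward_agreeing_nodes(votes, accepted, reward=1):
    """Give `reward` tokens to each node whose vote matches the accepted label."""
    return {node: (reward if vote == accepted else 0) for node, vote in votes.items()}


if __name__ == "__main__":
    # Three hypothetical nodes validating the same scraped post.
    votes = {"node-a": "valid", "node-b": "valid", "node-c": "spam"}
    accepted = majority_label(votes)
    print(accepted)
    print(reward_agreeing_nodes(votes, accepted))
```

The point of such a scheme is that a single node cannot push through its own result, and only nodes that agree with the majority earn tokens.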

One more example. Let's say you own an NFT from some collection and start seeing FUD around it. What should you do: sell, or ignore that information? In any case the final decision is up to you, but collecting the data becomes much simpler when Web3 solutions are available.

Conclusion

Why should we care about the information above? Whether you are a Web3 enthusiast or a crypto investor, scraping in Web3 has huge potential.

Firstly, Exorde is, at the moment, the only team that has declared such a goal. Moreover, they have not only declared it but are already testing their software. You can learn more and join the test in the official Discord.

Secondly, the product has been spotlighted by CoinList, which adds credibility, if only because CoinList analysts do not include just anyone in their batches.

In general, it sounds promising. The token generation event (TGE) is planned for December, and right now you can participate in the testnet with possible rewards. It takes almost no time.

Details can be found here: Website | Twitter | Discord

 



