Media hosting | Web page archiving


By Cat woman | Sustainable Tech | 25 Jun 2021


The Internet Archive

The Internet Archive is a 501(c)(3) nonprofit digital library of millions of free:

  • documents (e.g., books, articles)
  • images
  • videos
  • software
  • audio (e.g., music, audiobooks)
  • archived websites


It was founded by the computer scientist and digital librarian Brewster Kahle in 1996, with a mission to provide universal access to all knowledge.

Anyone with a free account can upload media content.


What does it mean to archive a web page?

  • It means saving a copy of the page's text and graphics as it appears online at a particular moment, which provides a reliable link to an unchangeable record of that page at that time.
  • All links within the webpage to be archived will be automatically archived.
  • Web page archiving allows for reliable citations of online content: you can cite a page without fearing that it will later be removed or modified.
  • Typically, web page content that relies on user authentication (signing in) cannot be archived.
  • As of today (June 2021), more than 582 billion web pages have been saved from millions of websites using the Internet Archive's Wayback Machine:
    • At the top of the page you can browse past copies of a certain page by entering its web address in the search box.  
    • At the bottom right of the page you can save (archive) any page "now" (alternatively, enter the web address you want to save on
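Both actions described above (browsing past copies and saving a page now) can also be done from a script. The sketch below is a minimal illustration, assuming the Internet Archive's public Wayback Availability API at `https://archive.org/wayback/available` (which returns the snapshot closest to an optional timestamp) and the commonly used `https://web.archive.org/web/<timestamp>/<url>` and `https://web.archive.org/save/<url>` URL forms. The helper function names are hypothetical, and the service's behavior may change; only `find_closest_snapshot` makes a network request.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public Wayback Availability API endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build an Availability API query URL (hypothetical helper).

    timestamp is optional, in YYYYMMDDhhmmss form or a shorter prefix
    such as "20060101"; the API answers with the closest snapshot.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def snapshot_url(page_url: str, timestamp: str) -> str:
    """Wayback Machine address of the copy of page_url closest to timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


def save_page_now_url(page_url: str) -> str:
    """Address that asks the Wayback Machine to capture page_url now (assumed /save/ form)."""
    return f"https://web.archive.org/save/{page_url}"


def parse_closest(response_text: str) -> Optional[str]:
    """Extract the closest available snapshot URL from an API response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_closest_snapshot(page_url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Query the Availability API over the network (needs internet access)."""
    with urlopen(availability_query(page_url, timestamp), timeout=30) as resp:
        return parse_closest(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Example: look up the snapshot of example.com closest to 1 Jan 2006.
    print(find_closest_snapshot("example.com", "20060101"))
```

Keeping the URL building and JSON parsing separate from the network call makes those parts easy to check offline; an empty `archived_snapshots` object in the response simply means no copy of that page has been archived yet.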


A centralized public storage

If you sign up for the Internet Archive, you'll be able to store virtually any media type you want. However, it's important to bear in mind the following:

  • You cannot store content privately – anything you upload can be accessed/streamed and downloaded by any user (even without an account).
  • Internet Archive has the right to remove any uploaded content, at any time, just like in any other centralized app.   


Why am I promoting it?

The fact that it is centralized, i.e. owned and regulated by a single entity, is a huge disadvantage to me, since all content is way more vulnerable to censorship and hacking. Even so, I decided to promote this app for the following reasons:

  • I'm happy with the way I've been treated since I signed up and started uploading content (mid 2020).
  • The Internet Archive advocates for, and is helping build, a decentralized web (article, video):

Contrast the current Web to the internet – the network of pipes that the World Wide Web sits on top of. The internet was designed so that if any one piece goes out, it will still function. The internet is a truly distributed system. What we need is a Next Generation Web; a truly distributed Web.



The Internet Archive began a program to digitize books in 2005; today it operates in 18 locations worldwide and digitizes around 3,500 books per day.

Books published before 1926 are available for download, and hundreds of thousands of modern ebooks can be borrowed on Open Library.


Citing TV news

The Internet Archive began archiving TV programs in late 2000. Its first public TV project was an archive of news coverage surrounding the events of September 11, 2001.

In 2009, it started a service for searching US TV news broadcasts by their captions, making it possible to use television as a citable and shareable reference (the captions were generated by automated speech recognition software, not provided by the broadcaster).


Internet Archive Scholar (beta)

This is a recent service, still in public beta, and the Internet Archive warns of some known issues:

Metadata is being improved and features have not been finalized.

The service is a competitor to Google Scholar and can be accessed on


It's a search engine for millions of full-text scholarly publications archived by the Internet Archive, spanning from 18th-century journals to the latest Open Access eprints. Interesting features so far:

  • Publication reference generation in several styles.


I'm 🙏🏼 for a great citation tracker to be added to this app in the near future.

As of today, the most comprehensive one is provided by Google Scholar 😔 (it's the only Google product I still use, just to track citations to my papers).



The Webpage Archive

Alternative to Internet Archive's Wayback Machine


A great alternative to Internet Archive's Wayback Machine for web page archiving is, and if you install the browser extension you'll be able to archive any web page with a simple click.

I typically avoid extensions, but this one is quite useful (you can check its permissions and other settings by right-clicking its icon and selecting "Options" in the Brave browser).


Sustainable Tech

All about my favorite apps for multiple software categories. None of them is owned by "Big Tech" (big corporations like Google, Amazon, Facebook, Microsoft, Apple, Twitter, Zoom, etc.). It is (to me) not sustainable to feed billion/trillion-dollar companies whenever there are effective, more ethical (right to privacy and to freedom of expression) and less wealthy firms in the market. Empowering monopolies increases social inequalities and abuses of power.
