More than 90 percent of my role depends on a computer with an internet connection. Laws, commentary and cases are all online. Not only are they online, but they are just as authorised as the print versions, so from a business (and a librarian's) perspective it makes sense to maintain an online collection rather than a hardcopy one. Among other advantages, an online collection doesn't get lost as easily as a hardcopy one, and more than one person can access it at a time.
Wayback Machine – The Internet Archive
The most well-known project is the Internet Archive's Wayback Machine. I know this sounds more than a bit nerdy, but I think the Wayback Machine is a project of absolute brilliance, and I utilise this free product on a weekly, if not daily, basis. The Wayback Machine is beautiful in its simplicity: all you have to do is enter the URL of a website, click go, and it will show you how many point-in-time snapshots have been made of the site and let you access them. The comprehensiveness of the project varies, but you can often obtain access to historical versions of individual pages as well as sub-sites and documents.
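For the technically inclined, the same lookup can be scripted. The sketch below builds a query for the Wayback Machine's availability API (https://archive.org/wayback/available) and pulls the closest snapshot out of a parsed response; the response shape follows that API's documented JSON, and the example values in the usage note are illustrative, not live results.

```python
from urllib.parse import urlencode

WAYBACK_API = "https://archive.org/wayback/available"

def availability_query(url, timestamp=None):
    """Build a lookup URL for the Wayback Machine availability API."""
    params = {"url": url}
    if timestamp:
        # YYYYMMDDhhmmss (or any prefix of it); the API returns the
        # snapshot closest to this moment.
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)

def closest_snapshot(response):
    """Return (timestamp, archive_url) from a parsed JSON response, or None."""
    snap = response.get("archived_snapshots", {}).get("closest")
    if snap and snap.get("available"):
        return snap["timestamp"], snap["url"]
    return None
```

Feeding `availability_query("http://example.com")` to any HTTP client and passing the decoded JSON to `closest_snapshot` gives you the nearest capture, or `None` when the site was never archived.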
Whilst the Wayback Machine, which began in 1996, is the Internet Archive's flagship project, it is not the team's only initiative. The not-for-profit organisation also maintains collections of texts, audio, moving images, and software.
The Wayback Machine is the most well-known of these initiatives, but it is not the only one out there. Other free initiatives include:
A range of other web archiving initiatives is listed on that great example of crowdsourcing, Wikipedia.
One thing I would like to highlight is that these types of projects are not limited to information enthusiasts, libraries, not-for-profits and information professionals. A number of companies have recognised the economic opportunity that capturing the internet holds. Web Preserver (http://webpreserver.com/), for example, has capitalised on the internet as the point of activity by providing authoritative point-in-time captures of online content for use in litigation.
How we can help
The Internet Archive and other like projects employ crawler software to capture pages. Employing the robots exclusion protocol (a robots.txt file) prevents a page from being captured, so don't use it. Whilst robots.txt is sometimes treated as a security technique, its effectiveness as such is questionable.
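To see the exclusion in action, here is a minimal sketch using Python's standard-library robots.txt parser. The `ia_archiver` user agent is one historically associated with Internet Archive crawling, and the example URL is illustrative.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that turns every crawler away. Directives like this have
# historically kept archive crawlers from capturing (or replaying) a site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(BLOCK_ALL.splitlines())

# With the blanket Disallow above, a well-behaved crawler fetches nothing.
print(rp.can_fetch("ia_archiver", "http://example.com/annual-report.html"))  # False
```

A compliant crawler checks exactly this before fetching, which is why a single Disallow line can erase a site from the archival record.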
Consider how you are capturing and storing versions of your online services. Should you be capturing versions? This versioning can have many advantages, including enabling you to easily show the progress and development of your online services to key stakeholders and decision makers, but also to save your bacon in the event of a server crash. Losing all of your online content becomes less of a headache when you can quickly reinstate a previous version.
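Versioning can start as simply as writing timestamped copies to disk. The sketch below is a minimal Python example; the function name, directory and file naming are illustrative choices, not a prescribed scheme.

```python
import datetime
import pathlib

def save_snapshot(content, name, archive_dir="site-archive"):
    """Write a timestamped copy of a page so earlier versions can be reinstated.

    Each call produces a new file like site-archive/homepage.20160301120000.html,
    so the archive accumulates a point-in-time history of the page.
    """
    stamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
    out = pathlib.Path(archive_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "{}.{}.html".format(name, stamp)
    path.write_text(content, encoding="utf-8")
    return path
```

Run on a schedule (or on every publish), this gives you the "previous version" to fall back on; a real deployment would likely use proper version control or a WARC-producing tool instead.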
Nothing is permanent, so don't over-invest in a single information product; it will only end in tears.