As a nonprofit news venture, you may be thinking only of the work you have to do today or tomorrow. But do you also think about preserving the work your staff produces? Do you have an archive of all the stories you publish? What kind of digital storage do you have? What is your backup strategy for your digital content? If you have not thought about this, now is the time to think about how to archive and preserve your news content.
Archival and preservation of news content is nothing new – print newspapers have had “morgues” for years. These “morgues” hold the print editions of the paper and other related materials (transcripts, photos, maps, documents, etc.) important to the print operation for future access and historical purposes.
In this digital age, archiving your organization’s work should be top of mind.
As a nonprofit news venture, you may need to report to board members, foundations and donors about your operations. It can be a definite plus if you can tell them what your archival or preservation strategy is for your news content. In addition, if you have an archive of your news content, you could potentially use that as a revenue generator by allowing the community to access your organization’s digital archives for a nominal fee.
As you think about your archival strategies, please consider the following questions:
- What storage capabilities do you have for hosting your multimedia and news content? This can determine what kind of space you will need to hold all the content your staff produces and plan for the future as your news content grows over the months and years ahead.
- What kind of content management system do you have? This can determine how much content, and what kinds of content, crawler tools can gather from your site.
- How often do you back up your news content? If you are not doing this often, you may be putting your content at risk. Having a backup plan is highly recommended.
- If you have your information or content hosted in the cloud, do you know what your cloud provider’s plan is for disaster or migration? If you don’t, you should find out so you can be prepared.
- How can you preserve your content? It’s one thing to archive and store your content, but you also need a method to keep that content in its original form and accessible to anyone. If you don’t have one, you should think about tools to help you with that step.
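One common way to check that archived files are still in their original form is to record a checksum for each file and compare it again later, for example after every backup. The sketch below is a minimal illustration in Python, not a full preservation system. The function names and the folder layout are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to archive_dir) to its checksum."""
    return {
        path.relative_to(archive_dir).as_posix(): sha256_of(path)
        for path in sorted(archive_dir.rglob("*"))
        if path.is_file()
    }


def find_problems(archive_dir: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare the files on disk with a previously saved manifest.

    Returns a dict of relative path -> "missing" or "changed".
    Files added since the manifest was built are not reported.
    """
    current = build_manifest(archive_dir)
    problems = {}
    for name, expected in manifest.items():
        if name not in current:
            problems[name] = "missing"
        elif current[name] != expected:
            problems[name] = "changed"
    return problems
```

In practice you would save the manifest (for example as a JSON file stored alongside the backup) and run `find_problems` on a schedule. An empty result means every file recorded in the manifest is still present and unchanged.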
Here are some archival tools that you can consider to help you keep your past work alive:
Heritrix is an open-source web crawler used by the Smithsonian Institution Archives, the Internet Archive, the Library of Congress and several European libraries. Installing it requires some technical skill, but it gives you a way to archive your website. If you want to view what the web crawler has found, you will need a viewer like the Wayback Machine.
Note that web crawlers cannot always capture dynamic content, so you should identify other methods to capture this kind of content. This may entail additional digital storage and backup strategies.
Another tool (not yet released) is WARCreate, developed by Mat Kelly. According to the website, “WARCreate is a Google Chrome extension that allows a user to create a Web ARChive (WARC) file from any browseable webpage.” You can email Mat Kelly (contact information is on his website) to be notified when the tool is released.
Aside from your general news content, are you archiving your organization’s tweets? If not, there is an option for that: Grabeeter. It is an application that can search the tweets of Twitter users and export them in JSON or XML format. This tool may be of particular importance when you have a major news event and would like to save the tweets for future use or archival purposes.
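Once tweets have been exported to JSON, a short script can pull out the ones tied to a particular news event. The sketch below is only an illustration. The field names `created_at` and `text`, the ISO date format, and the sample tweets are assumptions made for this example, not Grabeeter’s documented export format, so adjust them to match the file you actually get.

```python
import json
from datetime import date, datetime


def filter_tweets(tweets, keyword, start, end):
    """Return tweets that mention keyword and were posted from start to end (inclusive)."""
    keyword = keyword.lower()
    matches = []
    for tweet in tweets:
        # Assumes an ISO-style timestamp such as "2012-10-29T18:05:00".
        posted = datetime.fromisoformat(tweet["created_at"]).date()
        if start <= posted <= end and keyword in tweet["text"].lower():
            matches.append(tweet)
    return matches


# Hypothetical export contents; a real export would be read from a file
# with json.load() instead of this inline sample.
SAMPLE_EXPORT = """
[
  {"created_at": "2012-10-29T18:05:00", "text": "Storm update: Main Street is flooded."},
  {"created_at": "2012-10-30T09:30:00", "text": "Shelters open at the high school after the storm."},
  {"created_at": "2012-11-15T12:00:00", "text": "City council meets tonight."}
]
"""

tweets = json.loads(SAMPLE_EXPORT)
storm_tweets = filter_tweets(tweets, "storm", date(2012, 10, 29), date(2012, 10, 31))

# Save the event's tweets as their own file so they can be archived together.
event_archive = json.dumps(storm_tweets, indent=2)
```

Keeping each major event in its own file makes it easier to store the tweets next to the related stories, photos and documents.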
What about videos? Do you have a method to archive your videos? If your videos are featured on a YouTube channel, TubeKit may be your tool. TubeKit was created by Chirag Shah, Assistant Professor in the Dept. of Library & Information Science (LIS) within the School of Communication & Information (SC&I) at Rutgers University. The tool was developed with an NSF grant that Shah received. It crawls YouTube videos based on a set of specific queries with specific attributes. There are two options: create a crawler to find videos on YouTube, or collect data from YouTube without crawling.
If your news organization has a Facebook page, you need to be careful about what you archive. Facebook forbids crawling without permission. If you want to be able to crawl your own Facebook page, you have to fill out a form and submit it for approval and that approval process can take a while.
Many large news organizations do have archival strategies and enterprise-wide tools to preserve their news content by working with an outside vendor. Here is a list of just a few of those companies that work with news organizations. Pricing can vary based on the services you require:
Here are some other resources worth checking out related to digital preservation of content:
These are just a few tools to help you get started. Archiving and preserving your news content should not be an option but a requirement, one that will allow your organization’s important work to live on for years to come.