Librarian Sam-chin Li says U of T helped prevent a digital dark age of government information (photo by Noreen Ahmed-Ullah)

Harvesting the government web space

U of T librarians step in to preserve electronic information

At a time when more and more government publications are online, U of T librarians have stepped in to start archiving government websites.

U of T’s collection is considered among the most extensive and accessible collections of online captures of government websites in the country. The university’s efforts are critical because they preserve information, and in turn keep governments accountable, in an era when the documents are no longer available in print and are always changing on the Internet.

Until recently, federal agencies would send Robarts Library print copies of government publications to preserve them and make them available to the public, thereby encouraging active citizenship. But that program ended in 2014, and no new initiatives have followed for the archiving of government websites.

“Government documents, government information, things like annual reports, statistics, are material we help researchers find on websites,” said Nicholas Worby, who is in charge of web archiving at University of Toronto Libraries. “The control for their preservation and curation is out of our hands, but we have a huge stake in making sure we have access to this stuff. This is an effort to rescue those documents and increasingly it’s becoming more of a means of extending our job into a born digital world.”

Those and other issues will be part of a discussion this week when Robarts Library hosts an event bringing together researchers, archivists and librarians from around the world to begin charting uncharted waters. They’ll be developing open source tools and methodologies for working with web archives.

Currently, U of T Libraries is archiving material from the federal government’s website, and has begun collecting content from provincial websites. The libraries are also working with the City of Toronto Archives to capture parts of the Toronto municipal portal.

U of T’s collection includes captures of about 200 Canadian federal government websites from the end of Library and Archives Canada’s web archiving program in 2007, as well as archives of 60 sites from the Ontario government web domain and seven sites for the City of Toronto.

The effort began three years ago when U of T Government Publications and Reference Librarian Sam-chin Li and librarians from other universities discovered that the Harper government was shutting down the Aboriginal Canada Portal site within a week. U of T librarians rushed to figure out how they’d capture the online information. They consulted fellow university librarians. They learned how to use web-harvesting software and then worked into the night to crawl part of the site.

That was followed by another shock a few months later: a leaked document showing that more federal government websites could be terminated or have at least 60 per cent of their content cut.

It was a wake-up call for U of T librarians. They needed to begin archiving the website content themselves.

“It was going to be a digital dark age of Canadian government information, of what we were going to know about our government,” Li said. “U of T filled that gap.”

Today, Library and Archives Canada says it is capturing content on federal government websites, but for the past two and a half years the sites have not been made publicly available.

“We keep asking them to send information about what they have captured so we can fill the gaps, but it’s still a big unknown to us,” Li said. “We can’t stop doing our job because we don’t know what they have done. We have students coming and asking for information. We can’t say you have to wait for Library and Archives to share the information.”

Worby, who is now the Government Information and Statistician Librarian, was a grad student in the Faculty of Information when U of T began the rush to harvest the sites. He says the government records are crucial for researchers. However, not every page on a government website is captured daily. Most of the time, the university does broad crawls semi-annually and captures media release pages every evening.

“It terrifies me, but I know we’re never going to archive everything,” Worby said. “It’s impossible to capture everything with web archiving. Having at least some fragmentary pieces of historical memory is still better than not having it.”

The U of T collection also includes campaign and party websites for the recent federal and Toronto mayoral elections. U of T Scarborough Principal Bruce Kidd offered advice to the librarians about archiving Toronto 2015 Pan Am/Parapan Am Games sites, so as to collect documents from host cities detailing the planning and experience of the games throughout the GTA.

“It is really important that documentary records of major games be kept,” Kidd says. “In the growing field of international scholarship of major games, mega events and international sport, the Olympics is fairly well covered and documented. The Commonwealth Games less so. But the Pan American Games is really a big dark hole. I thought that one of the legacy contributions of Toronto 2015 would be to leave a good documentary archive on the games. This will benefit researchers in a variety of fields for years to come.”
