Colleagues, Geo-Librarians, Curators and Archivists,

Now that you have a taste for deep learning and neural nets....

I want to let you know about the newly created Machine and Deep Learning in Libraries and Archives Research Group, which has been formed under the American Library Association (ALA) and the Library and Information Technology Association (LITA). The group has been created to keep archivists, librarians and curators informed about the latest research in machine and deep learning and to help foster and fund large- and small-scale projects that employ these rapidly emerging techniques. As a collaboration between academic computer science departments, cutting-edge information technology companies and the archival/library community, it will attempt to look into the future and explore new applications for high-level computing, machine learning and artificial intelligence in library, museum and archive environments. You can join at http://www.ala.org/lita/about/igs/machine-learning .

As most of you know, machine and deep learning techniques have become critically important in extremely diverse areas of research and have led to breakthrough results in numerous machine learning tasks, such as the superhuman classification of images in huge data sets and the unsupervised control-policy learning through which computers have mastered sample human tasks like Atari games. They have also managed to defeat the world champion in the complex and computationally intractable game of Go, a decade before computer scientists thought possible. Machine learning and artificial neural networks of diverse forms also lie at the foundation of the various kinds of natural language processing, classification and automated translation systems that have appeared in the last few years.

Large collections like those at the Library of Congress are especially suited for these kinds of applications. As Jer Thorp, the Library of Congress’ current Innovator in Residence, stated in an interview a few weeks ago, “The Library’s holdings, particularly the Prints and Photographs archives, are really ripe for exploration using some of the artificial intelligence/machine learning techniques that have been developed over the last two years. This is a phenomenally interesting field that is advancing in staggering leaps and bounds, and I think it’d be great to see what could be done by computing across the whole archive. It’d be a technical challenge to get these archives in one place (10M+ images! 7M maps!) but I think some amazing things could be done.”

Recently there have been many successful projects in archives, museums and libraries that employ these techniques, such as the notable and recently published use of neural networks by the Smithsonian’s Museum of Natural History to identify and classify millions of digitized herbarium specimens https://www.smithsonianmag.com/smithsonian-institution/how-artificial-intelligence-could-revolutionize-museum-research-180967065/ .

Other major projects that have a more geospatial bent are exploring techniques for toponym matching in large sets of digitized maps using deep neural networks http://eprints.lancs.ac.uk/89480/ and complex OCR at the Vatican Library https://www.theatlantic.com/technology/archive/2018/04/vatican-secret-archives-artificial-intelligence/559205/.

As these techniques, and the algorithms and theory that lie behind them, begin to make their way into large and small data projects in libraries, and with the birth of the new field of computational archival science http://dcicblog.umd.edu/cas/ieee_big_data_2017_cas-workshop/ , this group will provide an avenue for researching potential applications in library science, including a forum for discussion, publication and outreach to the wider library community. One of its goals will be to educate librarians on uses of the complex techniques of machine learning and to provide a space for thinking critically both about new applications and about the ethical and social impact of these technologies, as the field rapidly expands into libraries and archives in the coming decade.

The new group is chaired by John Hessler, a Lecturer in Quantum Information Theory & Computing at Johns Hopkins University, who is also a specialist in Mathematical Cartography & Geographic Information Science at the Library of Congress. The group’s co-chair will be David Lacey, Director of Library Technology & Knowledge Management Systems at Temple University.

The group will host a series of workshops at the annual ALA conference using Google’s new open-source deep learning software TensorFlow https://www.tensorflow.org/ . The sessions will be both introductory and hands-on and, as the title “Deep Learning and Neural Nets for Librarians and Archivists” suggests, are meant to give an overview of these technologies and techniques to the less computationally aware members of the library community. Also proposed is a roundtable that will feature Google’s DeepMind and Stanford’s NLP Lab to talk about current and future applications of ML & DL in archives, museums and libraries. Finally, the group is looking, in mid-2019, to begin publication of an online journal through the IEEE that will focus on computational archival science, visualization and machine learning in archives and museums.

All the best.


John Hessler, FRGS

Specialist in Mathematical Cartography &
Geographic Information Science

Geography and Map Division
Library of Congress
Washington, DC
202-707-7223

Twitter: https://twitter.com/topology_lab
Center for Open Science Page: https://osf.io/zt8c4