Use Case Civil War Data 150
Civil War Data 150
Jon Voss, Civil War Data 150 Project Manager. San Francisco, CA. firstname.lastname@example.org
Background and Current Practice
Civil War Data 150 (“CWD150”) is a collaborative project to share and connect American Civil War-related data across local, state and federal institutions during the four-year sesquicentennial commemoration of the Civil War, beginning in April 2011. Currently, libraries, archives, museums, and individual researchers make an enormous amount of data pertinent to the Civil War available on the web in disparate databases. During the sesquicentennial commemoration, there is a renewed effort at the United States federal and state levels to digitize and release even more photographs, journals and ephemera related to the Civil War.
During the initial phases of this project, CWD150 will work with institutions to publish Civil War-related collections metadata as structured data under CC-BY or similar licensing, moving in later phases toward publishing that metadata in linked data formats.
(1) Researchers and the general public alike will be able to discover information about the American Civil War from across multiple institutions, and incorporate that information into their work in new and exciting ways.
(2) CWD150 will use linked data technology to create connections based on the strong identifiers and taxonomy of the Civil War, particularly the regiments, battles, battlefields, officers, and soldiers and sailors.
The various phases of this project have different target audiences, including: local, state and federal level archives and libraries; middle and high school students and teachers; Civil War enthusiasts; Civil War scholars; genealogists; digital humanities research centers; and software developers and data visualization experts.
Use Case Scenario
A researcher, Alex Artis, would like to create a visualization of the troop movements and engagements of the 102nd Regiment, United States Colored Infantry, as well as a corresponding timeline showing troop casualties. Alex first needs access to the history of the regiment in some sort of structured data format, including dates, places, and events. Then Alex will need a table of regimental casualties by engagement, if one exists. Finally, with this data in hand, Alex can use Simile or other tools to create a visualization of the history of the regiment, and may even use some photographs of soldiers or battle maps of engagements to personalize the presentation.
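As a rough illustration of the kind of structured data Alex would need, the sketch below joins a list of dated regimental events with a casualty table and produces timeline rows that a visualization tool could consume. The `Event` fields, the function name `timeline`, and every row of data are placeholders invented for this example; none of it comes from the Dyer Compendium or any other source.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One dated event in a regimental history (hypothetical schema)."""
    regiment: str
    day: date
    place: str
    kind: str   # "movement" or "engagement"
    label: str


# Placeholder rows only: names, dates, and counts are NOT historical data.
EVENTS = [
    Event("Regiment X", date(1900, 1, 1), "Place A", "movement", "Moved to Place A"),
    Event("Regiment X", date(1900, 1, 5), "Place B", "engagement", "Engagement at Place B"),
    Event("Regiment X", date(1900, 1, 3), "Place C", "engagement", "Engagement at Place C"),
]

# Casualties keyed by engagement label (placeholder counts).
CASUALTIES = {"Engagement at Place B": 4, "Engagement at Place C": 2}


def timeline(events, casualties):
    """Return chronologically ordered rows with per-event and running casualties."""
    rows = []
    running = 0
    for ev in sorted(events, key=lambda e: e.day):
        # Only engagements carry casualty counts; movements contribute zero.
        lost = casualties.get(ev.label, 0) if ev.kind == "engagement" else 0
        running += lost
        rows.append({
            "date": ev.day.isoformat(),
            "place": ev.place,
            "event": ev.label,
            "casualties": lost,
            "cumulative": running,
        })
    return rows


if __name__ == "__main__":
    for row in timeline(EVENTS, CASUALTIES):
        print(row)
```

Rows in this shape (ISO dates, a place name, and a numeric series) could then be converted to whatever input format the chosen timeline or mapping tool expects.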
An enormous amount of data about the American Civil War now exists online in digital form. Frederick H. Dyer’s Compendium of the War of the Rebellion, published in 1908, gives us perhaps our most exhaustive resource of data culled from original sources. The Perseus Digital Library Project at Tufts University makes large parts of that work, including the regimental histories, available to view and download as structured data (XML) with a CC-BY-NC-SA 3.0 license.
The works of individual scholars, as well as the holdings of state and federal libraries and archives, add to the collection of sources relevant to this particular regiment.
By aggregating these diverse data sources and performing vocabulary alignment to an ontology specific to the American Civil War but applicable to a broader military schema, it becomes possible to query information about a particular place, regiment, battle, or officer. The results would provide our researcher with structured data that would also link to the source documents at contributing institutions for further research and citation.
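The alignment-then-query idea can be sketched in a few lines. In this minimal example, two hypothetical sources describe the same kind of record with different field names; a per-source mapping renames the fields to one shared vocabulary, after which a single query works across both. The source names, field names, shared terms, and records are all assumptions made for illustration, not the project's actual ontology or data.

```python
# Per-source mapping from local field names to shared vocabulary terms.
# Both the source names and the shared terms are hypothetical.
FIELD_MAP = {
    "source_a": {"unit": "regiment", "loc": "place",
                 "when": "date", "url": "source"},
    "source_b": {"regiment_name": "regiment", "battlefield": "place",
                 "event_date": "date", "record_uri": "source"},
}

# Placeholder records as they might arrive from each institution.
RAW = [
    ("source_a", {"unit": "Regiment X", "loc": "Place A",
                  "when": "1900-01-01", "url": "http://example.org/a/1"}),
    ("source_b", {"regiment_name": "Regiment X", "battlefield": "Place B",
                  "event_date": "1900-01-05", "record_uri": "http://example.org/b/2"}),
    ("source_b", {"regiment_name": "Regiment Y", "battlefield": "Place B",
                  "event_date": "1900-01-05", "record_uri": "http://example.org/b/3"}),
]


def align(record, source):
    """Rename a record's fields to the shared vocabulary; drop unmapped fields."""
    mapping = FIELD_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def query(records, **criteria):
    """Return aligned records whose fields equal every given criterion."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]


ALIGNED = [align(rec, src) for src, rec in RAW]

if __name__ == "__main__":
    # Each hit keeps its "source" link back to the contributing institution.
    for hit in query(ALIGNED, regiment="Regiment X"):
        print(hit["date"], hit["place"], hit["source"])
```

Keeping the `source` field on every aligned record is what lets query results point back to the original documents for further research and citation.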
Existing Work (optional)
CWD150 is still in the early stages, but our website has a more detailed description of our proposed collection, aggregation, and vocabulary alignment processes (www.civilwardata150.net).
Related Vocabularies (optional)
We’ll be utilizing the military vocabulary specific to the American Civil War, including regiments, battles, battlefields, officers, etc. We plan to work with the United States National Park Service to create an ontology from their Civil War Soldiers and Sailors Database, with a crosswalk of corresponding identifiers in Freebase, DBpedia, and potentially Library of Congress Subject Headings (LCSH), though subject headings at that level of granular detail are not at this point available through the id.loc.gov project.
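One common way to publish such a crosswalk is as `owl:sameAs` statements in N-Triples. The sketch below is only an illustration of that pattern: the local `example.org` URI is a made-up placeholder, and the DBpedia URI is an assumption derived from the title of the Wikipedia entry on the 102nd USCT rather than a verified identifier. No Freebase or LCSH identifiers are shown because none are given here.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical crosswalk: one local entity URI mapped to external identifiers.
# The local URI is a placeholder; the DBpedia URI is assumed from the
# Wikipedia page title and has not been checked against DBpedia itself.
CROSSWALK = {
    "http://example.org/cwd150/regiment/USC102": [
        "http://dbpedia.org/resource/102nd_Regiment_United_States_Colored_Troops",
    ],
}


def to_ntriples(crosswalk):
    """Serialize the crosswalk as owl:sameAs statements, one triple per line."""
    lines = []
    for local, externals in sorted(crosswalk.items()):
        for ext in externals:
            lines.append(f"<{local}> <{OWL_SAME_AS}> <{ext}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(CROSSWALK))
```

Additional external identifiers for the same entity would simply be appended to the list for that local URI, producing one more triple each.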
Problems and Limitations (optional)
One of our first steps will be to provide an ontology based on the National Park Service Civil War Soldiers and Sailors Database, and provide persistent URIs for all of the key entities. This source is drawn from the aforementioned Dyer Compendium and is generally considered authoritative by Civil War scholars. However, this represents no small technical hurdle from the outset. It’s entirely possible that we might use DBpedia/Freebase identifiers as a proxy until such time as we can point to the National Park Service as a source.
As with any other linked data project, copyright and licensing issues will be paramount. Whether data is made available with or without a non-commercial license makes an enormous difference to how such data can be used. Our work with researchers, database managers, and content providers will initially focus on copyright and licensing issues, examining the available options and educating them about the often unintended consequences of choosing one over another. It’s also important for us to differentiate between the licensing and use of metadata about digital assets, and the assets themselves.
Civil War Data 150: http://www.civilwardata150.net
 Perseus Project (102nd USCT from Dyer Compendium): http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A2001.05.0146%3Achapter%3D49%3Aregiment%3DUSC102
 National Park Service Civil War Soldiers and Sailors System: http://www.itd.nps.gov/cwss/
 Wikipedia entry on 102nd USCT: http://en.wikipedia.org/wiki/102nd_Regiment_United_States_Colored_Troops
Video description by Kris Carpenter Negelescu, Director of the Web Archive at the Internet Archive (CNI: Linked Open Data: The Promises and the Pitfall, 26:30 in, for about 18 minutes)