GIS Meetup & Educational Seminar Summary #5

Presented on: February 26th, 2020.

Presenters:

  • Amy Koshoffer - Assistant Director, Research and Data Services, University of Cincinnati
  • Jennifer Latessa - RDS Student Research Consultant, University of Cincinnati
  • Paula Marques Figueiredo - Research Assistant, Marine, Earth, and Atmospheric Sciences, North Carolina State University

Data Management for GIS Projects video

Overview

The organization and management of data are the keys to running a successful GIS project. This talk, presented by the University Consortium for Geographic Information Science, focuses on how to plan and document GIS data so that research is reproducible and replicable. The primary audience is academic researchers, but many of the recommendations apply equally to other GIS projects.

Getting Organized

Jennifer Latessa offered the following advice for organizing and working on a GIS project:

  • Plan and document how data can be used for reproducible and replicable research.
  • Keep separate folders for raw and processed data.
  • Label files and attributes clearly, concisely, and consistently: use no spaces, do not start names with numbers, and keep them short.
  • Record the workflow process in metadata throughout the project. Most software will not record this automatically.
  • Version your data, using label 0 for the raw data, and give every step in your workflow a name and label. Use relative rather than absolute paths so that the folder structure can be replicated.
  • Back up your data with a minimum of 3 copies, on 2 different media, with at least 1 copy in a separate geographic location.
  • Save your data in various formats to enable sharing and preservation.
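
Several of these rules are mechanical enough to check with a short script. The Python sketch below is only an illustration and was not part of the talk: the function names, the BackupCopy class, and the 30-character limit are hypothetical choices (the advice was simply to "keep short"). It checks a file name against the naming rules, tests whether a path is relative, and tests whether a set of backup copies satisfies the 3 copies, 2 media, 1 separate location rule.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical limit: the talk said "keep short" without giving a number.
MAX_NAME_LENGTH = 30


def name_problems(name: str, max_length: int = MAX_NAME_LENGTH) -> list[str]:
    """Return the naming rules that a file or attribute name breaks."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if name[:1].isdigit():
        problems.append("starts with a number")
    if len(name) > max_length:
        problems.append("too long")
    return problems


def is_relative(path: str) -> bool:
    """True if the path is absolute under neither POSIX nor Windows rules."""
    return not (
        PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()
    )


@dataclass(frozen=True)
class BackupCopy:
    medium: str    # e.g. "external drive", "cloud", "tape"
    location: str  # e.g. "campus office", "off-site data center"


def meets_3_2_1(copies: list[BackupCopy]) -> bool:
    """At least 3 copies, on 2 different media, in 2 or more distinct locations."""
    media = {c.medium for c in copies}
    locations = {c.location for c in copies}
    return len(copies) >= 3 and len(media) >= 2 and len(locations) >= 2
```

For example, name_problems("2020 roads.shp") reports both the space and the leading digit, while a name such as "roads_v0.shp" passes and also carries the version label 0 suggested for raw data.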

Data Management Plan

Amy Koshoffer explained what a data management plan should address. The plan should cover what data will be produced and in what format. Because there are many different data types and formats, it is important to consider future researchers and use open formats where possible. A data management plan should consider the following five questions:

  • What data will be produced?
  • What standards will be used to document the data?
  • How will the data be protected, especially restricted data?
  • How will the data be archived and preserved?
  • How will reuse of the data and access to the data be facilitated?

In addition, the data management plan must address many other issues, including: protection of intellectual property; creation of readme files; creation of Digital Object Identifiers (DOIs) for citations; mechanisms for sharing, including the choice of subject-specific or institutional repositories; and the management of consent forms for human data. Human data may need to be destroyed after a project unless consent is recorded.

Knowledge Sharing

Paula Marques Figueiredo shared her experience working on a real-world project, including writing a data management plan as part of a proposal to the National Science Foundation. Her main advice was to look for workshops on this subject and to talk to successful researchers.

Q & A

The question-and-answer session covered a wide array of topics:

  • The Open Science Framework, which enables collaboration between multiple institutions
  • Long-term preservation of front-end and web-based components required to access projects
  • The Research Data Access and Preservation Association and the UC Data Management Plan tool
  • Need to protect spatial data concerning endangered species and sacred sites
  • Archiving big data
  • Version control and code-sharing platforms such as GitHub and Code Ocean

Reaction

It was eye-opening for me to hear the challenges that research projects encounter in organizing and managing GIS data. I enjoyed hearing the firsthand accounts of professionals working in this field.

I was not aware that librarians were so deeply involved in this area. I was particularly struck by the need to preserve future access to data as technology platforms continue to evolve and change.

Overall, many of the topics that should be covered in a data management plan were somewhat familiar to me from my previous work on database and enterprise systems.

