Keeping Track of Document Research Data

Today, I have an announcement about a quality time software project of mine:

Many years ago, I started helping a family member better organize the qualitative research data that accumulated during a digital humanities research project. At first, using an Excel file to categorize document references, notes on books, etc., worked quite well. However, over time, the Excel file grew to thousands of cells and 2 MB of pure text data. Some cells came to contain several pages of text, and the research project became a multi-person endeavor in which several people needed to work on the data at the same time and from different places. This pretty quickly resulted in unsettling questions such as ‘which of those many Excel files contains the latest version of the data?’ And, as in any research project that grows, finding and modifying data in the sheet became more and more difficult as well.

At this point, the situation became too chaotic for my taste, so I decided to program a web-based Document Research Database project in my quality time to help out. As others in our circle of researcher friends have found the solution quite helpful for their work as well, I have now open-sourced the project to make it available to a wider audience. The source code is GPL-3 licensed, so anyone can use it for free by setting up their own instance.

So this is how the Document Research Database project started, and you can find an introduction to it and lots of documentation for using, installing, and maintaining it over here. Most researchers who would like to use it will probably still need a ‘local IT guy’ to set up and maintain their web database instance. However, getting the database up and running is as simple as installing Docker on a server and running a setup script. Once that is done, the example database fields can be customized, extra users can be created, and there’s even an Office macro for importing the latest version of a LibreOffice Calc or Excel sheet into the database. Data lock-in is of course a no-go, so the database can be exported to LibreOffice Calc (or Microsoft Excel if you must…) by any user with the appropriate permission at any time. I could go on and on about the features here, of course, but I’ll let the documentation I’ve put together over the years speak for itself.

So here we go, it’s out in the wild now. There’s no telemetry or tracking code of any kind in the software that leads back to me, so I won’t know who is using it. If you find this project useful, I am always happy to get feedback via the Git repository or the ‘traditional’ email way.