How to organize scientific literature?

Introduction

A huge burden for a scientist is organizing their literature. As a physicist, I have accumulated over 1000 articles that I have used in my work over the course of my career, besides many, many books that I might need at any moment while working.

How can I organize all those books and all those articles?
How can I reach any piece of information I need whenever I need it without wasting an hour digging in my collection of articles by filename?

This question has been on my mind for a long time, and a few months ago I started digging for an answer. I was thinking of writing a PHP program that organizes my literature, stores copies of the documents, and supports BibTeX, but guess what… a solution with those features already exists! I found that many organizations have written such programs. Examples are Mendeley and Zotero.

There are more, but I only tested these.

Both are great, but…

Both Mendeley and Zotero are great when used locally on your computer! They can take your PDF, read it, index it, and automatically find reference details such as the author list, DOI, ISBN, title, and many others. This data then becomes searchable, and clicking on an entry simply opens the article. You can even attach a note to an article and reach the article through that note when needed. This is extremely helpful when you want to mark a specific piece of information in some paper, so that you don’t have to dig through all your literature to find it again.

I started with Mendeley and was happy with it, until I realized that I also needed a synchronization mechanism between my computers at home and at work. Mendeley is commercial software, it offers very limited disk space (about 2 GB), and everything is stored on their servers. That is a problem for people whose corporate policy forbids storing work information on 3rd party servers. Besides, even if that’s allowed, 2 GB is not really enough. That’s why I left Mendeley.

Then I found Zotero, which is technically a program derived from Firefox. Zotero creates a local database folder on your computer with local copies of your literature inside it. You can copy this folder wherever you want, even move it between computers, and then simply tell Zotero which directory to use for its data. This is SUPER CONVENIENT! It’s very practical for making backups of everything and for not losing anything over long-term use. There’s even an option for synchronization with the web. They also offer little disk space (about 1 GB), but if you pay $10, they give you unlimited disk space, which is convenient and gives you the option to synchronize your literature among multiple computers.
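Since the whole library lives in one folder, a backup can be as simple as zipping that folder. Here is a minimal Python sketch of the idea. The function name `backup_data_dir` and the archive naming scheme are my own illustrative choices, not anything Zotero provides, and I assume Zotero is closed while the archive is being written.

```python
import shutil
import time
from pathlib import Path


def backup_data_dir(data_dir, backup_root):
    """Zip the whole data folder into a timestamped archive.

    Both arguments are paths chosen by the user. backup_root should
    live OUTSIDE data_dir, otherwise the archive would end up inside
    the folder it is archiving.
    """
    data_dir = Path(data_dir)
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming scheme: zotero-backup-YYYYMMDD-HHMMSS.zip
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = backup_root / f"zotero-backup-{stamp}"

    # make_archive appends ".zip" itself and returns the final file name.
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(data_dir))
    return Path(archive)
```

You would call it with the directory you selected in Zotero as `data_dir` and any folder on another disk as `backup_root`.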

What if corporate policy prevents me from storing literature on a 3rd party server?

Actually, it’s not only corporate policy that prevents me from doing that. What also stops me is that I’m not convinced such a service is worth $10 per month, especially since I own a Linux server with 1 TB of disk space, for which I pay $30 per month. Does it make sense to pay $30 for a full-featured Linux server and $10 just for literature? Not at all!

Proposing a solution to avoid 3rd party cloud intervention

The solution is very simple, and I could implement it thanks to the nice way Zotero stores data. Since Zotero stores everything in a single, well-defined folder, all you have to do is synchronize this folder among the computers you use! One way to do this is with a version control system like Git, which I don’t find convenient, since I would have to commit, push and pull every change manually. The better method I found is a synchronization system called Seafile, run from my $30 server.
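To make the inconvenience of the Git route concrete, here is a rough Python sketch of the cycle you would have to repeat after every change. The helper names `git_sync_commands` and `run_git_sync` are hypothetical, and the sketch assumes the data folder is already a Git repository with a remote configured.

```python
import subprocess
from pathlib import Path


def git_sync_commands(message):
    """Return the Git commands of one manual sync cycle, in order."""
    return [
        ["git", "add", "--all"],           # stage every change in the folder
        ["git", "commit", "-m", message],  # record the changes locally
        ["git", "pull", "--rebase"],       # fetch what the other computer pushed
        ["git", "push"],                   # publish the local changes
    ]


def run_git_sync(data_dir, message):
    """Run the cycle inside data_dir, stopping at the first failure.

    Note that "git commit" exits with an error when there is nothing
    to commit, so this raises CalledProcessError in that case.
    """
    for argv in git_sync_commands(message):
        subprocess.run(argv, cwd=Path(data_dir), check=True)
```

Keep in mind that the database is a binary file, so Git cannot merge two diverging copies of it. That is one more reason why a file synchronization tool fits this job better.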

Seafile is an open-source cloud system (similar to Dropbox) that you can run from your own server! It supports client-side encryption, and it is the safest option I know of and the one I recommend most so far. I have been using it for all my work and data, and I find it very convenient. So, all you have to do is synchronize your Zotero data folder among the PCs you want to use.

If you don’t have a server of your own, simply use some 3rd party cloud, like Dropbox, which will give you more disk space than the standard Zotero cloud offers anyway. However, you’ll again be limited by disk space eventually. If you need more, I really recommend that you rent (or perhaps buy for your home) your own Linux server. You learn a lot, you save a lot of money, and you can use it for many purposes for yourself and your family.

Or… you could use servers from your own institution. Good universities normally offer employees and students free disk space that is accessible from anywhere, or at least through a VPN service.

Risks?

There’s some risk in doing this, but it’s not that bad, for a reason. The main risk is that you could open the same Zotero database from different computers at the same time. I’m not sure whether merely opening it on different computers would create a problem, but I’m almost certain that making changes on different computers simultaneously will cause one when your cloud tries to merge the databases. However, it’s not that bad, because cloud systems usually keep a full history of your files, with a revision for every change you make. This means that if your database files (the files with extension *.sqlite in Zotero) get corrupted, you can always roll back to a previous version.
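If you suspect that a sync went wrong, you can ask SQLite itself whether a database file is still consistent before deciding to roll back. Below is a minimal Python sketch using the standard `sqlite3` module and SQLite’s `PRAGMA integrity_check`. The function name `database_is_healthy` is my own, and I assume Zotero is closed while the check runs, since a locked database would also be reported as unhealthy here.

```python
import sqlite3
from pathlib import Path


def database_is_healthy(db_path):
    """Return True if SQLite reports the file as a consistent database."""
    # Open read-only through a file: URI so the check can never modify
    # the database, and a missing file is not silently created.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error:
        return False  # the file could not be opened at all

    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError:
        return False  # not a database, unreadable, or locked
    finally:
        conn.close()

    # A healthy database yields exactly one row containing "ok".
    return rows == [("ok",)]
```

Point it at the *.sqlite file inside your Zotero data folder. If it returns False, that is the moment to restore an earlier revision from your cloud’s file history.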

Conclusion

You can create a very good and reliable scientific literature database system using Zotero and a cloud. This is a perfect solution for personal literature. However, I still don’t have a solution for groups that doesn’t involve storing data on a 3rd party server.

PS: It might be possible to use your own server to synchronize the Zotero database as if you were synchronizing with the official Zotero server. However, this would involve recompiling Zotero’s source code with your server address, which, I think, is a huge burden. Whether it’s worth it depends on whether your group wants to be committed to such a solution.
