15 March, 2013

Bibify all my PDFs

This week I wrote a nice little script that scans all my PDF files and spits out a magic BibTeX file with their metadata. Each entry in the bib file provides a rough description of the PDF and two useful links:
  1. a Google search link for the paper and
  2. a local file link to the actual PDF.
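To make the shape of such an entry concrete, here is a minimal Python sketch. It is not the script itself (that one runs under bash); the function name `bib_entry`, the `@misc` entry type, and the link labels "search" and "pdf" are assumptions made only for illustration.

```python
from pathlib import Path
from urllib.parse import quote_plus


def bib_entry(key: str, title: str, pdf_path: str) -> str:
    """Format one @misc entry whose note holds a search link and a file link.

    Hypothetical helper: the entry type and link labels are assumptions.
    """
    # Google search link built from the rough title
    search_url = "https://www.google.com/search?q=" + quote_plus(title)
    # file:// link to the local PDF (as_uri needs an absolute path)
    file_url = Path(pdf_path).resolve().as_uri()
    note = rf"\href{{{search_url}}}{{search}}, \href{{{file_url}}}{{pdf}}"
    return (
        f"@misc{{{key},\n"
        f"  title = {{{title}}},\n"
        f"  note  = {{{note}}}\n"
        f"}}\n"
    )


if __name__ == "__main__":
    print(bib_entry("doe2013", "Some Paper", "papers/some_paper.pdf"))
```

The `\href` links only become clickable if the LaTeX document loads the hyperref package. Percent signs in the generated URLs may also need escaping, depending on where the note ends up in the document.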

This allows me to create quick and dirty draft papers and presentations with clickable references, without having to maintain any bib files by hand or use any document management tool.

Check out the tool's page on my GitHub to read more!

Why did I write it?

I do not like big tools that enforce some working pattern on my daily research. I tried some document management tools before but was never happy with them.

I only use Freeplane to manage all my knowledge. This includes managing papers as well. Mind maps are just nodes and edges and everything can be easily restructured. That is very important to me.

When I read a paper I add a new node in my "" mind map, give it a short title and then summarize what I found inside and what the paper "can do for me". I also link the node to the PDF.

But now and then I need to discuss my findings with others, which means writing paper drafts and slideware. Since I use LaTeX/LyX for these tasks, I need to create BibTeX files for my documents. This can be a lot of manual work, especially since many of the discussed references get thrown out later on as the content matures. I also have to dig out the referenced papers during the discussions.

Thus I realized that I need
  1. An automatic bib file for all papers in consideration
  2. Clickable links in my own papers and slides
And that is what the tool does: it crunches all PDFs to extract some words as a "title" and adds some "href" links in the BibTeX "note" attribute. It is quick and dirty but it works as expected. For a final paper I would hand-craft a custom bib file anyway.
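The crunching step could look roughly like the Python sketch below. It makes no claim about how the real script does it: it assumes the text of a PDF has already been pulled out by some external tool, and the helpers `rough_title`, `cite_key` and `find_pdfs` are hypothetical names chosen for this example.

```python
import re
from pathlib import Path


def rough_title(text: str, max_words: int = 8) -> str:
    """Take the first few words of already-extracted PDF text as a stand-in title."""
    words = re.findall(r"[A-Za-z0-9][A-Za-z0-9'-]*", text)
    return " ".join(words[:max_words])


def cite_key(pdf: Path) -> str:
    """Derive a simple citation key from the file name (assumed naming scheme)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", pdf.stem).strip("_").lower()


def find_pdfs(root: str) -> list[Path]:
    """Return all PDF files below root, sorted so the bib file is stable."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    for pdf in find_pdfs("."):
        print(cite_key(pdf), "->", pdf)
```

Taking the first few words is crude, but it matches the quick and dirty spirit: the "title" only has to be good enough to recognise the paper and to feed the search link.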

PS: The script also works in Windows via the bash provided by Git.