Wrangling PDFs

It’s more than ten years since I did my Masters, and a lot has changed. One of the big differences is the number of PDFs I have to deal with. In 2002 it was virtually zero. Now, a little over a month into my PhD, I have already started accumulating an unruly herd of journal articles and saved blog posts and they are all in PDF.

My first instinct was to print hardcopies so I could attack them with highlighters and pencils (old habits die hard), but I was fairly sure technology had progressed to the point where I could use a digital workflow. And as an archivist who has also run sessions for humanities postgraduates on research data management, I should be able to practise what I preach.

So, for anyone who is interested (and as an insight into my slightly anally retentive personality) I’m going to outline my PDF workflow. Before I launch into it, a few things to note:

Though I use a Windows machine at work, more generally I am an Apple user and this is an Apple process.
The workflow I am outlining below requires up-to-date software, including OS X Yosemite, iOS 8 and iCloud, but I’m running it on a late 2013 iMac and an iPad 2, so brand new hardware is not a prerequisite.
This is a work in progress. Writing it up here has already resulted in me tweaking a few parts and I’m sure there is more tweaking to do.

1. Rounding up PDFs

I do pretty much all my searching on my iMac at home, to take advantage of the big screen and comfy chair, with the added benefits of solitude, music and the close proximity of tea and snacks. Apart from following trails through from the bibliographies and references of other researchers, I have been diving into Google Scholar and – taking advantage of my university privileges – big databases like Web of Science, JSTOR and the British Humanities Index. It’s amazing how many potentially useful references can be assembled with a few hours’ work.

2. Inside and outside Zotero

When I get to full text articles or other content I want to read more thoroughly, I do two things. First, I create a citation record in Zotero. I have the Zotero Connector extension in Google Chrome so usually clicking on the little icon in the address bar will create a citation record, plus download a copy of the full text reference and (if you’re lucky) an HTML snapshot too.

But I don’t like just working with those full text references. First, the actual files are buried deep in the ‘Library’ folder of my Mac, in a horrible folder structure that’s not human-readable, so I don’t feel in control of them as records. And second, if you do go to the trouble of tracking down PDFs in that horrible file structure, they can have less than ideal file names.

So I open the full text reference from Zotero – which has the added bonus of making sure it is there and accessible – and save a copy into a folder called ‘full text references’ in the PhD folder of my Mac. This contains a few sub-folders for big, broad subject categories which are mirrored in Zotero but not much more than that. I like to keep my folder structures fairly flat so I don’t have to hunt for things. Plus, whenever I create detailed folder structures I keep wanting to fiddle with them and move things around. They create more problems than they solve.

When I save my working copy of the full text PDF I use a file naming convention we use at work, partly from force of habit and partly because it is very effective. The article I found the other day by Matthew Jones, ‘Archives and museums – threat or opportunity?’ is saved here as:

REMP – 2015-01-23 Jones 1997 – Archives and museums – threat or opportunity.pdf

Breaking it down: [a four letter project code, in this case ‘REMP’ which means it’s part of my PhD] – [the date I saved this file to my computer] [Surname of first author, plus second surname or ‘et al’ for multiple authors] [year of publication] – [full or sensibly abbreviated article title].pdf
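For anyone who would rather generate these names than type them, here is a minimal Python sketch of the convention. It is an illustration, not something the workflow depends on: the function name `build_filename` is made up for this example, joining two surnames with ‘and’ is an assumption (the convention above only says ‘second surname’), and stripping characters such as ‘?’ is inferred from the Jones example.

```python
import re
from datetime import date


def build_filename(project, saved, authors, year, title):
    """Build a name like 'REMP – 2015-01-23 Jones 1997 – Title.pdf'.

    project: four letter project code
    saved:   datetime.date the file was saved to the computer
    authors: list of author surnames, first author first
    year:    year of publication
    title:   full or sensibly abbreviated article title
    """
    if len(authors) == 1:
        author_part = authors[0]
    elif len(authors) == 2:
        # Assumption: two surnames are joined with 'and'.
        author_part = f"{authors[0]} and {authors[1]}"
    else:
        author_part = f"{authors[0]} et al"

    # Drop characters that cause trouble in file names (the '?' in the
    # Jones title, for example), then trim stray whitespace.
    safe_title = re.sub(r'[\\/:*?"<>|]', "", title).strip()

    return f"{project} – {saved.isoformat()} {author_part} {year} – {safe_title}.pdf"


if __name__ == "__main__":
    print(build_filename(
        "REMP", date(2015, 1, 23), ["Jones"], 1997,
        "Archives and museums – threat or opportunity?",
    ))
```

Using an ISO date (year-month-day) means files from the same project sort chronologically in a plain folder listing.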

3. From iMac to iPad

Though my study makes a good place for searching and writing, I hate reading anything substantial on a desktop screen or laptop. I also want to be able to read on the couch, or (as I did today) on a bean bag in my living room; or in a cafe, in the library, or sitting outside on a park bench. Having decided to go digital, that means using a tablet, which for me means an iPad.

After reading a few reviews, I’ve started with the GoodReader app (about $6.50) and a stylus for highlighting and underlining text, adding notes, scribbling arrows or otherwise annotating PDFs. So far it’s working really well.

The connection between my iMac and iPad is Apple’s iCloud. When I want to read a set of PDFs, I copy them from the ‘full text references’ folder into the GoodReader folder in iCloud.
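The copy step is easy to do by hand in Finder, but it could also be scripted. The sketch below is only an illustration: `stage_for_reading` is a hypothetical helper, and the folder paths are whatever your own machine uses (the location of the iCloud folder on disk is not stated here, so it is left as a parameter).

```python
from pathlib import Path
import shutil


def stage_for_reading(pdf_paths, staging_dir):
    """Copy the chosen working PDFs into the synced reading folder.

    The originals stay where they are; only copies go to staging_dir.
    Returns the paths of the staged copies.
    """
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)

    staged = []
    for pdf in map(Path, pdf_paths):
        # copy2 also preserves timestamps where the platform allows it.
        staged.append(Path(shutil.copy2(pdf, staging / pdf.name)))
    return staged
```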

4. Highlighting and annotating

I can open GoodReader on my iPad and access any PDF that’s in the iCloud folder. I can highlight, annotate and scribble on it to my heart’s content. When I’m done I also add a big red tick to the front page so I can quickly tell if it’s something I’ve worked on. Rather than creating a pile of additional files I just save the annotated version over the top of the original in iCloud. If I need the clean original, Zotero still holds one.

5. From iPad back to iMac and Zotero

Once I’ve finished reading and am back at my desk I pull all the annotated PDFs out of iCloud and back into ‘full text references’, overwriting the clean versions with the annotated copies and deleting the files from iCloud as I go. Finally, I open my folder of annotated files on one side of my screen and Zotero on the other and drag and drop each annotated PDF onto its reference. That way, I have a clean and an annotated copy in my citation software, plus an additional local annotated copy which I control along with my other documents.
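The first half of that step (pulling files out of iCloud and overwriting the clean working copies) can be sketched in Python as below. This is a hedged illustration rather than part of the process described: `return_annotated` is a hypothetical name, matching files by name across the subject sub-folders is an assumption, and the drag and drop into Zotero is still done by hand.

```python
from pathlib import Path
import shutil


def return_annotated(staging_dir, reference_root):
    """Move annotated PDFs from the synced folder back over the working copies.

    Each PDF in staging_dir replaces the file of the same name found anywhere
    under reference_root (including sub-folders). If there is no match, the
    file is placed in reference_root itself. Returns the destination paths.
    """
    staging = Path(staging_dir)
    root = Path(reference_root)

    # Index existing working copies by file name so sub-folders are respected.
    existing = {p.name: p for p in root.rglob("*.pdf")}

    returned = []
    for pdf in sorted(staging.glob("*.pdf")):
        target = existing.get(pdf.name, root / pdf.name)
        shutil.copy2(pdf, target)  # overwrite the clean version
        pdf.unlink()               # then clear it out of the holding pen
        returned.append(target)
    return returned
```

Copying first and deleting second means that if the copy fails, the annotated file is still sitting in the synced folder.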

6. Backing up

You’ll notice in the process above that I keep files on my iMac and return my annotated files there when I’m done reading. iCloud is just a holding pen – a way of distributing and accessing stuff remotely for a specific purpose – not somewhere to store things I don’t want to lose. That stuff I want to control myself. My desktop Mac is backed up locally to a 2TB external drive using Time Machine, and (on any day where I’ve done a substantial block of work) before I log off I FTP my PhD folder and a zipped copy of my Zotero library across to a fully backed-up server at the University using FileZilla.
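Making the zipped copy of the Zotero library is the part of this routine that lends itself most to a script. Here is a minimal Python sketch under stated assumptions: `zip_library` and the `zotero-library-` file name prefix are invented for this example, and the location of the Zotero data folder is passed in rather than guessed. The upload itself is left to FileZilla as described above. It is probably safest to close Zotero before zipping so the database is not being written to mid-copy.

```python
from datetime import date
from pathlib import Path
import shutil


def zip_library(library_dir, out_dir, today=None):
    """Create a dated zip of library_dir inside out_dir and return its path.

    out_dir should sit outside library_dir so the archive does not try to
    include itself.
    """
    today = today or date.today()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # make_archive adds the '.zip' extension to the base name itself.
    base = out / f"zotero-library-{today.isoformat()}"
    archive = shutil.make_archive(str(base), "zip", root_dir=str(library_dir))
    return Path(archive)
```

Putting the date in the archive name keeps each day’s copy separate on the server instead of overwriting the previous one.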

Phew! It may sound a bit over-engineered for some people but it works for me. Most importantly, to date it seems reliable. If it stays that way it will just be a process I can follow, leaving me free to think about more important things.

Happy PDF wrangling!

4 Comments

Carl Schuyler
August 10, 2015 at 4:59 am


Hi Mike! Thanks for your blog post; in fact, I don’t think your workflow is over-engineered at all. I actually came across this post as I was looking for tools to use in a process that I use which is not unlike your own.

My problem is that I have boatloads of PDFs (other file types too, but mostly PDFs). My current goal is to be able to put them in the cloud somewhere, access them through a web app, do full-text search on them, and get back ranked highlighted results in context (basically as if I were using Google to search my own private document repository). I’ve looked at tools that I’d have thought would allow me to do this (Evernote, Dropbox), but so far to no avail. Are you aware of tools that can do this?

- Mike Jones
  August 10, 2015 at 3:55 pm
  
  
  Hi Carl – it’s an interesting question. I use Zotero to manage my references and that allows you to search the full text of attached PDFs as well as the citation metadata, tags and associated notes. But as far as I know there’s no way to rank the results. Either the text is found, or it isn’t.
  
  I know some Mac people who use DEVONthink for managing PDFs and carrying out text searches. I’ve heard it is great, and have seen some impressive demos, but haven’t tried it myself so don’t know whether it does what you want. It’s not free, but if you have particular needs it might be worth investigating. Good luck!
  
Mike Jones
