Computer: Data Organization
iDocs   (+13)
A virtual librarian that tags and presents your literature.

iTunes has recently exhumed some of my latent obsessive-compulsive tendencies, but I'm glad to share that my entire music collection now has all the correct album artwork, track numbers, and consistent punctuation and capitalization.

My documents, however, are not so organised. There are over a thousand .pdf files of journal articles on my hard drive, accumulated legitimately through university access or free content of course, some of which I've even read.
The trouble is that many of the filenames are just the name of the article's author, or a meaningless number, and searching for a particular author or subject is clumsy at best through Finder or Explorer.

What's needed is an application into which .pdf files of journal articles can be selected and dropped. The program would then gather information from the text and present the files in a similar way to how iTunes presents music, but with sort fields for journal, date, title, author, etc.

There are reference managers that do similar things, but as far as I know, only for manually entered citations and not for complete articles.

I'd want the program to do something like this:

10 Have a look at all those files that have just been imported.

20 Make a list of them.

30 Using some clever algorithm, take the first one in the list and find a random bit of text in the body, about 20 words or so to be safe.

40 Put those words in double quotation marks and paste them into a PubMed (or other relevant) search.

50 The first hit is probably going to be the article in question, so click on it.

60 Here you'll find all the information about the article you need. Take note of the journal, date of publication, issue number, page numbers, title, authors, institution and abstract if available. I think PubMed's formatting is consistent enough for you to find this easily.

70 Go and find a decent resolution image of this article's journal cover for the issue concerned.

80 Create a modifiable tag with this information for this particular file.

90 Move on to the next file and keep doing this until you're done.

100 If you have difficulty finding the information, let me know with a polite message box. I will find the information for those files manually, but give me some useful places to look first.

110 When done, put it all together in a visually pleasing way for me to browse my files along with some standard reference manager tools.
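
Steps 30 and 40 above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function names `pick_snippet` and `build_search_url` are made up for this example, the text is assumed to have already been extracted from the .pdf by some other tool, and the search goes through NCBI's E-utilities `esearch` endpoint rather than the PubMed web page. Note also that PubMed indexes titles and abstracts rather than full article bodies, so a snippet taken from the abstract is more likely to produce a hit than one from deep in the text.

```python
import random
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (returns matching PubMed IDs).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pick_snippet(text, n_words=20, rng=None):
    """Return a random run of n_words consecutive words from text.

    Hypothetical helper: the "clever algorithm" of step 30 is reduced
    here to picking a random starting word.
    """
    rng = rng or random.Random()
    words = text.split()
    if len(words) <= n_words:
        return " ".join(words)
    start = rng.randrange(len(words) - n_words + 1)
    return " ".join(words[start:start + n_words])


def build_search_url(snippet):
    """Wrap the snippet in double quotes and build an esearch URL (step 40)."""
    params = {"db": "pubmed", "term": f'"{snippet}"', "retmax": 1}
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

Fetching the URL (for example with `urllib.request.urlopen`) would return a list of matching IDs; an empty list would be the cue for the polite message box of step 100.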

I'd be interested to know why something like this couldn't be done.
-- shudderprose, Jan 14 2010

A-PDF http://www.a-pdf.com/rename/index.htm
Rename files based on content [shudderprose, Jan 15 2010]

Papers http://mekentosj.com/
[shudderprose, Jul 22 2010]

I would use this. +
-- csea, Jan 14 2010


I worked on a system like this in the 90s. It provided full-text searching and a front-end for what would otherwise be a confusing mass of electronic documents.

Are you sure that nothing like this exists?
-- Aristotle, Jan 14 2010


It's true about the crazy filenames. I've always thought that was really weird.

//There are reference managers that do similar things, but as far as I know, only for manually entered citations and not for complete articles.

EndNote allows you to create libraries with all the metadata you described, plus hyperlinks to the PubMed page, where (a few more clicks and) you'll have the paper. So currently, I "store" papers there. But it's pretty useless if you're not on a university network.
-- leinypoo13, Jan 14 2010


Definitely useful. I hate trying to find PDFs. As soon as I invent a naming scheme, I find another file which won't fit into it for some reason or other.
-- RayfordSteele, Jan 14 2010


What's the point of keeping (and searching for) a copy of the journal article on your own hard drive if you can easily find it on PubMed?
-- Jim Bob of Merriam Park, Jan 14 2010


10 I'm certain this is feasible

20 The only question is, would you have to code it yourself, or could you buy it off the shelf.

30 There's a product called Devonthink.

40 It has a fairly good reputation ...

50 ... despite the fact that its website makes rather extravagant-sounding claims ...

60 ... which sound like they might include what you're asking for.

70 You're not a BASIC programmer by any chance?
-- mouseposture, Jan 14 2010


BASIC is all I've ever had any exposure to, and not much of it, a long time ago, so I doubt I'll be doing any coding! Will have a look at Devonthink this weekend though. Thanks.

[Jim], I don't want to have to access the articles online if I already have them on disk, but do want to be able to archive them in an intelligent way with a relevant search capability. Besides, many of the articles I do have are no longer available to me online as I no longer have a university login!

[RayfordSteele], A-PDF has a product that might interest you. Windows only though. [link]
-- shudderprose, Jan 15 2010


Try this: go to a PubMed article abstract; then, in the upper left corner just below the logo, click the arrow marked Display Settings. Choose 'XML'.

You can get that programmatically, too. I recently wrote a Word macro that allows me to key in a PMID, and with one keystroke get a fully formatted citation typed into my document. I get the data from PubMed's XML metadata.
-- lurch, Jan 16 2010
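
A rough Python sketch of pulling citation fields out of PubMed's XML might look like the following. It is not lurch's Word macro, just an illustration of the same idea. The function names `parse_citation` and `format_citation` are hypothetical, and `SAMPLE_XML` is an invented placeholder record that only imitates the element layout of a PubMed record (it is not a real article). A real record for a given PMID can be requested from the E-utilities `efetch` endpoint with `db=pubmed`, `id=<PMID>` and `retmode=xml`.

```python
import xml.etree.ElementTree as ET

# Invented placeholder record; only the element layout imitates PubMed XML.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000000</PMID>
      <Article>
        <Journal>
          <JournalIssue>
            <Volume>1</Volume>
            <Issue>2</Issue>
            <PubDate><Year>2010</Year></PubDate>
          </JournalIssue>
          <Title>Example Journal</Title>
        </Journal>
        <ArticleTitle>An example title.</ArticleTitle>
        <Pagination><MedlinePgn>10-20</MedlinePgn></Pagination>
        <AuthorList>
          <Author><LastName>Doe</LastName><Initials>J</Initials></Author>
          <Author><LastName>Roe</LastName><Initials>RA</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_citation(xml_text):
    """Extract the basic citation fields from the first article in the XML."""
    root = ET.fromstring(xml_text)
    article = root.find(".//MedlineCitation/Article")
    authors = [
        f"{a.findtext('LastName', '')} {a.findtext('Initials', '')}".strip()
        for a in article.findall("AuthorList/Author")
    ]
    return {
        "pmid": root.findtext(".//MedlineCitation/PMID", ""),
        "authors": authors,
        "title": article.findtext("ArticleTitle", ""),
        "journal": article.findtext("Journal/Title", ""),
        "year": article.findtext("Journal/JournalIssue/PubDate/Year", ""),
        "volume": article.findtext("Journal/JournalIssue/Volume", ""),
        "issue": article.findtext("Journal/JournalIssue/Issue", ""),
        "pages": article.findtext("Pagination/MedlinePgn", ""),
    }


def format_citation(c):
    """Format the parsed fields as one line (an arbitrary style for this sketch)."""
    return (
        f"{', '.join(c['authors'])}. {c['title']} {c['journal']}. "
        f"{c['year']};{c['volume']}({c['issue']}):{c['pages']}."
    )
```

Calling `format_citation(parse_citation(SAMPLE_XML))` gives a single citation string. Real records are messier (titles can contain nested markup, and some date or author fields may be missing), so anything beyond a sketch would need more defensive handling.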


This is baked quite well, for those interested. [link] attached.

Different to the idea only in that each paper needs to be 'matched' individually, instead of being done automatically and recursively.
-- shudderprose, Jul 22 2010


Dang! That looks good. Think I'm going to try it. Thank you.
-- mouseposture, Jul 23 2010


Don't know about iDocs, but I'm just liking the hell out of the very idea of PubMed --- it's like "Club Med" except you don't have to go anywhere but the Pub! Hot damn, I'm feeling better already... A round of Buns for everyone! [+]
-- Grogster, Jul 23 2010


