From: Patent Tactics, George Brock-Nannestad
I wholeheartedly agree with such a project, and given time I shall also have
input available. I would much prefer a website run by professionals, such as
ARSC or AES, rather than archive.org, which has to cater for all sorts
and therefore knows very little about specialties. Also, I am not clever
enough to search the archive.
If we look at scans, there are various levels of user friendliness that one
The first is individual scan images - I have come across TIFF files, which
are huge and take a long time to download. They have been created by people
with the best of intentions and so have a very high resolution. I find that
150 dpi grayscale is adequate for most documents - the raw file size decreases
to 1/16 by going from 600 dpi to 150 dpi. Content and compression will decide
what further reduction e.g. PNG may provide. JPG is 24 bit, alas, not 4
The second is a pdf file of the complete document. This is the most practical
for ordinary use.
The third is a thorough and searchable index to the file or files.
The fourth is a pdf with OCR performed to make it searchable. This is utter
luxury, which is what is provided by archive.org. But in my view Microsoft or
Google scans sacrifice image quality to make the file searchable.
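The 1/16 figure given under the first level can be checked with a little arithmetic: uncompressed size scales with the square of the resolution, so a fourfold drop in dpi gives a sixteenfold drop in raw size. What follows is only a minimal sketch in Python. The helper name raw_scan_bytes, the US Letter page size and the 8-bit grayscale depth are assumptions made for illustration, not properties of any particular scanner or archive.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel=8):
    """Uncompressed size in bytes of one scanned page (hypothetical helper)."""
    width_px = round(width_in * dpi)    # pixels across the page
    height_px = round(height_in * dpi)  # pixels down the page
    return width_px * height_px * bits_per_pixel // 8


# Assumed page: US Letter, 8.5 x 11 inches, 8-bit grayscale.
high = raw_scan_bytes(8.5, 11, 600)
low = raw_scan_bytes(8.5, 11, 150)
ratio = high / low

print(f"600 dpi: {high} bytes, 150 dpi: {low} bytes, ratio: {ratio}")
```

Under these assumptions a single page is roughly 33.7 MB raw at 600 dpi and roughly 2.1 MB at 150 dpi, before any compression is applied.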
I have stated again and again that when I come across a document, a journal,
or even a book that I download, I prefer to leaf through it page by page, so
as not to lose anything, and I do not rely on indexes. For this reason I would
hate anybody wasting time on doing OCR or compiling indexes. We who use the
uploads (researchers) want the beef, not the salad.
Going back to the work to be done to prepare and upload files: The only thing
that is a fundamental requirement is that the document shall be identifiable,
either because it is self-evident or because the uploader has identified it.
A scan of an unsigned carbon copy is worthless outside its context. I have
such copies: the figure 191 was preprinted on the letterhead, so the typist
only had to write 4 to make 1914. The letter was personal, so there was no
typed name with the original signature. I know who the correspondents were,
but it is only the context that has given me this information.
So, my conclusion is: aim for the first or second level.
If ARSC considers that it might bolster a committee to include my name I
would be happy to contribute.
P.S. Hate to spoil the fun: rights! Possibly that will be the greatest
hurdle, not OCR.
> I have many documents that are good source material for researchers.
> I'd like to get some of these onto a site, preferably ARSC-hosted,
> I have strong feelings that these and others that others may have should
> be available to all researchers.
> The site should be vetted by an ARSC member or committee to avoid
> perpetuation of junk data.
> It should be available past the lifetime of those contributing documents.
> The recent HRS-IRCC questions can be answered from the thick files I have
> both organizations, but I see no reason to send them around and be
> on others' interpretations of the data when it is feasible to let all see
> the same info and cross-check. This holds true for a lot of other stuff
> This is a project I feel is of great importance. How about it, ARSC?
> Steve Smolian