Thread: Printed Document Availability

  1. #1

    Printed Document Availability

    I mentioned this documentation registering effort on the ccTalk mailing list and got the following response (amongst others):

    Hi,

    >> I finally have a scanning system setup here for archiving documents.
    >
    > On a tangentially related note, we've just started an effort at the VC
    > Forum to track scanned documents.

    Ahh, but why limit yourself to just scanned documentation? In terms of systems
    preservation, typically I imagine that the documentation that *doesn't* get
    scanned is the more useful as it's for the rarer machines and harder to come
    by. Problem is there's less incentive to scan documentation for machines with a
    low production volume, and for more complex or specialist machines the
    documentation can be huge (I catalogued all the Torch stuff I have and there's
    well over 13,000 pages - no way I'm scanning that!)

    Collectors might not be willing to scan in everything they own - but they might
    be willing to make it known that they have paper copies of xyz and so therefore
    could look up information on somebody's behalf if needs be. Could be invaluable
    for bringing less common machines back to life again.

    This won't work for those of us who constantly trade machines back and forth
    (and there's nothing wrong with that!) but I imagine lots of us have
    collections that only ever get added to, or have machines that won't (likely)
    ever be traded or sold on.

    Experience has been that the classiccmp list - whilst invaluable - isn't always
    the best source of information, plus posts with questions get missed etc.

    > Think of it as an index to available online documents of interest to
    > vintage computer collectors.

    just drop the 'online' bit

    What data do you actually store? I ended up with the following for my stuff:

    Related manufacturer
    Page format / type
    Issue / date
    Author
    Notes
    Location
    Quantity
    Source
    Part number
    Size (pages, approx)

    Most of those are optional. 'Location' is just something I use to tell me where
    things are when they're stored in a binder or whatever - for a system used by
    several people it could be dropped (or kept private from other users). 'Source'
    tells me where I got xyz from and when - I've found that to be useful to know
    in the past. Again, could be private data. 'Size' is handy to know for when
    somebody asks whether you could scan something - gives a good idea of effort
    involved!

    For a system shared between users I'd probably add a 'Related machine' column
    too, and it'd of course need an 'online location' field and some sort of user
    contact details too. Some of those fields would be common to multiple entries
    for the same document, others on a per-item basis. ('date entry added' might be
    nice too)
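
    As a rough illustration of the field list above, here is a minimal Python sketch of how the fields shared by every copy of a document might be separated from the per-item fields, with 'Location' and 'Source' kept private. The class names, the field names, the `title` field (the list above does not name one) and the `public_view` helper are all assumptions made for illustration, not part of any existing system.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass
class Document:
    """Fields common to every copy of the same document (all optional but title)."""
    title: str                              # assumed field, not in the original list
    manufacturer: Optional[str] = None      # 'Related manufacturer'
    related_machine: Optional[str] = None   # suggested for a shared system
    page_format: Optional[str] = None       # 'Page format / type'
    issue_date: Optional[str] = None        # 'Issue / date', kept as free text
    author: Optional[str] = None
    part_number: Optional[str] = None
    approx_pages: Optional[int] = None      # 'Size (pages, approx)'
    online_location: Optional[str] = None   # empty if never scanned


@dataclass
class Holding:
    """One collector's copy (or copies) of a document."""
    document: Document
    owner_contact: str
    quantity: int = 1
    notes: Optional[str] = None
    date_added: Optional[date] = None
    location: Optional[str] = None          # private: where the owner stores it
    source: Optional[str] = None            # private: where and when it was obtained


# Fields the owner may not want other users to see.
PRIVATE_FIELDS = {"location", "source"}


def public_view(holding: Holding) -> dict:
    """Return the holding as a plain dict with the private fields removed."""
    record = asdict(holding)
    return {key: value for key, value in record.items() if key not in PRIVATE_FIELDS}


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    manual = Document(title="Example service manual", approx_pages=120)
    copy = Holding(document=manual, owner_contact="collector@example.invalid",
                   location="binder 3", source="bought at a sale")
    print(public_view(copy))
```

    Keeping the approximate page count on the shared record means anyone browsing the index can judge the scanning effort before asking the owner.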

    I only thought of this about a month ago and have been too busy to make a start
    on it other than run a few ideas by Tony (from the classiccmp list). Initial
    thought was to use something like Hypersonic as the database; the software
    footprint is only a few hundred KB, plus it's Java so portability is less of a
    problem as is interfacing to some sort of web-based system.

    One step at a time and all that, but of course it doesn't end with
    documentation, but could also be extended to systems, software, ROM images and
    the like (a lot of ROMs must be close to failing in classic machines these days
    and not many people make an effort to archive those!)

    Put these thoughts on your site if you think it makes sense; I'm happy to
    bounce ideas around with people.

    Getting people to actually submit data is of course the hard part. I
    imagine those with rarities are the ones who'll be interested in this, and
    they're precisely the people who need to be attracted to an effort like this.

    > It's just in its infancy, but I think it's a great idea

    Same here. I just think limiting things to online data doesn't help the
    preservation movement as much as it could - but it does help those with
    more-common machines who want to get a bit more out of them.

    cheers

    Jules

    I think that this is an excellent extension of the idea, so this new forum has been created for folks to post available documents that they haven't yet scanned, or may never scan. Please only post documents that you'd be willing to either copy or lend out for copying to assist another in need.

    Erik

  2. #2

    Shhhhh! This is a library!

    I, too, have grappled with the problem of cataloging and preserving technical documentation.

    I currently have a collection estimated at 11,000 documents and 125,000 pages. Scan it? Sure....

    Cataloging it sounds like a more realistic approach, and I think I've found a program to do it. A company in Great Britain puts out a package called LexFile, which stores data in MARC (MAchine-Readable Cataloging) format, the format used by 95% of libraries in this country as well as the Library of Congress.

    There are a lot of programs that will handle the MARC data, but for me it can't be a Windows program, and LexFile is DOS-based, so it will run under my OS/2 network with no problem. The fact that the DOS version is also free was just a bonus. I would gladly have paid for it after reviewing it.

    It should also be noted that the MARC format accommodates widely varying data, not just books. It has categories for physical items (such as the EPROMs someone else mentioned) and intellectual property (such as source code, regardless of the media format).

    If you haven't worked in a library (I did, in school) it may be a bit confusing, but I would be happy to answer any questions I can.

    The most important thing to stress is consistency and standards. It might be in the best interest of a few like-minded professionals to found an organization dedicated to the task of preserving this important history. My personal goal is to have my catalog on the internet, so that other people can google a particular model and see that I have the book they want. Imagine if a few of us who have larger collections could get together and set down standards for cataloging...

    Standard abbreviations for manufacturers, product lines, OEMs, etc.

    Standard Media Type Classifications: (Paper hardbound, paper softbound, Microfiche, etc.)

    Standard Distribution Types: (Sales Brochures, Service Documents, Programming Manuals, Users Guides, etc.)
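
    To make the idea of shared standards concrete, here is a small Python sketch of controlled vocabularies for media and distribution types, plus a lookup that maps free-text manufacturer names to one agreed abbreviation. The enum members come from the examples above; the alias table, its entries and the `normalize_manufacturer` helper are hypothetical and would have to be agreed on by whoever sets the standard.

```python
from enum import Enum


class MediaType(Enum):
    PAPER_HARDBOUND = "paper hardbound"
    PAPER_SOFTBOUND = "paper softbound"
    MICROFICHE = "microfiche"


class DistributionType(Enum):
    SALES_BROCHURE = "sales brochure"
    SERVICE_DOCUMENT = "service document"
    PROGRAMMING_MANUAL = "programming manual"
    USERS_GUIDE = "users guide"


# Hypothetical alias table: lower-case spelling variants -> agreed abbreviation.
MANUFACTURER_ALIASES = {
    "ibm": "IBM",
    "international business machines": "IBM",
    "dec": "DEC",
    "digital equipment corporation": "DEC",
}


def normalize_manufacturer(name: str) -> str:
    """Map a free-text manufacturer name to its agreed abbreviation.

    Case and repeated whitespace are ignored. Names that are not in the
    table are returned trimmed but otherwise unchanged, so nothing is lost.
    """
    key = " ".join(name.lower().split())
    return MANUFACTURER_ALIASES.get(key, name.strip())


if __name__ == "__main__":
    print(normalize_manufacturer("  Digital   Equipment Corporation "))
    print(MediaType("microfiche"))
```

    Passing unknown names through unchanged, rather than rejecting them, lets entries for obscure manufacturers be catalogued now and tidied up once an abbreviation is agreed.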

  3. #3

    I'd rather set up a doc mgmt system and scan a set amount on a schedule, say one manual a week/day/whatever. Something based on MS SQL Server/MSDE would work great, or MySQL, the free Sybase SQL servers, etc.

    It wouldn't be too hard to get something done, even using Access to create the DB in MSDE.

    The main problem is that the originals are aging and getting worse.

    Unfortunately, when they're gone, they're gone.


    Tony

  4. #4

    I'm not just interested in scanning, but doing OCR as well. Having the text of the things I scan searchable is important.

    Has anybody looked into the current state of the art for OCR packages? I'm sure that Adobe has something good, but I generally can't justify spending their kind of money for a hobby project like this.

  5. #5

    I don't associate Adobe with OCR software. More likely PaperPort, or whoever OmniPage comes from. I've tried some OEM versions that come bundled with scanners. They are generally good if the source is readable and mostly text, but as always some post-processing is needed, particularly if the documentation contains tables, illustrations and other pictures. Once the document is finished, you may want to save it as PDF, since it is the least proprietary of the proprietary formats that maintain layout and images. Something HTML-ish might work too, but would be more fiddly to download.
    Anders Carlsson

  6. #6

    Originally posted by mbbrutman:
    > I'm not just interested in scanning, but doing OCR as well. Having the text of the things I scan searchable is important.
    >
    > Has anybody looked into the current state of the art for OCR packages? I'm sure that Adobe has something good, but I generally can't justify spending their kind of money for a hobby project like this.

    Acrobat files are searchable. I scanned in a PC-MOS Troubleshooting Guide, and when I clicked search, it asked if I wanted it to build the database. It did (this scans all the pages, OCRs them and adds a database of words), and that was it.

    OCR alone wouldn't work for me, as a lot of the docs I have also have images, etc. Do OCR programs add them in, or just import text only?

    When I say images, I mean important stuff, like layouts, interconnections, system diagrams, etc.


    Tony

  7. #7

    Which version of Acrobat includes the OCR feature?

  8. #8

    I'm using the Acrobat Standard 7 that came with my Fujitsu 5110EOX2 scanner from work. When I scanned in, it was just images. When I tried searching, it said it needed to OCR it (or something like that). As each page was processed, you could see its progress messages: 'skewing page', scanning for letters, scanning for words, running OCR service, etc.


    Tony

  9. #9

    I actually purchased Acrobat Distiller and paid around $900.00 for a 10,000 page license. I've never gotten around to starting the project. Well, for starters, I need to find a decent scanner.

    I agree that the quality and availability of source documents is declining, so I suppose time is of the essence. But I still think there should be some kind of standards in place for doing it.
    Windows: worst operating system in the world, almost two decades running!

  10. #10

    My recommendation is to coordinate with the folks from bitsavers.org since they have already plowed through all these issues and have established standards on how to scan documentation, etc.

    Al Kossow is on this forum some place and maybe he can chime in. The people at bitsavers.org have an excellent system in place and I would make any solution for the problem consistent with what they have already done.

    Thanks!

    Andrew Lynch
