- Accuracy – the measurement of accuracy for all stages of the document conversion process: Accuracy Percentage = (Total Opportunities − Missed/Incorrect Opportunities) ÷ Total Opportunities × 100. Industry standards include: Data capture – 95%, Scanning – 100%, Image-quality match of original – 99%.
- Automated Indexing – another term for computerized indexing, which doesn’t require human decision-making or data entry. Automated indexing software populates index fields by reading information from bar codes or scanning digital documents that have undergone OCR conversion.
- Backfile Conversion – a process in which a large backlog of documents is scanned, indexed and stored on an imaging system. Benefit: it eliminates the need for costly storage space, reduces filing and re-filing time, and reduces filing errors or misplacements.
- Barcode – a series of machine-readable lines of varying widths that contain data. Barcodes can be used to facilitate automated indexing. For example, if standard business forms such as invoices are preprinted with a barcode to indicate that the form is an invoice, an indexing system can automatically populate the “document type” field after the paper form is scanned and the barcode is read. Barcodes also remain intact after fax transmission.
- Batch Processing – a technique by which items to be processed are collected into groups prior to processing.
- Boolean Search – a common search strategy that selects information by combining search terms with the operators AND, OR, and NOT.
- CCITT Groups III & IV – raster image compression formats designed for fax transmission but also used in other image processing systems.
- CD-ROM – stands for Compact Disc-Read Only Memory, an optical disc capable of storing large amounts of data. The most common size is 650 MB, so a single CD has the storage capacity of roughly 450 high-density (1.44 MB) floppy disks. Benefit: the contents of an entire file cabinet can be stored on a CD.
- Character Recognition – the ability of a machine to read legible text.
- Coding – see Indexing.
- Compression – a process that “shrinks” an image so that it occupies less storage space and can be transmitted faster and more easily.
- Contextual Search – locating documents stored in a system by searching for text that appears in them, rather than by searching for them by file name or other indexing technique.
- Database – a collection of information organized in such a way that a computer program can quickly select desired pieces of data. Data often comes in the form of fields, records or files.
- Data Dictionary – an organized or compiled collection of information about data or metadata. A data dictionary is an automatic component of most database management systems.
- Data Element – a unit of data that is considered to be indivisible, and is a building block for all data processing systems. Examples include document type, creation date, disposition date, Social Security Number, etc.
- Day Forward – the process of scanning, indexing and storing documents on an imaging system as they are produced or received in the normal course of business.
- De-skewing – the adjustment made to an image to compensate for physical distortions inherent in the system or for alignment errors introduced during scanning. (See Deskew.)
- Decompression – the process of reversing the procedure conducted by compression software or hardware, thereby returning compressed data to its original size and condition.
- Deskew – to straighten out a misaligned image, which improves OCR accuracy and reduces image file size.
- Digital Audio Tape (DAT) – a technology that records noise-free digital data on magnetic tape. Generally used for audio, a DAT cassette can store up to 2 gigabytes of information.
- Digital Linear Tape (DLT) – a technology used for backing up huge amounts of data. A DLT can store up to 35 or 70 gigabytes of information (35 without compression and 70 with compression).
- Directory Structure – hierarchical file management used by operating systems consisting of directories and sub-directories that branch away from the main or root directory.
- Document – 1. Any format that contains information. Documents may be word-processing files, e-mail messages, spreadsheets, database tables, voice mail or other audio recordings, faxes, business forms, images, information captured from the Internet, and so forth. Documents are sometimes called “records.” 2. According to ANSI/AIIM TR40-1995, a collection of zero or more pages that are related, linked, or bound to each other in some way appropriate to the application. In an electronic image management system, the provision of a zero-page document allows the creation of a document entity prior to capturing and linking its page(s).
- Document Classes – types of documents or “document types” that require similar indexing fields. Examples of document classes include invoices, contracts, timesheets, e-mail messages, etc.
- Document Lifecycle – the period of time that includes creation, maintenance, use, and ultimate disposition or destruction of a document. The records manager needs to know the lifecycle of every document in the organization.
- Document Imaging Process – a designed solution for digitizing paper documents in organizations that handle large amounts of information; an automation of the paper process, which is a valuable tool for finding and organizing knowledge.
- Electronic Data Management (EDM) – application of technology to save paper, speed up communications, and increase the productivity of business processes.
- Electronic Image Management (EIM) – a system that organizes information in all formats for use throughout its lifecycle.
- Field – a single piece of information within a database.
- File – a collection of records within a database.
- File Transfer Protocol (FTP) – the Internet protocol that permits the transfer of files between one system and another.
- Flat File Database – a database used to manage a simple collection of information or a database with only one table.
- Flatbed Scanner – a scanning device with a flat surface on which input material is placed, generally used for bound, delicate or small material.
- Full Text Retrieval – a type of retrieval process that uses an inverted index to retrieve every document that contains the word or words in the search parameter. This type of searching requires a capable search engine and can be slower than retrieval processes based on indexing values. It can also be less accurate because it is not based on standardized search terms. For example, a search that retrieves all documents containing the word “invoice” will miss those that are designated as “bill” or “voucher.” However, full-text retrieval systems initially are more economical to implement because indexing costs are eliminated. Full-text retrieval is sometimes called “free text searching” or, loosely, “fuzzy searching,” in contrast with keyword retrieval.
- Hypertext Markup Language (HTML) – the authoring language used to create documents on the World Wide Web. HTML is derived from SGML (Standard Generalized Markup Language), although it is not a strict subset.
- Intelligent Character Recognition (ICR) – a form of OCR (optical character recognition) that uses sophisticated lexical tools. ICR is typically used to convert handwritten material to ASCII text.
- Indexing – 1. This is the process of identifying various pieces of information in a document, such as author, document type, creation date, etc., and then transferring that information into a database for search and retrieval. It’s also called “coding” in the legal profession. 2. It describes the process of analyzing the information content of recorded knowledge and expressing this information content in the language of the indexing system (NFAIS Indexing in Perspective Education Kit). 3. It is also the representation of the results of the analysis of a document by means of a controlled or natural language system.
- Inverted Index – a computer file that records, for each word, the documents that contain it. It can be pictured as a table in which rows represent documents and columns represent words, with intersections marked when certain documents contain certain words. At the point of retrieval, the computer looks up the words in the search query in the inverted index instead of reading through the documents themselves.
- Image – the digitized representation of a picture, graphic or document.
- Image Resolution – the fineness or coarseness of an image after being digitized, which is measured in dots per inch or DPI.
- Keyword – a word associated with a document or document image to aid in its retrieval from digital storage.
- Keyword Retrieval – a type of retrieval process that searches an index with fields to locate documents that contain information related to the search parameter. Keyword retrieval, in contrast with full-text retrieval, requires indexing of documents but provides extremely accurate retrieval as long as the indexing is accurate. To help ensure accurate indexing, data elements and indexing values should be carefully designed to match the retrieval needs of the document users. Quality control is an essential part of the indexing process.
- Mark Sense Code – a method of automatic indexing in which the person responding to a questionnaire or form does so by filling in circles or other spaces. A scanner passes over the marks and reads them automatically, digitizing the responses.
- Metadata – data about data, which is information that is required to define the characteristics of and relationships between information contained within databases (field names, length of field, type of data, etc.). Sometimes metadata is called “higher level information” or “processing information.”
- Multi-Occurring Field – a field that can have more than one entry for a given document.
- Optical Character Recognition (OCR) – the process of electronically reading digital images and converting them to text. After OCR conversion, a document is “live,” or editable. For instance, users can edit OCR documents on the computer as if they were word processing documents that they created.
- OCR Repair – manual examination and correction of OCR conversion. Some OCR software is capable of flagging documents that it couldn’t convert, requiring examination and correction only of the flagged documents, rather than all of them.
- Orientation – the relative direction of a display or printed page, either vertical (portrait) or horizontal (landscape).
- Page – the equivalent of one side of a 2-dimensional sheet of paper, microfilm, transparency, etc. In the case of input media other than paper, a page is the data in a single image frame (ANSI/AIIM TR40-1995).
- Portable Document Format (PDF) – a file format developed by Adobe Systems. A PDF captures formatting information from a variety of desktop publishing applications, making it possible to send formatted documents and have them appear on the recipient’s monitor or printer as they were intended. Software for viewing PDFs, some of which is free (such as Adobe’s Acrobat Reader), is available on the Internet.
- Record – one complete set of fields within a database.
- Relational Database – a set of tables containing data organized into predefined categories. Each table contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns.
- Scanner – a device that optically senses a readable image, such as ink on paper, and contains software to convert the image to machine-readable code.
- Skew – the misalignment of a document image relative to its proper orientation.
- Tagged Image File Format (TIFF) – a raster (bitmap) file format for describing and storing color and grayscale images.
- Validated Fields – fields in which every entry must be chosen from a list of possible values. Restricting entries to the values on a selected list makes information retrieval easier.
- Verification – the process wherein data entry is performed twice by one operator, or once each by two operators, and the computer verifies that the same data were entered each time. If there is a discrepancy in the data, the computer prompts the operator to enter the data a third time.
- Workflow – the amount and flow of work to and from an employee, department, or office. Workflow efficiency is greatly improved by imaging systems that electronically transfer documents from person to person, as opposed to a paper file folder traveling from inbox to inbox. Imaging systems can also transfer electronic documents that are part of the same file to different people and then reassemble the information later.
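
The Boolean Search, Full Text Retrieval, and Inverted Index entries above can be illustrated with a small sketch. It is a minimal example rather than the design of any particular imaging product: the function names `build_inverted_index` and `boolean_search`, the sample documents, and the simple whitespace tokenizer are all assumptions made for illustration.

```python
from collections import defaultdict

PUNCTUATION = '.,;:!?"()'  # assumed minimal set of characters to trim from words


def build_inverted_index(documents):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            word = token.strip(PUNCTUATION)
            if word:
                index[word].add(doc_id)
    return index


def boolean_search(index, all_ids, must=(), any_of=(), must_not=()):
    """Combine word lookups with AND (must), OR (any_of) and NOT (must_not)."""
    result = set(all_ids)
    for word in must:  # AND: keep only documents containing every word
        result &= index.get(word.lower(), set())
    if any_of:  # OR: keep documents containing at least one of the words
        matches = set()
        for word in any_of:
            matches |= index.get(word.lower(), set())
        result &= matches
    for word in must_not:  # NOT: drop documents containing any of the words
        result -= index.get(word.lower(), set())
    return result


# Hypothetical sample documents, keyed by document ID.
documents = {
    "doc1": "Invoice for scanning services",
    "doc2": "Bill for indexing services",
    "doc3": "Invoice and contract for indexing",
}
index = build_inverted_index(documents)

print(sorted(boolean_search(index, documents, must=["invoice", "indexing"])))
print(sorted(boolean_search(index, documents, must=["invoice"], must_not=["contract"])))
print(sorted(boolean_search(index, documents, any_of=["invoice", "bill"])))
```

A search for “invoice” alone returns doc1 and doc3 but not doc2, which uses the word “bill.” This is the limitation the Full Text Retrieval entry describes, and it is why keyword retrieval relies on standardized indexing values.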