Information literacy ...
begins with literacy-enabled information
Is my document well-behaved? Download the checklist here.
Well-behaved documents contain meaningful metadata for automatic classification!
Metadata is data about data: it helps to describe the content or characteristics of an electronic or physical object. The author of a scientific paper, the publisher of a book, the copyright of a photograph, the scene of a painting, the theme of a song and the architect of a pyramid built in 2500 BC are all valid metadata, or attributes, of the item concerned. They serve to classify it in your private search engine or to share it with colleagues or even with the world at large.
This latter point deserves special attention, because different users may require different pieces of data. Art critics, veterinarians, physicists and lawyers use different jargon, yet all regularly download content from other domains to integrate into their own knowledge base, preferably without manual intervention and based solely on embedded metadata such as title, author, subject, description and keywords, provided the documents are well-behaved.
There are dozens of metadata standards belonging to dozens of fraternities, each with its own vocabulary, but none covers all the needs of all users. A student may need data he can use to generate citations, whereas an engineer would search his collection of documents by technical criteria.
The digi-libris pragmatic solution to this dilemma:
An individually extensible and universally applicable metadata set.
It builds on the widely used Dublin Core standard (minus refinements) and adds an unlimited number of customizable attribute/value pairs for the data. Consider it an alternative Dublin Core application profile (DCAP) for individuals who may or may not have to rely on a single standard issued by an institution. One does not exclude the other! For the exchange of metadata with third parties it relies on Adobe’s widely accepted XMP technology. This is the format already implemented in PDF documents, and it includes a placeholder for arbitrary or custom variables (pdfx).
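The idea of a fixed core plus free extensions can be sketched as a small data structure. This is only an illustration, not the digi-libris implementation: the class name MetadataSet, its methods and the sample attribute ReactorType are hypothetical. The fifteen element names are those of the unrefined Dublin Core element set, and the dc: and pdfx: prefixes follow the XMP convention mentioned above.

```python
from dataclasses import dataclass, field

# The 15 elements of the Dublin Core Metadata Element Set (no refinements).
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


@dataclass
class MetadataSet:
    """Hypothetical container: Dublin Core fields plus free custom pairs."""
    core: dict = field(default_factory=dict)
    custom: dict = field(default_factory=dict)

    def set_core(self, element, value):
        # Only the fixed Dublin Core vocabulary is accepted here.
        if element not in DUBLIN_CORE_ELEMENTS:
            raise KeyError(f"{element!r} is not a Dublin Core element")
        self.core[element] = value

    def set_custom(self, attribute, value):
        # Any attribute name is allowed; this is the extensible part.
        self.custom[attribute] = value

    def as_pairs(self):
        """Flatten to (prefixed attribute, value) pairs, core fields first."""
        pairs = [(f"dc:{k}", v) for k, v in self.core.items()]
        pairs += [(f"pdfx:{k}", v) for k, v in self.custom.items()]
        return pairs


record = MetadataSet()
record.set_core("title", "Well-behaved documents")
record.set_core("creator", "A. Author")
record.set_custom("ReactorType", "PWR")  # a made-up engineering attribute
```

A student would fill mostly the core fields needed for citations, while an engineer would add custom pairs for his technical search criteria; both end up with the same kind of record.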
dMeta stands for Data plus Metadata Zip-package. dMeta packages are files with the ending *.dMeta, *.dpmz or even *.zip.
They are simply assemblies of electronic files of any kind tied firmly together with their corresponding metadata. Such a package may contain multiple pairs of data and metadata sets and even single metadata files containing virtual items or links to an external resource.
The concept aims at
In digi-libris you can add/edit metadata (citation relevant, search relevant, abstracts etc.) to any document in any format (downloaded material, your own work in progress, machine generated datasets, presentations and even virtual items such as printed papers, remote links or YouTube videos etc.) and then create an XMP-sidecar file containing all this metadata.
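What such an XMP sidecar file might contain can be sketched with the Python standard library. This is an assumption-laden illustration, not the output of digi-libris: the function name build_sidecar and the sample values are hypothetical, and the optional xpacket wrapper is omitted. The namespace URIs and the Alt/Seq/Bag containers for title, creator and subject follow the published XMP conventions.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdfx": "http://ns.adobe.com/pdfx/1.3/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Make the serializer emit the customary prefixes instead of ns0, ns1, ...
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Return the fully qualified tag name for a prefixed name."""
    return f"{{{NS[prefix]}}}{name}"


def build_sidecar(title, creators, keywords, custom):
    """Build a minimal XMP document as a string (hypothetical helper).

    Keys of `custom` must be valid XML element names.
    """
    root = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(root, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative with an x-default entry.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered list of names.
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    for name in creators:
        ET.SubElement(seq, q("rdf", "li")).text = name

    # dc:subject holds the keywords as an unordered bag.
    bag = ET.SubElement(ET.SubElement(desc, q("dc", "subject")), q("rdf", "Bag"))
    for word in keywords:
        ET.SubElement(bag, q("rdf", "li")).text = word

    # Custom attribute/value pairs go into the pdfx namespace as plain text.
    for key, value in custom.items():
        ET.SubElement(desc, q("pdfx", key)).text = value

    return ET.tostring(root, encoding="unicode")


xmp = build_sidecar("My thesis", ["A. Author"], ["metadata", "XMP"],
                    {"Supervisor": "B. Boss"})
```

Writing the returned string to a file named like the document but ending in .xmp would give a sidecar in the sense described above.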
You can then bundle and distribute one or more items together with their corresponding XMP files in a dMeta zip file. These files are akin to EPUB files in that they are simply zip files with a known structure. There are normally two files with exactly the same name but with different extensions, one of which must be .XMP.
digi-libris will automatically associate the two as long as they are in the same directory or folder, even unzipped.
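Bundling a document with its sidecar can be sketched in a few lines. The only facts taken from the text are that the package is an ordinary zip file and that the data file and its XMP file share the same base name. The function name make_dmeta and the flat layout inside the archive are assumptions; the actual internal structure of a dMeta package is not specified here.

```python
import zipfile
from pathlib import Path


def make_dmeta(package_path, items):
    """Zip each data file together with its same-named .xmp sidecar.

    Hypothetical helper: stores all members flat at the archive root,
    which is an assumption about the package layout.
    """
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for item in map(Path, items):
            sidecar = item.with_suffix(".xmp")
            if not sidecar.exists():
                raise FileNotFoundError(f"no sidecar found for {item.name}")
            zf.write(item, arcname=item.name)
            zf.write(sidecar, arcname=sidecar.name)
    return Path(package_path)


# Usage, assuming thesis.pdf and thesis.xmp sit in the current folder:
# make_dmeta("thesis.dMeta", ["thesis.pdf"])
```

Because the result is a plain zip file, the recipient can also unpack it with any archive tool and keep the two files side by side in one folder.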
Stand-alone XMP files without an associated electronic data file are treated as metadata of virtual items, which are displayed in the same way as real objects. Double-clicking on such an entry will activate the link to the underlying resource, if available.
Any software that can interpret dMeta files will automatically associate the XMP metadata with the right file for automatic classification in the knowledge base and display of the metadata.
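The association rule described above (same folder, same name, one file ending in .XMP, and a lone XMP file standing for a virtual item) can be sketched as follows. The function name associate and the three-way result are hypothetical; how digi-libris itself performs the matching is not documented here.

```python
from pathlib import PurePosixPath


def associate(filenames):
    """Split names into (data, sidecar) pairs, virtual items and plain files.

    Names are matched on folder plus base name; the .xmp extension is
    compared case-insensitively. Hypothetical helper for illustration.
    """
    sidecars = {}
    data = {}
    for name in filenames:
        path = PurePosixPath(name)
        key = (path.parent, path.stem)
        if path.suffix.lower() == ".xmp":
            sidecars[key] = name
        else:
            data.setdefault(key, []).append(name)

    # Data files that have a sidecar with the same folder and base name.
    paired = [(d, sidecars[key])
              for key, files in data.items() if key in sidecars
              for d in files]
    # Sidecars without a data file describe virtual items.
    virtual = [x for key, x in sidecars.items() if key not in data]
    # Data files without a sidecar carry no external metadata.
    plain = [d for key, files in data.items() if key not in sidecars
             for d in files]
    return paired, virtual, plain


paired, virtual, plain = associate(
    ["paper.pdf", "paper.xmp", "video-link.XMP", "notes.txt"]
)
```

In this example paper.pdf is paired with paper.xmp, video-link.XMP is treated as a virtual item, and notes.txt has no sidecar.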
A well-behaved document is an electronic document that is both user friendly and search friendly
The Open Access idea is gaining momentum, and the sheer amount of scientific, professional and other documents available on the Internet makes keeping an overview a real challenge. To organize the mass of material that accumulates over time and to find information again when it is needed for work, documents must be easy to read and easy to classify with little or no manual intervention. You also need suitable software (such as digi-libris Reader) to automatically index and alphabetically sort all newly added documents and help you keep track of it all (see example).
A well-behaved PDF or ePub document that is user friendly and search friendly offers important advantages for document producers, distributors and end users:
User friendly means a document that is easy to read and easy to navigate on any reading device, and for which reading software is readily available. It is in an open format and does not depend on proprietary (paid) software for display, styles and multimedia content. It must be searchable and have bookmarks (in applications that allow for them, such as PDF files in Acrobat or Adobe Reader), an interactive table of contents, i.e. one with “clickable” links to the correct target page, and possibly an interactive index, cross references and links to external resources. Except for copyrighted material, it should not be password protected or encrypted, but must allow the user to print it out, to copy and paste portions of the text and possibly to add bookmarks and comments of their own.
This applies not only to scientific papers, monographs and manuals but to all documents that one would consult or refer to rather than read in a continuous stream from cover to cover, like novels or literary works.
Search friendly means a document that has useful embedded metadata which librarians, digital asset managers and individuals can exploit to classify the document in their knowledge base with little or no manual intervention.
University and public libraries prefer to keep the metadata of all their documents in separate catalogues or databases for reasons of integrity and maintainability. But one does not exclude the other: embedding the same metadata, or a selection thereof, directly into a digital resource automatically makes this data available to third parties who download or otherwise obtain such resources and who may want to preserve them locally in their own knowledge base or consult them off-line. Notation in attribute/literal pairs is probably adequate for most private or local repositories.
Making documents interactive and embedding metadata does not necessarily require any extra work if properly planned and some simple rules (consistent use of styles) are observed.
An author who has spent a year on a thesis can certainly spend 10 more minutes writing down some keywords and a description; the typesetter, who produces a table of contents anyway, only has to check a single box before exporting to PDF; and the publisher can easily import an XMP file containing metadata into the final document.
Search-friendly scholarly publications
Search-friendliness, or machine-readability, is increasingly important in view of the global influence of digitization and open access on the changing publishing and archiving environment. Most scholarly publications are becoming available on the Internet, which makes their processing and systematic archiving a real challenge. To organize the bulk of this Internet content, scholarly papers should be easily classifiable with little or no manual intervention, which requires properly embedded metadata. Explicit metadata facilitate the work of librarians, digital asset managers and non-expert users because
Dozens of metadata standards are currently available, each linked to its own vocabulary. Unfortunately, none of the standards is universally applicable. A student seeks data to generate citations, while an expert searches his collection of papers by certain technical criteria. Information about book publishers, copyright holders of images or paintings, songwriters, or architects of ancient pyramids is all essential metadata, describing attributes of the items concerned. Metadata are processed to classify items in search engines and to share them with the global community. Different users seek different pieces of data. Art critics, veterinary specialists, physicists, and lawyers download content from interdisciplinary web domains, and they would prefer to do so without manual intervention, relying on embedded metadata.
Universities and public libraries are challenged to upgrade their services and to contribute more actively to scientific research. Although they prefer to integrate and preserve the metadata of all their documents in separate catalogues or databases, I think that one should not exclude the other. Embedding a descriptive selection of that metadata in a digital resource automatically makes it available to users for off-line consulting and referencing, and it saves them time. A notation in attribute/literal pairs is probably adequate for most private or local repositories. A separate sidecar Extensible Metadata Platform (XMP) file can be linked or sent along if direct embedding is impossible (e.g. due to a checksum).
A pragmatic solution
Documents with embedded metadata are gradually increasing in open-access repositories and on publishers’ websites. This is partly due to institutional requirements to provide metadata along with documents. New forms of metadata, such as those on HTML pages pointing to Facebook and Twitter, are constantly developing. Citation-specific variables are currently used in conjunction with Citation Style Language (CSL). And, adding to the jumble, there is a wide range of proprietary namespaces, where each organisation defines metadata specific to different subjects. A document can, therefore, include hundreds of metadata variables, which may or may not be meaningful for users. The solution to this issue should be universal. I suggest an individually extensible and universally applicable metadata set that builds on the widely used Dublin Core standard (minus refinements) plus an unlimited number of customizable attribute/value pairs for the data. Consider it an alternative Dublin Core application profile (DCAP) for individuals who may or may not have to rely on a single standard issued by a parent institution. For the exchange of metadata with third parties it relies on Adobe®’s XMP technology.
Who should provide and embed metadata?
The ultimate responsibility for the inclusion of useful metadata lies with the publisher. However, all other stakeholders of the development [1, 2] and distribution of documents should also contribute by adding metadata to the final versions of their documents because
Adding metadata to PDF files
Adobe®’s XMP technology is well suited for embedding metadata. This is the format implemented in PDF documents. It has placeholders for Dublin Core elements and other standard metadata types, such as DICOM for medical applications and IPTC, which is used by the international press community and professional photographers to secure their copyrights. It also allows proprietary sets to be defined with their own namespaces, as well as an unlimited number of custom attribute/value pairs which can be used to describe anything. To view, edit and export metadata, suitable (free or low-cost) software is required [3]. To embed the metadata in a PDF document, Acrobat® or another PDF tool that can import XMP files is used.
digi-libris Reader is a non-tech personal organization tool for today’s multitasking user who works and lives in a digital world.
This is metadata-centric software for the automatic organization of your own catalogue or searchable collection of things.
Mix documents and data sets of any type, photos, music, videos and web links in a single list, and see vital attributes at a glance.
You can also add physical or virtual items manually.
Authors and scholars use it to share knowledge (see Blog on dMeta) and to generate bibliographies (see blog on Citation Style Language).
It is free and very easy to use: just drag a file or a web link into the main window and it will automatically be classified.
Information literacy begins with literacy-enabled information, which you collect to amass knowledge in order to gain understanding.