Glossary
Browse through a limited glossary of common data jargon.
An A-Z list of some of the terms and jargon that we come across. Browse through our up-to-date list.
Term | Description |
---|---|
‘Open’ Data | Data that is published with few or no restrictions constraining its reuse. Typically shared under an Open Government licence, Creative Commons, or GNU Open licence. |
1-star | Content is available online under an open licence. |
2-star | Content is available online under an open licence, in a structured format (e.g. Excel). |
3-star | Content is available online under an open licence, in an open structured format (e.g. .csv). |
4-star | Content is available online under an open licence, in an open structured format (e.g. .csv), employing URIs. |
5-star | Content is available online under an open licence, in an open structured format (e.g. .csv), employing URIs, and the data is linked to other open data in order to provide context. |
5-star Open Data Standards | Schema for identifying the openness of a published dataset. |
Access | As defined in the Handbook, access is assumed to mean continued, ongoing usability of a digital resource, retaining all qualities of authenticity, accuracy and functionality deemed to be essential for the purposes the digital material was created and/or acquired for. |
Access Conditions | Within an archival description, this provides information about anything that might affect the availability or usability of the materials being described. This will include details of any restrictions on access imposed by the donor or the repository, or any legal restrictions. Access conditions might tell you if you need to contact the repository to make an appointment, or take ID to register as a user. If the access conditions say 'open', there should be no restrictions on accessing the material, but you may wish to contact the repository before your visit to confirm that the material you're interested in is available. |
Access Points | These are names, places and subjects, acting as index terms or keywords for searching and browsing on the Archives Hub. Access Points on the Archives Hub are links which you can click on to see more descriptions with the same index term. This can help you to find other archival descriptions which mention the same people, places, or subjects. |
Accession | Material that comes into an archive as a single acquisition is described as an accession. It may be a gift or a purchase, and ownership or copyright may be legally transferred. A number of accessions may form one single collection with shared provenance, e.g. the records of a business may be transferred to an archive over time. |
AIP | Archival Information Package. An Information Package, consisting of the complete set of digital files and a complete set of metadata for the AIP (to support preservation and access) that is preserved within an OAIS archive. |
Anonymity | Where participants’ identities are not collected (nor shared). For example, a survey that does not collect names or email addresses; vox pop interviews; etc. NOT synonymous with Confidentiality. |
Anonymity | A situation in which the identity of the research participants is neither collected nor shared, i.e. where no-one, including the researcher, knows the identity of the research participants. NOT synonymous with confidentiality. Examples include anonymous surveys, tip-offs, etc. |
API | Application Programming Interface. Figshare’s API is a set of routines, protocols, and tools for building software applications, helping to automate researcher workflows. The Figshare API allows you to manage your figshare data (push data to figshare or pull data out), create collections out of public content or build applications on top of the functionality. The API is fully documented. |
Appraisal | Assessing a collection to determine its long-term value. This often happens during accession. |
Archival description | A catalogue or finding aid for a collection of archival material. The archival description should always tell you at least: the title of the collection; the collection's reference code; where the materials are located (repository); the dates, date range, or approximate dates of the material; who created the collection (name of creator); what sorts of material or information the collection contains (scope and content); whether there are restrictions on accessing the collection (access conditions); the language of the material; and how much material is in the collection (extent). Some descriptions may contain more information than this. |
Archival formats | File formats suitable for long-term preservation of digital content. Archival formats are typically open formats (i.e. do not require proprietary software to access). The US Library of Congress has published a list of formats suitable for long-term archiving here: https://www.loc.gov/preservation/resources/rfs/TOC.html |
Archives | Materials created and accumulated by individuals, organisations or businesses in the course of their activities and retained for usefulness (research) and as evidence (legal). The term Archive or Archives is also widely applied to organisations or subsections of organisations which have custody of archives/records, e.g. The National Archives, the Tate Archives. The term is also used more loosely to apply to the idea of storing things safely over time, such as 'archiving your text messages'. |
Asana (see: Kanban) | Online workflow and work-scheduling software. |
ASCII | American Standard Code for Information Interchange, standard for electronic text. |
AtoM | AtoM stands for Access to Memory. It is a web-based, open source application for standards-based archival description and access in a multilingual, multi-repository environment. |
Authentication | A mechanism which attempts to establish the authenticity of digital materials at a particular point in time. For example, digital signatures. |
Authenticity | The digital material is what it purports to be. In the case of electronic records, it refers to the trustworthiness of the electronic record as a record. In the case of "born digital" and digitised materials, it refers to the fact that whatever is being cited is the same as it was when it was first created unless the accompanying metadata indicates any changes. Confidence in the authenticity of digital materials over time is particularly crucial owing to the ease with which alterations can be made. |
Bit | A bit is the basic unit of information in computing. It can have only one of two values, commonly represented as either a 0 or 1. The two values can be interpreted as any two-valued attribute (yes/no, on/off, etc). |
Bit Preservation | A term used to denote a very basic level of preservation of a digital resource as it was submitted (literally, preservation of the bits forming a digital resource). It may include maintaining onsite and offsite backup copies, virus checking, fixity-checking, and periodic refreshment to new storage media. Bit preservation is not digital preservation, but it does provide one building block for the more complete set of digital preservation practices and processes that ensure the survival of digital content and also its usability, display, context and interpretation over time. |
Bit-rot | The decay in digital files over time due to the deterioration of physical media such as hard drives. Usually causes complete data failure, but may also cause individual data items to become corrupted or incorrect over time. |
Born-Digital | Digital materials which are not intended to have an analogue equivalent, either as the originating source or as a result of conversion to analogue form. This term has been used in the Handbook to differentiate them from 1) digital materials which have been created as a result of converting analogue originals; and 2) digital materials, which may have originated from a digital source but have been printed to paper, e.g. some electronic records. |
BPM | Beats per Minute |
BWF | Broadcast Wave Format, the European Broadcasting Union standard for a WAV file, with extra metadata. |
Byte (B) | A unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit of memory in many computer architectures. |
Chain of Custody | A key concept in forensics whereby the custody and provenance of digital hardware, media and files are safeguarded through, for example, the appointment of evidence custodians. The purpose of the Digital Evidence Bag (DEB) is to hold digitally, along with the evidential digital objects, provenance metadata that can be updated as required: a concept that is familiar to digital preservation practitioners. |
CLOCKSS (see: LOCKSS) | ‘CLOCKSS (Controlled LOCKSS) is a not-for-profit joint venture between the world’s leading academic publishers and research libraries whose mission is to build a sustainable, geographically distributed dark archive with which to ensure the long-term survival of Web-based scholarly publications for the benefit of the greater global research community.’ |
Coded data | Data tagged or assigned with identifiers as the precursor to analysis. |
Collection (Archive Collection) | Documents or material of any kind that have accumulated as part of the normal activity of an organisation, business or individual and been kept as a unit in an archival repository. Sometimes the term fonds may be used for a collection of material created by an individual person or organisation where the integrity of the whole is important, as it provides contextual evidence for all of the items. The term artificial collection may be used for archival material brought together by a collector or a repository with no shared provenance. A collection may be a single item (letter, diary, film etc), or it may be made up of many items. The extent will tell you how big the collection is. |
Collection description | A description of the material within an archival collection, providing essential information about the collection. Often also called an archival description, a catalogue, or a finding aid. |
Collection Level | This describes a description that summarises general information about the archival material in a collection, without details of individual items. |
Collection Level description | This is a description for an archival collection that provides a general overview of the collection, without going into details of individual items. For a multi-level description, the higher level can be seen as the 'parent' of lower-level descriptions or components. |
Confidential data | Data that contains sensitive personal information that should not be shared. See Direct identifier and Indirect identifier. |
Confidentiality | Where disclosive/identifying information is collected but not shared. E.g. personal interviews in which names and identities are removed prior to sharing; surveys that collect but do not share personal information, etc. Usually used in qualitative research where personal identities must be collected, e.g. in face-to-face interviews. NOT synonymous with Anonymity. |
Confidentiality | A situation in which the identity of the research participants is collected but not shared. Many kinds of data including personal interviews can be made confidential through removing disclosive data. |
Copyright/Reproduction | In archival descriptions, the 'copyright/reproduction' statement provides information about whether you may copy, quote, or publish material from within the archival collection. There may be limitations imposed by the collection's donor, or there may be legal restrictions on the use of the material. This will often include information about copyright of the material. |
Creation Information | In descriptions on the Archives Hub, this usually appears under the heading 'Cataloguing Info', and provides information about how the description was created, and who is responsible for it. |
Custodial History | In an archival description, this outlines the 'chain of ownership' or 'provenance' of the material in the collection - who has owned it, and how it came to be acquired by the repository. |
Dark archive | Dark archive is an archive that cannot be accessed by any current users but may be accessible at future dates subject to the occurrence of specific predefined events ('trigger event'). Access to the data is either limited to a few set individuals or completely restricted to all. |
Data conversion (See also: Reformatting) | The conversion of computer data from one format to another. Throughout a computer environment, data is encoded in a variety of ways. For example, computer hardware is built on the basis of certain standards, which requires that data contains, for example, parity bit checks. Similarly, the operating system is predicated on certain standards for data and file handling. Furthermore, each computer program handles data in a different manner. Whenever any one of these variables is changed, data must be converted in some way before it can be used by a different computer, operating system or program. Even different versions of these elements usually involve different data structures. For example, the changing of bits from one format to another, usually for the purpose of application interoperability or of capability of using new features, is merely a data conversion. Data conversions may be as simple as the conversion of a text file from one character encoding system to another; or more complex, such as the conversion of office file formats, or conversion of image and audio file formats. |
Data de-identification | The process of removing, eliding or revising certain pieces of information in a dataset (qualitative or quantitative) that can be used to identify research participants (and, potentially, referents: other people mentioned in the data). Typically consists of one or more of the following techniques: (1) Omission/redaction: removing the identifying information, often by replacing it with a stock phrase such as ‘redacted’. (2) Revision/abstraction/generalisation: replacing the content with content of similar meaning but no identifying information, e.g. replacing ‘Head of Department’ with ‘senior leadership role’ or ‘aged 21’ with ‘15-30’. (3) Perturbation: replacing specific identifiers with other specific pieces of information that still support statistical analysis but obscure identities, e.g. replacing ‘age 21’ with a random age within +/- 2 years of the original (19, 20, 21, 22 or 23). |
Data de-identification | The process of removing information that could reveal research participants’ identities. Can include the removal of direct identifiers and indirect identifiers through omission, abstraction, redaction or perturbation. |
Dates of Creation | In an archival description, this shows the date (or dates) when the materials were created - they may pre-date the formation of the collection. |
DCC | Digital Curation Centre. A UK-based organisation active in digital preservation. |
Description | Usually referring to a description of an archive, or a unit within an archive, which describes and explains the content and context of a collection of archival material. See also Collection description, Archival description. |
Digital archiving | This term is used very differently within sectors. The library and archiving communities often use it interchangeably with digital preservation. Computing professionals tend to use digital archiving to mean the process of backup and ongoing maintenance as opposed to strategies for long-term digital preservation. |
Digital forensics | The application of scientific technical methods and tools toward the preservation, collection, validation, identification, analysis, interpretation, documentation and presentation of digital information derived after-the-fact from digital sources. |
Digital materials | A broad term encompassing digital surrogates created as a result of converting analogue materials to digital form (digitisation), and "born digital" for which there has never been and is never intended to be an analogue equivalent, and digital records. |
Digital object | A digital object is a digital representation of some or all of the material in an archive collection. This may be a digital surrogate, or it might be born-digital material, such as a digital photograph or mp3 recording. Digital objects are often available online. |
Digital Preservation | Refers to the series of managed activities necessary to ensure continued access to digital materials for as long as necessary. Digital preservation is defined very broadly for the purposes of this study and refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological and organisational change. Those materials may be records created during the day-to-day business of an organisation; "born-digital" materials created for a specific purpose (e.g. teaching resources); or the products of digitisation projects. This Handbook specifically excludes the potential use of digital technology to preserve the original artefacts through digitisation. Three time frames are distinguished. Short-term preservation: access to digital materials either for a defined period of time while use is predicted but which does not extend beyond the foreseeable future, and/or until they become inaccessible because of changes in technology. Medium-term preservation: continued access to digital materials beyond changes in technology for a defined period of time but not indefinitely. Long-term preservation: continued access to digital materials, or at least to the information contained in them, indefinitely. |
Digital Publications | Born digital objects which have been released for public access and either made available or distributed free of charge or for a fee. They may consist of networked publications, available over a communications network or physical format publications which are distributed on formats such as floppy or optical disks. They may also be either static or dynamic. |
Digital Records | See Electronic Records |
Digital Resources | See Digital Materials |
Digital surrogate | Electronic or digitised copy of an original document, photograph, or other material. Digital surrogates are often used if the original item is fragile or inaccessible. |
Digitisation | The process of creating digital files by scanning or otherwise converting analogue materials. The resulting digital copy, or digital surrogate, would then be classed as digital material and then subject to the same broad challenges involved in preserving access to it, as "born digital" materials. |
DIP | Dissemination Information Package. An Information Package, derived from one or more Archival Information Packages (AIPs), and sent by Archives to the Consumer in response to a request to the OAIS (OAIS term). |
Direct identifier | Unit of information that can be used in isolation to identify an individual. E.g. ID number, name and surname, telephone number, email address. |
Documentation | The information provided by a creator and the repository which provides enough information to establish provenance, history and context and to enable its use by others. See also Metadata. |
DOI | Digital Object Identifier. A technical and organisational infrastructure for the registration and use of persistent identifiers widely used in digital publications and for research data. The DOI system was created by the International DOI Foundation and was adopted as International Standard ISO 26324 in 2012. |
DPC | Digital Preservation Coalition. A UK and Ireland based organisation active in digital preservation and responsible for the Digital Preservation Handbook. |
DPTP | Digital Preservation Training Programme, an intensive training course run by the University of London Computer Centre. |
DRAMBORA | Digital Repository Audit Methodology Based on Risk Assessment. A set of risk assessment tools developed by the Digital Curation Centre. |
EAD | Encoded Archival Description. EAD is an international standard developed by the Society of American Archivists and the US Library of Congress. EAD maps closely onto the International Standard for Archival Description ISAD(G). EAD uses 'mark-up' or ‘tags’ to distinguish the different parts of a finding aid, in a way that can be interpreted and processed by different computer systems. The EAD Document Type Definition (DTD) and the EAD Schema provide a set of rules for the mark-up of highly structured, hierarchically-organised information by using ‘elements’ and their sets of tags. EAD documents are in XML format. XML is ‘Extensible Mark-up Language’, an international standard used for creating many types of electronic documents and designed for the electronic exchange of information. |
Electronic Records | Records created digitally in the day-to-day business of the organisation and assigned formal status by the organisation. They may include for example, word processing documents, emails, databases, or intranet web pages. |
Emulation | A means of overcoming technological obsolescence of hardware and software by developing techniques for imitating obsolete systems on future generations of computers. |
Experimental data | Data collected in an environment with high control over variables, such as chemical reactions. |
Extent | In an archival description, this provides information about the quantity of materials in the collection, or the physical space they occupy. This information can help you to decide how long to allow for a visit to the archive. |
Field data | Data collected in an uncontrolled/in-situ setting, such as field notes, participant observation, ethology (observed animal behaviours), etc. |
File Format | A file format is a standard way that information is encoded for storage in a computer file. It tells the computer how to display, print, process, and save the information. It is dictated by the application program which created the file, and the operating system under which it was created and stored. Some file formats are designed for very particular types of data, others can act as a container for different types. A particular file format is often indicated by a file name extension containing three or four letters that identify the format. |
Finding aid | A description of an archival collection, to enable the archive to be discovered or the contents within an archive to be identified. |
Fixity Check | A method for ensuring the integrity of a file and verifying it has not been altered or corrupted. During transfer, an archive may run a fixity check to ensure a transmitted file has not been altered en route. Within the archive, fixity checking is used to ensure that digital files have not been altered or corrupted. It is most often accomplished by computing checksums such as MD5, SHA1 or SHA256 for a file and comparing them to a stored value. |
Fonds | In a collection description, 'fonds' is a term often used by archivists for the material created or collected by a particular person, family, or organisation in the course of their activities, in order to distinguish this type of collection from an artificial collection. This distinction is important because a collection with a single provenance has particular evidential value - the parts of the collection all relate to each other and provide context for each other. |
Genre/Form | The medium or form of the material, such as photographs, manuscripts or floppy disks. Can be added to the Access Points to provide information about the kind of materials in a collection. |
GIF | Graphics Interchange Format, an image format that uses lossless (LZW) compression but is limited to a palette of 256 colours per image. |
Gigabyte (GB) | A unit of digital information often used to describe data or data storage size, equates to approximately 1,000 Megabytes (MB). |
GIS | Geographical Information System, a system that processes spatial and non-spatial data together. |
Held at | In an archival description, this provides the name of the repository where a collection is kept. |
Hierarchy | A collection is arranged in order to show context. This means that it will be catalogued to preserve its original order where possible. The collection will be arranged into sub-sections, such as series, files, items, and these will all be clearly related. An archival description should show the hierarchy if it is catalogued to this level of detail, commonly through a table of contents with a folder type structure. The researcher can then see the context of an individual item, such as a letter - they can see that it forms part of a series, and the series is within a larger collection. |
Holograph | A document in the author's own handwriting. |
HTML | Hypertext Markup Language, a format used to present text and other information on the World Wide Web. Since 1996, versions of the HTML specification have been maintained by the World Wide Web Consortium (W3C). |
IASA | International Association of Sound and Audiovisual Archives, an association for archives that preserve recorded sound and audiovisual documents. |
IIIF | International Image Interoperability Framework. A growing community of the world’s leading research libraries and image repositories that has embarked on an effort to collaboratively produce an interoperable technology and community framework for image delivery. |
IIPC | The International Internet Preservation Consortium. |
Image size | NB: not to be confused with Resolution |
Immediate Source of Acquisition | This explains how the archival collection came into the care of the repository - the source of the archive, such as the donor or building where it was previously housed. |
Indirect identifier | Unit of information that can be used in conjunction with other units to identify an individual. E.g. position + date of study, first name + position + institution, subject specialisation + institution. |
Ingest | The process of turning a Submission Information Package (SIP) into an Archival Information Package (AIP), i.e. putting data into a digital archive (OAIS term). |
ISO | International Organization for Standardization. |
Item-level | In a description of an archival collection, this is usually the smallest unit of a description, giving information about a single document, such as a letter, photograph, or report. |
JPEG (Jpg / Jpeg) | Joint Photographic Experts Group, a committee that oversees international standards for compression and processing of digital photographs. The majority of JPEG formats are lossy. |
JPEG 2000 | A revision of the JPEG format which can use lossless compression. |
Kanban (see: Asana) | A workflow / work scheduling methodology |
Kilobyte (KB) | A unit of digital information often used to describe data or data storage size, equates to approximately 1,000 Bytes. |
KPA | Key Performance Area. |
Level | For an archival description, the level is the particular point in the hierarchy that is being described. For example 'collection', 'series', 'item'. Levels are nested, so that a 'subseries' forms one part of a 'series', and an 'item' may form one part of a 'subseries.' |
Life-cycle Management | Records management practices have established life-cycle management for many years, for both paper and electronic records. The major implications for life-cycle management of digital resources, whatever their form or function, is the need actively to manage the resource at each stage of its life-cycle and to recognise the interdependencies between each stage and commence preservation activities as early as practicable. This represents a major difference with most traditional preservation, where management is largely passive until detailed conservation work is required, typically, many years after creation and rarely, if ever, involving the creator. There is an active and inter-linked life-cycle to digital resources which has prompted many to promote the term "continuum" to distinguish it from the more traditional and linear flow of the life-cycle for traditional analogue materials. We have used the term life-cycle to apply to this proactive concept of preservation management for digital materials. |
LOCKSS (see: CLOCKSS) | Stands for ‘Lots of Copies Keep Stuff Safe’. The LOCKSS programme is an open source, inter-library preservation system. |
Long-term preservation | Continued access to digital materials, or at least to the information contained in them, indefinitely. |
Lossless Compression | A mechanism for reducing file sizes that retains all original data. |
Lossy Compression | A mechanism for reducing file sizes that typically discards data. |
Lower Level description | A level of description below the top level. The top level is usually at the collection or fonds level, but may, for example, be at series level. The amount of detail given in a lower level description will vary. Often it is just a reference, title and date. See also subfonds, series, item-level. |
Medium-term preservation | Continued access to digital materials beyond changes in technology for a defined period of time but not indefinitely. |
Megabyte (MB) | A unit of digital information often used to describe data or data storage size, equates to approximately 1,000 Kilobytes (KB). |
Metadata | Data that provides information about another object or resource (which can itself be data). May include information about authorship, creation or modification (object logs), unique identification (DOIs), categorisation (keywords, subject categories), and organisation (hierarchical information). |
Metadata Information | Information which describes significant aspects of a resource. Most discussion to date has tended to emphasise metadata for the purposes of resource discovery. The emphasis in this Handbook is on what metadata are required successfully to manage and preserve digital materials over time and which will assist in ensuring essential contextual, historical, and technical information are preserved along with the digital object. The PREMIS Data Dictionary for Preservation Metadata has become a key de facto standard in digital preservation. |
METS | Metadata Encoding and Transmission Standard, a standard for presenting metadata using XML. |
Microdata | The “thing” of data - the data which informs analysis. E.g. interview transcripts, census records, astronomical data, video recordings of a theatrical performance, etc. Most commonly used for tabular data. |
Migration | A means of overcoming technological obsolescence by transferring digital resources from one hardware/software generation to the next. The purpose of migration is to preserve the intellectual content of digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology. Migration differs from the refreshing of storage media in that it is not always possible to make an exact digital copy or replicate original features and appearance and still maintain the compatibility of the resource with the new generation of technology. |
MIME | Multipurpose Internet Mail Extensions. A protocol for including non-ASCII information in email messages. Software typically includes interpreters that convert MIME content to and from its native format, as necessary. |
Moderation (data) | Quality assurance as applies to open data sets submitted for publishing. Moderation is a process of checking the data for completeness and coherence, particularly in terms of the richness of the metadata, and offering suggestions on how to make the data record more complete. Does NOT include checking the data itself for value or veracity, which is the submitting researcher’s responsibility. |
MPEG (Mpg/Mpeg) | Moving Picture Experts Group. A committee responsible for the development of international standards for compression, decompression, processing, and coded representation of moving pictures, audio and their combination. |
Multi Level description | An archival description that is not just at one level, but includes nested descriptions of units within the collection. A multi level description may just have two levels, e.g. collection and series, or it may have several levels, e.g. fonds, subfonds, series, subseries, item. See also subfonds, item-level. |
Noise | See: Signal-to-noise ratio |
Obsolescence | See: Obsolescence (File Formats and Software) |
Obsolescence (File Formats and Software) | ‘Computer files, the objects normally thought of as the main target of digital preservation, are presented according to pre-defined structural and organizational principles. Those principles, usually referred to as a file format, are typically laid out in a document called a format specification. A format specification provides the details necessary to construct a valid file of a particular type and to develop software applications that can decode and render such files. The actual specifications may vary considerably in length, from well under 100 pages to well over 1000, depending on the complexity of the format.’ |
Omission/redaction | Removing the identifying information, often by replacing it with a stock phrase such as ‘redacted’. |
Open Archival Information System (OAIS) | An Archive, consisting of an organization, which may be part of a larger organization, of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community. It meets a set of responsibilities, as defined in section 4 of the OAIS standard that allows an OAIS Archive to be distinguished from other uses of the term ‘Archive’. The term ‘Open’ in OAIS is used to imply that the OAIS standards are developed in open forums, and it does not imply that access to the Archive is unrestricted. The OAIS abbreviation is also used commonly to refer to the Open Archival Information System reference model standard which defined the term. The standard is a conceptual framework describing the environment, functional components, and information objects associated with a system responsible for the long-term preservation. As a reference model, its primary purpose is to provide a common set of concepts and definitions that can assist discussion across sectors and professional groups and facilitate the specification of archives and digital preservation systems. It has a very basic set of conformance requirements that should be seen as minimalist. OAIS was first approved as ISO Standard 14721 in 2002 and a 2nd edition was published in 2012. Although produced under the leadership of the Consultative Committee for Space Data Systems (CCSDS), it had major input from libraries and archives. |
PDF | Portable Document Format, a set of formats and open standards for producing and sharing electronic documents, originally developed by Adobe Systems. The original page description format has been elaborated over successive versions to enable the embedding of such complex objects as image, audio, and moving image files, hyperlinks, embedded XML metadata, and updatable forms. Specifications for the various versions and profiles of the format are now maintained by the International Organization for Standardization (ISO). |
PDF/A | Versions of the PDF standard intended for archival use. |
PDI | Preservation Description Information. The information which is necessary for adequate preservation of the Content Information and which can be categorized as Provenance, Reference, Fixity, Context, and Access Rights Information (OAIS term). |
Persistent Unique Identifier | An identifier that is used solely for the object being described, and that remains the same over time. On the Web this takes the form of a URI. This provides a web address that can be used to bookmark or link to the description. |
Personal data | Data pertaining to an individual’s identity, activities or characteristics. See confidential data. |
Perturbation | Replacing specific identifiers with other, specific pieces of information that preserve statistical analysis but obscure identities. E.g. replacing “age 21” with a random age within ±2 years of the original (19, 20, 21, 22 or 23). |
PID | Persistent Unique Identifier |
Preferred Citation | The recommended form of words for identifying a collection or unit when referring to it in a bibliography or other formal document. |
Primary data | The data from which the core analysis for a research project is drawn. |
Processed data | Data which has undergone some process of clarification, enhancement, error-checking, removing outliers, conversion into different formats, etc. May or may not include disclosive information (see below). |
Processing Information | Information about how the archival materials have been stored, preserved, or arranged, or how their description has been prepared. |
Provenance | The origin or custody of the materials in a collection. Provenance is important for judging the integrity of a collection. A collection with a shared provenance can provide insights into the creator's life and work. |
Publication Note | Information about publications which are based on, or written about, material in the collection, or which may be of value to researchers using the collection. |
Qualitative data | Data that is collected about the quality of an object, interaction, or process, and/or understanding a particular thought process or perception. Typically represented in language and not by numbers. |
Quantitative data | Data that can be expressed numerically and/or granularly, or that is analysed according to statistical models. Often expressed in tabular or similar formats, composed of lists of variables. |
Quasi-statistics | Conducting or supplementing qualitative analysis with simple numerical analysis. E.g. “30% of the research participants referred to their working conditions negatively.” |
Raw data | Data/information captured directly from the collecting instrument, before processing. Examples include interview audio recordings, laboratory machine readouts, fieldnotes, etc. |
RDM | ‘Research Data Management’ … |
Record | Archives such as minute books, registers, deeds, agreements, contracts etc., are actually records in the legal sense, because they formally record official processes and transactions. But sometimes 'records' is used to mean 'archives' in a general sense. Archival descriptions in the Archives Hub's database can also be described as records. |
Reformatting | Copying information content from one storage medium to a different storage medium (media reformatting) or converting from one file format to a different file format (file re-formatting). |
Related Units of Description | Materials which are not part of the archival collection being described, but which are related in some way - maybe a shared creator, or the same subject area. This might include material which is held by another repository. |
Repository | The archive, library, or special collection, where an archival collection is stored. Usually a repository has a reading room for consulting materials, and strong rooms with environmental controls for housing the collections. |
Research data | Facts, measurements, recordings, records, or observations about the world collected by scientists and others, with a minimum of contextual interpretation. Data may be in any format or medium taking the form of writings, notes, numbers, symbols, text, images, films, video, sound recordings, pictorial reproductions, drawings, designs or other graphical representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, data processing algorithms, or statistical records. NOT synonymous (but may in cases overlap) with Enterprise data. Also see Research & Development (R&D). |
Revision/abstraction/generalisation | Replacing content with content of similar meaning but no identifying information. E.g. replacing “Head of Department” with “senior leadership role” or “aged 21” with “15-30”. |
Scope and Content | This summarises the range of the materials being described, allowing you to judge the potential relevance of the archival collection. This should provide a general overview of the subjects covered, and highlight significant individuals, organisations, or events represented in the collection. |
Secondary data | Additional data collected that may or may not form part of the analysis. |
Series | In an archival description this may refer to materials grouped together because they are of a similar type or because they were originally arranged together. See also System of Arrangement, Lower level description. |
Short-term preservation | Access to digital materials either for a defined period of time while use is predicted but which does not extend beyond the foreseeable future and/or until it becomes inaccessible because of changes in technology. |
Signal-to-noise ratio | Signal-to-noise ratio (abbreviated SNR or S/N) is a measure used in science and engineering that compares the level of a desired signal to the level of background noise. SNR is defined as the ratio of signal power to the noise power, often expressed in decibels. A ratio higher than 1:1 (greater than 0 dB) indicates more signal than noise. While SNR is commonly quoted for electrical signals, it can be applied to any form of signal, for example isotope levels in an ice core, biochemical signaling between cells, or financial trading signals. Signal-to-noise ratio is sometimes used metaphorically to refer to the ratio of useful information to false or irrelevant data in a conversation or exchange. |
SIP | Submission Information Package. An Information Package that is delivered by the Producer to the OAIS for use in the construction or update of one or more Archival Information Packages (AIPs) and/or the associated Descriptive Information (OAIS term). |
SMPTE | Society of Motion Picture and Television Engineers, a professional organisation and technical standards body for television and motion picture. |
System of Arrangement | Information on the physical or logical ordering of the material in the collection being described. The material may, for example, be arranged alphabetically by title, in date order, or by some classification scheme. This should include details of any changes to the original arrangement made by the archivist. |
TDR | Trusted Digital Repository. A trusted digital repository has been defined as having “a mission to provide reliable, long-term access to managed digital resources to its designated community, now and into the future”. The TDR must include the following seven attributes: compliance with the reference model for an Open Archival Information System (OAIS), administrative responsibility, organizational viability, financial sustainability, technological and procedural suitability, system security, and procedural accountability. The concept has been an important one particularly in relation to certification of digital repositories. |
Temp file | Temporary files, or foo files (.TMP), are files created to temporarily contain information while a new file is being made. They may be created by computer programs for a variety of purposes; principally when a program cannot allocate enough memory for its tasks, when the program is working on data bigger than the architecture's address space, or as a primitive form of inter-process communication. |
Terabyte (TB) | A unit of digital information often used to describe data or data storage size. Equal to 1000 Gigabytes (GB) in decimal (SI) terms; the binary convention of 1024 GB is also common and is strictly a tebibyte (TiB). |
Thumbnail | A small version of a digital image, generally used as a link to a larger version. |
TIFF | Tagged Image File Format, a common image format, typically lossless. |
Unit of Description | Any level of description, from a collection or fonds through to a subfonds, series, subseries, file or item within an archival collection. A unit of description typically has a reference, title and date at minimum. |
UPS | Uninterruptible Power Supply. |
WAV | Waveform Audio File Format, the standard file wrapper for audio; see BWF (Broadcast Wave Format) for the professional variant. |
Weeding | This is the act of identifying and removing unwanted materials from a collection. Often an archivist may decide to remove duplicate or damaged documents from a collection, and they would usually include details of this process in their archival description. |
X-rite | A manufacturer of colour calibration hard- and software. |
XML | Extensible Markup Language, a widely used standard (derived from SGML), for representing structured information, including documents, data, configuration, books, and transactions. It is maintained by the World Wide Web Consortium (W3C). |
Zeutschel | A German manufacturer of digitisation / scanning hard- and software. |
ZivaHub Item Type: Book | Books are generally long-form documents, a specialist work of writing that contains multiple chapters or a detailed written study. They are non-serial and should be complete in a single volume or finite number of volumes. |
ZivaHub Item Type: Composition | A creative work in a fine art context, such as a piece of music or a poem. Composition can refer to the piece or the process of creation of the piece. |
ZivaHub Item Type: Conference contribution | Any type of content contributed to an academic conference, such as papers, presentations, lectures or proceedings. This type should only be used if there is no other, more specific type, e.g. Poster. |
ZivaHub Item Type: Data Management Plan (DMP) | Data management plans are an integral part of a research venture, describing what data will be collected or created and how, and also the means by which it will be managed, shared and preserved. |
ZivaHub Item Type: Dataset | Datasets usually provide raw data for analysis. This raw data often comes in spreadsheet form, but can be any collection of data, usually in a defined structure, on which analysis can be performed. |
ZivaHub Item Type: Educational resource | Any type of content, or learning object, useful for teaching, learning or research in an educational context. Examples include tutorials, practicals, demonstrations, course outlines, test questions and rubrics, worked examples, lecture notes and topical reading lists. |
ZivaHub Item Type: Event | Information on the purpose, location, duration, agents or effects of an occurrence such as a conference, performance, natural phenomenon, or conflagration. |
ZivaHub Item Type: Figure | Figures are generally photos, graphs and static images that would be represented in traditional PDF publications. |
ZivaHub Item Type: Journal Contribution | Any type of content formally published in an academic journal, usually following a peer-review process. |
ZivaHub Item Type: Media | Media is any form of research output that is recorded and played. This is most commonly video, but can be audio or 3D representations. |
ZivaHub Item Type: Monograph | A non-serial scholarly publication (either one or multiple finite volumes). |
ZivaHub Item Type: Online resource | Any type of resource available online. |
ZivaHub Item Type: Performance | The presentation of a theatrical play or music concert within a fine art context. |
ZivaHub Item Type: Physical object | A record describing any type of physical object, such as a work of art, instrument or archaeological artefact. |
ZivaHub Item Type: Poster | Poster sessions are particularly prominent at academic conferences. Posters are usually one frame of a PowerPoint (or similar) presentation and are represented at full resolution to make them zoomable. Posters typically contain text with illustrative figures and/or tables, usually reporting research results or proposing hypotheses. |
ZivaHub Item Type: Preprint | Preprints are manuscripts made publicly available before they have been submitted for formal peer review and publication. They might contain new research findings or data. Preprints can be a draft or final version of an author's research but must not have been accepted for publication at the time of submission. |
ZivaHub Item Type: Presentation | Academic presentations can be uploaded in their original slide format. Presentations are usually represented as slide decks. Videos of presentations can be uploaded as media. |
ZivaHub Item Type: Report | A formal account of an observation, investigation, finding, activity or any other type of information. Also an official record of activities by a committee or similar entity, usually archived or submitted to a higher authority. |
ZivaHub Item Type: Software | A computer program in source code (text) or compiled form. Code as a research output can either be uploaded directly from your computer or through the code management system GitHub. Versioning of code repositories is supported. |
ZivaHub Item Type: Standard | A formal and detailed description of an invention, protocol or workflow; examples include patents, patent applications and requests for comments (RFC). Could include SOPs (Standard Operating Procedures). |
ZivaHub Item Type: Thesis | In order to distinguish essays and pre-prints from academic theses, we have a separate category. These are often much longer text-based documents than a paper. A thesis or dissertation is a document submitted in support of candidature for an academic degree or professional qualification presenting the author's research and findings. |
ZivaHub Item Type: Workflow | Resource describing protocols, procedures, methods or activities that are part of a scientific experiment. A recorded sequence of connected steps that can be reliably repeated in the performance of a particular task. |
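
The three de-identification terms defined in this glossary (Omission/redaction, Perturbation, and Revision/abstraction/generalisation) can be illustrated with a minimal Python sketch. This is not a recommended anonymisation procedure: the function names, the `[redacted]` placeholder, and all age bands other than 15-30 are assumptions made for illustration only.

```python
import random


def redact(text, identifiers, placeholder="[redacted]"):
    """Omission/redaction: replace each identifying string with a stock phrase."""
    for identifier in identifiers:
        text = text.replace(identifier, placeholder)
    return text


def perturb_age(age, spread=2, rng=random):
    """Perturbation: return a random age within +/- `spread` years of the original."""
    return age + rng.randint(-spread, spread)


# Hypothetical bands; only "15-30" appears in the glossary example.
DEFAULT_BANDS = ((0, 14), (15, 30), (31, 50), (51, 120))


def generalise_age(age, bands=DEFAULT_BANDS):
    """Generalisation: replace an exact age with the band that contains it."""
    for low, high in bands:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} is outside every band")


if __name__ == "__main__":
    record = "Jane Doe, aged 21, Head of Department"
    print(redact(record, ["Jane Doe"]))  # [redacted], aged 21, Head of Department
    print(perturb_age(21))               # one of 19, 20, 21, 22, 23
    print(generalise_age(21))            # 15-30
```

Redaction removes the value entirely, perturbation keeps a usable but inexact number, and generalisation keeps a coarse category, so the choice depends on how much analytical detail needs to survive.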