This document outlines some basic definitions of data as the term is used within the academic environment. Research data come in many different shapes and sizes, but the term covers “any information collected, stored, and processed to produce and validate original research results.”1
FAIR Data
Working with data throughout a research project is challenging. Research Data Management is the process of organising and documenting the data processes (collection, description, de-identification, curation, archiving and publication) within a project. Professional data management practices can make research more coherent and shareable, or FAIR: data that is Findable, Accessible, Interoperable and Reusable. Even if you cannot make your data completely accessible, practising good research data management helps make your research more efficient.
Research and Development
Creative work, undertaken on a systematic basis, in order to increase the stock of knowledge, including knowledge of man [sic], culture and society, and to devise new applications of available knowledge.2
The Organization for Economic Co-operation and Development-OECD (2015)
According to the Frascati Manual (The Organization for Economic Co-operation and Development-OECD 2015), an R&D activity can be distinguished from a non-R&D activity if five core criteria are met. The activity must be:
- novel i.e. aimed at new findings;
- creative i.e. based on original, not obvious, concepts and hypotheses;
- uncertain i.e. uncertain about the final outcome;
- systematic i.e. planned and budgeted; AND
- transferable and/or reproducible i.e. leads to results that could possibly be reproduced.
According to The Organization for Economic Co-operation and Development-OECD (2015) “All five criteria must be met, at least in principle, every time an R&D activity is undertaken whether on a continuous or occasional basis.”
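The all-five-criteria rule can be expressed as a simple check. The following is a minimal Python sketch, not something taken from the Frascati Manual: the class `ActivityAssessment`, its field names, and the functions `is_r_and_d` and `unmet_criteria` are hypothetical names chosen for illustration, and deciding whether each criterion holds remains a matter of human judgement.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ActivityAssessment:
    """Hypothetical record of how an activity scores on the five core criteria."""
    novel: bool         # aimed at new findings
    creative: bool      # based on original, not obvious, concepts and hypotheses
    uncertain: bool     # the final outcome is uncertain
    systematic: bool    # planned and budgeted
    transferable: bool  # leads to results that could possibly be reproduced


def is_r_and_d(assessment: ActivityAssessment) -> bool:
    """Return True only if all five criteria are met."""
    return all(getattr(assessment, f.name) for f in fields(assessment))


def unmet_criteria(assessment: ActivityAssessment) -> list:
    """List the names of the criteria that are not met."""
    return [f.name for f in fields(assessment) if not getattr(assessment, f.name)]


if __name__ == "__main__":
    # Illustrative assessment of an activity whose outcome is already known.
    routine_testing = ActivityAssessment(
        novel=True, creative=True, uncertain=False,
        systematic=True, transferable=True,
    )
    print(is_r_and_d(routine_testing))
    print(unmet_criteria(routine_testing))
```

Listing the unmet criteria alongside the yes/no answer may make it easier to record why an activity was classed as non-R&D.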
R&D specifically excludes educational, training, and administrative work undertaken as part of normal operational processes, as well as certain large-scale data gathering, analysis and/or processing activities (such as the national census, country-level topographical surveys, etc.) that can only be undertaken at the governmental level. Table 3 of Guideline 1 from the National Intellectual Property Management Office-NIPMO (2012) provides further information about what is excluded from the definition of R&D:
- Education and training of personnel at higher education institutions
- Routine (not for a specific R&D project) scientific and technical processes (coding, analysing, etc.), which are carried out by scientific and technical personnel, bibliographic services, patent services, scientific and technical extension and advisory services and at scientific conferences
- Routine activities carried out by government agencies to record natural, biological or social phenomena, which are of general public interest or which only the government has the resources to record, for example routine topographical mapping, routine geological, hydrological, oceanographic and meteorological surveying, census data, etc.
- Routine maintenance and testing of national standards and products
- Feasibility studies (not undertaken as part of a specific R&D programme)
- Routine investigation and application of specialised medical knowledge
- Administrative and legal work pertaining to patents
- Routine national, regional and local policies
- Routine software development
- Routine activities necessary for implementation of services/products
- Production and related technical activities
- R&D financing activities (fundraising, etc.)
- Indirect administration and support services (transport, cleaning, etc.)
Definitions
UCT’s current data definitions are shown below (Casrai (n.d.), Department of Science and Technology-DET (2012), University of South Carolina Libraries (n.d.)):
Term | Synonyms | Definition |
---|---|---|
Anonymity | N/A | A situation in which the identity of the research participants is neither collected nor shared, i.e. where no-one, including the researcher, knows the identity of the research participants. NOT synonymous with confidentiality. Examples include anonymous surveys, tip-offs, etc. |
Coded data | N/A | Data tagged or assigned with identifiers as the precursor to analysis. |
Confidentiality | N/A | A situation in which the identity of the research participants is collected but not shared. Many kinds of data including personal interviews can be made confidential through removing disclosive data. |
Confidential data | Disclosive data | Data that contains sensitive personal information that should not be shared. See Direct identifier and Indirect identifier. |
Data de-identification | Anonymisation, confidentiality | The process of removing information that could reveal research participants’ identities. Can include the removal of direct identifiers and indirect identifiers through omission, abstraction, redaction or perturbation. |
Direct identifier | N/A | Unit of information that can be used in isolation to identify an individual. E.g. ID number, name and surname, telephone number, email address. |
Experimental data | Laboratory data | Data collected in an environment with high control over variables, such as chemical reactions. |
Field data | Field studies | Data collected in an uncontrolled/in-situ setting, such as field notes, participant observation, ethology (observed animal behaviours), etc. |
Indirect identifier | N/A | Unit of information that can be used in conjunction with other units to identify an individual. E.g. position + date of study, first name + position + institution, subject specialisation + institution. |
Metadata | Categories, keywords, descriptive information, study type | Data that provides information about another object or resource (which can itself be data). May include information about authorship, creation or modification (object logs), unique identification (DOIs), categorisation (keywords, subject categories), organisation (hierarchical information). |
Microdata | Unit record data | The “thing” of data: the data which informs analysis. E.g. interview transcripts, census records, astronomical data, video recordings of a theatrical performance, etc. Most commonly used for tabular data. |
‘Open’ Data | Shared data, public data | Data that is published with few or no restrictions constraining its reuse. Typically shared under an Open Government licence, Creative Commons, or GNU Open licence. |
Personal data | N/A | Data pertaining to an individual’s identity, activities or characteristics. See confidential data. |
Primary data | Core data, main data | The data from which the core analysis for a research project is drawn. |
Processed data | Cleaned data | Data which has undergone some process of clarification, enhancement, error-checking, removing outliers, conversion into different formats, etc. May or may not include disclosive information (see Confidential data). |
Qualitative data | N/A | Data that is collected about the quality of an object, interaction, or process, and/or understanding a particular thought process or perception. Typically represented in language and not by numbers. |
Quantitative data | N/A | Data that can be expressed numerically and/or granularly, or that is analysed according to statistical models. Often expressed in tabular or similar formats, composed of lists of variables. |
Quasi-statistics | N/A | Conducting or supplementing qualitative analysis with simple numerical analysis. E.g. “30% of the research participants referred to their working conditions negatively.” |
Raw data | Original data | Data/information captured directly from the collecting instrument, before processing. Examples include interview audio recordings, laboratory machine readouts, field notes, etc. |
Research data | Data | Facts, measurements, recordings, records, or observations about the world collected by scientists and others, with a minimum of contextual interpretation. Data may be in any format or medium taking the form of writings, notes, numbers, symbols, text, images, films, video, sound recordings, pictorial reproductions, drawings, designs or other graphical representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, data processing algorithms, or statistical records. NOT synonymous (but may in cases overlap) with Enterprise data. Also see Research & Development (R&D). |
Secondary data | Ancillary data, supplementary data | Additional data collected that may or may not form part of the analysis. |
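The de-identification techniques named in the table (omission, abstraction, redaction and perturbation) can be illustrated with a short Python sketch. This is one possible reading of those four terms, not a prescribed UCT procedure. The field names (`name`, `email`, `age`, `income`, `comment`), the set `DIRECT_IDENTIFIERS`, the noise range and all function names are assumptions made for the example.

```python
import random
import re
from typing import Iterable

# Hypothetical field names; which fields count as direct identifiers
# depends on the study and must be decided by the researcher.
DIRECT_IDENTIFIERS = {"name", "id_number", "telephone", "email"}


def omit_direct_identifiers(record: dict) -> dict:
    """Omission: drop fields that identify a person on their own."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def abstract_age(age: int, band: int = 10) -> str:
    """Abstraction: replace an exact age with a coarser band, e.g. 34 -> '30-39'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"


def redact(text: str, terms: Iterable[str]) -> str:
    """Redaction: mask the listed terms wherever they occur in free text."""
    for term in terms:
        text = re.sub(re.escape(term), "[REDACTED]", text, flags=re.IGNORECASE)
    return text


def perturb(value: float, spread: float, rng: random.Random) -> float:
    """Perturbation: add small random noise so the exact value is not disclosed."""
    return value + rng.uniform(-spread, spread)


def de_identify(record: dict, rng: random.Random) -> dict:
    """Apply all four techniques to one hypothetical survey record."""
    cleaned = omit_direct_identifiers(record)
    cleaned["age_band"] = abstract_age(cleaned.pop("age"))
    cleaned["comment"] = redact(cleaned["comment"], [record["name"]])
    cleaned["income"] = round(perturb(cleaned["income"], 500.0, rng))
    return cleaned


if __name__ == "__main__":
    participant = {
        "name": "Jane Doe",
        "email": "jane@example.org",
        "age": 34,
        "income": 25000.0,
        "comment": "Jane Doe said her working conditions are poor.",
    }
    print(de_identify(participant, random.Random(0)))
```

Removing direct identifiers alone does not guarantee that a record is de-identified: the remaining fields may still act as indirect identifiers when combined, so the output of a script like this would still need review.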
References
Casrai. n.d. “Casrai Standard Dictionary of Research Administration Information.” https://bit.ly/2PtoW1i.
Department of Science and Technology-DET. 2012. “Act No. 28 of 2013: Intellectual Property Laws Amendment Act 2013.” Government Gazette, 570. 2012. https://bit.ly/2RGTHBB.
National Intellectual Property Management Office-NIPMO. 2012. “Guideline 1 of 2012: Interpretation of the Scope of Intellectual Property Rights from Publicly-Financed Research and Development Act (Act 51 of 2008): Setting the Scene.” Pretoria: NIPMO. 2012. https://www.ru.ac.za/media/rhodesuniversity/content/research/documents/South_African_IPR-PFRD_Act,_2008_(Act_51_of_2008).pdf.
The Organization for Economic Co-operation and Development-OECD. 2015. “Frascati Manual 2015: Guidelines for Collecting and Reporting Data on Research and Experimental Development, the Measurement of Scientific, Technological and Innovation Activities.” Paris: OECD Publishing. 2015. https://bit.ly/2NBY9Oz.
University of South Carolina Libraries. n.d. “Glossary of Research Terms.” https://bit.ly/2tA8iEf.
1. LibGuides@ Macalester University. Available at: https://libguides.macalester.edu/c.php?g=527786&p=3608583
2. OECD (The Organisation for Economic Co-operation and Development). (2015). Frascati Manual 2015: Guidelines for Collecting and Reporting Data on Research and Experimental Development, the Measurement of Scientific, Technological and Innovation Activities. Paris: OECD Publishing. Accessible: https://bit.ly/2NBY9Oz