1. Introduction/background/definitions
Although often classified as a “social science” or as one of the “humanities,” archaeology depends on methods derived from the “natural sciences” when studying objects in order to understand how they were made, used, deposited into the archaeological record, and modified after deposition and/or discard.
Answers to these and related questions are sought via what is often labelled “archaeological science”: employing experiments and analytics to generate data by recording observations.
Experimentation plays a fundamental though often overlooked role in archaeological theory and epistemology.
Its role is somewhat unclear partly because of this uncertainty over whether archaeology is one of the “humanities” or a natural science, and partly because archaeologists cannot directly witness how things were made and/or used in the past, or what happened to them after use.
This leads to some uncertainty: although we can propose plausible explanations, issues of equifinality mean that we will never be able to prove definitively how things happened.
This problem is, however, common to every “historical” science (in the broadest sense, including cosmology, evolutionary biology, paleontology, paleoanthropology, etc.), yet no one would question that cosmology is a hard, natural science.
And just as cosmologists rely on observations of old stars and simulations, archeologists must rely on the analysis of archeological objects and experimentation.
Analytical methods are applied to both experimental and archeological objects in order to test hypotheses about them.
The history of archaeology presents us with a constantly widening range of analytical methods applied to everything humans and our ancestors have encountered in our evolutionary and environmental history.
An “experiment” is part of a formal scientific process, directed at testing a hypothesis. Such experimentation has traditionally taken the form of replication, reconstruction and re-enactment; more recent formal experiments performed under controlled conditions rely on sophisticated high-tech equipment (scanning electron microscopes, robotic arms, computer simulation) to digitally record both methods and observations. Although any archaeological interpretation or inference regarding an object's manufacture or use should really be considered a hypothesis to be tested, archaeologists often lack the infrastructure, theoretical basis, resources and/or tools to perform the necessary testing.
“Analytics” and archeometry can be seen more in terms of “normal science,” in that – following extensive experimentation – testing and measurement methods have been developed and established to the point where they become fairly routine and widely accepted (“black boxed,” according to philosopher Bruno Latour).
Despite what appears to be a widespread belief that protocols for a given method are well established and well reported, the current reproducibility crisis in science is proof that they are not; archaeology is no exception. Some journals specifically publish protocols; in the case of Nature Protocols (https://www.nature.com/nprot/about), the protocols are meant to accompany the primary research article and can be required pre-submission. Many tools exist to record/report protocols (e.g. https://www.protocols.io/ or https://openlabnotebooks.org/), and many clinical trials have to be registered (e.g. https://clinicaltrials.gov/ct2/home) before they are run. However, most of these tools are either field-specific (e.g. chemistry/biology labs) and/or limited in functionality (see below).
2. State of the art: what there is and what is missing
Despite its fundamental importance, experimental archaeology is extremely underdeveloped, being only weakly situated within the archaeological process, both from a strictly theoretical viewpoint and (of greater relevance to the present initiative) in terms of reporting, documenting and sharing the designs and results of experiments.
There are very few databases (or even lists) of experiments which have been performed, and most reports about experimental design are not detailed enough for the procedures to be repeatable and/or the results to be reproducible.
As the sharing of research data gains momentum - even within a discipline as traditionally conservative as archaeology - it becomes imperative that we be able to assess the quality of the openly accessible data. This can be broken down into three aspects:
- why the data were generated (context, hypothesis, etc.),
- how the data were generated (samples, method, equipment and associated accuracy/precision, etc.), and
- why they were generated in that way (appropriateness of the method to address the hypothesis, etc.).
Aspect #1 relates to hypothesis formulation. In order to avoid biases in the (statistical) analysis of data, the hypothesis should be formulated before generating the data. In practice, however, many experiments have been run and many methods have been applied without clear goals.
Aspect #2 relates to the metadata. Some metadata can be extracted from the data themselves (e.g. the magnification of an SEM image is usually saved with the image file), but some are difficult to extract, and most are difficult to extract automatically. Moreover, it is not yet clear for many methods which metadata are necessary for assessing the quality of the data. For example: the magnification of a microscopic image is obviously important because it reflects the size of the features that can be observed. But - although resolution is just as important for that same reason - the numerical aperture of the objective is almost never reported in archaeological studies, and some objective manufacturers do not even provide this information.
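To illustrate what automatic extraction could look like, the following is a minimal sketch (not part of any existing tool) that reads whatever settings an instrument has written into the TIFF tags of an image; the file name is a placeholder, and vendor-specific fields often still require manual parsing.

```python
# Minimal sketch of automatic metadata extraction, assuming the instrument
# writes its settings into TIFF tags (as many SEMs and microscopes do).
# The file name is a placeholder; the 'tifffile' package must be installed.
import tifffile

def extract_tiff_metadata(path):
    """Return all TIFF tags of the first image page as a plain dictionary."""
    with tifffile.TiffFile(path) as tif:
        page = tif.pages[0]
        return {tag.name: tag.value for tag in page.tags.values()}

meta = extract_tiff_metadata("sem_image_001.tif")
for name, value in meta.items():
    # Vendor-specific settings (e.g. magnification) often hide in free-text
    # tags such as 'ImageDescription' and still need to be parsed by hand.
    print(name, ":", str(value)[:80])
```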
Aspect #3 illustrates a reflexive process. Too often in science, a method is applied just because someone else applied it, without questioning the appropriateness of this method for the given samples and/or hypotheses. To take a simple, extreme example, even if a published study used acid to clean flint objects, acids are definitely not appropriate for (non-destructively) cleaning objects made of limestone.
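To make the three aspects concrete, a structured record along the following lines could accompany each published dataset. This is only an illustrative sketch; the field names and values are invented, not an established schema.

```python
# Illustrative record of the three quality aspects for one dataset.
# All field names and values are placeholders, not an established schema.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class DataQualityContext:
    dataset_id: str                     # DOI/URI of the published dataset
    hypothesis: str                     # aspect 1: why the data were generated
    method: str                         # aspect 2: how the data were generated
    equipment: dict = field(default_factory=dict)  # aspect 2: instrument and settings
    justification: str = ""             # aspect 3: why this method was chosen

record = DataQualityContext(
    dataset_id="doi:10.xxxx/example-dataset",
    hypothesis="Use-wear polish differs between hide-working and wood-working.",
    method="Confocal microscopy of experimentally used flint flakes",
    equipment={"type": "confocal microscope", "objective_NA": 0.95},
    justification="Surface-texture parameters require sub-micrometre vertical resolution.",
)

print(json.dumps(asdict(record), indent=2))
```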
3. Tools
We aim to develop a set of tools to address these problems.
These can be grouped into 3 categories: (1) workflows, (2) data and (3) metadata.
These tools can be developed independently but ultimately should be combined into one single interface (this would require regular exchanges and feedback between the developing teams).
1) Workflows:
[Aims]
We aim to develop a flexible tool that will allow researchers to record the design of an experiment, or the protocol of a sample preparation procedure or of an analysis (hereafter grouped under the term "workflow").
This workflow should be assigned a DOI/URI that can be cited even before the experiment is run, or before the procedure or analysis is applied.
Hypothesis formulation tool: while workshops or guidelines could aid in the process of developing testable hypotheses and designing experiments, the process could also be formalized by applying some kind of logic tools (the underlying ontology would follow or build upon the CRMinf Argumentation Model developed by one of the CIDOC-CRM Special Interest Groups).
The aim is to make this part of the process more rigorous and transparent, providing a fixed point of reference for each step along the way.
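As a rough illustration of what such a citable "hypothesis block" might store before any data are generated, consider the following sketch; the field names are assumptions made here for illustration, and a production version would map them onto CRMinf classes rather than a flat record.

```python
# Rough sketch of a citable "hypothesis block"; field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    statement: str                                        # the proposition to be tested
    predictions: list = field(default_factory=list)       # observable consequences
    rejection_criteria: list = field(default_factory=list)  # what would count against it
    uri: str = ""                                         # DOI/URI, assigned on "release"

h = Hypothesis(
    statement="Heat treatment of silcrete reduces the force needed for blade removal.",
    predictions=["Lower mean removal force for heat-treated cores"],
    rejection_criteria=["No significant difference in force at the planned sample size"],
)
```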
An analogous “analytics tool” would clearly define what a given process is intended to measure, and why (an important question when it comes to “thick description” and the definition of goals [Aristotle's “final cause”]).
This may seem rather straightforward until one considers that the various measures of such abstract qualities as “roughness,” although conforming to DIN norms – having been developed for use in the automotive and similar industries – do not necessarily produce a value that tells us whether a given surface is “rough” or not without further interpretation.
In both cases, there should be a final “debriefing” in order to check whether the initial goals were attained.
Lab protocol tool: a formal, more elaborate version of standard lab notebooks, i.e. formal tools for recording each step undertaken in the experimental/analytical process, with an emphasis placed on recording decisions made: what the various options were, and why one was chosen over another.
Debriefing tool(s): some means for assessing whether the goals were attained, whether the hypothesis was proven or disproven, checking the logic of the arguments made and whether or not the conclusions follow from them, etc. This is of lesser importance at this point in the project's development, since simply writing the “discussion” and/or “conclusions” section of any given publication may give similar results, but such a tool might eventually include mark-ups or additions to the data and metadata produced along the way.
Some kind of reflexive, critical “feedback” should be included, in order to give the user the opportunity to think back on the process, from beginning to end, and reflect on any lessons learned about the process itself that can then be fed into the next round of experiments and/or analytical process.
[Implementation]
As currently envisioned – and subject to revision during the development process – the tool would offer an interface similar to visual programming languages, with predefined blocks (see below) that the user can drag into the workflow window to build a workflow.
Steps within the overall process (a data-structure sketch follows this list):
- Planning (research design, including formulation of hypothesis and/or goals).
- Specimen preparation
- Measurement or observation
- Data manipulation (selection, format change, etc.)
- Data analysis (including image manipulation, evaluating suitability for various statistical tests, etc.).
- Preparing data for storage
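A minimal sketch of how such predefined blocks could be represented as plain data is given below; the block types, identifiers and example contents are assumptions for illustration only, not a fixed schema.

```python
# Minimal sketch of "workflow blocks" as plain data. Block types follow the
# step list above; identifiers and contents are invented for illustration.
from dataclasses import dataclass, field

STEP_TYPES = {
    "planning", "specimen_preparation", "measurement",
    "data_manipulation", "data_analysis", "data_storage",
}

@dataclass
class Block:
    block_id: str
    step_type: str                                 # one of STEP_TYPES
    description: str
    inputs: list = field(default_factory=list)     # IDs of preceding blocks
    comments: list = field(default_factory=list)   # rationale, warnings, best practice

workflow = [
    Block("B1", "planning", "Formulate hypothesis and define sample size"),
    Block("B2", "specimen_preparation",
          "Knap 20 flint flakes; clean with demineralised water", inputs=["B1"]),
    Block("B3", "measurement", "Confocal scans before and after use", inputs=["B2"]),
]
```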
However, this tool should go beyond the simple recording of a workflow.
What is actually more important is a way to edit and update the workflow, without losing the earlier versions.
In other words, versioning and version control are crucial: any experiment or analysis can run into unforeseen issues and may need to deviate from the initial plan, yet the initial plan remains relevant. To achieve this, changes to the workflow need to be tracked (similar to Track Changes in MS Word, or more generally Git/GitHub), and different "released" versions must be saved and assigned a DOI/URI (similar to what Zenodo does) for citation purposes.
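One possible way of modelling such versioning, loosely following the Zenodo pattern of a "concept" identifier plus one identifier per frozen release, is sketched below; all identifiers, dates and changelog entries are placeholders.

```python
# Sketch of workflow versioning: a "concept" DOI for the workflow as a whole,
# plus one DOI per released (frozen) version, each pointing at its predecessor.
# All identifiers, dates and changelog entries are placeholders.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class WorkflowRelease:
    version: str            # e.g. "1.1"
    released: date
    doi: str                # DOI/URI of this frozen version
    previous_doi: str = ""  # empty for the first release
    changelog: str = ""     # what changed and why (unforeseen issues, new branch, ...)

concept_doi = "doi:10.xxxx/workflow-concept"   # always resolves to the latest release
history = [
    WorkflowRelease("1.0", date(2021, 3, 1), "doi:10.xxxx/workflow-v1"),
    WorkflowRelease("1.1", date(2021, 6, 15), "doi:10.xxxx/workflow-v2",
                    previous_doi="doi:10.xxxx/workflow-v1",
                    changelog="Added a cleaning step after residues were noticed."),
]
```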
The most fundamental building blocks are obviously "hypothesis", "sample preparation" and the "steps" of the workflow (the underlying ontology would follow or build upon the Scientific Observation Model [CRMsci] developed by one of the CIDOC-CRM Special Interest Groups), but several additional types of blocks must be available when editing a workflow: "branches" (added when, for example, a new route is followed), "dead-ends" (indicating where a given branch stops), potential but as yet unexplored routes (future perspectives), and possibly some statement regarding expected results.
Other types of blocks would be needed to connect the steps; because the interface is visual/graphical, arrows are more appropriate than numbers for this purpose.
A workflow is, however, rarely linear: steps can be repeated (loops), run in parallel (divergence), or have their outputs combined (merging), and so on.
Another important feature that must be integrated is a means for commenting on every step and every connection between steps. These comments could give some "best practice" information or warnings about a given step, or explain why this step was performed and why it was done in that way rather than another. The latter will allow researchers to reflect on their workflow and will surely improve the quality of the output.
Finally, the whole workflow should be graphed as a tree (or similar: a directed graph consisting of nodes and edges) in a readable layout that the user can export and use in publications, or simply cite using the DOI/URI. Vector graphic formats (e.g. SVG or PDF) would be the preferred export formats, but raster formats (e.g. PNG) should also be available.
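As an illustration of such an export, the following sketch draws a small workflow graph and writes it to SVG using the Python graphviz package (which requires the Graphviz binaries to be installed); the node names, the example divergence/merge and the output file name are all invented.

```python
# Sketch of exporting a workflow as a directed graph in SVG, using the Python
# 'graphviz' package. Node names and the example divergence/merge are invented.
import graphviz

dot = graphviz.Digraph("workflow", comment="Example experimental workflow")
dot.node("B1", "Planning: hypothesis + sample size")
dot.node("B2", "Specimen preparation")
dot.node("B3a", "Measurement: confocal scans")
dot.node("B3b", "Measurement: applied-force records")
dot.node("B4", "Data analysis")

dot.edge("B1", "B2")
dot.edge("B2", "B3a")   # divergence: two measurement series run in parallel
dot.edge("B2", "B3b")
dot.edge("B3a", "B4")   # merging: both series feed the same analysis
dot.edge("B3b", "B4")

dot.render("workflow_graph", format="svg", cleanup=True)  # writes workflow_graph.svg
```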
Other important peripheral functions include:
- Cross-referencing throughout the workflow.
- Creation/management of units within the workflow, e.g. sample preparation, experiment, analysis steps 1, 2, 3, etc., with the ability to zoom in/out or focus on a single unit/step.
- Duplicating a workflow so that others intending to perform similar experiments or analyses do not have to start from scratch.
- Management of samples to follow the life history of every experimental or archeological sample.
URIs are assigned to archeological objects during excavation and/or curation (TA1-2).
But URIs for experimental samples can only be assigned when the samples are created, and experimental samples are usually hosted by the institute that runs the given experiment.
This management system for samples is especially important for sample preparation, treatment (in a statistical sense) and data analysis. This process should be developed in close collaboration with TA5.
- Management of equipment, so that the user can quickly add an experimental or analytical machine.
For this, a database of the types of equipment and the associated necessary settings is required (see below).
This database could have two levels: a generic level for a type of equipment that would be built in the tool, and another specific level where the user could add the pieces of equipment available in a given lab (this could be saved locally).
This could make use of Persistent Identifiers for instruments (Stocker et al. 2020, https://doi.org/10.5334/dsj-2020-018) and should be done in close collaboration with TA1; a minimal sketch of such a two-level registry follows this list.
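The following is a minimal sketch of the two-level equipment registry described above: generic equipment types (with the settings that must be reported for each) shipped with the tool, and lab-specific instruments that could carry a PIDINST-style persistent identifier. All types, fields and entries are invented examples.

```python
# Sketch of a two-level equipment registry: generic types built into the tool,
# plus lab-specific instruments (possibly stored locally). All entries invented.
GENERIC_TYPES = {
    "confocal_microscope": ["objective_magnification", "objective_NA", "wavelength_nm"],
    "SEM": ["accelerating_voltage_kV", "working_distance_mm", "detector"],
}

lab_instruments = [
    {
        "type": "confocal_microscope",
        "manufacturer": "ExampleCorp",
        "model": "XYZ-3000",
        "instrument_pid": None,   # fill in once a persistent identifier is registered
        "location": "Lab A, University B",
    },
]

def required_settings(instrument):
    """Return the settings that must be reported for this instrument's type."""
    return GENERIC_TYPES[instrument["type"]]

print(required_settings(lab_instruments[0]))
# -> ['objective_magnification', 'objective_NA', 'wavelength_nm']
```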
2) Data:
Ultimately, the data must be stored securely and, when legally and ethically permissible, made openly accessible.
What is also important is that every piece of data is connected to the sample, to the sample preparation/experimental/analytical workflow, and to the associated metadata.
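A minimal sketch of the links that every stored dataset should carry is given below; the identifiers, URLs and values are placeholders only.

```python
# Sketch of the links every stored dataset should carry: to the sample, to the
# workflow version that produced it, and to its metadata record. Placeholders only.
dataset_entry = {
    "data_uri": "doi:10.xxxx/dataset-42",
    "sample_uri": "https://example.org/samples/EXP-2021-007",  # experimental sample
    "workflow_doi": "doi:10.xxxx/workflow-v2",                 # released workflow version
    "metadata_uri": "https://example.org/metadata/dataset-42",
    "license": "CC BY-SA 4.0",
    "storage_location": "institutional repository",            # or a central/federated store
}
```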
There is currently a need for long-term storage of experimental and analytical protocols (including videos), results from experiments and analysis, etc.
Whether this problem should be addressed via some centralised database committed to long-term storage and maintenance, or by some other, less central means (local storage, with some form of central registry?), remains to be seen.
The aims should be storage, maintenance, and ensuring accessibility.
Videos currently uploaded to YouTube, for example, are difficult to find, and their availability over the long term is unclear.
Of prime importance for the whole endeavour is ensuring that the “official” results (i.e. publications, in whatever form) and any disseminated data are linked to the contextual (meta)data we are trying to capture.
Ideally, this data should be published under the CC BY-SA 4.0 license (or similar), so that users will in turn have to make their new data accessible.
This ensures that users are also contributors to a growing database.
Existing databases and repositories are rarely compatible with each other, and not all of them allow the storage of all types of data produced in experiments and analytics (images, videos, tables, graphs, 3D models, etc.), even less so in a structured way. Perhaps specific to experimental archaeology, no dedicated repository for the documentation and results of experiments exists. Moreover, few, if any, of these repositories include tools to add metadata to the stored data.
While many researchers will choose to store their data on their institution's servers or on their favorite repositories, we should also provide the possibility to store data for those who do not have access to, or do not like, such repositories. This also means that the existing data servers/repositories must be known and that it must be possible to know where each resource is stored (using a DOI/URI). In other words, a database, list or registry of all available databanks is needed. Finally, when allowed and wished for, a tool should be available that allows the importation of existing data into the proposed database.
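A very simple sketch of what such a registry of databanks could record, and how a resource identifier could be resolved to its storage location, is shown below; the repository names, URLs and identifiers are invented examples.

```python
# Sketch of a registry of known repositories/databanks plus an index recording
# where each resource is stored. All names, URLs and identifiers are invented.
REPOSITORIES = {
    "zenodo": "https://zenodo.org",
    "institutional_repo_uni_b": "https://repository.example-university.de",
}

RESOURCE_INDEX = {
    "doi:10.xxxx/dataset-42": "zenodo",
    "doi:10.xxxx/video-007": "institutional_repo_uni_b",
}

def storage_location(resource_id):
    """Return the base URL of the repository holding a given resource, if known."""
    repo = RESOURCE_INDEX.get(resource_id)
    return REPOSITORIES.get(repo, "unknown")

print(storage_location("doi:10.xxxx/dataset-42"))   # -> https://zenodo.org
```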
This should be done in tight collaboration with TA5.
3) Metadata:
Another tool, or set of tools, that should ultimately be connected to the first ones concerns the recording and management of metadata.
Ideally, tools should be developed that allow the metadata to be extracted directly from the data. However, this is not realistic for all types of data, nor for all metadata of a given data type. Where extraction is impossible, the user will have to enter the metadata manually. While this is time-consuming, the tool(s) we propose will guide the user in this task. In doing so, the user will know what is important to report and will not have to worry about forgetting things. Unfortunately, it is also a reality that there is barely any consistency in the settings reported by different manufacturers. Therefore, users of one piece of equipment might be able to report some settings, while users of another piece of equipment might only have access to others. Worse, even similar settings might not be comparable between manufacturers. When available, initiatives such as the Faires Datenblatt (http://optassyst.de/fairesdatenblatt/) should be used to harmonize the settings.
More concretely, we imagine an interface where, for each type of data, the user would be able to choose from a range of equipment types. Then, a list of generic and/or specific pieces of equipment (see above) should help the user identify and enter the important information (manufacturer, model, year of manufacture, software version…). The next step would guide the user in reporting the relevant settings (objective, temperature, calibration…) from a list defined for each type of equipment.
For all these steps, information linking to definitions should help the user understand what each field is about. These definitions should be part of an ontology.
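One way such guidance could be implemented is sketched below: a per-equipment-type template in which each field carries a short definition (ideally linked to an ontology term), plus a check for required settings that have not yet been entered. The fields, definitions and URLs are invented placeholders.

```python
# Sketch of per-equipment-type settings templates used to guide manual metadata
# entry. Fields, definitions and ontology URLs are invented placeholders.
SETTINGS_TEMPLATES = {
    "confocal_microscope": [
        {"field": "objective_NA",
         "definition": "Numerical aperture of the objective; limits lateral resolution.",
         "ontology_term": "https://example.org/ontology/numerical_aperture",
         "required": True},
        {"field": "objective_magnification",
         "definition": "Nominal magnification of the objective lens.",
         "ontology_term": "https://example.org/ontology/magnification",
         "required": True},
    ],
}

def missing_required(equipment_type, entered):
    """List the required settings the user has not filled in yet."""
    template = SETTINGS_TEMPLATES.get(equipment_type, [])
    return [f["field"] for f in template if f["required"] and f["field"] not in entered]

print(missing_required("confocal_microscope", {"objective_magnification": "20x"}))
# -> ['objective_NA']
```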
The thesauri and ontologies commonly developed for use with standard databases may have limited utility in a project with such a wide range of activities. Problems expected to be encountered range from the somewhat arbitrary distinction between “experiment” and “analytics” to the inclusion of methods and apparatus not yet imagined.
4. Necessary standards, guidelines and education
Prerequisites for developing these tools include standards or guidelines, which themselves need to be developed, as well as encouraging the adoption of these tools by the archaeological community.
1) Standards, guidelines and best practices:
Standards are difficult to develop in archaeology because of the wide diversity of methods applied.
At best, we can strive for standards for each of these methods. But even then, there are often several appropriate standards for a single method.
When it comes to metadata, as explained above, even if we know that one particular setting is essential (e.g. objective's numerical aperture), there will always be some manufacturers that do not provide this information.
That is why, rather than standards, guidelines and/or best practices should be preferred. These should help the user to design the experiment and analyze samples in more repeatable, reproducible and comparable ways.
We therefore need standards/guidelines/best practices for:
- Hypothesis formulation
- Experimental design
- Sample size
- Sample preparation
- Lab standards
- Documentation
- Workflow
- Relevant metadata
- Data mining
- Simulation
- etc.
2) Education/outreach:
As yet, many archaeologists are not even aware of the problems mentioned above.
So why would they adopt the tools we propose to develop?
This is why it is important to educate the entire archeological community regarding these fundamental problems and to show why they are relevant to archaeology as a science.
This educational/outreach task should be performed in close collaboration with TA6, but part of the community relevant to TA3 can only be reached through researchers active in that community.
Therefore, the tools developed must not only be presented at expert conferences and workshops, but also taught at universities and integrated into the training of new lab members.
Because many archaeologists are not familiar with these concepts, the tools outlined above must be intuitive and easy to use. The technical part (programming, structuring, etc.) should be accessible to those who wish to see it; most users will just want it to work. These tools must also be flexible enough to accommodate the whole range of workflows, data and metadata. We argue that, currently, the best approach is a flexible interface: it should guide the user without constraint and should not force them to, for example, report settings or publish a protocol before the publication of the paper. Everything should be possible, but not mandatory.
5. Field testing
Development of this tool would benefit from direct contact with as many experimental archaeologists and analytical labs in Germany as possible; we need to examine the wide range of activities directly, discuss the issues with potential users, tailor the tools to their needs, and encourage them to try out whatever tools we develop, so they can give us feedback as we go forward. These are the people who will use the tools we produce and the Digital Infrastructure that results, and such an infrastructure generally needs to reach a critical mass in order to be viable and self-sustaining.
In the first 5 years, someone would have to visit as many EXAR and/or German EXARC members as possible just to document what they are doing; this would have the added benefit of strengthening their ties to and/or interest in the project.
6. Conclusions
Combined with proper education and outreach, these tools should gain a critical mass of users in Germany and beyond.
Resources:
- Institution
- Personnel
- Instruments and Tools
- Compounds and Substances
- References
Experimental design:
- Experiments
- Observations
- Measurements
- Reconstructions
- Replication
- Re-enactment
- Recipes
Tasks and Flowchart:
Demo version: