## Statystyka w archeologii, czyli dlaczego nie trzeba bać się liczb (Statistics in archaeology, or why there is no need to be afraid of numbers)

##### Date

2021

##### Authors

Rajpold, Wojciech

##### Journal title

##### ISSN

##### Volume title

##### Publisher

Muzeum Okręgowe w Rzeszowie

Instytut Archeologii UR

Fundacja Rzeszowskiego Ośrodka Archeologicznego

Wydawnictwo „Mitel”

##### Abstract

Large-scale motorway construction projects and the archaeological research that accompanies them have produced, and are still producing, a huge body of new data: in 2019 alone, one million new archaeological artefacts were recovered. This raises the problem of processing the newly acquired sources systematically and comprehensively, a task in which statistics can help. The article introduces selected statistical methods and shows examples of their use. It focuses on their usefulness in archaeological research and may thus encourage their wider use in the processing of archaeological sources.

Archaeology, although it belongs to the humanities, draws on other fields of science, especially on the achievements of the natural sciences. Physics, chemistry and biology are widely used, for example in establishing chronology (radiocarbon (C14) dating, dendrochronology) or in studying the chemical composition of artefacts (e.g. Raman spectroscopy). It is therefore not surprising that mathematics has also entered the archaeologist's arsenal of research methods. The amount of archaeological material added to collections every year is impressive, but it also poses a huge challenge, not least in processing such a large number of sources. Finds recovered during excavations are mass material: they are countable, and they can be measured and weighed. This is where statistics, the branch of mathematics concerned with organizing large quantities of data, comes to the archaeologist's aid. The use of statistical analyses can be seen in many works by Polish researchers, which show the richness, diversity and usefulness of this field in archaeological studies. The first issue to address is the type of data that statistics examines. There are two types: a) quantitative (measurable) data, e.g. the weight or length of an artefact (continuous data) or the number of coils on a pin head (discrete data); b) nominal data (not measurable; each category receives a numerical label), which can be binary or multi-category. This group also includes ordinal (rankable) data, which arrange material according to the intensity of a phenomenon. The type of data in turn determines the measurement scale to be used. In archaeology, the ratio scale allows all statistical methods to be applied; weight and height, for instance, are measured on it. The ordinal scale, by contrast, is used for rankable data, where the intensity of a given feature is graded.
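To make the link between measurement scale and permissible analysis concrete, here is a minimal Python sketch. It assumes the usual textbook convention that nominal codes support only counting (the mode), ordinal data also support the median, and ratio-scale data support the arithmetic mean as well; the `Scale` enum, the `central_tendency` helper and all sample values are hypothetical illustrations, not material from the article.

```python
from enum import Enum
from statistics import mean, median, mode


class Scale(Enum):
    """Measurement scales discussed above (hypothetical helper)."""
    NOMINAL = 1  # labels only, e.g. vessel type coded 1, 2, 3
    ORDINAL = 2  # rankable, e.g. degree of wear graded 1 (low) to 3 (high)
    RATIO = 3    # true zero and equal intervals, e.g. weight in grams


def central_tendency(values, scale):
    """Return the most informative measure of central tendency the scale permits."""
    if scale is Scale.RATIO:
        return mean(values)    # arithmetic on the values is meaningful
    if scale is Scale.ORDINAL:
        return median(values)  # only the order of the values is meaningful
    return mode(values)        # nominal codes can only be counted


# Hypothetical sample values.
weights_g = [12.5, 14.0, 9.8, 11.2]  # artefact weights (continuous, ratio scale)
wear_grade = [1, 3, 2, 2, 3]         # graded wear (ordinal scale)
vessel_type = [2, 1, 2, 3, 2]        # vessel type codes (nominal)

print(central_tendency(weights_g, Scale.RATIO))
print(central_tendency(wear_grade, Scale.ORDINAL))
print(central_tendency(vessel_type, Scale.NOMINAL))
```

Averaging the nominal codes would run without error but produce a meaningless number, which is why the scale, not the data type in the program, decides which summary is used.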
It is worth emphasizing, however, that statistical data can (depending on the method) be transformed, so that they can be recorded in many ways and measured on different scales. The most important issue in statistics is to examine whether the differences between the analysed groups are significant. For this purpose the chi-square (χ²), Kruskal-Wallis and Mann-Whitney U tests are used. The first, and the most common, compares observed frequencies with expected frequencies; it can be used, for instance, to check whether the pottery from a particular site is technologically homogeneous. The other two are used when data are expressed on measurable scales and compare medians (the Mann-Whitney U test for two groups, the Kruskal-Wallis test for three or more). They can be used, among other things, to check whether differences in the wall thickness of vessels from different areas are significant. Statistics is most commonly associated with various types of correlation, i.e. measures of interdependence (e.g. Pearson's correlation, Spearman's rank correlation, Kendall's tau). It should be borne in mind, however, that not every correlation reflects a real relationship between the studied features, which is why caution is needed when interpreting the results of any statistical method. Methods that allow archaeologists to group data are particularly useful; the most popular are dendrite diagrams and correspondence tables. Determining a cut-off point on the ROC curve, together with the associated odds ratio diagram, can also be an important method. The greatest strength of statistics, however, is undoubtedly mathematical modelling, which allows the researcher to build, on the basis of empirical results, a formula describing the analysed feature. There are many types of models, such as logistic regression, linear regression and decision trees. They are widely used in medicine, sociology and economics, among other fields, and they are also applied in archaeological analyses.
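The significance tests and the rank correlation named above can be illustrated with a standard-library Python sketch. It computes the Pearson chi-square statistic for a contingency table, the Mann-Whitney U statistic with a two-sided p-value from the normal approximation (without tie correction), and Spearman's rho as Pearson's r computed on ranks. The function names and all counts and measurements are hypothetical; in practice a tested package (e.g. R, or SciPy in Python) is the safer choice, and the Kruskal-Wallis test, which extends the rank approach to three or more groups, is not shown.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if every row shared the same column proportions (homogeneity).
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # i and j are 0-based positions
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistic and two-sided p-value from the normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sherd counts: rows = two features, columns = three technological groups.
sherds = [[30, 10, 5],
          [20, 20, 15]]
stat, dof = chi_square(sherds)
# 5.991 is the 5% critical value of chi-square for 2 degrees of freedom (standard tables).
print(f"chi2 = {stat:.2f}, df = {dof}, homogeneous at 5%: {stat < 5.991}")

# Hypothetical vessel wall thickness (mm) from two areas of a site.
area_a = [6.1, 5.8, 7.0, 6.5, 6.9]
area_b = [7.4, 7.9, 6.8, 8.1, 7.6]
u, p = mann_whitney_u(area_a, area_b)
print(f"Mann-Whitney U = {u}, p = {p:.3f}")

# Hypothetical rim diameter (cm) and wall thickness (mm) of the same five vessels.
rho = spearman_rho([10, 12, 15, 18, 22], [5.2, 5.9, 5.5, 6.8, 7.1])
print(f"Spearman rho = {rho:.2f}")
```

The 5.991 threshold is valid only for two degrees of freedom, and the normal-approximation p-value is merely indicative for samples this small; a high rho, as the text warns, does not by itself show that one feature depends on the other.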
The simplest type of model is the decision tree, which is useful for presenting data, and the possible outcomes of the decisions taken, in graphic form. It also helps to select the features that have the greatest influence on the problem under study. One example is determining the age of the deceased (adult or child) from the quantity of remains and the size of the urn, where two splits led to a correct identification of the age of the deceased. Logit and discriminant models are much more complex. They can be built for a variable with two categories, e.g. age divided into adults and children, or with more categories, e.g. chronological phases. In essence, these models yield a formula which, on the basis of the examined features (for age, the quantity of remains and the size of the urn; for chronology, the thickness and colour of the vessel walls), indicates with a certain probability to which age group or chronological phase a given archaeological feature belongs. Finally, it is worth noting that the importance of statistical analyses for archaeology will grow steadily as new material keeps flowing in, especially since readily available statistical software (e.g. Statistica, or R with RStudio) is very useful for this work. It must be remembered, however, that statistics is only a tool: an extremely helpful one, which we can and should use, but with care.
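The logit model described above can be sketched as a small logistic regression fitted by gradient ascent in plain Python. This is a minimal illustration, not the article's model: the two explanatory features (urn height and quantity of remains), the burial data, the learning rate and the iteration count are all hypothetical choices. The fitted function returns the estimated probability that a burial belongs to an adult, i.e. the kind of "formula with a certain probability" the text refers to. Decision trees and discriminant analysis are not shown.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logit(rows, labels, lr=0.5, iterations=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1*z1 + b2*z2 + ...) by gradient ascent on the log-likelihood.

    Features are standardized (z-scores) first so that one learning rate suits all of them.
    Returns a function mapping a raw feature tuple to the estimated probability.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]

    def scale(x):
        return [(v - m) / s for v, m, s in zip(x, means, sds)]

    def linear(coef, z):
        return coef[0] + sum(c * v for c, v in zip(coef[1:], z))

    z_rows = [scale(r) for r in rows]
    coef = [0.0] * (len(means) + 1)  # coef[0] is the intercept
    n = len(rows)
    for _ in range(iterations):
        grad = [0.0] * len(coef)
        for z, y in zip(z_rows, labels):
            residual = y - sigmoid(linear(coef, z))  # observed minus predicted
            grad[0] += residual
            for j, v in enumerate(z, start=1):
                grad[j] += residual * v
        coef = [c + lr * g / n for c, g in zip(coef, grad)]

    return lambda x: sigmoid(linear(coef, scale(x)))


# Hypothetical cremation burials: (urn height in cm, quantity of remains in invented units).
burials = [(12, 150), (14, 220), (15, 180), (13, 300), (18, 250),
           (20, 900), (24, 1200), (17, 800), (22, 650), (26, 1500)]
is_adult = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 0 = child, 1 = adult

p_adult = fit_logit(burials, is_adult)
print(f"small urn, few remains:  P(adult) = {p_adult((13, 200)):.3f}")
print(f"large urn, many remains: P(adult) = {p_adult((25, 1300)):.3f}")
```

Because these invented training data are perfectly separable, the coefficients would keep growing with more iterations; for real analyses a dedicated routine such as `glm(..., family = binomial)` in R is the usual choice.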

##### Description

##### Keywords

statistical analysis, archaeology, mathematics, metric data

##### Citation

Materiały i Sprawozdania Rzeszowskiego Ośrodka Archeologicznego, vol. 42/2021, pp. 113–139