Digitised, searchable Holle List in Stokhof (1980)


May 23, 2023


August 1, 2023


Arts and Humanities Research Council

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

1 Introduction

The Holle List (hereafter HL) consists of approximately 1000 lexical items designed by K. F. Holle (1829–1896), an “eminent authority and lover of the Netherlands Indies and their people” (Stokhof 1980: 1). The HL was prepared for distribution across the Indonesian archipelago to gather knowledge about the linguistic situation of Indonesia, then a Dutch colony.

The HL exists in three variants (the 1894, 1904/1911, and 1931 versions) that differ slightly in content and in the order of the items. The HL in the engganolang GitHub repository and in the Oxford University Research Archive (ORA) (Rajeg 2023a) is the “new basic list (NBL)” set up by Stokhof (1980: 17, 22–72) “to facilitate comparative work” across the three variants of the HL (see Table 1 for the interactive version and the raw file here). The NBL captures all lexical items appearing in the three variants of the HL, except those items “which never or hardly ever appeared to be filled in by the researchers” (Stokhof 1980: 17). These excluded items appear as footnotes in the word list of each target language.

2 The rationale for the digitisation of the Holle List

The three variants of the Holle List were published as the new basic list (NBL) in Stokhof (1980), which is available as an open-access PDF file under the CC BY-SA 4.0 license (the license is stated in the footer of the cover page of the PDF file). While the PDF is searchable via the basic find function of a PDF viewer, the list cannot be manipulated (e.g., filtered for certain items). Nor does it support computational processing, such as automatically matching the IDs in the list with the IDs of the vocabulary items in the target languages.

Given that the CC BY-SA 4.0 license allows us to copy, adapt and build upon the material for any purpose, as long as we provide attribution (i.e., citation) to the original material and share adaptations under the same license, we decided to digitise the NBL into a fully searchable, portable format (i.e., a UTF-8 encoded, tab-separated plain text file) (see Table 1 for the interactive version). The digitisation was conducted in conjunction with our AHRC-funded project to build lexical resources for Enggano (“Lexical resources for Enggano, a threatened language of Indonesia”, https://enggano.ling-phil.ox.ac.uk/). Among other things, the project aims to bring together a host of historical, paper-based resources available for Enggano. The Enggano vocabulary in the Holle List is one of the oldest of these, dating from the late 19th century (collected in 1895 by Abs vd Noord: see Stokhof & Almanar 1987: 189); this word list of Enggano has also been digitised and deposited on GitHub, the Oxford University Research Archive (ORA), and Zenodo (Rajeg 2023b).

3 Content of the digitised Holle List

The digitised NBL of the Holle List (HL) preserves the original columns. The columns containing the years of the three versions of the Holle List were renamed so that their names do not begin with numbers. Note that the first four columns are not labelled in the original PDF; we named them Index, Dutch, English, and Indonesian. The Indonesian glosses were taken from the 1931 version of the HL (Stokhof 1980: 18). The values in the Index column can be computationally matched with the Index in the (also digitised) word lists of the target languages, which were published as subsequent volumes after Stokhof (1980). This matching was used in preparing the Enggano word list that is part of the HL (see Rajeg 2023b).
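The Index-based matching described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the workflow used in Rajeg (2023b): both data frames are invented stand-ins, the glosses and the `Form` values are placeholders rather than actual NBL or Enggano entries, and only the column names `Index` and `English` follow the digitised list.

```r
# Toy stand-in for the digitised NBL (invented rows, not actual NBL entries)
nbl <- data.frame(
  Index   = c("1", "2", "3"),
  English = c("hand", "water", "stone"),
  stringsAsFactors = FALSE
)

# Toy stand-in for a digitised target-language word list
# (the Form values are placeholders, not attested words)
target <- data.frame(
  Index = c("2", "3", "4"),
  Form  = c("form_x", "form_y", "form_z"),
  stringsAsFactors = FALSE
)

# Keep every NBL item and attach the target-language form
# wherever the Index values match; unmatched items get NA
matched <- merge(nbl, target, by = "Index", all.x = TRUE)
print(matched)
```

Storing `Index` as character in both tables avoids type mismatches when joining, which is the same reason the code for Table 1 converts `Index` with `as.character()`.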

The values of the English column in Table 1 are hyperlinked to the Concept sets in the Concepticon catalogue (List et al. 2023). The initial mapping of the English glosses to the Concepticon Concept sets was performed programmatically using pyconcepticon (Forkel 2022), a Python package to access and curate the Concepticon data, following the tutorial in Tjuka (2020). The output of the mapping was then manually checked and curated (track the changes here). However, some English glosses cannot be linked to a relevant Concept set because they are not yet mapped in the Concepticon data. In these cases, the glosses are not hyperlinked.

We added several new columns after the version-year columns. One of these is the Swadesh column (Boolean true/false), indicating whether an entry is part of the Swadesh items (true) or not (false) (see Stokhof 1980: 141–143). With this column, it is easy to extract the Swadesh items from the table, something that is not possible in the PDF version, since the NBL table does not include a column marking which items are from the Swadesh list. We therefore hand-coded the Swadesh column based on the index numbers provided by Stokhof (1980: 141–143).
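As an illustration of this kind of filtering, here is a minimal base R sketch. The data frame is an invented stand-in with hypothetical rows; it assumes only that the `Swadesh` column is read as a logical vector, which may need an explicit conversion depending on how the file is imported.

```r
# Toy stand-in for the digitised list (invented rows, not actual NBL entries)
holle_toy <- data.frame(
  Index   = c("1", "2", "3", "4"),
  English = c("hand", "water", "rice field", "stone"),
  Swadesh = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only the entries marked as Swadesh items
swadesh_items <- holle_toy[holle_toy$Swadesh, ]

# Or, conversely, keep only the entries outside the Swadesh list
non_swadesh <- holle_toy[!holle_toy$Swadesh, ]

print(swadesh_items)
```

With the tidyverse already used for Table 1, the same subset could be obtained with `dplyr::filter()` on the `Swadesh` column.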

The Swadesh column is followed by the Swadesh_orig column. It lists the English forms/labels given in the Swadesh appendix (Stokhof 1980: 141–143), which may be phrased differently from the English column in the NBL. When the forms in the Swadesh appendix and the English column match exactly, the Swadesh_orig column is left empty. Moreover, typos in the entries of the three language columns were corrected, whether they came from the original PDF or from first-pass OCR errors; the corrections are listed in the Remark column. Finally, there are two additional tables (in the right panel of Table 1), which contain phrases and clauses from the 1904/1911 (raw file) and the 1931 (raw file) editions of the HL.

We hope that the digitised NBL of the Holle List will be helpful for, and speed up the workflow of, other computationally oriented researchers. Readers/users are also encouraged to check the original PDF list in Stokhof (1980).

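# packages used below (assumed: tidyverse for read_tsv() and the dplyr verbs)
library(tidyverse)
library(reactable)
library(tippy)

# read the concepticon mapping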
concepticon <- read_tsv("data/concepticon-mapping.tsv") |> 
  rename(Index = NUMBER, 
         English = GLOSS,
         Concepticon_Gloss = CONCEPTICON_GLOSS) |> 
  select(-SIMILARITY) |> 
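  # build the Concept set URL from the Concepticon ID
  # (CONCEPTICON_ID is assumed to be the ID column of the pyconcepticon mapping output)
  mutate(concept_url = paste("https://concepticon.clld.org/parameters/",
                             CONCEPTICON_ID,
                             sep = ""))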
concepticon_checked <- concepticon |> 
  filter(CHECKED == "y") |> 
  select(English, Index, Concepticon_Gloss, concept_url) |> 
  mutate(Index = as.character(Index))

holle_tb <- read_tsv("data/digitised-holle-list-in-stokhof-1980.tsv")
holle_tb <- holle_tb |> 
  # merge with the checked concepticon mapping
  left_join(concepticon_checked, by = join_by(Index, English)) |> 
  mutate(English = replace(English, is.na(English), ""))
url_eng <- '<a href="%s" target="_blank">%s</a>'
holle_tb |> 
  reactable(style = list(fontFamily = "Canela Text"),
            elementId = "digitised-holle-list",
            filterable = TRUE,
            highlight = TRUE,
            resizable = TRUE,
            bordered = TRUE,
            borderless = TRUE,
            defaultPageSize = 20,
            wrap = FALSE,
            columns = list(
              Index = colDef(align = "center",
                             sticky = "left"),
              Dutch = colDef(minWidth = 150,
                             cell = function(value, index, name) {tippy(text = value, tooltip = value)}),
              English = colDef(minWidth = 150,
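                               # link the gloss to its Concept set when a checked mapping exists;
                               # the two branches below are reconstructed (assumed use of url_eng)
                               cell = function(value, index, name) {tippy(text = if_else(!is.na(holle_tb$Concepticon_Gloss[index]),
                                                                                         sprintf(url_eng, holle_tb$concept_url[index], value),
                                                                                         value),
                                                                          tooltip = value)}),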
              Indonesian = colDef(minWidth = 150,
                                  cell = function(value, index, name) {tippy(text = value, tooltip = value)}),
              v1894 = colDef(align = "center"),
              `v1904/1911` = colDef(align = "center"),
              v1931 = colDef(align = "center"),
              Swadesh = colDef(align = "center"),
              Swadesh_orig = colDef(minWidth = 150),
              Concepticon_Gloss = colDef(show = FALSE),
              concept_url = colDef(show = FALSE)
            ))
Table 1: The digitised, new basic list of the Holle List in Stokhof (1980: 22–72)