wrong split?

I tried to use the following script to generate a stimulus set with a split condition for frequency and a control condition for number of syllables. The script does not always generate a stimulus set that respects the split conditions: some of the items have a frequency that exceeds the specified limits. Any idea what is wrong with the script or with LexOPS?

Best wishes, Jens

[dlexdb_results.csv](https://github.com/user-attachments/files/17281171/dlexdb_results.csv)


#Demo of the LexOPS package

myLibs <- c("LexOPS", "tidyverse")
lapply(myLibs, require, character.only = TRUE)

setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

#Example: dlexDB, four-letter words only

dlexDB <- read_tsv("dlexdb_results.csv", locale = locale(encoding = "UTF-8"))

colnames(dlexDB) <- c("Type", "PoSTag", "Lemma", "Silben", "AnTypeFreqN", "TypeFreqN")

dlexDB$NrSilben <- str_count(dlexDB$Silben, "-") + 1
dlexDB$TypeL <- nchar(dlexDB$Type)
dlexDB$LemmaL <- nchar(dlexDB$Lemma)

range(dlexDB$AnTypeFreqN)

stimuli <- dlexDB |>
  set_options(id_col = "Lemma") |>
  split_by(AnTypeFreqN, 10:20 ~ 200:6557) |>
  control_for(NrSilben, 0:0) |>
  generate(n = "all", match_null = "inclusive")

stimLong <- long_format(stimuli)
stimLong <- stimLong[order(stimLong$condition),]
#descriptive statistics
stimLong %>% group_by(condition) %>% summarise(M = mean(AnTypeFreqN),
                                               SD = sd(AnTypeFreqN),
                                               Max = max(AnTypeFreqN))
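
To narrow the problem down, two checks might help: which items fall outside both split ranges, and whether the column used as `id_col` holds unique values. The sketch below uses base R on a small toy data frame with hypothetical values; only the column names and the bin limits are taken from the script above. The labels `low` and `high` are illustrative and are not LexOPS condition names. Whether repeated `Lemma` values actually explain the out-of-range items is an assumption, not a confirmed diagnosis.

```r
# Toy data with hypothetical values; column names follow the script above.
toy <- data.frame(
  Type = c("Haus", "Hauses", "Baum", "Berg"),
  Lemma = c("Haus", "Haus", "Baum", "Berg"),
  AnTypeFreqN = c(250, 15, 12, 7000),
  stringsAsFactors = FALSE
)

# Limits from split_by(AnTypeFreqN, 10:20 ~ 200:6557),
# treated here as closed intervals (an assumption).
in_bin <- function(x, lo, hi) x >= lo & x <= hi

# Assign each row to a bin, or NA if it lies outside both ranges.
toy$bin <- ifelse(in_bin(toy$AnTypeFreqN, 10, 20), "low",
           ifelse(in_bin(toy$AnTypeFreqN, 200, 6557), "high", NA))

# Rows whose frequency lies outside both ranges.
out_of_range <- toy[is.na(toy$bin), ]

# ID values that occur more than once in the candidate id column.
dup_ids <- unique(toy$Lemma[duplicated(toy$Lemma)])

print(out_of_range)
print(dup_ids)
```

The same two checks could be run on `dlexDB` itself by replacing `toy` with `dlexDB`. If `dup_ids` is not empty, it may be worth trying a column with one distinct value per row as `id_col`.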

