I have files named XXX.pdfa.pdf; the .pdfa infix distinguishes the PDF/A version from the non-PDF/A version, XXX.pdf. When fed into createTraining, they produce training files such as xxx.training.segmentation.tei.xmla.training.segmentation.tei.xml. Note the xmla and the duplicated training.segmentation.tei.xml. It looks like every occurrence of "pdf" in the file name is replaced, rather than only the final extension.
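
To illustrate the suspicion, here is a minimal Python sketch. It is not the tool's actual code: the function names naive_name and safe_name are hypothetical, and the replace-all logic is only my guess at what happens internally. The first function reproduces the observed output; the second shows what I would expect, where only the trailing extension is swapped.

```python
# Suffix taken from the observed training file name.
SUFFIX = ".training.segmentation.tei.xml"


def naive_name(pdf_name: str) -> str:
    """Hypothetical reproduction of the suspected bug:
    every ".pdf" in the name is replaced, not just the last one."""
    return pdf_name.replace(".pdf", SUFFIX)


def safe_name(pdf_name: str) -> str:
    """Possible fix (an assumption, not the project's code):
    replace only a trailing ".pdf" extension."""
    if pdf_name.lower().endswith(".pdf"):
        return pdf_name[: -len(".pdf")] + SUFFIX
    return pdf_name + SUFFIX


if __name__ == "__main__":
    print(naive_name("XXX.pdfa.pdf"))
    print(safe_name("XXX.pdfa.pdf"))
```

With the input XXX.pdfa.pdf, the naive version yields the doubled name from the report, while the anchored version yields XXX.pdfa.training.segmentation.tei.xml.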