<biblStruct xml:id="b0">
<monogr>
<title level="m" type="main">Orange solid, yield: 73%. 1 H NMR (400 MHz</title>
<idno>1.7.1. 2-(3-Bromopropyl)-6-(methylamino)-1H-benzo[de]iso- quinoline-1</idno>
<imprint>
<biblScope unit="volume">3</biblScope>
</imprint>
</monogr>
<note>H)-dione (7a) CDCl 3 ): d 8.62 (d, J = 6.8 Hz, 1H), 8.53 (d, J = 8.4 Hz, 1H), 8.13 (d, J = 9.2 Hz, 1H), 7.67 (t, J = 8.4 Hz, 1H</note>
</biblStruct>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>5.
1 .</head><label>1</label><figDesc>Synthesis 5.1.1. General
procedure for the preparation of 1a-1d6-Bromobenzo[de]-isochromene-1,3-dione
(1.94 g, 7.00 mmol) was dissolved in 20 ml ethanol. Then corresponding primary
amine (7.70 mmol) was added, and the mixture was stirred at 60 °C for 5-6 h. The
mixture was cooled to room temperature and evaporated in vacuum to obtain the
residue. Then the residue was purified on silica gel chromatography (PE:EA =
10:1, V/V) to provide 1a-1d.5.1.1.1.
6-Bromo-2-methyl-1H-benzo[de]isoquinoline-1,3(2H)dione (1a).White solid, yield:
90%.<ref type="bibr" target="#b11">1</ref> H NMR (400 MHz, CDCl 3 ): d 8.68 (d,
J = 7.2 Hz, 1H), 8.59 (d, J = 8.4 Hz, 1H), 8.44 (d, J = 8.0 Hz, 1H), 8.06 (d, J
= 7.6 Hz, 1H), 7.86 (t, J = 8.4 Hz, 1H), 3.57 (s, 3H); MS(ESI) calcd for C 13 H
9 BrNO 2 [M+H] + 289.0, found: 289.0. 5.1.1.2.
6-Bromo-2-butyl-1H-benzo[de]isoquinoline-1,3(2H)dione (1b). White solid, yield:
80%. 1 H NMR (400 MHz, CDCl 3 ) d 8.66 (d, J = 7.2 Hz, 1H), 8.56 (d, J = 8.8 Hz,
1H), 8.41 (d, J = 8.0 Hz, 1H), 8.04 (d, J = 8.0 Hz, 1H), 7.85 (t, J = 8.0 Hz,
1H), 4.19 (t, J = 7.6 Hz, 1H), 1.77-1.69 (m, 2H), 1.51-1.42 (m, 2H), 1.00 (t, J
= 7.2 Hz, 3H); MS(ESI) calcd for C 16 H 15 BrNO 2 [M+H] + 331.0, found: 331.0.
5.1.1.3. 6-Bromo-2-octyl-1H-benzo[de]isoquinoline-1,3(2H)dione (1c). White
solid, yield: 85%. 1 H NMR (400 MHz, CDCl 3 ): d 8.68 (d, J = 7.2 Hz, 1H), 8.59
(d, J = 7.6 Hz, 1H), 8.44 (d, J = 8.0 Hz, 1H), 8.06 (d, J = 8.0 Hz, 1H), 7.87
(t, J = 7.6 Hz, 1H), 4.18 (t, J = 8.0 Hz, 2H), 1.78-1.71 (m, 2H), 1.47-1.29 (m,
10H), 0.89 (t, J = 7.2 Hz, 3H); MS(ESI) calcd for C 20 H 23 BrNO 2 [M+H] +
387.1, found: 387.1. 5.1.1.4.
6-Bromo-2-dodecyl-1H-benzo[de]isoquinoline-1,3(2H)dione (1d). White solid,
yield: 85%. 1 H NMR (400 MHz, CDCl 3 ): d 8.68 (d, J = 7.2 Hz, 1H), 8.59 (d, J =
8.4 Hz, 1H), 8.44 (d, J = 7.6 Hz, 1H), 8.06 (d, J = 8.0 Hz, 1H), 7.87 (t, J =
7.6 Hz, 1H), 4.18 (t, J = 7.6 Hz, 2H), 1.78-1.71 (m, 2H), 1.47-1.27 (m, 18H),
0.90 (t, J = 7.2 Hz, 3H); MS(ESI) calcd for C 24 H 31 BrNO 2 [M+H] + 443.1,
found: 443.2.</figDesc></figure>
I have a large number of PDFs which have a series of sections/subsections that look like this:
After processing all of them, I've realized that these sections are often identified either as figures or even as bib references, which causes a lot of issues in my pipeline. I've also seen various other issues with documents like these (missing paragraph breaks and such), but they are far less problematic than these sections getting lost.
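Here's an example of part of one of these sections being turned into a bib ref: the biblStruct entry b0 above, where the 1.7.1 heading ended up in idno and the NMR data was split between title and note.
And from the same PDF, several of them were turned into a figure: fig_7 above, where sections 5.1.1 through 5.1.1.4 all ended up in a single figDesc.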
Not really sure what's causing this, but I thought it would be useful to report it. I've attached the PDF that produced the above issues.
1-s2.0-S0968089615005787-main.pdf
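In case it helps anyone hitting the same thing, here is a minimal sketch of how such misclassified elements might be flagged in the TEI output as a stopgap. It is not a fix for the underlying problem. The function name `flag_suspect_elements`, the keyword pattern, and the 600-character threshold are my own assumptions, chosen to match the NMR-style text in the examples above, and would need tuning for other documents.

```python
import re
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumed heuristic: tokens typical of experimental write-ups like the ones above.
EXPERIMENTAL_PATTERN = re.compile(r"\bNMR\b|\byield:|MS\s*\(ESI\)|\bJ = \d")


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def flag_suspect_elements(tei_xml, max_figure_chars=600):
    """Return (element name, xml:id, reason) for figures and bib entries
    whose text looks like body text rather than a caption or a citation."""
    root = ET.fromstring(tei_xml)
    suspects = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name not in ("figure", "biblStruct"):
            continue
        # Collapse all nested text into one whitespace-normalized string.
        text = " ".join("".join(elem.itertext()).split())
        if EXPERIMENTAL_PATTERN.search(text):
            suspects.append((name, elem.get(XML_ID), "experimental keywords"))
        elif name == "figure" and len(text) > max_figure_chars:
            suspects.append((name, elem.get(XML_ID), "very long description"))
    return suspects


# Shortened, hand-made sample modelled on the output above (not real tool output).
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <biblStruct xml:id="b0"><note>CDCl 3 ): d 8.62 (d, J = 6.8 Hz, 1H)</note></biblStruct>
  <figure xml:id="fig_7"><figDesc>White solid, yield: 90%. 1 H NMR (400 MHz, CDCl 3 )</figDesc></figure>
  <figure xml:id="fig_1"><figDesc>Structures of compounds 1a-1d.</figDesc></figure>
</TEI>"""

for name, elem_id, reason in flag_suspect_elements(sample):
    print(name, elem_id, reason)
```

Elements flagged this way can then be reviewed by hand or routed back into the body text; the check only reports candidates and does not modify the TEI.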