Volume Fractions

When developing calibration models for spectrometers, the reference values are often given in weight percent. This makes sense because it is easy to make up reference samples by weighing the components. But in absorbance spectroscopy the measurement is related to the volume fraction of each component [1,2]. When the densities of all components are about equal, there is little difference between the mass and volume fractions. But when the densities differ substantially, the difference is considerable.

As an example, consider a hypothetical binary liquid mixture where component A has a density of 0.8 and component B a density of 1.2. Assuming linear mixing (no volume change on mixing), a 50/50 by weight mixture is 60/40 by volume (A/B), because equal masses occupy volumes in the ratio 1/0.8 to 1/1.2. Assuming Beer’s law, the spectrometer would see this sample as 0.6 times the pure component spectrum of A plus 0.4 times the pure component spectrum of B. Based on this, it’s easy to see that a calibration done on volume fractions would be linear whereas one done on mass fractions would not.
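The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration under the same linear mixing assumption; the function name mass_to_volume_fractions is hypothetical and not part of any toolbox.

```python
def mass_to_volume_fractions(mass_fractions, densities):
    """Convert mass fractions to volume fractions, assuming no volume change on mixing."""
    # Volume occupied by each component per unit mass of mixture.
    volumes = [w / rho for w, rho in zip(mass_fractions, densities)]
    total = sum(volumes)
    # Normalize so the volume fractions sum to 1.
    return [v / total for v in volumes]


# 50/50 by weight mixture of A (density 0.8) and B (density 1.2).
vol = mass_to_volume_fractions([0.5, 0.5], [0.8, 1.2])
print([round(v, 3) for v in vol])  # [0.6, 0.4]
```

The same function works for any number of components, as long as the densities are given in the same order as the mass fractions.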

Consider the 5-component mixture data collected by Windig and Stephenson [3]. This is a mixture of butanol, dichloromethane, methanol, dichloropropane and acetone. Samples were measured in the NIR (1100-2498 nm, 700 wavelengths) and the data was split into 35 calibration samples (2 replicates each) and 35 test samples (also with replicates). Classical Least Squares (CLS) models were built using both mass fractions and volume fractions. The volume fractions were computed from the original mass fractions using densities of [0.810 1.326 0.790 1.160 0.790] obtained from the literature. The calibration curves are shown for mass fractions in Figure 1 and volume fractions in Figure 2. The errors for mass and volume fractions are shown in Tables 1 and 2, respectively. Because both the mass and volume fractions are in the same range (0-1) with nearly equal variance, these tables are comparable. After compensating for the slight differences in variance and aggregating over all 5 analytes, the model based on volume fractions has an RMSEP that is lower by about 24%.

Figure 1. Calibration curves for Windig mixture data based on Mass Fractions.

Figure 2. Calibration curves for Windig data based on Volume Fractions.

Table 1: Root-mean-square errors of Calibration, Cross-validation and Prediction for Windig data based on Mass Fractions.

Mass Fractions	Butanol	Dichloro-methane	Methanol	Dichloro-propane	Acetone
RMSEC	0.0102	0.0081	0.0100	0.0157	0.0098
RMSECV	0.0118	0.0090	0.0128	0.0202	0.0124
RMSEP	0.0108	0.0091	0.0079	0.0131	0.0092

Table 2: Root-mean-square errors of Calibration, Cross-validation and Prediction for Windig data based on Volume Fractions.

Volume Fractions	Butanol	Dichloro-methane	Methanol	Dichloro-propane	Acetone
RMSEC	0.0098	0.0071	0.0062	0.0118	0.0070
RMSECV	0.0115	0.0091	0.0083	0.0160	0.0088
RMSEP	0.0096	0.0066	0.0058	0.0087	0.0063

24% is a nice reduction, but why isn’t it more? Or maybe a better way to ask this is: why isn’t the calibration based on mass fractions worse than it is? It is worth noting that the estimated pure component spectra for the two cases are not the same. They are compared in Figure 3. Estimates for the pure component spectrum of dichloromethane show the largest deviations, but differences exist in all components. The pure component spectra are, of course, estimated to obtain the best fit to the observed mixture spectra, so they differ depending on whether mass or volume fractions are used.
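To illustrate how the estimated pure component spectra depend on the concentration units, here is a minimal NumPy sketch on synthetic data. It is not the Windig data and not the PLS_Toolbox implementation; the spectra, densities and the helper name cls_fit are all made up for illustration. The mixture spectra are generated to be exactly linear in volume fraction, then CLS is fit once with mass fractions and once with volume fractions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pure component spectra: 3 components, 50 wavelengths.
S_true = rng.random((3, 50))
densities = np.array([0.8, 1.0, 1.3])  # illustrative values only

# Random mass fractions (rows sum to 1) and the matching volume fractions.
M = rng.dirichlet(np.ones(3), size=40)
V = (M / densities) / (M / densities).sum(axis=1, keepdims=True)

# Beer's law with linear mixing: spectra are linear in volume fraction.
X = V @ S_true


def cls_fit(C, X):
    """Least-squares estimate of the pure spectra S in X ≈ C @ S."""
    S_hat, *_ = np.linalg.lstsq(C, X, rcond=None)
    return S_hat


S_mass = cls_fit(M, X)  # pure spectra estimated from mass fractions
S_vol = cls_fit(V, X)   # pure spectra estimated from volume fractions

# Residual norm of the reconstructed calibration spectra in each case.
err_mass = np.linalg.norm(X - M @ S_mass)
err_vol = np.linalg.norm(X - V @ S_vol)
print(err_mass, err_vol)
```

In this noise-free setting the volume fraction fit reproduces the spectra essentially exactly, while the mass fraction fit leaves a residual and returns different pure spectra estimates, because least squares bends them to fit the data as well as it can.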

Figure 3. Comparison of pure component spectra estimates based on mass fractions and volume fractions.

What if you have data in mass percent and don’t know the density of the constituents? Consider the casein-glucose-lactate (CGL) data from Næs and Isaksson [4]. This data includes 153 calibration and 78 test samples from a designed experiment, with 117 wavelengths over 1104-2496 nm. These are powders, and while you can look up their densities, they are highly dependent upon how the samples have been handled. A CLS calibration based on mass fractions is shown in Figure 4. Errors are collected in Table 3. The calibration curves suggest that there is some non-linearity, as might be expected if the densities of the constituents were markedly different.

Figure 4. Calibration curves for CGL data based on mass fractions.

Table 3. Root-mean-square errors of Calibration, Cross-validation and Prediction for CGL data based on Mass Fractions.

Mass Fractions	Casein	Glucose	Lactate	Moisture
RMSEC	0.0369	0.0687	0.0450	0.0096
RMSECV	0.0398	0.0717	0.0456	0.0097
RMSEP	0.0301	0.0469	0.0313	0.0068

It is possible to estimate a set of densities for the CGL system using optimization. The objective function for this is the sum of squared differences between the calibration spectra and the spectra reconstructed from the CLS estimates and the estimated volume fractions. The MATLAB function fminsearch was used to find a set of densities that minimizes this error. Note that because only the ratios of the densities matter, the first density (casein) was set to 1. The minimum for this problem is fairly shallow, and solutions get far from 1 with little improvement in the objective function, so a small penalty was added for the distance of the solution from unit densities. The result is [1.00 2.03 0.66 1.00], which reduces the reconstruction error of the spectra from 2.46 to 0.405, a factor of ~6. The reality of these densities is debatable, so we will refer to them as apparent densities and apparent volume fractions.
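A rough Python analogue of this procedure is sketched below on synthetic data, using the Nelder-Mead method in scipy.optimize.minimize as a stand-in for fminsearch. Several things here are assumptions rather than the method used for the results above: the densities are optimized on a log scale to keep them positive, the penalty is taken as a weighted squared distance from unit densities (its actual form and weight are not given in the text, and it is switched off in the demo), and the names to_volume_fractions and reconstruction_error are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def to_volume_fractions(M, densities):
    """Mass fractions (rows) to volume fractions, assuming linear mixing."""
    vols = M / densities
    return vols / vols.sum(axis=1, keepdims=True)


def reconstruction_error(log_d_rest, M, X, penalty=0.0):
    """Sum of squared spectral residuals of a CLS fit on apparent volume fractions."""
    # First density is fixed at 1; the rest are searched on a log scale (assumption).
    d = np.concatenate(([1.0], np.exp(log_d_rest)))
    V = to_volume_fractions(M, d)
    S_hat, *_ = np.linalg.lstsq(V, X, rcond=None)  # CLS pure spectra estimate
    sse = np.sum((X - V @ S_hat) ** 2)
    # Hypothetical penalty pulling the solution toward unit densities.
    return sse + penalty * np.sum((d - 1.0) ** 2)


# Synthetic stand-in for calibration data: 3 components, 40 wavelengths.
rng = np.random.default_rng(1)
d_true = np.array([1.0, 2.0, 0.7])  # made-up densities
M = rng.dirichlet(np.ones(3), size=60)
X = to_volume_fractions(M, d_true) @ rng.random((3, 40))

res = minimize(
    reconstruction_error,
    x0=np.zeros(2),  # start from unit densities
    args=(M, X),
    method="Nelder-Mead",
    options={"xatol": 1e-8, "fatol": 1e-12, "maxiter": 5000},
)
d_est = np.concatenate(([1.0], np.exp(res.x)))
print(np.round(d_est, 3))
```

With noise-free synthetic spectra the density ratios are well determined, so no penalty is needed. On real data, where the minimum is shallow, a nonzero penalty value plays the role of the small penalty described above.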

A CLS calibration using the apparent volume fractions is shown in Figure 5 and the errors are collected in Table 4. When the errors are adjusted for differences in variance upon conversion from mass to apparent volume, the average error is reduced by 43%. Note that the non-linearities obvious in the mass fraction calibration curves have been mitigated. (It is also possible, and perhaps even preferable, to do this calibration without the moisture. When this is done, very similar results are obtained, including the improvement from using apparent volume fractions.) If final estimates in mass fraction are desired, it is of course a simple matter to convert the volume fraction estimates from the model back to mass fractions.
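The back-conversion mentioned above is the mirror image of the mass-to-volume calculation. A minimal sketch, again assuming linear mixing and using a hypothetical function name, checked against the binary example from earlier in the post:

```python
def volume_to_mass_fractions(volume_fractions, densities):
    """Convert (apparent) volume fractions back to mass fractions."""
    # Mass contributed by each component per unit volume of mixture.
    masses = [v * rho for v, rho in zip(volume_fractions, densities)]
    total = sum(masses)
    # Normalize so the mass fractions sum to 1.
    return [m / total for m in masses]


# 60/40 by volume with densities 0.8 and 1.2 should return 50/50 by weight.
mass = volume_to_mass_fractions([0.6, 0.4], [0.8, 1.2])
print([round(m, 3) for m in mass])  # [0.5, 0.5]
```

For model predictions, the same densities (or apparent densities) used to build the calibration would be passed in, in the same component order.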

Figure 5. Calibration curves for CGL data based on apparent volume fractions.

Table 4. Root-mean-square errors of Calibration, Cross-validation and Prediction for CGL data based on Apparent Volume Fractions.

Apparent Volume Fractions	Casein	Glucose	Lactate	Moisture
RMSEC	0.0240	0.0197	0.0173	0.0033
RMSECV	0.0253	0.0211	0.0181	0.0034
RMSEP	0.0209	0.0162	0.0154	0.0029

In conclusion, the use of volume fractions can clearly improve calibrations. Here it is shown for CLS models, but inverse least squares models can also be improved (more on that later). The amount of improvement depends on the variation in the (apparent) ratio of densities of the pure components in the mixture. Provided sufficient data is available, apparent volume fractions can be estimated, which in turn improves calibrations.

Happy modeling!

BMW

All results in this post were created with PLS_Toolbox 9.5 and MATLAB R2024b.

[1] H. Mark, R. Rubinovitz, D. Heaps, P. Gemperline, D. Dahm, and K. Dahm, “Comparison of the Use of Volume Fractions with Other Measures of Concentration for Quantitative Spectroscopic Calibration Using the Classical Least Squares Method,” Appl. Spectrosc. 64, 995-1006 (2010).

[2] J. Workman, “Units of Measure in Spectroscopy, Part I: It’s the Volume, Folks!”, Spectroscopy-02-01-2014.

[3] W. Windig and D.A. Stephenson, Analytical Chemistry 64, pp. 2735-2742 (1992).

[4] T. Næs and T. Isaksson, NIR news 3(3), 7 (1992).
