Article

Pretreatment and Wavelength Selection Method for Near-Infrared Spectra Signal Based on Improved CEEMDAN Energy Entropy and Permutation Entropy

School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin 150001, China
* Author to whom correspondence should be addressed.
Entropy 2017, 19(7), 380; https://doi.org/10.3390/e19070380
Submission received: 26 June 2017 / Revised: 14 July 2017 / Accepted: 22 July 2017 / Published: 24 July 2017
(This article belongs to the Special Issue Permutation Entropy & Its Interdisciplinary Applications)

Abstract

The noise in near-infrared spectra and the redundancy of spectral information can degrade the accuracy of calibration and prediction models in near-infrared analytical technology. To address this problem, a new method for the pretreatment and wavelength selection of near-infrared spectra signals is proposed based on an improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and permutation entropy (PE). Near-infrared spectra of glucose solutions were used as the research object: the improved CEEMDAN energy entropy was used to reconstruct the spectral data and remove noise, and useful wavelengths were selected based on PE after spectra segmentation. First, the intrinsic mode functions (IMFs) of the original spectra are obtained by the improved CEEMDAN algorithm. The useful signal modes and noisy signal modes are then identified by their energy entropy, and the reconstructed spectral signal is the sum of the useful signal modes. Finally, the reconstructed spectra are segmented, and the wavelengths carrying abundant glucose information are selected based on PE. To evaluate the performance of the proposed method, support vector regression (SVR) and partial least squares regression (PLSR) were used to build calibration models from the wavelengths selected by the new method, by mutual information, by the successive projections algorithm, and by principal component analysis, as well as from the full spectra. The models were evaluated by the correlation coefficient and the root mean square error of prediction. The experimental results showed that the improved CEEMDAN energy entropy can effectively reconstruct near-infrared spectra signals and that PE can effectively solve the wavelength selection problem. Therefore, the proposed method can improve the precision of spectral analysis and the stability of the model in near-infrared spectra analysis.

1. Introduction

Diabetes, a disorder of blood glucose metabolism, causes serious health problems [1]. According to statistics from the International Diabetes Federation (IDF), the number of people with diabetes will reach 592 million by 2035 [2]. The foundations of diabetes treatment are regular blood glucose detection, diet plans, and injected or oral insulin; blood glucose detection is therefore the key step in effective diabetes treatment. Non-invasive blood glucose detection is painless, convenient, and affordable. With the development of computer technology and chemometrics in recent years, efficient, low-cost near-infrared spectra technologies capable of fast analysis have been widely used in non-invasive blood glucose detection [3]. Near-infrared light lies between visible light and mid-infrared light and spans wavelengths from 700 to 2500 nm [4]. Glucose molecules contain C–H and O–H groups, and the stretching vibrations of these hydrogen-containing groups [5] produce overtone and combination absorption bands of a certain strength in the near-infrared region. Because solutions of different glucose concentrations contain different numbers of these groups, the absorption intensity at the peak positions changes with concentration. The glucose concentration can therefore be analyzed quantitatively from near-infrared spectra on the basis of the Beer–Lambert law. However, different hydrogen-containing groups have different near-infrared characteristics, and some groups absorb weakly or not at all, so building the calibration and prediction models from all wavelengths degrades model quality. Moreover, near-infrared spectra themselves have several problems, such as a large number of wavelength points, overlapping spectral information, and low absorption intensity.
Therefore, spectral pretreatment and wavelength selection [6] are critical steps before building the calibration model in near-infrared spectra analysis, because they simplify the model and improve its predictive ability.
The acquired spectra contain not only useful information related to the glucose concentration but also many uncorrelated noise signals, which degrade spectral quality and model accuracy and therefore need to be removed. Empirical Mode Decomposition (EMD) is currently widely used for signal denoising [7,8,9,10]. EMD decomposes a time series into intrinsic mode functions (IMFs) at different time scales and a residue. It is an adaptive signal processing method that can effectively analyze both stationary and non-stationary signals. Ensemble Empirical Mode Decomposition (EEMD) solves the mode mixing problem of EMD by adding white Gaussian noise, but it leaves residual noise in the reconstructed signal [11]. Complementary Ensemble Empirical Mode Decomposition (CEEMD) eliminates this residual noise by adding white noise in pairs of positive and negative realizations [12]; however, it can produce spurious modes. The CEEMDAN method reconstructs the signal exactly with half the number of iterations of EEMD [13]. However, CEEMDAN still produces spurious modes in the early stages of the decomposition and leaves residual noise in the modes. Therefore, an improved CEEMDAN [14] is used in this study to denoise and reconstruct near-infrared spectra signals.
A better quantitative calibration model can be obtained by selecting characteristic wavelengths or wavelength intervals with an appropriate method. Wavelength selection simplifies the model and reduces modeling time. Irrelevant or nonlinear variables should be eliminated to obtain a calibration model with strong predictive ability and stability [15]. Wavelength selection is therefore particularly important when dealing with near-infrared spectra data. Common wavelength selection methods include the correlation coefficient [16], uninformative variable elimination [17], interval partial least squares [18], the successive projections algorithm (SPA) [19], simulated annealing, and genetic algorithms [20]. Bandt and Pompe proposed permutation entropy (PE) as a measure of the randomness (complexity) of time series [21], and PE has since been applied in many fields [22,23,24,25,26]. In this study, a new wavelength selection method based on PE is proposed.
This study proposes a new pretreatment and wavelength selection method. First, the original near-infrared spectra signal is decomposed by the improved CEEMDAN to obtain IMFs. The critical point between useful signal modes and noisy signal modes is identified from the energy entropy of each IMF, and the reconstructed signal is the sum of the useful signal modes and the residue. The characteristic wavelengths are then selected by comparing the PE of glucose solution spectra and pure water spectra over the same wavelength intervals. Finally, the performance of the proposed method is verified with quantitative models built by partial least squares regression (PLSR) and support vector regression (SVR). The results show that the proposed method outperforms the other pretreatment and wavelength selection methods tested for near-infrared spectra analysis.

2. Related Theory

2.1. EMD Method

According to [27], the general steps of the EMD method are as follows:
(1) Find all local maxima and minima of the signal $s(t)$.
(2) Interpolate all maxima with a cubic spline to obtain the upper envelope $u(t)$, and all minima to obtain the lower envelope $v(t)$.
(3) Compute the mean of the upper and lower envelopes, $m(t) = \frac{u(t) + v(t)}{2}$.
(4) Subtract the envelope mean from the signal, $h(t) = s(t) - m(t)$.
(5) If $h(t)$ satisfies the two IMF conditions, take it as the first IMF, $c_1(t) = h(t)$. Otherwise, treat $h(t)$ as the new signal and repeat steps (1)–(4) until $c_1(t)$ is obtained. An IMF must satisfy two conditions: the numbers of extrema and zero crossings are equal or differ by at most one, and the mean of the upper and lower envelopes is zero at every point.
(6) Take the residue $r_1(t) = s(t) - c_1(t)$ as the new signal and repeat steps (1)–(5) to obtain the second IMF $c_2(t)$ and the residue $r_2(t) = r_1(t) - c_2(t)$.
(7) Repeat the above steps; the decomposition ends when the residue $r_n(t)$ is a monotonic function.
Finally, a set of IMFs $c_1(t), c_2(t), \ldots, c_n(t)$ and the residue $r_n(t)$ are obtained, and the original signal is

$$s(t) = \sum_{i=1}^{n} c_i(t) + r_n(t)$$
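The sifting procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in [27] or in this study: extrema are found with a simple sign-change test, both envelopes are pinned to the end points of the signal to limit spline end effects, and sifting stops with an SD-type criterion (threshold `sd_tol`) or after `max_sift` passes rather than with an exact test of the two IMF conditions. The function and parameter names are hypothetical; the splines come from SciPy's `CubicSpline`.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def _extrema(x):
    """Indices of interior local maxima and minima (sign change of the first difference)."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _envelope_mean(x):
    """Steps (1)-(3): cubic-spline envelopes u(t), v(t) and their mean m(t).

    Returns None when there are too few extrema to build both envelopes."""
    maxima, minima = _extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    # Assumption: pin both envelopes to the end points to limit spline end effects.
    up = np.r_[0, maxima, len(x) - 1]
    lo = np.r_[0, minima, len(x) - 1]
    u = CubicSpline(up, x[up])(t)
    v = CubicSpline(lo, x[lo])(t)
    return (u + v) / 2.0


def emd(s, max_imfs=10, max_sift=50, sd_tol=0.2):
    """Minimal EMD sketch returning (list of IMFs c_1..c_n, residue r_n)."""
    r = np.asarray(s, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _envelope_mean(r) is None:  # step (7): residue has too few extrema, stop
            break
        h = r.copy()
        for _ in range(max_sift):      # step (5): sift until h(t) is accepted as an IMF
            m = _envelope_mean(h)
            if m is None:
                break
            h_new = h - m              # step (4): h(t) = s(t) - m(t)
            # Assumed SD-type stopping rule instead of an exact test of the IMF conditions.
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < sd_tol:
                break
        imfs.append(h)
        r = r - h                      # step (6): r_k(t) = r_{k-1}(t) - c_k(t)
    return imfs, r
```

Because each residue is formed as $r_k(t) = r_{k-1}(t) - c_k(t)$, summing the returned IMFs and the residue recovers $s(t)$ exactly, whatever stopping rule is used.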

2.2. Improved CEEMDAN Method

According to Ref. [14], given the noisy realizations $x^{(i)} = x + w^{(i)}$, the first mode of the CEEMDAN algorithm is

$$\mathrm{IMF}_1 = \langle E_1(x^{(i)}) \rangle = \langle x^{(i)} - M(x^{(i)}) \rangle = \langle x^{(i)} \rangle - \langle M(x^{(i)}) \rangle$$

where $x$ is the original signal, $w^{(i)}$ is a realization of zero-mean, unit-variance white Gaussian noise, $E_1(\cdot)$ is the operator that extracts the first mode obtained by EMD ($E_1(x) = x - M(x)$), $M(\cdot)$ is the operator that produces the local mean of the signal to which it is applied, and $\langle \cdot \rangle$ denotes averaging over the realizations.
If only the local mean is estimated and subtracted from the original signal, the first mode becomes $\mathrm{IMF}_1 = x - \langle M(x^{(i)}) \rangle$.
Based on the above, the improved CEEMDAN method is described as follows:
(1) Use EMD to compute the local means of the realizations $x^{(i)} = x + \beta_0 E_1(w^{(i)})$ and average them to obtain the first residue and the first mode:

$$r_1 = \langle M(x^{(i)}) \rangle$$

$$\mathrm{IMF}_1 = x - r_1$$

where $x$ is the original signal, $\beta_0$ sets the standard deviation of the added white Gaussian noise, and $E_k(\cdot)$ is the operator that produces the $k$-th mode obtained by the EMD algorithm ($k = 1, 2, \ldots, N$, where $N$ is the total number of modes).
(2) For $k = 2, \ldots, N$, the $k$-th residue is

$$r_k = \langle M(r_{k-1} + \beta_{k-1} E_k(w^{(i)})) \rangle$$

(3) The $k$-th mode is

$$\mathrm{IMF}_k = r_{k-1} - r_k$$
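A compact Python sketch of steps (1)–(3) is shown below, assuming NumPy and SciPy. It is not the reference implementation of [14] or the code used in this study: the local-mean operator $M(\cdot)$ is approximated by a fixed number of sifting passes, and the noise amplitudes follow $\beta_0 = \varepsilon_0\,\mathrm{std}(x)/\mathrm{std}(E_1(w^{(i)}))$ and $\beta_k = \varepsilon_0\,\mathrm{std}(r_k)$, which is our reading of [14]. The parameter `eps0` plays the role of the noise standard deviation ratio (0.2 in Section 3.2), and `n_ensemble` the ensemble number. All function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def _envelope_mean(x):
    """Mean of the cubic-spline envelopes through the maxima and minima (None if too few extrema)."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    up = np.r_[0, maxima, len(x) - 1]  # end points pinned to the signal (assumption)
    lo = np.r_[0, minima, len(x) - 1]
    return (CubicSpline(up, x[up])(t) + CubicSpline(lo, x[lo])(t)) / 2.0


def local_mean(x, n_sift=10):
    """M(x) = x - E_1(x), with E_1 approximated by a fixed number of sifting passes (assumption)."""
    x = np.asarray(x, dtype=float)
    if _envelope_mean(x) is None:
        return x.copy()  # no oscillation left: the signal is its own local mean
    h = x.copy()
    for _ in range(n_sift):
        m = _envelope_mean(h)
        if m is None:
            break
        h = h - m
    return x - h


def emd_modes(x, n_modes):
    """E_1(x), ..., E_n(x): the first n EMD modes (zero arrays once no oscillation is left)."""
    modes, r = [], np.asarray(x, dtype=float)
    for _ in range(n_modes):
        m = local_mean(r)
        modes.append(r - m)
        r = m
    return modes


def iceemdan(x, eps0=0.2, n_ensemble=50, max_modes=8, seed=0):
    """Sketch of improved CEEMDAN; returns (list of IMF_k, final residue)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # EMD modes E_k(w^(i)) of every white-noise realization w^(i), computed once.
    noise = [emd_modes(rng.standard_normal(len(x)), max_modes) for _ in range(n_ensemble)]
    # Step (1): beta_0 = eps0 * std(x) / std(E_1(w)), then r_1 = <M(x + beta_0 E_1(w^(i)))>.
    beta = eps0 * np.std(x) / np.mean([np.std(w[0]) for w in noise])
    r = np.mean([local_mean(x + beta * w[0]) for w in noise], axis=0)
    imfs = [x - r]                              # IMF_1 = x - r_1
    for k in range(1, max_modes):
        if _envelope_mean(r) is None:           # residue without oscillations: stop
            break
        beta = eps0 * np.std(r)                 # beta_k = eps0 * std(r_k)
        # Step (2): r_{k+1} = <M(r_k + beta_k E_{k+1}(w^(i)))>  (w[k] holds E_{k+1})
        r_next = np.mean([local_mean(r + beta * w[k]) for w in noise], axis=0)
        imfs.append(r - r_next)                 # step (3): IMF_{k+1} = r_k - r_{k+1}
        r = r_next
    return imfs, r
```

Since every mode is a difference of consecutive residues, the modes telescope: their sum plus the final residue equals $x$ exactly, which is the complete-reconstruction property that distinguishes CEEMDAN from EEMD.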

2.3. Energy Entropy of IMF

Entropy describes the irregularity and complexity of the evolution of a time series, and changes in the composition of a signal can be detected directly by comparing changes in its entropy characteristics [28]. The IMF components obtained by the improved CEEMDAN contain the local characteristics of the original signal and information at different time scales. Because these components represent the signal at different resolutions, their energy distribution describes how the signal energy is distributed over frequency and time. To describe the differences between IMFs, the concept of information entropy is therefore applied to the analysis of their energy distribution. Information entropy measures the uncertainty of the state of a system; for a time series $(x_1, x_2, \ldots, x_n)$ it can be used to estimate the complexity of a random signal, and it is expressed as

$$H = -\int p(x) \ln p(x)\, dx$$

where $p(x)$ is the joint probability density function of $(x_1, x_2, \ldots, x_n)$.
Each IMF component is divided into $N$ equal segments along the time axis. Let the energy of the $i$-th segment be $W_i$ ($i = 1, 2, \ldots, N$) and the energy of the whole IMF be $A = \sum_{i=1}^{N} W_i$. Normalizing the segment energies gives $q_i = W_i / A$. By analogy with the information entropy, the energy entropy of an IMF is defined as [29]

$$H(q) = -\sum_{i=1}^{N} q_i \ln q_i$$
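The energy entropy of one IMF can be computed as in the Python sketch below. The text does not define the segment energy explicitly; here $W_i$ is assumed to be the sum of squared samples in segment $i$, and the default `n_segments` (the number $N$ of segments) is a hypothetical choice. `np.array_split` is used so that lengths not divisible by $N$ still work.

```python
import numpy as np


def energy_entropy(imf, n_segments=10):
    """Energy entropy H(q) = -sum_i q_i ln q_i of one IMF split into N equal time segments."""
    segments = np.array_split(np.asarray(imf, dtype=float), n_segments)
    W = np.array([np.sum(seg ** 2) for seg in segments])  # segment energies W_i (assumed definition)
    A = W.sum()                                          # total energy A = sum_i W_i
    if A == 0:
        return 0.0                                       # an all-zero IMF carries no energy
    q = W / A                                            # normalized energies q_i = W_i / A
    q = q[q > 0]                                         # convention: 0 * ln 0 = 0
    return float(-np.sum(q * np.log(q)))
```

An IMF whose energy is spread evenly over time reaches the maximum value $\ln N$, whereas an IMF whose energy is concentrated in a single segment gives 0.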

2.4. Permutation Entropy

According to [21], PE is defined as follows.
Consider a time series $\{x(i), i = 1, 2, \ldots, N\}$ of length $N$. Reconstructing it in phase space gives the embedding vectors

$$\begin{aligned}
X(1) &= \{x(1), x(1+\tau), \ldots, x(1+(m-1)\tau)\} \\
&\;\;\vdots \\
X(i) &= \{x(i), x(i+\tau), \ldots, x(i+(m-1)\tau)\} \\
&\;\;\vdots \\
X(N-(m-1)\tau) &= \{x(N-(m-1)\tau), x(N-(m-2)\tau), \ldots, x(N)\}
\end{aligned}$$

where $m$ and $\tau$ are the embedding dimension and the delay time, respectively. An ordinal pattern probability distribution $P = \{p_j, j = 1, \ldots, m!\}$ is then obtained by computing the relative frequencies of the $m!$ possible permutations (ordinal patterns) $j$ of the vectors. PE is the Shannon entropy of this ordinal pattern distribution:

$$S_p = -\sum_{j=1}^{m!} p_j \ln p_j$$

If some ordinal patterns appear more frequently than others, PE decreases, indicating that the signal is less random and more predictable [30]. For convenience, $S_p$ is usually normalized by $\ln(m!)$:

$$H_p = S_p / S_{max} = S_p / \ln(m!)$$

where $S_{max} = \ln(m!)$ is the value obtained for an equiprobable ordinal pattern distribution. Hence $H_p$ ranges between 0 and 1 and quantifies the degree of randomness of the time series: the smaller $H_p$ is, the more regular the time series; the larger $H_p$ is, the more random it is. Changes in $H_p$ reflect and amplify minute details of the time series.
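A minimal Python sketch of the normalized PE follows. Each embedding vector $X(i)$ is mapped to its ordinal pattern with `np.argsort`; ties are broken by order of appearance, which is one common convention and an assumption here. The function name and its defaults ($m = 4$, $\tau = 1$, the values used later for wavelength selection) are illustrative.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(x, m=4, tau=1):
    """Normalized permutation entropy H_p = S_p / ln(m!) of a 1-D series."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau          # number of embedding vectors X(i)
    if n_vectors < 1:
        raise ValueError("series too short for the chosen m and tau")
    # Ordinal pattern of X(i) = (x(i), x(i+tau), ..., x(i+(m-1)tau)); ties keep order of appearance.
    counts = Counter(
        tuple(np.argsort(x[i:i + (m - 1) * tau + 1:tau], kind="stable"))
        for i in range(n_vectors)
    )
    p = np.array(list(counts.values()), dtype=float) / n_vectors  # relative frequencies p_j
    s_p = -np.sum(p * np.log(p))                                    # Shannon entropy S_p
    return float(s_p / math.log(math.factorial(m)))                 # normalize by S_max = ln(m!)
```

A monotonic series contains a single ordinal pattern, so its PE is 0, while white noise visits all $m!$ patterns almost equally often and gives a value close to 1.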

3. Reconstruction Methods

3.1. Selection of Relevant Mode

The noisy signal $y(t)$ can be decomposed into several modes by the improved CEEMDAN algorithm as

$$y(t) = \sum_{i=1}^{I} \mathrm{IMF}_i + r_I(t) \tag{12}$$

Equation (12) can also be expressed as the sum of noisy modes and useful signal modes:

$$y(t) = \sum_{i=1}^{k-1} \mathrm{IMF}_i + \sum_{i=k}^{I} \mathrm{IMF}_i + r_I(t)$$

where the first $k-1$ modes are noisy modes and the remaining modes, together with the residue, form the useful signal. The critical task is to find $k$ in order to reconstruct the signal. Signal reconstruction can be understood as a low-pass filter: the first few high-frequency IMFs (noisy modes) are removed, and the low-frequency IMFs (useful signal modes) are kept and summed to reconstruct the signal. Because each IMF contains different frequency components and a different energy, the energy entropy of the IMFs is used to select the relevant modes. A large number of experiments showed that the energy entropy of the noisy modes stays around one value and that of the useful signal modes around another, with only small variations within each group, and that the energy entropy reaches its maximum at the first useful signal mode. There is therefore an abrupt change (mutational point) between the two kinds of modes, at which the energy entropy of the IMFs is maximal; the mode index at this point is $k$. The relevant modes are selected as follows:
(1) Decompose the noisy signal $y(t)$ by the improved CEEMDAN algorithm to obtain $\mathrm{IMF}_i$ ($i = 1, 2, \ldots, I$), where $I$ is the number of modes.
(2) Calculate the energy entropy of each $\mathrm{IMF}_i$, denoted $EE_i$ ($i = 1, 2, \ldots, I$).
(3) Identify the first relevant mode as

$$k = \arg\max_i (EE_i)$$

(4) Reconstruct the signal as

$$\tilde{y}(t) = \sum_{i=k}^{I} \mathrm{IMF}_i + r_I(t)$$
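Given the IMFs and the residue, steps (2)–(4) reduce to an arg-max over the energy entropies followed by a partial sum. The Python sketch below assumes the IMFs are already available (for example, from an improved CEEMDAN routine) and takes segment energies as sums of squared samples, which is an assumption; the function names are hypothetical. It returns a 1-based $k$ to match the notation above.

```python
import numpy as np


def energy_entropy(imf, n_segments=10):
    """H(q) = -sum q_i ln q_i with segment energies taken as sums of squares (assumption)."""
    W = np.array([np.sum(s ** 2) for s in np.array_split(np.asarray(imf, float), n_segments)])
    if W.sum() == 0:
        return 0.0
    q = W / W.sum()
    q = q[q > 0]
    return float(-np.sum(q * np.log(q)))


def reconstruct(imfs, residue, n_segments=10):
    """Steps (2)-(4): pick k = argmax_i EE_i and return (y_tilde, k, EE) with 1-based k."""
    ee = [energy_entropy(c, n_segments) for c in imfs]  # step (2)
    k0 = int(np.argmax(ee))                            # step (3), 0-based index
    y_tilde = np.sum(imfs[k0:], axis=0) + residue      # step (4): IMF_k + ... + IMF_I + r_I
    return y_tilde, k0 + 1, ee
```

In this toy setting, a mode whose energy is spread evenly in time (such as a sinusoid covering whole periods in every segment) has the largest energy entropy, so the reconstruction starts at that mode and keeps everything after it.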

3.2. Application

The periodic signal $y(t) = \sin(2\pi f_1 t) + \cos(2\pi f_2 t)$, with $f_1 = 2$ Hz and $f_2 = 4$ Hz, has a data length of 1024 points. White Gaussian noise is added to $y(t)$ at an SNR of 3 dB (Figure 1). The noisy signal is decomposed by the improved CEEMDAN with a noise standard deviation ratio of 0.2 and an ensemble number of 50. To illustrate the stability of the proposed reconstruction method, the test is repeated 10 times; each time, the noisy signal $y(t)$ is decomposed by the improved CEEMDAN algorithm, and the energy entropy of each IMF is calculated. Figure 2 shows that the noisy signal is decomposed into nine IMFs and one residue. The eighth and ninth modes are the useful signal modes, and the reconstructed signal is the sum of the last three components (IMF8, IMF9, and the residue, denoted IMF10). The energy entropy of each IMF is listed in Table 1. The maximum energy entropy corresponds to IMF8; therefore, the index $k$ of the mutational point is 8 (Figure 3), and the useful modes start with the eighth mode. The other nine tests give results similar to those of the first test. The reconstructed signal, shown in Figure 4, illustrates that the energy entropy can effectively distinguish the noisy modes from the useful modes.
To compare reconstruction results, the signal is reconstructed with the improved CEEMDAN energy entropy, Fourier transform (cut-off frequency 70 Hz), wavelet transform (mother wavelet db3, decomposition level 5), moving averaging (sliding window size 5), and median filtering (dim = 2). The reconstruction performance is evaluated at input signal-to-noise ratios (SNRs) from 1 to 10 dB in steps of 1 dB. The output SNR and the mean square error (MSE) are calculated to quantify the reconstruction results:

$$\mathrm{SNR} = 10 \log_{10}\left(\frac{\sum_{n=1}^{N} y(n)^2}{\sum_{n=1}^{N} \left(y(n) - \bar{y}(n)\right)^2}\right) \tag{16}$$

$$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \left(y(n) - \bar{y}(n)\right)^2 \tag{17}$$

where $y(n)$ is the pure signal and $\bar{y}(n)$ is the reconstructed signal. Tables 2 and 3 list the SNR and MSE of the different reconstruction methods. For this comparison, the improved CEEMDAN uses a noise standard deviation ratio of 0.2 and an ensemble number of 100. The SNR and MSE values are averages over 10 tests. Tables 2 and 3 show that the reconstruction method based on the improved CEEMDAN energy entropy yields a larger SNR and a smaller MSE than the other methods, indicating that it is superior to them.
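The simulation set-up and the SNR and MSE definitions above can be sketched in Python as follows. Assumptions: the 1024 samples span 1 s (a sampling rate of 1024 Hz, which the text does not state); the noise is scaled so that the input SNR equals the requested value on average; and a moving-average filter with window 5 stands in for the reconstruction method, only to show how the output SNR and MSE are computed. Function names are hypothetical.

```python
import numpy as np


def add_awgn(y, snr_db, rng):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db on average."""
    p_signal = np.mean(y ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return y + rng.normal(0.0, np.sqrt(p_noise), size=y.shape)


def snr_db(y, y_rec):
    """Output SNR: pure-signal energy over the energy of the reconstruction error."""
    return float(10 * np.log10(np.sum(y ** 2) / np.sum((y - y_rec) ** 2)))


def mse(y, y_rec):
    """Mean square error between the pure signal and the reconstruction."""
    return float(np.mean((y - y_rec) ** 2))


def moving_average(x, window=5):
    """Baseline smoother used here only to exercise the metrics."""
    return np.convolve(x, np.ones(window) / window, mode="same")


# Test signal of Section 3.2; the 1024 Hz sampling rate (1 s of data) is an assumption.
fs, n = 1024, 1024
t = np.arange(n) / fs
y = np.sin(2 * np.pi * 2 * t) + np.cos(2 * np.pi * 4 * t)
rng = np.random.default_rng(0)
y_noisy = add_awgn(y, 3, rng)
y_ma = moving_average(y_noisy, 5)
print(f"input SNR: {snr_db(y, y_noisy):.2f} dB")
print(f"moving average: SNR {snr_db(y, y_ma):.2f} dB, MSE {mse(y, y_ma):.4f}")
```

Swapping `moving_average` for any other reconstruction routine and sweeping the input SNR from 1 to 10 dB reproduces the structure of the comparison described above; the numbers obtained this way are not those of Tables 2 and 3.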
To further verify the validity of the proposed method, a non-stationary ECG signal (from the MIT-BIH Normal Sinus Rhythm Database) and the Blocks signal [31], each with 5 dB white Gaussian noise, are used in the experiments. The reconstruction results are compared with those of Fourier transform (cut-off frequency 110 Hz), wavelet transform (mother wavelet db5, decomposition level 10), moving averaging (sliding window size 5), and median filtering (dim = 5). Table 4 shows the output SNR and MSE for the ECG and Blocks signals: the proposed method yields a higher output SNR and a smaller MSE than the other methods. Because the ECG signal and the Blocks signal have very different structures, these results indicate that the proposed method based on the improved CEEMDAN energy entropy is widely applicable.
To verify the validity of the proposed method for a different noise distribution, uniformly distributed noise between 0 and 1 is added to the periodic signal $y(t)$, the ECG signal, and the Blocks signal. The reconstruction results are again compared with those of Fourier transform (cut-off frequency 110 Hz), wavelet transform (mother wavelet db5, decomposition level 10), moving averaging (sliding window size 5), and median filtering (dim = 5). Table 5 shows the output SNR and MSE of the three signals: the proposed method again yields a higher output SNR and a smaller MSE than the other methods. These results show that the proposed method based on the improved CEEMDAN energy entropy is also effective for uniformly distributed noise.
In summary, Section 3.1 explains how the relevant mode is selected to distinguish the noisy modes from the useful signal modes, and Section 3.2 uses three kinds of signals to illustrate the effectiveness of the proposed method: the periodic signal $y(t)$ is stationary, whereas the ECG (electrocardiogram) and Blocks signals, which have different structures, are non-stationary.

4. Results and Discussion

4.1. Near-Infrared Spectra Collection

The near-infrared spectra were measured with an Antaris II FT-NIR instrument (America Thermo Company, Shanghai, China) over the spectral range 833–2630 nm at a resolution of 4 cm−1. The structure of the measurement system is shown in Figure 5. In the measurement experiments, glucose solutions with concentrations evenly distributed from 50 to 1000 mg/dL were all prepared under the same conditions. To reduce statistical error, the spectrum of each concentration was measured five times; the collected near-infrared spectra are shown in Figure 6.

4.2. Reconstruction of Near-Infrared Spectra

Noise is removed from the collected near-infrared spectral data with the improved CEEMDAN energy entropy method, using a noise standard deviation ratio of 0.2 and an ensemble number of 100. Its reconstruction performance was compared with that of the wavelet filter (mother wavelet db5, decomposition level 10), the moving average filter (sliding window size 5), and the median filter (dim = 2). The reconstructed near-infrared spectra of a 700 mg/dL glucose solution are shown in Figure 7. To quantify the reconstruction results and verify the effectiveness of these methods, the SNR and MSE were calculated for each method. Because no noise-free reference is available for measured spectra, the measured (noisy) signal replaces the pure signal $y(n)$ in Equations (16) and (17); the interpretation is therefore the opposite of that for the simulated signals: the smaller the SNR (and the larger the MSE), the better the reconstruction. The SNR and MSE values of the different methods are shown in Table 6. The improved CEEMDAN energy entropy method gives an SNR of 24.0355 and an MSE of 0.0297, which are better than those of the other methods. The reconstructed signal obtained by the improved CEEMDAN energy entropy is smooth and preserves the characteristic features of the near-infrared spectra, indicating that the proposed method performs well in denoising and signal reconstruction.

4.3. Wavelength Selection of Near-Infrared Spectra

The characteristic wavelengths are selected from the reconstructed near-infrared spectra of the glucose solutions. The full spectrum contains 1867 wavelength points, which are divided into wavelength intervals with a rolling window. The window size $W$ is chosen according to the rule $W > 5\,m!$ [32,33], where $m$ is the order of the ordinal patterns (the embedding dimension). In this experiment, the PE of each wavelength interval is calculated with an embedding dimension of 4 and a delay time of 1, so the window size must be larger than $5 \times 4! = 120$. However, an excessively large window would miss the PE of individual spectral absorption peaks. Given these conditions, a window size of 130 is chosen for wavelength selection. To illustrate the proposed method, four solutions are used in the calculation: glucose solutions of 50, 500, and 1000 mg/dL and pure water; the results are shown in Figure 8. In some wavelength intervals the PE values of these solutions are essentially identical, whereas in others they differ markedly. The latter intervals are taken as characteristic wavelengths containing abundant glucose concentration information, and all of the non-overlapping intervals are taken as the final characteristic wavelengths (Table 7). Comparing Figure 6 with Table 7 shows that the selected characteristic wavelengths contain the peak positions of the near-infrared spectra, which correspond to the glucose absorption peaks.
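The interval-wise PE comparison can be sketched in Python as follows. The text does not give a numerical rule for deciding when PE values differ significantly, so the max-minus-min spread across solutions and the `threshold` value below are hypothetical, as is the use of non-overlapping windows (`step = window`). The check on `window` implements the rule $W > 5\,m!$ stated above.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(x, m=4, tau=1):
    """Normalized PE of one wavelength interval."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * tau
    counts = Counter(tuple(np.argsort(x[i:i + (m - 1) * tau + 1:tau], kind="stable"))
                     for i in range(n_vec))
    p = np.array(list(counts.values()), dtype=float) / n_vec
    return float(-np.sum(p * np.log(p)) / math.log(math.factorial(m)))


def select_intervals(spectra, window=130, step=130, m=4, tau=1, threshold=0.05):
    """Return (start, stop) index pairs of intervals whose PE differs across the spectra.

    `step`, the max-minus-min spread criterion and `threshold` are hypothetical choices."""
    if window <= 5 * math.factorial(m):
        raise ValueError("window must satisfy W > 5 m!")
    n = min(len(s) for s in spectra)
    starts = list(range(0, n - window + 1, step))
    # PE of every interval for every spectrum: shape (n_spectra, n_intervals)
    pe = np.array([[permutation_entropy(s[a:a + window], m, tau) for a in starts]
                   for s in spectra])
    spread = pe.max(axis=0) - pe.min(axis=0)
    return [(a, a + window) for a, d in zip(starts, spread) if d > threshold]
```

In the setting of this section, `spectra` would hold the reconstructed spectra of the 50, 500, and 1000 mg/dL glucose solutions and of pure water, each with 1867 points, and `m = 4`, `tau = 1`, `window = 130`.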
To verify the effectiveness of the proposed method, calibration models were established by PLSR [37] and SVR [38] (parameter ranges $\varepsilon \in [0, 0.2]$, $C \in [1, 10^8]$, $\gamma \in [0.01, 2]$) from the reconstructed glucose spectra, using the characteristic wavelengths selected by the proposed method, the mutual information (MI) method [34], the SPA method [35], and the principal component analysis (PCA) method [36], as well as the full spectral data. The models are evaluated by the correlation coefficient (R) and the root mean square error of prediction (RMSEP):

$$R = \sqrt{1 - \frac{\sum (\hat{y}_i - y_i)^2}{\sum (\bar{\hat{y}}_i - y_i)^2}}$$

$$\mathrm{RMSEP} = \sqrt{\frac{\sum (\hat{y}_i - y_i)^2}{n - 1}}$$

where $n$ is the number of samples in the calibration set, $y_i$ is the true value of the $i$-th sample, $\hat{y}_i$ is the predicted value of the $i$-th sample, and $\bar{\hat{y}}_i$ is the average of $\hat{y}_i$ over all samples in the calibration set.
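The two evaluation metrics can be computed as in the Python sketch below, which implements $R = \sqrt{1 - \sum(\hat{y}_i - y_i)^2 / \sum(\bar{\hat{y}} - y_i)^2}$ and $\mathrm{RMSEP} = \sqrt{\sum(\hat{y}_i - y_i)^2/(n-1)}$. The square roots are an assumption, because they do not survive in the extracted formulas; note also that this R uses the mean of the predictions, not of the true values, following the definition of $\bar{\hat{y}}_i$ in the text. Function names are hypothetical.

```python
import numpy as np


def correlation_r(y_true, y_pred):
    """R = sqrt(1 - sum (yhat_i - y_i)^2 / sum (mean(yhat) - y_i)^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_pred - y_true) ** 2)
    sst = np.sum((y_pred.mean() - y_true) ** 2)  # mean of the predictions, as in the text
    return float(np.sqrt(max(0.0, 1.0 - sse / sst)))  # clip at 0 for very poor fits


def rmsep(y_true, y_pred):
    """RMSEP = sqrt(sum (yhat_i - y_i)^2 / (n - 1))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.sum((y_pred - y_true) ** 2) / (len(y_true) - 1)))
```

For example, predictions that are each off by 1 mg/dL on four samples give an RMSEP of $\sqrt{4/3} \approx 1.155$ mg/dL because of the $n - 1$ denominator.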
The PE-based method selects 375 characteristic wavelength points, far fewer than the 1867 points of the full spectrum; the fewer wavelength points are used, the shorter the modeling time. The results of the PLSR and SVR calibration models (Table 8) show that the R and RMSEP of the calibration models built on the characteristic wavelengths selected by the proposed method (improved CEEMDAN energy entropy reconstruction followed by PE-based selection) reach 0.9999/0.9998 and 0.9125/0.9089, respectively. These results are better than those of the calibration models built on the characteristic wavelengths selected by the MI, SPA, and PCA methods and on the full spectral data. Overall, the SVR models are more reliable than the PLSR models. The errors between the predicted and true values are shown in Figure 9.

5. Conclusions

This study proposed a novel pretreatment and wavelength selection method for near-infrared spectra signals using the improved CEEMDAN energy entropy and permutation entropy. For signal reconstruction, the proposed method was compared with Fourier transform, wavelet transform, moving averaging, and median filtering at different input SNRs, and the method based on the improved CEEMDAN energy entropy gave the best results. Using near-infrared spectra of glucose solutions as the object, the full spectral data were reconstructed by the improved CEEMDAN energy entropy to remove noise. To select the characteristic wavelengths, the reconstructed near-infrared spectra were divided into intervals of a fixed number of points, and the PE of each wavelength interval was calculated. PLSR and SVR calibration models were then established with the characteristic wavelengths selected by the PE, MI, SPA, and PCA methods and with the full spectral data. According to the correlation coefficient and RMSEP of the calibration models, the proposed wavelength selection method effectively addresses the redundancy of near-infrared spectral data and improves the robustness and predictive ability of the regression model. Therefore, the proposed method can remove useless noise, reduce the amount of data, and help establish stable, accurate, and practical quantitative models.

Acknowledgments

The authors are grateful to the anonymous reviewers and the Associate Editor for their comments and suggestions, which significantly improved the quality of the paper. This work was supported by the Fundamental Research Funds for the Central Universities (Grant No. HIT. IBRSEM. 201307) and the Program for Harbin City Science and Technology Innovation Talents of Special Fund Project (Grant No. 2014RFXXJ065).

Author Contributions

Xiaoli Li conceived the algorithm and wrote the manuscript. Chengwei Li and Xiaoli Li designed and performed the experiment. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Coster, S.; Gulliford, M.C.; Seed, P.T.; Powrie, J.K.; Swaminathan, R. Monitoring blood glucose control in diabetes mellitus: A systematic review. Health Technol. Assess. 2000, 4, 1–93. [Google Scholar]
  2. Guariguata, L.; Whiting, D.R.; Hambleton, I.; Beagley, J.; Linnenkamp, U.; Shaw, J.E. Global estimates of diabetes prevalence for 2013 and projections for 2035. Diabetes Res. Clin. Pract. 2014, 103, 137–149. [Google Scholar] [CrossRef] [PubMed]
  3. Ce, F.D.A.; Wolf, B. Current development in non-invasive glucose monitoring. Med. Eng. Phys. 2008, 30, 541–549. [Google Scholar]
  4. Tang, J.Y.; Wang, M.H.; Chen, M.K.; Jang, L.S. Glucose detection using an electro-optical fluidic device based on pulse width modulation. In Proceedings of the Seventh International Conference on Sensing Technology, Wellington, New Zealand, 3–5 December 2013; pp. 325–329. [Google Scholar]
  5. Wabomba, M.; Small, G.W.; Arnold, M.A. Evaluation of selectivity and robustness of near-infrared glucose measurements based on short-scan Fourier transform infrared interferograms. Anal. Chim. Acta 2003, 490, 325–340. [Google Scholar] [CrossRef]
  6. Mobley, P.R.; Kowalski, B.R.; Workman, J.J., Jr.; Bro, R. Review of Chemometrics Applied to Spectroscopy: 1985–95, Part 2. Appl. Spectrosc. Rev. 1996, 31, 347–368. [Google Scholar] [CrossRef]
  7. Li, X.; Mei, D.Q.; Chen, Z.C. Feature extraction of chatter for precision hole boring processing based on EMD and HHT. Opt. Precis. Eng. 2011, 19, 1291–1297. [Google Scholar]
  8. Lu, L.; Yan, G.Z.; Zhao, K.; Xu, F. Analysis of human colonic motility using EEMD. Opt. Precis. Eng. 2015, 23, 1580–1586. [Google Scholar] [CrossRef]
  9. Luo, Y.K.; Luo, S.T.; Luo, F.L. Realization and improvement of laser ultrasonic signal denoising based on empirical mode decomposition. Opt. Precis. Eng. 2013, 21, 479–487. [Google Scholar] [CrossRef]
  10. Jiang, L.H.; Gai, J.Y.; Wang, W.B.; Xiong, X.L.; Liang, S.; Sheng, X.Z. Ensemble Empirical Mode Decomposition Based Event Classification Method for the Fiber-Optic Intrusion Monitoring System. Acta Opt. Sin. 2015, 10, 52–58. [Google Scholar] [CrossRef]
  11. Wu, Z.; Huang, N.E. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 2009, 1, 1–41. [Google Scholar]
  12. Yeh, J.R.; Shieh, J.S.; Huang, N.E. Complementary Ensemble Empirical Mode Decomposition: A Novel Noise Enhanced Data Analysis Method. Adv. Adapt. Data Anal. 2010, 2, 135–156. [Google Scholar] [CrossRef]
  13. Torres, M.E.; Colominas, M.A.; Schlotthauer, G.; Flandrin, P. A complete ensemble empirical mode decomposition with adaptive noise. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; pp. 4144–4147. [Google Scholar]
  14. Colominas, M.A.; Schlotthauer, G.; Torres, M.E. Improved complete ensemble EMD: A suitable tool for biomedical signal processing. Biomed. Signal Process. Control 2014, 14, 19–29. [Google Scholar] [CrossRef]
  15. Thomas, E.V. A primer on multivariate calibration. Anal. Chem. 1994, 66, 795A–804A. [Google Scholar] [CrossRef]
  16. Wu, W.; Walczak, B.; Massart, D.L.; Prebble, K.A.; Last, I.R. Spectral transformation and wavelength selection in near-infrared spectra classification. Anal. Chim. Acta 1995, 315, 243–255. [Google Scholar] [CrossRef]
  17. Centner, V.; Massart, D.L.; Noord, O.E.D.; de Jong, S.; Vandeginste, B.M.; Sterna, C. Elimination of Uninformative Variables for Multivariate Calibration. Anal. Chem. 1996, 68, 3851–3858. [Google Scholar] [CrossRef] [PubMed]
18. Norgaard, L.; Saudland, A.; Wagner, J.; Nielsen, J.P.; Munck, L.; Engelsen, S.B. Interval Partial Least-Squares Regression (iPLS): A Comparative Chemometric Study with an Example from Near-Infrared Spectroscopy. Appl. Spectrosc. 2000, 54, 413–419.
19. Araújo, M.C.U.; Saldanha, T.C.B.; Galvão, R.K.H.; Yoneyama, T.; Chame, H.C.; Visani, V. The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemom. Intell. Lab. Syst. 2001, 57, 65–73.
20. Wang, L.Q.; Ge, H.F.; Li, G.B.; Yu, D.Y.; Hu, L.Z.; Jiang, L.Z. Characteristic Wavelength Variable Optimization of Near-Infrared Spectroscopy Based on Kalman Filtering. Spectrosc. Spectr. Anal. 2014, 34, 958–961.
21. Bandt, C.; Pompe, B. Permutation entropy: A natural complexity measure for time series. Phys. Rev. Lett. 2002, 88, 174102.
22. Zunino, L.; Soriano, M.C.; Fischer, I.; Rosso, O.A.; Mirasso, C.R. Permutation-information-theory approach to unveil delay dynamics from time-series analysis. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2010, 82, 046212.
23. Li, X.; Cui, S.; Voss, L.J. Using Permutation Entropy to Measure the Electroencephalographic Effects of Sevoflurane. Anesthesiology 2008, 109, 448–456.
24. Yuedan, L.; Taesoo, C.; Hunki, B.; Younghae, D.; Jin, H.C.; Yun, D.C. Permutation entropy applied to movement behaviors of Drosophila Melanogaster. Mod. Phys. Lett. B 2011, 25, 1133–1142.
25. Zanin, M.; Zunino, L.; Rosso, O.A.; Papo, D. Permutation Entropy and Its Main Biomedical and Econophysics Applications: A Review. Entropy 2012, 14, 1553–1577.
26. Bian, C.; Qin, C.; Ma, Q.D.; Shen, Q. Modified permutation-entropy analysis of heartbeat dynamics. Phys. Rev. E 2012, 85, 021906.
27. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.-C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. A 1998, 454, 903–995.
28. Freeland, R.S.; Odhiambo, L.O. Subsurface characterization using textural features extracted from GPR data. Trans. ASABE 2007, 50, 287–293.
29. Chen, W.G.; Deng, B.F.; Bin, Y. Fault Recognition for High Voltage Circuit Breaker Based on EMD of Vibration Signal and Energy Entropy Characteristic. High Volt. Appar. 2009, 45, 90–96.
30. Zunino, L.; Olivares, F.; Scholkmann, F.; Rosso, O.A. Permutation entropy based time series analysis: Equalities in the input signal can lead to false conclusions. Phys. Lett. A 2017, 381, 1883–1892.
31. Li, C.; Zhan, L. A hybrid filtering method based on a novel empirical mode decomposition for friction signals. Meas. Sci. Technol. 2015, 26, 125003.
32. Amigó, J.M.; Zambrano, S.; Sanjuán, M.A.F. Combinatorial detection of determinism in noisy time series. EPL 2008, 83, 60005.
33. Amigó, J.M. Permutation Complexity in Dynamical Systems: Ordinal Patterns, Permutation Entropy and All That; Springer Publishing Company, Inc.: New York, NY, USA, 2012.
34. Li, C.; Zhan, L.; Shen, L. Friction Signal Denoising Using Complete Ensemble EMD with Adaptive Noise and Mutual Information. Entropy 2015, 17, 5965–5979.
35. Brègman, L.M. Certain properties of nonnegative matrices and their permanents. Doklady Akademii Nauk SSSR 1973, 14, 27–30.
36. Colucci, J.A.; Fontalvo-Gómez, M.; Velez, N.; Romañach, R.J. In-Line Near-Infrared (NIR) and Raman Spectroscopy Coupled with Principal Component Analysis (PCA) for In Situ Evaluation of the Transesterification Reaction. Appl. Spectrosc. 2013, 67, 1142–1149.
37. Martens, H.; Jensen, S.A. Partial least squares regression: A new two-stage NIR calibration method. In Developments in Food Science, Vol. 5A. Progress in Cereal Chemistry and Technology; Elsevier: Amsterdam, The Netherlands, 1983.
38. Safavi, H.R.; Esmikhani, M. Conjunctive use of surface water and groundwater: Application of support vector machines (SVMs) and genetic algorithms. Water Resour. Manag. 2013, 27, 2623–2644.
Figure 1. Noisy signal.
Figure 2. The IMFs obtained by the improved CEEMDAN algorithm.
Figure 3. The energy entropy of each IMF.
Figure 4. The pure signal and the reconstructed signal.
Figure 5. Diagram of the measurement system structure.
Figure 6. Near-infrared spectral data of the glucose solutions.
Figure 7. Reconstruction results of the four methods for the 700 mg/dL glucose solution: (a) proposed method; (b) wavelet; (c) moving averaging; (d) median.
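Figure 7 and Tables 2–6 use moving averaging and median filtering as baseline reconstruction methods. The following is a minimal sketch of both generic filters, assuming a centered window of hypothetical length 5 that is truncated at the signal edges; the window length and edge handling used by the authors are not given in this section.

```python
from statistics import median


def _window(x, i, window):
    """Slice of x centered on index i, truncated at the signal edges."""
    half = window // 2
    return x[max(0, i - half):min(len(x), i + half + 1)]


def moving_average(x, window=5):
    """Centered moving average; window=5 is a hypothetical default."""
    out = []
    for i in range(len(x)):
        w = _window(x, i, window)
        out.append(sum(w) / len(w))
    return out


def median_filter(x, window=5):
    """Centered running median; window=5 is a hypothetical default."""
    return [median(_window(x, i, window)) for i in range(len(x))]


print(moving_average([1, 2, 3, 4, 5], 3))  # [1.5, 2.0, 3.0, 4.0, 4.5]
print(median_filter([1, 1, 9, 1, 1], 3))   # [1.0, 1, 1, 1, 1.0]
```

The second call shows the characteristic difference between the two baselines: the running median removes the isolated spike completely, whereas a moving average would spread it over the neighboring samples.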
Figure 8. PE of the different spectral segments.
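Figure 8 reports the PE of each spectral segment. Below is a minimal sketch of Bandt–Pompe permutation entropy [21] evaluated segment by segment. The embedding dimension m = 3, delay tau = 1, equal-length segmentation, and normalization by ln(m!) are illustrative assumptions, not the paper's settings. Ties are ranked by sample position, which, as discussed in [30], can bias PE for signals containing many equal values.

```python
import math
from collections import Counter


def permutation_entropy(x, m=3, tau=1, normalize=True):
    """Bandt-Pompe permutation entropy (natural log) of the sequence x.

    m is the embedding dimension (pattern length, m >= 2) and tau the delay.
    Equal values are ranked by position because Python's sort is stable."""
    n = len(x) - (m - 1) * tau  # number of embedding vectors
    if n <= 0:
        raise ValueError("sequence too short for the chosen m and tau")
    counts = Counter()
    for i in range(n):
        window = [x[i + j * tau] for j in range(m)]
        # Ordinal pattern: the index order that sorts the window ascending.
        pattern = tuple(sorted(range(m), key=window.__getitem__))
        counts[pattern] += 1
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(math.factorial(m)) if normalize else h


def segment_pe(x, n_segments, m=3, tau=1):
    """Normalized PE of n_segments consecutive equal-length segments of x.

    Trailing samples that do not fill a whole segment are ignored."""
    size = len(x) // n_segments
    return [permutation_entropy(x[k * size:(k + 1) * size], m, tau)
            for k in range(n_segments)]


# Short example sequence from Bandt and Pompe [21], order m = 2.
print(round(permutation_entropy([4, 7, 9, 10, 6, 11, 3], m=2), 3))  # 0.918
```

A strictly monotonic segment contains a single ordinal pattern and therefore has PE = 0, while normalized PE approaches 1 when all m! patterns occur equally often.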
Figure 9. Prediction errors and predicted values of the two models: (a) SVR model; (b) PLSR model.
Table 1. The energy entropy of each IMF.
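| IMF | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF6 | IMF7 | IMF8 | IMF9 | IMF10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Energy Entropy | 0.3509 | 0.0586 | 0.1226 | 0.0939 | 0.0751 | 0.0471 | 0.0556 | 0.3664 | 0.3526 | 0.0680 |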
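Table 1 lists one energy entropy value per IMF. A common definition, assumed here because the formula is not restated in this section, is H_i = −p_i ln p_i with p_i = E_i / Σ_k E_k, where E_i is the sum of squared samples of IMF i. Under this definition no single value can exceed 1/e ≈ 0.368, which is consistent with the largest entries in Table 1 (0.3664, 0.3526, 0.3509). The sketch also sums a chosen set of IMFs to form the reconstructed signal; the indices passed as `keep` are hypothetical, since the rule that labels a mode as useful or noisy is not reproduced here.

```python
import math


def imf_energy_entropy(imfs):
    """Per-IMF energy entropy H_i = -p_i * ln(p_i), p_i = E_i / sum_k E_k (assumed definition)."""
    energies = [sum(v * v for v in imf) for imf in imfs]  # E_i: sum of squared samples
    total = sum(energies)
    if total == 0:
        raise ValueError("all IMFs have zero energy")
    return [-(e / total) * math.log(e / total) if e > 0 else 0.0 for e in energies]


def reconstruct(imfs, keep):
    """Sum, sample by sample, the IMFs whose indices are listed in keep."""
    return [sum(imfs[k][i] for k in keep) for i in range(len(imfs[0]))]


# Toy IMFs with energies 4 and 1, so p = 0.8 and p = 0.2.
imfs = [[1.0, -1.0, 1.0, -1.0], [0.5, 0.5, -0.5, -0.5]]
print([round(h, 4) for h in imf_energy_entropy(imfs)])  # [0.1785, 0.3219]
print(reconstruct(imfs, keep=[1]))                      # [0.5, 0.5, -0.5, -0.5]
```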
Table 2. Values of SNR for different signal reconstruction methods.
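| Input SNR (dB) | Improved CEEMDAN | Fourier Transform | Wavelet | Moving Averaging | Median |
|---|---|---|---|---|---|
| 1 | 17.8248 | 16.0560 | 15.0341 | 14.5787 | 12.2264 |
| 2 | 18.0438 | 17.3067 | 15.7924 | 15.5213 | 14.3009 |
| 3 | 18.4945 | 17.8271 | 16.6156 | 16.5692 | 15.5220 |
| 4 | 20.8505 | 18.7479 | 17.3508 | 17.0353 | 15.7923 |
| 5 | 20.9332 | 20.0900 | 19.3935 | 17.9618 | 15.9758 |
| 6 | 21.4840 | 20.9410 | 20.9280 | 20.5687 | 16.9514 |
| 7 | 22.0085 | 21.9528 | 21.2787 | 21.0124 | 17.8478 |
| 8 | 22.4068 | 22.0203 | 21.9441 | 21.0533 | 19.0951 |
| 9 | 23.3195 | 22.9087 | 22.8480 | 22.2636 | 19.3020 |
| 10 | 23.9494 | 23.6274 | 23.5810 | 23.2829 | 20.4236 |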
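Table 2 is indexed by input SNR. A minimal sketch of corrupting a clean test signal to a prescribed input SNR follows, assuming SNR = 10 log10(P_signal / P_noise) with powers taken as mean squared values. Zero-mean white Gaussian noise is an assumption for Tables 2–4; the uniform option corresponds to the setting of Table 5 and uses the fact that U[−a, a] has variance a²/3, so a = sqrt(3 P_noise). The `seed` parameter is a hypothetical addition for reproducibility.

```python
import math
import random


def add_noise_at_snr(x, snr_db, kind="gaussian", seed=0):
    """Return x plus zero-mean white noise whose expected power gives snr_db.

    SNR is taken as 10*log10(P_signal / P_noise) with P = mean squared value
    (assumed convention). kind selects Gaussian or uniform noise."""
    rng = random.Random(seed)
    p_signal = sum(v * v for v in x) / len(x)
    p_noise = p_signal / 10 ** (snr_db / 10)
    if kind == "gaussian":
        sigma = math.sqrt(p_noise)  # Var(N(0, sigma^2)) = sigma^2
        return [v + rng.gauss(0.0, sigma) for v in x]
    if kind == "uniform":
        a = math.sqrt(3 * p_noise)  # Var(U[-a, a]) = a^2 / 3
        return [v + rng.uniform(-a, a) for v in x]
    raise ValueError(f"unknown noise kind: {kind!r}")


clean = [math.sin(2 * math.pi * k / 50) for k in range(20000)]
noisy = add_noise_at_snr(clean, 5, kind="uniform")
```

Because the noise is random, the SNR realized by one draw only approximates the requested value; for long signals such as the 20000-sample sine above the deviation is a small fraction of a decibel.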
Table 3. Values of MSE for different signal reconstruction methods.
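| Input SNR (dB) | Improved CEEMDAN | Fourier Transform | Wavelet | Moving Averaging | Median |
|---|---|---|---|---|---|
| 1 | 0.0165 | 0.0234 | 0.0314 | 0.0348 | 0.0599 |
| 2 | 0.0157 | 0.0207 | 0.0263 | 0.0280 | 0.0371 |
| 3 | 0.0141 | 0.0187 | 0.0218 | 0.0220 | 0.0280 |
| 4 | 0.0082 | 0.0120 | 0.0184 | 0.0198 | 0.0263 |
| 5 | 0.0081 | 0.0102 | 0.0115 | 0.0160 | 0.0252 |
| 6 | 0.0071 | 0.0075 | 0.0081 | 0.0088 | 0.0202 |
| 7 | 0.0063 | 0.0069 | 0.0074 | 0.0079 | 0.0164 |
| 8 | 0.0057 | 0.0060 | 0.0064 | 0.0078 | 0.0123 |
| 9 | 0.0047 | 0.0049 | 0.0052 | 0.0059 | 0.0117 |
| 10 | 0.0040 | 0.0042 | 0.0044 | 0.0047 | 0.0091 |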
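The SNR and MSE values in Tables 2–6 compare a reconstruction with the clean reference. The sketch below assumes the standard definitions SNR = 10 log10(Σ x² / Σ (x − x̂)²) and MSE = (1/N) Σ (x − x̂)². Together they give SNR = 10 log10(P_x / MSE), where P_x is the mean power of the clean signal; many paired entries of Tables 2 and 3 agree with this relation for P_x ≈ 1 (for example, 10^(−17.8248/10) ≈ 0.0165 for the improved CEEMDAN at 1 dB input SNR).

```python
import math


def snr_db(clean, estimate):
    """Output SNR in dB: 10*log10(sum(clean^2) / sum((clean - estimate)^2))."""
    err = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10 * math.log10(sum(c * c for c in clean) / err)


def mse(clean, estimate):
    """Mean squared error between the clean reference and the reconstruction."""
    return sum((c - e) ** 2 for c, e in zip(clean, estimate)) / len(clean)


clean = [1.0, -1.0, 1.0, -1.0]     # mean power P_x = 1
estimate = [0.9, -1.1, 1.1, -0.9]  # every sample off by 0.1
print(round(snr_db(clean, estimate), 4), round(mse(clean, estimate), 4))  # 20.0 0.01
```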
Table 4. Values of SNR and MSE of different reconstruction methods for the ECG and Blocks signals (input SNR = 5 dB).
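| Methods | ECG SNR | ECG MSE | Blocks SNR | Blocks MSE |
|---|---|---|---|---|
| Improved CEEMDAN | 35.6925 | 0.4605 | 20.1445 | 0.0853 |
| Wavelet | 32.8187 | 0.8924 | 14.8604 | 0.2881 |
| Median | 31.5652 | 1.1910 | 17.5286 | 0.1558 |
| Moving Averaging | 30.7328 | 1.4427 | 19.6235 | 0.0962 |
| Fourier Transform | 30.3529 | 1.5744 | 16.3653 | 0.2037 |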
Table 5. Values of SNR and MSE of different reconstruction methods for the signals y(t), ECG, and Blocks with uniformly distributed noise.
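| Methods | y(t) SNR | y(t) MSE | ECG SNR | ECG MSE | Blocks SNR | Blocks MSE |
|---|---|---|---|---|---|---|
| Improved CEEMDAN | 6.1669 | 0.2417 | 37.0801 | 0.3345 | 14.8196 | 0.2908 |
| Wavelet | 5.6590 | 0.2717 | 35.7799 | 0.4513 | 14.3990 | 0.3204 |
| Moving Averaging | 5.6052 | 0.2751 | 30.4034 | 1.5563 | 14.0942 | 0.3436 |
| Fourier Transform | 5.4867 | 0.2827 | 30.9340 | 1.3773 | 13.8094 | 0.3669 |
| Median | 5.0963 | 0.3093 | 32.1864 | 1.0323 | 14.5561 | 0.3090 |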
Table 6. Values of SNR and MSE for different signal reconstruction methods.
| Methods | SNR | MSE |
|---|---|---|
| Improved CEEMDAN | 24.0355 | 0.0297 |
| Wavelet | 26.3136 | 0.0178 |
| Moving Averaging | 27.7917 | 0.0125 |
| Median | 28.0776 | 0.0117 |
Table 7. Selection of characteristic wavelengths.
| No. | Point Number | Wavenumber (cm⁻¹) |
|---|---|---|
| 1 | 218–301 | 5639–5959 |
| 2 | 701–791 | 7502–7849 |
| 3 | 942–1141 | 8431–9120 |
Table 8. R and RMSEP of the SVR and PLSR models.
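| Methods | SVR R | SVR RMSEP | PLSR R | PLSR RMSEP |
|---|---|---|---|---|
| Improved CEEMDAN-FD | 0.9999 | 0.9125 | 0.9998 | 0.9089 |
| SPA | 0.9892 | 0.8195 | 0.9878 | 0.8002 |
| MI | 0.9790 | 0.7604 | 0.9658 | 0.7019 |
| PCA | 0.9621 | 0.7542 | 0.9403 | 0.6958 |
| Full wave bands | 0.8988 | 0.5499 | 0.8147 | 0.5013 |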
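Table 8 evaluates the calibration models by R and RMSEP. A minimal sketch follows, assuming R is the Pearson correlation coefficient between reference and predicted concentrations and RMSEP is the root mean square of the prediction errors over the prediction set; these are the usual chemometric definitions, but they are not restated in this section. The concentration values below are hypothetical.

```python
import math


def pearson_r(y_ref, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(y_ref)
    m_ref = sum(y_ref) / n
    m_pred = sum(y_pred) / n
    cov = sum((a - m_ref) * (b - m_pred) for a, b in zip(y_ref, y_pred))
    s_ref = math.sqrt(sum((a - m_ref) ** 2 for a in y_ref))
    s_pred = math.sqrt(sum((b - m_pred) ** 2 for b in y_pred))
    return cov / (s_ref * s_pred)


def rmsep(y_ref, y_pred):
    """Root mean square error of prediction over the prediction set."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(y_ref, y_pred)) / len(y_ref))


# Hypothetical reference and predicted glucose concentrations (mg/dL).
y_ref = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
print(round(pearson_r(y_ref, y_pred), 4), rmsep(y_ref, y_pred))  # 0.9965 10.0
```

R measures how well the predictions track the reference values up to a linear relation, whereas RMSEP is expressed in the units of the predicted quantity, so the two metrics are complementary.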

Share and Cite

MDPI and ACS Style

Li, X.; Li, C. Pretreatment and Wavelength Selection Method for Near-Infrared Spectra Signal Based on Improved CEEMDAN Energy Entropy and Permutation Entropy. Entropy 2017, 19, 380. https://doi.org/10.3390/e19070380

AMA Style

Li X, Li C. Pretreatment and Wavelength Selection Method for Near-Infrared Spectra Signal Based on Improved CEEMDAN Energy Entropy and Permutation Entropy. Entropy. 2017; 19(7):380. https://doi.org/10.3390/e19070380

Chicago/Turabian Style

Li, Xiaoli, and Chengwei Li. 2017. "Pretreatment and Wavelength Selection Method for Near-Infrared Spectra Signal Based on Improved CEEMDAN Energy Entropy and Permutation Entropy" Entropy 19, no. 7: 380. https://doi.org/10.3390/e19070380

