Study of physiological and pathological information mining with biological information similarity analysis

Chronic diseases, such as sleep apnea and Parkinson's disease, are characterized by insidious onset, complex etiology, a slow course, and a tendency to cause other complications, all of which seriously affect patients' quality of life. Real-time monitoring of biological information can effectively reveal the occurrence and development of chronic diseases, and it also helps with early diagnosis and the choice of treatment. In the current study, the dynamic change rules of biological signals caused by chronic diseases are explored in order to support the auxiliary diagnosis and evaluation of these diseases. Attention is focused on the physiological fluctuation and coordination revealed by biological information similarity, including pulse fluctuation detection in patients with sleep apnea and plantar pressure coordination assessment in patients with Parkinson's disease (PD). In the biological similarity study, the heart rate of sleep apnea patients was recorded two minutes before and after the breath pulse, and the average plantar pressure of both feet of PD patients was also recorded and analyzed. The results show that the heart rate fluctuation level of sleep apnea patients is significantly reduced. This is because the human body enhances its sympathetic nerve activity to open the airway; the heart rate then changes periodically, so that its fluctuations tend to be consistent. Compared with healthy people, PD patients have weaker biological information similarity of plantar pressure on one foot. In addition, the information similarity between the left and right feet of PD patients was more diversified, which revealed that the plantar pressure of the left and right feet fluctuated more and tended to be more consistent, together with gait disorder and weakened balance.
These results show that biological information similarity can effectively uncover the fluctuation and coordination of physiological signals and contribute to the recognition and auxiliary diagnosis of chronic diseases. The data mining methods applied here help to explore the physiological and pathological mechanisms of the studied chronic diseases and shed light on early diagnosis and severity assessment. This makes it more promising to develop algorithms, software, and hardware systems that help patients and promote quality of life and health.


Background
Chronic disease is a general term for diseases with a hidden onset, complex etiology, and a long course. Common examples include cardiovascular and cerebrovascular diseases, endocrine diseases, and mental disorders. In recent years, people have paid increasing attention to health issues and have realized the significance of physiological information detection, which is the first step in the diagnosis of such diseases. Its crucial function is to monitor and measure real-time changes in physiological information and to identify the early symptoms of a disease by detecting abnormal conditions. By studying the dynamic changes in different physiological states and exploring the rules behind them, one can obtain key data. Having medical staff use phones and other mobile devices to monitor patients' relevant values in real time and keep track of their physical condition at all times can effectively reduce accidents and allow timely rescue.

Sleep apnea
Obstructive sleep apnea-hypopnea syndrome (obstructive sleep apnea, OSA) is a sleep disorder characterized by partial stenosis or complete collapse of the upper airway, resulting in a reduction or cessation of respiratory airflow [1][2]. Generally, the upper respiratory tract (tongue, pharynx, etc.) collapses, which increases airway resistance, or the respiratory center functions abnormally; either leads to hypoventilation, which in turn causes pathological and physiological changes and the related clinical symptoms [3][4]. Such changes can be identified from signals recorded during sleep, such as respiratory airflow, oxygen saturation (SpO2), and heart rate variability (HRV). Studies have shown that OSA is a systemic disorder that affects the cardiovascular, nervous, respiratory, endocrine, digestive, and urinary systems of patients [5]. Patients with severe OSA have repeated apnea during an attack, mainly manifested as snoring and decreased oxygen saturation. The onset of the disease leads to a decrease in sleep quality, a fragmented and disturbed sleep structure, and a loss of sleep continuity. What is worse, the hypoxemia, hypercapnia, and intrathoracic pressure changes caused by repeated, irregular apnea events lead to decreased sleep quality, disordered sleep structure, increased sympathetic nerve activity, and abnormal performance. Therefore, OSA patients often show daytime drowsiness, headache, mental distress, or some psychotic symptoms. They also suffer from a high incidence of cardiovascular disease or multiple organ damage, which can threaten their lives in severe cases [6][7][8][9][10]. OSA is currently recognized as a widespread disease that threatens public health. In the study by Young et al. [11], data from 4,925 adults were collected to estimate the proportion of undiagnosed OSA. The results suggest that about 93% of women and 82% of men with moderate to severe OSA have not been diagnosed. Therefore, OSA has an extremely large potential patient base in society, which should not be underestimated [14].

Parkinson's disease
Parkinson's disease (PD), or Parkinson's syndrome, is a common neurodegenerative disease in the middle-aged and elderly population. Well-known figures such as Deng Xiaoping, Muhammad Ali, Katharine Hepburn, and Hitler are reported to have suffered from Parkinson's disease. Parkinson's disease is also known as the "silent killer" because its course is slow but deadly, and there is still no complete cure, whether by drugs or other treatment. The disease is characterized by progressive and multiple symptoms. The main symptoms are resting tremor, abnormal movement, and postural instability [15], as shown in Figure 3. Patients may also show "hurried speech" (stuttering, repetition, unclear pronunciation) and "small handwriting" (micrographia, in which written characters become progressively smaller), as well as a poor mental state with varying degrees of depression. The key lesions of Parkinson's disease occur in the substantia nigra and striatum of the brain, while the cause is still unknown and is mainly attributed to aging, genetics, trauma, poisoning, and inflammatory infection. Statistics from 2016 show that about 6.1 million people worldwide are affected by PD, with an incidence rate of about 0.08% and about 210,000 deaths [16]. Figure 4 briefly shows the distribution of PD prevalence. Owing to its large and gradually aging population, the number of PD patients in China is expected to increase to 4.94 million by 2030, accounting for half of the world's PD patients. This will place a significant burden on China's economy and medical system [17]. At present, the Unified Parkinson's Disease Rating Scale (UPDRS) is used clinically for a comprehensive and detailed assessment of the severity of PD. It is divided into four parts: the impact of non-motor symptoms on patients' daily life (mentation, behavior, and mood), the impact of motor symptoms on patients' daily life (activities of daily living), motor complications, and motor function. An example of the questions is shown in Figure 5.
In the clinical differentiation and diagnosis of PD, the classical motor features usually predominate, including slowness of movement (bradykinesia), the hallmark manifestation present in all patients, and resting tremor and rigidity in most of them. Postural reflex disorder, which includes an abnormal flexed posture of the trunk and limbs and postural instability, usually appears late in PD. In addition, about 90% of PD patients experience olfactory dysfunction, constipation, REM sleep behavior disorder, and depression/anxiety before movement disorders appear [18]. The evaluation and treatment of PD patients based on gait disorder is also an important research direction today. The gait pattern of PD patients is very different from that of healthy people. For example, gait problems such as reduced step speed, reduced step length, increased axial stiffness, and impaired rhythm worsen as the disease progresses. Plantar pressure can reflect the behavioral characteristics of a patient's gait. With a plantar pressure plate, we can easily measure the center of pressure, the magnitude of the pressure, and the direction of the resultant plantar force in order to analyze the movement behavior of the subjects.
Based on the above, the current study uses pulse wave signals detected by a portable wearable device (a wearable bracelet) and applies biological information similarity analysis to pulse rate variability (PRV) in order to explore the influence of OSA on pulse rate and to characterize the symptoms of OSA patients in terms of heart rate and nerve activity. The same method is applied to plantar pressure signals for the analysis of PD.

Biological information similarity analysis
In biological information similarity analysis (information-based similarity, IBS), the first step is to encode each of the two information sequences according to whether successive values rise or fall and to transform it into the corresponding 0/1 sequence, ignoring the absolute amplitude of the signal. The similarity between the two binary sequences is then analyzed, and the parameters that reflect the characteristics of the time series are extracted and quantified. This method allows the correlation and consistency of two sets of physiological signals to be analyzed. In the field of bioinformatics analysis, methods are continuously being updated. For example, the team of Shan Wu [19] used IBS to analyze heart rate fluctuations in patients with severe sleep apnea and found that the severity of OSA was related to heart rate variability, so that IBS could be used to detect and screen patients with severe sleep apnea. Yudan Huang et al. [20] applied IBS to a cross-comparison network and extended it to a form suitable for staging liver fibrosis. In this study, we analyze the dynamic change rules of physiological signals, mainly on the basis of IBS, and apply them to the identification of sleep apnea and Parkinson's disease.

Overview
The general flow chart of the study is shown in Figure 6. The data include pulse rate variability data from the OSA patient group and plantar pressure data from the PD patient group. First, the experimental data are filtered and segmented so that the feature index can be extracted more accurately. By analyzing the repeated patterns in the collected physiological sequences, the IBS index extracts and quantifies the characteristic quantity contained in the corresponding time series. Using the IBS approach, we evaluated the similarity of two adjacent signal segments by analyzing the patient's short-term local signals. Finally, the index was verified and analyzed.

Data
2.2.1. Data source. The OSA data used here were obtained from the experimental database of Sun Yat-sen University. A total of 40 overnight PPG recordings were used, from 20 normal subjects and 20 patients. The subjects were adults suspected of having OSA. They were asked to wear a commercial wearable bracelet to record PPG signals, together with a polysomnograph as the gold standard. The sampling frequency was 100 Hz. Plantar pressure data were taken from the gaitpdb subset of PhysioNet (http://www.physionet.org/physiobank/database/gaitpdb). The gaitpdb database is composed of experimental data from three teams, labelled Ga (Galit Yogev), Ju (Jeffrey M. Hausdorff), and Si (Silvi Frenkel-Toledo), and includes a total of 93 PD patients and 73 healthy subjects. Considering the data collection scheme and the number of subjects of each team, we selected the data of the Si team for the current study, comprising 29 healthy people and 35 PD patients. The PD data have also been categorized.

For the PD data, the wavelet transform was mainly used for filtering and de-noising. The wavelet transform is often used in the analysis and filtering of non-stationary signals, especially physiological signals. It is a time-frequency analysis method that improves on the limitations of the Fourier transform: it replaces the infinitely long trigonometric basis of the Fourier transform with a finite-length, decaying wavelet basis. The wavelet basis can be scaled and translated, so it can also locate the time at which a frequency component occurs [13]. The wavelet transform is given in formula (1), where the scale $a$ corresponds to frequency and the translation $\tau$ corresponds to time:

$$WT(a,\tau)=\frac{1}{\sqrt{a}}\int_{-\infty}^{\infty} f(t)\,\psi^{*}\!\left(\frac{t-\tau}{a}\right)\mathrm{d}t \qquad (1)$$

Here $f(t)$ is the signal, $\psi$ is the wavelet function ($^{*}$ denotes complex conjugation), $a$ is the expansion amount of the wavelet function, and $\tau$ is its translation amount. A wavelet filter de-noises the signal by thresholding, so it can effectively remove high-frequency noise from physiological signals. Figure 7 shows a plantar pressure signal before and after filtering as an example of the effect of wavelet filtering.
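To make the thresholding idea concrete, the following is a minimal sketch of wavelet de-noising in plain Python. The paper does not state which wavelet, decomposition depth, or threshold rule was used, so the Haar wavelet, the three decomposition levels, the soft-threshold rule, and the universal threshold estimated from the finest detail coefficients are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import statistics


def haar_dwt(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the signal from one level of coefficients."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(coeffs, thr):
    """Shrink coefficients toward zero by thr; small ones become zero."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def wavelet_denoise(signal, levels=3):
    """Decompose, soft-threshold the detail coefficients, and reconstruct.

    Assumption: len(signal) is divisible by 2**levels.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)  # details[0] holds the finest scale
    # Assumed noise estimate: median absolute finest detail / 0.6745,
    # combined with the universal threshold sigma * sqrt(2 ln n).
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(len(signal)))
    for d in reversed(details):  # coarsest level first
        approx = haar_idwt(approx, soft_threshold(d, thr))
    return approx
```

Calling `wavelet_denoise(pressure, levels=3)` on a plantar pressure trace whose length is a multiple of 8 returns a smoothed trace of the same length. A smoother wavelet family would normally be preferred for physiological signals; Haar is used here only because it keeps the sketch short.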

Biological information similarity analysis method
We use a dimensionless biological information similarity analysis method (IBS) to analyze the local variation of the signal. The method ignores the absolute amplitude of the signal and converts its increases and decreases into a binary sequence of 0s and 1s. By analyzing the repeated patterns in the physiological sequence, the characteristic quantities that reflect the information buried in the time series can be extracted and quantified. Applied to physiological signals, the method allows the fluctuation and coordination of the signals to be analyzed. Its implementation steps are as follows:

(1). Signal coarse-graining. In order to reduce the interference of noise and, at the same time, extract the multi-time-scale information of the original signal, the signal is coarse-grained:

$$y_j^{(s)}=\frac{1}{s}\sum_{i=(j-1)s+1}^{js}x_i,\qquad 1\le j\le \lfloor N/s\rfloor \qquad (2)$$

where $x$ is the original sequence of length $N$, $y$ is the coarse-grained sequence, and $s$ is the coarse-grained scale.

(2). Construct the M-word sequence. By defining a decrease of the signal as 0 and an increase or change of the signal as 1, a sequence of length $N$ is converted into a binary sequence of length $N-1$. A window of $M$ bits then slides along the binary sequence one point at a time, and each $M$-bit word is converted to the corresponding decimal number, as shown in Figure 8.

(3). Calculate the distance. Each word is ranked according to its frequency of occurrence, and formulas (3)-(5), written here in the standard IBS form, are applied to calculate the distance between adjacent signal segments $S_1$ and $S_2$:

$$D_M(S_1,S_2)=\frac{1}{L-1}\sum_{k=1}^{L}\left|R_1(w_k)-R_2(w_k)\right|F(w_k) \qquad (3)$$

$$F(w_k)=\frac{1}{Z}\left[-p_1(w_k)\log p_1(w_k)-p_2(w_k)\log p_2(w_k)\right] \qquad (4)$$

$$Z=\sum_{k=1}^{L}\left[-p_1(w_k)\log p_1(w_k)-p_2(w_k)\log p_2(w_k)\right] \qquad (5)$$

where $Z$ is the normalization factor, $L=2^M$ is the number of different $M$-bit words, $w_k$ is the $k$-th $M$-bit word, $p_i(w_k)$ is the probability of occurrence of the word in segment $S_i$, $F(w_k)$ is the weight of the word, $D_M(S_1,S_2)$ is the distance between the two word sequences, and $R_i(w_k)$ is the rank of the word in segment $S_i$. The smaller the distance, the higher the similarity between the two segments; conversely, the larger the distance, the lower the similarity.
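The three steps above can be sketched in a few lines of Python. This is an illustrative reading of the method rather than the authors' code: the function names are hypothetical, samples that do not increase are mapped to 0, words with equal counts are ranked by their numeric value, the distance is normalized by $2^M-1$ as in the standard IBS definition, and the distance is set to 0 when both segments consist of a single repeated word. All of these choices are assumptions.

```python
import math
from collections import Counter


def coarse_grain(x, s):
    """Step (1): average consecutive non-overlapping windows of length s."""
    return [sum(x[i:i + s]) / s for i in range(0, len(x) - s + 1, s)]


def to_words(x, m):
    """Step (2): binarise increments (1 = increase, 0 = otherwise, an
    assumption for ties) and read each sliding m-bit window as a decimal."""
    bits = [1 if b > a else 0 for a, b in zip(x, x[1:])]
    return [int("".join(map(str, bits[i:i + m])), 2)
            for i in range(len(bits) - m + 1)]


def rank_and_prob(words, m):
    """Rank (1 = most frequent) and probability of every possible m-bit word."""
    if not words:
        raise ValueError("segment too short for word length m")
    counts = Counter(words)
    # Ties in frequency are broken by word value (an arbitrary convention).
    order = sorted(range(2 ** m), key=lambda w: (-counts[w], w))
    rank = {w: r for r, w in enumerate(order, start=1)}
    prob = {w: counts[w] / len(words) for w in range(2 ** m)}
    return rank, prob


def ibs_distance(seg1, seg2, m=4, s=1):
    """Step (3): rank-difference distance weighted by Shannon entropy terms."""
    r1, p1 = rank_and_prob(to_words(coarse_grain(seg1, s), m), m)
    r2, p2 = rank_and_prob(to_words(coarse_grain(seg2, s), m), m)

    def h(p):
        return -p * math.log(p) if p > 0 else 0.0

    weights = {w: h(p1[w]) + h(p2[w]) for w in range(2 ** m)}
    z = sum(weights.values())
    if z == 0:
        # Degenerate case: each segment is one repeated word, so all
        # entropy weights vanish; returning 0 here is an assumption.
        return 0.0
    total = sum(abs(r1[w] - r2[w]) * weights[w] for w in range(2 ** m))
    return total / (z * (2 ** m - 1))
```

With this normalization the distance lies between 0 and 1, and two identical segments give a distance of 0. Because ranks among equally frequent words are arbitrary, short segments with many ties can give values that depend on the tie-breaking convention.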
In the information similarity method, two parameters need to be selected: the coarse-grained scale and the word length (the number of bits per word). Different combinations (coarse-grained scale 1-9, word length 2-9) were tried in this study. Finally, a coarse-grained scale of 9 and a word length of 6 were selected for the OSA analysis, and a coarse-grained scale of 1 and a word length of 4 were selected for the PD analysis.
In the OSA analysis, the IBS value of every pair of adjacent 5-minute segments was calculated for each individual, and the average of all these values was taken as the IBS value of the individual. In the PD analysis, all plantar pressure signals were divided into 2 s segments. By calculating and averaging the IBS values of all adjacent segments of the left and right feet separately, the IBS values of the left and right feet were obtained, together with the left-right difference, rIBS.
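The per-subject averaging and the left-right difference described above can be sketched as follows. The helper names are hypothetical, and the distance function is passed in as an argument so that the sketch stays self-contained; `toy_distance` is only a stand-in for demonstration and is not the IBS distance.

```python
from statistics import mean


def split_segments(x, seg_len):
    """Non-overlapping segments of seg_len samples; a short tail is dropped."""
    return [x[i:i + seg_len] for i in range(0, len(x) - seg_len + 1, seg_len)]


def subject_ibs(x, seg_len, distance):
    """Mean of distance(a, b) over all pairs of adjacent segments."""
    segs = split_segments(x, seg_len)
    if len(segs) < 2:
        raise ValueError("need at least two segments")
    return mean([distance(a, b) for a, b in zip(segs, segs[1:])])


def r_ibs(left, right, seg_len, distance):
    """Absolute difference between the left-foot and right-foot values."""
    return abs(subject_ibs(left, seg_len, distance)
               - subject_ibs(right, seg_len, distance))


def toy_distance(a, b):
    """Stand-in distance for the demo only (difference of segment means)."""
    return abs(mean(a) - mean(b))
```

For the plantar pressure data, 2 s segments at 100 Hz correspond to `seg_len=200` samples. For the OSA data the 5-minute segments are defined in time rather than in samples, so the PPI sequence would need to be split by timestamps before the same averaging is applied.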

Statistical analysis
In the index verification, one-way analysis of variance (ANOVA) is mainly used to judge whether there are significant differences between data groups. One-way ANOVA assumes that the samples are independent of each other and normally distributed, and that the groups, although subject to different levels of the control variable, are comparable and have homogeneous variances.
The variance can be divided into two components: (1) systematic deviation that can be explained by the single factor, and (2) random deviation that cannot be explained by the single factor. If the random deviation is significantly smaller than the systematic deviation, it can be judged that the dependent variable differs significantly under the control of this factor.
In the "One-Way ANOVA" analysis of IBM SPSS Statistics 20, there are 14 test methods in the "equal variances assumed" option group and 4 test methods in the "equal variances not assumed" option group. In this study, LSD, Bonferroni, and Games-Howell were mainly used to analyze and compare the results from different groups. The LSD (least significant difference) method uses t tests to compare the means of each pair of groups and does not correct the error rate for multiple comparisons; it has the highest sensitivity but poor accuracy. The Bonferroni adjustment method is based on the LSD method: the test level is adjusted in each comparison to control the overall error rate. The Bonferroni method is one of the most popular methods at present because of its wide applicability and conservative behavior in multiple comparison studies. The Games-Howell method, a commonly used test when variances are unequal, also allows groups of unequal size.
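As a small worked illustration of the variance decomposition and of the Bonferroni adjustment, the sketch below computes the one-way ANOVA F statistic and the per-comparison significance level in plain Python. It is not a replacement for the SPSS procedures used in the study: it stops at the F statistic and leaves the p-value lookup to statistical software, and the function names are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]
    # Systematic deviation: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Random deviation: spread of the samples around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison test level that keeps the overall error rate at alpha."""
    return alpha / n_comparisons
```

A large F value means that the systematic deviation dominates the random deviation. With three pairwise comparisons and an overall level of 0.05, `bonferroni_alpha(0.05, 3)` gives a per-comparison level of about 0.0167. The function raises a division error if all samples within every group are identical, since the random deviation is then zero.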

Analysis of sleep apnea results
In the OSA experiment, PPG signals were analyzed with the biological information similarity method. The results are shown in Table 2, and the results for the adjacent PPI fragments of the normal group and the OSA patient group are shown in Figure 10. Compared with the normal group, the pulse intervals of the OSA patient group are more regular and less complex, which indicates damage to the autonomic nervous system in OSA patients. In addition, adjacent segments of pulse intervals in the OSA patient group resemble each other more closely and therefore have smaller IBS values. The pulse interval first decreases and then increases, which is consistent with the alternation between slow and fast phases in the measured heart rate of OSA patients.

Results of patients with Parkinson disease
In the IBS analysis, the plantar pressure data segment length was 1 s and the sampling frequency was 100 Hz, so 100 data points were collected per second. The signal was then converted into a binary sequence to calculate the similarity between consecutive seconds. Using the IBS method, we analyzed the single-foot plantar pressure signals of the normal group and the PD patient group and then further analyzed plantar coordination. For single-foot plantar pressure, the IBS values of the normal group and the PD patient group differed significantly (p < 0.001), indicating a significant difference between the two groups in the plantar pressure over the 2 seconds before and after. The IBS values of left and right foot plantar pressure in the normal group (mean IBS of left foot = 2.5290; right foot = 2.5472) are lower than those in the patient group (left foot IBS = 3.0697; right foot IBS = 3.1306). The relevant data are shown in Table 3 and Figure 11. In the experiment, the data segments were divided by time rather than by gait cycle; for reference, adults take 1.5 to 2 steps per second. The small IBS value in the normal group means that the "distance" between the biological information of the 2 s before and after is short, indicating less difference, more similarity, and more regularity in the plantar pressure of the normal group. The variation of plantar pressure in PD patients, by contrast, is more complicated. Comparing the two groups suggests that PD patients have a certain degree of gait disorder.

The difference between the IBS values of each participant's left and right foot plantar pressure was then calculated and its absolute value taken as the result, namely rIBS = |IBS_left - IBS_right|. For this plantar pressure difference, the normal group has the larger value (rIBS = 0.2679) and the patient group the smaller value (rIBS = 0.1111), and the difference between the two groups is statistically significant (p < 0.001). The relevant data are shown in Table 4 and Figure 12. During walking, when one leg is in the swing phase, the opposite leg is more likely to be in the supporting phase, and the swing or supporting phase of one side accounts for 60% or more of a gait cycle. Therefore, the larger rIBS in the normal group suggests that the gaits of the two feet are less similar to each other and that the feet coordinate better, which is reflected in a relatively more periodic and stable gait. The smaller rIBS in PD patients suggests that the gait cycle may be disordered. It is assumed that the supporting phase is prolonged or the swing phase is shortened, and that the coordination of gait between the left and right feet decreases.

Conclusion and perspective
In this study, the rules of the dynamic change of physiological information were analyzed, and the fluctuation and coordination of physiological signals were studied through biological information similarity analysis. Taking OSA and PD as examples, the pulse fluctuation of OSA patients and the plantar pressure coordination of Parkinson's patients were studied with the biological information similarity method. The analysis of heart rate fluctuation based on information similarity shows that the heart rate fluctuation of patients with sleep apnea is weakened: the heart rate sequences in the two minutes before and after the pulse are more similar, indicating that in the patient group sleep apnea increases sympathetic nerve activity to open the airway, and the heart rate fluctuates regularly. The analysis of plantar pressure coordination based on information similarity shows that the similarity of single-leg plantar pressure in the Parkinson's patient group is significantly weaker than in the normal group, and the difference in pressure fluctuation between the left and right legs is significantly smaller than in the normal group. This indicates that the gait of Parkinson's patients is more disordered and their balance is weakened, so that the pressure on each leg fluctuates more and the fluctuations of the two legs tend to be more consistent with each other. The two cases suggest that the analytical method based on biological information similarity is suitable for mining the change rules of biological signals caused by chronic diseases and for exploring the physiological mechanisms behind the occurrence and development of diseases, and it is expected to contribute to the early diagnosis and evaluation of chronic diseases. However, this study has not verified the hypothesized physiological mechanisms, so the physiological significance of the indicators cannot yet be explained comprehensively. Future work may help to explain them more precisely. It is believed that, by identifying the dynamic change rules of physiological signals on the basis of information similarity analysis, one can create more advanced algorithms, software, and hardware, and thus further facilitate disease research and promote human health.

Figure 1. Schematic diagram of an obese patient with sleep and breathing disorder [12].

Figure 2. Detection and transmission of OSA-related physiological signals with different sensors [13]. Note: HRV: heart rate variability; PTT: pulse transit time; PAT: peripheral arterial tone; SpO2: blood oxygen saturation.

Figure 3. Typical characteristics of PD patients shown in action.

Figure 4. Age-standardized prevalence of Parkinson's disease per 100,000 people by location in 2016 [16].

Figure 5. Example of the items in the UPDRS. Note: Example of UPDRS items published by Chongqing People's Hospital and Chongqing Hospital of University of Chinese Academy of Sciences.

Figure 6. Schematic workflow of current study.

Figure 7. Panel (a) shows the averaged signal for a patient at the 121 s time point, and panel (b) is the region-specific zoom-in representation from panel (a).

Figure 8. Schematic diagram of constructing the M-word sequence.

Proceedings of the International Conference on Modern Medicine and Global Health. DOI: 10.54254/2753-8818/6/20230251

Figure 9. IBS comparison results from normal and OSA patient group.

Figure 10. Results of adjacent PPI fragments in normal group (left panel) and OSA patients (right panel).

Figure 11. IBS result of plantar pressure for normal group and PD patients (statistical significance p < 0.001).

Figure 12. Result of rIBS (p < 0.001).

Table 1. Data explanation for PD study*.
*Note: Mean age, UPDRS and UPDRSM are expressed in the form of "mean ± standard deviation". UPDRS and UPDRSM are indicators that measure the severity of disease in PD patients, and UPDRSM is a sub-scale of UPDRS. The higher the two values, the more serious the disease; both indicators are 0 in healthy people.

2.2.2. Data processing. The classification parameter for the OSA data was the apnea-hypopnea index (AHI), which counts the number of apnea events of varying degrees per hour during sleep. The data were thus divided into two groups: a normal group (AHI ≤ 5, 20 subjects) and an OSA group (AHI ≥ 5, 20 subjects). In order to better analyze the pulse wave signal, this study first used a local median filter to remove noise and correct the signal, and then applied a peak detection algorithm to the PPG signal to obtain the peak-to-peak intervals (PPI), that is, the intervals between two heartbeats. Finally, the PPI sequences were divided into non-overlapping 5-minute segments for subsequent analysis.
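The OSA pre-processing chain (median filtering, peak detection, peak-to-peak intervals, and 5-minute segmentation) can be sketched as below. The paper does not give the filter width, the peak rule, or the minimum peak spacing, so the values and the simple local-maximum rule used here are assumptions, and all function names are hypothetical.

```python
from statistics import median


def median_filter(x, width=5):
    """Sliding median with an odd window; edges use a truncated window."""
    half = width // 2
    return [median(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def find_peaks(x, min_distance):
    """Indices of local maxima that are at least min_distance samples apart."""
    peaks = []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            if not peaks or i - peaks[-1] >= min_distance:
                peaks.append(i)
    return peaks


def peak_to_peak_intervals(peaks, fs):
    """PPI values in seconds and the time of the peak closing each interval."""
    times = [p / fs for p in peaks]
    ppi = [b - a for a, b in zip(times, times[1:])]
    return ppi, times[1:]


def segment_by_time(ppi, end_times, seg_seconds=300.0):
    """Group PPI values into non-overlapping windows of seg_seconds each."""
    segments = {}
    for value, t in zip(ppi, end_times):
        segments.setdefault(int(t // seg_seconds), []).append(value)
    return [segments[k] for k in sorted(segments)]
```

With the 100 Hz sampling rate stated for the PPG recordings, a call such as `find_peaks(median_filter(ppg), min_distance=30)` would reject peaks closer than 0.3 s, which is a hypothetical choice. Windows that contain no interval are simply absent from the output of `segment_by_time`, so adjacent list entries are not guaranteed to be adjacent in time if the recording has gaps.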

Table 2. IBS results from OSA.
The sample size of both the normal group and the patient group was 20. The average data length was 411.5 minutes for the normal group and 420.5 minutes for the patient group. The average IBS values measured in the normal group and the patient group were 0.3428 and 0.3172, respectively. Statistical analysis shows a significant difference between the normal group and the patient group (Figure 10, p < 0.001), with the IBS values of the patients lower than those of the normal group.

Table 3. Result of IBS study.

Table 4. Statistical result of rIBS.