Zhu, Weimin; Marshall, John; Smith, Christopher; Zhang, Rulin
Amino acid sequence pattern matching

A method for locating pattern matches in amino acid samples by use of sequential filters capable of determining inner-sample pattern matches, inner-group pattern matches, and word matches for purposes of further analysis or data mining. The filters include a scoring scheme, a comparison of scan numbers against the sequence of common ions selected for MS/MS, and daughter ion subtraction for obtaining pattern match candidates.


1. A method for pattern matching between samples of amino acids comprising:
analyzing a sample mass by ionization and mass spectrometry;
merging ions from said sample mass in an inner-sample to form a first group and a control group;
providing a means for matching ions of a similar pattern wherein said means for matching includes comparing samples according to molecular weight by awarding a first Score if a particular daughter ion less a list of total daughter ions is ≦±0.25 Dalton, a second Score if said daughter ion less said list of total daughter ions is ≦±0.50 Dalton, a third Score if said daughter ion less said list of total daughter ions is ≦±0.75 Dalton, and a fourth Score if said daughter ion less said list of total daughter ions is ≦±1.00 Dalton, wherein a pattern match candidate is located if S>t_score; Total_Target>Td and Total_Query>Td; and 1≦Total_Query or Total_Query/Total_Target≦2; whereby the comparison has the following values:
a. S: cumulative score;
b. Q_ratio: matched/Total_Query ratio;
c. T_ratio: matched/Total_Target ratio;
d. t_score = 3*((Q_ratio+T_ratio)/2) * an acceptable ratio of matched over total;
and
creating an output data file for use in data mining.
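The tiered scoring and candidate test of claim 1 can be sketched as follows. This is only an illustration under stated assumptions: the claim names four scores without giving their magnitudes, so the values in `TIERS` are hypothetical, as are the nearest-unused one-to-one matching strategy and the function names. `td` and `accept_ratio` stand for the threshold Td and the "acceptable ratio of matched over total", neither of which is given a value in the claim. Both daughter-ion lists are assumed to be non-empty.

```python
# Hypothetical tier values: the claim names four scores but gives no magnitudes.
TIERS = ((0.25, 4), (0.50, 3), (0.75, 2), (1.00, 1))


def tier_score(delta):
    """Score an absolute mass difference (Dalton); 0 means no tier applies."""
    for limit, score in TIERS:
        if delta <= limit:
            return score
    return 0


def score_pattern(query_masses, target_masses):
    """Return (S, matched) using nearest-unused, one-to-one matching (an assumption)."""
    targets = sorted(target_masses)
    used = [False] * len(targets)
    cumulative, matched = 0, 0
    for q in sorted(query_masses):
        best = None  # (difference, index) of the closest unused target ion
        for idx, t in enumerate(targets):
            diff = abs(q - t)
            if not used[idx] and diff <= TIERS[-1][0]:
                if best is None or diff < best[0]:
                    best = (diff, idx)
        if best is not None:
            used[best[1]] = True
            cumulative += tier_score(best[0])
            matched += 1
    return cumulative, matched


def is_candidate(s, matched, total_query, total_target, td, accept_ratio):
    """Apply the three claimed conditions; td and accept_ratio are caller-supplied."""
    q_ratio = matched / total_query
    t_ratio = matched / total_target
    t_score = 3 * ((q_ratio + t_ratio) / 2) * accept_ratio
    size_ok = total_target > td and total_query > td
    # Third condition taken literally from the claim text as printed.
    balance_ok = 1 <= total_query or total_query / total_target <= 2
    return s > t_score and size_ok and balance_ok
```

A caller would pass the two daughter-ion mass lists to `score_pattern` and then feed the resulting S and match count, together with the list lengths, to `is_candidate`.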
2. The method for pattern matching according to claim 1 including the step of merging daughter ions ejected within the range of about ±0.3 Dalton and recording all ions represented by only one charge state.
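One possible reading of the merging step in claim 2 is sketched below. The claim does not say how merged ions are combined, so the intensity-weighted mean mass, the summed intensity, and the name `merge_ions` are all assumptions made for illustration. Ions are taken to be `(mass, intensity)` pairs.

```python
def merge_ions(ions, tol=0.3):
    """Merge (mass, intensity) pairs whose masses lie within tol Dalton of the
    running merged mass. The combination rule is an assumption, not the claim's."""
    merged = []
    for mass, intensity in sorted(ions):
        if merged and mass - merged[-1][0] <= tol:
            prev_mass, prev_intensity = merged[-1]
            total = prev_intensity + intensity
            if total > 0:
                # Intensity-weighted mean mass (assumed combination rule).
                new_mass = (prev_mass * prev_intensity + mass * intensity) / total
            else:
                new_mass = (prev_mass + mass) / 2
            merged[-1] = (new_mass, total)
        else:
            merged.append((mass, intensity))
    return merged
```

For example, two ions at 100.0 and 100.2 Dalton with equal intensity would collapse into a single entry near 100.1 Dalton, while an ion at 150.0 Dalton would stay separate.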
3. The method for pattern matching according to claim 1 including the step of creating data directories and regenerating DTA files.
4. The method for pattern matching according to claim 1 wherein said means for matching further includes a means for comparing scan numbers to sequence of common ions to be MS/MS.
5. The method for pattern matching according to claim 1 wherein said means for matching further includes a means for daughter ion subtraction.
6. The method for pattern matching according to claim 3 including a means for word matching of said files.
7. A method for pattern matching unique sequences in multiple samples of amino acids comprising the steps of:
a. analyzing a sample mass by ionization and mass spectrometry;
b. creating a working file for each ion analyzed from said sample mass, said working file including a DTA file name, scan number, parent mass and charge state, and a list of daughter ion masses and intensity pairs;
c. query each working file to obtain the parent mass (Mqi) and charge state (Cqi) for each parent ion (Qi);
d. query a target working file with said Mqi and said Cqi to obtain target DTA list (T1-Tm) having a parent mass (Mt1-Mtm) drop in the range of Mqi ±1.5 Dalton and a charge state (Ct1-Ctm) equal to Cqi;
e. comparing Qi with each of the Tj in T1-Tm list of daughter patterns, wherein daughter ions are a match if |Qik-Tjg|≦±Dalton, where 1≦g≦n and wherein said match is awarded a first Score if |Qik-Tjg|≦±0.25 Dalton, a second Score if |Qik-Tjg|≦±0.50 Dalton, a third Score if |Qik-Tjg|≦±0.75 Dalton, and a fourth Score if |Qik-Tjg|≦±1.00 Dalton, wherein the comparison is considered a pattern match candidate, to be included in a list of matched candidates QT, if
i. S>t_score;
ii. Total_Target>Td and Total_Query>Td; and
iii. 1≦Total_Query or Total_Query/Total_Target≦2; and
f. removing daughter ions that match, wherein said pattern match candidate can be used directly or used for further analysis in data mining.
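Steps b through d and step f of claim 7 can be illustrated with a small sketch. The record layout follows the fields listed in step b, but the class name `WorkingRecord`, the function names, and the in-memory representation are hypothetical. The match tolerance of 1.0 Dalton used for removal is an assumption taken from the widest scored tier, since the printed claim leaves the general match tolerance blank.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingRecord:
    """One working-file entry per analyzed ion (fields from step b)."""
    dta_name: str
    scan: int
    parent_mass: float
    charge: int
    daughters: list = field(default_factory=list)  # (mass, intensity) pairs


def find_targets(query, targets, window=1.5):
    """Step d: targets with equal charge and parent mass within +/- window Dalton."""
    return [
        t for t in targets
        if t.charge == query.charge
        and abs(t.parent_mass - query.parent_mass) <= window
    ]


def subtract_matched(query, target, tol=1.0):
    """Step f: query daughter ions left after removing those matching the target.
    tol is an assumed tolerance; the claim text does not state it."""
    target_masses = [mass for mass, _ in target.daughters]
    return [
        (mass, intensity) for mass, intensity in query.daughters
        if not any(abs(mass - tm) <= tol for tm in target_masses)
    ]
```

Filtering on charge state and parent mass first keeps the more expensive daughter-ion comparison limited to plausible targets.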
8. The method of pattern matching according to claim 7, wherein said working file is formed by merging all daughter ions ejected within the range of ±0.3 Dalton for each sample and recording DTA's represented by only one charge state.
9. A method for pattern matching unique sequences in multiple samples of amino acids comprising the steps of:
a. analyzing a sample mass by ionization and mass spectrometry;
b. creating a working file for each ion analyzed from said sample mass, said working file including a DTA file name, scan number, parent mass and charge state, and a list of daughter ion masses and intensity pairs;
c. query each working file to obtain the parent mass (Mqi) and charge state (Cqi) for each parent ion (Qi);
d. query a target working file with said Mqi and said Cqi to obtain target DTA list (T1-Tm) having a parent mass (Mt1-Mtm) drop in the range of Mqi±1.5 Dalton and a charge state (Ct1-Ctm) equal to Cqi;
e. comparing Qi with each of the Tj in T1-Tm list of daughter patterns, wherein daughter ions are a match if |Qik-Tjg|≦±Dalton, where 1≦g≦n and wherein said match is awarded a first Score if |Qik-Tjg|≦±0.25 Dalton, a second Score if |Qik-Tjg|≦±0.50 Dalton, a third Score if |Qik-Tjg|≦±0.75 Dalton, and a fourth Score if |Qik-Tjg|≦±1.00 Dalton, wherein the comparison is considered a pattern match candidate, to be included in a list of matched candidates QT, if
i. S>t_score;
ii. Total_Target>Td and Total_Query>Td; and
iii. 1≦Total_Query or Total_Query/Total_Target≦2;
f. removing daughter ions that match; and
g. calculating DQT and comparing to a standard DQT between said samples by trend; wherein said pattern match candidate can be used directly or used for further analysis in data mining.
10. A method for pattern matching unique sequences in multiple samples of amino acids comprising the steps of:
a. analyzing a sample mass by ionization and mass spectrometry;
b. creating a working file for each ion analyzed from said sample mass, said working file including a DTA file name, scan number, parent mass and charge state, and a list of daughter ion masses and intensity pairs;
c. query each working file to obtain the parent mass (Mqi) and charge state (Cqi) for each parent ion (Qi);
d. query a target working file with said Mqi and said Cqi to obtain target DTA list (T1-Tm) having a parent mass (Mt1-Mtm) drop in the range of Mqi±1.5 Dalton and a charge state (Ct1-Ctm) equal to Cqi;
e. comparing Qi with each of the Tj in T1-Tm list of daughter patterns, wherein daughter ions are a match if |Qik-Tjg|≦±Dalton, where 1≦g≦n and wherein said match is awarded a first Score if |Qik-Tjg|≦±0.25 Dalton, a second Score if |Qik-Tjg|≦±0.50 Dalton, a third Score if |Qik-Tjg|≦±0.75 Dalton, and a fourth Score if |Qik-Tjg|≦±1.00 Dalton, wherein the comparison is considered a pattern match candidate, to be included in a list of matched candidates QT, if
i. S>t_score;
ii. Total_Target>Td and Total_Query>Td; and
iii. 1≦Total_Query or Total_Query/Total_Target≦2;
f. removing daughter ions that match; and
g. comparing DQT to FQT between a sample by distance; wherein said pattern match candidate can be used directly or used for further analysis in data mining.
11. The method of pattern matching according to claim 7, including the step of pairing samples, wherein the total number of comparisons is defined as: ##EQU2##
12. The method of pattern matching according to claim 7 including the step of inserting matched patterns into an MS support software program for use in word match comparison of amino acid ions.
13. The method of pattern matching according to claim 7 including the step of reconstructing DTA directories for use in data mining.
14. A method for pattern matching unique sequences in multiple samples of amino acids comprising the steps of:
a. analyzing a sample mass by ionization and mass spectrometry;
b. creating a working file for each ion analyzed from said sample mass, said working file including a DTA file name, scan number, parent mass and charge state, and a list of daughter ion masses and intensity pairs;
c. query each working file to obtain the parent mass (Mqi) and charge state (Cqi) for each parent ion (Qi);
d. query a target working file with said Mqi and said Cqi to obtain target DTA list (T1-Tm) having a parent mass (Mt1-Mtm) drop in the range of Mqi±1.5 Dalton and a charge state (Ct1-Ctm) equal to Cqi;
e. comparing Qi with each of the Tj in T1-Tm list of daughter patterns, wherein daughter ions are a match if |Qik-Tjg|≦±Dalton, where 1≦g≦n and wherein said match is awarded a first Score if |Qik-Tjg|≦±0.25 Dalton, a second Score if |Qik-Tjg|≦±0.50 Dalton, a third Score if |Qik-Tjg|≦±0.75 Dalton, and a fourth Score if |Qik-Tjg|≦±1.00 Dalton, wherein the comparison is considered a pattern match candidate, to be included in a list of matched candidates QT, if
i. S>t_score;
ii. Total_Target>Td and Total_Query>Td; and
iii. 1≦Total_Query or Total_Query/Total_Target≦2;
f. removing daughter ions that match; and
g. recording common ions having matched candidates between Qi and Tj and separate unmatched candidates into separate files; wherein said pattern match candidate can be used directly or used for further analysis in data mining.
15. The method of pattern matching according to claim 14 including the step of clustering pattern matches by their parent mass and charge state having a variation of ±1.5 Dalton.
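The clustering step of claim 15 might look like the sketch below. The claim states only that clusters share a charge state and vary by ±1.5 Dalton in parent mass, so anchoring each cluster at its lowest mass, the `(parent_mass, charge)` input tuples, and the name `cluster_matches` are assumptions made for illustration.

```python
from collections import defaultdict


def cluster_matches(matches, window=1.5):
    """Group (parent_mass, charge) pairs into clusters of equal charge whose
    masses lie within window Dalton of the cluster's lowest mass (assumed rule)."""
    by_charge = defaultdict(list)
    for mass, charge in matches:
        by_charge[charge].append(mass)

    clusters = []
    for charge in sorted(by_charge):
        current = []
        for mass in sorted(by_charge[charge]):
            # Start a new cluster once the mass leaves the window of the anchor.
            if current and mass - current[0] > window:
                clusters.append((charge, current))
                current = []
            current.append(mass)
        if current:
            clusters.append((charge, current))
    return clusters
```

Other anchoring rules, such as a running cluster mean, would give different boundaries for closely spaced masses; the claim does not settle this choice.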
16. A method for pattern matching unique sequences in multiple samples of amino acids comprising the steps of:
a. analyzing a sample mass by ionization and mass spectrometry;
b. creating a working file for each ion analyzed from said sample mass, said working file including a DTA file name, scan number, parent mass and charge state, and a list of daughter ion masses and intensity pairs;
c. comparing scan numbers to a sequence of common ions on a linear scale;
d. query each working file to obtain the parent mass (Mqi) and charge state (Cqi) for each parent ion (Qi);
e. query a target working file with said Mqi and said Cqi to obtain target DTA list (T1-Tm) having a parent mass (Mt1-Mtm) drop in the range of Mqi±1.5 Dalton and a charge state (Ct1-Ctm) equal to Cqi;
f. comparing Qi with each of the Tj in T1-Tm list of daughter patterns, wherein daughter ions are a match if |Qik-Tjg|≦±Dalton, where 1≦g≦n and wherein said match is awarded a first Score if |Qik-Tjg|≦±0.25 Dalton, a second Score if |Qik-Tjg|≦±0.50 Dalton, a third Score if |Qik-Tjg|≦±0.75 Dalton, and a fourth Score if |Qik-Tjg|≦±1.00 Dalton, wherein the comparison is considered a pattern match candidate, to be included in a list of matched candidates QT, if
i. S>t_score;
ii. Total_Target>Td and Total_Query>Td; and
iii. 1≦Total_Query or Total_Query/Total_Target≦2; and
g. removing daughter ions that match, wherein said pattern match candidate can be used directly or used for further analysis in data mining.