Fraction of Cost vs Value Calculations
Let's say the cost per sample for GHL informatics processing is 1/x (where x ≫ 1) of the real, fully-loaded per-sample cost of a typical DIA-MS experiment. And let's say that, as an initial brute-force proxy for value, we simply count the total number of peptides quantified when considering the dark proteome + FASTA files (i.e., using GHL informatics) vs. the FASTA files alone (i.e., traditional DIA informatics). From the preprint, posters, and paper (and best summarized in Table 6 in the paper), we know that there are up to ~10x to ~20x more quantifiable peptides in the dark proteome than in the FASTA search space alone. This gives us a rough back-of-the-envelope calculation to quantify value as follows:
(~10 to ~20) / (1/x) = ~10x to ~20x increase in value for a DIA-MS experiment when processed via GHL informatics, where x ≫ 1.
However, the above calculation assumes that the peptides in the dark proteome are of equal value to those known from the FASTA search spaces. That is unlikely to be true. PTMs, such as phosphorylation, glycosylation, ubiquitination, and hundreds more, play a disproportionate role not just in biology, but also in existing FDA-approved biomarkers. Similarly, proteins resulting from single nucleotide polymorphisms (SNPs), proteins with unknown sequences, and small proteins that result from proteolytic cleavage are also disproportionately influential biologically (please see page two of the paper regarding the FDA-approved biomarker for Alzheimer's disease). So, if we assign "y" as the factor by which a peptide from the dark proteome is more likely to be "biologically/clinically valuable" than a peptide from the FASTA files alone (where y is conservatively ≫ 1), then the corrected overall value calculation is:
((~10 to ~20) / (1/x)) * y = ~10xy to ~20xy increase in value for a DIA-MS experiment when processed via GHL informatics, where x ≫ 1 and y ≫ 1.
Finally, there is arguably considerable value in the qualitative benefits of the GHL algorithm. For example, we have been told by one PI that "there does not exist a good solution for us that can process the 1000s of DIA-MS Thermo Astral data [in any reasonable timeframe] that we are hopefully soon expecting to generate." But we are confident that GHL can process large and numerous files in record time. Further, years ago, one of the senior-most analytical chemists in a collaborator's lab told us pointedly, "if I can't see [all] the visual proof, I don't believe any software's claims." GHL provides deep "non-limiting" proof (i.e., one can easily and nearly instantaneously see any MS1 and MS2 XIC of interest, not just the ones that GHL used for quantitation or identification) even for (a) large DIA-MS data files, (b) many DIA-MS files simultaneously, or (c) both. Lastly, we have recently migrated our solution to a "globally clustered" relational database. So, if a lab has collaborators in a different part of the world, no researcher accesses the GHL solution any more slowly than any other: every researcher is a first-class citizen. We collectively consider all of the above qualitative value as having a multiplicative effect on overall value, represented by "z", which we conservatively believe is >1, though each lab needs to determine for itself what the specific value of "z" is for their lab. Putting all of the above together for the calculation of the value of the GHL algorithms vs. their fractional cost of 1/x of the typical true cost of a full DIA-MS project, we get:
((~10 to ~20) / (1/x)) * y * z = ~10xyz to ~20xyz increase in value for a DIA-MS experiment when processed via GHL informatics, where x ≫ 1, y ≫ 1, and z > 1.
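
To make the arithmetic concrete, here is a minimal sketch of the full calculation in Python. The parameter values used (x = 100, y = 5, z = 2) are purely hypothetical placeholders for illustration, not measured figures; each lab would substitute its own estimates.

# Back-of-the-envelope value calculation: GHL informatics vs. traditional
# DIA informatics. All parameter values below are hypothetical placeholders.

def value_multiplier(peptide_gain: float, x: float, y: float, z: float) -> float:
    """Estimated value increase for a DIA-MS experiment processed via GHL.

    peptide_gain: fold increase in quantifiable peptides (~10 to ~20; Table 6)
    x: GHL costs 1/x of the fully-loaded per-sample experiment cost (x >> 1)
    y: factor by which a dark-proteome peptide is more likely to be
       biologically/clinically valuable than a FASTA-only peptide (y >> 1)
    z: multiplicative factor for the qualitative benefits (speed, visual
       proof, globally clustered access), z > 1
    """
    return (peptide_gain / (1.0 / x)) * y * z  # equals peptide_gain * x * y * z

# Hypothetical illustration only: x = 100, y = 5, z = 2.
for gain in (10, 20):
    print(f"~{gain}x peptides -> ~{value_multiplier(gain, 100, 5, 2):,.0f}x value")
# Output: ~10x peptides -> ~10,000x value
#         ~20x peptides -> ~20,000x value

The point of the sketch is simply that the multipliers compound: even modest values of y and z push the value-to-cost ratio far beyond the raw ~10x to ~20x peptide gain.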
