Why Maximal Separation Matters

If a biomarker panel achieves separation that is comparable to (or worse than) that of other biomarker panels, then there is a risk of commoditization (i.e., any panel would do), and consequently a lab may not be able to recover its original investment.

To minimize the likelihood of the above scenario, GoldenHaystack Lab employs a different strategy:

First and foremost, we quantify the dark proteome, which to the best of our knowledge no other lab has succeeded in doing for DIA-MS datasets despite its high biological value. (It is computationally complicated to do, and it is easy to generate spurious results.)

Second, the GoldenHaystack algorithm includes techniques to clean up the XIC feature data, which substantially improve quantitation accuracy. (Without accurate quantitation, nothing else matters; please see our preprint for a detailed analysis of quantitation accuracy.)

Third, we do not report volcano plots: they highlight single-analyte biomarkers, which for a variety of reasons tend to fail validation steps (why volcano plots mislead is outside the scope of this web page). Instead, we use sophisticated AI/ML routines to create a biomarker panel (i.e., multiple analytes) that maximally separates study conditions.

Finally, we use a number of conservative techniques to avoid overfitting our AI/ML models to the data, which we would be happy to discuss during a free consultation and pilot on your data. As an example of what all of this looks like as a final result, please see our HUPO 2025 poster.
