

Disclaimer: The software on this page is provided on an as-is basis for research purposes. No additional support is offered, and neither the author(s) nor their institutions are liable under any circumstances. All code is provided under the GNU General Public License (GPL) or as a web service, which guarantees your freedom to use the software for academic purposes.


GPU-ArraySort is a highly scalable parallel algorithm for sorting a large number of arrays using a GPU. Existing techniques focus on sorting a single large array and cannot efficiently sort a large number of smaller arrays. Such large collections of small arrays are common in many big data applications in fields such as proteomics, genomics, connectomics, and astronomy. Our algorithm performs in-place operations and makes minimal use of temporary run-time memory. Our results indicate that we can sort up to 2 million arrays of 1000 elements each within a few seconds. We compare our results with the unorthodox tagged array sorting technique based on NVIDIA's Thrust library. GPU-ArraySort outperforms the tagged array sorting technique, sorting three times more data in much less time. The tool is available at this link. Relevant publications:

  • Muaaz Gul Awan and Fahad Saeed*, "GPU-ArraySort: A parallel, in-place algorithm for sorting large number of arrays", Proceedings of Workshop on High Performance Computing for Big Data, International Conference on Parallel Processing (ICPP-2016), Philadelphia PA, August 2016 Tech Report | IEEE Xplore
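To make the comparison baseline concrete, the following is a minimal CPU-side sketch of the tagged array sorting idea in plain Python. It is an illustration under assumptions, not the code used in the paper: the function name `tagged_sort` is hypothetical, and the two-pass formulation (sort by value, then stable sort by tag) is one common way to express the technique. Note that every element carries an extra tag, which is additional memory that an in-place approach does not need.

```python
def tagged_sort(arrays):
    """Sort every array in `arrays` with one flat, tagged sort (illustrative only)."""
    # Flatten: pair each value with the index (tag) of the array it came from.
    tagged = [(tag, value) for tag, arr in enumerate(arrays) for value in arr]

    # Pass 1: sort the whole flat list by value.
    tagged.sort(key=lambda tv: tv[1])

    # Pass 2: stable sort by tag. Python's sort is stable, so elements with
    # the same tag keep the value order established in pass 1.
    tagged.sort(key=lambda tv: tv[0])

    # Unflatten back into one sorted list per original array.
    result = [[] for _ in arrays]
    for tag, value in tagged:
        result[tag].append(value)
    return result


if __name__ == "__main__":
    print(tagged_sort([[3, 1, 2], [9, 7], [], [5, 4, 6, 0]]))
```

On a GPU, the two passes would typically be key-value sorts over the flat buffers; the sketch only shows the data flow.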


MS-REDUCE is a linear-time tool that allows massive reduction in the amount of mass spectrometry data without significantly reducing the quality of peptide deduction. Our novel data-reduction strategy for the analysis of big MS data eliminates noisy peaks, as well as peaks that do not contribute to peptide deduction, before any peptide deduction is attempted. Our experiments have shown up to 100x speedup over existing state-of-the-art noise elimination algorithms while maintaining comparably high-quality matches. Using our approach, we were able to process a million spectra in just under an hour on a moderate server, which is especially useful in high-throughput environments. The algorithm has been implemented in Java, and the code and associated data sets are available on GitHub at this link. Relevant publications:

  • Muaaz Awan and Fahad Saeed*, "MS-REDUCE: An ultrafast technique for reduction of Big Mass Spectrometry Data for high-throughput processing", Oxford Bioinformatics, Jan 2016 Tech Report | PubMed | Oxford
  • Muaaz Awan and Fahad Saeed*, "On the sampling of Big Mass Spectrometry Data", Proceedings of Bioinformatics and Computational Biology (BICoB) Conference, Honolulu Hawaii, March 2015 Tech Report

ParaDSRC is a high-performance tool for compressing next generation sequencing data on distributed-memory clusters. It uses domain decomposition and the Message Passing Interface (MPI) to distribute data across distributed-memory compute nodes. Our implementation gives near-linear speedups for most data sets, with some evidence of super-linear speedups for some data sets. We report experimental results for up to 1 terabyte (TB) of data. The algorithm has been implemented using C/C++ and MPI, and the code is available on GitHub at this link. Relevant publications:

  • Sandino N. V. Perez and Fahad Saeed*,  "A Parallel Algorithm for Compression of Big Next Generation Sequencing Datasets", IEEE International Workshop on Parallelism in Bioinformatics (PBio), Proceedings of Parallel and Distributed Processing with Applications (IEEE ISPA-15), Helsinki Finland, August 2015 Tech Report
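The domain decomposition step can be illustrated with a small sketch that computes which contiguous block of records each process would own. This is a generic block partition written in Python for readability, not the ParaDSRC code (which is C/C++ with MPI); the function name `block_bounds` and its parameters are hypothetical.

```python
def block_bounds(n_items, n_ranks, rank):
    """Return the half-open range [start, stop) of items owned by `rank`.

    Items are split into contiguous blocks whose sizes differ by at most
    one; the first `n_items % n_ranks` ranks receive one extra item.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    # Example: 10 records shared among 3 processes.
    for r in range(3):
        print(r, block_bounds(10, 3, r))
```

In an MPI program each process would call such a function with its own rank and then read and compress only its block. A real implementation would also need to align block boundaries with record boundaries in the input file.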


PhosSA is a program for phosphorylation site assignment of LC-MS/MS data. It uses a linear-time, linear-space dynamic programming strategy for phosphorylation site assignment. The algorithm optimizes an objective function defined as the sum of the intensities of peaks that are associated with theoretical peptide fragmentation ions. A classifier introduced in the algorithm exploits the specific characteristics of mass spectrometry data to distinguish between correctly and incorrectly assigned site(s). The algorithm has been implemented in Java. An executable and instructions for using the software can be downloaded at this link. Relevant publications:
  • Fahad Saeed, Trairak Pisitkun, Jason Hoffert, Guanghui Wang, Marjan Gucek, and Mark Knepper, "An Efficient Dynamic Programming Algorithm for Phosphorylation Site Assignment of Large-Scale Mass Spectrometry Data", accepted in International Workshop on Computational Proteomics, proceedings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Philadelphia USA, Oct 2012 IEEE Xplore | PubMed
  • Fahad Saeed*, Trairak Pisitkun, Jason D. Hoffert, Sara Rashidian, Guanghui Wang, Marjan Gucek, and Mark A. Knepper, "PhosSA: Fast and Accurate Phosphorylation Site Assignment Algorithm for Mass Spectrometry Data", Proteome Science Volume 11, Supplement 1, November 2013 Proteome Science | PubMed
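The dynamic programming idea can be sketched in a heavily simplified form. The code below is an assumption-laden illustration, not the PhosSA implementation: it supposes that the matched peak intensity has already been tabulated as a hypothetical table `gain[i][j]`, the intensity matched by the fragment ions associated with residue position `i` when the first `i + 1` residues carry `j` phosphate groups. The names `best_site_assignment`, `can_phos`, and `gain` are all made up for this example.

```python
def best_site_assignment(can_phos, gain, k):
    """Choose k phosphorylation sites maximizing total matched intensity.

    can_phos[i] -- True if residue i may carry a phosphate (e.g. S, T, Y).
    gain[i][j]  -- hypothetical precomputed matched intensity at position i
                   when residues 0..i carry j phosphates in total.
    Returns (best_score, sorted list of chosen residue indices).
    """
    neg_inf = float("-inf")
    best = [0.0] + [neg_inf] * k   # best[j]: best score so far with j sites placed
    choices = []                   # choices[i][j]: was residue i chosen on the best path?
    for i in range(len(can_phos)):
        new = [neg_inf] * (k + 1)
        took = [False] * (k + 1)
        for j in range(k + 1):
            cand = best[j]         # option 1: residue i is not phosphorylated
            if can_phos[i] and j > 0 and best[j - 1] > cand:
                cand = best[j - 1]  # option 2: residue i is phosphorylated
                took[j] = True
            if cand > neg_inf:
                new[j] = cand + gain[i][j]
        best = new
        choices.append(took)
    if best[k] == neg_inf:
        raise ValueError("cannot place k phosphates on the candidate sites")
    # Trace back the chosen sites from the last residue to the first.
    sites, j = [], k
    for i in range(len(can_phos) - 1, -1, -1):
        if choices[i][j]:
            sites.append(i)
            j -= 1
    return best[k], sites[::-1]
```

For a fixed number of phosphate groups `k`, the loop visits each residue once and keeps only one row of scores plus the traceback flags, so time and memory grow linearly with peptide length in this sketch.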


CPhos is a program to calculate and visualize evolutionarily conserved phosphorylation sites. CPhos utilizes an information theory-based algorithm to assess the conservation of phosphorylation sites among species. The conservation score obtained with this approach can be used to assess the potential functional significance of a particular phosphorylation site. A web service and an executable are available from this link. Relevant publications:
  • Boyang Zhao, Trairak Pisitkun, Jason D. Hoffert, Mark A. Knepper, and Fahad Saeed, "CPhos: A Program to Calculate and Visualize Evolutionarily Conserved Functional Phosphorylation Sites", Wiley PROTEOMICS, August 2012 Wiley | PubMed
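As an illustration of what an information theory-based conservation measure can look like, the sketch below scores one aligned position by its normalized Shannon entropy. The exact scoring used by CPhos is not described on this page, so this is a generic stand-in; the function name `column_conservation` and the default alphabet size of 20 amino acids are assumptions.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Score one alignment column between 0 (fully mixed) and 1 (fully conserved).

    `column` holds the residue observed at one aligned position in each
    species, e.g. "SSST". The score is 1 minus the Shannon entropy of the
    residue frequencies divided by the maximum entropy log2(alphabet_size).
    """
    if not column:
        raise ValueError("column must contain at least one residue")
    total = len(column)
    counts = Counter(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)


if __name__ == "__main__":
    for col in ("SSSS", "SSST", "STAY"):
        print(col, round(column_conservation(col), 3))
```

A position where every species has the same residue scores 1, and the score falls as the residues diverge.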


NHLBI-AbDesigner is a tool for analyzing the amino acid sequence of a given protein to identify optimal immunizing peptides for production of antibodies. NHLBI-AbDesigner displays the information needed for choice of immunizing peptides, allowing the user to recognize trade-offs between immunogenicity, specificity, animal species targets, and post-translational modifications. A web-service for the algorithm is available. Relevant publications:
  • Trairak Pisitkun, Jason D. Hoffert, Fahad Saeed and Mark Knepper, "NHLBI-AbDesigner: An online tool for design of peptide-directed antibodies", American Journal of Physiology (AJP), September 2011, (doi:10.1152/ajpcell.00325.2011) AJP | PubMed

TPM algorithm

The TPM algorithm clusters time-series data sets, in particular iTRAQ LC-MS/MS data sets. Data points that behave similarly over the time course are clustered together. A web service for the algorithm is available. Relevant publications:
  • Fahad Saeed, Trairak Pisitkun, Mark A. Knepper, and Jason D. Hoffert, "Mining Temporal Patterns from iTRAQ Mass Spectrometry (LC-MS/MS) Data", in Proceedings of Bioinformatics and Computational Biology Conference (BICoB), pp 152-159, New Orleans USA, March 23-25, 2011 (ISBN: 978-1-880843-81-9). arXiv:1104.5510v1
  • Fahad Saeed, J. Hoffert, P. Pisitkun, M. Knepper, "Mapping-based temporal pattern mining algorithm identifies unique clusters of phosphopeptides regulated by vasopressin in collecting duct", meeting abstracts, Experimental Biology (EB), April 2011, Washington DC, USA
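One simple way to group time courses with similar behavior is to map each series to a string describing its direction of change at every step and to collect series that share the same string. The sketch below shows that idea only; how TPM actually defines similarity is not stated on this page, so the mapping, the tolerance parameter `tol`, and the function names are assumptions.

```python
from collections import defaultdict


def trend_pattern(series, tol=0.0):
    """Map a time series to a string of U (up), D (down), S (steady) steps."""
    pattern = []
    for prev, curr in zip(series, series[1:]):
        delta = curr - prev
        if delta > tol:
            pattern.append("U")
        elif delta < -tol:
            pattern.append("D")
        else:
            pattern.append("S")
    return "".join(pattern)


def cluster_by_pattern(profiles, tol=0.0):
    """Group named time series that share the same trend pattern."""
    clusters = defaultdict(list)
    for name, series in profiles.items():
        clusters[trend_pattern(series, tol)].append(name)
    return dict(clusters)


if __name__ == "__main__":
    # Hypothetical abundance ratios for three peptides at three time points.
    profiles = {"p1": [1, 2, 3], "p2": [0.5, 0.9, 2.0], "p3": [3, 2, 2]}
    print(cluster_by_pattern(profiles))
```

Raising `tol` treats small changes as steady, which merges series that differ only by noise.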

P-Pyro-Align algorithm

P-Pyro-Align is an open-source parallel algorithm for multiple alignment of pyrosequencing reads from multiple genomes. The proposed alignment algorithm accurately aligns the erroneous reads, and the accuracy of the alignment is confirmed by the consensus obtained from the multiple alignments. The algorithm uses domain decomposition for parallel computation of the local multiple alignments and a novel merging technique for global alignment of the reads. The proposed algorithm shows super-linear speedups for large numbers of reads. Note that the algorithm performs multiple alignment of reads coming from different strains of genomes, which cannot be handled by mapping the reads to a reference genome. The code has been implemented using C/C++ and the MPI library (Download 375 KB). Relevant publications:

  • Fahad Saeed, Alan Perez-Rathke, Jarek Gwarnicki, Tanya Y. Berger-Wolf and Ashfaq Khokhar, "High performance multiple sequence alignment system for pyrosequencing reads from multiple genomes" Journal of Parallel and Distributed Computing (JPDC) August 2011 (10.1016/j.jpdc.2011.08.001) JPDC
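The consensus check mentioned above can be illustrated with a short sketch that derives a consensus sequence from already-aligned reads by a column-wise majority vote. This is a common textbook formulation and an assumption here, not the P-Pyro-Align code; the function name `consensus` and the use of `-` as the gap symbol are hypothetical choices.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Return the majority-vote consensus of equal-length aligned reads.

    Each column contributes its most frequent symbol; columns where the
    gap symbol wins are dropped from the consensus.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    out = []
    for column in zip(*aligned_reads):
        winner, _count = Counter(column).most_common(1)[0]
        if winner != gap:
            out.append(winner)
    return "".join(out)


if __name__ == "__main__":
    reads = ["ACG-T", "ACGAT", "AC--T", "TCG-T"]
    print(consensus(reads))
```

Comparing such a consensus against a known sequence is one way to judge how well a set of erroneous reads has been aligned.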

    Pyro-Align is an open-source, computationally efficient method based on domain decomposition for multiple alignment of a large number of pyrosequencing reads. The proposed alignment algorithm accurately aligns the erroneous reads, and the accuracy of the alignment is confirmed by the consensus obtained from the multiple alignments. Functions are provided to align the reads in the presence of a wild-type reference genome. A proof-of-concept Java program with a command-line interface is available for non-programmers (Download 32 KB). Relevant publications:

    • Fahad Saeed, Ashfaq Khokhar, Osvaldo Zagordi and Niko Beerenwinkel. "Multiple Sequence Alignment System for Pyrosequencing Reads" Bioinformatics and Computational Biology (BICoB) conference, LNBI 5462, pp 362-375, 2009. arXiv:0901.2753 | Springer    
    • Fahad Saeed, "Pyro-Align: Sample-Align based Multiple Alignment system for Pyrosequencing Reads of Large Number", Technical Report, Beerenwinkel Group Computational Biology, Department of Biosystems Science and Engineering, ETH Zurich Switzerland, August 2008. arXiv:0901.2751