The following are codes that have been built and are maintained by the HPC Team.
An RSS feed at https://hpc.unt.edu/softwareupdate is updated whenever software is added or updated on Talon2.
If you would like to request a software program that is not listed below, please fill out our Software Request Form at https://hpc.unt.edu/software-requests
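As a minimal sketch of how the RSS feed mentioned above could be read programmatically, the following Python snippet lists the entries of an RSS 2.0 document. The sample feed text and the function name list_updates are hypothetical, and the assumption that the Talon2 feed uses standard RSS 2.0 item elements has not been verified against the live feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real feed's contents are not shown on this page.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Software updates</title>
    <item>
      <title>example-package 1.0 added</title>
      <pubDate>Mon, 01 Jan 2018 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>example-package 1.1 updated</title>
      <pubDate>Tue, 02 Jan 2018 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def list_updates(feed_xml):
    """Return (title, pubDate) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    updates = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing.
        title = item.findtext("title", default="")
        date = item.findtext("pubDate", default="")
        updates.append((title, date))
    return updates


for title, date in list_updates(SAMPLE_FEED):
    print(date, "-", title)
```

To read the live feed instead of the sample, the bytes returned by urllib.request.urlopen("https://hpc.unt.edu/softwareupdate").read() could be passed to list_updates, assuming the feed is reachable from your machine.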
"Amber" refers to two things: a set of molecular mechanical force fields for the simulation of biomolecules (which are in the public domain, and are used in a variety of simulation programs); and a package of molecular simulation programs which includes source code and demos.
Compiled using the Intel compiler and its MKL math libraries. Serial, parallel (openmpi-1.6.5/intel14), and CUDA (5.0.35) versions are available.
AmberTools is distributed in source code format, and must be compiled in order to be used. You will need C, C++ and Fortran90 compilers.
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
Blat produces two major classes of alignments:
For compilation and build notes, visit http://genome.ucsc.edu/admin/git.html
Boost provides free peer-reviewed portable C++ source libraries.
We emphasize libraries that work well with the C++ Standard Library. Boost libraries are intended to be widely useful, and usable across a broad spectrum of applications. The Boost license encourages both commercial and non-commercial use.
Compiled using gcc.
Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
Bowtie is a prerequisite for TopHat.
CafeMol is a general-purpose coarse-grained (CG) biomolecular modeling and simulation software. It can simulate proteins, nucleic acids, lipids and their mixtures with various CG models.
Compiled with intel-mpif90
An open-source modeling system for multi-scale integrated assessment of gaseous and particulate air pollution.
Circos is a software package for visualizing data and information. It visualizes data in a circular layout — this makes Circos ideal for exploring relationships between objects or positions. Circos is ideal for creating publication-quality infographics and illustrations with a high data-to-ink ratio, richly layered data and pleasant symmetries.
CMake is an open-source, cross-platform family of tools designed to build, test and package software. CMake is used to control the software compilation process using simple platform and compiler independent configuration files, and generate native makefiles and workspaces that can be used in the compiler environment of your choice. The suite of CMake tools was created by Kitware in response to the need for a powerful, cross-platform build environment for open-source projects such as ITK and VTK.
Installed using gcc.
CP2K is a program to perform atomistic and molecular simulations of solid state, liquid, molecular, and biological systems. It provides a general framework for different methods such as density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW) and classical pair and many-body potentials.
Serial version is compiled using gfortran.
Parallel version is compiled using intel/mkl (13.1).
The CPMD code is a parallelized plane wave / pseudopotential implementation of Density Functional Theory, particularly designed for ab-initio molecular dynamics.
CPMD is jointly copyrighted by IBM Corporation and Max-Planck Institut, Stuttgart.
Parallel version is compiled with Intel/MKL 13.1.
CUDA is a parallel computing platform and programming model that interfaces the CPU with the graphics processing unit (GPU).
Cufflinks is a reference-guided assembler for RNA-Seq experiments. It simultaneously assembles transcripts from reads and estimates their relative abundances, without using a reference annotation.
Compiled using the GCC suite.
dDocent’s purpose is to be a standalone laboratory protocol and analysis pipeline for double digest Restriction site Associated DNA (ddRAD) sequencing (the pipeline should also work with ezRAD). The laboratory protocol largely follows Peterson et al. (2012), but is focused down to specifically what has worked best for us in the Gold lab.
Self-retrieving and self-installing, built with gcc.
DFTB+ is a fast and efficient versatile quantum mechanical simulation package.
DIRAC is a powerful electronic structure program capable of applying various relativistic treatments to atomic and molecular systems. This includes a full 4-component Dirac Hamiltonian, many types of approximate 2-component methods, and more. DIRAC can combine relativistic Hamiltonians with methods such as HF, DFT, and various other electron correlation methods.
DL_POLY is a general purpose classical molecular dynamics (MD) simulation software developed at Daresbury Laboratory by I.T. Todorov and W. Smith.
Quantum Espresso is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.
FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory.
FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST). We believe that FFTW, which is free software, should become the FFT library of choice for most applications.
Intel 13, icc/ifort compiled with openmpi/intel/mlx1.6.5 support.
Intel 14, icc/ifort compiled with openmpi/intel/14/mlx/1.6.5 support. (this is the recommended version).
The General Atomic and Molecular Electronic Structure System (GAMESS) is a general ab initio quantum chemistry package.
Gaussian 09 is the latest version of the Gaussian® series of electronic structure programs, used by chemists, chemical engineers, biochemists, physicists and other scientists worldwide. Starting from the fundamental laws of quantum mechanics, Gaussian 09 predicts the energies, molecular structures, vibrational frequencies and molecular properties of molecules and reactions in a wide variety of chemical environments. Gaussian 09’s models can be applied to both stable species and compounds which are difficult or impossible to observe experimentally (e.g., short-lived intermediates and transition structures).
Gaussian 09 provides the most advanced modeling capabilities available today, and it includes many new features and enhancements which significantly expand the range of problems and systems which can be studied. With Gaussian 09, you can model larger systems and more complex problems than ever before, even on modest computer hardware.
Gaussian runs in shared memory on a single host.
The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages (libstdc++, libgcj,...). GCC is a key component of the GNU toolchain. GCC has been ported to a wide variety of processor architectures, and is widely deployed as a tool in the development of both free and proprietary software. GCC is also available for most embedded platforms including Symbian, AMCC and Freescale Power Architecture-based chips. It was originally written as the compiler for the GNU operating system.
GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
The Makefile was generated with cmake (2.8), using gfortran.
HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.
HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.
IM is a program, written with Rasmus Nielsen, for the fitting of an isolation model with migration to haplotype data drawn from two closely related species or populations. IM is based on a method originally developed by Rasmus Nielsen and John Wakeley (Nielsen and Wakeley 2001 Genetics 158:885). Large numbers of loci can be studied simultaneously, and different mutation models can be used.
Compiled with gfortran for single-node (SMP) parallelization only.
JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS. JAGS was written with three aims in mind:
- To have a cross-platform engine for the BUGS language
- To be extensible, allowing users to write their own functions, distributions and samplers.
- To be a platform for experimentation with ideas in Bayesian modelling
JAGS is licensed under the GNU General Public License.
Also available within the R package rjags by loading the library as follows:
library(rjags)
Then you should see:
Loading required package: coda
Loading required package: lattice
Linked to JAGS 3.4.0
Loaded modules: basemod,bugs
LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.
MaSuRCA is whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches. MaSuRCA can assemble data sets containing only short reads from Illumina sequencing or a mixture of short reads and long reads (Sanger, 454).
Compiled with gcc/g++. Will only run parallel threads on a single compute node.
The following parameters are mandatory:
set it to the number of cores in the computer to be used for assembly.sh
The number of threads should match the number you request in your UGE parallel environment "-pe openmpi_16 16"
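The check described above can be sketched in Python. This is an illustrative helper only, not part of MaSuRCA or UGE: the function name threads_for_job is hypothetical, and it assumes the job runs under a Grid Engine scheduler, which exports the granted slot count in the NSLOTS environment variable.

```python
import os


def threads_for_job(requested_threads, environ=None):
    """Return the thread count to use after comparing it with the scheduler's slots.

    NSLOTS is assumed to be set by Grid Engine to the number of slots
    granted through the parallel environment request (for example 16
    for "-pe openmpi_16 16").
    """
    env = os.environ if environ is None else environ
    slots = env.get("NSLOTS")
    if slots is None:
        # Not running inside a scheduled job, so there is nothing to compare.
        return requested_threads
    if int(slots) != requested_threads:
        raise ValueError(
            "thread count %d does not match the %s slots granted by the scheduler"
            % (requested_threads, slots)
        )
    return requested_threads


# Example: a job submitted with "-pe openmpi_16 16" and a 16-thread configuration.
print(threads_for_job(16, {"NSLOTS": "16"}))
```

Running such a check before starting the assembly catches a mismatch early instead of oversubscribing or underusing the compute node.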
Mathematica is a computational software program used in many scientific, engineering, mathematical and computing fields. It was conceived by Stephen Wolfram and is developed by Wolfram Research of Champaign, Illinois.
MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications. The language, tools, and built-in math functions enable you to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java™.
The Linux distribution of MATLAB is installed. It includes the distributed computing server, which enables parallel execution of code.
MIRA 4 is able to perform true hybrid de-novo assemblies using reads gathered through Sanger, 454, Solexa, IonTorrent or PacBio sequencing technologies. That is, it assembles reads instead of a mix of (eventually shredded) consensus sequence and reads. The documentation introduction shows an example of how this looks for Sanger and 454, but it also works with any other combination of sequencing technologies. The only restriction at the moment: reads must be <= 32 kilobases, and for PacBio, MIRA must get CCS reads or error-corrected CLR data.
Distributed as a statically compiled binary.
Intel® Math Kernel Library (Intel® MKL) accelerates math processing routines that increase application performance and reduce development time. Intel® MKL includes highly vectorized and threaded Linear Algebra, Fast Fourier Transforms (FFT), Vector Math and Statistics functions.
Molcas is an ab initio quantum chemistry software package developed by scientists to be used by scientists.
This visualization software is only available from the vis-login nodes and requires X11 forwarding to an X server on the client to work.
molden, gmolden, surf, ambfor, and ambmd are all compiled with gcc/gfortran 4.4.7.
Molpro is an electronic structure package with a comprehensive set of ab initio methods. The code implements various methods for highly accurate electron correlation calculations.
Uses Global Arrays v5.1.1.
MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard.
Compiled with Intel14
MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form.
NCL is an interpreted language designed specifically for scientific data analysis and visualization.
Newbler is a software package for de novo DNA sequence assembly. It is designed specifically for assembling sequence data generated by the 454 GS-series of pyrosequencing platforms sold by 454 Life Sciences, a Roche Diagnostics company. Newbler is a useful tool for assembling your 454 (or other pyrosequencing) data.
Distributed as a statically linked binary.
Oases is a de novo transcriptome assembler designed to produce transcripts from short read sequencing technologies, such as Illumina, SOLiD, or 454 in the absence of any genomic assembly.
Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.
The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.
The program ORCA is a modern electronic structure program package written by F. Neese, with contributions from many current and former coworkers and several collaborating groups. The binaries of ORCA are available free of charge for academic users for a variety of platforms.
The Linux version was unpacked from the distributed archive. The module loads openmpi/intel/14/1.6.5 for the necessary parallel libraries.
You must use "PAL#" in your ORCA input line to run in parallel, where # = 2 - 8. Ex: ! HF PAL8
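For users who generate ORCA inputs from scripts, the rule above can be sketched as a small Python helper. The function name orca_keyword_line is hypothetical; the 2 to 8 range and the "! HF PAL8" form are taken from the note above, and nothing else about the ORCA input format is assumed.

```python
def orca_keyword_line(method, cores):
    """Build an ORCA keyword line such as "! HF PAL8".

    The PAL keyword is only generated for 2 to 8 cores, the range
    stated in the note above.
    """
    if not 2 <= cores <= 8:
        raise ValueError("PAL keyword expects between 2 and 8 cores, got %d" % cores)
    return "! %s PAL%d" % (method, cores)


print(orca_keyword_line("HF", 8))
```

The core count passed here should match the number of slots requested from the scheduler for the job.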
PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is maintained and distributed for academic use free of charge by Ziheng Yang. ANSI C source codes are distributed for UNIX/Linux/Mac OSX, and executables are provided for MS Windows. PAML is not good for tree making. It may be used to estimate parameters and test hypotheses to study the evolutionary process, when you have reconstructed trees using other programs such as PAUP*, PHYLIP, MOLPHY, PhyML, RaxML, etc.
Compiled with intel/13.1.
The command-line version can be run cluster wide.
The GUI version (PamlX) is only available on vis-nodes.
PGI Unified Binary™ technology simplifies cross-platform support by combining into a single executable file, code sequences optimized for multi-core x64 processor families from Intel and AMD and GPU accelerators from NVIDIA. The PGI Unified Binary delivers all the benefits of a single x64 platform while enabling you to leverage the latest hardware innovations.
Psi4 is an open-source suite of ab initio quantum chemistry programs designed for efficient, high-accuracy simulations of a variety of molecular properties. We can routinely perform computations with more than 2500 basis functions running serially or in parallel.
Python is an interpreted, interactive, object-oriented programming language that combines remarkable power with very clear syntax. For an introduction to programming in Python you are referred to the Python Tutorial. The Python Library Reference documents built-in and standard types, constants, functions and modules.
These are additional versions added to the compute nodes from the Rocks Python roll 6.1.
QIIME (canonically pronounced "chime") stands for Quantitative Insights Into Microbial Ecology. QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data).
Python is used to download and compile the code. The GPL 32-bit version of "usearch" was used.
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees. It can also be used for postanalyses of sets of phylogenetic trees, analyses of alignments, and evolutionary placement of short reads.
RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface, supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and RSPD estimation.
RStudio is an IDE that makes R easier to use and more productive. RStudio combines a set of productivity tools into a single environment including:
- Code Editor – syntax highlighting, code completion, indenting, and definitions
- Debugging – debugging console, breakpoints, environment panel, and tracebacks
- Visualization – data display, data plotting, and data manipulation
Compiled with gcc. Optional packages gap_packages-4.6.4.p1 and database_gap-4.6.4 have been installed too.
SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments.
Built with gcc using the supplied makefile.
SCons is an Open Source software construction tool—that is, a next-generation build tool. Think of SCons as an improved, cross-platform substitute for the classic Make utility with integrated functionality similar to autoconf/automake and compiler caches such as ccache. In short, SCons is an easier, more reliable and faster way to build software.
SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost effective way. Now the new version is available.
This code is NOT compiled with MPI, and should only be used in parallel on a SINGLE node, via a threaded model.
SplitsTree4 is the leading application for computing unrooted phylogenetic networks from molecular sequence data. Given an alignment of sequences, a distance matrix or a set of trees, the program will compute a phylogenetic tree or network using methods such as split decomposition, neighbor-net, consensus network, super networks methods or methods for computing hybridization or simple recombination networks.
The Sequence Read Archive (SRA) stores raw sequence data from "next-generation" sequencing technologies including 454, IonTorrent, Illumina, SOLiD, Helicos and Complete Genomics. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence.
Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank. It uses many of the same functions as Sequin but is driven generally by data files. Tbl2asn generates .sqn files for submission to GenBank. Additional manual editing is not required before submission.
TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads.
Trinity will run in SMP threaded model (i.e. parallel on a single node).
The Vienna Ab-initio Simulation Package, better known as VASP, is a package for performing ab initio quantum mechanical molecular dynamics using either Vanderbilt pseudopotentials, or the projector augmented wave method, and a plane wave basis set.
VERDI is a Java program for visualizing meteorology, emissions, and air quality modeling data.
VirtualGL is an open source toolkit that gives any Unix or Linux remote display software the ability to run OpenGL applications with full 3D hardware acceleration. Some remote display solutions cannot be used with OpenGL applications at all.
VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. VMD supports computers running MacOS X, Unix, or Windows, is distributed free of charge, and includes source code.
Voro++ is an open source software library for the computation of the Voronoi diagram, a widely-used tessellation that has applications in many scientific fields. The main design features are cell-based computations, 3D calculation and C++ architecture.
The voro++ utility package is added to LAMMPS so that users can access it directly from inside LAMMPS.
Warp is an extensively developed open-source particle-in-cell code designed to simulate charged particle beams with high space-charge intensity. The name "Warp" stems from the code's ability to simulate Warped (bent) Cartesian meshes. This bent-mesh capability allows the code to efficiently simulate space-charge effects in bent accelerator lattices (resolution can be placed where needed) associated with rings and beam transfer lines with dipole bends. The code is set up around the interactive Python interpreter with dynamically
The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting needs.