Academic Commons
Subject Repositories
A selection (not intended to be comprehensive) of publicly accessible data repositories categorized by subject.
Agricultural Sciences
Astronomy
- HEASARC - NASA's High Energy Astrophysics Science Archive Research Center
- Infrared Science Archive - NASA's science and data center for infrared astronomy
- Extragalactic Database - NASA's archive of data for over 3 million extragalactic objects
- National Virtual Observatory - Astronomical data from ground and space-based telescopes. Includes data analysis tools
- National Space Science Data Center - Archive for NASA space mission data
- Sloan Digital Sky Survey - Download optical images of the sky. See also SkyServer, an educational portal to the data.
Chemistry
- Cambridge Structural Database - small molecule crystal structures
- eCrystals - x-ray crystallographic data
- PubChem - NCBI's repository of bioactivity/bioassay data and information for "small" molecules (i.e. not macromolecular). Both text-based and structure-based search tools are provided
Computer Science and Source Code
- CodePlex - provided by Microsoft
- GitHub - Hosts developer libraries such as Ruby on Rails, IronRuby, jQuery, Perl
- Google Code Project hosting - open APIs and Google projects like Google Gears, Android, Chromium.
- Launchpad - includes projects such as Ubuntu, MySQL (code hosting)
- SourceForge - the most popular open source code hosting facility according to Wikipedia's comparison of code hosting facilities
Environmental and Geosciences
- Goddard Earth Sciences Data and Information Services Center
- IRI/LDEO Climate Data Library - Climate-related datasets from the International Research Institute for Climate and Society at Columbia University
- Marine Geoscience Data System (MGDS) - A data portal, hosted at the Lamont-Doherty Earth Observatory (Columbia University), for a number of NSF-supported marine research programs
- Reverb - Locate earth science data from NASA and affiliated centers
- National Climatic Data Center (NCDC) - Meteorology and paleoclimatology
- NCAR/UCAR Community Data Portal - Climate and weather datasets and visualization software from the National Center for Atmospheric Research and the University Corporation for Atmospheric Research
- National Oceanographic Data Center (NODC) - World-wide marine environmental and ecosystem data
- National Center for Atmospheric Research Computational & Information Systems Library
- USGS National Satellite Land Remote Sensing Data Archive - Note that some data access is fee-based
- GEON - Portal for datasets and visualization tools
- National Snow and Ice Data Center (NSIDC) - Cryospheric datasets from ground field research and satellites
GIS and Geography
- Geodata.gov - One-stop source for federal, state and local geographic data
- GeoCommons.com - GIS file repository and finding tool
- Federal Geographic Data Committee - Provides access to the National Spatial Data Infrastructure (NSDI) Clearing House Network and the geodata.gov portal
- National Geographic Data Center - Archive of datasets
Health and Medical Sciences
- Biological Magnetic Resonance DataBank - NMR spectroscopy data for biological macromolecules
- National Center for Biotechnology Information (NCBI) - Numerous databases with a genomic/proteomic focus
Life and Biological Sciences
- Dryad - Evolutionary biology and ecology datasets used as the foundation for research publications. Dryad has been developed by the National Evolutionary Synthesis Center and the University of North Carolina Metadata Research Center
- GenBank - NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
- National Biological Information Infrastructure - This portal links to a wide variety of data sources, such as the Fisheries and Aquatic Resources Data Access Wizard and the Biogeographic Information and Observation System (BIOS). The portal also provides a full list of all data sets
- Protein DataBank - Experimentally determined structures for macromolecules (protein and nucleic acids). The site includes search and visualization tools
- UniProt - Free protein sequences
- The Cell: An Image Library - Images of all cell types from all organisms, including intracellular structures and movies or animations demonstrating functions. This project relies upon the cell biology community to populate the library.
Physics
- HEP Data - high-energy physics reaction database of numerical scattering cross sections
- NIST Physical Standards Laboratory - physical reference data and property tables
- National Nuclear Data Center - includes nuclear structure, reaction and decay databases
