The comparative genomics of Bio- and Neuroinformatics
Phillip Lord, School of Computing Science, Newcastle University
<phillip.lord@newcastle.ac.uk>
Introduction
Will talk about:
Infrastructure
Data standards
Formats
Ontologies
in both bioinformatics and neuroinformatics
in 10 minutes
Definitions
Bioinformatics — handling the data associated with biology.
generally, this means molecular data
Neuroinformatics — the data associated with neurosciences
most often this is cellular data
Surely the brain is part of a biological entity?
History
Of all sciences, biology has been most affected by data models.
Linnaeus defined modern biology
His classification scheme wasn’t actually that great
His two word, semantics-free(ish) naming scheme is still being used
Infrastructure
Bioinformatics — has a central data type
DNA Sequence (species being close second!)
Generally openly accessible, following the Clinton/Blair statement
Sanger > 300 billion bp per week
Depends on data type — big science to cottage industry
Neuroinformatics
No real central data type
Brain anatomy is closest, but this is coarse-grained
Some big science (Allen Brain Atlas, Genes to Cognition)
Collated databases (CoCoMac).
Most of the data produced by smaller labs.
Data Standards
Minimal Information
Content standards
MIBBI > 30 of these
Most bioinformatics
Some (MINI, MIfMRI) neuroscience
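A minimal-information checklist can be thought of as a list of fields that a dataset description must fill in. The sketch below illustrates that idea only: the field names and the `missing_fields` helper are hypothetical and are not taken from MINI, MIfMRI, or any other MIBBI guideline.

```python
# Hypothetical checklist: these field names are illustrative only,
# not drawn from any real minimal-information guideline.
REQUIRED_FIELDS = {"organism", "preparation", "recording_protocol", "contact"}


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return, in sorted order, the required fields that are absent or blank."""
    missing = []
    for field in required:
        value = record.get(field)
        # Treat None and whitespace-only strings as "not reported".
        if value is None or not str(value).strip():
            missing.append(field)
    return sorted(missing)


record = {
    "organism": "Rattus norvegicus",
    "preparation": "acute slice",
    "contact": "",  # present but blank, so it counts as missing
}
print(missing_fields(record))
```

A check like this only says whether content was reported; it says nothing about the file format the content is stored in.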
Formats
Many open, XML formats
Covers both data, and models (SBML).
For neurosciences,
NeuroML for models
Some shared APIs emerging
HDF for data?
Ontologies
Bioinformatics
The Gene Ontology!
There are now many, many registered as OBO (Open Biological Ontologies)
Been something of a land grab for namespace.
Some use OWL and W3C standards; most are (developed) in OBO format.
Neuroinformatics
Brain anatomy reasonably well served
Also NeuroLex, NeuroNames
Fly in the ointment
Every new technology results in more formats
For Microarray
MicroarrayImager[CombiMatrix]
AIDA[Raytest]
Affymetrix[Affymetrix]
AppliedBiosystems[Celera]
ArrayGauge[FUJIFILM]
ArrayVision[Imaging Research]
BZScan[TAGC ERM206]
And more
BeadStudio[Illumina]
BlueFuse[BlueGnome]
ChipSkipper[EMBL]
CodeLink[Motorola Life Sciences]
Spot[CSIRO]
CodeLink Expression Analysis[GE Healthcare]
Feature Extraction Software[Agilent Technologies]
GEMTools[Incyte Genomics]
GLEAMS[NuTec Sciences]
And more
GenePix[Axon Instruments]
GeneTAC
ImaGene[BioDiscovery]
NimbleScan[NimbleGen Systems]
QuantArray[PerkinElmer]
ScanAlyze[Stanford University]
ScanArray Express[PerkinElmer]
SpotFinder[TIGR]
UCSF Spot
Uni of Toronto in-house analysis software
Formats
In neuroinformatics, equipment vendor lock-in is the norm
In bioinformatics, it is not, although it occurs:
With next gen sequencing, we revisit microarray history
Although formats are interconvertible
So more of a pain than a problem
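Interconversion between tool-specific formats usually comes down to mapping each layout onto one shared record. The sketch below assumes two imaginary vendors whose exports differ only in delimiter and column names. The `LAYOUTS` table, the vendor names, and the column names are all hypothetical and do not describe any of the tools listed above.

```python
import csv
import io

# Hypothetical column layouts for two imaginary vendors; real tools differ.
LAYOUTS = {
    "vendor_a": {"delimiter": "\t", "id": "SpotID", "value": "Signal"},
    "vendor_b": {"delimiter": ",", "id": "feature", "value": "intensity"},
}


def to_common(text, vendor):
    """Parse one vendor's export into a list of (spot_id, intensity) tuples."""
    layout = LAYOUTS[vendor]
    reader = csv.DictReader(io.StringIO(text), delimiter=layout["delimiter"])
    return [(row[layout["id"]], float(row[layout["value"]])) for row in reader]


export_a = "SpotID\tSignal\ns1\t10.5\ns2\t3.0\n"
export_b = "feature,intensity\ns1,10.5\ns2,3.0\n"

# Both exports describe the same spots, so the common records should match.
print(to_common(export_a, "vendor_a") == to_common(export_b, "vendor_b"))
```

Each new format adds one more entry to the mapping table, which is tedious to maintain but does not block data exchange.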
Conclusions
Two similar-appearing fields
But, in practice, at very different stages of development
Neuroinformatics is several decades behind, but is catching up
Not yet clear how much knowledge is transferable between the two