New schema and updates

Wed, 2013-10-02

Schema Updates

All LRG records have been updated to a new XML schema (schema 1.8). The new records are available on the LRG website and on the FTP site.

The main changes to the schema are:

  • The inclusion of a fixed LRG-specific exon numbering system based on the transcript(s) included in the fixed section. Each distinct exon is numbered consecutively 5′ to 3′; the numbering is then applied to individual transcripts.
  • The creation of a new “Community” annotation set in the updatable section. It includes additional, relevant information provided directly to LRG project curators by collaborators such as Locus Specific Database (LSDB) curators, members of the diagnostic community, clinicians, and researchers.
  • Alternative or legacy exon and amino acid numbering systems widely used by the community have been moved from the NCBI updatable section to the new community annotation set, reflecting the fact that these data are provided directly to LRG project curators by members of the community.
  • The inclusion of the HGNC ID as the main identifier in the fixed section, since the HGNC symbol and the LRG gene name can change over time and are therefore not fixed.
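
The exon numbering described in the first point can be illustrated with a short sketch. It is not the LRG project's implementation: the function name number_exons, the representation of an exon as a (start, end) pair, and the example coordinates are all hypothetical, and the sketch assumes that coordinates increase 5′ to 3′ along the LRG sequence. How overlapping exons are ordered in real LRG records is not stated here, so the tie-breaking rule below (start, then end) is also an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an exon is a (start, end) pair in LRG
# genomic coordinates, assumed to increase in the 5' to 3' direction.
Exon = Tuple[int, int]


def number_exons(transcripts: Dict[str, List[Exon]]) -> Dict[str, List[int]]:
    """Assign one shared exon numbering across all transcripts."""
    # Collect each distinct exon once, across every transcript.
    distinct = sorted({exon for exons in transcripts.values() for exon in exons})
    # Number the distinct exons consecutively, 5' to 3'
    # (assumed ordering: by start, then by end).
    label = {exon: number for number, exon in enumerate(distinct, start=1)}
    # Apply the shared numbering to each individual transcript.
    return {
        name: [label[exon] for exon in sorted(exons)]
        for name, exons in transcripts.items()
    }


# Made-up coordinates: t2 uses a different second exon from t1.
example = {
    "t1": [(100, 200), (300, 400), (500, 600)],
    "t2": [(100, 200), (350, 400), (500, 600)],
}
numbering = number_exons(example)
print(numbering)
```

Under these assumptions, an exon shared by several transcripts carries the same number in each of them, so a transcript's exon numbers may skip values.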

Display Updates

The display of LRG records has also been improved, notably with the inclusion of a summary box in the top right corner of each LRG’s webpage. It lists key information such as identifiers, genomic and transcript sequence sources, and the number of transcripts included in the fixed section.

In addition, the updatable section of each LRG now contains the most up-to-date information available from NCBI and Ensembl (Ensembl release 73). The next update will be in early 2014 with information from the GRCh38 assembly.


Other changes to the schema are described in the schema 1.8 documentation. LRG records in the previous XML schemas (schema 1.6 and schema 1.7) have been archived and remain publicly available.