FAQ

Tutorial / Webinar
An LRG webinar is available on YouTube (~30 min).
This webinar provides an overview and demo of the Locus Reference Genomic (LRG) resource, a manually curated reference resource for the reporting of clinically relevant variants.
The webinar and its PowerPoint presentation are available on the LRG page of the EMBL-EBI Online training website.

LRG Frequently Asked Questions (FAQs)

The most frequently asked FAQs

Variant Reporting Standards

Obtaining existing LRGs and requesting new ones

Viewing LRGs

Software support for LRGs

Specifications and standards

Administrative Issues


 

The most frequently asked FAQs

What is an LRG?
A Locus Reference Genomic (LRG) record is a manually curated record containing stable, and therefore un-versioned, reference sequences designed specifically for reporting sequence variants with clinical implications.
Why do we need LRGs?
Accurate and unambiguous reporting of variants requires internationally recognized reference sequences that do not change over time. The use of multiple sequences for a given locus as well as confusion over versions has resulted in inconsistent variant reporting in the past. The LRG project was created to avoid these problems.
What is contained in an LRG record?
Each LRG contains a stable “fixed” section and a regularly updated “updatable” section. The fixed section contains stable genomic DNA sequence for the region of interest, transcripts and proteins deemed essential for reporting variants, and an LRG-specific exon numbering system (See How is the LRG-specific exon numbering determined?). The updatable section contains mapping information, annotation of all transcripts and overlapping genes in the region, and legacy exon and amino acid numbering systems.
How are the reference sequences in the LRG chosen?
The sequences of each LRG are chosen in collaboration with research and diagnostic laboratories, LSDB (locus specific database) curators and mutation consortia with expertise in the region of interest. Reference sequences are suggested by the community and reviewed by LRG curators. Working with the community ensures that each LRG record contains the most appropriate reference sequences for reporting variants in that region.
How are the transcripts selected?
During the LRG creation process, LRG curators review the transcript submitted by the requester, as well as all other transcripts in the region of interest, for potential inclusion in the record. As part of this review, curators perform alignments and analyze publicly available expression data. Only transcripts for which there is currently good biological understanding AND which are required for the unambiguous reporting of disease-causing variants will be included. When deemed necessary by experts in the community, LRG curators will consider using idealized transcripts as reporting standards, even if they are not supported by biological evidence.
How is the LRG-specific exon numbering determined?
The LRG-specific exon numbering system included in each LRG is based on the transcript(s) included in the fixed section. Each exon is numbered consecutively 5′ to 3′; the numbering is then applied to individual transcripts.
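One way to picture this numbering is the minimal Python sketch below. It is an illustration only, not the curators' actual procedure: it assumes each exon is given as a hypothetical (start, end) pair in LRG genomic coordinates, that ascending coordinates run 5′ to 3′, and that exons shared by several transcripts are counted once. The function name and the coordinates are invented for the example.

```python
def number_exons(transcripts):
    """Number distinct exons 5' to 3', then apply the numbers per transcript.

    transcripts: dict mapping a transcript name to a list of (start, end)
    exon coordinates on the LRG genomic sequence (hypothetical input shape).
    """
    # Collect every distinct exon across the fixed-section transcripts.
    distinct = sorted({exon for exons in transcripts.values() for exon in exons})
    # Assign consecutive numbers starting at 1 (assumed 5' to 3' order).
    numbers = {exon: index for index, exon in enumerate(distinct, start=1)}
    # Apply the shared numbering to each individual transcript.
    return {name: [numbers[exon] for exon in exons]
            for name, exons in transcripts.items()}


# Made-up coordinates for two alternatively spliced transcripts.
example = {
    "t1": [(1, 100), (201, 300), (401, 500), (801, 900), (1001, 1100)],
    "t2": [(1, 100), (201, 300), (401, 500), (601, 700)],
}
print(number_exons(example))
```

With this input, t1 uses exons 1, 2, 3, 5 and 6, while t2 uses exons 1 to 4, so an exon keeps the same number in every transcript that contains it.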
How many LRG records have been created?
As of October 2015, over 1,000 LRGs have been created, of which 655 are public (http://www.lrg-sequence.org/). The aim is to create an LRG for every locus with clinical implications. To request an LRG contact us at request@lrg-sequence.org.
Is there a published account of LRGs that I can read?
The LRG standard, and why it is needed, is described in the publication Locus Reference Genomic: reference sequences for the reporting of clinically relevant sequence variants, MacArthur JA et al., Nucleic Acids Res. 2014 Jan (doi: 10.1093/nar/gkt1198). See also Locus Reference Genomic sequences: an improved basis for describing human DNA variants, Dalgleish R et al., Genome Med. 2010, 2:24, and the editorial "Conventional wisdom" in Nature Genetics 2010, 42, p.363.
Where can I get more information about LRGs?
The LRG web site (http://www.lrg-sequence.org/) and the partner RefSeqGene site (http://www.ncbi.nlm.nih.gov/RefSeq/RSG/) maintain current information about the LRG project and available sequences.
What is the difference between RefSeqGene and LRG records?
LRG and RefSeqGene are collaborative resources. The advantages of using LRGs for variant reporting are: (1) LRGs are created specifically for the reporting of clinically relevant variants, and hence exist for loci with clinical implications; (2) LRGs are stable and therefore not versioned, which reduces ambiguity when reporting variants. When an LRG is established for any gene, the RefSeqGene and its annotation will be “frozen” to match that of the LRG.
Does this mean that additional transcripts cannot be added?
It's inevitable that new transcripts of biological importance will be discovered for genes for which an LRG already exists. Such transcripts can be added to the updatable section of an LRG. New transcripts can be added to the fixed section only if they are essential for reporting clinically relevant variants. A compelling case can be made for their inclusion if the new transcripts encode different proteins of clear clinical importance and variants cannot be meaningfully described in terms of the current transcripts in the LRG. Requests for the addition of new transcripts will be considered on a case-by-case basis.
Does this not re-create the versioning problem?
Versioning is an issue with traditional reference sequence records simply because the actual sequences differ from version to version for records with the same accession number. The LRG sequence data for the genomic DNA, the transcripts and their translation products will never be changed. Consequently, a variant description such as LRG_13:g.8290C>A will always remain valid and will never be subject to misinterpretation. The user simply needs to ensure that the LRG contains all of the necessary transcripts for the intended task.
LRG records don't have versions and RefSeqGene records do. Why?
The RefSeq project, following the convention of the International Nucleotide Sequence Database Collaboration (http://www.insdc.org/), assigns sequence identifiers as a combination of a stable component (the accession) and a version. Any revision of the sequence increments the version number. The version number is given after the decimal point at the end of the accession number (e.g. NM_000088.3). Unfortunately, the version of a sequence is often not reported when a variant description is presented in a publication. Thus, uncertainty can result when trying to interpret the consequence of any variant if the current version of the reference sequence is greater than 1. This problem is avoided in the LRG accessioning system by not having versions. Once an LRG is created, the sequence data are never changed.
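The accession-plus-version convention can be illustrated with a short Python sketch. The helper name split_accession is hypothetical; it simply separates the stable accession from the optional version after the decimal point, so that a missing version can be detected before a variant description is interpreted.

```python
from typing import Optional, Tuple


def split_accession(identifier: str) -> Tuple[str, Optional[int]]:
    """Split 'NM_000088.3' into ('NM_000088', 3).

    The version is None when no decimal point is present, which is
    always the case for LRG identifiers such as 'LRG_1'.
    """
    accession, dot, version = identifier.partition(".")
    if not dot:
        return accession, None
    return accession, int(version)


for name in ("NM_000088.3", "NM_000088", "LRG_1"):
    accession, version = split_accession(name)
    if version is None:
        print(f"{accession}: no version given")
    else:
        print(f"{accession}: version {version}")
```

For a RefSeq identifier, a missing version is the ambiguous case described above; for an LRG identifier it is expected, because LRGs are never versioned.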
Will LRGs replace RefSeq and RefSeqGene records?
See “What is the difference between RefSeqGene and LRG records?”. RefSeq and RefSeqGene records will continue to be generated. When an LRG is established, the exact version of the RefSeqGene and its annotated RNA (NM_ or NR_) and protein (NP_) will be “frozen”, and cross references added to the LRG. For an example, see http://www.ncbi.nlm.nih.gov/nuccore/LRG_1.
What will happen when a new genome build is released?
Once a new assembly is released, the mapping information and annotation of all LRGs will be updated to the new assembly. Mapping of the LRG genomic sequence to both the current and penultimate assembly will be included in each LRG.
How will sequence corrections be made to LRGs?
No changes to the sequences in an LRG will be permitted. If it's no longer possible to describe a sequence variant in terms of an existing LRG, it might be necessary to create a totally new LRG with a uniquely different number (e.g. LRG_1275 instead of the existing LRG_89). The original LRG will not be "retired" and it will remain valid to describe variants with respect to that sequence record. Creation of additional LRGs for an existing gene or genomic region will only be considered in the most exceptional circumstances.
What about copy number variation?
Copy number variation (CNV) will certainly be an issue, but LRGs are no less well suited to the task of variant description than existing reference sequence records. Requests will be considered for the creation of an LRG representing a particular allele with respect to CNV, and we will work with the requesting party to achieve the best practicable solution to represent the allele.
Who is the final arbiter of LRG content?
LRGs are created for the benefit of the biomedical community and so must meet its needs. We welcome discussion about whether or not individual LRGs fulfil specific needs and we will work with the community to ensure that these needs are met. In the end, we will take the authoritative advice of the community.
How can I use an LRG record?
Once an LRG record has been created, you can e.g.:

 

Variant Reporting Standards

Will I have to learn a new variant nomenclature?
No, the standard HGVS Nomenclature will still be used. HGVS (http://www.hgvs.org/mutnomen/) and EMQN best practice guidelines have endorsed LRGs. The stable identifiers of the genomic, transcript, and protein sequences in the fixed section of an LRG (transcripts: “t1”, “t2”, etc.; proteins: “p1”, “p2”, etc.) can be used for stable reporting of variants. See "Can you give me an example?" for more details.
Can you give me an example?
The COL1A1 gene is represented by LRG record LRG_1 which has a single transcript (t1) and a single corresponding protein (p1). The frequently reported disease-causing variant NG_007400.1:g.9595G>A can also be reported as NM_000088.3:c.769G>A, and as NP_000079.2:p.Gly257Arg using the current RefSeqGene and RefSeq mRNA and protein reference sequences. Since LRGs contain the genomic DNA, mRNA and protein sequences within a single record, the three corresponding descriptions are LRG_1:g.9595G>A, LRG_1t1:c.769G>A and LRG_1p1:p.Gly257Arg.
Description level | RefSeqGene or RefSeq | LRG
Gene | NG_007400.1:g.9595G>A | LRG_1:g.9595G>A
mRNA | NM_000088.3:c.769G>A | LRG_1t1:c.769G>A
Protein | NP_000079.2:p.Gly257Arg | LRG_1p1:p.Gly257Arg
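The structure of these LRG-based descriptions can be sketched in Python as follows. This is a minimal illustration, not an HGVS validator: the regular expression and the names LRG_DESCRIPTION, LrgDescription and parse_lrg_description are assumptions made for the example, and only the g., c. and p. levels shown above are recognized.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the examples above (an assumption, not an official grammar).
LRG_DESCRIPTION = re.compile(
    r"^(?P<lrg>LRG_\d+)"       # LRG identifier, e.g. LRG_1
    r"(?P<feature>[tp]\d+)?"   # optional transcript (t1) or protein (p1)
    r":(?P<level>[gcp])\."     # description level: g., c. or p.
    r"(?P<change>.+)$"         # the change itself, left unparsed
)


class LrgDescription(NamedTuple):
    lrg: str
    feature: Optional[str]
    level: str
    change: str


def parse_lrg_description(text: str) -> LrgDescription:
    """Split an LRG-based variant description into its components."""
    match = LRG_DESCRIPTION.match(text.strip())
    if match is None:
        raise ValueError(f"not an LRG-based description: {text!r}")
    return LrgDescription(**match.groupdict())


for description in ("LRG_1:g.9595G>A", "LRG_1t1:c.769G>A", "LRG_1p1:p.Gly257Arg"):
    print(parse_lrg_description(description))
```

The feature field is empty for genomic descriptions and holds t1 or p1 for the transcript and protein levels, mirroring the three rows of the table.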
What about an example for a gene with more than one transcript?

The calcitonin gene (CALCA) encodes two peptide hormones, calcitonin and calcitonin gene-related peptide (CGRP), that have no amino acid sequence in common. These hormones are derived by enzymatic cleavage of the translation products of two alternatively spliced mRNAs that exclusively contain exon 4 (calcitonin) or exons 5 and 6 (CGRP). Consequently, a SNP in the first base of exon 4 (rs5241) affects only the mRNA that encodes calcitonin.

Using HGVS nomenclature, the variant can be described as NM_001033952.2:c.228C>A using the calcitonin RefSeq mRNA as the reference sequence. The corresponding protein-level description is NP_001029124.1:p.Ser76Arg. Alternatively, it can be described with respect to the RefSeqGene genomic DNA sequence as NG_015960.1:g.8290C>A.

The LRG for the CALCA gene (LRG_13) contains information for both of the major alternatively spliced forms of the gene's transcripts. Calcitonin and CGRP are represented by transcripts t2 and t1 respectively. Consequently, the SNP can be described at the DNA level as LRG_13:g.8290C>A or LRG_13t2:c.228C>A. The corresponding protein-level description is LRG_13p2:p.Ser76Arg.

Description level | RefSeqGene or RefSeq | LRG
Gene | NG_015960.1:g.8290C>A | LRG_13:g.8290C>A
mRNA | NM_001033952.2:c.228C>A | LRG_13t2:c.228C>A
Protein | NP_001029124.1:p.Ser76Arg | LRG_13p2:p.Ser76Arg
How do I report intronic variants in LRGs?
Because an LRG links genomic, transcript and protein sequences in one record, the intronic sequence is present too, so intronic variants can be reported with standard HGVS nomenclature. For example, rs750106647 is LRG_1t1:c.4005+11T>C and rs778417218 is LRG_1t1:c.4005+5G>A.
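In HGVS nomenclature, an intronic position is written as the nearest exonic nucleotide followed by a signed offset into the intron. The Python sketch below shows how the two examples decompose. It is a deliberately narrow illustration: the function name is hypothetical, and it only handles simple substitutions at positive c. positions, not 5′ or 3′ UTR positions, deletions, insertions or other variant types.

```python
import re

# Narrow pattern for simple coding-DNA substitutions (an assumption for this sketch).
C_SUBSTITUTION = re.compile(
    r"^c\.(?P<anchor>\d+)"     # nearest exonic nucleotide
    r"(?P<offset>[+-]\d+)?"    # optional signed offset into the intron
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_c_substitution(change: str):
    """Return (anchor, offset, ref, alt); offset is 0 for exonic positions."""
    match = C_SUBSTITUTION.match(change)
    if match is None:
        raise ValueError(f"unsupported description: {change!r}")
    offset = int(match.group("offset") or 0)
    return int(match.group("anchor")), offset, match.group("ref"), match.group("alt")


for description in ("LRG_1t1:c.4005+11T>C", "LRG_1t1:c.4005+5G>A", "LRG_1t1:c.769G>A"):
    reference, change = description.split(":", 1)
    anchor, offset, ref, alt = parse_c_substitution(change)
    location = "exonic" if offset == 0 else "intronic"
    print(reference, anchor, offset, ref, alt, location)
```

A positive offset counts downstream from the last nucleotide of the preceding exon; a negative offset counts upstream from the first nucleotide of the following exon.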
I have been using reference sequences not included in the LRG record. Is there a tool that can help map all my variants in LRG coordinates?
Yes, the NCBI Genome Remapping Service (http://www.ncbi.nlm.nih.gov/genome/tools/remap/, Clinical Remap tab) can be used to convert variant data into LRG coordinates. This tool will convert locations or HGVS variant descriptions on a selected genomic assembly or RefSeqGene into locations on an LRG if one is public for that region.

 

Obtaining existing LRGs and requesting new ones

How do I find out if an LRG already exists for my gene of interest?
To find if an LRG already exists, you may use the search function on the LRG website (http://www.lrg-sequence.org) or search the list of all LRGs found at http://www.lrg-sequence.org/LRG. You may also search for LRGs in Ensembl and NCBI browsers.
If none yet exists, how do I request the creation of an LRG?
You can request the creation of an LRG for your gene of interest by contacting us at request@lrg-sequence.org.

 

Viewing LRGs

How do I use the search function on the LRG website?
You can search by e.g. LRG identifier, HGNC gene name, LSDB name, NCBI and Ensembl accession numbers, gene synonym or LRG status. Wildcards and logical expressions are accepted. Example searches: LRG_1, COL*, Osteogenesis, (NM_000088.3 OR NM_000089.3), collagen, pending. A batch search can be carried out by entering a list of LRG identifiers separated by a pipe symbol e.g. LRG_1|LRG_3|LRG_45
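The query strings described above can be assembled programmatically. The Python sketch below only builds the text to paste into the search box; the function names are hypothetical and nothing here calls the website.

```python
import re

LRG_ID = re.compile(r"^LRG_\d+$")


def batch_query(lrg_ids):
    """Join LRG identifiers with a pipe symbol for a batch search."""
    ids = list(lrg_ids)
    for lrg_id in ids:
        if not LRG_ID.match(lrg_id):
            raise ValueError(f"not an LRG identifier: {lrg_id!r}")
    return "|".join(ids)


def or_query(terms):
    """Combine search terms into a parenthesized logical OR expression."""
    return "(" + " OR ".join(terms) + ")"


print(batch_query(["LRG_1", "LRG_3", "LRG_45"]))
print(or_query(["NM_000088.3", "NM_000089.3"]))
```

Validating the identifiers before joining them catches typos such as a missing underscore before the query is submitted.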
What do I need to view an LRG?
All that you need is a web browser such as Internet Explorer, Firefox, Chrome or Safari to view the LRGs that are available at ftp://ftp.ebi.ac.uk/pub/databases/lrgex/. LRGs can also be viewed in the Ensembl, NCBI and UCSC genome browsers. See “How can I use an LRG record?”.
Can I download and view LRGs locally?
Yes, each LRG can be downloaded from its page on the LRG website or by following the instructions at http://www.lrg-sequence.org/downloads. If you want to display the downloaded LRG(s) locally in your web browser with the same layout as the LRG website, you need to download the files lrg2html.xsl, lrg2html.css and lrg2html.js and the directory img, and place these in the same directory as the downloaded LRG file(s). Without these extra files, your web browser will display the LRG data as raw XML rather than the formatted version that you see when viewing LRGs from the FTP site.
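A quick way to confirm that a local directory is ready for formatted viewing is to check for the supporting files named above. The Python sketch below does only that; the function name is hypothetical and it does not download anything.

```python
from pathlib import Path

# Supporting files and directory named in the answer above.
REQUIRED_FILES = ("lrg2html.xsl", "lrg2html.css", "lrg2html.js")
REQUIRED_DIRS = ("img",)


def missing_viewer_assets(directory):
    """List the supporting files or directories absent from `directory`."""
    base = Path(directory)
    missing = [name for name in REQUIRED_FILES if not (base / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (base / name).is_dir()]
    return missing


if __name__ == "__main__":
    absent = missing_viewer_assets(".")
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All supporting files are present.")
```

Run it from the directory that holds the downloaded LRG file(s); an empty result means the browser should be able to apply the stylesheet.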
Can I view the sequences in an LRG in any other format?
Within the browser view of an LRG it's possible to display the individual sequences (genomic DNA, transcripts and their translated protein sequences) in FASTA format. This allows copying and pasting of sequences into other applications that support that format. From NCBI, try the “graphics” display (http://www.ncbi.nlm.nih.gov/nuccore/LRG_1?report=graph).

 

Software support for LRGs

Do you offer programmatic access to LRG data?
Yes, some of the information from LRG records is available through web services implemented using the XML-RPC protocol. See this page for more information.
Are LRGs supported by external software?
Yes. Mutalyzer's "Name Checker", "Syntax Checker", and "Name Generator" ensure that variants described using LRG sequences follow HGVS guidelines. Alamut (Interactive Biosoftware: http://www.interactive-biosoftware.com/), Ensembl's Variant Effect Predictor and Variobox (http://bioinformatics.ua.pt/software/variobox/) facilitate interpretation of variation data described using LRG coordinates. In addition, the LOVD DNA variation database system supports LRGs.
Can I write my own application?
Anybody can write an application to handle and manipulate sequence data in the LRGs. The LRG format is open and the record schema is freely available (see below). Technical support for the schema is available at help@lrg-sequence.org. We would encourage you to make your software free and open and to let us know about it so that we can provide links to your application.

 

Specifications and standards

Where can I get a copy of the LRG specification?
The current version of the technical specification document is available at ftp://ftp.ebi.ac.uk/pub/databases/lrgex/docs/LRG.pdf. There is a version number and date stamp in the bottom margin to help in tracking changes to the specification.
How are LRGs formatted?
LRGs are created in extensible markup language (XML) format. Each XML file is highly structured and contains all the information pertaining to a single LRG.
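Because each LRG is a single XML file, it can be read with any standard XML parser. The Python sketch below uses a tiny hand-written sample rather than a real record. The element and attribute names (lrg, fixed_annotation, id, sequence, transcript with a name attribute, updatable_annotation) are assumptions chosen to mirror the fixed and updatable sections described in this FAQ; check them against the published schema before relying on them.

```python
import xml.etree.ElementTree as ET

# Hand-written toy record; element names are assumptions, not copied from a real LRG.
SAMPLE = """<lrg schema_version="1.9">
  <fixed_annotation>
    <id>LRG_1</id>
    <sequence>ACGTACGT</sequence>
    <transcript name="t1"/>
  </fixed_annotation>
  <updatable_annotation/>
</lrg>"""


def summarize(xml_text):
    """Return the identifier, genomic length and transcript names of a record."""
    root = ET.fromstring(xml_text)
    fixed = root.find("fixed_annotation")
    if fixed is None:
        raise ValueError("no fixed_annotation element found")
    return {
        "id": fixed.findtext("id"),
        "genomic_length": len(fixed.findtext("sequence", default="")),
        "transcripts": [t.get("name") for t in fixed.findall("transcript")],
    }


print(summarize(SAMPLE))
```

To try this on a downloaded file, read its contents into a string and pass it to summarize, adjusting the element names if the schema differs from the assumptions above.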
Is the LRG XML schema available?
The LRG XML schema is written in the RELAX NG (http://relaxng.org) schema description language and can be downloaded from ftp://ftp.ebi.ac.uk/pub/databases/lrgex/LRG.rnc. The schema has a date stamp, and any change to the schema is accompanied by a change in the date. The current version of the schema is Schema 1.9. Documentation can be found on the LRG FTP site (ftp://ftp.ebi.ac.uk/pub/databases/lrgex/docs/LRG_XML_schema_documentation_1_9.pdf).
Can I create my own LRG sequence records?
For an LRG to be an international standard, it must be accessioned by the collaborating groups. If you would like to request an LRG, please contact request@lrg-sequence.org. We would encourage you not to create your own LRG records.

 

Administrative Issues

Who has responsibility for creating LRG sequences?
The creation of LRGs is the joint responsibility of the European Bioinformatics Institute (EBI: http://www.ebi.ac.uk/) and the National Center for Biotechnology Information (NCBI: http://www.ncbi.nlm.nih.gov/).
What is the role of GEN2PHEN in LRGs?
The LRG concept was developed as a project within the remit of the GEN2PHEN project (http://www.gen2phen.org/) and was funded for five years under the European Community's Seventh Framework Programme (FP7: http://cordis.europa.eu/fp7/).
What will happen now that the GEN2PHEN project funding has ended?
Although funding for GEN2PHEN ended in June 2013, EBI and NCBI are fully committed to maintaining the LRG project.