
Distributions / Disclaimer / Suggested Citation / Quick Start / Contact / Versions

Distributions

GenomeGraphR is populated with distributions downloaded from the NCBI FTP site. The available distributions are listed on the right side of this page.

Disclaimer

The U.S. Food and Drug Administration (FDA) has taken all reasonable precautions in creating the GenomeGraphR application. FDA is not responsible for errors, omissions or deficiencies regarding the application. The GenomeGraphR application is being made available “as is” and without warranties of any kind, either expressed or implied, including, but not limited to, warranties of performance, merchantability, and fitness for a particular purpose. FDA is not making a commitment in any way to regularly update the system. Responsibility for the interpretation and use of the application lies solely with the user. In no event shall FDA be liable for direct, indirect, special, incidental, or consequential damages resulting from the use, misuse, or inability to use the application. Third parties' use of or acknowledgment of the application and its accompanying documentation, including through the suggested citation, does not in any way represent that FDA endorses such third parties or expresses any opinion with respect to their statements.

Suggested Citation

Where GenomeGraphR is used, reference to the system should be made as follows:
Food and Drug Administration Center for Food Safety and Applied Nutrition (FDA/CFSAN). 2018. GenomeGraphR. FDA CFSAN. College Park, Maryland, USA. Available at https://fda-riskmodels.foodrisk.org/genomegraphr/; or
Sanaa M, Pouillot R, Vega FG, Strain E, Van Doren JM (2019) GenomeGraphR: A user-friendly open-source web application for foodborne pathogen whole genome sequencing data integration, analysis, and visualization. PLoS ONE 14(2): e0213039. https://doi.org/10.1371/journal.pone.0213039

Quick Start

On this “Home” tab

Choose a pathogen: Salmonella or Listeria
Choose a distribution: the latest distribution is pre-loaded. If you pick another distribution, loading the data may take some time;
Choose the SNP threshold: two strains are considered “connected” if their SNP distance is lower than or equal to this threshold, and not connected if it is greater than this threshold;
Choose which date to consider: choose whether the “date” used by the application for a strain should be the date of collection as reported by the strain submitters (default; some data are missing), the date of creation of the strain in the NCBI Pathogen Detection system (no missing data), or the date of collection replaced, if missing, by the date of target creation. Note that using the date of target creation could be misleading when analyzing relatedness between strains if strains were sent for WGS analysis long after the date of sample collection;
Choose which location to consider: choose whether the “location” used by the application for a strain should be the location of collection as reported by the strain submitters (default; some data are missing), the location of the submitting center (no missing data), or the location of collection replaced, if missing, by the location of the center. Note that selecting the center location could be misleading because a laboratory may isolate strains from samples collected from all over the world;
Choose the format for graph exportation: .png, .jpeg or .pdf;
Go to the “Select” Tab
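
As an illustration of the three date options described above, here is a minimal Python sketch. It is not the application's code: the function name and the mode names ("collection", "target_creation", "collection_or_creation") are hypothetical labels chosen for this example. The same fallback logic would apply to the choice of location.

```python
from datetime import date
from typing import Optional


def effective_date(collection: Optional[date],
                   target_creation: date,
                   mode: str = "collection") -> Optional[date]:
    """Return the date used for a strain under one of the three options.

    The mode names are hypothetical labels for the options on the Home tab.
    """
    if mode == "collection":
        # Default option: may be None when the submitter gave no date.
        return collection
    if mode == "target_creation":
        # Always available, but may be much later than sample collection.
        return target_creation
    if mode == "collection_or_creation":
        # Collection date, replaced by the creation date only if missing.
        return collection if collection is not None else target_creation
    raise ValueError(f"unknown mode: {mode}")
```

With the third option, a strain with no collection date falls back to its target creation date, while a strain with a collection date keeps it.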

On the “Select” tab

There are three ways to select the strains to explore:
Select targets strains from the isolation source
Click on one of the nodes of the isolation source tree. The application will indicate which food is selected and how many clinical strains are within the SNP threshold of the strains from the selected food;
Or, Select targets strains from Table
The NCBI data will be provided in a table. Select the columns to show in addition to the default ones, if needed. Search for the strains of interest either through a global search over all columns (field “Search”) or more specifically in one column (field at the top of each column). Click on a strain to select it (click again to unselect). The application indicates the target accession number (“target_acc”) of the selected strains and (top of the page) how many clinical strains are within the SNP threshold of the selected strains.
Or, Select targets strains from File
Choose an Excel file on your computer. The file should contain a column of strain identifiers. The file will be uploaded. Once it is uploaded, the application will try to match the variables of your file to the variables of the NCBI database (you can choose to match the data using only one variable with the “Match on which variable” field). The application will provide two tables: the first one shows your file and, for each line of your file, whether there is a matching record in the NCBI database; the second one provides the selected strains from the NCBI database. The application also indicates (top of the page) how many clinical strains are within the SNP threshold of the selected strains.
Once you have selected your strains, click on the “Select” button.
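
The count of clinical strains shown at the top of the page can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the function name and the dictionary of pairwise SNP distances (keyed by unordered pairs of strain identifiers) are hypothetical, and "within the threshold" is taken to mean lower than or equal to it, as defined on the “Home” tab.

```python
def clinical_within_threshold(selected, clinical, snp_distance, threshold):
    """Return the clinical strains within `threshold` SNPs of any selected strain.

    `snp_distance` maps frozenset({strain_a, strain_b}) to an SNP distance;
    pairs absent from the mapping are treated as not connected.
    """
    hits = set()
    for c in clinical:
        for s in selected:
            d = snp_distance.get(frozenset((s, c)))
            if d is not None and d <= threshold:
                hits.add(c)
                break  # one close selected strain is enough
    return hits


# Toy data: identifiers and distances are made up for the example.
toy_distances = {
    frozenset(("FOOD_1", "CLIN_1")): 3,
    frozenset(("FOOD_1", "CLIN_2")): 25,
    frozenset(("FOOD_2", "CLIN_2")): 10,
}
close = clinical_within_threshold(
    ["FOOD_1", "FOOD_2"], ["CLIN_1", "CLIN_2", "CLIN_3"], toy_distances, 10
)
```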

On the “Selection” tab

This tab presents data from the strains you selected.

On the “Pivot Table” subtab

This tab provides a pivot table for data description. Drag and drop the variables into rows or columns to obtain a cross tabulation. By clicking on a variable, you can filter out some of the occurrences of the variable. You can choose the statistic and the type of output you want.

On the “Epicurve” subtab

A dynamic graph illustrates the isolate occurrence over time. You can choose to group the strains by month, year or week (See which date). You can color code the bars by a variable. Hovering over the bars shows the characteristics of the strains.
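
The grouping behind an epicurve can be sketched as follows in Python. This is an assumption-laden illustration, not the application's implementation: the function name is hypothetical, weeks are taken to be ISO weeks, and strains with a missing date are simply skipped.

```python
from collections import Counter
from datetime import date


def epicurve_counts(dates, unit="month"):
    """Count strains per year, month or ISO week; missing dates are skipped."""
    counts = Counter()
    for d in dates:
        if d is None:
            continue
        if unit == "year":
            key = f"{d.year}"
        elif unit == "month":
            key = f"{d.year}-{d.month:02d}"
        elif unit == "week":
            iso_year, iso_week, _ = d.isocalendar()
            key = f"{iso_year}-W{iso_week:02d}"
        else:
            raise ValueError(f"unknown unit: {unit}")
        counts[key] += 1
    # Sort by period label so the bars appear in chronological order.
    return dict(sorted(counts.items()))


toy_dates = [date(2019, 1, 7), date(2019, 1, 8), date(2019, 2, 1), None]
by_month = epicurve_counts(toy_dates, "month")
```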

On the “Map” subtab

A map of the location of the strains is provided. Warnings: the dots are placed at random within the limits of the state (United States) or the country (other countries, including Canada). The position of each dot doesn't represent the actual location of sampling. Strains from the US not assigned to a specific State are placed in the blue square. Strains not assigned to a specific country are placed in the red square. You can choose the size of the dots. Clicking on a dot provides the characteristics of the strains. You can choose the date of isolation. By clicking on the arrow below the slider, you obtain a dynamic graph (localisation of the strains as a function of the date of isolation).

On the “Network” tab

On the “Graph” subtab

This graph shows the connected components (CC) associating the selected strains and the clinical strains. Each node is a strain. An edge is drawn only if i) the distance between two strains is lower than or equal to the chosen SNP threshold and ii) it links a selected strain and a clinical strain. Nodes are shown only if they are connected to at least one other strain. On the left, you can change the layout of the graph (the default tries to provide a readable layout, but it can be worth testing other layouts), choose whether the color palette for years is qualitative (colors varying from year to year in no order) or quantitative (colors from white for older dates to black for recent years), and choose a selection criterion. By default, the selection criterion is the country, but it can be changed to the year, the project center, the serovar, the SNP cluster, the source or the connected component. When you change this selection criterion, you can identify the strains according to it by clicking on the selector “Select by …”.
When many strains are to be drawn, the graph is split into subgraphs of about 1,000 strains each, starting with the smallest connected components. Use the selection box to show other subgraphs, from smallest to largest CCs.
Click on any strain of a CC of interest and click on the “Update Selection Sub Network” Button.
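
The two edge rules above (distance within the threshold, and one endpoint selected while the other is clinical) can be illustrated with a short Python sketch based on a union-find structure. The function name and the distance dictionary are hypothetical; this shows the idea of connected components and is not the application's own code.

```python
def connected_components(selected, clinical, snp_distance, threshold):
    """Group strains into connected components (CC).

    Edges exist only between a selected and a clinical strain whose SNP
    distance is lower than or equal to `threshold`. Strains without any
    edge are left out, as in the graph. CCs are returned smallest first.
    """
    parent = {}

    def find(x):
        # Follow parents to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s in selected:
        for c in clinical:
            d = snp_distance.get(frozenset((s, c)))
            if d is not None and d <= threshold:
                parent.setdefault(s, s)
                parent.setdefault(c, c)
                root_s, root_c = find(s), find(c)
                if root_s != root_c:
                    parent[root_s] = root_c

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len)


# Toy data; the clinical-clinical pair C1-C2 is never used as an edge.
toy = {
    frozenset(("F1", "C1")): 3,
    frozenset(("F1", "C2")): 10,
    frozenset(("F2", "C2")): 12,
    frozenset(("F3", "C3")): 40,
    frozenset(("C1", "C2")): 1,
}
ccs = connected_components(["F1", "F2", "F3"], ["C1", "C2", "C3"], toy, 10)
```

With a threshold of 10, only F1, C1 and C2 form a CC; F2, F3 and C3 have no qualifying edge and are not shown.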

On the “Characteristics” subtab

This page provides some information on the CCs illustrated by the graph. The first table (“Connected-component characteristics”) shows the distribution of the size of the CCs provided in the graph. The second table (“Connected-component characteristics per CC”) shows, for each CC numbered from 1 to n, the number of clinical strains, the number of selected strains and the total number of strains. The identifier of the CC is the same as in the following tables. A CC can be identified on the graph by selecting “CC” in the selection criterion field and using this identifier. The graph (“Connected-component timeline”) shows, for each CC, the date of isolation (See which date) of the various clinical and selected strains. Note that it is a dynamic graph that you can zoom, save, … The table “Connected-components” provides the NCBI table for these CCs. A specific CC can be selected from its identifier using the “CC_number” field.

On the “CC as a function of SNP distance” subtab

Choose the maximum threshold you want to test and click on the “create/update” graph button. A dynamic graph will show the box-plot of the logarithm (base 10) of the CC size as a function of the SNP threshold. The x-axis is the SNP threshold (from 0 to the chosen limit), the y-axis shows the connected component size (in log base 10).
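
To make the quantity on the y-axis concrete, the following Python sketch computes, for each threshold from 0 to a chosen maximum, the log base 10 of every CC size. It is an illustration only: the function names are hypothetical, and `edges` is assumed to already contain only the candidate pairs (for example selected-clinical pairs) with their SNP distances. The box-plot itself would then be drawn from these lists.

```python
import math


def component_sizes(edges, threshold):
    """Sizes of the connected components using edges with distance <= threshold."""
    adjacency = {}
    for (a, b), d in edges.items():
        if d <= threshold:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    seen, sizes = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal of one component.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        sizes.append(size)
    return sorted(sizes)


def log_sizes_by_threshold(edges, max_threshold):
    """Map each threshold 0..max_threshold to the list of log10(CC size)."""
    return {
        t: [math.log10(s) for s in component_sizes(edges, t)]
        for t in range(max_threshold + 1)
    }


toy_edges = {("F1", "C1"): 0, ("F1", "C2"): 2, ("F2", "C3"): 1}
curve = log_sizes_by_threshold(toy_edges, 2)
```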

On the “Sub Network” tab

On the “Graph” subtab

The application selects all the non-clinical strains linked to the clinical strains belonging to the previously selected CC and adds them to the CC to construct a sub-network. In this sub-network, all the links with an SNP distance lower than or equal to the threshold are drawn. Contrary to the previous graph (showing only clinical-selected edges), edges are drawn between clinical, selected, and other strains. You can change the layout of the graph, choose whether the color palette for years is qualitative (colors varying from year to year in no order) or quantitative (colors from white for older dates to black for recent years), and choose a selection criterion. By default, the selection criterion is the country, but it can be changed to the year, the project center, the serovar or the source. When you change this selection criterion, you can identify the strains according to it by clicking on the selector “Select by …”. You can choose to show or hide (clinical – clinical) edges, (non-clinical – clinical) edges and/or (non-clinical – non-clinical) edges. You can reduce the SNP threshold to cut the sub-network.
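
The effect of the edge-type switches and of a reduced SNP threshold can be pictured with this Python sketch. The function names, the edge-kind labels and the toy data are all hypothetical, chosen only to illustrate the filtering described above.

```python
def edge_kind(a, b, is_clinical):
    """Classify an edge by the clinical status of its two endpoints."""
    n_clinical = int(is_clinical[a]) + int(is_clinical[b])
    kinds = ("non-clinical/non-clinical", "non-clinical/clinical", "clinical/clinical")
    return kinds[n_clinical]


def visible_edges(edges, is_clinical, threshold, shown_kinds):
    """Keep edges within the threshold whose kind is switched on."""
    return [
        (a, b, d)
        for (a, b), d in edges.items()
        if d <= threshold and edge_kind(a, b, is_clinical) in shown_kinds
    ]


toy_status = {"C1": True, "C2": True, "F1": False, "E1": False}
toy_subnet = {("C1", "C2"): 2, ("F1", "C1"): 5, ("F1", "E1"): 8, ("E1", "C2"): 12}
only_mixed = visible_edges(toy_subnet, toy_status, 10, {"non-clinical/clinical"})
```

Lowering the threshold removes the longer edges first, which is how a sub-network can be cut into smaller pieces.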

On the “Minimum Spanning Tree” subtab

A minimum spanning tree (MST) is a subgraph that connects all the nodes together, without any cycle and with the minimum possible total link weight (that is, total SNP distance). The same options as in the preceding graph are offered.
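
For readers unfamiliar with the MST, here is a small Python sketch using Prim's algorithm on pairwise SNP distances. Prim's algorithm is one standard way to build an MST; whether the application uses it is not stated here, and the function name and toy distances are hypothetical.

```python
import heapq


def minimum_spanning_tree(nodes, snp_distance):
    """Build an MST with Prim's algorithm.

    `snp_distance` maps frozenset({a, b}) to an SNP distance. Returns a
    list of (a, b, distance) edges; if the graph is not connected, only
    the part reachable from the first node is spanned.
    """
    nodes = list(nodes)
    if not nodes:
        return []
    visited = {nodes[0]}
    heap = []

    def push_edges(a):
        # Queue every known edge from `a` to a node not yet in the tree.
        for b in nodes:
            if b not in visited:
                d = snp_distance.get(frozenset((a, b)))
                if d is not None:
                    heapq.heappush(heap, (d, a, b))

    push_edges(nodes[0])
    tree = []
    while heap and len(visited) < len(nodes):
        d, a, b = heapq.heappop(heap)
        if b in visited:
            continue  # this edge would create a cycle
        visited.add(b)
        tree.append((a, b, d))
        push_edges(b)
    return tree


toy_mst_distances = {
    frozenset(("A", "B")): 1,
    frozenset(("B", "C")): 2,
    frozenset(("A", "C")): 3,
    frozenset(("C", "D")): 4,
    frozenset(("B", "D")): 5,
}
mst = minimum_spanning_tree(["A", "B", "C", "D"], toy_mst_distances)
```

A tree on n connected nodes always has n - 1 edges, so the four toy strains are linked by three edges.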

On the “Tree” subtab

An SNP distance-based tree is provided. You can choose the method of clustering (Complete, Single (close to MST), Ward, Ward (squared), Average (UPGMA), McQuitty (WPGMA), Median (WPGMC) or Centroid (UPGMC)).
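
The difference between clustering methods lies in how the distance between two clusters is computed. The naive Python sketch below covers three of the listed methods (Single, Complete and Average/UPGMA) and records the height of each merge. It is an illustration under assumptions: the function name and toy distances are hypothetical, and the other methods (Ward, McQuitty, Median, Centroid) are not covered.

```python
def agglomerative(labels, dist, linkage="average"):
    """Naive agglomerative clustering on pairwise SNP distances.

    `dist` maps frozenset({a, b}) to a distance for every pair of labels.
    Returns the merges as (members_1, members_2, height), in merge order.
    """
    def cluster_distance(c1, c2):
        pairwise = [dist[frozenset((a, b))] for a in c1 for b in c2]
        if linkage == "single":
            return min(pairwise)
        if linkage == "complete":
            return max(pairwise)
        if linkage == "average":  # UPGMA: mean over all cross-cluster pairs
            return sum(pairwise) / len(pairwise)
        raise ValueError(f"unknown linkage: {linkage}")

    clusters = [frozenset([label]) for label in labels]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        height, i, j = min(
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((sorted(clusters[i]), sorted(clusters[j]), height))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


toy_tree_distances = {
    frozenset(("A", "B")): 1,
    frozenset(("A", "C")): 4,
    frozenset(("B", "C")): 6,
}
single = agglomerative(["A", "B", "C"], toy_tree_distances, "single")
```

On the toy data, A and B always merge first at height 1; the height at which C joins depends on the method (the smallest, largest or mean of its distances to A and B).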

On the “Epicurve” subtab

A dynamic graph illustrates the isolate occurrence over time, by isolation source. You can choose to group the strains by month, year or week (See which date). Hovering over the bars shows the characteristics of the strains.

On the “Circular plot” subtab

This dynamic plot visualizes the overall number of links between categories of isolates. Hovering the mouse over a category shows the distribution of links from this category to the others.

On the “Sankey” subtab

This dynamic plot similarly visualizes the overall number of links between categories of isolates. You can choose the “source” category of strains whose links you want to see.
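
Both plots rest on the same tally: the number of links between each pair of categories. A minimal Python sketch of that tally is given below; the function name, the category labels and the toy edges are hypothetical and serve only as an illustration.

```python
from collections import Counter


def links_between_categories(edges, category):
    """Count links per unordered pair of categories.

    `edges` is an iterable of (strain_a, strain_b) pairs and `category`
    maps each strain to its category (for example its isolation source).
    """
    counts = Counter()
    for a, b in edges:
        pair = tuple(sorted((category[a], category[b])))
        counts[pair] += 1
    return counts


toy_category = {"S1": "clinical", "S2": "clinical", "S3": "food", "S4": "environment"}
toy_links = [("S1", "S3"), ("S2", "S3"), ("S1", "S2"), ("S3", "S4")]
link_counts = links_between_categories(toy_links, toy_category)
```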

On the “Map” subtab

A map of the location of the strains is provided (See which location). Warnings: the dots are placed at random within the limits of the state (United States) or the country (other countries, including Canada). The position of each dot doesn't represent the actual location of sampling. Strains from the US not assigned to a specific State are placed in the blue square. Strains not assigned to a specific country are placed in the red square. You can choose the size of the dots. Clicking on a dot provides the characteristics of the strains. You can choose the date of isolation (See which date). By clicking on the arrow below the slider, you obtain a dynamic graph (localisation of the strains as a function of the date of isolation).

On the “Sub Network Characteristics” subtab

A table provides the characteristics of the strains.

On the “Download” subtab

You can download the strain data (metadata) and the SNP distance data for the selected CC.

Contact

Questions and comments: FDAFoodSafetyRiskModel@fda.hhs.gov

Versions

Version Beta 2.8: Automatic updates of the distribution for Listeria (published 4/23/2019). New tab for strain description.
Version Beta 2.7: Updated Distribution for Listeria (published 4/11/2019).
Version Beta 2.6: Updated Distributions (published 2/25/2019).
Version Beta 2.5: First released version (published 12/2/2018).




