Verdant Tutorial





If you would prefer to use Verdant without creating a CyVerse account, we have provided a general account. All data and analyses run under this account will be deleted after 72 hours. Use the following to log in:

User: verdant_public_user
Password: Verdant1





1. Getting Started

If you will be using Verdant for a long-term project, you will need an account with CyVerse, formerly iPlant. Once you have your CyVerse account, you can return to the Verdant home page and log in (upper right corner of the page).

You will then be at your home page, with three icons: New Project, My Projects, and Upload Plastome. If at any point in your work you need to navigate back to your home page, click on your name in the upper right corner of the screen or use the navigation bar at the top of the page.

Verdant can annotate and display the genome of a single chloroplast (Upload Plastome) or can align all or parts of the genomes of multiple chloroplasts (New Project or My Projects). Usually you will start by uploading one or more plastomes and annotating them. After that you will create sets of plastomes (Projects) for analysis.





2. Upload Plastomes



2.1 Annotate and display the genome of a single chloroplast.

The plastome sequence must be in a single text file in fasta format. Verdant is very particular about the format of the fasta header because your data will be compared to data already in the taxonomic database. The FASTA header should include the full taxonomic hierarchy of the sample as follows:

>Genus_Species-epithet_Tribe_Subfamily_Family_Order_Accession_Reference

All fields are required. If your species does not have a particular rank, then mark it as NA. For example, the entry for Typha latifolia from GenBank would look like this:

>Typha_latifolia_NA_NA_Typhaceae_Poales_NC013823_GenBank

Each sample will be put into the database using its taxonomic information. For the "Accession" field, we suggest that authors use the accession ID they have maintained for the sample in their research. For example, you might use the voucher number (e.g. McKain 231). It is highly likely that there will be multiple accessions for each species, so this is a vital entry.

The "Reference" entry refers to the source of the data. You can use the author and year of the publication (or planned publication), e.g. McKain 2015, or a reference like "GenBank". We have already uploaded many GenBank sequences, and if you attempt to upload a GenBank sequence that is already in the database, Verdant will simply skip the upload for that particular genome. This is to reduce redundancy in the database. GenBank accessions should be recorded as the GenBank ID.
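As a minimal sketch, the header can be assembled programmatically so that no field is forgotten. The function name build_header is hypothetical and not part of Verdant. Removing underscores and spaces inside a field is an assumption, suggested by the Typha example above, where the accession is written NC013823 with no underscore. Treating genus, species, accession, and reference as always required is likewise an assumption.

```python
# Order of the eight fields in the FASTA header described above.
FIELDS = ("genus", "species", "tribe", "subfamily",
          "family", "order", "accession", "reference")
# Ranks that may be marked NA when the taxon lacks them (assumption).
OPTIONAL = {"tribe", "subfamily", "family", "order"}


def build_header(**fields):
    """Return a header such as >Genus_species_NA_NA_Family_Order_Acc_Ref."""
    unknown = set(fields) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    parts = []
    for name in FIELDS:
        value = str(fields.get(name) or "").strip()
        # Assumption: underscores and spaces inside a field would be
        # confused with the separator, so they are removed.
        value = "".join(ch for ch in value if ch != "_" and not ch.isspace())
        if not value:
            if name not in OPTIONAL:
                raise ValueError(f"{name} is required")
            value = "NA"
        parts.append(value)
    return ">" + "_".join(parts)


print(build_header(genus="Typha", species="latifolia",
                   family="Typhaceae", order="Poales",
                   accession="NC013823", reference="GenBank"))
```

Called with the Typha latifolia values, this reproduces the example header shown above.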

The text file must be placed in a folder (directory) and the folder must be zipped so that the uploaded file has the extension .zip. On a Mac this is done by selecting the folder in the Finder; then go to the File menu and choose Compress. On a Windows machine the directory is compressed by right-clicking on the folder; choose Send to, and then choose Compressed (zipped) folder. (Common errors include failing to put the file in a folder, or failing to zip the folder.)
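If you prefer to script this step, the folder can also be zipped with Python's standard library. This is a sketch only; zip_folder is a hypothetical helper name. It keeps the folder itself as the top-level entry of the archive, which is what the Mac and Windows menu commands produce.

```python
import shutil
from pathlib import Path


def zip_folder(folder):
    """Create <folder>.zip next to the folder and return its path."""
    folder = Path(folder).resolve()
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not a folder")
    # root_dir/base_dir keep the folder itself as the top-level entry
    # of the archive rather than zipping only the loose files.
    archive = shutil.make_archive(
        str(folder), "zip", root_dir=folder.parent, base_dir=folder.name
    )
    return Path(archive)
```

For example, zip_folder("my_plastomes") would write my_plastomes.zip beside the my_plastomes folder.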

Use the Browse button to locate your .zip file on your desktop. Then click Submit. A message will appear near the upper left of the page saying “The file has been uploaded.” The plastome has been automatically uploaded and annotated. (This is the real magic of Verdant.)

To view the annotation and a Circos graph of the plastome, return to your home page (click on your name on the upper right), and go to New Project. Follow the instructions below for Create New Project. When you get to the page for Add/Remove Items, you can either search for your taxon using the drop down menus, or can browse the entire list of uploaded files using Select Individual Taxa. Your newly uploaded genome will appear in the list of genomes to add to the project. (If you have made a mistake in the format of your plastome file, you will still get the message saying that your file has been uploaded, but you will not find it in the list.)
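Because a badly formatted file is reported as uploaded but never appears in the list, it can help to check each file before zipping. The sketch below tests only what this tutorial states: one sequence per file, and a header with eight underscore-separated fields. The function name check_plastome_file is hypothetical, and Verdant may apply other checks that are not covered here.

```python
def check_plastome_file(text):
    """Return a list of problems found in the text of one FASTA file."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        return ["file does not start with a '>' header line"]
    problems = []
    headers = [ln for ln in lines if ln.startswith(">")]
    if len(headers) != 1:
        problems.append(f"expected 1 sequence, found {len(headers)}")
    # Genus, species, tribe, subfamily, family, order, accession, reference.
    fields = lines[0][1:].split("_")
    if len(fields) != 8:
        problems.append(f"header has {len(fields)} fields, expected 8")
    elif any(not field for field in fields):
        problems.append("header contains an empty field")
    if len(lines) == len(headers):
        problems.append("no sequence lines found")
    return problems
```

An empty list means that none of these particular problems were found.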

Note: The list of taxa that you can see will be a mix of ones that are publicly available and ones that you have uploaded yourself. Plastomes that you have uploaded are not visible to other users unless and until you make them public.

Once you have chosen your new genome, click Done. You will then be taken to a page that is specific for your new project. It will list your new plastome (and possibly others of the same species depending on how you did the search).

Click on the species name to display the taxonomic information about the accession, the Circos graph, and data on the size of the plastome, the LSC, SSC, and IR.

The three buttons below the size statistics provide three ways to access the annotations. JBrowse will open the annotations in a conventional genome browser. Download annotation will download the annotations as a tab delimited text file with the gene name followed by its coordinates. Download GFF3 will download the annotations in GFF3 format. The GFF3 format is a universal annotation format and can be used for other programs.
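As an illustration of reusing the GFF3 download in other programs, the sketch below reads the nine tab-separated GFF3 columns and lists the features of one type. The function name read_gff3_features is hypothetical. It is an assumption that Verdant's files use the feature type "gene" and carry a Name or ID attribute, and the sample line is invented for the example.

```python
def read_gff3_features(text, feature_type="gene"):
    """Yield (name, start, end, strand) for rows of the given type."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        name = attrs.get("Name") or attrs.get("ID") or "unnamed"
        yield name, int(cols[3]), int(cols[4]), cols[6]


# Invented sample line; coordinates are not real data.
sample = "##gff-version 3\nseq1\texample\tgene\t100\t200\t.\t+\t.\tID=g1;Name=rbcL\n"
for feature in read_gff3_features(sample):
    print(feature)
```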

The fields under Taxonomic Information can be edited and updated if necessary. If you change anything in these boxes, click Submit to update the database.



2.2 Annotate and display the genomes of several chloroplasts

Put each plastome sequence in its own text file following the conventions described for a single sequence above.

Place all the text files in a single folder (directory). The directory must then be zipped, as above, so that the uploaded file has the extension .zip. On a Mac this is done by selecting the folder in the Finder; then go to the File menu and choose Compress. On a Windows machine the directory is compressed by right-clicking on the folder; choose Send to, and then choose Compressed (zipped) folder.
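If your plastomes are currently stored together in one multi-sequence FASTA file, they need to be split into one file per plastome first. The sketch below shows one way to do that. The function names are hypothetical, and naming each file after its header with a .txt extension is an assumption; a header containing characters that are not valid in file names would need to be handled separately.

```python
from pathlib import Path


def split_fasta(text):
    """Map each header (without '>') to its complete FASTA record."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = [line]
        elif header is None:
            raise ValueError("sequence data found before any header")
        else:
            records[header].append(line)
    return {h: "\n".join(lines) + "\n" for h, lines in records.items()}


def write_one_file_per_plastome(text, folder):
    """Write each record to its own text file inside the folder."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for header, record in split_fasta(text).items():
        # Assumption: the file name is free-form, so the header is reused.
        path = folder / f"{header}.txt"
        path.write_text(record)
        paths.append(path)
    return paths
```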

Use the Browse button to locate your .zip file on your desktop. Then click Submit. A message will appear near the upper left of the page saying “The file has been uploaded.” The plastomes have been automatically uploaded and annotated. (This is the real magic of Verdant.)

The uploading and annotating process takes about 1 minute per plastome, so if you upload several plastomes you may have to wait a few minutes before the results are available.

To check that your new plastomes have been added to the database, return to your home page (click on your name on the upper right), and go to New Project. Follow the instructions below for Create New Project. When you get to the page for Add/Remove Items, you can either search for your taxa using the drop down menus, or can browse the entire list of uploaded files using Select Individual Taxa. Your newly uploaded genome will appear in the list of genomes to add to the project. (If you have made a mistake in the format of your plastome file, you will still get the message saying that your file has been uploaded, but you will not find it in the list.)

Note: The list of taxa that you can see will be a mix of ones that are publicly available and ones that you have uploaded yourself. Plastomes that you have uploaded are not visible to other users unless and until you make them public.

Once you have chosen your new genome, click Done. You will then be taken to a page that is specific for your new project. It will list your new plastome (and possibly others of the same species depending on how you did the search).

Click on the species name to display the taxonomic information about the accession, the Circos graph, and data on the size of the plastome, the LSC, SSC, and IR.

The three buttons below the size statistics provide three ways to access the annotations. JBrowse will open the annotations in a conventional genome browser. Download annotation will download the annotations as a tab delimited text file with the gene name followed by its coordinates. Download GFF3 will download the annotations in GFF3 format.

The fields under Taxonomic Information can be edited and updated if necessary. If you change anything in these boxes, click Submit to update the database.



2.3 Upload annotations directly to Verdant.

If your chloroplast genomes are from a branch of the angiosperm phylogeny that is not currently represented, or you know that your chloroplast genomes are highly divergent from those of closely related species, then we strongly suggest adding a trusted annotation directly to Verdant. This can be done using the Upload Annotations tab when signed in to Verdant. GFF3, Verdant, and DOGMA-derived FASTA files are all acceptable annotation formats. Users will also need to upload whole chloroplast sequences with headers formatted as above. Once uploaded, users have the same access to the previously annotated chloroplast genome as they would to a Verdant-annotated chloroplast genome.




3. Projects

A project is a single data set. It is a set of plastomes that you have chosen for a single set of data analyses. For example, you could create a project that includes all the plastomes of a single genus, or a particular family, or group of families.

Verdant comes pre-loaded with a set of complete plastomes available from GenBank, covering a broad range of angiosperm taxa. These have been inspected and their assembly and annotation verified.




4. Annotate and align the genomes of several chloroplasts



4.1 Prepare files, login, and upload as described in 2.



4.2 Create a New Project.

Click on the New Project button. Create a name for your project and choose SUBMIT. You will be taken to a page called My Projects. There are five columns, Project Name, Created, Last Modified, Download Metadata, and Delete from Server.

Project Name is a hyperlink. If you click on this, you will go to another page with several options, beginning with Add/Remove items. Below this is a button Download Annotations, below which is an area called Project Items, which will list the plastomes that you will analyze in the Project. At the moment there is nothing listed.

Clicking on the button Add/Remove Items will take you to a page Add Items to your working Project. This page gives you two ways to add items to your project. Select Taxonomic Group is followed by a set of drop down menus. Choose the plastomes you wish to work on. You do not need to specify all fields. For example, if you wish to work on the genus Chrysopogon, you can simply select Chrysopogon; you do not need to specify the order and family. Once chosen, simply press ADD to put the taxa in your Project. Alternatively, you are also able to remove taxa by group on this page by selecting the taxa and pressing REMOVE.

Select Individual Taxa will take you to a list of plastomes already in the database. This list is fairly long so you will probably have to do some scrolling to get through it. Each column can be sorted by clicking on the column label. Items that can be added have green buttons (Add Item) and those that have already been selected have orange buttons (Remove Item). There is no difference between your private chloroplast genomes and publicly available plastomes in this instance. When you click on the green Add Item button it will change to a purple Remove Item button. When you have selected all taxa of interest, click on Done. You will return to the page listing the Project Items.

All the plastomes you have uploaded have already been annotated. If you wish to see the annotations, click on Download Project Annotations, and the GFF3 annotation files will be downloaded as a .zip archive.



4.3 Align the Genome or Regions of Them.

Now you are ready to search for homologous regions of the plastomes, align them, and (if you wish) generate a phylogenetic tree. Under “Region” there is a text box. Enter the name of a locus or a region of the genome. If you want to include the entire genome, type in “Full”. Other Regions include LSC, SSC, IRB, and IRA. (These stand for Large Single Copy, Small Single Copy, Inverted Repeat B, and Inverted Repeat A.) Loci use standard abbreviations, e.g., rbcL for the large subunit of Rubisco, ndhF for the F subunit of NADH dehydrogenase, etc. For a full list of genes and regions, click on the hyperlinked word “here” and the list will appear in a new window. If you wish to search for multiple items, separate the names by commas (e.g., accD, psaI, ycf4). The names are not case sensitive. If you wish to search for a region of the plastome bounded by particular genes (e.g. the section between accD and petA), type in the names of the first and last features, separated by a colon (e.g. accD:petA).
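The rules for the Region box (commas between items, a colon between the two ends of a range, names not case sensitive) can be summarized in a short sketch. This is only an illustration of the syntax described above, not Verdant's own parser, and parse_region_query is a hypothetical name.

```python
# Named regions listed in this tutorial, compared without regard to case.
NAMED_REGIONS = {"full", "lsc", "ssc", "irb", "ira"}


def parse_region_query(query):
    """Split a Region entry into ("locus" | "region" | "range", ...) tuples."""
    items = []
    for raw in query.split(","):
        term = raw.strip()
        if not term:
            continue
        if ":" in term:
            parts = [p.strip() for p in term.split(":")]
            if len(parts) != 2 or not all(parts):
                raise ValueError(f"a range needs two names, e.g. accD:petA ({term!r})")
            items.append(("range", parts[0], parts[1]))
        elif term.casefold() in NAMED_REGIONS:
            items.append(("region", term))
        else:
            items.append(("locus", term))
    return items


print(parse_region_query("accD, psaI, ycf4"))
print(parse_region_query("accD:petA"))
```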

After you have entered the Regions, you can choose whether you want the output as unaligned fasta files or alignments. Click the appropriate box.

If you chose Alignment, a new set of boxes will appear so you can choose the output format of the aligned sequences; options are fasta, phylip, and nexus. You can also click on the box next to Tree if you would like Verdant to give you a maximum likelihood tree computed in RAxML. The tree will be output in Newick format. Fasta formatted sequences will always be downloaded with a search. You must choose a format type to receive an alignment file.
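Since both unaligned and aligned sequences can be returned in fasta format, a quick way to tell them apart is to check whether every sequence in a file has the same length. The sketch below uses only the standard FASTA layout; the function names are hypothetical.

```python
def read_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def is_aligned(seqs):
    """True when every sequence has the same length, gaps included."""
    return len({len(s) for s in seqs.values()}) <= 1
```

Equal lengths are necessary for an alignment but do not prove that one was made, so treat the result as a sanity check only.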

Click on Search. A new window will open saying Your Files Will Be Ready for Download Soon! and Click Here to Access Your Files. Files will be stored in a directory on the CyVerse server. Your directory will have a unique number so you can bookmark your download page and return to it any time. You will also get an email message when your job is done. It often takes Verdant a while to run the analyses so be patient!

Note that Verdant is not designed to be a comprehensive tool for tree searching. The RAxML tree is provided for convenience, but tree-searching scales with the number of sequences so will become progressively slower as Project size increases. If you find yourself waiting for a long time, we suggest that you use Verdant only for alignments and undertake the tree searches on a cluster where more memory can be allocated.

Below the Search button is the header Project Search History, and below that is a table that provides the search link, the time the search was started, and the features searched. You can return to this link any time.



4.4 Return to a Project After You have Created It.

The set of plastomes that you have assembled into a project is saved automatically on the CyVerse server, and you can return to that project at any time. On your home page, click on My Projects and you will be taken to a page called My Projects. There are five columns: Project Name, Created, Last Modified, Download Metadata, and Delete from Server. The Project Name is a hyperlink, which will take you directly into the project. Created and Last Modified are self-explanatory. Download Metadata will produce a comma-delimited file listing the genus, species, accession, total number of base pairs in the plastome, number of base pairs in the Large Single Copy (LSC) region, Small Single Copy (SSC) region, and Inverted Repeat (IR), and the number of protein-coding genes. Delete from Server is also self-explanatory; you will be asked to confirm Delete before the project is deleted.
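One simple use of the metadata file is to check that the region sizes add up. The sketch below makes several assumptions that are not confirmed by this tutorial: that the file has a header row, that the columns are named accession, total_bp, lsc_bp, ssc_bp, and ir_bp (all hypothetical names), that the IR column gives the length of one copy, and that the plastome has the usual structure of LSC, SSC, and two IR copies. Adjust the names to match the file you actually download.

```python
import csv
import io


def check_region_sizes(csv_text, tolerance=0):
    """Return accessions where LSC + SSC + 2 x IR differs from the total."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        total = int(row["total_bp"])
        # Assumption: two copies of the inverted repeat per plastome.
        expected = int(row["lsc_bp"]) + int(row["ssc_bp"]) + 2 * int(row["ir_bp"])
        if abs(total - expected) > tolerance:
            flagged.append(row["accession"])
    return flagged
```

A flagged accession is not necessarily wrong; it may simply have an unusual structure and deserves a closer look.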



4.5 Searching Annotated Regions

At the bottom of the page below the Search button is the header Project Search History, and below that is a table that provides the search link, the time the search was started, and the features searched. You can return to this link any time.

Search output will be stored and downloaded as a .zip file. The particular contents will depend on the type of search.

Search for a single gene, or for a comma-delimited set of genes: Each gene will be treated separately and will be put in its own folder (e.g., rbcL). Within the folder will be a file containing all the sequences for that gene in fasta format (e.g. 9999.rbcL.rbcL.fsa); the sequences in this file are not aligned. There will also be a folder called Alignments. In the Alignments folder will be folders for each sort of alignment output requested (e.g., Phylip, Nexus) and a folder called Tree if this was requested. Within the alignment-type folder will be one or more alignment files. The first will be the gene itself (e.g., 9999.rbcL.rbcL.nex is the alignment of all requested rbcL sequences). If an additional gene was requested, the second file will be the concatenated alignments of the two (or more; e.g., ycf1-rbcL.concatenated_alignments.nex).

Searching for a range will return files with numbers prepended to the file name (e.g. 0.rbcL.rbcL.nex). These numbers represent the order, within the range, in which each annotated feature was encountered. If taxa have variation in gene order, they will have differently numbered files. When the search alignments are made, these individual files are aligned separately. They are then concatenated in numerical order based on the prepended number. If there is gene order variation, then there will be gaps between the taxa in regions where the order did not match. This method of alignment adds gene order information into the alignment, providing valuable phylogenetic signal. A full-length sequence for the range is also returned (e.g. 8888.psbA-matK.psbA-matK.fsa); it is extracted directly from each chloroplast genome. This allows for the exact gene order to be represented for each taxon. The number "8888" is used to designate this specific type of sequence file.
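The effect of concatenating numbered alignments can be illustrated with a small sketch. It is not Verdant's code: the function name, the data layout (a dictionary keyed by the prepended number and the feature name), and the toy sequences are all assumptions made for the example. Taxa that are absent from a block receive gaps of that block's width.

```python
def concatenate_blocks(blocks):
    """Concatenate aligned blocks in numerical order of their prefix.

    blocks maps (number, feature) to {taxon: aligned sequence}.
    """
    taxa = sorted({taxon for block in blocks.values() for taxon in block})
    result = {taxon: [] for taxon in taxa}
    for key in sorted(blocks):  # numerical order, then feature name
        block = blocks[key]
        lengths = {len(seq) for seq in block.values()}
        if len(lengths) != 1:
            raise ValueError(f"block {key} is empty or not aligned")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon missing from this block is padded with gaps.
            result[taxon].append(block.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in result.items()}


# Toy data: taxa A and B share psbA first but differ in what follows.
toy = {
    (0, "psbA"): {"A": "ACG", "B": "ACG"},
    (1, "matK"): {"A": "TT"},
    (1, "trnK"): {"B": "GG"},
}
print(concatenate_blocks(toy))
```

In the toy data the two taxa end up with gaps opposite each other in the second and third blocks, which is how a difference in gene order becomes visible in the concatenated alignment.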

The Tree folder contains the full output from RAxML for each individual gene and for the concatenated genes. For a full description of RAxML outputs, see the RAxML manual.

Search for a range or a named region (e.g., IRB): As for any other search, you will get a .fsa (fasta) file for every gene in the range or region. The Alignments folder will contain an alignment for each individual gene and an alignment for all genes concatenated.

Users may want to BLAST intergenic regions to identify potentially novel genes or pseudogenes, especially if the user thinks particular genes have been missed. Users may also contact Verdant administrators to receive a copy of all unannotated ORFs from user chloroplast genomes.


Created By: The Kellogg Lab