Submitting

Material

Download the presentation

Our story

Our project, Genetic variation of Listeria in cow, aims to gain knowledge about the genetic variability of Listeria. As a first step, we isolated Listeria monocytogenes from two cow milk samples (LIS001 and LIS002). The milk was collected in January 2022 at two dairy farms, one near Bern (CH) and the other near Fribourg (CH). We received the samples in our lab on February 2nd of the same year. We performed paired-end sequencing on an Illumina MiSeq (2 x 150 bp) on DNA extracted from both isolates. The whole genome sequencing showed that LIS001 belongs to serotype 4b and LIS002 to serotype 1/2b. You can find the raw reads here.

Webin account

These exercises only work with Webin accounts that are more than one day old. Webin accounts created on the same day will not be able to perform the submission.

Exercises

You are asked to submit the reads to a repository at EBI. First, use the submission wizard to figure out the best submission strategy.

Answer

After answering the questions, you probably conclude you should submit to ENA.

If you haven’t already got one, make a Webin account. After the creation of the Webin account we will continue in the dev environment at https://wwwdev.ebi.ac.uk/ena/submit/webin/login.

From now on dev environment!

Make sure you use the dev environment https://wwwdev.ebi.ac.uk/ena/submit/webin/login. Otherwise the reads will be submitted to the production service and stay at ENA!

We will follow the route Submit to ENA using the Webin Portal, which requires four steps.

1. Register study

A study (or project) is an entity describing a research project. Typically this is at the level of a research grant. This makes it easy, for example, to find data from multiple experiments that belong to the same study. Click on Register study and fill out the form for the project described above.

Note

The descriptions do not have to be long for this example. Typically, you can copy-paste this information from e.g. the research proposal.

2. Register samples

Go back to the submission portal main page, and click Register samples. Start by finding a sample template.

Exercise: What would be the most appropriate template for our project?

Answer

Listeria monocytogenes is a pathogen, so as checklist group we select Pathogens Checklists. It is food-borne, so the first one COMPARE-ECDC-EFSA pilot food-associated reporting standard could be appropriate. However, we can’t specify the host (cow) in there. So we choose the alternative: ENA prokaryotic pathogen minimal sample checklist.

Exercise: Check out all fields (including recommended and optional). Any field you want to add/remove?

Answer

We don’t have information about exactly where the samples were collected, so we can uncheck lat_lon at ‘Recommended fields’. We can also uncheck strain because we don’t have any strain information.

Exercise: Now download the template as TSV, and edit it in e.g. Excel for our two samples (LIS001 and LIS002). Do not change the first three lines. For your convenience, keep the field description open, so you know what to fill out.

Hint

Use NCBI Taxonomy to find the taxonomy ID (tax_id).

Answer

You can find an example of a ‘correctly’ filled out table here.
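The exercise above asks you to leave the first three lines of the template untouched, and spreadsheet programs sometimes alter them on save. The following is a minimal sketch of a check you could run before uploading. The function name and the file names in the comment are hypothetical, and it only compares the header lines, not the sample rows.

```python
from pathlib import Path


def header_unchanged(template_path, edited_path, n_header=3):
    """Return True if the first n_header lines of both files are identical.

    Line endings are ignored, because a spreadsheet program may switch
    between Unix (LF) and Windows (CRLF) endings when saving.
    """
    def head(path):
        # newline="" keeps the original line endings so we can strip them ourselves
        with Path(path).open(encoding="utf-8", newline="") as handle:
            return [handle.readline().rstrip("\r\n") for _ in range(n_header)]

    return head(template_path) == head(edited_path)


# Example call (hypothetical file names, replace with your own):
# header_unchanged("template_downloaded.tsv", "samples_filled.tsv")
```

If the function returns False, compare the first three lines of both files in a plain text editor to see what changed.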

3. Upload reads

Before we submit the reads, we first need to upload them. We will use the ftp protocol with FileZilla.

Note

Find all ways to upload sequence reads to ENA here.

Get the reads we will submit here. After downloading, extract the tar file.

To upload the reads, create a new Site that connects to ftp://webin2.ebi.ac.uk in FileZilla:

  • File > Site Manager..
  • Click New Site
  • Give the connection a name e.g. ‘webin upload’
  • Specify at Host webin2.ebi.ac.uk
  • At User and Password specify your Webin username and password

After filling out the connection details, click Connect to connect to your personal space on the ftp server.

If the connection was successful, browse to the sequence files on your local computer in the left pane, and drag them to the window on the right pane (the ftp server).
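If you would rather script the upload than use FileZilla, the steps above can be sketched with Python's standard ftplib module. This is only a sketch under assumptions: it assumes the server accepts FTP over TLS (ftplib.FTP_TLS) and that files go into the top level of your personal space. The function names are hypothetical. If TLS is not accepted, ftplib.FTP offers the same storbinary interface.

```python
from ftplib import FTP_TLS
from pathlib import Path


def connect(user, password, host="webin2.ebi.ac.uk"):
    """Open a TLS-protected FTP connection (assumption: the server supports FTP over TLS)."""
    ftp = FTP_TLS(host)
    ftp.login(user, password)  # Webin username and password
    ftp.prot_p()               # also encrypt the data channel
    return ftp


def upload_files(ftp, paths):
    """Upload each file in binary mode and return the uploaded file names."""
    uploaded = []
    for path in map(Path, paths):
        with path.open("rb") as handle:
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


# Example call (not executed here; needs a real Webin account):
# ftp = connect("Webin-XXXXX", "mywebinpassword")
# upload_files(ftp, ["LIS001_R1.fastq.gz", "LIS001_R2.fastq.gz"])
# ftp.quit()
```

Keeping the connection separate from the upload function makes it possible to try the upload logic against a stand-in object before touching the real server.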

4. Submit Reads metadata

Go back to the submission portal main page, and click Submit reads. Like with the samples, we will download and fill out a template.

Exercise: Check out the template choices. What would be the most appropriate template?

Answer

We have paired fastq files, so we’ll choose Submit paired reads using two Fastq files.

Exercise: Again check out the mandatory and optional fields. Any field you would like to add from Optional Fields?

Check ‘Show Description’

The descriptions do not show by default for the read submission template. View it by checking ‘Show Description’ at the top of the page.

Answer

No, probably nothing to add because we don’t have that information. If you have this kind of info, don’t hesitate to add it!

You have probably seen the fields ‘forward_file_md5’ and ‘reverse_file_md5’. MD5 checksums are used to check whether the file transfer completed without error. You can calculate them like this.

On a UNIX-based system:

md5sum *.fastq.gz

On Windows:

CertUtil -hashfile LIS001_R1.fastq.gz MD5
CertUtil -hashfile LIS001_R2.fastq.gz MD5
CertUtil -hashfile LIS002_R1.fastq.gz MD5
CertUtil -hashfile LIS002_R2.fastq.gz MD5
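The same checksums can also be computed in a platform-independent way with Python's standard hashlib module. This is a minimal sketch, and the function names are hypothetical. It reads each file in chunks so that large read files do not have to fit in memory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_table(directory=".", pattern="*.fastq.gz"):
    """Map every matching file name in a directory to its MD5 checksum."""
    return {p.name: md5_of_file(p) for p in sorted(Path(directory).glob(pattern))}


# Example call, run in the directory that contains the extracted reads:
# for name, checksum in md5_table().items():
#     print(checksum, name)
```

The hexadecimal strings produced this way are what goes into the ‘forward_file_md5’ and ‘reverse_file_md5’ fields.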

Exercise: Download the template and fill out the required information. Check the field description and permitted values in the template description. You can also find an overview of permitted values here.

Answer

You can find a ‘correctly’ filled out table here.

Now you can upload the file at Upload filled spreadsheet template for Read submission, and the website will tell you whether it has been submitted successfully. ENA will then check the files and MD5 sums on its side. Once that has completed successfully (this usually happens overnight), you’re all set! At Run files report you can check the status of the files.

Extra: submit with Webin CLI

Submission through Webin CLI requires the same steps for registering a study and registering samples. However, the process of submitting the reads is done programmatically. A general overview of how to use Webin CLI for reads can be found in the ENA docs.

To do these exercises, follow these instructions to install Webin CLI locally.

Submitting with Webin CLI starts with creating a JSON manifest file, in which you specify the metadata associated with your reads. This manifest contains essentially the same information as the read submission template that we created above. Here’s a mostly empty example:

manifest.json
{
    "study": "PRJEB00000",
    "sample": "ERS00000000",
    "name": "",
    "platform": "",
    "instrument": "",
    "libraryName": "",
    "library-source": "",
    "library_selection": "",
    "libraryStrategy": "",
    "fastq": [
        {
            "value": "LIS001_R1.fastq.gz",
            "attributes": {
                "read_type": "paired"
            }
        },
        {
            "value": "LIS001_R2.fastq.gz",
            "attributes": {
                "read_type": "paired"
            }
        }
    ]
}

Creating JSON files

Typically such a JSON file is generated programmatically with e.g. R or Python.

Exercise: Fill out this JSON for the reads associated with LIS001. Use the ENA docs to find the permitted values.

Answer

Here’s an example. Note that study and sample are likely different for you.

manifest_LIS001.json
{
    "study": "PRJEB62850",
    "sample": "ERS15567929",
    "name": "LIS001",
    "platform": "ILLUMINA",
    "instrument": "Illumina MiSeq",
    "libraryName": "LIS001",
    "library-source": "GENOMIC",
    "library_selection": "RANDOM",
    "libraryStrategy": "WGS",
    "fastq": [
        {
            "value": "LIS001_R1.fastq.gz",
            "attributes": {
                "read_type": "paired"
            }
        },
        {
            "value": "LIS001_R2.fastq.gz",
            "attributes": {
                "read_type": "paired"
            }
        }
    ]
}
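As noted earlier, manifests like this are usually generated with a script. Below is a minimal Python sketch that builds the same structure as the answer above for any isolate. The function names are hypothetical, the field names and values are copied from the example, and the file naming pattern (name_R1.fastq.gz, name_R2.fastq.gz) is assumed from the reads used in this exercise.

```python
import json
from pathlib import Path


def build_manifest(study, sample, name):
    """Build a reads manifest for one paired-end Illumina MiSeq WGS library."""
    return {
        "study": study,
        "sample": sample,
        "name": name,
        "platform": "ILLUMINA",
        "instrument": "Illumina MiSeq",
        "libraryName": name,
        "library-source": "GENOMIC",
        "library_selection": "RANDOM",
        "libraryStrategy": "WGS",
        # Assumption: read files follow the <name>_R1/<name>_R2 pattern
        "fastq": [
            {"value": f"{name}_R{i}.fastq.gz", "attributes": {"read_type": "paired"}}
            for i in (1, 2)
        ],
    }


def write_manifest(manifest, directory="."):
    """Write the manifest to manifest_<name>.json and return the file path."""
    path = Path(directory) / f"manifest_{manifest['name']}.json"
    path.write_text(json.dumps(manifest, indent=4) + "\n", encoding="utf-8")
    return path


# Example call with the accessions from the answer above (yours will differ):
# write_manifest(build_manifest("PRJEB62850", "ERS15567929", "LIS001"))
```

For LIS002 you would call the same function with the name "LIS002" and the sample accession you received when registering that sample.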

Now, we can use Webin CLI to validate and submit the reads. On a UNIX-based system it would look like this:

java -jar webin-cli-6.4.1.jar \
    -context reads \
    -userName Webin-XXXXX \
    -password "$WEBIN_PW" \
    -manifest manifest.json \
    -outputDir . \
    -validate \
    -submit \
    -test

Exercise: Make sure that the webin-cli executable, the manifest file and the reads are in the same directory, and run the command.

Hint

You can store your password in a variable like this:

export WEBIN_PW=mywebinpassword

Warning

Make sure to add the -test option, otherwise Webin CLI will try to submit your reads to the production service!
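If you call Webin CLI from a script, one way to avoid forgetting -test is to build the argument list in a small helper that adds it by default. The sketch below mirrors the command shown above and uses only the options that appear there. The function name is hypothetical, and the jar file name must match the version you installed.

```python
import os
import subprocess


def build_webin_command(jar, user, password, manifest, output_dir=".", test=True):
    """Return the Webin CLI argument list for validating and submitting reads.

    The -test option is appended unless test=False is passed explicitly.
    """
    command = [
        "java", "-jar", jar,
        "-context", "reads",
        "-userName", user,
        "-password", password,
        "-manifest", manifest,
        "-outputDir", output_dir,
        "-validate",
        "-submit",
    ]
    if test:
        command.append("-test")
    return command


# Example call (not executed here; needs Java, the jar file and a Webin account):
# password = os.environ["WEBIN_PW"]  # set beforehand with: export WEBIN_PW=...
# command = build_webin_command("webin-cli-6.4.1.jar", "Webin-XXXXX",
#                               password, "manifest_LIS001.json")
# subprocess.run(command, check=True)
```

Passing the arguments as a list, rather than as one shell string, means the password is not interpreted by the shell even if it contains special characters.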