Ecoli: TY-2482
E. coli O104:H4 sequence from the 2011 outbreak

The May 2011 outbreak of E. coli infection in Europe raised serious concerns about the emergence of a new, deadly bacterial strain. In response, and immediately after the first deaths were reported, BGI-Shenzhen and researchers at the University Medical Center Hamburg-Eppendorf worked together to sequence the bacterium and assess its risk to human health. BGI used its genomic technology to identify the infectious strain, reveal its mechanisms of infection, and support the development of measures to control the spread of the epidemic.

Upon receiving the bacterial DNA samples, BGI sequenced the genome within three days using the Ion Torrent sequencing platform from Life Technologies. According to the draft assembly (see below), the estimated genome size of this new E. coli strain is about 5.2 Mb. Sequence analysis indicated that the bacterium is an EHEC strain of serotype O104. Comparative analysis showed 93% sequence similarity to the enteroaggregative (EAEC) E. coli strain 55989, which was isolated in the Central African Republic and is known to cause serious diarrhea. However, this strain has also acquired specific sequences that appear similar to those involved in the pathogenesis of hemorrhagic colitis and hemolytic-uremic syndrome. These genes may have been acquired through horizontal gene transfer.

Because releasing the data as quickly as possible is important for public health, we have made these initial sequences publicly available. Please be aware that the data are preliminary and should be treated as such. We will update this page with improved assemblies as they are produced.

Data have also been deposited with NCBI (SRA037315.1):
http://www.ncbi.nlm.nih.gov/bioproject/67657
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Link&LinkName=bioproject_sra&from_uid=67657
http://www.ncbi.nlm.nih.gov/nuccore/AFOG00000000

Data have also been uploaded to the crowdsourced analysis project on GitHub:
https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki/

See also the press releases:
BGI releases the complete map of the Germany E. coli O104 genome and attributed the strain as a category of Shiga toxin-producing enteroaggregative Escherichia coli (STpEAEC)
BGI Sequences Genome of the Deadly E. Coli in Germany and Reveals New Super-Toxic Strain
Further analysis on improved genome assembly indicates the outbreak E. Coli has complex genetics with resistance to at least 8 antibiotics
New clues found in tracing the origin of the deadly E coli strain and an appeal for the sharing of additional data
BGI releases a complete de novo E. coli O104 genome assembly and is making their detection kit protocols and synthesized primers freely available to worldwide disease control and research agencies



Workflow
Methods for integrated de novo assembly (published 6 June 2011):
1. Filtering Illumina single-end data
	a. Remove reads with adapter contamination.
	b. Remove reads with 3 consecutive Q2 bases.
	c. Remove reads with 5 consecutive 'N' bases.
	d. Trim Q2 bases from the end of reads.
2. Filtering Ion Torrent data
	a. Remove reads with adapter contamination.
	b. Trim bases with quality lower than Q14 from the end of reads.
	c. Remove reads shorter than 20 bp.
3. SOAPdenovo assembler
	Version: 1.06 (not released)
	Parameters: -K 51, -d 1, -R
4. Newbler assembler
	Version: 2.0.00.22
	Parameters: default
5. Assembly calibration using Illumina clean data
	a. Revise single-base errors and small indels: discard base types with a frequency lower than 5%, then select the major base type as the base in the assembly result.
	b. Break chimeric regions: remove bases covered by fewer than 10 reads with different alignment positions.
6. AMOS minimus2 assembler
	Version: 1.59
	Parameters: REFCOUNT=0, OVERLAP=50, MINID=94, MAXTRIM=10
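The Illumina filtering rules in step 1 can be sketched as a per-read filter. This is a minimal sketch under stated assumptions, not BGI's pipeline: the adapter string is a hypothetical placeholder (the page does not give the adapter used), qualities are assumed to be Phred+64 encoded so that Q2 appears as 'B', step 1b is read as "any run of three Q2 bases anywhere in the read", and step 1d as "trim the trailing run of Q2 bases".

```python
import gzip
from typing import Iterator, Optional, Tuple

# Hypothetical adapter prefix for illustration only; the adapter actually
# used for this run is not stated on the page.
ADAPTER = "AGATCGGAAGAGC"
PHRED_OFFSET = 64  # assumption: Phred+64 qualities, so Q2 is encoded as 'B'

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(path: str) -> Iterator[Record]:
    """Yield (header, seq, qual) records from a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        while True:
            header = fh.readline().rstrip()
            if not header:
                return
            seq = fh.readline().rstrip()
            fh.readline()  # '+' separator line
            qual = fh.readline().rstrip()
            yield header, seq, qual


def filter_illumina_read(seq: str, qual: str) -> Optional[Tuple[str, str]]:
    """Apply steps 1a-1d; return the (possibly trimmed) read, or None if discarded."""
    q2 = chr(PHRED_OFFSET + 2)
    if ADAPTER in seq:       # 1a: adapter contamination
        return None
    if q2 * 3 in qual:       # 1b: three consecutive Q2 bases
        return None
    if "N" * 5 in seq:       # 1c: five consecutive 'N' bases
        return None
    end = len(qual)          # 1d: trim the trailing run of Q2 bases
    while end > 0 and qual[end - 1] == q2:
        end -= 1
    return seq[:end], qual[:end]
```

For example, iterating over `read_fastq("110601_I238_FCB067HABXX_L3_ESCqslRAADIAAPEI-2_1.fq.gz")` and passing each `seq, qual` pair to `filter_illumina_read` would keep only the reads that survive all four rules. A real pipeline would usually match adapters with mismatches allowed rather than by exact substring.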
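The Ion Torrent rules in step 2 can be sketched the same way. This is a minimal sketch assuming Phred+33 qualities, a hypothetical placeholder adapter string, trimming of the trailing run of bases below Q14, and the 20 bp length cut-off being applied after trimming.

```python
from typing import Optional, Tuple

# Hypothetical placeholder adapter; the page does not give the actual sequence.
ION_ADAPTER = "GGCCAAGGCG"
PHRED_OFFSET = 33  # assumption: Phred+33 qualities in the Ion Torrent FASTQ
MIN_QUAL = 14      # step 2b threshold
MIN_LEN = 20       # step 2c threshold


def filter_ion_torrent_read(seq: str, qual: str) -> Optional[Tuple[str, str]]:
    """Apply steps 2a-2c; return the trimmed read, or None if it is discarded."""
    if ION_ADAPTER in seq:  # 2a: adapter contamination
        return None
    end = len(qual)         # 2b: trim trailing bases with quality below Q14
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < MIN_QUAL:
        end -= 1
    if end < MIN_LEN:       # 2c: drop reads shorter than 20 bp after trimming
        return None
    return seq[:end], qual[:end]
```

Bases at exactly Q14 are kept, matching the "lower than Q14" wording.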
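The two calibration rules in step 5 can be illustrated on simple in-memory data. This is a hedged sketch, not the method BGI ran: it assumes a per-position pileup of Illumina read bases (a list of `Counter` objects) and a list of `(start, end)` half-open alignment spans have already been produced by some aligner; these structures and the function names are hypothetical. Only substitutions are handled; small indels are not modelled. "Different alignment positions" is interpreted as distinct `(start, end)` spans.

```python
from collections import Counter
from typing import List, Sequence, Tuple

MIN_BASE_FREQ = 0.05     # 5a: base types below 5% frequency are discarded
MIN_DISTINCT_READS = 10  # 5b: minimum support from distinctly placed reads


def call_consensus(ref: str, pileup: Sequence[Counter]) -> str:
    """5a sketch: replace each base with the major base among types at >= 5%.

    Positions without read coverage keep the original assembly base.
    """
    out = []
    for ref_base, counts in zip(ref, pileup):
        total = sum(counts.values())
        # Drop low-frequency base types before choosing the major one.
        kept = {b: n for b, n in counts.items() if total and n / total >= MIN_BASE_FREQ}
        out.append(max(kept, key=kept.get) if kept else ref_base)
    return "".join(out)


def break_low_support(contig: str, read_spans: List[Tuple[int, int]]) -> List[str]:
    """5b sketch: remove bases covered by fewer than MIN_DISTINCT_READS reads
    with distinct alignment spans, splitting the contig at the removed bases."""
    n = len(contig)
    depth = [0] * (n + 1)  # difference array of coverage
    for start, end in set(read_spans):  # reads sharing a span count once
        s, e = min(max(start, 0), n), min(max(end, 0), n)
        if s < e:
            depth[s] += 1
            depth[e] -= 1
    pieces, current, running = [], [], 0
    for i, base in enumerate(contig):
        running += depth[i]
        if running >= MIN_DISTINCT_READS:
            current.append(base)
        elif current:  # a low-support base ends the current piece
            pieces.append("".join(current))
            current = []
    if current:
        pieces.append("".join(current))
    return pieces
```

Because the most frequent base type always exceeds 5% whenever a position is covered, the cut-off in this substitution-only sketch only removes noise before the vote; it matters more once ambiguous or indel calls are considered.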
		
Download
ftp://ftp.genomics.org.cn/pub/Ecoli_TY-2482
Assembly
		
03/06/11 Ion Torrent mapped assembly: Escherichia_coli_TY-2482_20110606_upload2ncbi.fa.gz
06/06/11 Ion Torrent+Illumina hybrid assembly: Escherichia_coli_TY-2482.contig.fa.gz
06/06/11 Ion Torrent+Illumina hybrid assembly (NCBI version): Escherichia_coli_TY-2482.contig.20110606.fa.gz
11/06/11 Illumina de novo assembly: Escherichia_coli_TY-2482.scaffold.20110610.fa.gz
16/06/11 Gapless Illumina de novo assembly (chromosome): Escherichia_coli_TY-2482.chromosome.20110616.fa.gz
16/06/11 Gapless Illumina de novo assembly (plasmid): Ecoli_TY-2482/Escherichia_coli_TY-2482.plasmid.20110616.fa.gz
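To check a downloaded assembly against the roughly 5.2 Mb genome size estimate given above, contig lengths can be summed directly from the FASTA file. This is a minimal sketch; the function names are illustrative, and N50 is added here as a common assembly metric that the page itself does not report.

```python
import gzip
from typing import Dict, List


def read_fasta_lengths(path: str) -> Dict[str, int]:
    """Return {record_id: sequence_length} for a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    lengths: Dict[str, int] = {}
    current = None
    with opener(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                parts = line[1:].split()
                # Use the first word of the header as the ID; fall back to a counter.
                current = parts[0] if parts else f"record{len(lengths) + 1}"
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

For example, `lengths = read_fasta_lengths("Escherichia_coli_TY-2482.contig.fa.gz")` gives the contig count as `len(lengths)`, the total assembly size as `sum(lengths.values())`, and the N50 as `n50(list(lengths.values()))`.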

Raw Data

11/06/11 Illumina reads: 110601_I238_FCB067HABXX_L3_ESCqslRAADIAAPEI-2_1.fq.gz

Ion Torrent reads  

02/06/11 Ion Torrent run 1	run1.fastq.gz
02/06/11 Ion Torrent run 2	run2.fastq.gz
02/06/11 Ion Torrent run 3	run3.fastq.gz
02/06/11 Ion Torrent run 4	run4.fastq.gz
02/06/11 Ion Torrent run 5	run5.fastq.gz
03/06/11 Ion Torrent run 6	run6.fastq.gz
03/06/11 Ion Torrent run 7	run7.fastq.gz

Others

		2011vs2001_v2.xls
		Specific_primers_for_PCR_detection.pdf
Citation

To maximise its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 licence. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:

Li, D; Xi, F; Zhao, M; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, C; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Song, Y; Zhao, X; Chen, F; Yin, X; Rohde, H; Liang, Y; Li, Y and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011): Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001

http://dx.doi.org/10.5524/100001

CC0
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.