London Calling 2017
Presentation notes on London Calling, written by David Eccles.
Day 1 (May the Fourth be with You)
Gordon Sanghera
The Kinks (intro sound / theme sound for this conference)
- A lot to do with nanopore sequencing
- Disruptive; appeared between The Beatles and [some other band that I didn’t have time to take note of]
- Used knitting needles to slash speakers, basically launched heavy metal and punk
Lord Kelvin
- Without measurement, there is no scientific progress
- We are now taking measurements and gathering information in real time
- Real time changes everything
Billingsgate (location of conference)
- Named from a water gate where goods landed. It was managed by King Billing
- In 1699 an act of parliament allowed a fish market in Billingsgate, no other market was allowed within 6 2/3 miles
- Except for eels – they were allowed to be sold by the Dutch, because of the help the Dutch provided during the Fire of London
- Yesterday, a Dutchman bearing eels gave a presentation
- Jellied eels are a local delicacy; attendees should try them out
Disruptive Innovation
- Venue has been organised to seat 400 people; it has filled up
- There will be a guest speaker at dinner talking about disruptive innovation
- It is you lot [i.e. the attendees] doing the innovating
- Some of us have gambled our whole careers on this
- ONT is here today to learn from the community
- 25% of ONT staff are here to listen
- We are here at the beginning
- Many careers will be made through the bravery of the community
Zoe McDougall – Housekeeping
- Packed program; any speakers should state at the beginning of their talk if they don’t want something tweeted
- WiFi available; Battery charging points all over the place
- No scheduled fire drills, so if you hear an alarm, leave the building through the exits [DE: there was actually a scheduled drill at a previous London Calling conference]
- Three floors; eating and drinking will be done tonight in the basement
Karen Miga
The Use of Really Long Reads
- New excitement in the genomics field for reads 10-100kb in length
- Still not completely there with the longest reference assemblies
- Huge gaps exist in the human genome
- Most sequencing technologies take place on the chromosome arms
- It is difficult to get true haplotype resolution on the centromeres
- Centromere repeats are head to tail for millions of bases
- A fundamental genomic milestone is almost within reach: telomere-to-telomere assembly
Assembly problems
- Repeats are 98-100% similar
- A short library with reads of 10kb can need reads of up to 100kb to span
- With human centromeres, even 100kb reads are too short
- What about unique markers within the region?
- There are some scars and SNVs
- SNVs are probably the most prevalent thing for assembling
- One sequence is not the problem
- Diploid read phasing is incredibly difficult
- There is sequence similarity for different genomic regions: satellites on chromosome 18 share with chromosome 20
Resolving the issues; 3 key advances required
- Understanding of satellite sequence structure
- Increased throughput
- High-quality base calls
Satellite array on chromosome Y
- Why Y? Y chromosome has the smallest satellite array, and is well characterised
- Have used a BAC-based strategy to assemble
- There are a set of 9 BACs that are known to span the region
- Assembly process: sequence BAC to high depth, use Illumina reads as a truth set
Using UCSC Longboard protocol
- Linearising process for BAC, optimised to give the BAC sequence
- Managed to get a 100kb+ [library] N50
- In total, there were 3.5k reads of greater than 150kb
- From the nanopore reads alone, the accuracy was not high enough for centromeric analysis
- Polishing can be done to end up with high-quality consensus
- Polishing works well with about 10~60X coverage
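N50 comes up repeatedly in these notes (library N50, contig N50, scaffold N50). As a reminder of what the number means, here is a minimal Python sketch. The function name and the example lengths are made up for illustration and are not from the talk.

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: lengths in bases (illustrative numbers only).
example_lengths = [150_000, 120_000, 100_000, 40_000, 10_000, 5_000]
print(n50(example_lengths))
```

A few very long reads can pull the N50 up a long way, which is why a library N50 above 100kb says more about the long tail than about the typical read.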
BAC sequencing example: 221.4kb
- Longest sequenced microsatellite region
- Can now detect 634 errors in this region
- Lots of the errors are homopolymers A & T
- Resequencing carried out on Illumina, including pentamer analysis
- Comparison with Illumina gives strong correlation
- ONT had only one outlier, AAAAA/TTTTT
- After error correction, only 23 of the variants seemed to be true
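The pentamer comparison can be pictured as counting 5-mers in a truth sequence and in a consensus, then looking at which counts disagree. The sketch below is an assumption about the general idea, not the analysis used in the talk; reverse complements are collapsed so that AAAAA and TTTTT count as one pentamer, and all names and sequences are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_counts(seq, k=5):
    """Count canonical k-mers, skipping any that contain non-ACGT bases."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


def kmer_differences(truth, query, k=5):
    """Return {kmer: query_count - truth_count} where the counts differ."""
    t = kmer_counts(truth, k)
    q = kmer_counts(query, k)
    return {km: q[km] - t[km] for km in set(t) | set(q) if q[km] != t[km]}


# Toy example: the query has lost one A from a run of six.
truth = "GCAAAAAAGC"
query = "GCAAAAAGC"
print(kmer_differences(truth, query))
```

In this toy case the only pentamer whose count changes is the homopolymer one, which matches the kind of outlier described above.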
Other BACs
- All BACs put together produce 346kb centromeric regions
- Assembled to R1B [particular human reference], median repeat region 350kb, SD 220kb ~ 460kb
- Looking at the individual cell lines, can find restriction enzymes that do not cut within the array and haplogroup-matched dataset
- get blot at the expected region
Future work
- Move along from BAC-based approach
- With long reads, can probably span these regions
- Optimising the 1D protocol, ideally want to get to 60X coverage for long reads
- UCSC working on visualisation for variants and centromeric regions
Questions
- How long do reads need to be to do all centromeres?
- A megabase sized read might work; the mean centromeric length is 3Mb
- If 3Mb can be spanned, then it will be possible to do a diploid resolution of centromeres
- Why does accuracy go down when increasing coverage?
- False positive consensus; there is a sweet spot
Björn Usadel - Bringing Omics Data to Users
Plant Genomes: a historical perspective
- Plants are particularly nasty beasts with very large genomes
- Plants are riddled with repetitive sequence
- Wheat is particularly nasty (12Gb)
- The whole genome and gene space is repetitive
- Multiple copies, even for small genome families
Tomato
- Contrary to the usual definition of species, different tomato species are all very happy to cross with each other
- Solanum pennellii is a very hardy plant
- Did sequencing with Illumina, fosmid, BAC
- Got scaffold N50 of 1.6Mb, contig N50 of 45kb
- Sequenced a new “cultivar”, but there are millions of variants
Oxford Nanopore and Plant Genomes
- Started with small algal genomes
- Canu and Pilon polishing worked well
- Processed 31 flow cells, each with over 1Gb per flow cell
- Read length was tunable, but a little bit dependent on what can be selected with the Pippin Prep
- Output quite high in 24 hours
- After the run was done, the spent flow cell could be used for testing purposes
- Fairly good correlation between reported q value and actual quality
- 81-82% accuracy
- Very clean DNA is needed; herbal plants can be problematic
- Read correction with Canu improved accuracy to about 90%
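The relationship between a reported q value and accuracy is the standard Phred scale, Q = -10 * log10(error probability). A minimal Python sketch of the conversion follows; the function names are made up for illustration.

```python
import math


def accuracy_to_q(accuracy):
    """Convert a per-base accuracy (0..1) to a Phred-scaled quality value."""
    error = 1.0 - accuracy
    if not 0.0 < error <= 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    return -10.0 * math.log10(error)


def q_to_accuracy(q):
    """Convert a Phred-scaled quality value to a per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


# Accuracies mentioned in these notes.
for acc in (0.82, 0.90, 0.999):
    print(f"accuracy {acc:.3f} -> Q{accuracy_to_q(acc):.1f}")
```

On this scale, raw reads at around 82% accuracy sit near Q7, correction to 90% reaches Q10, and 99.9% corresponds to Q30.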
Assembly experimentation
- Tried a whole bunch of assemblers, tried subsampling data
- Started with 100X coverage
- Canu correction together with SMART-denovo got the best N50
- Canu makes fewer mistakes; Canu-corrected reads are fed into SMART-denovo
- Is SMART-denovo the best to use?
- It was much faster; Canu-corrected took 10,000 CPU hours
- Tried sampling, putting emphasis on getting long reads
- 30X coverage with longest reads works best
- In general, looked good when mapped to the reference genome
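Subsampling with an emphasis on long reads can be sketched as: sort reads by length and keep the longest until a target depth is reached. This is only a guess at the general approach; the function name, genome size and read lengths below are hypothetical.

```python
def select_longest_reads(read_lengths, genome_size, target_coverage):
    """Keep the longest reads until their total length reaches the target depth.

    Returns the selected lengths and the coverage actually achieved.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    target_bases = genome_size * target_coverage
    selected = []
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if total >= target_bases:
            break
        selected.append(length)
        total += length
    return selected, total / genome_size


# Toy example: a 1,000-base "genome" and a target of 1X.
reads = [500, 400, 300, 200, 100, 100, 50]
kept, coverage = select_longest_reads(reads, genome_size=1000, target_coverage=1.0)
print(kept, coverage)
```

For the real data the target would be 30X; the last read kept usually overshoots the target slightly, as it does here.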
Other quality metric: BUSCO
- All methods had similar unpolished BUSCO scores
- Pilon polishing worked the best
S. pennellii Compared to other genomes
- Pineapple N50 120kb
- Quinoa scaffold assembly 3.84Mb
- Citrus PacBio + Illumina, contig N50 2.2Mb, scaffold 4.2Mb
- Can actually get a good assembly in a few months
Future / Questions
- Finishing other genomes
- Waiting on albacore improvements: longer insertions, longer deletions, overall better
- With R9.5, got one read that aligned well to chloroplasts
- What does it mean? Small labs can sequence a genome
- Repetitive elements are being analysed now. Centromeres are not being talked about
- Does Pilon introduce noise in the repeat regions?
- After 5-10 rounds of Pilon, things started jumping around
- Prefers 2-3 rounds of error correction with Pilon
Jared Simpson – Analysis Tools For Nanopore Data
Signal-level nanopore data
- Want to work with raw signal as much as possible
- Channel passes single-stranded DNA through a nanopore
- Current samples are written out to FAST5 files
- Sequence disrupts the current
- DNA movement introduces a new sequence context
- We hope to see movements up and down that reflect movement of the DNA through the pore
- Signal-level analysis involves working with both raw samples and segmented currents (events)
- Events are fewer in number
- Event-based algorithms typically run faster, not a lot of data lost
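To make the raw-samples-versus-events distinction concrete, here is a deliberately simple Python sketch that groups consecutive samples into events whenever the current jumps away from the running mean. Real event detectors are more elaborate; the threshold rule, function name and numbers here are assumptions for illustration only.

```python
def segment_events(samples, threshold):
    """Split raw current samples into events (runs of similar current).

    Returns a list of (start_index, length, mean_current) tuples.
    A new event starts when a sample differs from the running mean
    of the current event by more than `threshold`.
    """
    events = []
    if not samples:
        return events
    start = 0
    total = samples[0]
    for i in range(1, len(samples)):
        mean = total / (i - start)
        if abs(samples[i] - mean) > threshold:
            events.append((start, i - start, mean))
            start = i
            total = samples[i]
        else:
            total += samples[i]
    events.append((start, len(samples) - start, total / (len(samples) - start)))
    return events


# Toy signal: three current levels, eight samples.
signal = [80.0, 81.0, 79.0, 100.0, 101.0, 99.0, 60.0, 61.0]
events = segment_events(signal, threshold=5.0)
print(events)
```

Eight samples collapse to three events here, which is the sense in which events are fewer in number and faster to process.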
Basecalling
- The primary analysis task of nanopore
- Involves inferring the sequence that gives rise to the current
- Take a vector of events, apply labels to the events
- Overlapping labels are merged together to give a final base-called sequence
- Don’t want any loss of information in going from events to base calls
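The "label events, then merge overlapping labels" step can be sketched as stitching k-mer labels together by their overlap. This is a toy version under stated assumptions: labels are already known, identical consecutive labels are treated as the same k-mer seen twice, and the largest suffix/prefix overlap is always taken. The function name and labels are hypothetical.

```python
def merge_kmer_labels(labels):
    """Merge a list of k-mer labels (one per event) into a single sequence."""
    if not labels:
        return ""
    sequence = labels[0]
    for prev, cur in zip(labels, labels[1:]):
        if cur == prev:
            # Assumption: a repeated label means the strand did not move.
            continue
        k = min(len(prev), len(cur))
        # Try the largest overlap first (a one-base move), then smaller ones.
        for overlap in range(k - 1, -1, -1):
            if overlap == 0 or prev[-overlap:] == cur[:overlap]:
                sequence += cur[overlap:]
                break
    return sequence


# Toy example with 5-mer labels; the last step skips a base.
labels = ["ACGTA", "CGTAC", "CGTAC", "GTACC", "ACCGT"]
print(merge_kmer_labels(labels))
```

Treating repeated labels as "no movement" also shows why long homopolymers are hard in an event-based scheme: the label does not change as the run passes through the pore.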
Event data / Nanopolish
- Larger files, slower models
- Nonetheless, better improvements in data
- Nanopolish was originally for improving consensus sequences
- Can now call SNPs, INDELs, and modified bases
- Can also be used for phasing reads and long-range haplotypes
Lightning Talk – Raja Mugasimangalam
Is The Pot Labelled Correctly?
- From India, taking us back several thousand years ago
- No dairy products had labels, all in pots
- Now have labels, but wanted to know if they were correct
- ViBact labelled well; it had a single microbe in it
- Yakult labelling okay, but could be better
- Found some fungi that shouldn’t be there
- Home curd is unlabelled and untold
- The process of discovery: stir / spin / extract
- 12 products + 9 other
- Could identify pot with a very low number of reads
- Need at least 500 reads for a complex genome (if more than 10 bugs)
- Major composition of samples was determined in 4-6 hours
- Now have local calling
- Were there reagent contamination issues?
- Used controls with common contaminants
- Did multiple samples from different shops
- The same lots contained the same bugs
Lightning Talk – Sebastian Johansson
HLA Typing
- HLA is important, especially for organ transplantation
- There are two HLA classes of varying length
- Got products from an amplification primed on HLA sequences, pooled together
- 24 hours of sequencing produced a good barcode distribution
- Coverage was good for short reads, not so good for long reads
- Looked at alignments to HLA after filtering
- Got small islands of coverage that disappeared after filtering
- A few reads were found spanning the entire HLA region
- Works well for A, B, C, DQ
- Would be improved with spanning primers
Lightning Talk – Franz Josef Müller
Rapid Identification of Genomic Regions / SelectION
- Developed tool is a solution to a specific problem
- Intermediate step between basecalling and alignment (anchoring)
- Cuts down on the time required to carry out an alignment
- Localised 160 reads to FMR1 region; 28 reads to repeat within FMR1
- Subsequent analysis time was less than a minute
- Essentially Burrows-Wheeler Transform + FM Index
- Four orders of magnitude faster than mapping everything
- Mapping of over 90% of the reads, accuracy over 90% compared to BWA
- Nanopore sequencing excels at repeat expansion finding
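For readers unfamiliar with the Burrows-Wheeler Transform and FM index, the sketch below builds both naively and counts exact matches with backward search. It illustrates the underlying data structure only and makes no claim about how the tool in this talk is implemented; the class name and sequences are hypothetical, and the naive construction is only suitable for short strings.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler Transform of text + '$', via a naive suffix sort."""
    text = text + "$"
    order = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character is the one preceding each sorted suffix.
    return "".join(text[i - 1] for i in order)


class FMIndex:
    def __init__(self, text):
        self.bwt = bwt(text)
        counts = Counter(self.bwt)
        # first[c]: number of characters in the text smaller than c.
        self.first = {}
        total = 0
        for c in sorted(counts):
            self.first[c] = total
            total += counts[c]
        # occ[c][i]: occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in counts}
        for i, ch in enumerate(self.bwt):
            for c in self.occ:
                self.occ[c][i + 1] = self.occ[c][i] + (1 if c == ch else 0)

    def count(self, pattern):
        """Count exact occurrences of a non-empty pattern (backward search)."""
        lo, hi = 0, len(self.bwt)
        for ch in reversed(pattern):
            if ch not in self.first:
                return 0
            lo = self.first[ch] + self.occ[ch][lo]
            hi = self.first[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


# Toy example: a short CGG repeat.
index = FMIndex("ACGGCGGCGGT")
print(index.count("CGG"), index.count("CGGCGG"))
```

Each query character costs a constant number of table lookups regardless of reference size, which is what makes this kind of index attractive for quickly anchoring reads before a full alignment.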
Lightning Talk – David Eccles
Sequencing a parasite genome
- A long history of sequencing, started doing his own prep with R9 kit
- Runs in October / November last year were bad
- ONT offered to do the sequencing for them
- Five sequencing runs, all better than 20 previous runs at institute
- One run amplified from a single worm
- Metagenomic profile from a 35,000-read subset
- One run had a very long read length “hump”
- Sonication + Tip20 extraction
- Run N50 was better than current reference genome
- Poster talks about assembly discoveries
Lightning Talk – Sally James
Telomere to Telomere sequencing of Galdieria sulphuraria
- A real pain to lyse
- Telomere is an octamer repeated 26 times; other end is repeated 29 times
- Nanopore reads aligned to an area right at the ends of contigs
- Reads were piling up at the ends of contigs
- Many 40bp reads, pileup and completely stop dead at the end of the contig
- Can look at multiple repeats
- Currently assembling complex sub-telomeric regions
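Counting how many copies of the telomere repeat sit at the end of a contig or read can be sketched as below. The octamer sequence was not noted, so the motif here is a placeholder, and exact matching is an assumption that would need relaxing for error-containing nanopore reads.

```python
def count_terminal_repeats(sequence, motif):
    """Count consecutive exact copies of `motif` at the 3' end of `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    k = len(motif)
    count = 0
    end = len(sequence)
    # Step backwards one motif at a time while the copies keep matching.
    while end >= k and sequence[end - k:end] == motif:
        count += 1
        end -= k
    return count


# Toy example: placeholder 8-base motif repeated 26 times after unique sequence.
placeholder_motif = "AATGGGGG"
contig_end = "ACGTACGT" + placeholder_motif * 26
print(count_terminal_repeats(contig_end, placeholder_motif))
```

The other chromosome end would be handled the same way after reverse-complementing the sequence.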
Lightning Talk – Beth Lodge (ONT)
VolTRAX
- All electronics are in the base of the unit, flow cells are added on top
- Voltrax integration program is open, applications are available online
- High-output kit, gives best throughput
- Yield variation between labs is reduced significantly
- It can do extraction as well as PCR
- Similar yields to other extraction methods
- Looking at 16s sequencing
- People don’t need a fully-kitted out lab to do sample prep
Philipp Euskirchen
[DE: starting to feel jet lag problems]
[History of brain tumour diagnosis]
Diagnosis
- Tumours can have very different morphologies but be molecularly similar
- Diagnosis is preferred if it leads to a prognosis
- e.g. identification of an IDH1 mutation; a 1p or 19q codeletion strongly correlated with prognosis
- For subgroups of medulloblastoma, prognosis depends on the subtype
- How can diagnosis be implemented?
- There are four subtypes of medulloblastoma based on their transcriptomic and methylation classification
- A lot of different machines and protocols
- Two paradigms
- Low-pass whole-genome sequencing, producing results within a day
- Deep amplicon sequencing for point mutations and GC-rich regions
Copy-number profiling
- DNA with rapid kit and sequencing
- Used out-of-the-box tools from the supermarket
- Read numbers are not gigantic, but could still generate decent profiles within a day
- Visually explored the EGFR region on chromosome 7
- Saw focal amplification
- Was able to identify split reads at both ends
- Remapped to a synthetic sequence for collapsed reads
- Confirmed by Sanger sequencing
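A copy-number profile from low-pass reads boils down to counting read starts in fixed-size bins and comparing each bin with a baseline. The sketch below uses the median bin count as the baseline and a small pseudocount; it is an assumption about the general technique, not the pipeline used in the talk, and it ignores GC and mappability correction. Names and numbers are hypothetical.

```python
import math
from statistics import median


def copy_number_log2(read_starts, chrom_length, bin_size):
    """Return per-bin log2 ratios of read counts relative to the median bin."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    baseline = median(counts)
    # A pseudocount of 0.5 avoids taking the log of zero in empty bins.
    return [math.log2((c + 0.5) / (baseline + 0.5)) for c in counts]


# Toy example: five bins, the middle one carrying four times as many reads.
bin_size = 1000
counts_per_bin = [10, 10, 40, 10, 10]
read_starts = [b * bin_size + j for b, n in enumerate(counts_per_bin) for j in range(n)]
profile = copy_number_log2(read_starts, chrom_length=5000, bin_size=bin_size)
print([round(x, 2) for x in profile])
```

A focal amplification shows up as one or a few bins well above zero while the rest stay near zero, even with modest read numbers.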
Methylation sequencing
- Can use the same reads as used for copy-number profiling
- Compared with Illumina 450k chip
- Wanted to know if this could possibly work for nanopore
- Classifier error changes things; don’t know in advance what CpG sites will be seen
- Compared with Illumina, methylation profile has a nice correlation
Pan-cancer classification
- Classification from a whole different method
- Works for IDH mutant and medulloblastoma
- What works in brain tumours should have benefit for other cancers
Real-time read depth monitoring
- 10-20 minutes until a region is 1000X-covered, including a GC-rich region
- Results mostly in line with Illumina data
Breakout 3 Discussion
- Nanopolish doesn’t work so well with the new version of albacore
- ONT wants to work to produce a better base caller, want to avoid needing nanopolish
- Don’t want a base caller that introduces systematic error
- Damien has tried tracking virus with mouse samples, seen different diversity in different tissues
- With direct RNA sequencing, even with 10^6 to 10^7 copies per ml, still gets swamped out by the host
- What is the optimum depth?
- Read length matters
- With good long reads, 20-30X is good enough
- For clinical use, need to be very sure about accuracy
- There are applications where read-until will be useful
Clive Brown
Novel statements selected by David Eccles
- Thinking of making a GridION version that has PromethION flow cells
- PromethION will also do a 4-way MUX (max 10,400 wells)
- New chemistry coming which will increase MinION yield by 20-30%
- Accuracy should get to 99.9% [q30] by the end of this year (possibly this summer)
- Updated pore [technology development] has two times read-ahead, and should be able to model up to 30bp homopolymers
- A reasonable user will be able to knit their own clothes using nanopore packaging
- Have been able to get a metal-metal interface working on the Flongle
- A sequencer that doesn’t do base calling – Clive’s pet project for the rest of the year
[evening for Clive Brown is after 6pm]
Goal of ONT: Anyone sequencing anything anywhere
- Most products are targeted at this goal
- Sequencing is becoming more ubiquitous
- When things get out of the lab, people find out more uses
- People have gone out and found real applications
- Can go back and mark Clive’s homework as a permanent record
Clive explaining nanopore sequencing
- Protein pore at the moment
- Pore will scan the entire sequence length
- Other technologies are typically limited by photo damage
- Sequencing is fast: per-molecule cycle time is milliseconds
- Should have significant implications in the future
- Fragment length is effectively read length
- No reason why full-length chromosomes can’t be done
- Histones shouldn’t hinder sequencing
- What if we could do whole chromosomes?
MinION
- Probably all people at the conference are MinION users
- Have always felt MinION would be good enough; it is the democratising feature of ONT
- Sits in no-mans-land because a lot of electronics are in the flow cells
- There are now about 4,500 users, ideally want 10,000-15,000
- Workflow is never linear with the MinION
- Can analyse data in an hour, some people are beginning to do this by not just firing and forgetting it
Lab Work
- Clive is not good at lab work, but has had a go anyway
- Clive has run what ONT has to see how hard it was
- In half an hour got 140µg DNA using a Qiagen kit
- Could load flow cells (got one bubble once, but easily fixed by other ONT staff)
- First attempt used 30 flow cells
- Subsequent run on GridION, 5 flow cells at 14-15 gigabases
- Some users have got 16 gigabases
- Some have trouble with getting anything
- Clive used reagents that we have now, demonstrating that the GridION can be used now
- Another run used newer chemistry (not available yet in the community), and achieved 24 gigabases per flow cell
- Now has over 200 gigabases, people can do with that whatever is ethically reasonable
- Probably some protocol issues; some samples don’t yield much DNA
- People are now using VolTRAX to give consistent yield; flow cell is now tip top
GridION
- Started as a modular rack-mounted system with reagents, dynamically forming membranes and sequencing
- Abandoned this approach, but left it parked on the website
- New GridION is modular
- Thinking of making a version that has PromethION flow cells
- Will have Run Until and Read Until
- People are licensed to use the GridION for commercial use
- Can alternatively use five MinIONs, with laptops, software, etc.
- Process is all automated on the GridION
- Box can do real-time online basecalling
FPGAs
- Internal effort to map data processing into FPGAs
- Can call 1 million bases per second at the moment
- Not as much success with GPU / CPU
- Gives very fast feedback, even with a high number of pores
Funding model
- ONT doesn’t like the idea of how people pay for things
- Preferred model is consumable only; almost Pay-As-You-Go, with contracted minimum
- This has been the most popular model
Manufacturing
- Box is very highly manufacturable
- After June, should be able to make 3 per day
- Shipping in about 48 hours
- Flow cells have about a 48 hour delivery
- First GridION out on the 15th May, might be the same GridION used for the Cliveome, but maybe not
PromethION
- Designed 3 years ago, everything is completely new
- Was designed to take on Illumina
- In principle, will generate more data than a NovaSeq
- Head room is so much that even if NovaSeq is brought forward, will still be competitive
- Have 12-14 boxes out now, produces a ton of data
- Bulk purchase cost of flow cells is about $600
- Just sit tight
- Can de-batch the entire workflow
- e.g. 50 samples on a Wednesday, one sample the next day; more real time and less batched
- PromethION is licensed as fee-for-service
- Flow cell has a different layout
- A few issues remain, but nothing that hasn’t been seen before with the MinION
- Even with issues, getting 22 Gb in 22 hours [DE: I recall there was a ‘23’ somewhere]
- Active unblock is not yet fixed on the PromethION
- Flow cell run time has been increased to 4 days
- Will also do a 4-way MUX (max 10,400 wells)
- Even if just using similar output to a MinION, is still better than NovaSeq
PromethION Processing
- Increased speed of sequencing
- Internally have a run at 1,000 bases per second
- Couldn’t fit processing into the original design
- Solution is to make a separate computer
- Will eventually shrink by about 1/3
- Composed mostly of FPGAs, will allow complete base call in real time
- Lower module will be a network switch
- Inside is a server with a skin
- FPGAs with neural networks
- Loads of them out and configured
- Steady incremental improvement
PromethION evolution
- Firmware tweaks are increasing processing to 4.8 Tb
- Ultimately will retransfer computing, end up with a single server
- People should sit tight for this one; this is the box that will turn the supertanker market around
MinION yields
- Customers are getting pretty good yields
- Should be able to get 20 gigabases per day, but more realistically 6 or 7
- Someone got 16 gigabases [one run], Clive wishes they would tweet that run
- 16 gigabases is a realistic expectation
- New chemistry coming which will increase that yield by 20-30%
MinKNOW
- Improving rapidly
- Should be able to wash out flow cell sample, fix sample prep, and move on
- Want to get rid of the priming step
- Just load the flow cell without priming
- This flow cell should be ready [DE: i.e. almost shippable]
- Working on recommendation list
- Radically simplifying the software
- A team is working on redesigning the user experience
- People should just know how to run MinKNOW
- After a time, all base calling improvements will be in MinKNOW
- Process should be very straightforward
VolTRAX
- Moving along pretty well
- After a time all protocols will be moved onto VolTRAX
- By the end of the year, should be using lyophilised reagents
- Multi-sample flow cells
- Quantification + possibly amplification
- Available at the end of the year (VolTRAX mk 1)
Raw signal
- Another slide Clive hates
- Clive doesn’t like the events
- Breaks down at fast speed
- Event data bloats the files
Algorithm updates
- 1D will improve
- 2D idea: template + hairpin + complement
- need to replace with something new
- 1D²; make the second strand hang around for longer so second strand follows through
- 96% modal accuracy, limited by how data is combined
- Should get to 1% error
- 1D² will be implemented next week
R9.5
- New pore
- Backwardly compatible
- Coming 8th May
- All ready to rock
- 2D can now go into the dustbin of history
- Nanopore killed 2D today (please tweet)
- Very little is different between R9.4 and R9.5
- What is different should be fixable in software
Homopolymers
- If taken out, now getting pretty good numbers
- Can do quite well, pretty much all homopolymer calls are out by 1
- Homopolymers will go out as systematic errors
Basecalling
- Still an enormous amount of headroom
- End up with detritus of error
- Should get to 99.9% by the end of this year (possibly this summer)
- Still lots of unmodelled signals
- e.g. there is a fungus that puts sugars on DNA
- People here have shown that things are visible in the signal
New pores
- Also working on new pores; patent has just been published on this
- In the future, will now never replace pores, just migrate pores
- Updated pore has two times read-ahead
- should be able to span 30bp homopolymers
- may have new pore by the end of summer
Raw Calling
- A lot of work done on this
- Captured on FPGA
- Pretty close now to comparable performance to what is done now
- Making available raw data base caller now
- Any useful software will migrate to MinKNOW
- Want to just output FASTQ
Removing cold-chain system
- Can now store and ship new packaging in wool
- Creating jobs for sheep
- A reasonable user will be able to knit their own clothes using nanopore packaging
- A lot of work being done to make a lyophilised reagent system
- Performance is comparable to conventional wet reagents
- 1D² version will be available shortly
Pipettes
- That leaves pipettes; pipettes have to go
- Need to encapsulate prep into a dry / warm container
- Reagents put into prototype tube
- There will come a point when you can put a swab in, and 15-20mins later start looking at reads
- Not quite there to show with date or product yet
Increasing sensitivity
- Can currently do well down to about 1ng
- can go even lower (with reduced yield)
Additional device with new name needed
- Dongle with FPGA and firmware
- Can basecall MinION sequences
- Encapsulated supercomputer
- Computer can be cheap and small
- Device will eventually be able to work on its own (with connected power supply)
- Should be out by the end of summer
- FPGA currently Intel Arria 10, runs at 10Mb per second
- Next version will be 7Mb per second
- Theme is portability: want to make things as easy as possible
Flongle
- Electronics moved onto dongle
- One-off buy, can be thought of as a mkII MinION
- Cheaper flow cell snaps into adapter
- Fewer channels, higher volume [DE: presumably production volume], cheaper price
- The substrate device… the MinION they should have made
- Very significant for ONT
- Should be out by the end of the year
- Previously had gel-gel interface
- Can now get metal/metal interface
- have prototype chips and membranes at the other end of the scale
- was actually a prototype used for the SmidgION
SmidgION
- Won’t be around this year, but it will come
- Chris has been able to run a MinION on a mobile phone
Hypothesis-driven sequencing
- Looking to develop this
- There are a lot of things that can be done with nanopores
- Blob counting: sequencing of site-specific things
- Let sequences whizz through the pore
- Blob does the counting
- Very rapid yes/no sequencing
- Getting to very rapid MinION sequencing
Solid-state nanopores
- Will be solid-state Flongle; will definitely [DE: also] be a blob counter
Cas9
- Can be programmed: a programmable blob
- Target certain fragments
- Use chemistry to select out molecules for sequencing
- Can use the same trick for detection as well
- Have pretty good working implementation of Cas9 target selection
- Multiplexing
- Very cost-effective for small region for thousands of samples
- “Cas Me If You Can”
- Custom Cas9 provided by ONT
- Also planning to make pre-made versions
Dates
- May
- MinKNOW 1D² 9.5
- 8th: new flow cell
- June 17th (Clive Brown’s birthday): raw basecaller
- June 30th: new cDNA kits
- July: new ambient shipping methods
- September: Cas Me If You Can
Epitome
- Make standard Bioinformatics workflows
- Platform is now pretty mature
Is Basecalling Necessary?
- Do you need to basecall and align?
- Have been teaching neural networks to look at raw signal to decide what species it is
- Want to go from squiggle
- There are cases that don’t require base calling
- A sequencer that doesn’t do base calling – Clive’s pet project for the rest of the year
Questions
- Gordon’s ambition is to put the Flongle in as a diagnostic device
- Consensus reading errors: just a matter of knocking down the obvious problems
- If ONT runs into non-software problems, just need to change the chemistry
Day 2 (May the Fifth)
Gordon Sanghera
Throughput
- Since shifting to the 9 series, there has been an exponential move to throughput [DE: it was exponential prior to that, but no one cared because it was exponentially low]
- A lot of issues will be application-specific
- 20 gigabases is the output of a 2-day flow cell, but ONT wants to move to 3 days
- Hope to get all customers to 15-20 gigabases
- No reason why the flow cells can’t reach their theoretical maximum
- Sensor chip is designed to handle 1000 bases per second
- Might be able to adjust sequencing speed in the future for application-specific uses
- Is there an underlying single molecule optical[?] limit to accuracy?
- Accuracy doesn’t seem to have a limit with nanopore
- 100% Accuracy is the target
- No fundamental reason why this is not achievable
- R9.5 is squeezing the last 10% out of the flow cells
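The theoretical maximum of a flow cell is simple arithmetic: channels times bases per second times run time. The Python sketch below uses the 1000 bases per second and 2-day figures from these notes; the 512-channel count and the occupancy parameter are assumptions that were not stated in the talk.

```python
def theoretical_yield_gb(channels, bases_per_second, hours, occupancy=1.0):
    """Upper bound on yield in gigabases.

    `occupancy` is the (hypothetical) fraction of time a channel spends
    actually sequencing; 1.0 means every channel sequences non-stop.
    """
    return channels * bases_per_second * hours * 3600 * occupancy / 1e9


# Assumed 512 channels, 1000 bases/s sensor limit, 48-hour run.
print(round(theoretical_yield_gb(512, 1000, 48), 1), "Gb at full occupancy")
print(round(theoretical_yield_gb(512, 1000, 48, occupancy=0.25), 1), "Gb at 25% occupancy")
```

Under these assumptions the ceiling is well above the 15-20 gigabases currently hoped for, so pore occupancy and sequencing speed both leave room to grow.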
Marketing
- The challenge is getting the technology to market
- In 2017, we might be ready for prime time
- We are answering biological questions that other platforms cannot solve
- Innovating quickly: it is in our DNA
- ONT cannot rest on their laurels
- We are disruptive innovators
Flongle
- This will be big
- Thinking about crossing the chasm into diagnostics [repeated multiple times]
- If you could, you would probably want to sequence the whole of HIV
- Flongle allows ONT to be applied
- Can be HIV or HepC
- Can delete what is not needed
- Will challenge PCR diagnostics, a tech that has been around for 25 years
- Next year, flongle will be available as a registered diagnostic device
Disruption
- The challenge is to disrupt the diagnostic market [etc….]
- DNA information will disrupt everybody
- [video about student education; students having the opportunity to use the MinION]
- Children will say, “You didn’t know that beef was actually Kobe beef, you just ate it?”
Nick Loman
- Setting expectation levels correctly for this talk: The answer is NO! [DE: referring to a statement about the suitability of the MinION in the title]
Dr Seuss
- People with young kids use kids books for inspiration
- The MinION has sequenced almost everywhere
- On a boat
- With a goat
- On a plane
- On a train
- Down some holes
- At the poles
- In outer space
- At quite a pace
- One notable place where it hasn’t sequenced; more about that later
2015
- Nick’s last talk at London Calling was in May 2015
- Very excited, because they had done successful sequencing in Guinea, with results back on a time scale of days
- One genome was sequenced per flow cell
- Generated sequence quickly enough to feed back to WHO
- By May, had sequenced 50% of Ebola cases in Guinea
- Really important findings came out of that ebola work
- Data needs to be shared as quickly and as ethically as is possible
- Could measure cross-border transmissions
- When a new case popped up, could identify if it was linked to other existing cases
Ebola progression
- It was originally thought that Ebola was over by September / October, but a number of additional flare-ups happened
- Used MinION to identify one case
- The seminal fluid of a survivor was infected
- The virus was “frozen in time”
- [Video / animation: 1600 Ebola Virus transmissions, representing 5% of all outbreaks, 30,000 cases, 10,000 deaths]
- 10.1038/nature22040
- Many opportunities to stop Ebola from moving
- MinION/NickSeq wasn’t deployed until April 2015
- Is there a case for getting sequencing done early?
Zika Project
- Introduced last year
- Unbelievable arrogance in the grant application: claimed that they would sequence 750 genomes
- For epidemiology, see Oli Pybus’ talk
Concentration issues
- Zika was almost impossible to sequence using normal procedures
- Doesn’t work with metagenomic techniques
- cT values for PCR of 34 to 36
- Pathogen enrichment challenges
- At the moment, 100ng of DNA is required for MinION sequencing
- Can be done by PCR, whole-genome amplification, or bait probes
- PCR-based approach represents a pragmatic, or easy way
PCR Primer design
- Josh spent a couple of months working on sequencing
- Developed “Primal Scheme”: an application to generate a multiplex panel for tiling sequencing
- Can now sequence Zika, with reasonably complete genomes up to a cT of 36
- Some amplicons drop out more than others
- Now have everything in place to do this outbreak stuff in real time
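The geometry of a tiling amplicon panel can be sketched as overlapping windows along the genome, with neighbouring amplicons assigned to alternating pools so that overlapping products are never amplified in the same reaction. This only lays out coordinates; it does not choose primers and is not the Primal Scheme algorithm. The function name, amplicon length and overlap are hypothetical.

```python
def tile_amplicons(genome_length, amplicon_length, overlap):
    """Return (start, end, pool) tuples covering the genome with overlapping tiles."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if not 0 <= overlap < amplicon_length:
        raise ValueError("overlap must be >= 0 and smaller than amplicon_length")
    step = amplicon_length - overlap
    amplicons = []
    start = 0
    index = 0
    while True:
        end = min(start + amplicon_length, genome_length)
        pool = 1 + index % 2  # alternate between pool 1 and pool 2
        amplicons.append((start, end, pool))
        if end == genome_length:
            break
        start += step
        index += 1
    return amplicons


# Toy example: 1,000-base genome, 400-base amplicons, 100-base overlap.
scheme = tile_amplicons(1000, 400, 100)
for start, end, pool in scheme:
    print(start, end, pool)
```

If one amplicon drops out, the gap it leaves is the region not shared with its two neighbours, which is why more overlap gives more robust coverage at the cost of more primer pairs.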
Yellow Fever
- Currently an outbreak in Brazil
- Process is much quicker this time
- Biggest hold-up was taking about a week for the university to process the primer order (actually not that bad)
- Josh and Nick didn’t have to go this time
- Can now enable colleagues to do this
- Could do six genomes per flow cell, 1-4 million reads
- Yellow fever has a low cT, so easier to sequence
- Now have easy-to-install workloads
Health Service Publicity
- Check out Nick above the escalators at Euston
Working with high-yield MinION flow cells
- Mix rapid kits together, pull out single contigs
- Yield improvements make this stuff very feasible
- Someone should sequence inside whales
- There is a mapping from whale weight to read length
- Still haven’t got a megabase read
- A flow cell is running outside, Nick has been checking during the conference
- Next improvement to be trialled: extracting E. coli from an agarose plug
- Have got read N50 up to 150kb
- Can get an entire E. coli genome in seven reads
- longest read so far is 886kb
- Could we trivially finish a genome?
- With a 30X dataset, could probably get human contig N50 to 50Mb
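[Sketch, not from the talk: the read and contig N50 figures quoted in these notes follow the usual definition, i.e. the length L such that sequences of length L or longer contain at least half of all bases. A minimal Python version, with made-up lengths, might look like this.]

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together hold at least half of the total bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths given")


# Hypothetical read lengths in bases (illustrative only, not data from the talk)
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

The same function applies to contig lengths from an assembly; only the input list changes.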
Wish list
- All of Nick’s wish list from 2015 is done, except for low-input
NHS – where sequencing is not done
- Sort it out and bring it to the UK
- Money is an issue
- Clinical microbiologists have been reluctant
- Physicians seem quite into it
- A lack of will; people just need to go and do it
Questions / Answers
- Nick hasn’t looked at why people are resistant
- Could do pathogen discovery and RNA at the same time
Lightning Talk – Niranjan Nagarajan
Human Gut microbiome
- Large ongoing project
- Seven questions
- Host/microbial factors
- Evolution of resistome
- Novel plasmids
- Strain dynamics
- Begin with a pilot project, using long reads for metagenomic assembly
- Then use real clinical samples
- The aim is to maintain the diversity of the community
- Many reads are longer than 5kb
- Carried out an analysis on a single patient
- Significant enrichment of K. pneumoniae with different abundances
- N50s of hundreds of kilobases
- Were able to identify an extended-spectrum beta-lactamase gene
- Observe plasmid dynamics and rearrangements in plasmids
Lightning Talk – Michael Boemo
DNA base analogues
- Use analogues to look at DNA replication
- Pulse of thymidine analogues, then can work out where the origins of replication are in the genome
- How well does it work?
- Analogues make quite a difference in the signal
- Working pretty well in synthetic substrates
- Don’t care about the exact base, just interested in the region
Lightning Talk – Celine Bigot
Free Pathogen Identification
- NGS enables the detection of microbial species without a-priori models
- MiSeq, MinION, PGM were all tested
- Working on own reference material, plus the Zymo community standard, plus other types of samples
- CNRGH standard has 10 bacteria, one fungal, and one other
- Standard is still in progress
- Interesting preliminary analysis
- Has done a WIMP analysis
- Based on processing time, the MinION and MiSeq are preferred to the PGM
Lightning Talk – Scott Gigante
Basecalling (Nanonet)
- Custom neural net basecaller; replaced with black box in the slides
- Wants to do away with Hidden Markov Models
- Used two E. coli libraries, one treated with methyltransferase, one unmethylated by PCR
- Used EventAlign for associating the basecall with events
- Training neural networks is not easy
- Feedback used with training, issues are often only realised weeks later
- Using a longer kmer results in a faster training loss, but lower error with increasing kmer size
- It is hard to judge when the training is complete
- GLM calls 99% of sites correctly
- Nanonet provides an excellent opportunity for training
Lightning Talk – Ben Matern
Identification of HLA Variants by cDNA sequencing
- Genes are highly polymorphic
- Interesting to look at the DNA and RNA
- RNA has alternative splicing
- Used GMap, which can insert introns (splicing-aware mapper)
- Has made a software platform, an alternative to laboratory techniques
- Aligned boundaries mapped to the reference genome
- Can take expression profiles and cluster into expression patterns
Lightning Talk – Benjamin Istace
Banana Genome
- One of the most sold / produced fruits around the world, 100 million tonnes
- Most edible species of banana are from two hybrids
- Sequenced and assembled two genomes, looking at one now
- Long fragments were selected with the BluePippin
- Ran four R9.4 runs, generating 22 gigabases of reads
- Peak [modal] read length was between 25-30kb
- Read correction with Canu, processed into SmartDenovo
- Produced a genome with 704 contigs, N50 of 1.85 Mb
Lightning Talk – Matthew McCabe
Sequencing of BRD viruses
- Using the rapid sequencing kit
- Over 20 viruses that are associated with disease
- Diagnosis at the moment is mostly by PCR
- Looking for a rapid, untargeted sequencing approach
- Can untargeted MinION do this?
- 4 cultures, each infected, pooled, and nuclease added to remove host DNA
- Non-excised nucleic acid goes into the RAD kit
- Got 7,000 reads in the pass folder
- BLAST top hit took 9 hours
- Local viral sequence took about 10s
- 99.6% of reads were correctly identified
Lightning Talk – Libby Snell (ONT)
Direct RNA Sequencing
- Kit is available now from the ONT store
- Sequencing of RNA molecule directly; it’s possible for things that maybe couldn’t have been done previously
- Not possible using any other technology
- Up and coming… ONT wants to get rid of cDNA; it’s only there while they’re stabilising the technology
- Ideally want to move to a 30-minute rapid preparation
- Ligate, wash, and load
- Potentially allows for a lower input amount
- Tested with quantitative samples: Ambion ERCC, Lexogen spike-in (SIRVs)
- Single-stranded prep works, and a 2D prep works (with RNA stabilised by cDNA)
- Getting full-length transcript coverage
- Currently output is about 70% of what it should be, probably due to blocking
Direct RNA Breakout – John Tyson
[DE: I missed this; I was distracted talking to David Stoddart about chimeric reads]
Direct RNA Breakout – Rachael Workman
Using C. elegans for direct RNA sequencing
- comparing direct RNA to cDNA
- looking at basic differences, what to expect
- Used strand switching and ligation sequencing
- No need for PCR when working with C. elegans, can get 30µg cDNA easily
- Called datasets with both older and newer basecallers
Yield
- Direct RNA had less yield (as expected)
- Direct RNA had shorter reads
- RNA transcripts were perfectly fine for C. elegans
- lower MAQ alignments with RNA
- Once MAQ filtered, match percentage was similar between RNA and cDNA
Comparing to Wormbase transcripts
- cDNA found most transcripts
- 1140 transcripts were detected in RNA, but not in cDNA
- Filtering out by base length didn’t throw out much
Full-length transcripts
- More degradation in the RNA run
- Looking at RNA alignments, get degradation in the UTR
- Might be aligner clipping
- Other end, more exon dropout, need to look at the soft clips
Abundance comparison
- RNA tracked cDNA fairly well (R=0.76)
- Would be interesting to look at an Illumina run for comparison
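[Sketch, not from the talk: an abundance comparison like this is usually summarised with a Pearson correlation over per-transcript counts. The log2(count + 1) transform below is an assumption; the notes do not record how the reported R value was calculated, and the counts are invented.]

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_counts(counts):
    """log2(count + 1), so that zero counts stay defined (assumed transform)."""
    return [math.log2(c + 1) for c in counts]


# Hypothetical per-transcript read counts (not data from the talk)
rna_counts = [0, 3, 15, 120, 900]
cdna_counts = [1, 7, 10, 300, 700]
print(round(pearson(log_counts(rna_counts), log_counts(cdna_counts)), 3))
```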
Homopolymer calling
- cDNA and RNA were pretty identical
- Initially compared without homopolymer correction
- Also compared with transducer basecalling
- [DE: With the transducer, the amount of C homopolymer bases exactly matched the amount of G bases]
Take-home messages
- Library preps were robust for both cDNA and RNA
- RNA will get better
- You could stop doing Illumina, switch to cDNA and be fine
Direct RNA Breakout – Chris Vollmers
Single-cell RNASeq
- Using nanopore technology to improve
- Human cell atlas is now picking up pace
- Looking at the types of every human cell
- Illumina is good at defining gene expression of single cells
- Most single cell library preps involve full-length cDNA as an intermediate step
B-cells
- Lab’s primary interest is in B-cells
- Would love to do direct RNA seq, but cells contain about 10 femtograms of RNA
- A couple of orders of magnitude less than what is needed
- Was waiting [a long time] for Illumina run results to get back
- Why not put cDNA into a MinION flow cell?
- Designed indexes by randomly generating 60bp sequences
- Trying to quantify gene expression, ended up breaking tools
- Looked at the overlap between Illumina and Nanopore expression
- Got fairly good correlation, r = 0.9 approx.
- Nanopore doesn’t seem to have a length bias
- Cutting is needed for short transcripts, but can’t be done
- [Possible that the inability to cut small RNA molecules means they are under-represented in Illumina reads]
- Used SIRV / synthetic spike-in molecules as a control
- Made 10fg of DNA, trivially easy and fairly quantitative
- Wanted to get by without reference genome annotation
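[Sketch, not from the talk: one way to generate random 60bp index sequences. The minimum pairwise Hamming distance filter, its threshold, and the function names are assumptions added for illustration; the notes only say the indexes were generated randomly.]

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def random_indexes(count, length=60, min_distance=20, seed=0):
    """Generate `count` random DNA indexes of `length` bases.

    A candidate is kept only if it differs from every index already
    chosen at `min_distance` or more positions (assumed filter).
    """
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < count:
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if all(hamming(candidate, other) >= min_distance for other in chosen):
            chosen.append(candidate)
    return chosen


for index in random_indexes(4):
    print(index)
```

Two random 60-mers differ at about 45 positions on average, so long random indexes are nearly always well separated, which helps when reads carry errors.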
Pipeline – MandalorION
- Only wrote this because it wasn’t available elsewhere
- Picked up stuff [transcript isoforms] that wasn’t in the reference sequence
- Sort reads based on the distribution, create [model] isoforms
- Get approximately linear correlation (r=0.97) when comparing reference-free versus reference-based mapping
- CD20 gene – found an exon that wasn’t in the reference assembly
- Looked at assembly by Trinity, found assemblies that were just really bad
- Working on systematic way for naming isoforms [odd that this doesn’t exist already]
Direct RNA Breakout – Andrew Smith
Direct sequencing of 16s rRNA
- Very abundant in cells
- Getting to 16s RNA levels from gDNA would need 13-14 rounds of PCR
- If we could get a hold of 16s, would be able to do a lot of good things
- Needed to customise the ONT RNA kit a bit
- Kit adapter was modified to be complementary to the 3’ end of 16s
- This is a very well-conserved gene at the 3’ end
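[Sketch, not from the talk: the "13-14 rounds of PCR" figure can be turned into a fold difference if every cycle is assumed to double the template perfectly. That idealisation, and the `efficiency` parameter, are assumptions.]

```python
def fold_amplification(cycles, efficiency=1.0):
    """Fold increase after `cycles` rounds of PCR.

    efficiency=1.0 means perfect doubling every cycle (an idealisation).
    """
    return (1 + efficiency) ** cycles


for cycles in (13, 14):
    print(cycles, "cycles:", fold_amplification(cycles), "fold")
```

Under that assumption, 13 to 14 cycles corresponds to roughly 8,000 to 16,000 times more 16s rRNA than gene copies in the gDNA.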
Proof of principle
- Working with highly-purified 16s
- 1.5kb rRNA gene
- Proof of concept carried out first on E. coli
- For methanococcus sequencing, needed to tweak the adapter a little bit
- With increasing read length, it becomes very much easier to classify things
- Reads that were over 1000 bases had 98% classification accuracy
Coverage hump
- Hump at a particular region
- Very consistent base miscall
- Compared MRE600 E. coli with RsmG-delta
- Base call was different
- Nanoraw showed a modified ribonucleotide in the 16s gene
- This is a mechanism by which some microbes can defend against other microbes
- Wanted to show that this could be detected at other regions
- Pronounced low-amplitude signal from the modified strain
- Also found pseudouridine modification
- Very exciting to be able to detect this
- Ion current change affects current at adjacent regions
- Probably due to 5-6bp nucleotide window from which the current is sampled
Diagnostic tool
- Titration with lower and lower amounts of 16s
- Could still see reads even at 5pg
- Depending on the amount added, could find result in 20s
- Basically the earliest opportunity for a read sequence to appear was when the first 16s reads came through
Direct RNA Breakout – General Discussion and Questions
- Should we stop calling RNASeq [the stuff done on other systems] RNASeq?
- We are fighting an uphill battle
- The community will eventually come to a consensus sequence
- Neither GMap nor BLAT use a [transcript] reference; they don’t rely on genome annotations
- Just doing cDNA is losing a lot of information
- Read-until could get rid of ribosomal sequences
- Direct RNA with a polyA primer is the only thing that can get the correct length of the polyA sequence
- No need to do VT23, can just use T23
- Can tell mispriming events when they happen
- [DE: Why not use a polyT primer with a double-stranded component?]
- SIRVs polyA tail is [always] 30bp
- mRNA only makes up 15% of total RNA sequences
- There is lots of rRNA
- Also a whole number of small RNA molecules
Plenary Panel – Raymond Hulzink, KeyGene
Background
- Wet lab biologist
- Started in 2014 with the MinION
- In 2016, started doing serious MinION sequencing
- In 2017, started sequencing the melon genome, and some BAC clones
- Recently obtained the PromethION, have done a lambda trial
- Will use VolTRAX for plant genomic DNA
Plant / crop genomes
- Large, usually polyploid
- Long reads should resolve the genome complexity
- As read length is increased, contig length increases
- Plant cells are challenging
- Very rigid cell wall, a bit in contradiction to what needs to be done
- Have lots of metabolites, can damage DNA or reduce the efficiency of sequencing
- Have chloroplasts and mitochondria
- DNA isolation has no generic protocol; every plant is different
- Understanding DNA quality is important
- Can degrade over time
- Comparison of fresh sample vs. ultra-pure sample, no difference in high-quality reads regardless of the isolation method or storage
- QC changes when looking at reads longer than 50kb
Melon sequencing
- Decided to use DNA that was less than one month old
- Other crops may need purification
- Compared 75% of the shortest reads to 75% of the longest reads
- Longer reads have contigs over twice the length of the shortest reads
Generic workflow
- Isolation of nuclei
- Want to hunt for ultra-long reads
- Use agarose-embedded nuclei
- Needs a lot of work in the lab to get ultra-pure DNA
- If the fragment length is greater than 25kb, typically end up with read lengths of greater than 15kb
- New albacore improves the read quality
- Expect to generate platinum assemblies from plants
Plenary Panel – Ivo Gut
- This sequencing stuff is a bit of a clown act; Ivo prefers to play guitar
- Doing de-novo sequencing on the hummingbird and houbara bustard
ONT History
- ONT has actually been around since 2008, and collaborated with Ivo Gut at that time
- Ivo got a ball of dirt as a prize
- Ivo helped in the development of structures to do de-novo assemblies
- Fosmid-pooled sequencing for a complex genome
- Used to take a whole lot of complicated preps and bring it all together
- Required some complicated manoeuvring and sequencing
- Has gone on to throw every technology at it
- As time went on, Ivo got more and more confident about nanopore
- At some time, just took DNA and put it into nanopores
Houbara bustard
- Endangered species: lives in the arid desert and is flightless
- Hunted using other birds (which prefer airplane travel)
Hummingbird
- Part of the genome 10k project, which turned into the vertebrate genome project
- Has been sequenced using everything except nanopores
- Got a lot of help from ONT for sequencing this bird
- Generated about 20X coverage
- Taking only Nanopore reads, got an N50 of 2.7Mb
- Compared to PacBio, which was 5.37Mb N50 (but about 3 times more data)
- Used a Canu assembly combined with Pilon
Houbara bustard
- Did phenol chloroform extraction
- 10X coverage by nanopore, together with Illumina sequencing
- Got scaffold N50 of 5Mb
- Last night, an additional 3 flow cells were mapped on top
- N50 of 8Mb, with 181 contigs of over 1Mb
The hummingbird now
- Re-extracted DNA, got tissue shipped to Simon Mayes
- As a pilot, Simon got some duck tissue for sequencing
- Very optimistic that it can be done with long reads
- With the MinION, there’s no more difference between scaffolds and contigs
Plenary Panel – Christiaan Henkel
Genomes that have not been previously sequenced
- Many amphibians and plants with extremely large genomes
- True long-read sequencing
- If 1Mb is the height of the Shard, then the genome is the distance to the sun and back
- Get unique bits in a sea of repetitive content
- Lightweight assembly
- Assume perfect long reads
- Just compare the ends, then assemble the genome
- Doesn’t work for repetitive regions at the end
- So… find unique regions
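[Sketch, not from the talk: a toy version of the "just compare the ends" idea, assuming error-free reads. It searches for the last few bases of one read inside another, then confirms that the whole suffix matches the other read's prefix. This is not the speaker's implementation; the function names and `seed_len` are made up for illustration.]

```python
def end_overlap(a, b, seed_len=8):
    """Length of the overlap where the end of `a` matches the start of `b`.

    Only the last `seed_len` bases of `a` are searched for in `b`;
    each hit is then confirmed over the full overlap. Returns 0 if none.
    """
    seed = a[-seed_len:]
    pos = b.find(seed)
    while pos != -1:
        overlap = pos + seed_len
        if overlap <= len(a) and a[-overlap:] == b[:overlap]:
            return overlap
        pos = b.find(seed, pos + 1)
    return 0


def chain(reads, seed_len=8):
    """Greedily extend the first read to the right using end overlaps."""
    contig = reads[0]
    remaining = list(reads[1:])
    extended = True
    while extended:
        extended = False
        for read in remaining:
            overlap = end_overlap(contig, read, seed_len)
            if 0 < overlap < len(read):
                contig += read[overlap:]  # append only the new bases
                remaining.remove(read)
                extended = True
                break
    return contig


# Hypothetical toy genome and perfect reads (illustrative only)
genome = "ACGTTGCAAGGCTTACCGATAGGCTAACGTTAGC"
reads = [genome[0:16], genome[6:26], genome[16:34]]
print(chain(reads) == genome)
```

If the seed at the end of a read falls in a repeat, it can match in the wrong place, which is why unique regions are needed at the ends.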
Sequencing the eel
- Want to know why they are critically endangered
- Did an assembly: 860Mbp assembly, 2366 scaffolds with 1.2Mbp N50
- Corrected with Pilon
- Tried to compare against an older Illumina-based draft
- Lots of tiny scaffolds unmapped
- Some discontinuities, nanopore data almost always compares favourably
Tulip assembly with Tulip
- Library assembly method called Tulip
- Tulip production is easy, but tulip breeding is not
- Reading a single chromosome would take a month on nanopore [serially-sequenced]
- It is hoped that the PromethION will be good enough
- Pilot with one MinION flow cell for sequencing
- Surprisingly, the tulip genome is not very repetitive at all
- Assembly should work well with the lightweight method
Plenary Panel Questions
- Pilon polishing is a stop-gap measure
- Using other technologies (e.g. 10X, dovetail)… looking at that, but Ivo prefers using as few technologies as possible
- Data analysis – No best practice for de-novo assembly
- A few very smart people out there looking at different things
Kazuharu Arakawa
Starting with the very basics
- needed to do sequencing, so was investigating Nanopore
- Looked at bacterial draft genome
- Used MinION reads to finish genomes
- “how low can you go?”
- In under 1hr, was able to complete a full genome with one 4.4Mb circular chromosome and one 44kb plasmid
Spider silk
- 4 times stronger than steel
- in nature, about 10µm in diameter
- More elastic than nylon
- it’s weird to have both high strength and high elasticity
- has the highest “toughness” [combination of strength and elasticity] among all fibres
- Made from proteins, so it’s re-usable
- If spinning a web with 1cm-diameter silk, it would be possible to catch a plane
- If that same web were made of steel, the plane would crash and break into pieces
- If that same web were made of rubber, the plane would break through
- Did an experiment comparing spider silk to carbon fibre and polyester
- Spider silk was able to handle more weight than both of them
- Carbon fibre broke fairly quickly
- Polyester stretched too much and broke
Spiber
- Company set up by former institute members
- Taking spider genes and putting them into bacteria
- Creating clothes like the Moon Parka
- Can be made into sheets, gels, and other non-thread things
Silk diversity
- Spiders make a whole lot of different spider silks
- All silk genes are monophyletic (from the same original gene)
- With enough data about the genetics and properties, can work out how amino acid changes affect the properties
- Will sequence 1,000 spiders (for funding purposes)
- Has already done a lot of those [? over 900]
- Spider genomes are quite long
- Don’t really know what components are in the genes
- One gene has periodic polyalanines
- Can only find 10 spider silk genes in GenBank
MinION categorisation of genes
- Can’t do PCR because of repeats
- Next time, got a better yield
- Did whole-genome sequencing at 2-3X coverage, got N50 of about 10kb
- The challenge is to find entire components of spider silk
- Protein fractionation yielded 3 bands that were very specific to drag-line silk
- Did gland-specific transcriptome
- Looked at top 50 most expressed genes
- Novel genes were found that corresponded to the gland in the spider
- But first, they needed to clone the genes
- Tried cloning, sorted out and cloned two genes
- The other two genes were actually different components of the same gene
- sequenced 2.4kb
- use MinION to finish it
- no conserved N-terminal sequence
Questions
- Only saw repeat elements within spider silk genes
Zoe
- ONT can probably fit another 200 seats in the venue for next year
Gordon Sanghera
- Get back to your labs and get innovating