GridION X5 - The Sequel
A presentation by Clive Brown, CTO Oxford Nanopore Technologies; notes by David Eccles, 2017-Mar-15. See the presentation here.
Preface
- Some people watching this in Australia 3am [and David Eccles in New Zealand at 4am]
- This is Clive’s once-yearly update, given shortly before the main conference in May. A few things will be saved for the meeting in London
- Company mission: enable anybody to sequence anything anywhere
- Publications / elsewhere demonstrate that people are already attempting to use the MinION technology in lots of places
- The technology is designed to be used anywhere, with very simple workflows.
- Main device (MinION) is portable, with minimal capital cost
- System is effectively real-time, or very close to real time
- System is demand driven – can be used, put down and taken up again many times
- Read lengths are intrinsically long, limited only by sample prep
- Accuracy is pretty good, and improving
- Sequencer is good at cDNA; this aspect has probably been undersold
- This talk doesn’t cover everything available. Clive Brown has given a few other previous previous talks
- The last google hangout at end of March [2016] is a good place to start
How nanopore sequencing works
- proten pore embedded in membrane
- array of membranes embedded in electrical sensor
- speed at which sensors measures things is orders of magnitude faster than CCDs
- after a second, have 1000b read, entirely available for analysis as the run proceeds
- can pipeline bioinformatics pipelines based on sequencing
- can make devices work together to a shared goal; methods continue
- a piece of DNA will give you a reproducable signal that can be decoded into bases
- currently running R9.4 on mem 10, motor E8 (phenomenally processive helicase)
decoding signals
- originally done by HMM
- explosion in past 2 years around neural networks
- now good experience decoding signals using NNs
- many methods in signal processing area that could be used
- current methods can learn local context over a window
MinION device
- At least ~4,000 MinIONs out now
- Aluminium device
- Inject liquid sample into system, pores devour sample in real time
- 512 channels running at once, can get at 450b/s/channel v. high throughput
- Mark Ib – no significant improvements planned in MinION
Technology workflow
- Clive doesn’t like slide: implies linear workflow
- Need DNA
- Variety of kits; simplest/snappiest uses transposome complex. On long DNA, can add in adapters; 5-10 mins
- Lengthiest preps (ligation) take about an hour; working to smooth out variation in sample prep
- Can take sample out, can flush sample out, can put more in, can muck with sample while it’s in the system
- While running, run NN basecaller
- Calling can be done on the fly, or post-run
- Programmable feedback loop; feature to come back with a vengeance in future
PromethION; elephant in the room
- Cake-tin sized benchtop sequencer
- Different ASIC, a few thousand channels, in aggregate 144,000 channels
- Laid out to be pipette friendly
- Can put sample in flow cell offline and while running
- Bottom is compute module, can write data out at full-pelt to external storage
- Box was designed when running at 30b/s, probably can’t handle a full run for real-time basecalling
- Running at full-pelt can produce a very large 233Gb per flow cell in 48h, 11Tb assuming 100% bandwidth utilisation, 3x a NovaSeq; right up there in that category of high-througphut sequencers
- No chance of falling behind; not going out of date before the box is delivered
On-demand sequencing on PromethION
- All kinds of workflow tricks to optimise pipelines
- No need to wait for sample, can run 3 samples, 1 sample, 48, or multiplex
- Can deal with lumpy demand
- Turn-around time basically limited by postage
- Can be shipping data back while running
- If not enough slots, can just buy more sequencers
- When fully deployed, will provide significant competitive advantage
PromethION Flow cell performance
- A few bad channels, but mostly green
- Numbers are high, yield numbers high, above the threshold for shipping
PromethION scaling
- Not novices for scaling; Gordon worked on Glucose blood strips
- Working to produce more flow cells
PromethION performance and yield
- Key problem to do with flow cell blocking
- Promethion typically 10Gb in 6 hours, aim >50Gb per flow cell
- Firmware updates for higher
- Can probably run for up to 4 days
- Software mature, run in house all the time, evolution of MinION and GridION software
- Control in a similar way to MinION, in paralle
Instrument shipping
- 1-2 per week, a bit slower than expected
- expect all backorder done by Q3 (original prediction was Q3 2016)
- Putting in software that lets ONT do remote firmware updates
- Ability to swap out hardware/software very quickly
PromethION Flow cells
- First shipping 3rd April to 12 sites
- a little bit of hand-holding, will ramp up rate of flow cell shipping after that
- Haven’t had a single dropout from waiting list
- target headroom performance is so high that it will not go out of date
PromethION design change
- Compute module will not be able to keep up with 1,000 bases per second
- Need a bigger box, getting too tall
- Decided to move all of the computing into a separate box
- Compute module becomes a switch that lets us stream data to a compute room
- People who want to run multiple promethIONs can cable up and have processing elsewhere
- Most people put PromethIONs on UPSs anyway, might as well put a compute module there as well
- Can add in up to 80 TFLOPS of computing in compute module; can handle 1,000 b/s on a fully-running sequencer
- Will map consensus callers and assemblers in box
- Can use idle computing for local bioinformatics; e.g. containerised workflows
- Network requirements much simpler for a single compute box
- To be rolled out; upgrades included in the cost of purchase
What’s inside the compute box
- Cramming in FPGAs on PCIe cards, easily 60Mb/s, 80 TFLOPs
- Allows to fully keep up with real-time basecalling on PromethION
- Can also implement own proper version of things like read-until
PromethION evolution
- Currently 24 flow cells enabled, can dump data to external disk
- Q4 add compute module to enable all 48 flow cells with local basecalling
- 2018 additional evolution, rolled into current purchasing
Base call acceleration
- NNs are fascinating things
- Engineers have been able to produce versions that run pretty efficiently on FPGAs
- Tend to get higher uplift than CPU/GPU; power cost is lower
- Possible to use stripped-down, mini FPGAs for smaller devices
Base call acceleration design
- Typically 1.8 events per base, very high requirements
- MinION at full output needs 200 GFLOPs
- writing OpenCL versions of base callers, unlikely to be as optimal as VHDL
Base call benchmarking
- Working closely with Intel
- MinION about 240kb/s
- Promethion about 65,000 kb/s
- Can only utilise around 10% of available CPU
- on GPUs, seem to only be able to use about 2% of GPUs; not a good performance payoff
- on i7-type CPUs, can process 200,000 bases/s, still only using ~10%
- on Intel Arria / Stratix, getting to 1M-7.5M called bases per second, using 60% of available processing power
- pretty confident that this is the way to go
PromethION base call implementation
- current generation 9Mb/s, almost a Tb per day currently
- second generation of cards up to 4Tb of real-time calling per day
- looked at accelerator for MinION… but you can do that
- coding in the background a dongle, either separate or intercalated between MinION and computer that will do local basecalling, and stream out reads to the computer
MinION compute requirements
- Cloud base calling currently - Clive’s fault that ONT did that
- If you provide a safety net, people start to use it as a hammock
- Cloud base calling will be discontinued at 21st March
- MinKNOW now has integrated basecaller which will do it for you
- A good high-performance laptop will be fine
- Can just leave computer running, will keep basecalling after finish of run
- Provide binary base caller
- Writing to shared drive, another computer can do base calling
- Most people getting 3-10Gb
- Theoretical maximum 200kb/s, best internal about 100kb/s
- laptop CPU can do about 40kb/s base calling
- MinKNOW can deal with this
Accuracy / chemistry / algorithm
- On 1D, R9.4 base calling modal accuracy of just over 90%, maybe a bit higher
- old 2D system at 250b/s
Consensus accuracy
- Basic message: accuracy improving, data amenable to polishing
- Most errors now falling in / adjacent to homopolymers
- Will think about releasing optimised consensus callers in a reasonable timeframe
Homopolymers
- ONT’s unfinished business is held up by detractors; fixed by ONT, then detractors move onto the next unfinished business
- Scrappie package, learnings are migrating into MinKNOW base calling
Novel base calling
- Working from raw data (more about that later)
Homopolymer
- Recently held up by competitor as a systematic flaw; just another obstacle to overcome (e.g. black knight in Monty Python)
- Scrappie doing fairly well, consensus calling of homopolymers can be done using scrappy output
- not as mature as other callers, but methods will only improve
Base calling from raw signal
- Clive hates event calling, has wanted to get rid of it
- Current base callers just take raw signal, output base calls
- Neural networks trained to optimally extract features from raw data, architecturally / conceptually better
- Accuracy improves, scales to faster sequencing speeds
- Can go straight from raw to FASTQ / SAM, recover about 80% of disk space
- left with compressible integers in FAST5
- Developer versions released Easter
- base calling should just get better and better
Base caller landscape
- Clive likes open development
- Albacore is the production basecaller; can be run offline; fully-supported
- Nanonet is a research base caller, available under open source, not supported
- Scrappie is the New Kid on Block, limited support, available to everyone shortly
- Standard workflow is MinKNOW + onward analysis
- Can intercalate other basecallers; preferred by power users
Other accessory tools
- Lamprey; file-watching wrapper for open-source base callers. Does what old cloud program did
- If writing to shared drive, or external compute, can use Lamprey for local-cloud-type thing
- A middle ground between basic and power users
Throughput
- Clive took his own blood (with help), sequenced it himself, gets 20G per flow cell
- Other customers typically between 3-11G; would be nice to get everyone up to 20-30Gb
- This still only represents about 20-30% of what is possible
- Lots of reasons, many to do with extraction; people not knowing how much DNA they’re putting in
- Trying to reach into upstream workflow; focusing now on good sample prep
Software improvements to throughput
- DNA complex would occasionally wedge on top of pore; takes a few seconds, once in, won’t come out
- If caught quick enough, can do a bit of read-until and flick complex back out of pore
- This is no longer the limiting factor
Read length
- Read length = fragment length
- If a pore is presented with megabase sequences, it will produce megabase reads
- Other systems will fail due to photodamage
- If you can figure out how to get molecules into the system, can produce reads
- Josh Quick / Nick Loman managed long reads (>750kb), largely by avoiding pipetting
- have accomplished N50s of 60/70kb
- Probably no limit; limit is what can be put in. Clive expects 7Mb sequence should be able to be done
- Some nutcases at ONT think you can do whole chromosomes
- Need to take what is being learned and make it easier for everybody
Upcoming improvements for throughput and sensitivity
- MinKNOW upgrade, improved unblocking
- Lifetime of flow cell improved by 50%, yield per flow cell and cost per base goes up concommitantly
- Working on releasing an official read-until
- Working group looking on samples people are looking at, looking at best library prep to give best output
Improving / replacing 2D
- Introduced in NY meeting; phasing out 2D sequencing, replaced with 1D^2
- 2D has always been a problem; strands covalently joined with hairpin
- Accuracy plots have quite a different accuracy for template/complement
- Bad structural effects that bugger-up the basecalled signal
- can’t get speed above 250b/s
1D^2
- 1D prep
- As template strand is drawn in, other end gets closer and closer to pore
- Finish first molecule, other molecule is sitting nearby on membrane
- If second molecule hangs around long enough, it will be processed by the pore
- This occurs naturally about 1% of the time, no joining between template/complement
- Trick is to make the second molecule hang around longer
1D^2 consensus
- Accuracy much sharper
- 2 strands look like and behave like individual molecules
Traces
- Open pore current; drop as molecule goes into pore
- When first molecule is traversed, current goes back up to open pore
- Second molecule is complement of DNA
- With some trickery, can make second molecule hang around for longer
- 60% of data comes in template/complement pairs; expect that ONT can get that higher
- Can get very high 1D^2 yields at 450b/s
1D^2 accuracy
- Modal accuracy of 97/98%, a proportion are above 99%
- Long stretches of perfect data
- Algorithm is not fully optimal
- Would like modal accuracy to 99%
- Base-caller was not Scrappie, so at least has homopolymer issues
- Will need to change to R9.5 pore; better at capturing second signal
- Can still generate 1D reads
- consensus calling know what pairs are, helps polishing
- metagenomics may be more important to look at single molecules
- expect this will be forwardly-compatible with 1000 bases/s
1D^2 release
- To developers 27th March, developer kit + base caller
- general release to community 3rd May
- 2D kits discontinued on 5th May
New product; MinION well established
- Over 4,000; just started pushing into China, India, Japan
- workflow getting better, work to do on input material
- Aim to make more runnable
- MinION not licensed for service sequencing; makes sense to Clive & Spike, but not anybody else
- MinION is your personal sequencer
Offerings
- Huge performance gap between MinION and PromethION; bookending the space
- PromethION might be too large (only 10/20 HiSeq 10)
GridION
- Will make GridION X5 available
- the sequel, because it follows on from both MinION / GridION
What is GridION
- Original system that was proposed by Clive / ONT
- Designed around loading membranes in the lab; tore up design in 2011 to change to loading at ONT
- Concept is sound: large arrayable computers that can work together or individually on samples
- For a long time, GridION wasn’t taken off website
GridION X5
- Bench-top format, a big MinION
- 5 individually-addressable flow cells
- Inside, taken PromethION developments and shrunk down
- FPGAs inside, real-time base calling for up to 1000b/s for 5 flow cells
- Everything is in the box
- Allows for small group-level or service sequencing
GridION Production
- All mature, in build
- Very highly-manufacturable design
- In the zone
GridION Pricing
- Two ways to buy: capital loaded, consumable loaded
- Capital commitment of $125k, flow cells $300
- Licensed for use as fee for service
- At 10G per flow cell, $30 per Gb; at 20G/flow cell $15/Gb
- Capital-free model $475 per flow cell with support fee
- $47 per Gb at 10Gb per flow cell, $24 per Gb at 20Gb per flow cell
Also Nanopore service certified
- Institute-wide service
- will run training and QC certification process; contact support
- enables you and customers to know that samples can be processed
Summary tables for 3 products
- Should be product info on website
Shop opens for GridION next week
- Expect to ship devices in May at the latest
Dates
- March
- 1D^2 with developers
- Lamprey developer release
- Cloud base calling discontinued on 21st
- April
- Transducer (MinKNOW HP fix) on 20th
- May
- Broader 1D^2 release (3rd May)
- GridION flow cells 15th
- 2D gone on the 5th
Lots of things Clive hasn’t spoken about
- A lot covered in more detail in London
- Devlopment on targeted CAS9
- Massively improving array sensitivity
- CliveOME (replaced, with ultra-long reads)
- wants to see about enriching immunoglobulin regions
- wants centromere-spanning reads
- Zumbador looking really exciting
- has to become from a drop of blood to a genome
- Flongle / SmidgION
- Metrichor / Epi2Me; becoming completely separate from Nanopore
Questions
- Devices for service: yes, but not MiNION
- System suitable for 16s
- Error rate improving
- Guaranteed performance metrics: certification scheme will probably include this; should talk to Louisa
- Scrappie avialable for developer licenses, some in MinKNOW
- PromethION compute box can be made rack-mountable (part of roadmap, but need to see demand first)
- Switch box / old compute module can have different output options
- Optiomal temperature for sequencing - temp rises, sequencer runs quickly
- probably no problem at 50 degrees, not sure about colder
- Lifetime for flow cells * up to 4 days
- How much DNA is needed to make 20Gb?
- answer will be forthcoming
- MinION is available now in China
- Preparation of 1Mb DNA: look at Nick Loman’s blog; just be gentle
- MinION on 3rd April - can draw down latest pore or not
- same base-callers work on both pores; just rate of capture
- Consumable & Capital models are both available in parallel
- No need to implement PromethION technology on MinION
- expect to improve throughput to 1000 bases/s
- in principle 40Gb/day (60Gb/day theoretical max)
- complement distorts DNA signal, probably by stretching
- segmentation - no methods to deal with this yet, but nothing scary
- Plan for clinical MinION use? probably
- How close to routine culture-free WGS? Zumbador problem; probably pretty close
- can pop cells, tagment
- problem is sensitivity
- in principle, if they can get to the membrane, will see a lot of them
- trying to improve array sensitivity to deal with low inputs
- drop of blood / spit into tube, wait 10 mins
- Conversion of ONT terminology
- Clive used Molarity when he did molecular biology
- significant stumbling block for people
- PromethION service fee - chat with ONT service team
- PromethION based on R9.5 pores? No one has run PromethION/GridION in field
- they will only ever be 1D or 1D squared
- MinION accuracy for short fragments
- pore accuracy is independent of length
- aiming to remove capture advantage for short reads