Section 1: Introduction
Section 2: Sample-By-Sample Analysis
Section 3: Batch Analysis
Section 1: Introduction
Inference of CRISPR Editing (ICE) is a free online tool that provides an easy, quantitative assessment of genome editing. The software compares the Sanger sequence traces of amplicons generated from genomic DNA isolated from both the edited and the unedited (control) pools of cells to estimate the percentage of genomes modified by insertions and deletions (indels). Ideally, the desired indel should be located ~200 bp downstream of the sequencing primer. For a more complete guide to ICE analysis, including detailed explanations and examples, see Hsiau et al., bioRxiv (2018).
The current version (1.0) of ICE can analyze indels that result from a single CRISPR-Cas9 double-stranded DNA break made by SpCas9. Requests for other nucleases, features, or functionality should be sent to firstname.lastname@example.org. We welcome your feedback.
ICE offers two analysis formats: Sample-by-Sample analysis and Batch analysis. Sample-by-Sample analysis is intended for comparing up to five editing experiments at a time; Batch analysis is intended for more than five. Step-by-step instructions for both types of analysis follow.
Section 2: Sample-By-Sample Analysis
To run a Sample-By-Sample ICE analysis, follow these steps:
- Go to https://ice.synthego.com
- Click on the “Sample by Sample Upload” tab.
- Add AB1 files by dragging them into the upload space, or by clicking “browse your files” and selecting them. If the file type is correct, the upload turns green; if not, it turns red.
Add the following information:
- Control AB1 File (left): upload the control Sanger sequence file (.ab1 format).
- Edit AB1 File (right): upload the experimental Sanger sequence file (.ab1 format).
- Guide Target Sequence: enter the 17-23 nucleotide sequence of the DNA-targeting region of the guide RNA, excluding the PAM. It can be given as either a DNA or an RNA sequence.
- Label: a unique sample name is generated automatically from your Edit file name. You can change it to any other unique name (255-character limit).
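If an upload turns red, it can help to check the file locally. The Python sketch below relies only on the published ABIF layout, in which an .ab1 trace file begins with the four bytes `ABIF`; the function name `looks_like_ab1` is hypothetical, and this is not the check the ICE website itself performs.

```python
from pathlib import Path

ABIF_MAGIC = b"ABIF"  # ABIF (.ab1) trace files start with these four bytes


def looks_like_ab1(path: str) -> bool:
    """Return True if the file has an .ab1 extension and starts with the ABIF signature.

    Hypothetical local pre-check; ICE's own upload validation may differ.
    """
    p = Path(path)
    if p.suffix.lower() != ".ab1":
        return False
    with p.open("rb") as fh:
        return fh.read(4) == ABIF_MAGIC
```

For example, `looks_like_ab1("CEL_Negative;CEL_R2.ab1")` should be True for an intact example file; a renamed text file fails the signature check even if its extension is .ab1.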
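The Guide Target Sequence rules above (17-23 nt, DNA or RNA, PAM excluded) can be pre-checked before filling in the form. This is a minimal Python sketch; `normalize_guide` is a hypothetical helper, not part of ICE, and it converts RNA input (U) to DNA (T) only so that both forms compare equally.

```python
VALID_BASES = set("ACGTU")


def normalize_guide(guide: str) -> str:
    """Validate a guide target sequence and return it as uppercase DNA.

    Hypothetical local check mirroring the form rules: 17-23 nt of A/C/G/T/U.
    It cannot detect whether a PAM was accidentally included.
    """
    seq = guide.strip().upper()
    if not set(seq) <= VALID_BASES:
        raise ValueError(f"unexpected characters in guide: {guide!r}")
    if not 17 <= len(seq) <= 23:
        raise ValueError(f"guide must be 17-23 nt, got {len(seq)}")
    return seq.replace("U", "T")  # RNA input becomes the equivalent DNA sequence


print(normalize_guide("AACCAGUUGCAGGCGCCCCA"))
```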
Download the example files (lower left of the page) and upload or enter the following information in the Sample-By-Sample form:
Control AB1 File (left): CEL_Negative;CEL_R2.ab1
Edit AB1 File (right): CEL_modcrispr_1_A;CEL_R2.ab1
Guide Target Sequence: AACCAGTTGCAGGCGCCCCA
- Click “Add to Analysis”. The files will appear in the “Your Experiment” table, a running summary of all your uploads.
Samples can be added one at a time, up to 700 individual analyses. To add another sample, fill in the form again with that sample's information as described above. Each new sample is added as a new row (Test 1, Test 2, etc.) in the table.
Note: If you are analyzing more than five editing outcomes at a time, we recommend using “Batch Analysis”.
- To run the analysis and review the results, click “Analyze Experiment”. For an explanation of the results, see the section “Overview of Editing Analysis by ICE” below.
Section 3: Batch Analysis
To run a Batch ICE analysis, follow these steps:
- Go to https://ice.synthego.com
- Click on the “Batch Upload” tab.
- Add the ZIP and Excel files by dragging them into the upload space, or by clicking “browse your files” and selecting them. If the file type is correct, the upload turns green; if not, it turns red.
Add the following information:
AB1 Files (left): ZIP archive containing AB1 files
Add a single .zip file containing the Sanger sequence files (.ab1 format) for both the experimental samples and the negative (wild-type) controls. More than one control sequence can be used. Up to 700 samples can be included in the .zip file and analyzed at once (file size limit: 225 MB).
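If your traces are in one folder, a flat archive can be built with Python's standard `zipfile` module. This sketch assumes that the archive should hold the .ab1 files at its top level, with names exactly as they will appear in the definition file, and it reads "225 MB" as 225 × 1024 × 1024 bytes; both are assumptions, and `build_batch_zip` is a hypothetical name. The 700-sample limit applies to samples (rows of the definition file), not to files, so it is not checked here.

```python
import zipfile
from pathlib import Path

# Assumption: "225 MB" interpreted as binary megabytes.
MAX_ZIP_BYTES = 225 * 1024 * 1024


def build_batch_zip(folder: str, zip_path: str) -> int:
    """Zip every .ab1 file in `folder` (non-recursive) and return how many were added."""
    ab1_files = sorted(p for p in Path(folder).iterdir()
                       if p.is_file() and p.suffix.lower() == ".ab1")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in ab1_files:
            # Flat archive: store only the file name so it matches the definition file.
            zf.write(p, arcname=p.name)
    size = Path(zip_path).stat().st_size
    if size > MAX_ZIP_BYTES:
        raise ValueError(f"zip is {size} bytes, over the 225 MB limit")
    return len(ab1_files)
```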
Definition File (right): Excel file with definitions of AB1 files
A single Microsoft Excel file (.xlsx format only) that lists, for each sample, a unique label, the control file name, the experimental file name, and the guide sequence. Example and template files (template_definitions.xlsx) can be downloaded from the ICE webpage. See the example below:
Note: Please follow these instructions for the Excel file upload:
- Do not modify the headers in template_definitions.xlsx.
- The Label column gives each sample a unique name (255-character limit).
- The Control Files column should contain the name of the .ab1 file containing the Sanger sequence for each negative control (e.g. CEL_Negative;CEL_R2.ab1). This file must be included in the zip file.
- The Experiment Files column should contain the name of the .ab1 file containing the Sanger sequence for each experimental sample (e.g. CEL_modcrispr_1_A;CEL_R2.ab1). This file must be included in the zip file.
- The Guide Sequence column should contain the 17-23 nucleotide sequence of the DNA-targeting region of the guide RNA (excluding the PAM) for each sample. It can be given as either a DNA or an RNA sequence (e.g. AACCAGTTGCAGGCGCCCCA or AACCAGUUGCAGGCGCCCCA). By default, ICE assumes you are using SpCas9. However, ICE does not check that the PAM site is NGG; it uses the input guide sequence to place the predicted cut site 3 bp upstream of the end of the input sequence. To analyze other nucleases, you can enter a substitute guide sequence chosen so that your expected cut site lies 3 bp from its end. We will add explicit support for other nucleases in upcoming versions of ICE.
Note: You can copy and paste multiple file names by selecting multiple files in macOS Finder or Windows Explorer and pasting them into an Excel column.
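The cut-site rule described above can be written out explicitly. The sketch below is an illustration, not ICE's code: it assumes a plain reference string (ICE derives its reference from the control trace), searches both strands for the guide, and returns the index of the first base after the predicted break, 3 bp from the guide's PAM-proximal end. As with ICE, no NGG check is made. `predicted_cut_index` is a hypothetical name.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def predicted_cut_index(reference: str, guide: str) -> Optional[Tuple[int, str]]:
    """Return (cut_index, strand); the break lies between reference[:cut_index]
    and reference[cut_index:], 3 bp from the guide's 3' (PAM-proximal) end.

    Hypothetical helper. Like ICE as described here, it does not verify an NGG PAM.
    """
    ref = reference.upper()
    g = guide.upper().replace("U", "T")  # accept RNA or DNA guide input
    i = ref.find(g)
    if i != -1:
        # Plus strand: the last 3 guide bases lie to the right of the cut.
        return i + len(g) - 3, "+"
    j = ref.find(reverse_complement(g))
    if j != -1:
        # Minus strand: the guide's 3' end is its leftmost base in reference coordinates.
        return j + 3, "-"
    return None
```

For a guide on the minus strand, the 3 bases between the cut and the guide's end sit to the left of the cut in reference coordinates, which is why the two branches differ.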
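Before uploading, the definition rows can be checked against the rules in this section. In this hypothetical sketch, each row is a Python dict keyed by the column names used above ("Label", "Control Files", "Experiment Files", "Guide Sequence"); confirm the exact header text against template_definitions.xlsx. The 700-sample and 255-character limits come from this guide; `check_definitions` is not part of ICE.

```python
from typing import Dict, List, Set

MAX_LABEL = 255    # character limit for labels, per this guide
MAX_SAMPLES = 700  # samples per batch, per this guide


def check_definitions(rows: List[Dict[str, str]], zip_names: Set[str]) -> List[str]:
    """Return a list of problems in the definition rows (an empty list means none found).

    Column names are assumptions taken from this guide's wording.
    """
    problems = []
    if len(rows) > MAX_SAMPLES:
        problems.append(f"{len(rows)} samples exceeds the {MAX_SAMPLES}-sample limit")
    seen = set()
    for n, row in enumerate(rows, start=1):
        label = row.get("Label", "")
        if not label or len(label) > MAX_LABEL:
            problems.append(f"row {n}: label must be 1-{MAX_LABEL} characters")
        elif label in seen:
            problems.append(f"row {n}: duplicate label {label!r}")
        seen.add(label)
        # Every referenced trace file must be present in the .zip archive.
        for col in ("Control Files", "Experiment Files"):
            name = row.get(col, "")
            if name not in zip_names:
                problems.append(f"row {n}: {col} entry {name!r} not found in zip")
        guide = row.get("Guide Sequence", "").strip().upper()
        if not (17 <= len(guide) <= 23 and set(guide) <= set("ACGTU")):
            problems.append(f"row {n}: guide must be 17-23 nt of A/C/G/T/U")
    return problems
```

The `zip_names` set can come from `zipfile.ZipFile(path).namelist()`; reading the rows from the .xlsx file needs a third-party package such as openpyxl, which is not shown here.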
Overview of Editing Analysis by ICE
Once analysis is complete, a new screen appears with a graphical representation of the results and a list of the analyzed samples (see below).
1. If a sample was processed without issues, a green checked circle appears in front of its name. Samples processed with a minor error show a yellow checked circle; typically, this means ICE had to adjust a parameter to generate results. If there are no results or a processing error occurred, a red exclamation point appears in front of the sample. Hover over a yellow or red icon to see details about the issues for that sample.
2. Successfully analyzed samples display the following parameters:
- Sample Label - The unique label name that you provided for each sample.
- ICE Score - The editing efficiency (percentage of the pool with non-wild-type sequence) as predicted by comparing the edited trace to the control trace. In the ICE procedure, potential edit outcomes are proposed and fitted to the observed data using linear regression.
- R2 Score - When the ICE linear regression is computed, the Pearson correlation coefficient (r) is also computed, and its square is reported as the R2 score, a measure of how well the inferred edit outcomes explain the observed edited trace. The higher the R2 score, the more confident you can be in the ICE score.
- ICE-D Score - An alternative way to predict the percentage of edits in the sequenced pool. The ICE-D method measures how much of the edited sequence trace differs from the control sequence trace and uses an empirically derived correction factor to estimate the editing efficiency. The ICE-D score is useful when you have unexpected edits (e.g., a large deletion or insertion) that are not modeled by the ICE algorithm. Use the ICE-D score if the R2 value is low and the ICE-D score is higher than the ICE score.
- Guide Sequence - This is the 17-23 nucleotide sequence of the DNA-targeting region of the guide RNA and excludes the PAM sequence.
- PAM Sequence - The Protospacer Adjacent Motif (PAM) sequence for the nuclease used. Currently, ICE is configured for the nuclease from Streptococcus pyogenes (SpCas9).
3. The summary table can be sorted by any of its displayed parameters. To find a particular guide or sample name, use your browser's find function (Ctrl+F, or Cmd+F on macOS). Note: The control sequence is not listed in the summary table.
4. The entire analysis can be downloaded as a .zip file by clicking “Download Analysis Data” on the bottom right of the analysis screen.
5. Each sample can be inspected in greater detail by clicking its name or its bar in the graph. This opens a new window with four tabs. To return to the main analysis screen, click the “back” button at the top left of the screen at any time.
The first tab on the left is “TRACES”. This view shows the edited and wild-type (control) Sanger traces in the region around the guide binding site, with the base calls from the .ab1 file shown above each trace. The horizontal black underline marks the guide sequence, and the horizontal red underline marks the PAM site. The vertical black dotted line marks the cut site. Cutting followed by error-prone repair usually produces mixed sequencing bases after the cut.
Discord & Indel Tab
The second tab is “DISCORD & INDEL”. The discordance plot shows the level of disagreement between the wild-type (control) and edited samples in the inference window (the region around the cut site); that is, it shows, base by base, the average amount of signal that disagrees with the reference sequence derived from the control trace file. On the plot, the green (edited sample) and orange (control sample) lines should be close together before the cut site. A typical CRISPR edit produces a jump in discordance near the cut site that continues after it, reflecting a high level of sequence discordance.
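The discordance idea can be sketched in a few lines. This simplified Python version assumes the trace has already been aligned to the reference and reduced to four peak heights (A, C, G, T) per position; the names and the data layout are hypothetical, and ICE's own computation may differ.

```python
from typing import Dict, List

Peaks = Dict[str, float]  # peak height for each base (A/C/G/T) at one aligned position


def discordance(peaks: List[Peaks], reference: str) -> List[float]:
    """Fraction of signal at each position that does not support the reference base."""
    values = []
    for pos_peaks, ref_base in zip(peaks, reference):
        total = sum(pos_peaks.values())
        agree = pos_peaks.get(ref_base, 0.0)
        values.append(0.0 if total == 0 else 1 - agree / total)
    return values


def window_mean(values: List[float], start: int, end: int) -> float:
    """Average discordance over values[start:end], e.g. before or after the cut site."""
    window = values[start:end]
    return sum(window) / len(window)
```

Comparing `window_mean` before and after the cut site for the control and edited traces reproduces, in miniature, the jump described above.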
The Indel plot (on the right of this tab) displays the inferred distribution of indel sizes across the entire edited population of genomes. Hovering over a bar shows the size of the insertion (+) or deletion (-) in nucleotides, along with the percentage of genomes that contain it. Note: Indels of the same size shown in the Indel plot do not necessarily share the same sequence. The percentages of the individual indel sizes in the cell population are not the same as the ICE or ICE-D scores.
The third tab, “Contributions”, shows the inferred sequences present in your edited population and their relative proportions (unlike the Indel plot in the DISCORD & INDEL tab, which does not specify sequences). The cut site is marked by a black vertical dotted line, and the wild-type sequence is marked by a “+” symbol on the far left.
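Conceptually, an indel-size distribution pools the inferred sequences by their indel size, so one bar can cover several different sequences. A minimal Python sketch with hypothetical contribution values (sequence name, indel size, percent); the function name is illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def indel_size_distribution(contributions: List[Tuple[str, int, float]]) -> Dict[int, float]:
    """Sum per-sequence contributions (sequence, indel size, percent) into per-size totals."""
    by_size: Dict[int, float] = defaultdict(float)
    for _sequence, size, percent in contributions:
        by_size[size] += percent
    return dict(sorted(by_size.items()))


# Hypothetical example: two different 1-bp deletions end up in the same -1 bar.
contribs = [("WT", 0, 40.0), ("del_G", -1, 25.0), ("del_C", -1, 15.0), ("ins_A", 1, 20.0)]
print(indel_size_distribution(contribs))
```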
The fourth tab is “ALIGNMENT”. This final tab can be used for further troubleshooting. The “Alignment of Sanger Base Calls” at the top shows the alignment of the entire control (wild-type) and edited Sanger sequences. If the dominant indel in the edited sequence is highly represented, it may be visible in this alignment. The vertical line between two aligned bases is missing wherever the dominant nucleotide in the edited sample differs from the control sequence. At the bottom, the “Alignment of Control Window to Edited Sample” shows the alignment window (the high-quality portion of the control sequence before the most upstream-reaching indel) aligned to the full edited sequence. This bottom alignment can be used to troubleshoot the alignment process. A horizontal scroll bar, color-coded by base, at the bottom lets you navigate through the entire analyzed sequence.
6. To return to the main analysis page containing all samples, click “Back to all”. You can also select any sample directly from the dropdown menu at the top of the screen, after “Analysis of _____”. The “Next” and “Previous” buttons, or the arrow keys on your keyboard, move to the next or previous sample in your summary table.
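The vertical-line display in the alignment view follows a simple rule: a line is drawn only where the two aligned bases agree. This Python sketch assumes the alignment has already been computed, with gaps written as `-`; `match_line` is a hypothetical helper, not ICE's code.

```python
def match_line(top: str, bottom: str) -> str:
    """Marker row for two equal-length aligned strings: '|' where bases agree, ' ' otherwise."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must be the same length")
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(top, bottom))


# Hypothetical aligned base calls with one mismatch and one gap.
control = "ACGT-A"
edited = "ACCTGA"
print(control)
print(match_line(control, edited))
print(edited)
```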
For questions, please consult the FAQ, or contact us at: email@example.com