Section 1: Introduction
Section 2: Sample-By-Sample Analysis
Section 3: Batch Analysis
Section 1: Introduction
Inference of CRISPR Editing (ICE) is a free online tool that provides an easy quantitative assessment of genome editing. The software compares the sequence traces of amplicons generated from genomic DNA isolated from the edited and unedited (control) pools of cells to identify the percentage of genomes modified with insertions and deletions (indels). Ideally, the desired indel should be located ~200 bp downstream of the sequencing primer. For a more complete guide to ICE analysis, including detailed explanations and examples, see Hsiau et al., bioRxiv (2018).
The current version of ICE software can analyze indels that result from single or multiplex CRISPR-Cas9 double-strand DNA breaks using SpCas9. More features will be added to the ICE software tool soon. To request other nucleases or other features and functionalities, please email firstname.lastname@example.org. We welcome your feedback.
Learning how to use the ICE software tool is quick and easy. Simply upload your Sanger sequencing files and provide basic information such as your sample names and guide sequences, and ICE will do the rest. There are no parameters that need optimizing and no complicated steps to learn. For increased flexibility and scalability, the ICE software has two analysis formats: “sample by sample” analysis, which can compare up to five editing experiments at a time, and “batch” analysis, which compares hundreds of samples simultaneously.
Below are step-by-step instructions on how to conduct both types of analyses.
Section 2: Sample-By-Sample Analysis
To run a Sample-By-Sample ICE analysis, follow these steps:
- Go to https://ice.synthego.com
- Click on the “Sample by Sample Upload” tab.
- Add .ab1 files by dropping into the upload space, or by clicking “browse your files” to open all files. If the upload file type is correct, it will turn green; if not, it will turn red.
Add the following information:
- Control .ab1 File (left): upload control Sanger sequence files ( .ab1 format)
- Edit .ab1 File (right): upload experimental Sanger sequence files ( .ab1 format)
- Guide Target Sequence: add the 17-23 nucleotide sequence of the DNA-targeting region of the guide RNA excluding the PAM. This can be provided as either DNA or RNA sequence.
- Label: a unique sample name will be automatically generated with your Edit File name. The names can be modified to any unique sample name (255 character limit).
Download the example files (on lower left) and upload/enter the following information into Sample-By-Sample form:
Control .ab1 File (left): CEL_Negative;CEL_R2.ab1
Edit .ab1 File (right): CEL_modcrispr_1_A;CEL_R2.ab1
Guide Target Sequence: AACCAGTTGCAGGCGCCCCA
- Click “Add to Analysis”. The files will display in the table called “Your Experiment”, which is a running summary of all your uploads:
Additional files can be added one at a time for up to 700 individual analyses. To add additional samples for analysis, fill in the form again with the information for each sample as noted above. Each new sample will be added as a new row (Test 1, Test 2, etc.) in the table:
Note: If more than five editing outcomes are being analyzed at once, we recommend using “Batch Analysis”.
- To complete the analysis and review the outcomes, click “Analyze Experiment”. For an explanation of the analysis results, please see the section entitled “Overview of Editing Analysis by ICE” below.
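The guide and label constraints listed above (17-23 nucleotides, DNA or RNA, unique label of at most 255 characters) can be checked locally before filling in the form. The following is a minimal sketch, not part of ICE itself; the function names normalize_guide and check_label are hypothetical, and ICE may apply additional checks of its own.

```python
import re


def normalize_guide(guide: str) -> str:
    """Return the guide as an uppercase DNA string, or raise ValueError."""
    # Accept either RNA or DNA input by converting U to T.
    seq = guide.strip().upper().replace("U", "T")
    if not re.fullmatch(r"[ACGT]+", seq):
        raise ValueError(f"guide contains non-nucleotide characters: {guide!r}")
    if not 17 <= len(seq) <= 23:
        raise ValueError(f"guide must be 17-23 nt (excluding the PAM), got {len(seq)}")
    return seq


def check_label(label: str, seen: set) -> None:
    """Raise ValueError if the label is empty, too long, or already used."""
    if not label or len(label) > 255:
        raise ValueError("label must be 1-255 characters long")
    if label in seen:
        raise ValueError(f"label is not unique: {label!r}")
    seen.add(label)


# The example guide from this section, written as RNA.
print(normalize_guide("AACCAGUUGCAGGCGCCCCA"))
```

Normalizing to DNA before comparing guides makes it easier to spot the same guide entered once as DNA and once as RNA.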
Section 3: Batch Analysis
To run a Batch ICE analysis, follow these steps:
- Go to https://ice.synthego.com
- Click on the “Batch Upload” tab.
- Add Zip and Excel files by dropping into the upload space, or by clicking “browse your files” to open all files. If the upload file type is correct, it will turn green; if not, it will turn red.
Add the following information:
.ab1 Files (left): ZIP archive containing .ab1 files
Add a single .zip file containing the experimental and negative control (wild-type) Sanger sequence files (.ab1 format). More than one control sequence can be used. Up to 700 samples can be included in the .zip file and analyzed at once (file size limit is 225 MB).
Definition File (right): Excel file with definitions of .ab1 files
A single Microsoft Excel file (.xlsx format only) that lists a unique label, the control file name, the experimental file name, and the guide sequence for each sample. Example and template files (template_definitions.xlsx) can be downloaded on the ICE webpage. See example below:
Note: Please follow these instructions for the Excel file upload:
- Do not modify or change the current headers in the template_definitions.xlsx.
- The Label column is used for labelling your samples with a unique name that has a 255 character limit.
- The Control Files column should contain the name of the .ab1 file containing the Sanger sequence for each negative control (e.g. CEL_Negative;CEL_R2.ab1). This file must be included in the zip file.
- The Experiment Files column should contain the name of the .ab1 file containing the Sanger sequence for each experimental sample (e.g. CEL_modcrispr_1_A;CEL_R2.ab1). This file must be included in the zip file.
- The Guide Sequence column should contain the 17-23 nucleotide sequence of the DNA-targeting region of the guide RNA (excluding the PAM) for each sample. This can be provided as either DNA or RNA sequence (e.g. AACCAGTTGCAGGCGCCCCA or AACCAGUUGCAGGCGCCCCA). By default, ICE assumes you are using SpCas9. However, ICE does not check whether the PAM site is NGG; it uses the input guide sequence to place the predicted cut site 3 bp upstream of the end of the input sequence. If you wish to analyze other nucleases, you can input a placeholder guide sequence chosen so that your expected cut site falls 3 bp from the end of that sequence. We will add explicit support for other nucleases in upcoming versions of ICE.
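The cut-site rule described above can be illustrated with a short sketch. It is not ICE's own code: the function names cut_site_offset and pseudo_guide are hypothetical, and the sketch assumes the placeholder guide is taken from the same strand as the reference sequence you supply.

```python
def cut_site_offset(guide: str) -> int:
    """Number of guide bases that lie upstream (5') of the predicted cut.

    The predicted cut is placed 3 bp upstream of the end of the guide.
    """
    return len(guide) - 3


def pseudo_guide(reference: str, cut_index: int, length: int = 20) -> str:
    """Return a placeholder guide whose predicted cut falls at cut_index.

    cut_index is the number of reference bases to the left of the
    desired cut. The returned sequence ends 3 bases past the cut.
    """
    if not 17 <= length <= 23:
        raise ValueError("guide length must be 17-23 nt")
    end = cut_index + 3
    start = end - length
    if start < 0 or end > len(reference):
        raise ValueError("reference is too short around the requested cut site")
    return reference[start:end]


# Hypothetical reference sequence, for illustration only.
reference = "GATTACAGGCTTAACCGGTTAGCATCGATCGGATCC"
guide = pseudo_guide(reference, cut_index=20)
# Position of the guide start plus the offset recovers the cut index.
print(guide, reference.find(guide) + cut_site_offset(guide))
```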
Note: You can copy and paste multiple file names by selecting multiple files in MacOS Finder or Windows Explorer and pasting into an Excel column.
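Before uploading, it can help to confirm that every file named in the definition file is actually present in the ZIP archive and that the stated limits are respected. The following is a minimal sketch using only the Python standard library. The function check_batch is hypothetical, the rows are assumed to have been copied from the Excel file as (label, control file, experiment file, guide) tuples, and treating 225 MB as 225 * 1024 * 1024 bytes is an assumption.

```python
import os
import zipfile

MAX_SAMPLES = 700                  # sample limit stated above
MAX_ZIP_BYTES = 225 * 1024 * 1024  # 225 MB, assuming 1 MB = 1024 * 1024 bytes


def check_batch(zip_path, rows):
    """Return a list of problems found; an empty list means none were found.

    rows: list of (label, control_file, experiment_file, guide) tuples.
    """
    problems = []
    if os.path.getsize(zip_path) > MAX_ZIP_BYTES:
        problems.append("ZIP archive exceeds 225 MB")
    if len(rows) > MAX_SAMPLES:
        problems.append(f"{len(rows)} samples listed; the limit is {MAX_SAMPLES}")

    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: compare base names only, in case files sit in a folder.
        in_zip = {os.path.basename(name) for name in archive.namelist()}

    seen_labels = set()
    for label, control, experiment, _guide in rows:
        if label in seen_labels:
            problems.append(f"duplicate label: {label}")
        seen_labels.add(label)
        if len(label) > 255:
            problems.append(f"label longer than 255 characters: {label[:30]}...")
        for file_name in (control, experiment):
            if file_name not in in_zip:
                problems.append(f"{file_name} (label {label}) is not in the ZIP archive")
    return problems
```

A call such as check_batch("traces.zip", rows) returns an empty list when nothing is wrong; "traces.zip" here is a placeholder file name.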
Overview of Editing Analysis by ICE
Once the analysis is complete, a new screen appears with a graphical representation of the results and a list of the analyzed samples (see below).
1. If the sample run has no issues, the analysis window will show a green checked circle in front of the sample name. Samples that were processed with a minor error will return a yellow checked circle. Typically, a yellow check mark indicates that ICE needed to adjust a particular parameter in order to generate results. If there are no results or there was a processing error, you will see a red exclamation point in front of that sample. You can hover over a yellow or red icon to see details on the issues associated with that sample.
2. Successfully analyzed samples will display the following parameters:
- Sample Label - The unique label name that you provided for each sample.
- ICE Score - The editing efficiency (percentage of the pool with non-wild type sequence) as determined by comparing the edited trace to the control trace. In the ICE algorithm, potential editing outcomes are proposed and fitted to the observed data using linear regression.
- R2 Score - When the ICE linear regression is computed during generation of the ICE Score, the Pearson correlation coefficient (r) is also computed, and its square is reported as the R2 score. The higher the R2 value, the more confident you can be in the ICE score.
- KO Score - Represents the proportion of cells that have either a frameshift or 21+ bp indel. This score is a useful measure for those who are interested in understanding how many of the contributing indels are likely to result in a functional Knockout (KO) of the targeted gene.
- Guide Sequence - This is the user-defined 17-23 nucleotide sequence of the DNA-targeting region of the guide RNA, excluding the PAM sequence.
- PAM Sequence - The Protospacer Adjacent Motif (PAM) sequence for the nuclease used. Currently, ICE is configured for the Cas9 nuclease from Streptococcus pyogenes (SpCas9).
3. The analysis can be sorted by any of the parameters displayed in the summary table. To find a particular guide sequence or sample name, use your browser’s find function (Ctrl+F). Note: The control sequence is not listed in the summary table.
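To make the score definitions in item 2 concrete, here is a minimal sketch that derives an ICE-style score and a KO-style score from a list of inferred indel contributions, and an R2 value from two signal vectors. It is not ICE's implementation: the function names and the example contributions are hypothetical, each outcome is reduced to a single indel size (which ignores multiplex edits), and R2 is assumed to be the square of the Pearson r.

```python
def editing_scores(contributions):
    """Return (ice_score, ko_score) as whole percentages.

    contributions: list of (indel_size, fraction) pairs whose fractions
    sum to about 1.0. indel_size is 0 for wild type, negative for a
    deletion, and positive for an insertion.
    """
    # ICE-style score: everything that is not wild type.
    edited = sum(frac for size, frac in contributions if size != 0)
    # KO-style score: frameshifts (size not a multiple of 3) or 21+ bp indels.
    knockout = sum(
        frac for size, frac in contributions
        if size % 3 != 0 or abs(size) >= 21
    )
    return round(100 * edited), round(100 * knockout)


def r_squared(observed, predicted):
    """Square of the Pearson correlation between two equal-length vectors."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical pool: 10% wild type, 50% -1, 20% +1, 15% -3, 5% -21.
example = [(0, 0.10), (-1, 0.50), (1, 0.20), (-3, 0.15), (-21, 0.05)]
print(editing_scores(example))  # (90, 75)
```

In this example the in-frame 3 bp deletion counts toward the ICE-style score but not the KO-style score, while the in-frame 21 bp deletion counts toward both.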
4. The entire analysis can be downloaded as a .zip file by clicking “Download Analysis Data” on the bottom right of the analysis screen.
5. Each sample can be individually inspected in greater detail by clicking on the sample name or on its corresponding bar graph entry. This will open up a new window with three tabs. To return to the main analysis screen, hit the “back” button at any time on the top left of the screen.
The “Contributions” tab shows the inferred sequences present in your edited population and their relative representation in the edited pool. The black vertical dotted line represents the cut site, and the “+” symbol on the far left marks the wild type. If you are viewing a multiplex sample, the sequences will be aligned to the most upstream cut site.
In the “Indel Distributions” tab, you’ll find an Indel plot which displays the inferred distribution of indel sizes in the entire edited population of genomes. Hovering over each bar of the Indel plot shows the size of the insertion or deletion (+ or - 1 or more nucleotides), along with the percentage of genomes that contain it.
The discordance plot shows the level of disagreement between the non-edited wild type (control) and the edited sample in the inference window (the region around the cut site). It shows, base-by-base, the average amount of signal that disagrees with the reference sequence derived from the control trace file. On the plot, the green (edited sample) and orange (control sample) lines should be close together before the cut site, and a typical CRISPR edit results in a jump in the discordance near the cut site and continuing after the cut site (representing a high level of sequence discordance).
The “Traces” tab shows the edited and control, non-edited Sanger traces in the region around the guide binding site(s). The sequence base calls from the .ab1 file are also shown above each trace. The horizontal black underlined region represents the guide sequence, and the horizontal red underline is the PAM site. The vertical black dotted line represents the cut site. Cutting and error-prone repair typically result in mixed sequencing bases downstream of the cut.
6. To return to the main analysis page containing all the samples, click “Back to all”. You can also select any sample directly from the dropdown menu at the top of the screen after “Analysis of _____.” The “Next” and “Previous” buttons, or the arrow keys on your keyboard, take you to the adjacent sample in your summary table.
For questions, please consult the FAQ, or contact us at email@example.com