Protein Identification by Nanopore Peptide Profiling



This dataset belongs to “Protein Identification by Nanopore Peptide Profiling” and contains the raw data and analysis of tryptically digested peptides translocating through a mutant Fragaceatoxin C nanopore. A Jupyter notebook describing the analysis and the data structure is included in this dataset.

Data description:
Protein Identification by Nanopore Peptide Profiling.ipynb
Jupyter notebook containing the analysis of the data in the data folders described below (Python 3.7)
Supplementary scripts belonging to “Protein Identification by Nanopore Peptide Profiling.ipynb”. See the explanation of the custom classes in the Jupyter notebook.
- Folder containing raw electrophysiology data and results after analysis with “Protein Identification by Nanopore Peptide Profiling.ipynb”, with each folder containing the following:
Alpha casein: Tryptic digest of alpha casein
Beta casein: Tryptic digest of beta casein
BSA: Tryptic digest of bovine serum albumin
Control: Tryptic digest of water (no protein, control measurement)
Cytochrome c: Tryptic digest of cytochrome c
DHFR_His6: Tryptic digest of dihydrofolate reductase (His6 tagged)
EFP: Tryptic digest of elongation factor P
HMW1Act: Tryptic digest of high molecular weight adhesin protein
- Folder containing raw electrophysiology data, comma-separated MS peptide masses, and results after analysis with “Protein Identification by Nanopore Peptide Profiling.ipynb”, with each folder containing the following:
Lysozyme: Tryptic digest of lysozyme
PAN: Tryptic digest of proteasome-activating nucleotidase
TbpA_Y27A: Tryptic digest of periplasmic binding protein
Trypsin: Tryptic digest of bovine trypsin
Mass_spec: csv files containing measured ESI-MS peptides
Lysozyme synthetic peptides:
Lys2alk: C(+57.02)ELAAAMK
Lys4alk: WWC(+57.02)NDGR
Lys6alk: GYSLGNWVC(+57.02)AAK

The structure of the data files is registered in 'index.csv' (digested proteins) and 'index_peptides.csv' (synthetic peptides), both contained in the data folder. These files list each protein that was measured, its folder location, and the expected baseline with its standard deviation.

Structure of ./data/index.csv
Protein (string) | Folder (string) | Baseline (pA) (float) | Baseline Error (pA) (float)
In each folder, there is another 'index.csv' that states which files were recorded with protein and which without (blank).

Structure of ./data/[protein]/[repeat]/index.csv
blank (boolean) | fname (string)
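
The per-repeat index described above could be read with a few lines of standard-library Python. This is a minimal sketch, not the notebook's own loader: the helper names (`read_index`, `as_bool`, `split_repeat_files`) are hypothetical, and it assumes that the CSV header uses the column names `blank` and `fname` exactly as listed, that booleans are written as `True`/`False` (or `1`/`0`), and that `fname` is relative to the repeat folder.

```python
import csv
from pathlib import Path


def read_index(path):
    """Read an index CSV into a list of dicts, one per row."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def as_bool(text):
    """Interpret common CSV spellings of a boolean (assumed encoding)."""
    return str(text).strip().lower() in {"true", "1", "yes"}


def split_repeat_files(repeat_dir):
    """Return (blank_files, analyte_files) for one repeat folder."""
    repeat_dir = Path(repeat_dir)
    blanks, analytes = [], []
    for row in read_index(repeat_dir / "index.csv"):
        # Rows flagged as blank were recorded without protein.
        target = blanks if as_bool(row["blank"]) else analytes
        target.append(repeat_dir / row["fname"])
    return blanks, analytes
```

The same `read_index` helper should also work for ./data/index.csv, where each row would carry the protein name, its folder and the expected baseline values as strings.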

Each data folder contains a folder for each measured protein, which in turn contains a folder for each repeat. The repeats contain raw Axon binary files (.abf); each file name encodes the measurement conditions as follows:
    [Date of measurement]_[Pore type]_[Buffer conditions]_[added analyte(s)]_[operator initials]
    e.g: 20200312_1M_KCl_50mM_Citricacid_50mM_BTP_pH_38_FraC_G13F_neg70mV_20ul_CytC_TrypsinGold_FL_0000

    Measured on 12 March 2020 in 1 M KCl buffered with citric acid (50 mM) and adjusted to pH 3.8 using bis-tris-propane, with Fragaceatoxin C mutant G13F at an applied potential of -70 mV. 20 µL of cytochrome c was added to the cis compartment.
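
As an illustration of how the example name decodes, the sketch below extracts the date, applied potential and pH from such a file name. It is not part of the supplied scripts: the function `parse_abf_name` is hypothetical, and it assumes that a `pos` prefix marks positive potentials, that the pH is written without a decimal point after its first digit (`pH_38` for 3.8), and that the trailing number is a file counter.

```python
import re
from datetime import datetime


def parse_abf_name(name):
    """Extract a few fields from a file name that follows the example pattern."""
    stem = name[:-4] if name.lower().endswith(".abf") else name
    tokens = stem.split("_")

    # The first token is the measurement date as YYYYMMDD.
    info = {"date": datetime.strptime(tokens[0], "%Y%m%d").date()}

    # Applied potential, e.g. "neg70mV" -> -70 (a "pos" prefix is an assumption).
    match = re.search(r"_(neg|pos)?(\d+)mV(?:_|$)", stem)
    if match:
        sign = -1 if match.group(1) == "neg" else 1
        info["potential_mV"] = sign * int(match.group(2))

    # pH without a decimal point, e.g. "pH_38" -> 3.8 (assumes one leading digit).
    match = re.search(r"_pH_(\d)(\d*)(?:_|$)", stem)
    if match:
        info["pH"] = float(f"{match.group(1)}.{match.group(2) or 0}")

    # Trailing numeric token, e.g. "0000", read here as a file counter (assumption).
    if tokens[-1].isdigit():
        info["file_number"] = int(tokens[-1])
    return info
```

A name that does not start with an eight-digit date raises a `ValueError`, which makes files that deviate from the pattern easy to spot.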

The total volume of the container used for all electrophysiology experiments was 400 µL, and all samples were prepared at a concentration of 1 g/L. A prefix “perf” before the analyte description indicates that the chamber was flushed with approximately 2 mL of fresh buffer prior to analysis. The buffer condition "BTP" stands for bis-tris-propane, which was used to titrate to the exact pH of 3.8.
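
From these numbers, the approximate analyte concentration in the chamber can be estimated. The sketch below is only a back-of-the-envelope helper (`final_concentration` is a hypothetical name). Whether the 400 µL refers to the volume before or after the sample is added is not stated here, so both readings are supported through a flag. For 20 µL of a 1 g/L sample, the result is roughly 0.048 g/L or 0.05 g/L, depending on that reading.

```python
def final_concentration(stock_g_per_l, added_ul, chamber_ul=400.0,
                        chamber_includes_addition=False):
    """Estimate the analyte concentration (g/L) after adding a sample.

    stock_g_per_l: concentration of the prepared sample (1 g/L in this dataset)
    added_ul: volume of sample added, in µL (e.g. 20 for "20ul" in a file name)
    chamber_ul: chamber volume in µL (400 µL in this dataset)
    chamber_includes_addition: True if chamber_ul already includes the added
        sample; this interpretation is an assumption either way.
    """
    total_ul = chamber_ul if chamber_includes_addition else chamber_ul + added_ul
    # Simple dilution: mass added divided by the total volume.
    return stock_g_per_l * added_ul / total_ul
```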

Each analysed folder contains a results.pkl file with the analysis result as produced by “Protein Identification by Nanopore Peptide Profiling.ipynb” (see the Jupyter notebook).
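
A results.pkl file could be opened outside the notebook along the following lines. This is a sketch under assumptions: `load_results` and its `scripts_dir` argument are hypothetical, and it presumes that the pickled objects are instances of the custom classes from the supplementary scripts, which would then need to be importable before unpickling. The notebook remains the reference for the actual object layout.

```python
import pickle
import sys
from pathlib import Path


def load_results(results_path, scripts_dir=None):
    """Unpickle a results.pkl file; only use this on files you trust."""
    if scripts_dir is not None and str(scripts_dir) not in sys.path:
        # Make the custom classes importable before unpickling (assumed to be needed).
        sys.path.insert(0, str(scripts_dir))
    with open(Path(results_path), "rb") as handle:
        return pickle.load(handle)
```

Because unpickling can execute arbitrary code, this should only be applied to files obtained directly from the dataset.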

Each analysed folder contains results_analysis.xlsx, which contains sheets with the excluded currents, standard deviations, dwell times and beta values for the pore without analyte (“Blank”) and with analyte added (“Results”). The parameters used for fitting are contained in “Parameters”. The “Histograms” tab shows the raw data of the excluded current spectra.

– Folder containing mass spectrometry files as analysed by PEAKS Studio

The folder contains a subfolder for each protein measured using electrospray ionisation mass spectrometry (ESI-MS).
acasein: alpha casein protein
b_casein: beta casein protein
BSA: bovine serum albumin
CytC: cytochrome C digested
DHFR: dihydrofolate reductase
HMW1_Act: high molecular weight adhesin protein
PAN: proteasome-activating nucleotidase
ThBP: periplasmic thiamine binding protein
Trypsin: bovine trypsin
