# DataCon 3.0. Design a Peptide Vector for Drug Delivery
![Project logo](https://t3.ftcdn.net/jpg/04/87/14/48/360_F_487144857_lwRd6hyeEktmt70UOAgojHzlwvY6OgQp.jpg)
## About the project
### What are CPPs?
**Cell-penetrating peptides (CPPs)** are short amino acid sequences that can cross cellular membranes and carry therapeutic cargo, including drugs, nucleic acids, and proteins, into cells. They enter cells by mechanisms such as direct penetration or endocytosis, which makes them powerful tools for drug delivery systems.

In medical applications, CPPs are being used to improve the efficacy of treatments for a range of conditions. For instance, in targeted cancer therapies they deliver chemotherapeutic agents directly to tumor cells, minimizing damage to healthy tissue. CPPs are also employed in gene therapy to transport genetic material into cells, offering potential treatments for genetic disorders such as cystic fibrosis and muscular dystrophy. Their versatility and efficiency in overcoming cellular barriers make CPPs a promising direction for advanced therapeutic strategies.
### Project aim
Our ultimate goal is to develop an accurate machine learning (ML) model for **designing CPPs with superior activity**.
### Challenges
The main challenge is to develop an **unbiased model** that is not limited to existing CPP structures and cell penetration mechanisms. Another challenge is to develop CPPs **for a particular drug delivery system and setup**, which involves multi-property optimization (amphiphilicity, molecular weight, toxicity, etc.). Finally, models should be **interpretable**: the user should understand why a particular CPP is active and how it could be improved further.
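
As a starting point for multi-property optimization, simple sequence descriptors can be computed directly from a peptide string. The sketch below is not part of the project code: it assumes sequences contain only the 20 natural amino acids and uses the Kyte-Doolittle hydropathy scale plus a crude charge count as rough proxies. The function names are hypothetical, and a proper amphiphilicity measure (for example, a hydrophobic moment) would need more than this.

```python
# Kyte-Doolittle hydropathy values for the 20 natural amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy (GRAVY) of a natural-residue peptide."""
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def approx_net_charge(sequence: str) -> int:
    """Crude net charge: +1 per K/R, -1 per D/E; ignores His, termini and pH."""
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative


# TAT(47-57), a well-known CPP, used here purely as an illustration.
tat = "YGRKKRRQRRR"
print(f"GRAVY = {gravy(tat):.2f}, approximate net charge = {approx_net_charge(tat):+d}")
```

Descriptors like these can serve as model features or as constraints when filtering generated candidates.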
### Schedule
DataCon 3.0 includes not only practical sessions but also expert lectures and other activities, so check for schedule updates [HERE](https://scamt.ifmo.ru/datacon/).
## Data description
### 1. Mixed CPPs
Contains CPPs with natural or modified amino acids.
1.1. POSEIDON
Contains heterogeneous experimental measurements of CPP activity (peptides with natural and non-natural amino acids) in .csv format, with the following fields:
- peptide name,
- cell line on which the CPP was tested for cell penetration,
- delivered molecule/protein,
- paper PubMed ID,
- cellular uptake measurement + measurement units,
- CPP+cargo concentration,
- incubation time,
- incubation temperature,
- determination method,
- uptake type,
- sequence.
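
A minimal sketch of reading such a table with Python's standard `csv` module is shown below. The column names (`peptide_name`, `cell_line`, `cargo`, `sequence`) and the sample rows are hypothetical placeholders, not the actual POSEIDON header or data, so adjust them to the real file.

```python
import csv
import io
from collections import Counter

# Placeholder rows; the column names are hypothetical and the values are made up.
SAMPLE_CSV = """peptide_name,cell_line,cargo,sequence
example_peptide_1,HeLa,FITC,YGRKKRRQRRR
example_peptide_2,HeLa,siRNA,RQIKIWFQNRRMKWKK
example_peptide_3,CHO,FITC,YGRKKRRQRRR
"""


def load_records(handle):
    """Read CSV rows into a list of dicts keyed by the header names."""
    return list(csv.DictReader(handle))


def count_by(records, column):
    """Count how many records share each value of the given column."""
    return Counter(row[column] for row in records)


records = load_records(io.StringIO(SAMPLE_CSV))
print(count_by(records, "cell_line"))
```

For a file on disk, pass `open(path, newline="")` to `load_records` instead of the in-memory sample. Because the data are heterogeneous, counting records per cell line or per determination method is a quick way to see which subsets are large enough to model.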
### 2. Natural CPPs
Contains only sequences with natural amino acids.
2.1. CPPBase
Contains sequences of CPPs with experimentally confirmed activity in .fasta format.
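
FASTA files can be read without extra dependencies. The parser below is a minimal sketch that assumes the usual layout (a `>` header line followed by one or more sequence lines). The sample records and their header names are hypothetical, not taken from CPPBase.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical records; the second sequence is wrapped over two lines.
SAMPLE_FASTA = """>example_1
YGRKKRRQRRR
>example_2
RQIKIWFQ
NRRMKWKK
"""

for name, seq in read_fasta(io.StringIO(SAMPLE_FASTA)):
    print(name, len(seq), seq)
```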
2.2. Experimental and Experimental2
Contain additional CPP sequences with experimentally confirmed activity in .txt format.
2.3. Experimental_high_uptake
Contains CPP sequences with high uptake (exact values not stated) in .txt format.

2.4. Balanced_dataset
Represents a balanced dataset of CPPs and non-CPPs; often used for model benchmarking.
### 3. Non-CPPs
Contains negative samples (non-CPPs) in .txt format.
3.1. Generated
Contains randomly generated sequences treated as negative.
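
The source does not state how these sequences were generated. One simple way to produce comparable negatives, shown here only as an assumption, is to sample residues uniformly from the 20 natural amino acids while matching the lengths of the positive set. The function name and the example positives are illustrative. Note that a random sequence can still turn out to be cell-penetrating, which is why such samples are only *treated* as negative.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids


def random_peptides(lengths, seed=0):
    """Return one uniformly random sequence for each requested length."""
    rng = random.Random(seed)  # local generator keeps results reproducible
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(n)) for n in lengths]


# Match the length distribution of a (here, tiny and illustrative) positive set.
positives = ["YGRKKRRQRRR", "RQIKIWFQNRRMKWKK"]
negatives = random_peptides([len(p) for p in positives], seed=42)
print(negatives)
```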
3.2. Experimental
Contains sequences shown experimentally to lack cell-penetrating activity.
### 4. Non-Natural CPPs
Contains CPPs consisting of non-natural amino acids.
4.1. CPPBase_modified
Contains a list of modified CPPs with experimentally confirmed activity in .fasta format.
4.2. CPPBase_modified_symbols
Contains a list of abbreviations for modified amino acids in .txt format (ABBREVIATION: NAME; ...: ...).
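
Assuming the file follows the `ABBREVIATION: NAME; ...` pattern described above, it could be turned into a lookup table as sketched below. The function name is hypothetical and the sample entries are illustrative, not copied from the file. A name that itself contains a semicolon would need a more careful parser.

```python
def parse_abbreviations(text):
    """Parse 'ABBREVIATION: NAME; ...' text into an abbreviation -> name dict."""
    mapping = {}
    for entry in text.split(";"):
        # Split on the first colon only, so names may contain colons.
        abbreviation, separator, name = entry.partition(":")
        abbreviation, name = abbreviation.strip(), name.strip()
        if separator and abbreviation:
            mapping[abbreviation] = name
    return mapping


# Illustrative entries, not copied from the dataset file.
sample = "Orn: ornithine; Aib: 2-aminoisobutyric acid; Nle: norleucine"
print(parse_abbreviations(sample))
```

Such a mapping is useful when decoding the modified residues that appear in CPPBase_modified.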
## Useful tools
### Structure prediction
In the relevant folder you can find a Jupyter notebook with AlphaFold 2.
Just insert the sequence
![insert sequence](https://github.com/acid-design-lab/DataCon24/assets/82499756/607ad1b2-0b29-4490-8771-76de5fd6f9b3)
and get a 3D structure!
![3d structure](https://github.com/acid-design-lab/DataCon24/assets/82499756/93b60dd2-b627-4c46-82c9-6127af4e26bb)