SoTA solution from the DataCon24 hackathon by ITMO on automating the design of drug delivery vehicles
 
ACID Design Lab ee0701e736

DataCon 3.0


Data description

1. Mixed CPPs

Contains cell-penetrating peptides (CPPs) with natural or modified amino acids.

  1.1. POSEIDON

  Contains heterogeneous experimental measurements of CPP activity (for peptides with natural and non-natural amino acids) in .csv format. The fields are:
  - peptide name,
  - target cell line on which the CPP's cell-penetration ability was tested,
  - delivered molecule/protein,
  - paper PubMed ID,
  - cellular uptake measurement + measurement units,
  - CPP+cargo concentration,
  - incubation time,
  - incubation temperature,
  - determination method,
  - uptake type,
  - sequence.
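
A minimal sketch of how such a table could be read and grouped with the Python standard library. The column names (`peptide_name`, `cell_line`, `sequence`) and the two sample rows are made up for illustration; the actual header in the POSEIDON .csv file may differ and should be checked first.

```python
import csv
import io


def read_records(handle):
    """Read a CSV file handle into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(handle)]


def group_by(records, column):
    """Group records by the value of one column (for example, the cell line)."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[column], []).append(rec)
    return groups


# Hypothetical excerpt: column names and rows are assumptions, not real data.
sample = io.StringIO(
    "peptide_name,cell_line,sequence\n"
    "Pep-A,HeLa,RRRRRRRR\n"
    "Pep-B,CHO,KKLFKKILKYL\n"
)
records = read_records(sample)
by_cell_line = group_by(records, "cell_line")
```

For the real file, replace `sample` with `open(path, newline="", encoding="utf-8")`, where `path` points to the POSEIDON .csv file.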

2. Natural CPPs

Contains only sequences with natural amino acids.

   2.1. CPPBase
  
   Contains sequences of CPPs with experimentally confirmed activity in .fasta format.
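
A .fasta file can be read without extra dependencies. The sketch below assumes the usual FASTA layout (a `>` header line followed by one or more sequence lines); the sample entries are illustrative and not taken from CPPBase.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Illustrative input; real headers in the dataset may look different.
example_lines = [">pep1", "RRRR", "RRRR", "", ">pep2", "GRKKRRQRRR"]
entries = list(parse_fasta(example_lines))
```

With a file on disk, pass the open file object directly: `list(parse_fasta(open(path)))`, where `path` is the location of the .fasta file.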


   2.2. Experimental and Experimental2
  
   Contain additional CPP sequences with experimentally confirmed activity in .txt format.


   2.3. Experimental_high_uptake
  
   Contains CPP sequences with high uptake (exact values are not stated) in .txt format.

   2.4. Balanced_dataset
   
   Represents a balanced dataset of CPPs and non-CPPs; often used for model benchmarking.

3. Non-CPPs

Contains negative samples (non-CPP sequences) in .txt format.

   3.1. Generated
  
   Contains randomly generated sequences treated as negative.
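
How the sequences in this set were generated is not stated. As one possible approach, the sketch below samples the 20 standard amino acids uniformly; the length range, the seed, and the function name `random_peptides` are assumptions chosen for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def random_peptides(n, min_len=5, max_len=30, seed=0):
    """Return n random peptide strings with lengths in [min_len, max_len]."""
    rng = random.Random(seed)  # local generator keeps results reproducible
    peptides = []
    for _ in range(n):
        length = rng.randint(min_len, max_len)  # both bounds inclusive
        peptides.append("".join(rng.choice(AMINO_ACIDS) for _ in range(length)))
    return peptides


negatives = random_peptides(100)
```

Randomly generated sequences are only presumed negatives, so a few of them may in fact behave like CPPs.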


   3.2. Experimental
  
   Contains sequences shown experimentally to lack cell-penetrating activity.

4. Non-Natural CPPs

Contains CPPs consisting of non-natural amino acids.

   4.1. CPPBase_modified
  
   Contains a list of modified CPPs with experimentally confirmed activity in .fasta format.

  
   4.2. CPPBase_modified_symbols
  
   Contains a list of abbreviations for modified amino acids in .txt format (ABBREVIATION: NAME; ...: ...).
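
The `ABBREVIATION: NAME; ...` layout can be turned into a lookup table with a few lines of Python. This is a sketch that assumes entries are separated by `;` and that the first `:` in each entry separates the abbreviation from the name; the sample string is illustrative and not copied from the file.

```python
def parse_abbreviations(text):
    """Parse 'ABBR: NAME; ABBR: NAME; ...' into a dict {abbr: name}."""
    mapping = {}
    for entry in text.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';' or blank entries
        abbr, sep, name = entry.partition(":")
        if not sep:
            continue  # skip malformed entries that have no ':'
        mapping[abbr.strip()] = name.strip()
    return mapping


# Illustrative input; the real file may use other abbreviations.
symbols = parse_abbreviations("Orn: ornithine; Aib: 2-aminoisobutyric acid;")
```

If a name in the real file itself contains `;`, this simple split will break that entry and a stricter parser will be needed.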

Useful tools

Structure prediction

In the relevant folder you can find a Jupyter notebook with AlphaFold 2.

Just insert the sequence

[Image: insert sequence]

and get a 3D structure!

[Image: 3D structure]