SeqVec/embedding2structure

Authors: Michael Heinzinger

License: MIT

Contributed by: Michael Heinzinger

Cite as: https://doi.org/10.1101/614313

Type: None

Postprocessing: None

Trained on: NetSurfP-2.0 data set

Source files

Prediction of 3-state and 8-state secondary structure and of disorder, based on SeqVec embeddings


Schema

Inputs

Single numpy array

Name: None

    Shape: (1,) 

    Doc: embeddings derived from SeqVec


Targets

List of numpy arrays

Name: d3_Yhat

    Shape: (None, 3) 

    Doc: 3-state secondary structure prediction

Name: d8_Yhat

    Shape: (None, 8) 

    Doc: 8-state secondary structure prediction

Name: diso

    Shape: (None, 2) 

    Doc: disorder prediction
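
As a rough illustration of how the target shapes above might be consumed, the sketch below picks the highest-scoring class for each row of a (None, 3) array. It assumes that the first dimension runs over the residues of one protein and that each row holds per-class scores; neither is stated in the schema. The helper name argmax_per_residue and the example scores are hypothetical, and the mapping from class index to structure label is not given here, so only indices are returned.

```python
def argmax_per_residue(scores):
    """Return, for each row, the index of the highest-scoring class.

    Assumption: `scores` has one row per residue and one column per class,
    matching shapes such as (None, 3), (None, 8) or (None, 2).
    Works on nested lists and on 2-D numpy arrays alike.
    """
    return [max(range(len(row)), key=lambda i: row[i]) for row in scores]


# Hypothetical scores for a 4-residue protein, shape (4, 3).
d3_scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.4, 0.3],
]

# One class index per residue; the index-to-label mapping is not
# defined in this schema and has to come from the model's source files.
d3_classes = argmax_per_residue(d3_scores)
print(d3_classes)
```

The same helper would apply unchanged to d8_Yhat and diso, since only the number of columns differs.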


Dataloader

Defined as: ../embedding

Doc: Data loader that returns protein sequences in the form required by ELMo

Authors: Michael Heinzinger

Type: Dataset

License: MIT


Arguments

fasta_file : FASTA file containing one or more protein sequences

split_char (optional): character used to split the FASTA header (used together with id_field to extract the protein identifier)

id_field (optional): index of the field that holds the protein identifier after the FASTA header has been split on split_char
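
The interplay of split_char and id_field can be sketched as follows. This is a minimal illustration, not the dataloader's actual code: the function names extract_protein_id and read_fasta, the default values "|" and 1, and the example headers are all assumptions chosen for the example.

```python
def extract_protein_id(header, split_char="|", id_field=1):
    """Split a FASTA header on split_char and return field number id_field.

    The defaults are assumptions for this sketch, not documented values.
    """
    fields = header.lstrip(">").strip().split(split_char)
    return fields[id_field]


def read_fasta(lines, split_char="|", id_field=1):
    """Collect sequences from FASTA lines, keyed by extracted identifier."""
    parts_by_id = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record.
            current_id = extract_protein_id(line, split_char, id_field)
            parts_by_id[current_id] = []
        elif current_id is not None:
            # Sequence lines may be wrapped; gather them per record.
            parts_by_id[current_id].append(line)
    return {pid: "".join(parts) for pid, parts in parts_by_id.items()}


# Hypothetical two-record FASTA content.
example = [
    ">sp|P12345|EXAMPLE_A",
    "MALLHS",
    "AGRVL",
    ">tr|Q00001|EXAMPLE_B",
    "MKT",
]
sequences = read_fasta(example)
print(sequences)
```

With a header such as ">id1 some description", passing split_char=" " and id_field=0 would select "id1" instead.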


Model dependencies
conda:
  • python=3.6
  • conda-forge::allennlp=0.7.2
  • pip=9.0.3

pip:
  • scikit-learn==0.22.2.post1

Dataloader dependencies
conda:
  • python=3.6
  • conda-forge::allennlp=0.7.2

pip: