Problem 14: Finding a Motif in a Protein Sequence (UniProt)

Hard

A protein motif is written in a shorthand: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}. Your task is to retrieve protein sequences from UniProt and report every position at which the N-glycosylation motif occurs.
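The shorthand maps almost directly onto a regular expression, so one way to search for it is sketched below. The function names motif_to_regex and find_motif are hypothetical helpers, not part of the problem statement. The sketch assumes that positions are reported 1-based and that overlapping occurrences all count, which is why the pattern is wrapped in a lookahead.

```python
import re
from typing import List


def motif_to_regex(motif: str) -> str:
    """Translate the shorthand into regex syntax.

    [XY] is already a valid character class; {X} becomes the negated class [^X].
    """
    return re.sub(r"\{([A-Z]+)\}", r"[^\1]", motif)


def find_motif(sequence: str, motif: str) -> List[int]:
    """Return the 1-based start positions of every occurrence, overlaps included."""
    # A lookahead matches without consuming characters, so a second
    # occurrence that starts inside the first one is still found.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [match.start() + 1 for match in pattern.finditer(sequence)]


print(motif_to_regex("N{P}[ST]{P}"))       # N[^P][ST][^P]
print(find_motif("NNSSA", "N{P}[ST]{P}"))  # [1, 2]
```

In the toy string NNSSA the motif starts at positions 1 and 2. A plain search without the lookahead would report only the first, because the second occurrence overlaps it.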

Given: Up to 15 UniProt Protein Database access IDs.

Return: For each protein that contains the motif, output the given UniProt ID followed by the 1-based positions in the protein string at which the motif begins. Occurrences may overlap, as positions 115 and 116 in the sample output show.

Sample Dataset:

A2Z669 B5ZC00 P07204_TRBM_HUMAN P20840_SAG1_YEAST

Sample Output:

B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614
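A minimal end-to-end sketch of one possible solution follows. It rests on two assumptions that the problem text does not confirm: that UniProt serves FASTA records at the REST URL in FASTA_URL, and that for IDs such as P07204_TRBM_HUMAN the accession is the part before the first underscore. The helper names parse_fasta, fetch_fasta and report are hypothetical. The fetch function is passed in as a parameter so the logic can be tried without network access.

```python
import re
from urllib.request import urlopen

# Assumption: UniProt's current FASTA endpoint; adjust if the service has moved.
FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"

# N{P}[ST]{P} as a regex, wrapped in a lookahead so overlapping hits are kept.
GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def parse_fasta(text):
    """Drop the '>' header line and join the remaining lines into one sequence."""
    return "".join(
        line.strip() for line in text.splitlines() if not line.startswith(">")
    )


def fetch_fasta(uniprot_id):
    """Download the FASTA record for one ID (needs network access)."""
    # Assumption: the accession is the part before the first underscore.
    accession = uniprot_id.split("_")[0]
    with urlopen(FASTA_URL.format(accession=accession)) as response:
        return response.read().decode("utf-8")


def report(ids, fetch=fetch_fasta):
    """Build the output text: each matching ID, then its 1-based positions."""
    lines = []
    for uniprot_id in ids:
        sequence = parse_fasta(fetch(uniprot_id))
        positions = [m.start() + 1 for m in GLYCOSYLATION.finditer(sequence)]
        if positions:  # proteins without the motif are left out
            lines.append(uniprot_id)
            lines.append(" ".join(map(str, positions)))
    return "\n".join(lines)


# Offline check with made-up records standing in for UniProt responses.
fake_records = {"X1": ">sp|X1\nNNSSA\n", "X2": ">sp|X2\nNPSA\nAAAA\n"}
print(report(["X1", "X2"], fetch=lambda uniprot_id: fake_records[uniprot_id]))
```

With network access, calling report("A2Z669 B5ZC00 P07204_TRBM_HUMAN P20840_SAG1_YEAST".split()) would run the same logic on the sample dataset. Live UniProt records can change over time, so the result is not guaranteed to match the sample output exactly.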
