Add preprocessing script for AWS GenBank

rmuntean requested to merge aws_s3 into main

We will use the GenBank-based dataset that Nextstrain publishes on AWS S3 as the source from which snapshots are created. This PR implements the preprocessing steps needed to turn the AWS S3 data into a snapshot suitable for our tool for tracking changes. The snapshot is a JSON file that contains the following fields for each sequence:

  • Accession ID
  • Collection Date
  • Location
  • Genome Sequence (in the aligned format)
  • Owner Lab
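
For reviewers, here is a minimal sketch of what one snapshot record could look like and how the JSON file might be assembled. It is not the script in this PR. The key names (`accession_id`, `collection_date`, `location`, `aligned_sequence`, `owner_lab`), the helper names, and the input shapes are all assumptions made for illustration; the actual names in the snapshot may differ.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SnapshotRecord:
    # Hypothetical field names mirroring the list above.
    accession_id: str
    collection_date: str
    location: str
    aligned_sequence: str
    owner_lab: str


def build_snapshot(metadata_rows, aligned_sequences):
    """Join metadata rows with aligned sequences by accession ID.

    metadata_rows: iterable of dicts (assumed already parsed from the metadata file).
    aligned_sequences: dict mapping accession ID to aligned genome string.
    Rows without an aligned sequence are skipped (an assumption, not PR behavior).
    """
    snapshot = []
    for row in metadata_rows:
        accession = row["accession_id"]
        sequence = aligned_sequences.get(accession)
        if sequence is None:
            continue
        record = SnapshotRecord(
            accession_id=accession,
            collection_date=row["collection_date"],
            location=row["location"],
            aligned_sequence=sequence,
            owner_lab=row["owner_lab"],
        )
        snapshot.append(asdict(record))
    return snapshot


def snapshot_to_json(snapshot):
    """Serialize the snapshot (a list of record dicts) to a JSON string."""
    return json.dumps(snapshot, indent=2)


if __name__ == "__main__":
    # Toy input, invented purely for illustration.
    rows = [
        {"accession_id": "ACC0001", "collection_date": "2021-01-01",
         "location": "Example Country", "owner_lab": "Example Lab"},
    ]
    sequences = {"ACC0001": "ACGT--ACGT"}
    print(snapshot_to_json(build_snapshot(rows, sequences)))
```

Keeping one flat record per sequence, keyed by accession ID, should make it straightforward to compare two snapshots field by field.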
