Add preprocessing script for AWS GenBank
We will use the GenBank-based dataset that Nextstrain publishes on AWS S3 as the source from which the snapshots are created. This PR implements the preprocessing steps needed to turn the AWS S3 data into a snapshot suitable for our change-tracking tool. The snapshot is a JSON file that contains the following fields for each sequence:
- Accession ID
- Collection Date
- Location
- Genome Sequence (in the aligned format)
- Owner Lab
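
To make the snapshot layout concrete, here is a minimal sketch of how one record per sequence could be reduced to the five fields above and written as JSON. The key names (`accession_id`, `collection_date`, `location`, `aligned_sequence`, `owner_lab`), the helper names, and the list-of-objects layout are assumptions for illustration only; the script in this PR may name and structure them differently.

```python
import json

# Hypothetical key names for the five snapshot fields listed above.
SNAPSHOT_FIELDS = (
    "accession_id",
    "collection_date",
    "location",
    "aligned_sequence",
    "owner_lab",
)


def build_snapshot(records):
    """Return a list of dicts holding only the snapshot fields.

    `records` is any iterable of dicts; extra keys are dropped and a
    record missing a required field raises KeyError.
    """
    snapshot = []
    for record in records:
        missing = [field for field in SNAPSHOT_FIELDS if field not in record]
        if missing:
            raise KeyError(f"record is missing fields: {missing}")
        snapshot.append({field: record[field] for field in SNAPSHOT_FIELDS})
    return snapshot


def write_snapshot(records, path):
    """Serialize the snapshot to a JSON file at `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_snapshot(records), handle, indent=2)


# Placeholder values, not real GenBank data.
example_records = [
    {
        "accession_id": "EXAMPLE0001",
        "collection_date": "2021-01-15",
        "location": "Example Country",
        "aligned_sequence": "ACGT--ACGT",
        "owner_lab": "Example Lab",
        "unused_column": "dropped during preprocessing",
    }
]
example_snapshot = build_snapshot(example_records)
```

Calling `write_snapshot(example_records, "snapshot.json")` would then produce a JSON array with one object per sequence, containing only the five fields.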