# wip_kaggle

The 'most_similar_papers_via_tfidf.py' script is the heart of this work in progress. All other files are only here to show what I have been exploring in the dataset.

Sadly, 'most_similar_papers_via_tfidf.py' fails to parse some of the papers, so I recommend running the code only on the data from biorxiv_medrxiv.
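
For readers unfamiliar with the approach, here is a minimal sketch of how a TF-IDF similarity search can work. It is not the code from 'most_similar_papers_via_tfidf.py'. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`), the smoothed IDF formula, and the toy `papers` list are all assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of letters only (digits are dropped).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Turn a list of strings into a list of {term: weight} dictionaries.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse {term: weight} vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(docs, query_index, top_k=3):
    # Rank every other document by similarity to docs[query_index].
    vectors = tfidf_vectors(docs)
    scores = [
        (i, cosine(vectors[query_index], v))
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy stand-ins for paper texts (hypothetical, not taken from the dataset).
papers = [
    "coronavirus spike protein binding to the ACE2 receptor",
    "ACE2 receptor binding of the coronavirus spike protein in human cells",
    "a survey of deep learning methods for image classification",
]
ranking = most_similar(papers, query_index=0, top_k=2)
for index, score in ranking:
    print(index, round(score, 3))
```

Each entry in `ranking` is a pair of document index and similarity score, sorted from most to least similar. On real paper texts the same idea applies, although a library implementation would likely be faster for a large corpus.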