I am working on a project on error identification by data science students and professionals, examining how the ability to catch errors varies with level of experience, familiarity with the data, and type of analysis. I am also developing the glossary Python package in collaboration with The Carpentries, along with software to facilitate technology education research.
I automated the creation of virtual machines for training purposes using Packer, Ansible, Docker Compose and GitLab CI in a DigitalOcean environment. I also deployed data science infrastructure, such as RStudio Professional Products, on Amazon Web Services using Terraform. In addition, I administered RStudio Connect, RStudio Package Manager and RStudio Server Pro instances for clients in the pharmaceutical and media industries. As part of my responsibilities, I developed R packages and Shiny apps for systems administration.
I designed an instructor-led training course on evidence-based approaches to teaching technology. I also delivered lectures on Data Science, Machine Learning, Ethics and Statistics.
I designed the curriculum for a machine learning bootcamp hosted at a startup accelerator, following evidence-based approaches. I also delivered lectures on Data Visualization, Statistics and Machine Learning.
I collaborated on the development of educational material for the first iteration of the Introduction to Data Science course at UBC. I also developed labs and workshops with automated tests so that students could receive instant feedback.
I developed an open-source Shiny application to visualize expenses of the Government of Puerto Rico (https://github.com/ian-flores/TransparenciaFinanciera). I also worked for the Puerto Rico Violent Death Reporting System, a surveillance system run in collaboration with the CDC and hosted by the Institute of Statistics. As part of my work with the System, I produced reports with R Markdown, reducing report-generation time from 6 hours to 15 minutes. I also implemented a database migration from an on-premises Microsoft SQL Server to a cloud CouchDB instance deployed on Amazon Web Services. My responsibilities included maintaining and completing a React application used for data collection, designed to ensure data quality through its UX. Within the scope of this work, I documented the data infrastructure for technical and non-technical stakeholders.
I applied neural networks to classify animal calls in audio recordings. I also devised data augmentation techniques for images of the calls to lower the validation error of classification algorithms applied to small datasets.
I developed an open-source application to visualize the spatial distribution and temporal patterns of animals in the continental United States. I also researched the use of scientific visualization techniques to make data insights more accessible to non-scientific audiences.
I analyzed the spatial distribution of malaria in lizards in a secondary forest. I also applied regression methods to explain the distribution of the disease.
I implemented mark-recapture methods to sample bats throughout the metropolitan area of Puerto Rico. As part of this study, I also analyzed microscopic samples of bat hair for pollen grains to track the bats' feeding patterns.
Master of Data Science. Courses focused on Inferential Statistics, Machine Learning, Bayesian Statistics, Natural Language Processing, Ethics, Security, Software Development and Databases. For my final project, in collaboration with Dr. Greg Wilson and RStudio, I applied graph embeddings to understand which workflows users follow on GitHub when versioning their code. Awarded the MDS International Scholarship in 2018.
Bachelor of Science in Integrative Biology. Collaborated on research projects with Dr. Elvia Melendez-Ackerman studying bat pollinators and their spatial range in urban areas; with Dr. Miguel Acevedo studying the spatial distribution of malaria and its implications for modeling disease dynamics; and with Dr. Carlos Corrada exploring data augmentation methods for small datasets in order to apply deep learning to bioacoustics.