Saving Coral Reefs
This is the next breadcrumb! It's been a while. Saving Coral Reefs will answer how I got into machine learning.
I worked at Scottrade during my college years as a glorified receptionist, and I can admit I had ample downtime and boredom while on the job. My curiosity led me to undergraduate research: undergrads doing research for a professor who has the time and patience to meet with students that most likely have no idea what they're looking at. I fell under this category nicely. I sent countless emails to professors in related fields and got few replies; those who did reply usually said no, politely, seeing as I was an undergrad. It was not until the winter break heading into my final semester that I got a reply from Dr. Peter Salamon, a math professor who had just gotten back from a sabbatical (his response came something like three months after I sent the email). Given his pedigree of research papers (150+) and the level of mathematics he was teaching, he was someone I thought was most likely not taking on an undergrad. I'd say I got pretty lucky with the timing there.
After an introductory interview to see if I'd be a fit and an overview of what the research would entail, I blindly took on the challenge, not knowing whether the experience would be worth the commitment. Dr. Salamon began by having me get everything in the school library about neural networks. I think I got five books related to the brain, learning patterns, and this subfield within artificial intelligence. I had no idea what this stuff was, how it was used, or even why it mattered. The name that kept coming up in our early discussions was Geoffrey Hinton. Little did I know he is the "godfather" of modern-day smart devices. He rebirthed the concepts behind neural networks and basically revived the ability of computers to learn. His work and theory are behind much of what we see with these smart computers beating humans (Go, and YouTube scraping to recognize cats) and advancements in modern computing (CPUs to GPUs and now TPUs). I ended up watching every video that he and another big name, Andrew Ng, put out in MOOC format through Stanford's online education. That was the definition of drinking through a firehose, I'd say.
After the initial steps, Dr. Salamon had me build my own neural network on the classic Iris data set. The best way to learn was to do, even if I didn't fully grasp the terms, theory, or purpose. However, when I saw I could build a system that recognizes a flower based on only three features, I became a believer. At some point early in the semester, I remember Dr. Salamon saying I was progressing very fast (I completely did not think so, actually thought quite the opposite) and telling me about a project involving the biology department and coral reefs. He saw I could fit in and help them finish a paper on coral reefs, whose health is directly impacted by climate change and water pollution. After diving into what was already there, my task was to build a system that accurately scored the health of a coral reef based only on an image. The end goal was to have this available at dive-tourism locations, providing real-time updates on the health of coral reefs.
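The post doesn't record which tools I used back then, but the kind of Iris classifier described above can be sketched in a few lines with scikit-learn; treat this as an illustrative reconstruction, not my original code.

```python
# Minimal sketch of a neural-network classifier on the Iris data set,
# using only three of the four features, as described in the text.
# Library choice (scikit-learn) is an assumption for illustration.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = X[:, :3]  # keep three features (sepal length/width, petal length)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Scaling the inputs helps the small network converge.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0))
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```

Even with one feature dropped, a tiny network separates the three species well, which is exactly the kind of result that made me a believer.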
Needless to say, I dove right in; an opportunity to contribute to an actual paper and a chance to apply what I was learning. That was something special given the out-of-the-blue nature of my research. I was given hundreds of coral images, research papers on the topic, and more books to understand the best methods for building this system. Building the system even required that I tap into the school's "super" computer, named Palpatine, since my little MacBook didn't have the processing power. I had Palpatine running for days at some points. Training a neural network amounts to searching, hopefully, for the global minimum of the mean squared error between the network's outputs and the ideal outputs; in this case, the ideal output was the reef's NCEAS health score. Finding it takes ample time and computing power. Initially the images were analyzed on 109 features ranging across hue, contrast, and color statistics, but part of my research was to figure out which of those were redundant and could be dropped while still obtaining viable results. After doing covariance analysis, I was able to bring this down to 20 features, which effectively reduced the noise in the data. Still, that was quite a step up from the Iris system. To make the results more robust, we combined multiple neural nets with other machine learning methods, such as random forests and support vector machines, to output health scores. The final system was able to reduce the MSE to below 3, which is viable given the ranges of health under the NCEAS scale.
My final task was to write up my process and findings (technical writing was completely a skill of its own to learn) and finalize a paper to submit to journals. I ended the research, and the semester, by presenting my findings to a panel of teachers and researchers. Funny to say, my harshest interrogator during the presentation was Dr. Salamon. A lot of luck was involved in going from a begging undergrad to a paid researcher contributing to a scientific paper. I still apply this knowledge in my work at Pinn, where we use similar technology to build our own recognition systems. For this, I am ever grateful to those I came across who gave me the opportunity and took a leap of faith in trusting me with their time. Here is the paper we submitted, which was published by PeerJ, and a picture of a coral from the Curaçao region that I used! Hope you enjoyed.