How can data make your life better? This was one of the central questions explored at the third annual Boston Data Festival (September 18-23, 2015). It was inspiring to meet so many people and companies sharing ideas about how to work together on data to solve today’s challenges.
Through two days of events focusing on Big Data and Data Science innovation, I learned about a wide assortment of the relatively new ways that data science is helping businesses, researchers, and city governments solve problems efficiently and effectively. Below I discuss two themes that emerged during several of the sessions.
Data Science is for Making the Complex Comprehensible
Kelly Jin leads the Data Visualization team for the City of Boston. Her presentation, "Creating a Culture of Data Visualization and Analytics," covered Boston's Citywide Analytics team and their work creating tools for citizens and city employees. Some tools provided summary visuals that made better sense of messy datasets, such as office-based dashboards giving the Mayor live insight into city performance, as well as tablet-based dashboards for employees to use in the field. Other projects streamlined how the city handles data; for example, modernizing parts of the city permit system that still require certain forms to be filled out with a typewriter.
Kelly described her team's purpose as "Improving quality of life for Boston citizens." Framing the mission in plain, non-technical terms (rather than, say, "Synthesizing and optimizing municipal data streams to maximize fiscal efficiency") is critical for getting more people to understand and embrace creating data science teams in their organizations.
The importance of not getting caught up in the technical details was echoed by another speaker from a very different field. John Hugg is one of the creators of VoltDB, a database that is over a hundred times faster than traditional databases. In his talk "7 Years of Lessons Learned," he suggested that "Explaining tech to people is not a good way to sell something," even when pitching to engineers or technical teams. Instead, he recommended emphasizing what the technical capabilities would enable them to do. For example, rather than telling people that using a distributed cluster for their records would speed up their systems by two orders of magnitude, one should highlight the outcomes that a hundredfold speedup would allow (such as real-time feedback for health applications, potentially saving lives).
Data Science may involve lots of numbers and abstract theory, but it captures people's imaginations because of the tangible impact it can have.
Bridging Information Silos with Data Science Applications
Interactive data visualization is not just about creating visually pleasing graphics. More and more tools now help people create "data applications" that encourage exploration and "play" with algorithms, leading to useful discoveries. Dave King of Exaptive offered several anecdotes about spontaneous cross-disciplinary inspiration, including one in which a team of researchers studying the effects of drugs on fish motor patterns solved their problem using an algorithm developed for studying hurricane movement. He, like other data scientists, is looking for workflows that make these serendipitous connections happen more often.
Several tools developed in recent years enable collaboration between people who work in different programming languages. There doesn't seem to be a singular language of data science: the conference had staunch supporters of Julia, R, Python, MATLAB, Scala, and Stata, and each language has its particular strengths. Exaptive and the Beaker Polyglot Notebook are two "translation" applications that let people write program components in their favorite language and then connect those components. For example, someone could write a web-scraping service in Python, analyze the data in R, and then pass calculated values to a D3 visualization. The two tools differ in many ways, but they share the same spirit of letting people who use different languages collaborate.
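The pipeline pattern described above can be sketched in miniature. This is not how Exaptive or Beaker actually wire components together (their internals aren't covered here); it's a generic illustration, in Python, of the underlying idea: each stage reads and writes structured data (here, JSON), so any stage could be swapped for a component written in R, JavaScript, or another language. The component functions and the sample permit data are hypothetical.

```python
import json

def scrape_component():
    """Stand-in for a Python web-scraping step (no network call here,
    just hypothetical sample records)."""
    return [{"city": "Boston", "permits": 120},
            {"city": "Cambridge", "permits": 85}]

def analyze_component(records):
    """Stand-in for an analysis step -- because its input is plain JSON,
    this stage could just as easily be an R script."""
    total = sum(r["permits"] for r in records)
    return {"total_permits": total, "cities": len(records)}

# The "translation" layer such tools automate is, at heart, structured
# data serialized across a language boundary.
raw = scrape_component()
payload = json.dumps(raw)            # hand-off from component 1
summary = analyze_component(json.loads(payload))  # component 2 picks it up
print(summary)  # {'total_permits': 205, 'cities': 2}
```

A final stage could serialize `summary` the same way and hand it to a D3 page for rendering; the point is that no component needs to know what language its neighbors are written in.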
In her city data analytics presentation, Jin described an application that showed the benefit of getting disparate departments to share datasets. Her team developed a mapping application for Fire Department dispatchers. It pulls building hazard data from several Boston government agencies and places icons marking hazards relevant to firefighters (asbestos, building code violations, etc.) onto the same map used to direct them. This data had always existed within Boston's government, but it had never been integrated into the firefighters' main dispatch map. The dashboard will help keep the city's firefighters safe and prepared as they embark on their missions.
When Data Science applications translate and unite previously divided data sources, organizations can perform more creatively and effectively.
This is just a small sample of the ideas and technologies shared at the conference. I left the event feeling very optimistic about the future of data science in government, business, and academia.
If you're looking to get started with open data sources in Connecticut, check out CtData.org!
Cameron Yick is a Computer Science/Electrical Engineering major at Yale College (‘17). He joined the Environmental Performance Index (EPI) this year to work on data analytics and build interactive visualizations for the 2016 EPI launch.