This Month In Data Science—April, 2015

The impact of data science and big data technologies was felt far and wide in April, with applications ranging from determining the latest fashions to sensor-connected lighting and even entirely new forms of food. Here’s our roundup of the biggest data science news of the month, both from within Pivotal and beyond.

The Latest Fashion, Trending on Google
Google is scanning its wealth of user-generated data for signs of fashion trends as it works to enhance its online advertising sales. The company plans to issue fashion trend reports twice a year, based on user searches and produced by its emerging fashion and luxury team, headed by Lisa Green.

The Sensor-Rich, Data-Scooping Future
This article details how new sensor-equipped home and office devices and supplies are changing the way manufacturers such as General Electric and Huawei plan on doing business. Because long-lasting, sensor-connected products such as LED lights are no longer disposable, manufacturers will need to focus on lower costs and greater energy efficiency, both problems with direct data science applications.

Crowdsource Data Science to Add Superficial Intelligence to AI
TechRepublic warns that data science and big data-driven products and services may be developed with too much attention paid to AI, and not enough to what it calls “superficial intelligence.” By integrating superficial intelligence—data about users’ whims, activities, and other behavior—companies can develop enhanced applications and services that better serve their consumers.

Could Data Science be PR’s Holy Grail?
Measuring success is a long-running challenge within the public relations business, making it difficult for practitioners to point to unqualified successes, which can lead to client dissatisfaction. While agencies attempt to track “crude metrics,” data science approaches that analyze sentiment and response in the media and on social media sites may provide better, more useful metrics on the success of a public relations campaign.

Cybersecurity, Data Science and Machine Learning: Is All Data Equal?
When applying machine learning to cybersecurity data, it’s important to recognize that not all data sources are equally relevant to the models being developed. In this opinion piece for Computerworld, David Lopes Pegna argues that the value of the data is directly correlated to its type. “Positive data, i.e. malicious network traffic data from malware and cyberattacks,” he writes, “have much more value than some other data science problems.”

Less Noise but More Money in Data Science
While the New York Times states that the hype surrounding data science is slowing down, a new survey by recruitment company Burtch Works finds that the demand for data scientists continues to grow. The study charts the median base salary for data scientists at $91,000 ($110,000 in Silicon Valley), with demand for practitioners increasing in consumer-facing economic segments, such as retailing, consumer products, insurance, healthcare and manufacturing.

From Big Data to Big Bets on Food Science
Dan Zigmond, vice president of data at Hampton Creek, is applying data science not only to the production and distribution of food, but also to the creation of entirely new types of food. One example is an ersatz chicken egg made from Canadian yellow pea and sorghum, which could be produced in a healthier, more efficient, and cost-effective way for low-income populations. The company’s biologists have cataloged more than 4,000 plant proteins, searching for ways to make food that is healthier, better tasting, and friendlier to the environment.

This Month in Pivotal Data Science

Data As The New Oil: Producing Value for the Oil & Gas Industry
During a webinar for Data Science Central, Pivotal’s Senior Data Scientist, Rashmi Raghu, discussed how apt the popular metaphor “data is the new oil” is, particularly within the oil and gas industry. Significant technological advances in oil and gas production methods are producing an ever-increasing amount of data from sensors, which companies can leverage to improve logistics, business operations, and more. Through a wide breadth of data collection sources and big data technologies and techniques, oil companies can improve efficiency, realize new business opportunities, and enhance decision making.

Pivotal Extends HAWQ, The SQL On Hadoop Engine, To Hortonworks HDP
Pivotal continues to make quick progress on our mission to make our industry-leading Big Data Suite products more accessible and usable across the Apache Hadoop® ecosystem. Building on February’s announcements around the Open Data Platform initiative, where Pivotal and Hortonworks, along with other big data visionaries, partnered to create a common commercial core of Apache Hadoop® projects, Pivotal and Hortonworks announced today that Pivotal HAWQ, the SQL on Hadoop query engine that speeds up big data queries by up to 1000x, is now available on Hortonworks Data Platform. The new release also supports Apache Ambari.

Pivotal People–Sarah Aerni on How To Become a Data Scientist
In this post, one of Pivotal’s Principal Data Scientists, Sarah Aerni, answers 7 questions about data science as a profession and covers the type of data scientists on the Pivotal team. She answers questions about career paths, popularity, skills, experience, typical projects, tips for students, and why Pivotal is a great place to be a data scientist.

Pivotal Data Science Team Iterates Faster, Beats Existing Malware Detection Tools
This post provides a case study in innovative IT security practices for a small, 6-week project. The challenges, approach, and outcomes of this data science-driven effort show how an innovative use of algorithms can do a better job detecting malware than existing, best-in-class malware detection tools and operate much faster than existing customer models within Apache Hive™. Using an accelerated, agile, and iterative development approach with methods borrowed from graph theory, natural language processing (NLP) and anomaly detection, implemented in a highly parallelized way on Pivotal Greenplum Database, the team from Pivotal Data Labs quickly achieved significant results for this global insurance company.

Build Newsletter: Open Source, Cloud Platforms, Mobile/IoT, DevOps/Agile—April 2015
In this month’s issue of Pivotal’s Build Newsletter for developers and architects, we first update you on big releases in the open source space—including open sourcing over a million lines of Pivotal GemFire® code in Project Geode. More news announcements around the Open Data Platform, Cloud Foundry and Spring are also included. We also look at evidence that mobile and IoT are connecting in many ways and running on the same back end platforms while the front-ends are experiencing increasingly faster innovation cycles. Lastly, we have a roundup of several excellent perspectives on Agile and DevOps, from CIO to developer.

All Things Pivotal Podcast Episode #24–Big Data vs Climate Change
Scientists around the world are performing experiments and doing analysis with a focus on investigating the nature of Climate Change. The Big Data vs. Climate Change program is a joint effort by EMC Corporation, Pivotal and the Earthwatch Institute. It enables the study of interactions between nature and climate, and promotes the engagement of citizen scientists using data lakes, analytic tools and visualizations. In this episode, Simon is joined by Vatsan Ramanujan, a Principal Data Scientist at Pivotal. Vatsan shares some insight into the work that was done, and some interesting stories from “out in the field”.

Which Way Might the Apache Way Take Geode?
Pivotal’s Director of Open Source explains why Pivotal is open sourcing the core of Pivotal GemFire® under the name Project Geode, and why Pivotal chose the Apache Software Foundation to provide open governance for the core of its market-leading in-memory database software.

Become A Founding Contributor of “Project” Geode
Pivotal announced the creation of “Geode”, the new in-memory distributed database that will form the open source core of Pivotal GemFire. As part of this announcement, Pivotal has submitted a proposal to The Apache Software Foundation (ASF) to establish and incubate Project Geode through collaborative development. The Project Geode community will remain in a fledgling state until the ASF agrees to incubate the project. This post covers more on what Project Geode is, how to get involved, and how customers have been using Pivotal GemFire to date.

Editor’s Note: The Apache Software Foundation voted to approve Project Geode to become an incubator project on April 27, 2015. Announcements on the progress of this project and where to download the source are expected shortly.

Pivotal Data Roadshow: A View From the Road
The response to the Pivotal Data Roadshow, held in a number of cities across North America this month, has been quite impressive. It clearly demonstrates the customer and market demand for hands-on experience and training with big data technologies. Our initial events have been over capacity, with packed houses numbering nearly 100 attendees in each city. The response reflects the massive progress within the open source Apache Hadoop ecosystem in recent years, and the degree to which Hadoop has been embraced within the enterprise to handle big data storage and analytics workloads.

Pivotal Data Science Events in May

Pivotal Big Data Roadshow: Dallas/Fort Worth
Apr 30, 2015, Dallas/Fort Worth
Join data technology experts from Pivotal to get the latest perspective on how big data analytics and applications are transforming organizations across industries.

SIAM Data Mining
May 2, 2015, Vancouver Conference Center
An international conference on data mining.

Chicago Coder Conference
May 14–15, 2015, Chicago Conference Center
A one-and-a-half-day conference with tracks on Java, .NET, Big Data and Mobile.

Pivotal Big Data Roadshow: Rio de Janeiro
May 19, 2015, Rio de Janeiro
Join data technology experts from Pivotal to get the latest perspective on how big data analytics and applications are transforming organizations across industries.

OpenGov Singapore
May 21, 2015, Singapore Conference Center
What does a smart city look like? Be a part of Singapore’s ‘Intelligent Nation’ vision of a safer, more connected community for all Singaporeans. The investment in technology and the focus on innovation are relentless within the ranks of the Singapore Government.

Pivotal Big Data Roadshow: São Paulo
May 21, 2015
Join data technology experts from Pivotal to get the latest perspective on how big data analytics and applications are transforming organizations across industries.

 

©2015 Pivotal Software, Inc. All rights reserved. Pivotal, GemFire and HAWQ are trademarks and/or registered trademarks of Pivotal Software, Inc. in the United States and/or other countries. Apache, Apache Hadoop, Hadoop, Apache Hive and Apache Ambari are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. Hortonworks and HDP are either registered trademarks or trademarks of Hortonworks Inc. in the United States and/or other countries.