Scalable Data Pipelines for Mastering & Integration - an ML Approach

Originally aired:

About the Session

Integrating multiple, diverse datasets is an essential part of a data scientist's work and of the analytics journey as a whole, because feature engineering on dirty data can only produce faulty features. However, current tools do little to simplify the process. There is a wide variety of data attributes and formats to handle, and preparing data for analytics by matching and deduplicating records remains a challenge. Unifying matching records into a single, definitive representation of an entity is both time-consuming and error-prone. As a result, preparing data for predictive analytics requires substantial manual effort and occupies up to 60-70% of a data scientist's time.
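To make "matching and deduplicating records" concrete, here is a minimal rule-based sketch. It is not the speakers' method. The talk argues for learning the match decision with ML, so the hand-tuned similarity threshold below stands in for a trained classifier. The function names (`similarity`, `deduplicate`, `golden_record`), the first-letter blocking key, and the 0.8 threshold are all hypothetical choices for illustration. The sketch compares only records that share a block, merges matching pairs into clusters with union-find, and consolidates each cluster into one representative record by majority vote per field.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: dict, b: dict, fields: list[str]) -> float:
    """Average case-insensitive string similarity over the given fields (0.0 to 1.0)."""
    scores = [SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in fields]
    return sum(scores) / len(scores)


def find(parent: list[int], i: int) -> int:
    """Union-find lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def deduplicate(records: list[dict], fields: list[str], block_key: str,
                threshold: float = 0.8) -> list[list[int]]:
    """Group record indices that appear to describe the same entity (hypothetical helper)."""
    # Blocking (assumption: first letter of block_key) avoids scoring all n*(n-1)/2 pairs.
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[rec[block_key][:1].lower()].append(i)

    parent = list(range(len(records)))
    for members in blocks.values():
        for i, j in combinations(members, 2):
            # A fixed threshold stands in for a learned match/non-match classifier.
            if similarity(records[i], records[j], fields) >= threshold:
                parent[find(parent, i)] = find(parent, j)  # merge the two clusters

    clusters = defaultdict(list)
    for i in range(len(records)):
        clusters[find(parent, i)].append(i)
    return sorted(clusters.values())


def golden_record(records: list[dict], cluster: list[int]) -> dict:
    """Consolidate a cluster by taking the most common value of each field."""
    fields = records[cluster[0]].keys()
    return {f: Counter(records[i][f] for i in cluster).most_common(1)[0][0] for f in fields}


records = [
    {"name": "Acme Corp", "city": "Boston"},
    {"name": "ACME Corp.", "city": "Boston"},
    {"name": "Beta LLC", "city": "Denver"},
]
clusters = deduplicate(records, ["name", "city"], block_key="name")
print(clusters)                               # [[0, 1], [2]]
print(golden_record(records, clusters[0]))    # {'name': 'Acme Corp', 'city': 'Boston'}
```

Majority vote breaks ties by taking the first value encountered, which is a simplistic survivorship rule. Real mastering pipelines typically weigh source trust, recency, and completeness instead.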

In this talk, we discuss how data engineers and scientists can augment their data preparation by leveraging machine learning. We cover schema mapping: identifying attributes across disparate data sources that hold the same kind of information. We discuss data mastering and how it differs from a typical clustering or classification problem. We also elaborate on scaling these approaches, and on how machine learning can help.
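As an illustration of schema mapping, the sketch below proposes, for each column in one source, the column in another source whose values overlap most (Jaccard similarity of normalized value sets). This is a simple instance-based heuristic, not the approach presented in the talk. An ML-based mapper would typically combine such value-overlap scores with other signals, such as column names and data types. The names `normalize`, `jaccard`, `map_columns`, and the `min_overlap` cutoff are hypothetical.

```python
def normalize(values: list) -> set[str]:
    """Lower-case and strip values so trivial formatting differences don't block a match."""
    return {str(v).strip().lower() for v in values if v is not None and str(v).strip()}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_columns(source_a: dict[str, list], source_b: dict[str, list],
                min_overlap: float = 0.3) -> dict[str, str]:
    """Propose a source_b column for each source_a column, based on value overlap."""
    norm_b = {col: normalize(vals) for col, vals in source_b.items()}
    mapping = {}
    for col_a, vals_a in source_a.items():
        set_a = normalize(vals_a)
        best_col, best_score = None, 0.0
        for col_b, set_b in norm_b.items():
            score = jaccard(set_a, set_b)
            if score > best_score:
                best_col, best_score = col_b, score
        # Leave a column unmapped rather than guess when overlap is weak.
        if best_col is not None and best_score >= min_overlap:
            mapping[col_a] = best_col
    return mapping


crm = {"cust_name": ["Acme Corp", "Beta LLC", "Gamma Inc"],
       "town": ["Boston", "Denver", "Austin"]}
erp = {"customer": ["acme corp", "beta llc", "Delta Co"],
       "city": ["boston", "austin", "Seattle"],
       "zip": ["02110", "80202", "73301"]}
print(map_columns(crm, erp))  # {'cust_name': 'customer', 'town': 'city'}
```

This sketch does not enforce a one-to-one mapping, so two source columns could be mapped to the same target. Comparing every column pair is also quadratic in the number of columns, which is one of the scaling concerns a production approach would need to address.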

Come see how ML can be leveraged to prepare data for analytics.


Hear What Attendees Say


“Once again Saltmarch has knocked it out of the park with interesting speakers, engaging content and challenging ideas. No jetlag fog at all, which counts for how interesting the whole thing was.”

Cybersecurity Lead, PwC


“Very much looking forward to next year. I will be keeping my eye out for the date so I can make sure I lock it in my calendar.”

Software Engineering Specialist, Intuit


“Best conference I have ever been to with lots of insights and information on next generation technologies and those that are the need of the hour.”

Software Architect, GroupOn

Hear What Speakers & Sponsors Say

Scott Davis

“Happy to meet everyone who came from near and far. Glad to know you've discovered some great lessons here, and glad you joined us for all the discoveries great and small.”

Web Architect & Principal Engineer, Scott Davis

Dr. Venkat Subramaniam

“Wonderful set of conferences, well organized, fantastic speakers, and an amazingly interactive set of audience. Thanks for having me at the events!”

Founder of Agile Developer Inc., Dr. Venkat Subramaniam

Oracle Corp.

“What a buzz! The events have been instrumental in bringing the whole software community together. There has been something for everyone from developers to architects to business to vendors. Thanks everyone!”

Voltaire Yap, Global Events Manager, Oracle Corp.