Podcast

O'Reilly Data Show - O'Reilly Media Podcast

Big data and data science interviews, insight, and analysis.

Episodes

  • What data scientists and data engineers can do with current generation serverless technologies

    Apr 11 2019

    The O’Reilly Data Show Podcast: Avner Braverman on what’s missing from serverless today and what users should expect in the near future.In this episode of the Data Show, I spoke with Avner Braverman, co-founder and CEO of Binaris, a startup that aims to bring serverless to web-scale and enterprise applications. This conversation took place shortly after the release of a seminal paper from UC Berkeley (“Cloud Programming Simplified: A Berkeley View on Serverless Computing”), and this paper seeded...more

  • It’s time for data scientists to collaborate with researchers in other disciplines

    Mar 28 2019

    The O’Reilly Data Show Podcast: Forough Poursabzi Sangdeh on the interdisciplinary nature of interpretable and interactive machine learning.In this episode of the Data Show, I spoke with Forough Poursabzi-Sangdeh, a postdoctoral researcher at Microsoft Research New York City. Poursabzi works in the interdisciplinary area of interpretable and interactive machine learning. As models and algorithms become more widespread, many important considerations are becoming active research areas: fairness an...more

  • Algorithms are shaping our lives—here’s how we wrest back control

    Mar 14 2019

    The O’Reilly Data Show Podcast: Kartik Hosanagar on the growing power and sophistication of algorithms.In this episode of the Data Show, I spoke with Kartik Hosanagar, professor of technology and digital business, and professor of marketing at The Wharton School of the University of Pennsylvania.  Hosanagar is also the author of a newly released book, A Human’s Guide to Machine Intelligence, an interesting tour through the recent evolution of AI applications that draws from his extensive ex...more

  • Why your attention is like a piece of contested territory

    Feb 28 2019

    The O’Reilly Data Show Podcast: P.W. Singer on how social media has changed, war, politics, and business.In this episode of the Data Show, I spoke with P.W. Singer, strategist and senior fellow at the New America Foundation, and a contributing editor at Popular Science. He is co-author of an excellent new book, LikeWar: The Weaponization of Social Media, which explores how social media has changed war, politics, and business. The book is essential reading for anyone interested in how social medi...more

  • The technical, societal, and cultural challenges that come with the rise of fake media

    Feb 14 2019

    The O’Reilly Data Show Podcast: Siwei Lyu on machine learning for digital media forensics and image synthesis.In this episode of the Data Show, I spoke with Siwei Lyu, associate professor of computer science at the University at Albany, State University of New York. Lyu is a leading expert in digital media forensics, a field of research into tools and techniques for analyzing the authenticity of media files. Over the past year, there have been many stories written about the rise of tools for cre...more

  • Using machine learning and analytics to attract and retain employees

    Jan 31 2019

    The O’Reilly Data Show Podcast: Maryam Jahanshahi on building tools to help improve efficiency and fairness in how companies recruit.In this episode of the Data Show, I spoke with Maryam Jahanshahi, research scientist at TapRecruit, a startup that uses machine learning and analytics to help companies recruit more effectively. In an upcoming survey, we found that a “skills gap” or “lack of skilled people” was one of the main bottlenecks holding back adoption of AI technologies. Many companies are...more

  • How machine learning impacts information security

    Jan 17 2019

    The O’Reilly Data Show Podcast: Andrew Burt on the need to modernize data protection tools and strategies.In this episode of the Data Show, I spoke with Andrew Burt, chief privacy officer and legal engineer at Immuta, a company building data management tools tuned for data science. Burt and cybersecurity pioneer Daniel Geer recently released a must-read white paper (“Flat Light”) that provides a great framework for how to think about information security in the age of big data and AI. They list ...more

  • In the age of AI, fundamental value resides in data

    Jan 03 2019

    The O’Reilly Data Show Podcast: Haoyuan Li on accelerating analytic workloads, and innovation in data and AI in China.In this episode of the Data Show, I spoke with Haoyuan Li, CEO and founder of Alluxio, a startup commercializing the open source project with the same name (full disclosure: I’m an advisor to Alluxio). Our discussion focuses on the state of Alluxio (the open source project that has roots in UC Berkeley’s AMPLab), specifically emerging use cases here and in China. Given the large-...more

  • Trends in data, machine learning, and AI

    Dec 20 2018

    The O’Reilly Data Show Podcast: Ben Lorica looks ahead at what we can expect in 2019 in the big data landscape.For the end-of-year holiday episode of the Data Show, I turned the tables on Data Show host Ben Lorica to talk about trends in big data, machine learning, and AI, and what to look for in 2019. Lorica also showcased some highlights from our upcoming Strata Data and Artificial Intelligence conferences.Here are some highlights from our conversation: Real-world use cases for new technolo...more

  • Tools for generating deep neural networks with efficient network architectures

    Dec 06 2018

    The O’Reilly Data Show Podcast: Alex Wong on building human-in-the-loop automation solutions for enterprise machine learning.In this episode of the Data Show, I spoke with Alex Wong, associate professor at the University of Waterloo, and co-founder of DarwinAI, a startup that uses AI to address foundational challenges with deep learning in the enterprise. As the use of machine learning and analytics become more widespread, we’re beginning to see tools that enable data scientists and data enginee...more

  • Building tools for enterprise data science

    Nov 21 2018

    The O’Reilly Data Show Podcast: Vitaly Gordon on the rise of automation tools in data science.In this episode of the Data Show, I spoke with Vitaly Gordon, VP of data science and engineering at Salesforce. As the use of machine learning becomes more widespread, we need tools that will allow data scientists to scale so they can tackle many more problems and help many more people. We need automation tools for the many stages involved in data science, including data preparation, feature engineering...more

  • Lessons learned while helping enterprises adopt machine learning

    Nov 08 2018

    The O’Reilly Data Show Podcast: Francesca Lazzeri and Jaya Mathew on digital transformation, culture and organization, and the team data science process.In this episode of the Data Show, I spoke with Francesca Lazzeri, an AI and machine learning scientist at Microsoft, and her colleague Jaya Mathew, a senior data scientist at Microsoft. We conducted a couple of surveys this year—“How Companies Are Putting AI to Work Through Deep Learning” and “The State of Machine Learning Adoption in the Enterp...more

  • Machine learning on encrypted data

    Oct 25 2018

    The O’Reilly Data Show Podcast: Alon Kaufman on the interplay between machine learning, encryption, and security.In this episode of the Data Show, I spoke with Alon Kaufman, CEO and co-founder of Duality Technologies, a startup building tools that will allow companies to apply analytics and machine learning to encrypted data. In a recent talk, I described the importance of data, various methods for estimating the value of data, and emerging tools for incentivizing data sharing across organizatio...more

  • How social science research can inform the design of AI systems

    Oct 11 2018

    The O’Reilly Data Show Podcast: Jacob Ward on the interplay between psychology, decision-making, and AI systems.In this episode of the Data Show, I spoke with Jacob Ward, a Berggruen Fellow at Stanford University. Ward has an extensive background in journalism, mainly covering topics in science and technology, at National Geographic, Al Jazeera, Discovery Channel, BBC, Popular Science, and many other outlets. Most recently, he’s become interested in the interplay between research in psychology, ...more

  • Why it’s hard to design fair machine learning models

    Sep 27 2018

    The O’Reilly Data Show Podcast: Sharad Goel and Sam Corbett-Davies on the limitations of popular mathematical formalizations of fairness.In this episode of the Data Show, I spoke with Sharad Goel, assistant professor at Stanford, and his student Sam Corbett-Davies. They recently wrote a survey paper, “A Critical Review of Fair Machine Learning,” where they carefully examined the standard statistical tools used to check for fairness in machine learning models. It turns out that each of the standa...more

  • Using machine learning to improve dialog flow in conversational applications

    Sep 13 2018

    The O’Reilly Data Show Podcast: Alan Nichol on building a suite of open source tools for chatbot developers.In this episode of the Data Show, I spoke with Alan Nichol, co-founder and CTO of Rasa, a startup that builds open source tools to help developers and product teams build conversational applications. About 18 months ago, there was tremendous excitement and hype surrounding chatbots, and while things have quieted lately, companies and developers continue to refine and define tools for build...more

  • Building accessible tools for large-scale computation and machine learning

    Aug 30 2018

    The O’Reilly Data Show Podcast: Eric Jonas on Pywren, scientific computation, and machine learning.In this episode of the Data Show, I spoke with Eric Jonas, a postdoc in the new Berkeley Center for Computational Imaging. Jonas is also affiliated with UC Berkeley’s RISE Lab. It was at a RISE Lab event that he first announced Pywren, a framework that lets data enthusiasts proficient with Python run existing code at massive scale on Amazon Web Services. Jonas and his collaborators are working on a...more

  • Simplifying machine learning lifecycle management

    Aug 16 2018

    The O’Reilly Data Show Podcast: Harish Doddi on accelerating the path from prototype to production.In this episode of the Data Show, I spoke with Harish Doddi, co-founder and CEO of Datatron, a startup focused on helping companies deploy and manage machine learning models. As companies move from machine learning prototypes to products and services, tools and best practices for productionizing and managing models are just starting to emerge. Today’s data science and data engineering teams work wi...more

  • How privacy-preserving techniques can lead to more robust machine learning models

    Aug 02 2018

    The O’Reilly Data Show Podcast: Chang Liu on operations research, and the interplay between differential privacy and machine learning.In this episode of the Data Show, I spoke with Chang Liu, applied research scientist at Georgian Partners. In a previous post, I highlighted early tools for privacy-preserving analytics, both for improving decision-making (business intelligence and analytics) and for enabling automation (machine learning). One of the tools I mentioned is an open source project fo...more

  • Specialized hardware for deep learning will unleash innovation

    Jul 19 2018

    The O’Reilly Data Show Podcast: Andrew Feldman on why deep learning is ushering a golden age for compute architecture.In this episode of the Data Show, I spoke with Andrew Feldman, founder and CEO of Cerebras Systems, a startup in the blossoming area of specialized hardware for machine learning. Since the release of AlexNet in 2012, we have seen an explosion in activity in machine learning, particularly in deep learning. A lot of the work to date happened primarily on general purpose hardware (C...more

  • Data regulations and privacy discussions are still in the early stages

    Jul 05 2018

    The O’Reilly Data Show Podcast: Aurélie Pols on GDPR, ethics, and ePrivacy.In this episode of the Data Show, I spoke with Aurélie Pols of Mind Your Privacy, one of my go-to resources when it comes to data privacy and data ethics. This interview took place at Strata Data London, a couple of days before the EU General Data Protection Regulation (GDPR) took effect. I wanted her perspective on this landmark regulation, as well as her take on trends in data privacy and growing interest in ethics amon...more

  • Managing risk in machine learning models

    Jun 21 2018

    The O’Reilly Data Show Podcast: Andrew Burt and Steven Touw on how companies can manage models they cannot fully explain.In this episode of the Data Show, I spoke with Andrew Burt, chief privacy officer at Immuta, and Steven Touw, co-founder and CTO of Immuta. Burt recently co-authored a white paper on managing risk in machine learning models, and I wanted to sit down with them to discuss some of the proposals they put forward to organizations that are deploying machine learning. Some high-pr...more

  • The real value of data requires a holistic view of the end-to-end data pipeline

    Jun 07 2018

    The O’Reilly Data Show Podcast: Ashok Srivastava on the emergence of machine learning and AI for enterprise applications.In this episode of the Data Show, I spoke with Ashok Srivastava, senior vice president and chief data officer at Intuit. He has a strong science and engineering background, combined with years of applying machine learning and data science in industry. Prior to joining Intuit, he led the teams responsible for data and artificial intelligence products at Verizon. I wanted his pe...more

  • The evolution of data science, data engineering, and AI

    May 24 2018

    The O’Reilly Data Show Podcast: A special episode to mark the 100th episode.This episode of the Data Show marks our 100th episode. This podcast stemmed out of video interviews conducted at O’Reilly’s 2014 Foo Camp. We had a collection of friends who were key members of the data science and big data communities on hand and we decided to record short conversations with them. We originally conceived of using those initial conversations to be the basis of a regular series of video interviews. The lo...more

  • Companies in China are moving quickly to embrace AI technologies

    May 10 2018

    The O’Reilly Data Show Podcast: Jason Dai on the first year of BigDL and AI in China.In this episode of the Data Show, I spoke with Jason Dai, CTO of Big Data Technologies at Intel, and one of my co-chairs for the AI Conference in Beijing. I wanted to check in on the status of BigDL, specifically how companies have been using this deep learning library on top of Apache Spark, and discuss some newly added features. It turns out there are quite a number of companies already using BigDL in producti...more

  • Teaching and implementing data science and AI in the enterprise

    Apr 26 2018

    The O’Reilly Data Show Podcast: Jerry Overton on organizing data teams, agile experimentation, and the importance of ethics in data science.In this episode of the Data Show, I spoke with Jerry Overton, senior principal and distinguished technologist at DXC Technology. I wanted the perspective of someone who works across industries and with a variety of companies. I specifically wanted to explore the current state of data science and AI within companies and public sector agencies. As much as we t...more

  • The importance of transparency and user control in machine learning

    Apr 12 2018

    The O’Reilly Data Show Podcast: Guillaume Chaslot on bias and extremism in content recommendations.In this episode of the Data Show, I spoke with Guillaume Chaslot, an ex-YouTube engineer and founder of AlgoTransparency, an organization dedicated to helping the public understand the profound impact algorithms have on our lives. We live in an age when many of our interactions with companies and services are governed by algorithms. At a time when their impact continues to grow, there are many sett...more

  • What machine learning engineers need to know

    Mar 29 2018

    The O’Reilly Data Show Podcast: Jesse Anderson and Paco Nathan on organizing data teams and next-generation messaging with Apache Pulsar.In this episode of the Data Show, I spoke with Jesse Anderson, managing director of the Big Data Institute, and my colleague Paco Nathan, who recently became co-chair of Jupytercon. This conversation grew out of a recent email thread the three of us had on machine learning engineers, a new job role that LinkedIn recently pegged as the fastest growing job i...more

  • How to train and deploy deep learning at scale

    Mar 15 2018

    The O’Reilly Data Show Podcast: Ameet Talwalkar on large-scale machine learning.In this episode of the Data Show, I spoke with Ameet Talwalkar, assistant professor of machine learning at CMU and co-founder of Determined AI. He was an early and key contributor to Spark MLlib and a member of AMPLab. Most recently, he helped conceive and organize the first edition of SysML, a new academic conference at the intersection of systems and machine learning (ML). We discussed using and deploying deep l...more

  • Using machine learning to monitor and optimize chatbots

    Mar 06 2018

    The O’Reilly Data Show Podcast: Ofer Ronen on the current state of chatbots.In this episode of the Data Show, I spoke with Ofer Ronen, GM of Chatbase, a startup housed within Google’s Area 120. With tools for building chatbots becoming accessible, conversational interfaces are becoming more prevalent. As Ronen highlights in our conversation, chatbots are already enabling companies to automate many routine tasks (mainly in customer interaction). We are still in the early days of chatbots, but if ...more

  • Unleashing the potential of reinforcement learning

    Mar 01 2018

    The O’Reilly Data Show Podcast: Danny Lange on how reinforcement learning can accelerate software development and how it can be democratized.In this episode of the Data Show, I spoke with Danny Lange, VP of AI and machine learning at Unity Technologies. Lange previously led data and machine learning teams at Microsoft, Amazon, and Uber, where his teams were responsible for building data science tools used by other developers and analysts within those companies. When I first heard that he was mov...more

  • Graphs as the front end for machine learning

    Feb 15 2018

    The O’Reilly Data Show Podcast: Leo Meyerovich on building large-scale, interactive applications that enable visual investigations.In this episode of the Data Show, I spoke with Leo Meyerovich, co-founder and CEO of Graphistry. Graphs have always been part of the big data revolution (think of the large graphs generated by the early social media startups). In recent months, I’ve come across companies releasing and using new tools for creating, storing, and (most importantly) analyzing large graph...more

  • Machine learning needs machine teaching

    Feb 01 2018

    The O’Reilly Data Show Podcast: Mark Hammond on applications of reinforcement learning to manufacturing and industrial automation.In this episode of the Data Show, I spoke with Mark Hammond, founder and CEO of Bonsai, a startup at the forefront of developing AI systems in industrial settings. While many articles have been written about developments in computer vision, speech recognition, and autonomous vehicles, I’m particularly excited about near-term applications of AI to manufacturing, roboti...more

  • How machine learning can be used to write more secure computer programs

    Jan 18 2018

    The O’Reilly Data Show Podcast: Fabian Yamaguchi on the potential of using large-scale analytics on graph representations of code.In this episode of the Data Show, I spoke with Fabian Yamaguchi, chief scientist at ShiftLeft. His 2015 Ph.D. dissertation sketched out how the combination of static analysis, graph mining, and machine learning, can be used to develop tools to augment security analysts. In a recent post, I argued for machine learning tools to augment teams responsible for deploying an...more

  • Bringing AI into the enterprise

    Jan 04 2018

    The O’Reilly Data Show Podcast: Kris Hammond on business applications of AI technologies and educating future AI specialists.In this episode of the Data Show, I spoke with Kristian Hammond, chief scientist of Narrative Science and professor of EECS at Northwestern University. He has been at the forefront of helping companies understand the power, limitations, and disruptive potential of AI technologies and tools. In a previous post on machine learning, I listed types of uses cases (a taxonomy) f...more

  • How machine learning will accelerate data management systems

    Dec 21 2017

    The O’Reilly Data Show Podcast: Tim Kraska on why ML will change how we build core algorithms and data structures.In this episode of the Data Show, I spoke with Tim Kraska, associate professor of computer science at MIT. To take advantage of big data, we need scalable, fast, and efficient data management systems. Database administrators and users often find themselves tasked with building index structures (“indexes” in database parlance), which are needed to speed up data access.Some common exam...more

  • Machine learning at Spotify: You are what you stream

    Dec 07 2017

    The O’Reilly Data Show Podcast: Christine Hung on using data to drive digital transformation and recommenders that increase user engagement.In this episode of the Data Show, I spoke with Christine Hung, head of data solutions at Spotify. Prior to joining Spotify, she led data teams at the NY Times and at Apple (iTunes). Having led teams at three different companies, I wanted to hear her thoughts on digital transformation, and I wanted to know how she approaches the challenge of building, managin...more

  • The current state of Apache Kafka

    Nov 22 2017

    The O’Reilly Data Show Podcast: Neha Narkhede on data integration, microservices, and Kafka’s roadmap.In this episode of the Data Show, I spoke with Neha Narkhede, co-founder and CTO of Confluent. As I noted in a recent post on “the age of machine learning,” data integration and data enrichment are non-trivial and ongoing challenges for most companies. Getting data ready for analytics—including machine learning—remains an area of focus for most companies. It turns out, “data lakes” have become s...more

  • Building a natural language processing library for Apache Spark

    Nov 09 2017

    The O’Reilly Data Show Podcast: David Talby on a new NLP library for Spark, and why model development starts after a model gets deployed to production.When I first discovered and started using Apache Spark, a majority of the use cases I used it for involved unstructured text. The absence of libraries meant rolling my own NLP utilities, and, in many cases, implementing a machine learning library (this was pre deep learning, and MLlib was much smaller). I’d always wondered why no one bothered to c...more

  • Machine intelligence for content distribution, logistics, smarter cities, and more

    Oct 26 2017

    The O’Reilly Data Show Podcast: Rhea Liu on technology trends in China.In this episode of the Data Show, I spoke with Rhea Liu, analyst at China Tech Insights, a new research firm that is part of Tencent’s Online Media Group. If there’s one place where AI and machine learning are discussed even more than the San Francisco Bay Area, that would be China. Each time I go to China, there are new applications that weren’t widely available just the year before. This year, it was impossible to miss bike...more

  • Vehicle-to-vehicle communication networks can help fuel smart cities

    Oct 12 2017

    The O’Reilly Data Show Podcast: Bruno Fernandez-Ruiz on the importance of building the ground control center of the future.In this episode of the Data Show, I spoke with Bruno Fernandez-Ruiz, co-founder and CTO of Nexar. We first met when he was leading Yahoo! technical teams charged with delivering a variety of large-scale, real-time data products. His new company is helping build out critical infrastructure for the emerging transportation sector.While some question whether V2X communication is...more

  • Transforming organizations through analytics centers of excellence

    Sep 28 2017

    The O’Reilly Data Show Podcast: Carme Artigas on helping enterprises transform themselves with big data tools and technologies.In this episode of the Data Show, I spoke with Carme Artigas, co-founder and CEO of Synergic Partners (a Telefonica company). As more companies adopt big data technologies and techniques, it’s useful to remember that the end goal is to extract information and insight. In fact, as with any collection of tools and technologies, the main challenge is identifying and priorit...more

  • The state of machine learning in Apache Spark

    Sep 14 2017

    The O’Reilly Data Show Podcast: Ion Stoica and Matei Zaharia explore the rich ecosystem of analytic tools around Apache Spark.In this episode of the Data Show, we look back to a recent conversation I had at the Spark Summit in San Francisco with Ion Stoica (UC Berkeley professor and executive chairman of Databricks) and Matei Zaharia (assistant professor at Stanford and chief technologist of Databricks). Stoica and Zaharia were core members of UC Berkeley’s AMPLab, which originated Apache Spark,...more

  • Effective mechanisms for searching the space of machine learning algorithms

    Aug 31 2017

    The O’Reilly Data Show Podcast: Kenneth Stanley on neuroevolution and other principled ways of exploring the world without an objective.In this episode of the Data Show, I spoke with Ken Stanley, founding member of Uber AI Labs and associate professor at the University of Central Florida. Stanley is an AI researcher and a leading pioneer in the field of neuroevolution—a method for evolving and learning neural networks through evolutionary algorithms. In a recent survey article, Stanley went thro...more

  • How Ray makes continuous learning accessible and easy to scale

    Aug 17 2017

    The O’Reilly Data Show Podcast: Robert Nishihara and Philipp Moritz on a new framework for reinforcement learning and AI applications.In this episode of the Data Show, I spoke with Robert Nishihara and Philipp Moritz, graduate students at UC Berkeley and members of RISE Lab. I wanted to get an update on Ray, an open source distributed execution framework that makes it easy for machine learning engineers and data scientists to scale reinforcement learning and other related continuous learning alg...more

  • Why AI and machine learning researchers are beginning to embrace PyTorch

    Aug 03 2017

    The O’Reilly Data Show Podcast: Soumith Chintala on building a worthy successor to Torch and on deep learning within Facebook.In this episode of the Data Show, I spoke with Soumith Chintala, AI research engineer at Facebook. Among his many research projects, Chintala was part of the team behind DCGAN (Deep Convolutional Generative Adversarial Networks), a widely cited paper that introduced a set of neural network architectures for unsupervised learning. Our conversation centered around PyTorch, ...more

  • How big data and AI will reshape the automotive industry

    Jul 20 2017

    The O’Reilly Data Show Podcast: Evangelos Simoudis on next-generation mobility services.In this episode of the Data Show, I spoke with Evangelos Simoudis, co-founder of Synapse Partners and a frequent contributor to O’Reilly. He recently published a book entitled The Big Data Opportunity in Our Driverless Future, and I wanted get his thoughts on the transportation industry and the role of big data and analytics in its future. Simoudis is an entrepreneur, and he also advises and invests in many t...more

  • A framework for building and evaluating data products

    Jul 06 2017

    The O’Reilly Data Show Podcast: Pinterest data scientist Grace Huang on lessons learned in the course of machine learning product launches.In this episode of the Data Show, I spoke with Grace Huang, data science lead at Pinterest. With its combination of a large social graph, enthusiastic users, and multimedia data, I’ve long regarded Pinterest as a fascinating lab for data science. Huang described the challenge of building a sustainable content ecosystem and shared lessons from the front lines ...more

  • Building a next-generation platform for deep learning

    Jun 29 2017

    The O’Reilly Data Show Podcast: Naveen Rao on emerging hardware and software infrastructure for AI.In this episode of the Data Show, I speak with Naveen Rao, VP and GM of the Artificial Intelligence Products Group at Intel. In an earlier episode, we learned that scaling current deep learning models requires innovations in both software and hardware. Through his startup Nervana (since acquired by Intel), Rao has been at the forefront of building a next generation platform for deep learning and AI...more

  • A scalable time-series database that supports SQL

    Jun 22 2017

    The O’Reilly Data Show Podcast: Michael Freedman on TimescaleDB and scaling SQL for time-series.In this episode of the Data Show, I spoke with Michael Freedman, CTO of Timescale and professor of computer science at Princeton University. When I first heard that Freedman and his collaborators were building a time-series database, my immediate reaction was: “Don’t we have enough options already?” The early incarnation of Timescale was a startup focused on IoT, and it was while building tools for th...more

  • Programming collective intelligence for financial trading

    Jun 15 2017

    The O’Reilly Data Show Podcast: Geoffrey Bradway on building a trading system that synthesizes many different models.In this episode of the Data Show, I spoke with Geoffrey Bradway, VP of engineering at Numerai, a new hedge fund that relies on contributions of external data scientists. The company hosts regular competitions where data scientists submit machine learning models for classification tasks. The most promising submissions are then added to an ensemble of models that the company uses to...more

  • Creating large training data sets quickly

    Jun 08 2017

    The O’Reilly Data Show Podcast: Alex Ratner on why weak supervision is the key to unlocking dark data.In this episode of the Data Show, I spoke with Alex Ratner, a graduate student at Stanford and a member of Christopher Ré’s Hazy research group. Training data has always been important in building machine learning algorithms, and the rise of data-hungry deep learning models has heightened the need for labeled data sets. In fact, the challenge of creating training data is ongoing for many compani...more

  • Data science and deep learning in retail

    May 25 2017

    The O’Reilly Data Show Podcast: Jeremy Stanley on hiring and leading machine learning engineers to build world-class data products.In this episode of the Data Show, I spoke with Jeremy Stanley, VP of data science at Instacart, a popular grocery delivery service that is expanding rapidly. As Stanley describes it, Instacart operates a four-sided marketplace comprised of retail stores, products within the stores, shoppers assigned to the stores, and customers who order from Instacart. The objective...more

  • Language understanding remains one of AI’s grand challenges

    May 11 2017

    The O’Reilly Data Show Podcast: David Ferrucci on the evolution of AI systems for language understanding.In this episode of the Data Show, I spoke with David Ferrucci, founder of Elemental Cognition and senior technologist at Bridgewater Associates. Ferrucci served as principal investigator of IBM’s DeepQA project and led the Watson team that became champion of the Jeopardy! quiz show. Elemental Cognition (EC) is a research group focused on building an AI system that will be equipped with state-...more

  • Data preparation in the age of deep learning

    May 04 2017

    The O’Reilly Data Show Podcast: Lukas Biewald on why companies are spending millions of dollars on labeled data sets.In this episode of the Data Show, I spoke with Lukas Biewald, co-founder and chief data scientist at CrowdFlower. In a previous episode we covered how the rise of deep learning is fueling the need for large labeled data sets and high-performance computing systems. CrowdFlower has a service that many leading companies have come to rely on to provide them with labeled data sets to t...more

  • Scaling machine learning

    Apr 20 2017

    The O’Reilly Data Show Podcast: Reza Zadeh on deep learning, hardware/software interfaces, and why computer vision is so exciting.In this episode of the Data Show, I spoke with Reza Zadeh, adjunct professor at Stanford University, co-organizer of ScaledML, and co-founder of Matroid, a startup focused on commercial applications of deep learning and computer vision. Zadeh also is the co-author of the forthcoming book TensorFlow for Deep Learning (now in early release). Our conversation took place ...more

  • Architecting and building end-to-end streaming applications

    Apr 06 2017

    The O’Reilly Data Show Podcast: Karthik Ramasamy on Heron, DistributedLog, and designing real-time applications.In this episode of the Data Show, I spoke with Karthik Ramasamy, adjunct faculty member at UC Berkeley, former engineering manager at Twitter, and co-founder of Streamlio. Ramasamy managed the team that built Heron, an open source, distributed stream processing engine, compatible with Apache Storm.  While Ramasamy has seen firsthand what it takes to build and deploy large-scale di...more

  • Becoming a machine learning engineer

    Mar 30 2017

    The O’Reilly Data Show Podcast: Aurélien Géron on enabling companies to use machine learning in real-world products.In this episode of the Data Show, I spoke with Aurélien Géron, a serial entrepreneur, data scientist, and author of a popular, new book entitled Hands-on Machine Learning with Scikit-Learn and TensorFlow. Géron’s book is aimed at software engineers who want to learn machine learning and start deploying machine learning models in real-world products. As more companies adopt big data...more

  • Natural language analysis using Hierarchical Temporal Memory

    Mar 23 2017

    The O’Reilly Data Show Podcast: Francisco Webber on building HTM-based enterprise applications.In this episode of the Data Show, I spoke with Francisco Webber, founder of Cortical.io, a startup that is applying tools based on Hierarchical Temporal Memory (HTM) to natural language understanding. While HTM has been around for more than a decade, there aren’t many companies that have released products based on it (at least compared to other machine learning methods). Numenta, an organization develo...more

  • Saving the world—or at least the world’s scientific and government data

    Mar 14 2017

    The O’Reilly Data Show Podcast: Max Ogden on data preservation, distributed trust, and bringing cutting-edge technology to journalism.In this special episode of the Data Show, O'Reilly's Jenn Webb speaks with Maxwell Ogden, director of Code for Science and Society. Recently, Ogden and Code for Science have been working on the ongoing rescue of data.gov and assisting with other data rescue projects, such as Data Refuge; they’re also the nonprofit developers supporting Dat, a data versioning and d...more