O'Reilly Data Show - O'Reilly Media Podcast

Big data and data science interviews, insight, and analysis.

Episodes

  • Tools for generating deep neural networks with efficient network architectures

    Dec 06 2018

    The O’Reilly Data Show Podcast: Alex Wong on building human-in-the-loop automation solutions for enterprise machine learning. In this episode of the Data Show, I spoke with Alex Wong, associate professor at the University of Waterloo, and co-founder of DarwinAI, a startup that uses AI to address foundational challenges with deep learning in the enterprise. As the use of machine learning and analytics becomes more widespread, we’re beginning to see tools that enable data scientists and data enginee…

  • Building tools for enterprise data science

    Nov 21 2018

    The O’Reilly Data Show Podcast: Vitaly Gordon on the rise of automation tools in data science. In this episode of the Data Show, I spoke with Vitaly Gordon, VP of data science and engineering at Salesforce. As the use of machine learning becomes more widespread, we need tools that will allow data scientists to scale so they can tackle many more problems and help many more people. We need automation tools for the many stages involved in data science, including data preparation, feature engineering…

  • Lessons learned while helping enterprises adopt machine learning

    Nov 08 2018

    The O’Reilly Data Show Podcast: Francesca Lazzeri and Jaya Mathew on digital transformation, culture and organization, and the team data science process. In this episode of the Data Show, I spoke with Francesca Lazzeri, an AI and machine learning scientist at Microsoft, and her colleague Jaya Mathew, a senior data scientist at Microsoft. We conducted a couple of surveys this year—“How Companies Are Putting AI to Work Through Deep Learning” and “The State of Machine Learning Adoption in the Enterp…

  • Machine learning on encrypted data

    Oct 25 2018

    The O’Reilly Data Show Podcast: Alon Kaufman on the interplay between machine learning, encryption, and security. In this episode of the Data Show, I spoke with Alon Kaufman, CEO and co-founder of Duality Technologies, a startup building tools that will allow companies to apply analytics and machine learning to encrypted data. In a recent talk, I described the importance of data, various methods for estimating the value of data, and emerging tools for incentivizing data sharing across organizatio…

  • How social science research can inform the design of AI systems

    Oct 11 2018

    The O’Reilly Data Show Podcast: Jacob Ward on the interplay between psychology, decision-making, and AI systems. In this episode of the Data Show, I spoke with Jacob Ward, a Berggruen Fellow at Stanford University. Ward has an extensive background in journalism, mainly covering topics in science and technology, at National Geographic, Al Jazeera, Discovery Channel, BBC, Popular Science, and many other outlets. Most recently, he’s become interested in the interplay between research in psychology, …

  • Why it’s hard to design fair machine learning models

    Sep 27 2018

    The O’Reilly Data Show Podcast: Sharad Goel and Sam Corbett-Davies on the limitations of popular mathematical formalizations of fairness. In this episode of the Data Show, I spoke with Sharad Goel, assistant professor at Stanford, and his student Sam Corbett-Davies. They recently wrote a survey paper, “A Critical Review of Fair Machine Learning,” where they carefully examined the standard statistical tools used to check for fairness in machine learning models. It turns out that each of the standa…

  • Using machine learning to improve dialog flow in conversational applications

    Sep 13 2018

    The O’Reilly Data Show Podcast: Alan Nichol on building a suite of open source tools for chatbot developers. In this episode of the Data Show, I spoke with Alan Nichol, co-founder and CTO of Rasa, a startup that builds open source tools to help developers and product teams build conversational applications. About 18 months ago, there was tremendous excitement and hype surrounding chatbots, and while things have quieted lately, companies and developers continue to refine and define tools for build…

  • Building accessible tools for large-scale computation and machine learning

    Aug 30 2018

    The O’Reilly Data Show Podcast: Eric Jonas on Pywren, scientific computation, and machine learning. In this episode of the Data Show, I spoke with Eric Jonas, a postdoc in the new Berkeley Center for Computational Imaging. Jonas is also affiliated with UC Berkeley’s RISE Lab. It was at a RISE Lab event that he first announced Pywren, a framework that lets data enthusiasts proficient with Python run existing code at massive scale on Amazon Web Services. Jonas and his collaborators are working on a…

  • Simplifying machine learning lifecycle management

    Aug 16 2018

    The O’Reilly Data Show Podcast: Harish Doddi on accelerating the path from prototype to production. In this episode of the Data Show, I spoke with Harish Doddi, co-founder and CEO of Datatron, a startup focused on helping companies deploy and manage machine learning models. As companies move from machine learning prototypes to products and services, tools and best practices for productionizing and managing models are just starting to emerge. Today’s data science and data engineering teams work wi…

  • How privacy-preserving techniques can lead to more robust machine learning models

    Aug 02 2018

    The O’Reilly Data Show Podcast: Chang Liu on operations research, and the interplay between differential privacy and machine learning. In this episode of the Data Show, I spoke with Chang Liu, applied research scientist at Georgian Partners. In a previous post, I highlighted early tools for privacy-preserving analytics, both for improving decision-making (business intelligence and analytics) and for enabling automation (machine learning). One of the tools I mentioned is an open source project fo…

  • Specialized hardware for deep learning will unleash innovation

    Jul 19 2018

    The O’Reilly Data Show Podcast: Andrew Feldman on why deep learning is ushering in a golden age for compute architecture. In this episode of the Data Show, I spoke with Andrew Feldman, founder and CEO of Cerebras Systems, a startup in the blossoming area of specialized hardware for machine learning. Since the release of AlexNet in 2012, we have seen an explosion in activity in machine learning, particularly in deep learning. A lot of the work to date happened primarily on general-purpose hardware (C…

  • Data regulations and privacy discussions are still in the early stages

    Jul 05 2018

    The O’Reilly Data Show Podcast: Aurélie Pols on GDPR, ethics, and ePrivacy. In this episode of the Data Show, I spoke with Aurélie Pols of Mind Your Privacy, one of my go-to resources when it comes to data privacy and data ethics. This interview took place at Strata Data London, a couple of days before the EU General Data Protection Regulation (GDPR) took effect. I wanted her perspective on this landmark regulation, as well as her take on trends in data privacy and growing interest in ethics amon…

  • Managing risk in machine learning models

    Jun 21 2018

    The O’Reilly Data Show Podcast: Andrew Burt and Steven Touw on how companies can manage models they cannot fully explain. In this episode of the Data Show, I spoke with Andrew Burt, chief privacy officer at Immuta, and Steven Touw, co-founder and CTO of Immuta. Burt recently co-authored a white paper on managing risk in machine learning models, and I wanted to sit down with them to discuss some of the proposals they put forward to organizations that are deploying machine learning. Some high-pr…

  • The real value of data requires a holistic view of the end-to-end data pipeline

    Jun 07 2018

    The O’Reilly Data Show Podcast: Ashok Srivastava on the emergence of machine learning and AI for enterprise applications. In this episode of the Data Show, I spoke with Ashok Srivastava, senior vice president and chief data officer at Intuit. He has a strong science and engineering background, combined with years of applying machine learning and data science in industry. Prior to joining Intuit, he led the teams responsible for data and artificial intelligence products at Verizon. I wanted his pe…

  • The evolution of data science, data engineering, and AI

    May 24 2018

    The O’Reilly Data Show Podcast: A special episode to mark the 100th episode. This episode of the Data Show marks our 100th episode. This podcast grew out of video interviews conducted at O’Reilly’s 2014 Foo Camp. We had a collection of friends who were key members of the data science and big data communities on hand, and we decided to record short conversations with them. We originally intended those initial conversations to be the basis of a regular series of video interviews. The lo…

  • Companies in China are moving quickly to embrace AI technologies

    May 10 2018

    The O’Reilly Data Show Podcast: Jason Dai on the first year of BigDL and AI in China. In this episode of the Data Show, I spoke with Jason Dai, CTO of Big Data Technologies at Intel, and one of my co-chairs for the AI Conference in Beijing. I wanted to check in on the status of BigDL, specifically how companies have been using this deep learning library on top of Apache Spark, and discuss some newly added features. It turns out there are quite a number of companies already using BigDL in producti…

  • Teaching and implementing data science and AI in the enterprise

    Apr 26 2018

    The O’Reilly Data Show Podcast: Jerry Overton on organizing data teams, agile experimentation, and the importance of ethics in data science. In this episode of the Data Show, I spoke with Jerry Overton, senior principal and distinguished technologist at DXC Technology. I wanted the perspective of someone who works across industries and with a variety of companies. I specifically wanted to explore the current state of data science and AI within companies and public sector agencies. As much as we t…

  • The importance of transparency and user control in machine learning

    Apr 12 2018

    The O’Reilly Data Show Podcast: Guillaume Chaslot on bias and extremism in content recommendations. In this episode of the Data Show, I spoke with Guillaume Chaslot, an ex-YouTube engineer and founder of AlgoTransparency, an organization dedicated to helping the public understand the profound impact algorithms have on our lives. We live in an age when many of our interactions with companies and services are governed by algorithms. At a time when their impact continues to grow, there are many sett…

  • What machine learning engineers need to know

    Mar 29 2018

    The O’Reilly Data Show Podcast: Jesse Anderson and Paco Nathan on organizing data teams and next-generation messaging with Apache Pulsar. In this episode of the Data Show, I spoke with Jesse Anderson, managing director of the Big Data Institute, and my colleague Paco Nathan, who recently became co-chair of JupyterCon. This conversation grew out of a recent email thread the three of us had on machine learning engineers, a new job role that LinkedIn recently pegged as the fastest-growing job i…

  • How to train and deploy deep learning at scale

    Mar 15 2018

    The O’Reilly Data Show Podcast: Ameet Talwalkar on large-scale machine learning. In this episode of the Data Show, I spoke with Ameet Talwalkar, assistant professor of machine learning at CMU and co-founder of Determined AI. He was an early and key contributor to Spark MLlib and a member of AMPLab. Most recently, he helped conceive and organize the first edition of SysML, a new academic conference at the intersection of systems and machine learning (ML). We discussed using and deploying deep l…

  • Using machine learning to monitor and optimize chatbots

    Mar 06 2018

    The O’Reilly Data Show Podcast: Ofer Ronen on the current state of chatbots. In this episode of the Data Show, I spoke with Ofer Ronen, GM of Chatbase, a startup housed within Google’s Area 120. With tools for building chatbots becoming accessible, conversational interfaces are becoming more prevalent. As Ronen highlights in our conversation, chatbots are already enabling companies to automate many routine tasks (mainly in customer interaction). We are still in the early days of chatbots, but if …

  • Unleashing the potential of reinforcement learning

    Mar 01 2018

    The O’Reilly Data Show Podcast: Danny Lange on how reinforcement learning can accelerate software development and how it can be democratized. In this episode of the Data Show, I spoke with Danny Lange, VP of AI and machine learning at Unity Technologies. Lange previously led data and machine learning teams at Microsoft, Amazon, and Uber, where his teams were responsible for building data science tools used by other developers and analysts within those companies. When I first heard that he was mov…

  • Graphs as the front end for machine learning

    Feb 15 2018

    The O’Reilly Data Show Podcast: Leo Meyerovich on building large-scale, interactive applications that enable visual investigations. In this episode of the Data Show, I spoke with Leo Meyerovich, co-founder and CEO of Graphistry. Graphs have always been part of the big data revolution (think of the large graphs generated by the early social media startups). In recent months, I’ve come across companies releasing and using new tools for creating, storing, and (most importantly) analyzing large graph…

  • Machine learning needs machine teaching

    Feb 01 2018

    The O’Reilly Data Show Podcast: Mark Hammond on applications of reinforcement learning to manufacturing and industrial automation. In this episode of the Data Show, I spoke with Mark Hammond, founder and CEO of Bonsai, a startup at the forefront of developing AI systems in industrial settings. While many articles have been written about developments in computer vision, speech recognition, and autonomous vehicles, I’m particularly excited about near-term applications of AI to manufacturing, roboti…

  • How machine learning can be used to write more secure computer programs

    Jan 18 2018

    The O’Reilly Data Show Podcast: Fabian Yamaguchi on the potential of using large-scale analytics on graph representations of code. In this episode of the Data Show, I spoke with Fabian Yamaguchi, chief scientist at ShiftLeft. His 2015 Ph.D. dissertation sketched out how the combination of static analysis, graph mining, and machine learning can be used to develop tools to augment security analysts. In a recent post, I argued for machine learning tools to augment teams responsible for deploying an…

  • Bringing AI into the enterprise

    Jan 04 2018

    The O’Reilly Data Show Podcast: Kris Hammond on business applications of AI technologies and educating future AI specialists. In this episode of the Data Show, I spoke with Kristian Hammond, chief scientist of Narrative Science and professor of EECS at Northwestern University. He has been at the forefront of helping companies understand the power, limitations, and disruptive potential of AI technologies and tools. In a previous post on machine learning, I listed types of use cases (a taxonomy) f…

  • How machine learning will accelerate data management systems

    Dec 21 2017

    The O’Reilly Data Show Podcast: Tim Kraska on why ML will change how we build core algorithms and data structures. In this episode of the Data Show, I spoke with Tim Kraska, associate professor of computer science at MIT. To take advantage of big data, we need scalable, fast, and efficient data management systems. Database administrators and users often find themselves tasked with building index structures (“indexes” in database parlance), which are needed to speed up data access. Some common exam…

  • Machine learning at Spotify: You are what you stream

    Dec 07 2017

    The O’Reilly Data Show Podcast: Christine Hung on using data to drive digital transformation and recommenders that increase user engagement. In this episode of the Data Show, I spoke with Christine Hung, head of data solutions at Spotify. Prior to joining Spotify, she led data teams at the NY Times and at Apple (iTunes). Since she has led teams at three different companies, I wanted to hear her thoughts on digital transformation, and I wanted to know how she approaches the challenge of building, managin…

  • The current state of Apache Kafka

    Nov 22 2017

    The O’Reilly Data Show Podcast: Neha Narkhede on data integration, microservices, and Kafka’s roadmap. In this episode of the Data Show, I spoke with Neha Narkhede, co-founder and CTO of Confluent. As I noted in a recent post on “the age of machine learning,” data integration and data enrichment are non-trivial and ongoing challenges for most companies. Getting data ready for analytics—including machine learning—remains an area of focus for most companies. It turns out, “data lakes” have become s…

  • Building a natural language processing library for Apache Spark

    Nov 09 2017

    The O’Reilly Data Show Podcast: David Talby on a new NLP library for Spark, and why model development starts after a model gets deployed to production. When I first discovered and started using Apache Spark, a majority of the use cases I used it for involved unstructured text. The absence of libraries meant rolling my own NLP utilities, and, in many cases, implementing a machine learning library (this was before deep learning, and MLlib was much smaller). I’d always wondered why no one bothered to c…

  • Machine intelligence for content distribution, logistics, smarter cities, and more

    Oct 26 2017

    The O’Reilly Data Show Podcast: Rhea Liu on technology trends in China. In this episode of the Data Show, I spoke with Rhea Liu, analyst at China Tech Insights, a new research firm that is part of Tencent’s Online Media Group. If there’s one place where AI and machine learning are discussed even more than the San Francisco Bay Area, that would be China. Each time I go to China, there are new applications that weren’t widely available just the year before. This year, it was impossible to miss bike…

  • Vehicle-to-vehicle communication networks can help fuel smart cities

    Oct 12 2017

    The O’Reilly Data Show Podcast: Bruno Fernandez-Ruiz on the importance of building the ground control center of the future. In this episode of the Data Show, I spoke with Bruno Fernandez-Ruiz, co-founder and CTO of Nexar. We first met when he was leading Yahoo! technical teams charged with delivering a variety of large-scale, real-time data products. His new company is helping build out critical infrastructure for the emerging transportation sector. While some question whether V2X communication is…

  • Transforming organizations through analytics centers of excellence

    Sep 28 2017

    The O’Reilly Data Show Podcast: Carme Artigas on helping enterprises transform themselves with big data tools and technologies. In this episode of the Data Show, I spoke with Carme Artigas, co-founder and CEO of Synergic Partners (a Telefonica company). As more companies adopt big data technologies and techniques, it’s useful to remember that the end goal is to extract information and insight. In fact, as with any collection of tools and technologies, the main challenge is identifying and priorit…

  • The state of machine learning in Apache Spark

    Sep 14 2017

    The O’Reilly Data Show Podcast: Ion Stoica and Matei Zaharia explore the rich ecosystem of analytic tools around Apache Spark. In this episode of the Data Show, we look back at a recent conversation I had at the Spark Summit in San Francisco with Ion Stoica (UC Berkeley professor and executive chairman of Databricks) and Matei Zaharia (assistant professor at Stanford and chief technologist of Databricks). Stoica and Zaharia were core members of UC Berkeley’s AMPLab, which originated Apache Spark,…

  • Effective mechanisms for searching the space of machine learning algorithms

    Aug 31 2017

    The O’Reilly Data Show Podcast: Kenneth Stanley on neuroevolution and other principled ways of exploring the world without an objective. In this episode of the Data Show, I spoke with Ken Stanley, founding member of Uber AI Labs and associate professor at the University of Central Florida. Stanley is an AI researcher and a leading pioneer in the field of neuroevolution—a method for evolving and learning neural networks through evolutionary algorithms. In a recent survey article, Stanley went thro…

  • How Ray makes continuous learning accessible and easy to scale

    Aug 17 2017

    The O’Reilly Data Show Podcast: Robert Nishihara and Philipp Moritz on a new framework for reinforcement learning and AI applications. In this episode of the Data Show, I spoke with Robert Nishihara and Philipp Moritz, graduate students at UC Berkeley and members of RISE Lab. I wanted to get an update on Ray, an open source distributed execution framework that makes it easy for machine learning engineers and data scientists to scale reinforcement learning and other related continuous learning alg…

  • Why AI and machine learning researchers are beginning to embrace PyTorch

    Aug 03 2017

    The O’Reilly Data Show Podcast: Soumith Chintala on building a worthy successor to Torch and on deep learning within Facebook. In this episode of the Data Show, I spoke with Soumith Chintala, AI research engineer at Facebook. Among his many research projects, Chintala was part of the team behind DCGAN (Deep Convolutional Generative Adversarial Networks), a widely cited paper that introduced a set of neural network architectures for unsupervised learning. Our conversation centered around PyTorch, …

  • How big data and AI will reshape the automotive industry

    Jul 20 2017

    The O’Reilly Data Show Podcast: Evangelos Simoudis on next-generation mobility services. In this episode of the Data Show, I spoke with Evangelos Simoudis, co-founder of Synapse Partners and a frequent contributor to O’Reilly. He recently published a book entitled The Big Data Opportunity in Our Driverless Future, and I wanted to get his thoughts on the transportation industry and the role of big data and analytics in its future. Simoudis is an entrepreneur, and he also advises and invests in many t…

  • A framework for building and evaluating data products

    Jul 06 2017

    The O’Reilly Data Show Podcast: Pinterest data scientist Grace Huang on lessons learned in the course of machine learning product launches. In this episode of the Data Show, I spoke with Grace Huang, data science lead at Pinterest. With its combination of a large social graph, enthusiastic users, and multimedia data, I’ve long regarded Pinterest as a fascinating lab for data science. Huang described the challenge of building a sustainable content ecosystem and shared lessons from the front lines …

  • Building a next-generation platform for deep learning

    Jun 29 2017

    The O’Reilly Data Show Podcast: Naveen Rao on emerging hardware and software infrastructure for AI. In this episode of the Data Show, I spoke with Naveen Rao, VP and GM of the Artificial Intelligence Products Group at Intel. In an earlier episode, we learned that scaling current deep learning models requires innovations in both software and hardware. Through his startup Nervana (since acquired by Intel), Rao has been at the forefront of building a next-generation platform for deep learning and AI…

  • A scalable time-series database that supports SQL

    Jun 22 2017

    The O’Reilly Data Show Podcast: Michael Freedman on TimescaleDB and scaling SQL for time-series. In this episode of the Data Show, I spoke with Michael Freedman, CTO of Timescale and professor of computer science at Princeton University. When I first heard that Freedman and his collaborators were building a time-series database, my immediate reaction was: “Don’t we have enough options already?” The early incarnation of Timescale was a startup focused on IoT, and it was while building tools for th…

  • Programming collective intelligence for financial trading

    Jun 15 2017

    The O’Reilly Data Show Podcast: Geoffrey Bradway on building a trading system that synthesizes many different models. In this episode of the Data Show, I spoke with Geoffrey Bradway, VP of engineering at Numerai, a new hedge fund that relies on contributions of external data scientists. The company hosts regular competitions where data scientists submit machine learning models for classification tasks. The most promising submissions are then added to an ensemble of models that the company uses to…

  • Creating large training data sets quickly

    Jun 08 2017

    The O’Reilly Data Show Podcast: Alex Ratner on why weak supervision is the key to unlocking dark data. In this episode of the Data Show, I spoke with Alex Ratner, a graduate student at Stanford and a member of Christopher Ré’s Hazy research group. Training data has always been important in building machine learning algorithms, and the rise of data-hungry deep learning models has heightened the need for labeled data sets. In fact, the challenge of creating training data is ongoing for many compani…

  • Data science and deep learning in retail

    May 25 2017

    The O’Reilly Data Show Podcast: Jeremy Stanley on hiring and leading machine learning engineers to build world-class data products. In this episode of the Data Show, I spoke with Jeremy Stanley, VP of data science at Instacart, a popular grocery delivery service that is expanding rapidly. As Stanley describes it, Instacart operates a four-sided marketplace composed of retail stores, products within the stores, shoppers assigned to the stores, and customers who order from Instacart. The objective…

  • Language understanding remains one of AI’s grand challenges

    May 11 2017

    The O’Reilly Data Show Podcast: David Ferrucci on the evolution of AI systems for language understanding. In this episode of the Data Show, I spoke with David Ferrucci, founder of Elemental Cognition and senior technologist at Bridgewater Associates. Ferrucci served as principal investigator of IBM’s DeepQA project and led the Watson team that became champion of the Jeopardy! quiz show. Elemental Cognition (EC) is a research group focused on building an AI system that will be equipped with state-…

  • Data preparation in the age of deep learning

    May 04 2017

    The O’Reilly Data Show Podcast: Lukas Biewald on why companies are spending millions of dollars on labeled data sets. In this episode of the Data Show, I spoke with Lukas Biewald, co-founder and chief data scientist at CrowdFlower. In a previous episode we covered how the rise of deep learning is fueling the need for large labeled data sets and high-performance computing systems. CrowdFlower has a service that many leading companies have come to rely on to provide them with labeled data sets to t…

  • Scaling machine learning

    Apr 20 2017

    The O’Reilly Data Show Podcast: Reza Zadeh on deep learning, hardware/software interfaces, and why computer vision is so exciting. In this episode of the Data Show, I spoke with Reza Zadeh, adjunct professor at Stanford University, co-organizer of ScaledML, and co-founder of Matroid, a startup focused on commercial applications of deep learning and computer vision. Zadeh is also the co-author of the forthcoming book TensorFlow for Deep Learning (now in early release). Our conversation took place …

  • Architecting and building end-to-end streaming applications

    Apr 06 2017

    The O’Reilly Data Show Podcast: Karthik Ramasamy on Heron, DistributedLog, and designing real-time applications. In this episode of the Data Show, I spoke with Karthik Ramasamy, adjunct faculty member at UC Berkeley, former engineering manager at Twitter, and co-founder of Streamlio. Ramasamy managed the team that built Heron, an open source, distributed stream processing engine compatible with Apache Storm. While Ramasamy has seen firsthand what it takes to build and deploy large-scale di…

  • Becoming a machine learning engineer

    Mar 30 2017

    The O’Reilly Data Show Podcast: Aurélien Géron on enabling companies to use machine learning in real-world products. In this episode of the Data Show, I spoke with Aurélien Géron, a serial entrepreneur, data scientist, and author of a popular new book entitled Hands-on Machine Learning with Scikit-Learn and TensorFlow. Géron’s book is aimed at software engineers who want to learn machine learning and start deploying machine learning models in real-world products. As more companies adopt big data…

  • Natural language analysis using Hierarchical Temporal Memory

    Mar 23 2017

    The O’Reilly Data Show Podcast: Francisco Webber on building HTM-based enterprise applications. In this episode of the Data Show, I spoke with Francisco Webber, founder of Cortical.io, a startup that is applying tools based on Hierarchical Temporal Memory (HTM) to natural language understanding. While HTM has been around for more than a decade, there aren’t many companies that have released products based on it (at least compared to other machine learning methods). Numenta, an organization develo…

  • Saving the world—or at least the world’s scientific and government data

    Mar 14 2017

    The O’Reilly Data Show Podcast: Max Ogden on data preservation, distributed trust, and bringing cutting-edge technology to journalism. In this special episode of the Data Show, O'Reilly's Jenn Webb speaks with Maxwell Ogden, director of Code for Science and Society. Recently, Ogden and Code for Science have been working on the ongoing rescue of data.gov and assisting with other data rescue projects, such as Data Refuge; they’re also the nonprofit developers supporting Dat, a data versioning and d…

  • Deep learning that's easy to implement and easy to scale

    Mar 09 2017

    The O’Reilly Data Show Podcast: Anima Anandkumar on MXNet, tensor computations and deep learning, and techniques for scaling algorithms. In this episode of the Data Show, I spoke with Anima Anandkumar, a leading machine learning researcher and currently a principal research scientist at Amazon. I took the opportunity to get an update on the latest developments in the use of tensors in machine learning. Most of our conversation centered around MXNet—an open source, efficient, scalable deep learni…

  • Building machine learning solutions that can withstand adversarial attacks

    Feb 23 2017

    The O’Reilly Data Show Podcast: Parvez Ahammad on minimal supervision, and the importance of explainability, interpretability, and security. In this episode of the Data Show, I spoke with Parvez Ahammad, who leads the data science and machine learning efforts at Instart Logic. He has applied machine learning in a variety of domains, most recently to computational neuroscience and security. Along the way, he has assembled and managed teams of data scientists and has had to grapple with issues like…

  • Deep learning for Apache Spark

    Feb 09 2017

    The O’Reilly Data Show Podcast: Jason Dai on BigDL, a library for deep learning on existing data frameworks. In this episode of the Data Show, I spoke with Jason Dai, CTO of big data technologies at Intel, and co-chair of Strata + Hadoop World Beijing. Dai and his team are prolific and longstanding contributors to the Apache Spark project. Their early contributions to Spark tended to be on the systems side and included Netty-based shuffle, a fair scheduler, and the “yarn-client” mode. Recently, t…

  • The key to building deep learning solutions for large enterprises

    Jan 26 2017

    The O’Reilly Data Show Podcast: Adam Gibson on the importance of ROI, integration, and the JVM. As data scientists add deep learning to their arsenals, they need tools that integrate with existing platforms and frameworks. This is particularly important for those who work in large enterprises. In this episode of the Data Show, I spoke with Adam Gibson, co-founder and CTO of Skymind, and co-creator of Deeplearning4J (DL4J). Gibson has spent the last few years developing the DL4J library and commun…

  • How big compute is powering the deep learning rocket ship

    Jan 12 2017

    The O’Reilly Data Show Podcast: Greg Diamos on building computer systems for deep learning and AI. Specialists describe deep learning as akin to a rocket ship that needs a really big engine (a model) and a lot of fuel (the data) in order to go anywhere interesting. To get a better understanding of the issues involved in building compute systems for deep learning, I spoke with one of the foremost experts on this subject: Greg Diamos, senior researcher at Baidu. Diamos has long worked to combine adv…

  • 2017 will be the year the data science and big data community engage with AI technologies

    Dec 29 2016

    The O’Reilly Data Show Podcast: A look at some trends we’re watching in 2017. This episode consists of excerpts from a recent talk I gave at a conference commemorating the end of the UC Berkeley AMPLab project. This section covered some recent trends in data and AI. For a complete list of trends we’re watching in 2017, as well as regular doses of highly curated resources, subscribe to our Data and AI newsletters. As 2016 draws to a close, I see the big data and data science community beginnin…

  • Data is only as valuable as the decisions it enables

    Dec 15 2016

    The O’Reilly Data Show Podcast: Ion Stoica on building intelligent and secure applications on live data. In this episode I spoke with Ion Stoica, co-founder and chairman of Databricks. Stoica is also a professor of computer science at UC Berkeley, where he serves as director of the new RISE Lab (the successor to AMPLab). Fresh off the incredible success of AMPLab, RISE seeks to build tools and platforms that enable sophisticated real-time applications on live data, while maintaining strong securit…

  • Introducing model-based thinking into AI systems

    Dec 01 2016

    The O’Reilly Data Show Podcast: Vikash Mansinghka on recent developments in probabilistic programming. In this episode I spoke with Vikash Mansinghka, research scientist at MIT, where he leads the Probabilistic Computing Project, and co-founder of Empirical Systems. I’ve long wanted to introduce listeners to recent developments in probabilistic programming, and I found the perfect guide in Mansinghka. Probability is the mathematical language to represent, model, and manipulate uncertainty, and pro…

  • Building the next-generation big data analytics stack

    Nov 17 2016

    The O’Reilly Data Show Podcast: Michael Franklin on the lasting legacy of AMPLab. In this episode I spoke with Michael Franklin, co-director of UC Berkeley’s AMPLab and chair of the Department of Computer Science at the University of Chicago. AMPLab is well known in the data community for having originated Apache Spark, Alluxio (formerly Tachyon), and many other open source tools. Today marks the start of a two-day symposium commemorating the end of AMPLab, and we took the opportunity to reflect o…