Data-centric AI Summit by YData

What to expect from the Data-Centric AI Summit

Data-Centric AI is here to stay, and experts will tell you why.

If you work in the AI/ML industry in 2022, you have almost certainly heard of Data-Centric AI. Introduced recently by Andrew Ng, this approach focuses on data-centric operations rather than model-centric ones. In other words, it's about cleaning and improving your data instead of focusing solely on tuning hyperparameters and choosing the right model architecture.

The idea of data-centricity has taken the industry by storm, leading to several new data-centric platforms, countless tutorials and blog posts, and, of course, specialized events.

The first edition of the Data-Centric AI Summit takes place on September 29 and 30 and promises to be the biggest and most important DCAI event of 2022. It is co-organized by the Data-Centric AI Community and the AI Infrastructure Alliance, and it is free and online.

Let’s look at the stats first:

  • 30+ sessions, including talks on DCAI foundations and core concepts, use cases, in-depth demos of open-source and proprietary solutions, and expert panels.
  • 40+ speakers from across the AI/ML world: practitioners, tool makers, academia, and venture capital.
  • 20+ hours of content and networking, with 3 tracks on the first day and 2 tracks on the second one.

Quite impressive, isn’t it?

These are some of the reasons thousands of participants registered when the event went live a couple of weeks ago, but the main draws are the experts and the content itself! If you have already signed up, you can see the agenda for yourself and be amazed by all the luminaries and thought leaders in the space.

For those still thinking about it, let me give you a sneak peek at some of the talks at this great event. Let's go!

🔷 Christoph Schuhmann, LAION. Democratizing AI: Mastering the Massive Open Datasets that Power Imagen and Stable Diffusion

My first pick is about something you have surely come across recently, because it was impossible to miss: Stable Diffusion went viral instantly, and new business opportunities are now emerging! Gartner had predicted that most data science projects would use synthetic data, but not that it would happen this fast!

Have you ever wondered how to create an image-text pair dataset large enough to train Imagen and Stable Diffusion? You probably believe that only mega-corporations can do that, and at great cost. But the truth is much more fascinating. I'm not going to spoil it: come and listen for yourself!

🔷 Bernease Hermann, WhyLabs. Can we adapt experimental data-centric AI principles for production ML systems?

Bernease gives a great foundational talk on why data observability is an important part of the ML workflow and how to apply its principles in production. It all looks great on marketing slides and pitch decks, but how do you make it work for your business?

🔷 Fabiana Clemente, YData. Hands-on Data-Centric AI: Data preparation tuning — why and how?

Fabiana addresses why we should adopt data-centric AI principles and how to apply them as data scientists, and her session is truly hands-on and code-driven! Using credit fraud detection as an example, she demonstrates the whole pipeline, from data ingestion to model training and evaluation, on top of YData Fabric, a data-centric platform. More importantly, she emphasizes the value of iterating on the data, improving the training dataset over time, and the impact this has on model performance. A must-see talk for anyone working in data science.

This is just scratching the surface. Apart from these wonderful presenters, you will see speakers from Made With ML, MLCommons, NVIDIA, Collibra, DataRobot, Pachyderm, ClearML, Modulos, Galileo, and many others.

Secure your spot at the Data-Centric AI Summit now, and see you there!

Gonçalo Martins Ribeiro is CEO at YData, working on accelerating AI with improved datasets