Resources

June 11, 2023

Unlocking the Power of a Data Catalog for Your Business

The importance of data quality & profiling for the success of Machine Learning. In today's world, businesses around the globe are generating a vast amount of data. To be able to adopt a data-driven initiative, organizations must manage data...

A Data Scientist’s Guide to Identify and Resolve Data Quality Issues

Doing this early for your next project will save you weeks of effort and stress. If you've worked in the AI industry with real-world data, you’d understand the pain. No matter how streamlined the data collection process is, the data we’re...

How Can I Measure Data Quality?

Introducing YData Quality: an open-source package for comprehensive data quality. Flag all your data quality issues by priority in a few lines of code. “Everyone wants to do the model work, not the data work” (Google Research). According to...

Introducing the Synthetic Data Community

A vibrant community pioneering an essential addition to the data science toolkit. According to a 2017 Harvard Business Review study, only 3% of companies’ data meets basic quality standards. Based on a 2020 YData...

High-quality data meets enterprise MLOps

According to the 2021 enterprise trends in machine learning report by Algorithmia, 83% of all organizations have increased their AI/ML budgets year-on-year, and the average number of data scientists employed has grown by 76% over the same...

From model-centric to data-centric

A new paradigm for AI development, focused on data quality. In my last blog post I covered the rise of DataPrepOps and the importance of data preparation for achieving optimized results from Machine Learning based solutions. The stakes of...

The rise of DataPrepOps

Modern data development tools and how data quality impacts ML results. ML is all around us! From healthcare to education, it is being applied in many domains that affect our daily activities and it’s able to deliver many benefits. Data...

How to go from raw data to production like a pro

An odyssey on improving data quality with synthetic data and model delivery with MLOps. Machine Learning and AI are two concepts that have definitely changed our way of thinking in the last decade, and will probably change even more in the...

Time-series Synthetic Data: A GAN approach

Generate synthetic sequential data with TimeGAN. Time-series or sequential data can be defined as any data that has time dependency. Cool, huh, but where can I find sequential data? Well, a bit everywhere, from credit card transactions, my...

Data Pipeline Selection and Optimization

In recent years, machine learning has revolutionized how businesses and organizations operate. However, one aspect that is often overlooked is the importance of data pipelines in influencing machine learning performance. In this paper, the...

What we have learned from talking with 100+ data scientists

One good thing about the current pandemic (probably the only good thing) is that everyone stopped spending time commuting and got to spend that time on something else. We’re glad that some of those people were kind enough to spend that...

Portuguese startups received more than 275 million euros (PT)

More than half of Portuguese financial startups are headquartered in Lisbon, 19% chose other European countries, and 18% are in Porto. Investors point to headquarters location as an obstacle. The Top 30 technology startups...

Should Data Science teams use Kubernetes? Hell no!

Data science teams should focus on analysing data and building models, not infrastructure management. Kubernetes is great! 1. “Kubernetes is a future proof solution.” Because it is super cool to say “future proof”. Nobody knows how the...
