Introduction to Feature Store for Machine Learning

If you know a little about machine learning and data science, you have probably heard the word “feature”. In this blog post, we will:

  • define what a feature store is
  • explain why we need a feature store
  • describe the components of a feature store
  • compare feature stores with other data storage
  • list some feature store tools

To get the most out of this article, all you need is a beginner-level understanding of data science.

What is a Feature Store?

Before we can define a feature store, we need to understand what features are. Features are the input to a machine learning model. A feature is an attribute of a dataset. The diagram below shows 5 features of a bank loan defaulter dataset on Kaggle.

A major step in the machine learning lifecycle is feature engineering. This involves cleaning and transforming raw data, like the Kaggle data above, into features that are in the right format and make it easier for a machine learning model to pick up patterns and signals in the data. Feature engineering is time-consuming and requires a lot of work. It can easily become complex if you need to repeat it for the training data, test data, and inference data of every project.
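
To make this concrete, here is a minimal sketch of feature engineering in plain Python. The records and field names (income, loan_amount, employment) are made up for illustration and are not taken from the Kaggle dataset; the chosen transformations are just examples of cleaning, scaling, and encoding.

```python
import math

# Hypothetical raw loan records; the field names are illustrative only.
RAW_LOANS = [
    {"loan_id": 1, "income": "52,000", "loan_amount": 13000.0, "employment": "Salaried"},
    {"loan_id": 2, "income": None, "loan_amount": 8000.0, "employment": "self-employed "},
]


def clean_income(value, default=0.0):
    """Turn a raw income string such as '52,000' into a float; use default if missing."""
    if value is None:
        return default
    return float(str(value).replace(",", ""))


def engineer_features(record):
    """Clean one raw record and derive model-ready numeric features from it."""
    income = clean_income(record["income"])
    employment = record["employment"].strip().lower()
    return {
        "loan_id": record["loan_id"],
        # Log-scale income to reduce the effect of very large values.
        "log_income": math.log1p(income),
        # Ratio feature; fall back to 0.0 when income is missing or zero.
        "loan_to_income": record["loan_amount"] / income if income > 0 else 0.0,
        # Encode a text category as a 0/1 flag.
        "is_salaried": 1 if employment == "salaried" else 0,
    }


features = [engineer_features(r) for r in RAW_LOANS]
```

The same engineer_features function would have to be applied to training, test, and inference data alike, which is exactly the repeated work a feature store is meant to take off your hands.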

Feature stores are tools used to manage features and to simplify feature engineering.

Why a Feature Store?

The advantages of a feature store are numerous; I will share only a few. A feature store can:

  • make it easy to share features across teams and machine learning applications
  • ensure the same features are used during training, testing, and inference
  • reduce duplicated data engineering effort, since data engineers don’t need to build multiple pipelines for the same features
  • speed up the machine learning lifecycle and the deployment of machine learning applications
  • track feature versions, lineage, and metadata
  • provide a single source of truth and easy access to features
  • monitor the health of features in production to catch data drift and concept drift early and maintain data quality
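
As a rough illustration of the last point, the sketch below compares the values of one feature seen in production against the values seen in training. The function name mean_shift, the sample values, and the threshold of 2.0 are all assumptions made for this example; real feature stores use their own, usually richer, drift checks.

```python
from statistics import mean, stdev


def mean_shift(reference, current):
    """Distance between the two means, measured in reference standard deviations."""
    return abs(mean(current) - mean(reference)) / stdev(reference)


# Hypothetical values of a single feature at training time and in production.
training_values = [10.0, 12.0, 11.0, 13.0, 9.0]
production_values = [15.0, 17.0, 16.0, 18.0, 14.0]

shift = mean_shift(training_values, production_values)

# The 2.0 threshold is an arbitrary assumption, not a standard value.
drifted = shift > 2.0
```

Here the production mean has moved by roughly three training standard deviations, so the check flags the feature as drifted.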

Components of a feature store

Feature registry: a catalog for maintaining and discovering features, including feature definitions and feature metadata.

Transformation: orchestrates and manages the transformation of raw data into features.

Storage: features are not always needed immediately, so the feature store keeps them for future use.

Serving: the feature store serves features to models at the inference stage.

Monitoring: tracks changes in features to detect data drift and concept drift.
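
To see how these components fit together, here is a toy in-memory sketch. The class MiniFeatureStore and all of its method names are hypothetical and do not correspond to the API of any real feature store tool; each method maps to one of the components above.

```python
from statistics import mean


class MiniFeatureStore:
    """Toy in-memory feature store; every name here is hypothetical."""

    def __init__(self):
        self.registry = {}  # feature name -> {"description", "transform"}
        self.storage = {}   # feature name -> {entity_id: value}

    def register(self, name, description, transform):
        # Feature registry: keep the definition and metadata in one place.
        self.registry[name] = {"description": description, "transform": transform}
        self.storage[name] = {}

    def ingest(self, entity_id, raw_record):
        # Transformation and storage: compute each registered feature and keep it.
        for name, entry in self.registry.items():
            self.storage[name][entity_id] = entry["transform"](raw_record)

    def get_features(self, entity_id, names):
        # Serving: look up the stored values for one entity.
        return {name: self.storage[name][entity_id] for name in names}

    def mean_value(self, name):
        # Monitoring: a simple summary statistic that can be compared over time.
        return mean(self.storage[name].values())


store = MiniFeatureStore()
store.register("loan_to_income", "Loan amount divided by income",
               lambda r: r["loan_amount"] / r["income"])
store.register("is_large_loan", "1 if the loan amount exceeds 10,000",
               lambda r: 1 if r["loan_amount"] > 10_000 else 0)

store.ingest("cust_1", {"loan_amount": 13000.0, "income": 52000.0})
store.ingest("cust_2", {"loan_amount": 8000.0, "income": 32000.0})

served = store.get_features("cust_1", ["loan_to_income", "is_large_loan"])
```

Because the transformation is registered once and reused on every ingest, the value served at inference time is computed the same way as the value stored for training.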

Feature stores can be divided into two types:

  • online feature stores, which provide data to online applications at low latency
  • offline feature stores, which provide data for developing AI models and for batch applications
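
The split between the two can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the offline store is an append-only list that keeps the full history, the online store is a dictionary that keeps only the latest values per entity, and the function names and the avg_balance feature are made up.

```python
offline_store = []  # append-only history of (entity_id, timestamp, features)
online_store = {}   # latest feature values only: entity_id -> row


def write_features(entity_id, timestamp, features):
    """Write to both stores: history goes offline, the latest value goes online."""
    offline_store.append((entity_id, timestamp, features))
    latest = online_store.get(entity_id)
    if latest is None or timestamp >= latest["timestamp"]:
        online_store[entity_id] = {"timestamp": timestamp, **features}


def get_online_features(entity_id):
    """Low-latency path: a single key lookup for one entity."""
    row = online_store[entity_id]
    return {k: v for k, v in row.items() if k != "timestamp"}


def get_training_rows(as_of):
    """Batch path: per entity, the most recent features known at time as_of."""
    rows = {}
    for entity_id, ts, feats in sorted(offline_store, key=lambda r: r[1]):
        if ts <= as_of:
            rows[entity_id] = feats
    return rows


write_features("cust_1", 1, {"avg_balance": 100.0})
write_features("cust_1", 2, {"avg_balance": 150.0})
write_features("cust_2", 2, {"avg_balance": 80.0})
```

The online lookup only ever sees the newest value for cust_1, while the offline history can still reproduce what was known at timestamp 1, which is what you want when building training data.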

Differences between a feature store and data warehouses or other data storage

A feature store is sometimes described as a data warehouse for machine learning features. Still, it is important not to confuse a feature store with other data storage tools.

  1. A feature store is a dual database: one database (row-oriented) serves features to online applications at low latency, and a second database (column-oriented) stores features used to create training data and for batch applications.
  2. A feature store is specific to managing features used by machine learning models. Other data storage and data warehouses hold more general data, including raw data.
  3. Data warehouses are commonly used with visualization tools such as Power BI and Tableau. Many feature store tools have integrated visualization, so you often don’t need external tools.
  4. Data warehouses are often an input to a feature store.
  5. A feature store is used mainly by data scientists, while a data warehouse is used mainly by data analysts.
  6. Other data storage can be used to generate reports and dashboards. A feature store is focused on providing features for machine learning models.
  7. A feature store typically exposes features through an SDK or API, so you often don’t need to write SQL yourself. Other data storage usually requires some form of SQL to query data.

Feature store tools

  • Feast
  • Databricks Feature Store
  • Amazon SageMaker Feature Store
  • Google Cloud Vertex AI Feature Store
  • Iguazio Feature Store
  • Hopsworks Feature Store
  • Tecton

In conclusion, I hope you now understand the concept of a feature store and why it is important in machine learning.
