Introduction to Feature Store for Machine Learning

Robert John
4 min read · Dec 19, 2021


Source: Unsplash, photo by Fikri Rasyid

If you have some familiarity with machine learning and data science, you have probably come across the word “feature”. In this blog post, we will:

  • define what a feature store is
  • explain why we need a feature store
  • describe the components of a feature store
  • compare feature stores with other data storage systems
  • list popular feature store tools

To get the best from the article, all you need is a beginner-level understanding of data science.

What is a Feature Store?

Before we can define a feature store, we need to understand what features are. Features are the input to a machine learning model. A feature is an attribute of a dataset. The diagram below shows 5 features of a bank loan defaulter dataset on Kaggle.

Bank loan defaulter dataset. Source: Kaggle

A major process in the machine learning lifecycle is feature engineering. This involves cleaning and transforming raw data, like the Kaggle data above, into features that are in the right format and make it easier for a machine learning model to pick up patterns and signals in the data. Feature engineering is time-consuming and requires a lot of work. It can easily become complex if you need to repeat it for the training data, test data and inference data of every project.
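
To make the idea concrete, here is a minimal sketch of feature engineering on a few loan records. The records, field names and derived features (`loan_to_income`, `is_salaried`) are hypothetical and are not taken from the Kaggle dataset; the point is only to show cleaning (filling a missing income, normalizing text) and transformation (deriving a ratio and a binary flag).

```python
from statistics import median

# Hypothetical raw records; field names are illustrative, not the Kaggle schema.
raw_loans = [
    {"loan_id": 1, "income": 50000.0, "loan_amount": 10000.0, "employment": "Salaried"},
    {"loan_id": 2, "income": None, "loan_amount": 5000.0, "employment": "self-employed "},
    {"loan_id": 3, "income": 80000.0, "loan_amount": 40000.0, "employment": "SALARIED"},
]


def engineer_features(records):
    """Clean raw loan records and derive model-ready features."""
    # Cleaning: missing incomes are filled with the median of the known incomes.
    known_incomes = [r["income"] for r in records if r["income"] is not None]
    fallback_income = median(known_incomes)

    features = []
    for r in records:
        income = r["income"] if r["income"] is not None else fallback_income
        # Cleaning: normalize inconsistent casing and stray whitespace.
        employment = r["employment"].strip().lower()
        features.append({
            "loan_id": r["loan_id"],
            # Transformation: a ratio is often more informative than raw amounts.
            "loan_to_income": round(r["loan_amount"] / income, 4),
            # Transformation: encode a category as a number the model can use.
            "is_salaried": 1 if employment == "salaried" else 0,
        })
    return features


loan_features = engineer_features(raw_loans)
```

If the same logic has to be rewritten separately for training, testing and inference in every project, small differences creep in easily, which is the kind of problem a feature store is meant to reduce.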

Feature stores are tools for managing features and simplifying feature engineering.

Components of a feature store. Source

Why Feature Store?

The advantages of a feature store are numerous. I will only share a few. A feature store can:

  • make it easy to share features across several teams and machine learning applications
  • ensure the same features are used during training, testing and inference
  • reduce duplication of data engineering effort, since data engineers don’t need to create multiple pipelines
  • speed up the machine learning lifecycle and the deployment of machine learning applications
  • track feature versions, lineage and metadata
  • provide a single source of truth and easy access to features
  • monitor the health of features in production to detect data drift and concept drift and to maintain data quality
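
As a rough illustration of the monitoring point, here is a minimal sketch of a data drift check that compares the values a feature takes in production with the values seen during training. The function names, the sample values and the threshold of 2 standard deviations are my own assumptions for illustration; production tools generally use more robust statistical tests. The check only looks at feature values, so it can flag data drift but not concept drift, which requires comparing predictions with true labels.

```python
from statistics import mean, stdev


def mean_shift_in_std(training_values, serving_values):
    """How far the serving mean has moved from the training mean,
    measured in training standard deviations."""
    baseline_mean = mean(training_values)
    baseline_std = stdev(training_values)
    if baseline_std == 0:
        raise ValueError("training values have zero variance")
    return abs(mean(serving_values) - baseline_mean) / baseline_std


def has_drifted(training_values, serving_values, threshold=2.0):
    """Flag drift when the mean shift exceeds the (assumed) threshold."""
    return mean_shift_in_std(training_values, serving_values) > threshold


# Hypothetical values of one feature (income, in thousands).
training_income = [40.0, 50.0, 60.0, 50.0, 45.0, 55.0]
stable_serving = [48.0, 52.0, 51.0, 49.0]
shifted_serving = [90.0, 95.0, 88.0, 92.0]

stable_flag = has_drifted(training_income, stable_serving)
shifted_flag = has_drifted(training_income, shifted_serving)
```

With these sample values, the stable batch has the same mean as the training data and is not flagged, while the shifted batch sits several standard deviations away and is flagged.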

Components of a feature store

Feature registry: provides a way to maintain and identify features. This includes feature definitions and feature metadata.

Transformation: orchestrates and manages the transformation of raw data into features.

Storage: features are sometimes not required immediately, so a feature store provides storage to keep features for future use.

Serving: a feature store can serve features at the inference stage.

Monitoring: monitors changes in the features to detect concept drift and data drift.
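
To show how these components fit together, here is a toy in-memory sketch. `ToyFeatureStore`, `FeatureDefinition` and their methods are hypothetical names invented for this post and are not modeled on the API of any real feature store. The sketch covers the registry, transformation, storage and serving; monitoring is left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FeatureDefinition:
    """Registry entry: a feature's name, metadata and transformation."""
    name: str
    description: str
    version: int
    transform: Callable[[dict], float]


@dataclass
class ToyFeatureStore:
    """Hypothetical in-memory store, for illustration only."""
    registry: Dict[str, FeatureDefinition] = field(default_factory=dict)
    storage: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def register(self, definition: FeatureDefinition) -> None:
        # Feature registry: keep definitions and metadata in one place.
        self.registry[definition.name] = definition

    def ingest(self, entity_id: str, raw_record: dict) -> None:
        # Transformation + storage: compute each registered feature and keep it.
        row = self.storage.setdefault(entity_id, {})
        for name, definition in self.registry.items():
            row[name] = definition.transform(raw_record)

    def get_features(self, entity_id: str, names: List[str]) -> Dict[str, float]:
        # Serving: return stored values for one entity at inference time.
        row = self.storage[entity_id]
        return {name: row[name] for name in names}


store = ToyFeatureStore()
store.register(FeatureDefinition(
    name="loan_to_income",
    description="Loan amount divided by annual income",
    version=1,
    transform=lambda r: r["loan_amount"] / r["income"],
))
store.register(FeatureDefinition(
    name="is_salaried",
    description="1.0 if the applicant is salaried, else 0.0",
    version=1,
    transform=lambda r: 1.0 if r["employment"] == "salaried" else 0.0,
))

store.ingest(
    "customer_1",
    {"loan_amount": 10000.0, "income": 50000.0, "employment": "salaried"},
)
served = store.get_features("customer_1", ["loan_to_income", "is_salaried"])
```

Because the transformation lives in the registry, every consumer that asks for `loan_to_income` gets a value computed the same way.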

Feature stores can be divided into two types:

  • Online feature stores provide data to online applications at low latency
  • Offline feature stores provide data for developing AI models and for batch applications

Feature store in a data pipeline. Source
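
The split between online and offline stores can be sketched as follows. The `OfflineStore` and `OnlineStore` classes are hypothetical and kept in memory; I am assuming the common arrangement in which the offline store keeps the full timestamped history while the online store keeps only the latest values per entity.

```python
class OfflineStore:
    """Keeps the full timestamped history, for training sets and batch jobs."""

    def __init__(self):
        self.rows = []  # list of (entity_id, timestamp, features)

    def write(self, entity_id, timestamp, features):
        self.rows.append((entity_id, timestamp, dict(features)))

    def history(self, entity_id):
        matching = [(ts, feats) for eid, ts, feats in self.rows if eid == entity_id]
        return sorted(matching, key=lambda pair: pair[0])


class OnlineStore:
    """Keeps only the latest values per entity, for low-latency lookups."""

    def __init__(self):
        self.latest = {}  # entity_id -> (timestamp, features)

    def write(self, entity_id, timestamp, features):
        current = self.latest.get(entity_id)
        # Overwrite only if the incoming record is at least as recent.
        if current is None or timestamp >= current[0]:
            self.latest[entity_id] = (timestamp, dict(features))

    def read(self, entity_id):
        return self.latest[entity_id][1]


offline, online = OfflineStore(), OnlineStore()
events = [
    ("customer_1", 1, {"loan_to_income": 0.20}),
    ("customer_1", 2, {"loan_to_income": 0.35}),
    ("customer_2", 1, {"loan_to_income": 0.10}),
]
for entity_id, timestamp, features in events:
    offline.write(entity_id, timestamp, features)
    online.write(entity_id, timestamp, features)

training_rows = offline.history("customer_1")  # every historical value
serving_row = online.read("customer_1")        # only the most recent value
```

The offline store returns both historical records for `customer_1`, which is what you want when building training data, while the online store answers with a single dictionary lookup.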

Differences between a feature store and other data warehouses and storage systems

A feature store is sometimes described as a data warehouse for machine learning features. It is important not to confuse a feature store with other data storage tools.

  1. A feature store is a dual database: one row-oriented database serves features to online applications at low latency, and a second column-oriented database stores features used to create training data and to feed batch applications.
  2. A feature store is specific to managing features used by machine learning models. Other data storage systems and data warehouses manage raw data.
  3. Data warehouses can be used with visualization tools such as Power BI and Tableau. Most feature store tools have integrated visualization, so you don’t need external tools.
  4. Data warehouses are an input to a feature store.
  5. A feature store is used by data scientists, while a data warehouse is used by data analysts.
  6. Other data storage systems can be used for generating reports and dashboards. A feature store is focused on providing features for machine learning models.
  7. A feature store typically exposes features through an API or SDK, so SQL queries are not required. Other data storage systems usually require some form of SQL to query data.
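
The first point in the list is easier to see with a small example. Below is a minimal sketch of the same three feature records laid out row-wise and column-wise, using plain Python dictionaries. The customer IDs and feature names are hypothetical, and real feature stores use dedicated databases rather than in-memory dictionaries.

```python
# Row-oriented layout: one complete record per entity, keyed for direct lookup.
row_store = {
    "customer_1": {"loan_to_income": 0.20, "is_salaried": 1},
    "customer_2": {"loan_to_income": 0.10, "is_salaried": 0},
    "customer_3": {"loan_to_income": 0.50, "is_salaried": 1},
}

# Column-oriented layout: one list per feature, aligned by position.
column_store = {
    "entity_id": ["customer_1", "customer_2", "customer_3"],
    "loan_to_income": [0.20, 0.10, 0.50],
    "is_salaried": [1, 0, 1],
}


def lookup(entity_id):
    """Online-style access: fetch all features of one entity in one lookup."""
    return row_store[entity_id]


def column_mean(feature_name):
    """Offline-style access: scan one feature across all entities."""
    values = column_store[feature_name]
    return sum(values) / len(values)


single_customer = lookup("customer_2")
salaried_share = column_mean("is_salaried")
```

The row layout suits serving, where a request needs every feature of one customer at once. The column layout suits training and batch work, where you usually read a few features across many customers.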

Feature store tools

Feature Store Milestones. Source

  • Feast
  • Databricks Feature Store
  • Amazon SageMaker Feature Store
  • Google Cloud Vertex AI Feature Store
  • Iguazio feature store
  • Hopsworks Feature Store
  • Tecton

In conclusion, I hope you now understand the concept of a feature store and why it is important in machine learning.
