3 Important Considerations Before You Embark on Your Feature Store Journey!

Prasun Mishra
4 min read · Oct 1, 2021

Quick introduction: An ML Feature Store sounds like a promising idea for any organization developing and productionizing ML models. A Feature Store helps avoid a ton of duplicate work (for example, feature engineering), establishes a standard nomenclature and feature description standard across the org, and brings consistency between online and offline feature processing and serving. Needless to say, the saved effort and time can be invested in taking the org’s ML initiatives to the next level. The diagram below explains the key features of a feature store:

Image courtesy tecton.ai

A Feature Store provides capabilities such as the following (from basic to advanced, in increasing order of priority):

1. Feature Registry

2. Feature Discovery

3. Transformation

4. Storage

5. Serving for training and inference (offline/online)

6. Monitoring

Every organization has its own strengths and priorities, so it makes sense to weigh the pros and cons before embarking on your Feature Store journey. A ‘Build vs Buy’ analysis is very important: building a Feature Store might sound like a sexy idea today, but you may not want to keep investing in upgrading, enhancing and maintaining your home-grown Feature Store forever. Remember, Feature Stores are still evolving, and in the long run this could be a huge commitment of time and effort, dragging you far away from your original product mission. In this article, we will discuss 3 important considerations that may help you decide.

№1 You may not need a full-fledged feature store right now, but eventually you will: This also means that rather than an ‘all or nothing’ approach, you can plan your journey in phases. An honest needs assessment is definitely a good idea. For example, organizations with up to 10 ML models in production can easily adopt a simple stepwise approach. Remember, a Feature Store requires significant effort and resources, which may not be justified at that stage. Instead, a Feature Registry may be the best place to start. With 10 models in production, your Data Science team is likely to be of a modest size (4–5 members), so communication and collaboration among team members won’t be an issue. Hence, a Feature Registry with a standard feature naming and description template will be a good start. As a next logical step, you can use a git repo to store, version and share code libraries for feature pipelines so that they become reusable across the org.
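To make the idea of a Feature Registry with a naming and description template more concrete, here is a minimal in-memory sketch in Python. Everything in it is an assumption made for illustration: the `FeatureSpec` fields, the `<entity>__<feature>__<window>` naming convention and the `FeatureRegistry` class are hypothetical and do not come from any particular product. A real registry could just as well be a shared spreadsheet or a YAML file in your git repo.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical naming convention: <entity>__<feature>__<window>, lowercase,
# for example "customer__order_count__30d".
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*__[a-z][a-z0-9_]*__[a-z0-9]+$")


@dataclass(frozen=True)
class FeatureSpec:
    """One registry entry: the standard description template for a feature."""
    name: str         # must follow NAME_PATTERN
    description: str  # plain-language meaning of the feature
    dtype: str        # e.g. "int", "float", "string"
    owner: str        # person or team to contact
    source: str       # where the feature pipeline code lives (e.g. a repo path)


class FeatureRegistry:
    """A tiny in-memory registry that enforces the naming template."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureSpec] = {}

    def register(self, spec: FeatureSpec) -> None:
        if not NAME_PATTERN.match(spec.name):
            raise ValueError(f"name does not follow the convention: {spec.name!r}")
        if not spec.description.strip():
            raise ValueError(f"description is required for {spec.name!r}")
        if spec.name in self._features:
            raise ValueError(f"feature already registered: {spec.name!r}")
        self._features[spec.name] = spec

    def search(self, keyword: str) -> List[FeatureSpec]:
        """Basic discovery: match the keyword against names and descriptions."""
        kw = keyword.lower()
        return [
            spec for spec in self._features.values()
            if kw in spec.name or kw in spec.description.lower()
        ]


registry = FeatureRegistry()
registry.register(FeatureSpec(
    name="customer__order_count__30d",
    description="Number of orders placed by the customer in the last 30 days",
    dtype="int",
    owner="data-science-team",
    source="feature_pipelines/customer_orders.py",  # hypothetical path
))
matches = registry.search("orders")
```

Even a small check like this keeps names consistent and prevents two people from registering the same feature twice, which is most of what a team of 4–5 needs from a registry.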

№2 Open-source Feature Stores like Feast and Hopsworks are promising, but still evolving: Featurestore.org is a good place to develop a comparative understanding of the various licensed and open-source Feature Stores. To me, Feast seems better than the other open-source options and can be a good starting point. Still, it has some gaps. First of all, the Feast stack is closely aligned with GCP, for all the right reasons, but that takes it far from other cloud environments like AWS and Azure. Feast relied heavily on BigQuery, Apache Beam on Dataflow, and Apache Kafka to deliver functionality quickly, but this has created a kind of ‘lock-in’ with GCP at the moment. As per their blog, the Feast team has acknowledged this and is working to make it a more multi-cloud solution. The recent release 0.12 (11 Aug 2021) added AWS Redshift as an offline data store and DynamoDB as an online store. Here is a great article to understand the direction of Feast. I tried capturing the key items in the images below:

Image courtesy Feast.dev

№3 AWS has a built-in Feature Store, which can be a good starting point for anyone already on the platform: For an incremental cost, you can get started pretty quickly without investing a lot of time. Feature pipelines can be created using Step Functions, SageMaker Pipelines or Apache Airflow.

Image courtesy AWS.Amazon.com

It’s noteworthy how smoothly SageMaker Pipelines helps you leverage best-in-class CI/CD and DevOps practices (aka MLOps), and you can create a complete workflow through a visual interface or the Python SDK. AWS Feature Store comes packed with feature authoring, processing, discovery, training/batch scoring and online inference modules. I must admit that AWS has done an amazing job of providing all-encompassing capabilities, right from data ingestion, processing, feature engineering, model training, testing, deployment, lifecycle management and monitoring, in a single platform. Hence, if you are already on AWS, considering AWS Feature Store may make perfect sense for you.
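As a rough illustration of what writing a feature row to SageMaker Feature Store can look like from Python, here is a minimal sketch. It assumes a feature group already exists and that you pass in a boto3 `sagemaker-featurestore-runtime` client, whose `put_record` call takes a list of `FeatureName` / `ValueAsString` pairs. The feature group name, feature names and the helpers `to_feature_store_record` and `put_row` are hypothetical and only here for illustration.

```python
from typing import Any, Dict, List, Mapping


def to_feature_store_record(row: Mapping[str, Any]) -> List[Dict[str, str]]:
    """Convert a flat dict into the FeatureName/ValueAsString list used by PutRecord.

    Features whose value is None are skipped rather than written as the string "None".
    """
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in row.items()
        if value is not None
    ]


def put_row(client: Any, feature_group_name: str, row: Mapping[str, Any]) -> None:
    """Write one row to an existing feature group.

    `client` is assumed to be created by the caller, for example:
        boto3.client("sagemaker-featurestore-runtime")
    """
    client.put_record(
        FeatureGroupName=feature_group_name,
        Record=to_feature_store_record(row),
    )


# Hypothetical row: it should include the record identifier and event time
# features that were declared when the feature group was created.
example_row = {
    "customer_id": "C-1001",
    "event_time": "2021-10-01T00:00:00Z",
    "order_count_30d": 7,
    "avg_basket_value_30d": None,  # missing value, skipped
}
record = to_feature_store_record(example_row)
```

Keeping the conversion in a pure function means the same code can be reused from a Step Functions task, a SageMaker Pipelines step or an Airflow operator, and it can be unit-tested without touching AWS.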

Prasun Mishra

Hands-on ML practitioner. AWS Certified ML Specialist. Kaggle expert. BIPOC DS Mentor. Working on interesting NLP use cases!