Git for Data - lakeFS

Data Stack
Data Lake, Data Modeling, Data Governance
Status
Paying Customers
Summary

lakeFS (Git for Data) lets you manage your data as code, enabling reproducible, high-quality data pipelines. It supports all standard computation engines and is format agnostic, so it works with any data type. It supports the major object stores and integrates with the rest of your data stack, which makes it a strong fit for data engineers and data scientists. Visit lakefs.io for more information.

Who's using it?
Volvo, Windward, Epcor, Paige, enigma, Karius, Toyota, BAE Systems, AppsFlyer, Air Asia, Netflix, Tomra, Daimler, apex, similarweb, Context Labs, Terramera, Proton Mail

Manage your data as code using Git-like operations and achieve reproducible, high-quality data pipelines. Available as open source or in the cloud.


Take control of your data

Compute Engines

lakeFS supports all standard computation engines.

lakeFS

lakeFS uses metadata to manage data versions. Its versioning engine is highly scalable, with only a minor impact on storage performance.
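To make the idea of metadata-based versioning concrete, here is a toy sketch in Python. It is not lakeFS's implementation or API: `ToyRepo` and its methods are hypothetical names invented for illustration. It only shows the general principle that a version can be a mapping from paths to stored objects, so creating a branch copies a pointer instead of the data.

```python
import hashlib
import json


class ToyRepo:
    """Hypothetical toy model: a version is metadata (path -> object address)."""

    def __init__(self):
        self.objects = {}                 # address -> bytes (stands in for an object store)
        self.commits = {"root": {}}       # commit id -> {path: address}
        self.branches = {"main": "root"}  # branch name -> commit id

    def write(self, branch, path, data):
        """Store data by content hash, then commit an updated path -> address map."""
        address = hashlib.sha256(data).hexdigest()
        self.objects[address] = data
        snapshot = dict(self.commits[self.branches[branch]])
        snapshot[path] = address
        encoded = json.dumps(snapshot, sort_keys=True).encode()
        commit_id = hashlib.sha256(encoded).hexdigest()
        self.commits[commit_id] = snapshot
        self.branches[branch] = commit_id
        return commit_id

    def branch(self, name, source):
        """Create a branch by copying one pointer; no data objects are duplicated."""
        self.branches[name] = self.branches[source]

    def read(self, branch, path):
        """Resolve branch -> commit -> address -> bytes."""
        return self.objects[self.commits[self.branches[branch]][path]]


repo = ToyRepo()
repo.write("main", "events/day1.csv", b"id,value\n1,10\n")
repo.branch("experiment", "main")  # metadata-only operation
repo.write("experiment", "events/day1.csv", b"id,value\n1,99\n")
```

In this sketch, writing to the `experiment` branch leaves `main` untouched, and the object store holds only the two distinct file contents regardless of how many branches reference them.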

Formats

lakeFS is format agnostic: it works with structured data, unstructured data, open table formats, or anything else.

Object Storage

lakeFS supports data in all object stores, including the major cloud providers (S3, Azure Blob, GCP) and on-prem stores (MinIO, Ceph, Dell EMC), as well as any other S3-compatible storage.

Use Cases

lakeFS helps data engineers and data scientists in every field manage their data like code, at scale.

  • Data Science
  • Data Engineering
  • Data Ops

lakeFS is already helping thousands of developers

  • Up to 80%: reduce storage costs
  • 2X: double efficiency
  • Up to 99%: increase production outage recovery

Seamless integration with your entire data stack
