It’s one thing for your business to collect data, but it’s another altogether to read and analyze it in a way that benefits the company. That is where the modern data pipeline comes into play. Whether you’re a startup or a large business, being able to handle data at scale and draw conclusions from it is an absolute must. If you want to realize the full potential of your data, managing and using scalable pipelines will help you gain insights you never thought possible.
Uber began developing its modern data pipeline, Michelangelo, in 2015. It allows internal teams at Uber to build, deploy, and operate machine learning solutions. It was designed to cover the full machine learning workflow: managing data, then training, evaluating, and deploying models. It can also make predictions based on this data and monitor them.
Before Michelangelo, Uber struggled to build and deploy machine learning models at the size and scale of its operations. This limited the impact of machine learning at Uber to what a few select data scientists and engineers could build in a short window of time. The platform now serves not only UberEATS but also dozens of similar predictive use cases across the company.
Using UberEATS as an example, Michelangelo covers meal delivery time predictions, restaurant rankings, search rankings, and search autocomplete. The delivery time model shows the consumer how long a meal will take to prepare and deliver, both before the order is placed and again at each stage of the delivery process. To do so on Michelangelo, the UberEATS data scientists deploy gradient boosted decision tree regression models that predict this end-to-end delivery time. Features used in this prediction include the time of day, the average meal prep time over the last seven days, and the average meal prep time over the last hour.
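To make those inputs concrete, here is a minimal sketch of how the three features could be computed for one restaurant. It is not Uber’s code: the function name `prep_time_features`, the `(completed_at, prep_minutes)` record shape, and the sample data are all assumptions made for illustration. In practice, features like these would be fed into a gradient boosted regression model, which is not shown here.

```python
from datetime import datetime, timedelta
from statistics import mean


def prep_time_features(history, now):
    """Build the three example features for one restaurant at time `now`.

    `history` is a list of (completed_at, prep_minutes) tuples. This record
    shape is hypothetical and chosen only for illustration.
    """
    def window_avg(window):
        # Average prep time over orders completed inside the trailing window.
        values = [mins for ts, mins in history if now - window <= ts <= now]
        return mean(values) if values else None

    return {
        "hour_of_day": now.hour,
        "avg_prep_7d": window_avg(timedelta(days=7)),
        "avg_prep_1h": window_avg(timedelta(hours=1)),
    }


# Made-up sample data: three recent orders and one outside the 7-day window.
now = datetime(2024, 1, 8, 18, 30)
history = [
    (datetime(2024, 1, 2, 12, 0), 20.0),
    (datetime(2024, 1, 7, 19, 0), 30.0),
    (datetime(2024, 1, 8, 18, 0), 10.0),
    (datetime(2023, 12, 25, 12, 0), 99.0),
]
features = prep_time_features(history, now)
print(features)
```

The short window reacts to what is happening in the kitchen right now, while the long window smooths out one-off spikes, which is one plausible reason to use both.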
As the world’s largest social fundraising company, GoFundMe has raised over $3 billion and has more than 25 million donors. Despite this, the company was missing a central warehouse to store the data from its backend relational databases, online events and metrics, support service, and other sources, which came to roughly one billion events every month. Without this centralization, its analytics were isolated, preventing the IT staff from getting a comprehensive view of where the business was going.
GoFundMe knew it needed a flexible and adaptable data pipeline to obtain this view. In the end, its pipeline had the connectors needed for all of its data sources, while also providing the ability to write custom Python scripts to modify data as needs arose, giving the team complete control. In addition to flexibility, this pipeline also gave GoFundMe integrity, as it included safeguards against custom ETL scripts corrupting data. One of these is the ability to re-stream data so that it doesn’t get lost, broken, or duplicated within the pipeline.
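One common way to make re-streaming safe is to load events idempotently, keyed by a unique event ID, so that replaying a batch fills in anything that was missed without creating duplicates. The sketch below illustrates that idea only. It is not GoFundMe’s or its vendor’s implementation, and the `apply_events` function, the in-memory `warehouse` dictionary, and the event fields are hypothetical.

```python
def apply_events(store, events):
    """Load events into `store` keyed by event id, so replays are harmless.

    Returns the number of events that were newly written.
    """
    applied = 0
    for event in events:
        event_id = event["id"]
        if event_id in store:
            continue  # already loaded on an earlier pass; skip the duplicate
        store[event_id] = event
        applied += 1
    return applied


# A plain dict stands in for the warehouse table in this sketch.
warehouse = {}
batch = [
    {"id": "evt-1", "type": "donation", "amount": 25},
    {"id": "evt-2", "type": "donation", "amount": 50},
]

# Simulate a load that stopped after the first event...
first_pass = apply_events(warehouse, batch[:1])
# ...then re-stream the whole batch: only the missing event is written.
replayed = apply_events(warehouse, batch)
print(first_pass, replayed, len(warehouse))
```

Because the load is keyed by ID, the pipeline can replay from any earlier point after a failure without first working out exactly which events had already arrived.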
AT&T is a major telecommunications company that millions of residents, federal and local government agencies, and large businesses rely on for their communication and television needs. To provide these services, it depends heavily on a seamless data flow. AT&T also houses many customers’ data centers, provides cloud services, and supports interactive voice response solutions. When a change was made to the Federal Trade Commission Telemarketing Sales Rule, AT&T was required to retain its call recordings for a longer period of time, which meant it needed to rethink its data movement technology.
AT&T now needed reliable data movement and high-speed file transfer to move these audio recordings. This was a challenge because the company outsources most of its call center services to third-party vendors, so it had to securely move this information from 17 call centers to its data centers. In the end, AT&T needed a pipeline that not only offered greater transmission speed and capacity but was also easy to install, use, automate, and schedule.
AT&T’s new pipeline ensured the company could comply with the Federal Trade Commission mandate, as it made it easy to transfer tens of thousands of massive audio files a day. It also features comprehensive reporting, advanced security, a multitude of configuration options, and customizable workflows.
Create Data Pipelines with Egen
At Egen, we recognize the importance of data in all of its forms. That is why we specialize in modern data pipelines, so that you can use the data you collect in the ways that matter most for your business. Whether you’re a startup or a large business, the ability to read and scale your own data is the first step to gaining an edge over the competition. If you are unsure how to go about this, or what the process involves, contact Egen to get started and learn more about our own data pipeline, Kernel, and how it can work for your business.