I'm Learning Data Engineering By Building Real Products (Here's What No One Told Me)
I'm going to be honest with you: I'm pretty new to data engineering.
Eight months ago, I completed the Data Analytics Pathway through Persevere—a program for justice-impacted individuals learning tech. I could analyze data in Tableau and Power BI. I understood SQL well enough to pull what I needed from a database. (I was actually pretty good at it.) But I had never built an actual pipeline that functioned as an automated data cleaner.
I was completely lost.
Now, eight months later, I'm working on ETL pipelines for real products at Banyan Labs, including data infrastructure for HRPR—software that helps cities like Nashville manage housing resources. And I'm still learning every single day from senior engineers way smarter than me. Let me share what that learning curve actually looks like—maybe it helps someone else starting out.
The Gap Between "Learning Data" and "Building Pipelines"
Here's what surprised me most about moving from analytics to engineering: Knowing how to work WITH data is completely different from knowing how to BUILD systems that move, clean, and transform data automatically.
In the Data Analytics Pathway at Persevere, I learned to query databases and visualize trends. I could showcase data using Tableau and Power BI. I was comfortable with SQL.
But my first task at Banyan Labs? Build an ETL process for Nashville housing data for HRPR. (I had no idea where to start.) The manual tasks I once did in data analysis—cleaning messy datasets, reformatting inconsistent entries, merging data from different sources—now needed to happen automatically, at scale, without me clicking through spreadsheets.
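The kind of manual cleanup I'm talking about can be sketched in a few lines of Python. (The field names and values below are invented for illustration—they're not the real Nashville data—but the pattern is the one a pipeline has to automate: normalize each source, then merge on a shared key.)

```python
# Two toy "sources" with inconsistent formatting for the same records.
city_records = [
    {"parcel_id": " 001-A ", "units": "12"},
    {"parcel_id": "002-B", "units": "8"},
]
survey_records = [
    {"parcel_id": "001-a", "occupied": "yes"},
    {"parcel_id": "002-b", "occupied": "no"},
]

def normalize(record):
    """Trim whitespace on every string field and standardize the join key's case."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    cleaned["parcel_id"] = record["parcel_id"].strip().upper()
    return cleaned

# Merge the two sources on the normalized key.
surveys = {r["parcel_id"]: r for r in map(normalize, survey_records)}
merged = [normalize(r) | surveys.get(normalize(r)["parcel_id"], {}) for r in city_records]

print(merged[0])  # parcel 001-A now carries fields from both sources
```

Doing this once in a spreadsheet is easy. Making it run on every new file, forever, without you watching—that's the engineering part.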
According to recent industry data, manual ETL maintenance consumes 60-80% of data engineering time; at the high end, that's four hours spent maintaining existing pipelines for every hour spent building new capabilities (Hevo Data, 2026). I quickly learned why: building systems that run reliably without human intervention is exponentially harder than analyzing data manually.
The Moment It Finally Started to Click
I spent my first two weeks at Banyan Labs completely overwhelmed. Stephen and Andrew—both senior engineers on our team—walked me through ETL architecture over and over. But I kept getting lost in the details. Where does the data come from? What transforms it? Where does it go? How does it know when to run? Then Stephen drew me a diagram. (It was surprisingly simple.) He showed me the three core pieces:
Extract: Pull data from sources (databases, APIs, files)
Transform: Clean it, reshape it, make it useful
Load: Put it where it needs to go
Once I understood what each piece was responsible for, it began to make sense.
Not that I could build one yet. But at least I could see the shape of what I was trying to build.
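Stephen's diagram maps almost directly onto code. Here's a toy sketch of that shape—the source is a hardcoded list standing in for a real database or API, and the "warehouse" is just a Python list, but the three-function skeleton is the one real pipelines grow out of:

```python
def extract():
    """Pull raw rows from a source (here: a hardcoded stand-in for a DB or API)."""
    return [{"address": "  123 Main St ", "rent": "950"},
            {"address": "456 Oak Ave", "rent": "1200"}]

def transform(rows):
    """Clean and reshape: trim stray whitespace, cast numbers to real types."""
    return [{"address": r["address"].strip(), "rent": int(r["rent"])} for r in rows]

def load(rows, destination):
    """Put the cleaned rows where they need to go (here: an in-memory list)."""
    destination.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
```

Everything else in a real pipeline—scheduling, retries, monitoring—is scaffolding around these three steps.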
(That was Week 3. I'm now on Month 8. I'm still learning what "production-grade" actually means.)
The Mistake I Keep Making
Here's the pattern I've noticed in myself—and maybe you'll recognize it too if you're learning:
I move too fast. I try to build before I understand the foundational architecture. When I first started on the Nashville data pipeline, I wanted to jump straight into writing code. I had used Python in my analytics work, so I figured I could just... start building things. Andrew stopped me. (Thank goodness.) "Before you write any code," he said, "draw the architecture. Where's the data coming from? What happens if the source is down? What happens if the data format changes? How do we know if it fails?"
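Andrew's questions turn into concrete guard rails once you write them down. Here's a rough sketch of what answering them can look like in an extract step—the retry count, delay, and required fields are arbitrary choices for illustration, not our actual config:

```python
import time

REQUIRED_FIELDS = {"address", "rent"}  # assumed schema for this toy example

def fetch_rows():
    """Stand-in for a real source call that might raise if the source is down."""
    return [{"address": "123 Main St", "rent": "950"}]

def extract_with_guards(retries=3, delay=1.0):
    # "What happens if the source is down?" -> retry with backoff, then fail loudly.
    for attempt in range(1, retries + 1):
        try:
            rows = fetch_rows()
            break
        except OSError as err:
            if attempt == retries:
                raise RuntimeError(f"source unreachable after {retries} tries") from err
            time.sleep(delay * attempt)
    # "What happens if the data format changes?" -> validate before transforming.
    for row in rows:
        missing = REQUIRED_FIELDS - row.keys()
        if missing:
            # "How do we know if it fails?" -> raise (or log/alert) instead of
            # silently loading bad data downstream.
            raise ValueError(f"row missing fields: {missing}")
    return rows
```

None of this logic existed in my first draft. That's the gap between code that works once and a pipeline.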
I hadn't thought about any of that.
The modern ETL landscape is shifting rapidly—by 2026, data engineers are transitioning from manual coding to strategic roles where they "architect systems, validate AI-generated code and play a greater role in business decisions" (Snowflake, 2025). But none of that matters if you don't understand the fundamentals first.
I'm learning that the hard way.
What I'm Still Struggling With Right Now
Eight months in, I'm still struggling to understand how data lakes and Redis fit into a multi-tenancy architecture. Our lead architect explains it. I think I get it. Then I sit down to actually implement something and realize I don't understand it at all. The good news? Stephen and Andrew are patient. They've both told me: "You're not supposed to understand everything in eight months. Keep asking questions." So I do. Constantly.
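The one piece that has started to stick for me is key namespacing: each tenant's data lives under its own prefix, so one city's records can never collide with another's. A toy sketch of the idea—a plain dict stands in for Redis here, and the key format is something I made up, not what HRPR actually uses:

```python
# A dict stands in for a Redis instance; with real Redis you'd apply the same
# key convention through client.set() / client.get().
store = {}

def tenant_key(tenant_id, key):
    """Namespace every key by tenant so tenants can never read each other's data."""
    return f"tenant:{tenant_id}:{key}"

def put(tenant_id, key, value):
    store[tenant_key(tenant_id, key)] = value

def get(tenant_id, key):
    return store.get(tenant_key(tenant_id, key))

put("nashville", "open_units", 42)
put("memphis", "open_units", 17)
# Same logical key, different tenants, no collision:
print(get("nashville", "open_units"))  # 42
print(get("memphis", "open_units"))    # 17
```

How that interacts with the data lake side is the part I'm still asking questions about.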
How I Actually Spend My Time Now
From a data engineering perspective, I use these skills mostly for marketing automation at Banyan Labs. I've built pipelines that automate routine and clerical tasks—pulling data from different sources, cleaning it, and making it accessible for decision-making. It's not the massive-scale data engineering you read about in tech blogs. But it's real, it's production, and people use it every day. (And when it breaks, I'm the one who has to fix it. Which is its own kind of learning.)
The Resources That Actually Helped
Here's what's been most useful for me:
Design Patterns, the classic Gang of Four book—Stephen recommended this early on, and it's been invaluable for understanding how to structure code that doesn't fall apart.
Pair programming with senior engineers—I learn more in 30 minutes working alongside Andrew than in hours of solo work.
Actually breaking things—Every time a pipeline fails, I learn something I couldn't have learned from a book.
There are endless online courses about data engineering. (I've tried several.) But nothing beats building real infrastructure with real consequences when it fails, guided by people who've done it for years.
What I'd Tell Myself Eight Months Ago
If I could go back to Day 1, here's what I'd say: It takes as long as it takes. Don't try to rush it. (I'm still learning this one.) I wanted to be good immediately. I wanted to build impressive pipelines my first month. I felt behind because I came from a non-traditional background—a bootcamp program instead of a CS degree. But eight months in, I've learned something important: the engineers I respect most are the ones who admit what they don't know and keep learning anyway. I don't have years of experience. I'm not building the kind of massive-scale systems you read about on tech blogs. But I'm building real products for real users, and I'm learning from people way smarter than me. That's enough for right now.
Before You Start Your Own Learning Journey
If you're thinking about learning data engineering—especially from a non-traditional background like mine—here's what I wish someone had told me:
You don't need to know everything before you start. (You literally can't. You learn by building.)
Find people who will teach you. Stephen and Andrew have been incredibly patient with my questions. That mentorship matters more than any course.
Build something real, even if it's small. My first pipeline was tiny. It still is, compared to what senior engineers build. But it taught me more than months of tutorials.
Get comfortable asking "dumb" questions. (They're usually not as dumb as you think.)
It takes as long as it takes. I'm eight months in and still feel like a beginner most days. Apparently, that's normal.
What I'm Still Figuring Out
I've completed one real ETL pipeline. I'm working on others when I'm not wearing different hats. I test products like JONA to understand how data flows through production systems. But honestly? I'm still in the early stages of understanding what data engineering actually is.
The difference is that now I know what I don't know. I can see the shape of the knowledge I'm working toward. I have senior engineers who are willing to teach me. And I've learned that "not knowing yet" doesn't mean "can't learn." If you've been doing data engineering for years, you probably already know everything I just wrote. (And if I'm missing something important—which I probably am—I genuinely want to hear about it.) But if you're just starting out? If you're wondering whether someone from a non-traditional background can actually learn this stuff? Maybe my eight-month journey helps answer that question. What did you struggle with most when you were learning data engineering? What do you wish someone had told you on Day 1?
___________________________________________________________________________
References
Hevo Data. (2026, January 10). ETL trends 2026: Key shifts reshaping data integration. https://hevodata.com/learn/etl-trends/
Snowflake. (2025, December 19). From ETL to autonomy: Data engineering in 2026. The New Stack. https://thenewstack.io/from-etl-to-autonomy-data-engineering-in-2026/
