Data Engineer Things Newsletter #20
Numaflow webinar, free ebook on Airflow 3, volunteering, and Seattle coffee social.
Hey there,
I’m writing to you from San Francisco while attending the Data + AI Summit this week! On Monday, I had the honor of attending the Open Lakehouse Mini Summit, a mini-conference for open-source contributors. I enjoyed learning about the Apache Spark 4.x roadmap, including real-time streaming mode, the variant type, DataSource API v2, and declarative pipelines, and joining open discussions on yet-to-be-solved problems like handling multimodal data and BLOBs at scale.
🗓 DET Webinar on Numaflow on June 18
DET Webinar is back! In the upcoming session, you will learn about Numaflow, an open-source framework from Intuit that rethinks real-time event processing for a broader audience of engineers and developers. Whether you're working on real-time processing, designing event-driven applications, or powering ML workflows, Numaflow offers a simple, flexible way to connect streaming sources, transform events, and move data in real time. Join us to see how modern stream processing is becoming more accessible and why it matters more than ever.
Speakers: Sri Harsha Yayi (Staff Product Manager, Intuit) and Vigith Maurice (Principal Software Engineer, Intuit)
When: 10 am - 10:45 am PT, Wednesday, June 18
Where: Google Meet
👉🏼 RSVP
🎬 Free eBook: Practical Guide to Apache Airflow 3
Apache Airflow is one of the most popular open-source frameworks for data orchestration. Whether you are a new or experienced Airflow user, this book will be your guide for getting started with Airflow 3. You'll learn how to:
Set up a local development environment and write your first pipeline.
Use new Airflow 3.0 features, including DAG versioning, backfill, asset-oriented syntax, and the new UI.
Prepare your DAGs for a smooth upgrade from Airflow 2 to 3.
👉🏼 Read the book HERE.
(This message is sponsored by Astronomer.)
🙋🏻‍♀️ Volunteers Needed
We are looking for passionate and dedicated volunteers to join the DET team and shape the future of our community! As a volunteer, you will help us build the new mentorship program, design a one-stop community site, curate content for the newsletter, or host community events. This is a fantastic opportunity to make meaningful connections, learn technical and soft skills, and enrich your professional portfolio!
👉🏼 Apply to be a volunteer HERE.
☕️ Seattle Coffee Social on June 13
Meet me and Saransh Arora (DET Seattle Lead) in Downtown Seattle for a casual coffee social:
Where: Mr West Cafe Bar Downtown, 720 Olive Wy, Seattle, WA 98101
When: 5 pm - 6 pm, Friday, June 13
👉🏼 RSVP
📚 Articles of the Week
How We Implemented a Custom B-tree to Handle 10TB of Time-Series Data by Coders Stop
Why Are There So Many Databases? by Cai Parry-Jones
Data Quality With Airflow SQL Check Operators: A Step-by-Step Guide by Lorena Gongang (You might think you need some advanced framework for Data Quality, but sometimes Airflow could be all you need. This is a comprehensive follow-along guide for any engineer who wants to implement DQ in their pipelines.)
Prompting Strategies by Google Cloud (a.k.a. how to clearly communicate your requirements to GenAI to get exactly what you need.)
Have a great week and see you next time!
Cheers,
Xinran Waibel
Head of the Data Engineer Things community
(💡 Don’t forget to follow us on LinkedIn and join the DET Slack community!)