Leverage Amazon SageMaker Processing to run serverless Spark applications from AWS Lambda.

Data illustrations by Storyset

A widely known big data processing framework such as Apache Spark needs no introduction. If you are reading this post, you most likely know what you are getting into, and just like me, you are curious to know if it is possible to run serverless Spark jobs from an AWS Lambda function.

That also means you are familiar with AWS and serverless services such as AWS Lambda.

That being said, we all know that a little bit of context “never hurt nobody”. So let’s start with Spark!


Learn how to package AWS Lambda functions as container images with AWS CDK and Python.

Web illustrations by Storyset

AWS Lambda is one of the most commonly known services from AWS. As per the AWS website, it is a compute service that lets you run code without provisioning or managing servers. Your code runs only when needed, and it scales automatically from a few requests per day to thousands per second.

You can write your code for AWS Lambda using different runtimes. At the time of writing, the supported runtimes are Node.js, Python, Java, C#, Ruby, Go, and PowerShell.

Recently, AWS announced…

Reduce SageMaker Studio cost by shutting down apps during idle periods.

Technology illustrations by Storyset

Amazon SageMaker Studio lets you manage your entire ML workflow, by providing features that improve the overall ML engineering experience. It provides one-click Jupyter notebooks that can be spun up easily with fully elastic compute resources.

The ability to provision infrastructure with ease is extremely useful for a data scientist, but it should be handled with care as these resources do not scale down automatically when idle.

Users need to shut down instances manually to reduce cost, as in SageMaker Studio idle resources will incur costs as much…

Learn key principles and best practices to optimize your cost on AWS.

Photo by Scott Graham on Unsplash

As a public cloud provider, Amazon Web Services (AWS) offers scalable, reliable, and relatively inexpensive cloud computing services. But whether you are starting a new project or migrating your current organization to AWS, it is easy to overlook one of the most important factors of your AWS journey: your AWS cost.

Managing AWS cost is one of the most common problems AWS customers face today. Luckily, to avoid situations where you regularly spend far more than you should, AWS provides a set of solutions to help you control and optimize your spending with tools, resources, and services to…

Learn how to use Amazon SNS to create email and phone notifications with Python.


AWS describes Amazon Simple Notification Service (Amazon SNS) as a fully managed messaging service for both application-to-application and application-to-person communication. Amazon SNS allows us to distribute messages to a large number of subscriber systems, including AWS Lambda functions, Amazon SQS queues, HTTPS endpoints, and Amazon Kinesis Data Firehose.

In this post, you will learn how to use Amazon SNS to create email and phone notifications with Python. …
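As a rough idea of what this can look like, here is a minimal sketch that assumes a boto3 SNS client is passed in. The helper names `subscribe_endpoint` and `notify_subscribers` are hypothetical and only used for illustration; `subscribe` and `publish` are the boto3 client calls they wrap.

```python
def subscribe_endpoint(sns_client, topic_arn, protocol, endpoint):
    """Subscribe an endpoint to a topic and return the subscription ARN.

    Hypothetical helper: protocol is "email" for an email address
    or "sms" for a phone number.
    """
    response = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol=protocol,
        Endpoint=endpoint,
        ReturnSubscriptionArn=True,
    )
    return response["SubscriptionArn"]


def notify_subscribers(sns_client, topic_arn, subject, message):
    """Publish one message to every subscriber of the topic.

    Hypothetical helper: returns the MessageId reported by SNS.
    """
    response = sns_client.publish(
        TopicArn=topic_arn,
        Subject=subject,
        Message=message,
    )
    return response["MessageId"]
```

With valid AWS credentials, the client would typically come from `boto3.client("sns")`. Keep in mind that an email subscription stays pending until the recipient confirms it.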

Use a familiar programming language like Python to create and access Amazon S3 resources.

Photo by Chris Ried on Unsplash


In part 1 of this series, you can learn how to interact with Amazon S3 buckets using the AWS SDK for Python (boto3). We were able to create and list buckets, as well as upload and download files from Amazon S3.

In this post, we will describe how to use Python and presigned URLs to grant temporary access to buckets and objects for users who do not have AWS credentials or permission to access S3 objects. …
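To make the idea concrete, here is a minimal sketch that assumes a boto3 S3 client is passed in. The helper name `presigned_download_url` and the one-hour default expiry are assumptions for illustration; `generate_presigned_url` is the boto3 client call it wraps.

```python
def presigned_download_url(s3_client, bucket, key, expires_in=3600):
    """Return a time-limited URL for downloading one S3 object.

    Hypothetical helper: expires_in is the URL lifetime in seconds.
    """
    return s3_client.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

With valid AWS credentials, the client would typically come from `boto3.client("s3")`. Anyone holding the returned URL can download the object until the URL expires, so it is worth keeping the lifetime as short as the use case allows.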

Use a familiar programming language like Python to create and access Amazon S3 resources.

Photo by Chris Ried on Unsplash


Amazon Web Services (AWS) describes Amazon Simple Storage Service (Amazon S3) as an object storage service that offers scalability, data availability, security, and performance. It serves a myriad of use cases for companies of all sizes and industries, including data lakes, mobile applications, websites, backups, archives, enterprise applications, and many more.

In this post, we will describe how to use Python (AWS SDK) to perform common operations on S3 buckets.
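As a small preview of what such operations can look like, here is a minimal sketch that assumes a boto3 S3 client is passed in. The helper names `bucket_names` and `upload` are hypothetical; `list_buckets` and `upload_file` are the boto3 client calls they wrap.

```python
def bucket_names(s3_client):
    """Return the names of all buckets visible to the client (hypothetical helper)."""
    response = s3_client.list_buckets()
    return [bucket["Name"] for bucket in response["Buckets"]]


def upload(s3_client, local_path, bucket, key):
    """Upload a local file to s3://bucket/key (hypothetical helper)."""
    s3_client.upload_file(Filename=local_path, Bucket=bucket, Key=key)
```

With valid AWS credentials, the client would typically come from `boto3.client("s3")`.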

Create an Amazon S3 bucket

Before creating an Amazon S3 bucket, there are a few things to consider:

  • The name of…

Extend the capabilities of SageMaker Studio container images with new libraries.

In the following post, you will learn how to extend the SageMaker Studio Spark container image to incorporate additional libraries and interact with Google Cloud services such as BigQuery. We will then create a notebook to retrieve data from a BigQuery table using Amazon SageMaker Studio.

SageMaker Studio (Image by author)


On December 3, 2019, AWS introduced Amazon SageMaker Studio as The First Fully Integrated Development Environment For Machine Learning. …

Create serverless services in a reliable and reproducible manner using AWS CDK and Python.

Illustration by Freepik Storyset

Serverless computing alleviates operational overhead for developers by allowing cloud providers to handle infrastructure management tasks such as capacity provisioning. Developers can focus on their requirements and business goals, while only paying for the compute resources they use.

AWS Lambda and AWS Fargate are prime examples of serverless services with features like automatic scaling, built-in high availability, and a pay-for-value billing model. These services increase agility and reduce both cost and time to market for the solutions we create.

An efficient way to build and deploy serverless services…

This is the way!

AWS Cloud Development Kit (Image by author)

Infrastructure as code (IaC) enables management and provisioning of infrastructure through machine-readable definition files. IaC provides consistency by having files as descriptive models be the single source of truth.

For cloud providers such as AWS, CloudFormation is the IaC enabler. CloudFormation helps model resources by describing them in templates that can be deployed as stacks. It provides the ability to easily build, manage, change, and destroy resources in your infrastructure through resource definition files.

Unfortunately, despite all the advantages CloudFormation brings to the table, designing complex infrastructures using CloudFormation templates becomes a tedious process with manual actions and long YAML/JSON…

Ramon Marrero

Head of Data Engineering | AWS Community Builder | AWS Certified Solutions Architect
