
How ENGIE scales their data ingestion pipelines using Amazon MWAA


ENGIE, one of the largest utility providers in France and a global player in the zero-carbon energy transition, produces, transports, and supplies electricity, gas, and energy services. With 160,000 employees worldwide, ENGIE is a decentralized organization that operates 25 business units with a high level of delegation and empowerment. ENGIE’s decentralized global customer base had accumulated lots of data, and the company needed a smarter, unified approach and solution to align its initiatives and provide data that is ingestible, organizable, governable, sharable, and actionable across its global business units.

In 2018, the company’s business leadership decided to accelerate its digital transformation through data and innovation by becoming a data-driven company. Yves Le Gélard, chief digital officer at ENGIE, explains the company’s purpose: “Sustainability for ENGIE is the alpha and the omega of everything. That is our raison d’être. We help large companies and the biggest cities on earth in their attempts to transition to zero carbon as quickly as possible because it is truly the number one question for humanity today.”

ENGIE, like any other large enterprise, uses multiple extract, transform, and load (ETL) tools to ingest data into its data lake on AWS. These tools, however, often come with expensive licensing plans. “The company needed a uniform method of collecting and analyzing data to help customers manage their value chains,” says Gregory Wolowiec, the Chief Technology Officer who leads ENGIE’s data program. ENGIE wanted a license-free application, well integrated with multiple technologies and with a continuous integration, continuous delivery (CI/CD) pipeline to more easily scale its ingestion processes.

ENGIE started using Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to solve this issue and began moving various data sources to a centralized data lake on top of Amazon Simple Storage Service (Amazon S3): on-premises applications and ERPs, AWS services such as Amazon Redshift, Amazon Relational Database Service (Amazon RDS), and Amazon DynamoDB, external services such as Salesforce, and other cloud providers.

Amazon MWAA is used in particular to collect and store harmonized operational and corporate data from different on-premises and software as a service (SaaS) data sources into a centralized data lake. The purpose of this data lake is to create a “group performance cockpit” that enables efficient, data-driven analysis and thoughtful decision-making by the ENGIE Management board.

In this post, we share how ENGIE created a CI/CD pipeline for an Amazon MWAA project template using an AWS CodeCommit repository and plugged it into AWS CodePipeline to build, test, and package the code and custom plugins. In this use case, we developed a custom plugin to ingest data from Salesforce based on the Airflow Salesforce open-source plugin.

Solution overview

The following diagrams illustrate the solution architecture, defining the implemented Amazon MWAA environment and its associated pipelines. They also describe the customer use case of Salesforce data ingestion into Amazon S3.

The following diagram shows the architecture of the deployed Amazon MWAA environment and the implemented pipelines.

The preceding architecture is fully deployed via infrastructure as code (IaC). The implementation includes the following:

  • Amazon MWAA environment – A customizable Amazon MWAA environment packaged with plugins and requirements and configured in a secure manner (a minimal declaration sketch follows this list).
  • Provisioning pipeline – The admin team can manage the Amazon MWAA environment using the included CI/CD provisioning pipeline. This pipeline includes a CodeCommit repository plugged into CodePipeline to continuously update the environment and its plugins and requirements.
  • Project pipeline – This CI/CD pipeline comes with a CodeCommit repository that triggers CodePipeline to continuously build, test, and deploy DAGs developed by users. Once deployed, these DAGs are made available in the Amazon MWAA environment.
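The following minimal sketch shows how such an environment can be declared with the AWS CDK L1 construct for Amazon MWAA. It is not the exact code from the sample repository; the bucket ARN, role ARN, and network identifiers are illustrative placeholders.

# Minimal sketch of an Amazon MWAA environment declared with the CDK L1 construct.
# The ARNs and network identifiers below are illustrative placeholders; the sample
# repository wires these values from its own VPC, S3 bucket, and IAM role resources.
from aws_cdk import Stack, aws_mwaa as mwaa
from constructs import Construct


class MwaaEnvironmentStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        mwaa.CfnEnvironment(
            self,
            "MwaaEnvironment",
            name="MwaaEnvironment",
            environment_class="mw1.small",
            max_workers=1,
            webserver_access_mode="PUBLIC_ONLY",
            # Bucket that holds dags/, plugins.zip, and requirements.txt
            source_bucket_arn="arn:aws:s3:::my-mwaa-source-bucket",
            dag_s3_path="dags",
            execution_role_arn="arn:aws:iam::123456789012:role/my-mwaa-execution-role",
            network_configuration=mwaa.CfnEnvironment.NetworkConfigurationProperty(
                security_group_ids=["sg-0123456789abcdef0"],
                subnet_ids=["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
            ),
        )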

The following diagram shows the data ingestion workflow, which includes the following steps:

  1. The DAG is triggered by Amazon MWAA manually or based on a schedule.
  2. Amazon MWAA initiates data collection parameters and calculates batches.
  3. Amazon MWAA distributes processing tasks among its workers.
  4. Data is retrieved from Salesforce in batches.
  5. Amazon MWAA assumes an AWS Identity and Access Management (IAM) role with the required permissions to store the collected data into the target S3 bucket.

This AWS Cloud Development Kit (AWS CDK) construct is implemented with the following security best practices:

  • Following the principle of least privilege, you grant permissions only to the resources or actions that users need to perform tasks.
  • S3 buckets are deployed with security compliance rules: encryption, versioning, and blocking public access.
  • Authentication and authorization management is handled with AWS Single Sign-On (AWS SSO).
  • Airflow stores connections to external sources in a secure manner, either in Airflow’s default secrets backend or in an alternative secrets backend such as AWS Secrets Manager or AWS Systems Manager Parameter Store (see the configuration sketch after this list).
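For example, if you choose Secrets Manager as the backend, the environment’s Airflow configuration options point at the Amazon provider’s Secrets Manager backend. The following is a minimal sketch based on the public Airflow and Amazon MWAA documentation; the prefixes are illustrative.

# Airflow configuration options for an Amazon MWAA environment that reads
# connections and variables from AWS Secrets Manager (prefixes are illustrative).
airflow_configuration_options = {
    "secrets.backend": "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend",
    "secrets.backend_kwargs": '{"connections_prefix": "airflow/connections", "variables_prefix": "airflow/variables"}',
}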

For this post, we step through a use case of ingesting data from Salesforce into an ENGIE data lake in order to transform it and build business reports.

Prerequisites for deployment

For this walkthrough, the following are prerequisites:

  • Basic knowledge of the Linux operating system
  • Access to an AWS account with administrator or power user (or equivalent) IAM role policies attached
  • Access to a shell environment, or optionally AWS CloudShell

Deploy the solution

To deploy and run the solution, complete the following steps:

  1. Install AWS CDK.
  2. Bootstrap your AWS account.
  3. Define your AWS CDK environment variables.
  4. Deploy the stack.

Install AWS CDK

The described solution is fully deployed with AWS CDK.

AWS CDK is an open-source software development framework to model and provision your cloud application resources using familiar programming languages. If you want to familiarize yourself with AWS CDK, the AWS CDK Workshop is a great place to start.

Install AWS CDK using the following commands:

npm install -g aws-cdk
# To check the installation
cdk --version

Bootstrap your AWS account

First, you need to make sure that the environment where you plan to deploy the solution has been bootstrapped. You only need to do this one time per environment where you want to deploy AWS CDK applications. If you’re unsure whether your environment has already been bootstrapped, you can always run the command again:

cdk bootstrap aws://YOUR_ACCOUNT_ID/YOUR_REGION

Define your AWS CDK environment variables

On Linux or macOS, define your environment variables with the following code:

export CDK_DEFAULT_ACCOUNT=YOUR_ACCOUNT_ID
export CDK_DEFAULT_REGION=YOUR_REGION

On Windows, use the following code:

setx CDK_DEFAULT_ACCOUNT YOUR_ACCOUNT_ID
setx CDK_DEFAULT_REGION YOUR_REGION

Deploy the stack

By default, the stack deploys a basic Amazon MWAA environment with the associated pipelines described previously. It creates a new VPC in order to host the Amazon MWAA resources.

The stack can be customized using the parameters listed in the following table.

To pass a parameter to the construct, you can use the AWS CDK runtime context. If you intend to customize your environment with multiple parameters, we recommend using the cdk.json context file with version control (see the example after the table) to avoid unexpected changes to your deployments. Throughout our example, we pass only one parameter to the construct, so for the simplicity of the tutorial, we use the --context or -c option of the cdk command, as in the following example:

cdk deploy -c paramName=paramValue -c paramName=paramValue ...

Parameter | Description | Default | Valid values
vpcId | VPC ID where the cluster is deployed. If none, creates a new one and needs the parameter cidr in that case. | None | VPC ID
cidr | The CIDR for the VPC that is created to host Amazon MWAA resources. Used only if vpcId is not defined. | 172.31.0.0/16 | IP CIDR
subnetIds | Comma-separated list of subnet IDs where the cluster is deployed. If none, looks for private subnets in the same Availability Zone. | None | Subnet ID list (comma separated)
envName | Amazon MWAA environment name | MwaaEnvironment | String
envTags | Amazon MWAA environment tags | None | See the following JSON example: '{"Environment":"MyEnv", "Application":"MyApp", "Reason":"Airflow"}'
environmentClass | Amazon MWAA environment class | mw1.small | mw1.small, mw1.medium, mw1.large
maxWorkers | Amazon MWAA maximum workers | 1 | int
webserverAccessMode | Amazon MWAA environment access mode (private or public) | PUBLIC_ONLY | PUBLIC_ONLY, PRIVATE_ONLY
secretsBackend | Amazon MWAA environment secrets backend | Airflow | SecretsManager
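If you customize several parameters at once, you can keep them under version control in the cdk.json context file instead of passing them on the command line. A minimal sketch with illustrative values:

{
  "context": {
    "vpcId": "vpc-0123456789abcdef0",
    "envName": "MwaaEnvironment",
    "environmentClass": "mw1.small",
    "maxWorkers": 2,
    "webserverAccessMode": "PRIVATE_ONLY",
    "secretsBackend": "SecretsManager"
  }
}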

Clone the GitHub repository:

git clone https://github.com/aws-samples/cdk-amazon-mwaa-cicd

Deploy the stack using the following commands:

cd mwaairflow &&
pip install . &&
cdk synth &&
cdk deploy -c vpcId=YOUR_VPC_ID

The following screenshot shows the stack deployment:

The following screenshot shows the deployed stack:

Create solution resources

For this walkthrough, you need a Salesforce account. If you don’t have one, you can create a Salesforce developer account:

  1. Sign up for a developer account.
  2. Copy the host from the email that you receive.
  3. Log in to your new Salesforce account.
  4. Choose the profile icon, then Settings.
  5. Choose Reset My Security Token.
  6. Check your email and copy the security token that you receive.

After you complete these prerequisites, you’re ready to create the following resources:

  • An S3 bucket for Salesforce output data
  • An IAM role and IAM policy to write the Salesforce output data on Amazon S3
  • A Salesforce connection on the Airflow UI to be able to read from Salesforce
  • An AWS connection on the Airflow UI to be able to write on Amazon S3
  • An Airflow variable on the Airflow UI to store the name of the target S3 bucket

Create an S3 bucket for Salesforce output data

To create an output S3 bucket, complete the following steps:

  1. On the Amazon S3 console, choose Create bucket.

The Create bucket wizard opens.

  2. For Bucket name, enter a DNS-compliant name for your bucket, such as airflow-blog-post.
  3. For Region, choose the Region where you deployed your Amazon MWAA environment, for example, US East (N. Virginia) us-east-1.
  4. Choose Create bucket.

For more information, see Creating a bucket.
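If you prefer to script this step, the same bucket can be created with the AWS SDK for Python. The following is a minimal sketch using the example bucket name and Region from this post:

import boto3

# Create the output bucket in us-east-1 (the example Region from this post).
# In us-east-1, CreateBucketConfiguration must be omitted; in any other Region,
# pass CreateBucketConfiguration={"LocationConstraint": "<your-region>"}.
s3 = boto3.client("s3", region_name="us-east-1")
s3.create_bucket(Bucket="airflow-blog-post")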

Create an IAM role and IAM policy to write the Salesforce output data on Amazon S3

In this step, we create an IAM policy that allows Amazon MWAA to write to your S3 bucket.

  1. On the IAM console, in the navigation pane, choose Policies.
  2. Choose Create policy.
  3. Choose the JSON tab.
  4. Enter the following JSON policy document, and replace airflow-blog-post with your bucket name:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:ListBucket"],
          "Resource": ["arn:aws:s3:::airflow-blog-post"]
        },
        {
          "Effect": "Allow",
          "Action": [
            "s3:PutObject",
            "s3:GetObject",
            "s3:DeleteObject"
          ],
          "Resource": ["arn:aws:s3:::airflow-blog-post/*"]
        }
      ]
    }

  5. Choose Next: Tags.
  6. Choose Next: Review.
  7. For Name, choose a name for your policy (for example, airflow_data_output_policy).
  8. Choose Create policy.

Let’s attach the IAM policy to a new IAM role that we use in our Airflow connections.

  1. On the IAM console, choose Roles in the navigation pane and then choose Create role.
  2. In the Or select a service to view its use cases section, choose S3.
  3. For Select your use case, choose S3.
  4. Search for the name of the IAM policy that we created in the previous step (airflow_data_output_policy) and select the policy.
  5. Choose Next: Tags.
  6. Choose Next: Review.
  7. For Role name, choose a name for your role (airflow_data_output_role).
  8. Review the role and then choose Create role.

You’re redirected to the Roles section.

  9. In the search box, enter the name of the role that you created and choose it.
  10. Copy the role ARN to use later when you create the AWS connection on Airflow.

Create a Salesforce connection on the Airflow UI to be able to read from Salesforce

To read data from Salesforce, we need to create a connection using the Airflow user interface.

  1. On the Airflow UI, choose Admin.
  2. Choose Connections, and then choose the plus sign to create a new connection.
  3. Fill in the fields with the required information.

The following table provides more information about each value.

Field | Required | Description | Values
Conn Id | Yes | Connection ID to define and to be used later in the DAG | For example, salesforce_connection
Conn Type | Yes | Connection type | HTTP
Host | Yes | Salesforce host name | host-dev-ed.my.salesforce.com or host.lightning.force.com. Replace the host with your Salesforce host and don't add the http:// prefix.
Login | Yes | The Salesforce user name. The user must have read access to the Salesforce objects. | admin@example.com
Password | Yes | The corresponding password for the defined user. | MyPassword123
Port | No | Salesforce instance port. By default, 443. | 443
Extra | Yes | Specify the extra parameters (as a JSON dictionary) that can be used in the Salesforce connection. security_token is the Salesforce security token for authentication. To get the Salesforce security token sent to your email, you must reset your security token. | {"security_token":"AbCdE..."}

Create an AWS connection on the Airflow UI to be able to write on Amazon S3

An AWS connection is required to upload data into Amazon S3, so we need to create a connection using the Airflow user interface.

  1. On the Airflow UI, choose Admin.
  2. Choose Connections, and then choose the plus sign to create a new connection.
  3. Fill in the fields with the required information.

The following table provides more information about the fields.

Field | Required | Description | Value
Conn Id | Yes | Connection ID to define and to be used later in the DAG | For example, aws_connection
Conn Type | Yes | Connection type | Amazon Web Services
Extra | Yes | It's required to specify the Region. You also need to provide the role ARN that we created earlier. | {"region":"eu-west-1", "role_arn":"arn:aws:iam::123456789101:role/airflow_data_output_role"}

Create an Airflow variable on the Airflow UI to store the name of the target S3 bucket

We create a variable to set the name of the target S3 bucket. This variable is used by the DAG, so we need to create it using the Airflow user interface; a short sketch of how the DAG reads it follows these steps.

  1. On the Airflow UI, choose Admin.
  2. Choose Variables, then choose the plus sign to create a new variable.
  3. For Key, enter bucket_name.
  4. For Val, enter the name of the S3 bucket that you created in a previous step (airflow-blog-post).
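Inside a DAG, this variable is then read with Airflow's Variable API, along the lines of the following sketch:

from airflow.models import Variable

# Read the target bucket name defined on the Airflow UI (Admin > Variables).
bucket_name = Variable.get("bucket_name")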

Create and deploy a DAG in Amazon MWAA

To ingest data from Salesforce into Amazon S3, we need to create a DAG (Directed Acyclic Graph). To create and deploy the DAG, complete the following steps:

  1. Create a local Python DAG.
  2. Deploy your DAG using the project CI/CD pipeline.
  3. Run your DAG on the Airflow UI.
  4. Display your data in Amazon S3 (with S3 Select).

Create a local Python DAG

The provided SalesforceToS3Operator allows you to ingest data from Salesforce objects into an S3 bucket. Refer to standard Salesforce objects for the full list of objects you can ingest data from with this Airflow operator.

In this use case, we ingest data from the Opportunity Salesforce object. We retrieve the last 6 months' data in monthly batches, and we filter on a specific list of fields.

The DAG provided in the sample GitHub repository imports the last 6 months of the Opportunity object (one file per month) while filtering on the list of retrieved fields.

This operator takes two connections as parameters:

  • An AWS connection that is used to upload the ingested data into Amazon S3.
  • A Salesforce connection to read data from Salesforce.

The following table provides more information about the parameters.

Parameter | Type | Required | Description
sf_conn_id | string | Yes | Name of the Airflow connection that has the user name, password, and security token
sf_obj | string | Yes | Name of the relevant Salesforce object (Account, Lead, Opportunity)
s3_conn_id | string | Yes | The destination S3 connection ID
s3_bucket | string | Yes | The destination S3 bucket
s3_key | string | Yes | The destination S3 key
sf_fields | string | No | The (optional) list of fields that you want to get from the object (Id, Name, and so on). If none (the default), then this gets all fields for the object.
fmt | string | No | The (optional) format that the S3 key of the data should be in. Possible values include CSV (default), JSON, and NDJSON.
from_date | date format | No | A specific date-time (optional) formatted input to run queries from, for incremental ingestion. Evaluated against the SystemModStamp attribute. Not compatible with the query parameter and should be in date-time format (for example, 2021-01-01T00:00:00Z). Default: None
to_date | date format | No | A specific date-time (optional) formatted input to run queries to, for incremental ingestion. Evaluated against the SystemModStamp attribute. Not compatible with the query parameter and should be in date-time format (for example, 2021-01-01T00:00:00Z). Default: None
query | string | No | A specific query (optional) to run for the given object. This overrides default query creation. Default: None
relationship_object | string | No | Some queries require relationship objects to work, and these are not the same names as the Salesforce object. Specify that relationship object here (optional). Default: None
record_time_added | boolean | No | Set this optional value to true if you want to add a Unix timestamp field to the resulting data that marks when the data was fetched from Salesforce. Default: False
coerce_to_timestamp | boolean | No | Set this optional value to true if you want to convert all fields with dates and datetimes into Unix timestamps (UTC). Default: False

The first step is to import the operator in your DAG:

from operators.salesforce_to_s3_operator import SalesforceToS3Operator

Then define your DAG default args, which you can use for your common task parameters:

# These args will get passed on to each operator
# You can override them on a per-task basis during operator initialization
default_args = {
    'owner': 'user-demo@example.com',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'retries': 0,
    'retry_delay': timedelta(minutes=1),
    'sf_conn_id': 'salesforce_connection',
    's3_conn_id': 'aws_connection',
    's3_bucket': 'salesforce-to-s3',
}
...

Finally, you define the tasks that use the operator.

The following examples illustrate some use cases.

Salesforce object full ingestion

This task ingests all the content of the Salesforce object defined in sf_obj. This selects all the object's available fields and writes them into the format defined in fmt. See the following code:

...
salesforce_to_s3 = SalesforceToS3Operator(
    task_id="Opportunity_to_S3",
    sf_conn_id=default_args["sf_conn_id"],
    sf_obj="Opportunity",
    fmt="ndjson",
    s3_conn_id=default_args["s3_conn_id"],
    s3_bucket=default_args["s3_bucket"],
    s3_key=f"salesforce/raw/dt={s3_prefix}/{table.lower()}.json",
    dag=salesforce_to_s3_dag,
)
...

Salesforce object partial ingestion based on fields

This task ingests specific fields of the Salesforce object defined in sf_obj. The selected fields are defined in the optional sf_fields parameter. See the following code:

...
salesforce_to_s3 = SalesforceToS3Operator(
    task_id="Opportunity_to_S3",
    sf_conn_id=default_args["sf_conn_id"],
    sf_obj="Opportunity",
    sf_fields=["Id","Name","Amount"],
    fmt="ndjson",
    s3_conn_id=default_args["s3_conn_id"],
    s3_bucket=default_args["s3_bucket"],
    s3_key=f"salesforce/raw/dt={s3_prefix}/{table.lower()}.json",
    dag=salesforce_to_s3_dag,
)
...

Salesforce object partial ingestion based on time period

This task ingests all the fields of the Salesforce object defined in sf_obj. The time period can be relative, using the from_date or to_date parameter, or absolute, by using both parameters.

The following example illustrates relative ingestion from the defined date:

...
salesforce_to_s3 = SalesforceToS3Operator(
    task_id="Opportunity_to_S3",
    sf_conn_id=default_args["sf_conn_id"],
    sf_obj="Opportunity",
    from_date="YESTERDAY",
    fmt="ndjson",
    s3_conn_id=default_args["s3_conn_id"],
    s3_bucket=default_args["s3_bucket"],
    s3_key=f"salesforce/raw/dt={s3_prefix}/{table.lower()}.json",
    dag=salesforce_to_s3_dag,
)
...

The from_date and to_date parameters support the Salesforce date-time format. This can be either a specific date or a literal (for example, TODAY, LAST_WEEK, LAST_N_DAYS:5). For more information about date formats, see Date Formats and Date Literals.

For the full DAG, refer to the sample in the GitHub repository.

This code dynamically generates tasks that run queries to retrieve the data of the Opportunity object in the form of 1-month batches.

The sf_fields parameter allows us to extract only the selected fields from the object.
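The following sketch illustrates the idea of generating one task per monthly batch. It is not the exact code from the repository; the DAG schedule, field list, and month boundaries are illustrative assumptions.

# Illustrative sketch (not the repository code) of dynamically generating one
# SalesforceToS3Operator task per monthly batch for the Opportunity object.
from datetime import datetime

from airflow import DAG
from airflow.utils.dates import days_ago
from dateutil.relativedelta import relativedelta
from operators.salesforce_to_s3_operator import SalesforceToS3Operator

# Default args as defined earlier in this post.
default_args = {
    'owner': 'user-demo@example.com',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'retries': 0,
    'sf_conn_id': 'salesforce_connection',
    's3_conn_id': 'aws_connection',
    's3_bucket': 'salesforce-to-s3',
}

table = "Opportunity"

with DAG(
    dag_id="salesforce_to_s3",
    default_args=default_args,
    schedule_interval="@daily",
    catchup=False,
) as salesforce_to_s3_dag:
    for month_offset in range(6):
        # One batch per calendar month, going back six months from today.
        to_dt = datetime.utcnow() - relativedelta(months=month_offset)
        from_dt = to_dt - relativedelta(months=1)
        s3_prefix = from_dt.strftime("%Y-%m")

        SalesforceToS3Operator(
            task_id=f"{table}_to_S3_{s3_prefix}",
            sf_conn_id=default_args["sf_conn_id"],
            sf_obj=table,
            sf_fields=["Id", "Name", "Amount", "StageName", "CloseDate"],
            fmt="ndjson",
            from_date=from_dt.strftime("%Y-%m-%dT%H:%M:%SZ"),
            to_date=to_dt.strftime("%Y-%m-%dT%H:%M:%SZ"),
            s3_conn_id=default_args["s3_conn_id"],
            s3_bucket=default_args["s3_bucket"],
            s3_key=f"salesforce/raw/dt={s3_prefix}/{table.lower()}.json",
        )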

Save the DAG locally as salesforce_to_s3.py.

Deploy your DAG using the project CI/CD pipeline

As part of the CDK deployment, a CodeCommit repository and a CodePipeline pipeline were created in order to continuously build, test, and deploy DAGs into your Amazon MWAA environment.

To deploy a new DAG, the source code needs to be committed to the CodeCommit repository. This triggers a CodePipeline run that builds, tests, and deploys your new DAG and makes it available in your Amazon MWAA environment.

  1. Sign in to the CodeCommit console in your deployment Region.
  2. Under Source, choose Repositories.

You should see a new repository, mwaaproject.

  3. Push your new DAG to the mwaaproject repository under dags. You can use either the CodeCommit console or the Git command line to do so:
    1. CodeCommit console:
      1. Choose the project CodeCommit repository name mwaaproject and navigate under dags.
      2. Choose Add file and then Upload file, and upload your new DAG.
    2. Git command line:
      1. To be able to clone and access your CodeCommit project with the Git command line, make sure your Git client is properly configured. Refer to Setting up for AWS CodeCommit.
      2. Clone the repository with the following command after replacing <region> with your project Region:
        git clone https://git-codecommit.<region>.amazonaws.com/v1/repos/mwaaproject

      3. Copy the DAG file under dags and add it with the command:
        git add dags/salesforce_to_s3.py

      4. Commit your new file with a message:
        git commit -m "add salesforce DAG"

      5. Push the local commit to the CodeCommit repository:
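        git push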

The new commit triggers a new pipeline run that builds, tests, and deploys the new DAG. You can monitor the pipeline on the CodePipeline console.

  1. On the CodePipeline console, choose Pipeline in the navigation pane.
  2. On the Pipelines page, you should see mwaaproject-pipeline.
  3. Choose the pipeline to display its details.

After checking that the pipeline run is successful, you can verify that the DAG is deployed to the S3 bucket and therefore available on the Amazon MWAA console.

  1. On the Amazon S3 console, look for a bucket starting with mwaairflowstack-mwaaenvstackne and go under dags.

You should see the new DAG.

  2. On the Amazon MWAA console, choose DAGs.

You should be able to see the new DAG.

Run your DAG on the Airflow UI

Go to the Airflow UI and toggle on the DAG.

This triggers your DAG automatically.

Later, you can keep triggering it manually by choosing the run icon.

Choose the DAG and Graph View to see the run of your DAG.

If you have any issue, you can check the logs of the failed tasks from the task instance context menu.

Display your data in Amazon S3 (with S3 Select)

To display your data, complete the following steps:

  1. On the Amazon S3 console, in the Buckets list, choose the name of the bucket that contains the output of the Salesforce data (airflow-blog-post).
  2. In the Objects list, choose the name of the folder that has the object that you copied from Salesforce (opportunity).
  3. Choose the raw folder and the dt folder with the latest timestamp.
  4. Select any file.
  5. On the Actions menu, choose Query with S3 Select.
  6. Choose Run SQL query to preview the data (see the programmatic sketch after these steps).
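If you prefer to preview the file programmatically, S3 Select is also available through the AWS SDK for Python. The following is a minimal sketch assuming the NDJSON output produced above; the bucket is the example from this post, and the key is a hypothetical value matching the key pattern used by the DAG.

import boto3

s3 = boto3.client("s3")

# Query the first records of the ingested NDJSON file with S3 Select.
response = s3.select_object_content(
    Bucket="airflow-blog-post",
    Key="salesforce/raw/dt=2021-09/opportunity.json",
    ExpressionType="SQL",
    Expression="SELECT * FROM s3object s LIMIT 5",
    InputSerialization={"JSON": {"Type": "LINES"}},
    OutputSerialization={"JSON": {}},
)

# The response payload is an event stream; print the returned records.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))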

Clean up

To avoid incurring future charges, delete the AWS CloudFormation stack and the resources that you deployed as part of this post.

  1. On the AWS CloudFormation console, delete the stack MWAAirflowStack.

To clean up the deployed resources using the AWS Command Line Interface (AWS CLI), you can simply run the following command:

cdk destroy MWAAirflowStack

Make sure you are in the root path of the project when you run the command.

After confirming that you want to destroy the CloudFormation stack, the solution's resources are deleted from your AWS account.

The following screenshot shows the process of destroying the stack:

The following screenshot confirms the stack has been deleted.

  2. Navigate to the Amazon S3 console and locate the two buckets containing mwaairflowstack-mwaaenvstack and mwaairflowstack-mwaaproj that were created during the deployment.
  3. Select each bucket, delete its contents, and then delete the bucket.
  4. Delete the IAM role created to write on the S3 buckets.

Conclusion

ENGIE found significant value in using Amazon MWAA, enabling its global business units to ingest data in more productive ways. This post presented how ENGIE scaled their data ingestion pipelines using Amazon MWAA. The first part of the post described the architecture components and how to successfully deploy a CI/CD pipeline for an Amazon MWAA project template using a CodeCommit repository and plug it into CodePipeline to build, test, and package the code and custom plugins. The second part walked you through the steps to automate the ingestion process from Salesforce using Airflow, with an example. For the Airflow configuration, you used Airflow variables, but you can also use Secrets Manager with Amazon MWAA by setting the secretsBackend parameter when deploying the stack.

The use case discussed in this post is just one example of how you can use Amazon MWAA to make it easier to set up and operate end-to-end data pipelines in the cloud at scale. For more information about Amazon MWAA, check out the User Guide.


About the Authors

Anouar Zaaber is a Senior Engagement Manager in AWS Professional Services. He leads internal AWS, external partner, and customer teams to deliver AWS cloud services that enable customers to realize their business outcomes.

Amine El Mallem is a Data/ML Ops Engineer in AWS Professional Services. He works with customers to design, automate, and build solutions on AWS for their business needs.

Armando Segnini is a Data Architect with AWS Professional Services. He spends his time building scalable big data and analytics solutions for AWS Enterprise and Strategic customers. Armando also loves to travel with his family all around the world and take pictures of the places he visits.

Mohamed-Ali Elouaer is a DevOps Consultant with AWS Professional Services. He is part of the AWS ProServe team, helping enterprise customers solve complex problems related to automation, security, and monitoring using AWS services. In his free time, he likes to travel and watch movies.

Julien Grinsztajn is an Architect at ENGIE. He is part of the Digital & IT Consulting ENGIE IT team working on the definition of the architecture for complex projects related to data integration and network security. In his free time, he likes to travel the oceans to meet sharks and other marine creatures.
