Anusha T

Data Engineer | Open to Work | MS IT and Management @ Cardinal Health
Dublin, Ohio, United States
Anusha T's Location
Little Elm, Texas, United States
About Anusha T

Actively seeking a job. Around 9 years of experience in Data Engineering, Data Pipeline Design, and Implementation as a Sr. Data Engineer/Data Developer and Data Modeler.
• Proficient in configuring and supporting Kafka, Spark, HBase, and HDFS with ZooKeeper.
• Experienced in setting up Azure infrastructure optimized for analytical workloads.
• Skilled in JSON script generation, UNIX shell scripting, and ETL with Sqoop.
• Proficient in data analysis and data wrangling with R and Python.
• Knowledgeable in NoSQL databases such as HBase, Cassandra, and MongoDB.
• Developed a log producer in Scala for a Kafka-based log collection platform.
• Experienced with Oozie, CI/CD with Azure DevOps, and version control with Git.
• Skilled in Hadoop, Hive, Spark SQL, and other ecosystem tools for big data processing.
• Proficient in building and maintaining environments on Azure IaaS and PaaS.
• Hands-on experience with Kafka producers and consumers for high-throughput streaming (a minimal sketch follows this list).
• Experienced in working with AWS, EMR, S3, and CloudWatch for Hadoop and Spark jobs.
• Skilled in ETL with Sqoop to ingest data from RDBMS into Hive and HDFS.
• Implemented sentiment analysis and text analytics using Scala and Python.
• Proficient in installing and configuring Hadoop ecosystem components.
• Experienced in real-time data processing with Spark Streaming and Kafka.
• Skilled in pipeline development with Apache Airflow, Kafka, and NiFi.
• Proficient in Perl, Python, Scala, and Java.
• Successfully migrated projects from Cloudera Hadoop Hive to Azure Data Lake Store.
• Familiar with various databases, version control, and web technologies.
• Knowledgeable in SDLC, Agile, UML, and Design Patterns.
• Experienced with ETL tools like DataStage and Informatica.
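
A minimal sketch of the Kafka producer/consumer work mentioned above, using the kafka-python client; the broker address, topic name, and message shape are illustrative assumptions, not details taken from this profile.

import json

from kafka import KafkaConsumer, KafkaProducer

BROKER = "localhost:9092"  # assumed broker address
TOPIC = "app-logs"         # hypothetical topic name

# Producer: serialize dicts as JSON and publish them to the topic.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"level": "INFO", "msg": "pipeline started"})
producer.flush()

# Consumer: read the topic from the beginning and deserialize each record.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for record in consumer:
    print(record.value)
    break  # stop after the first message in this sketch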

Anusha T's Current Company Details
Cardinal Health

Data Engineer | Open to Work | MS IT and Management
Dublin, Ohio, United States
Employees: 27,278
Anusha T Work Experience Details
  • Cardinal Health
    Azure Data Engineer
    Cardinal Health Jan 2022 - Present
    Dublin, Ohio, United States
    • Built data pipelines (ELT/ETL scripts), extracting data from different sources (MySQL, AWS S3 files), transforming it, and loading it into the data warehouse (AWS Redshift).
    • Added a REST API layer to ML models built with Python and Flask, and deployed the models to the AWS Elastic Beanstalk environment using Docker containers.
    • Developed analytical dashboards using Looker.
    • Built aggregate and de-normalized tables, populating them via ETL to improve Looker dashboard performance and to help data scientists and analysts speed up ML model training and analysis.
    • Created new dashboards, reports, scheduled searches, and alerts using Splunk.
    • Integrated PagerDuty with Splunk to generate incidents from Splunk.
    • Developed and built data engineering pipelines using PySpark/Python on AWS with services such as Lambda, S3, Glue, Step Functions, DynamoDB, Athena, and the Glue Data Catalog.
    • Developed custom Jenkins jobs/pipelines containing Bash shell scripts that used the AWS CLI to automate infrastructure provisioning.
    • Developed a user-eligibility library in Python to accommodate partner filters and exclude those users from receiving credit products.
    • Built data pipelines to aggregate user click-stream session data with the Spark Streaming module, which reads click-stream data from Kinesis streams and stores the aggregated results in S3, from where they are eventually loaded into the AWS Redshift warehouse (a hedged sketch of the aggregation step follows this entry).
    • Knowledge and experience using Python NumPy, Pandas, scikit-learn, ONNX, and machine learning.
    • Built data pipelines using PySpark (AWS EMR), processing data files in S3 and loading them into Redshift.
    • Experience with tools like Airflow for scheduling jobs and running ad-hoc manual jobs.
    • Developed Spark applications in Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
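
A minimal PySpark sketch of the click-stream aggregation step described above, assuming a batch job on EMR over events already landed in S3; the bucket paths, column names, and schema (event_ts as epoch seconds) are hypothetical, and the Redshift load would be a separate COPY step.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-sessions").getOrCreate()

# Read raw click-stream events landed in S3 (hypothetical path and schema).
events = spark.read.json("s3://example-raw-bucket/clickstream/2024/01/")

# Aggregate per user session: event count and session duration.
# event_ts is assumed to be epoch seconds in the raw events.
sessions = events.groupBy("user_id", "session_id").agg(
    F.count("*").alias("event_count"),
    (F.max("event_ts") - F.min("event_ts")).alias("duration_sec"),
)

# Write the aggregates back to S3 as Parquet for a later Redshift COPY.
sessions.write.mode("overwrite").parquet("s3://example-curated-bucket/sessions/")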
  • HSBC
    AWS Data Engineer
    HSBC Feb 2018 - Dec 2021
    McLean, Virginia, United States
    • Built data pipelines (ELT/ETL scripts), extracting data from different sources (MySQL, AWS S3 files), transforming it, and loading it into the data warehouse (AWS Redshift).
    • Added a REST API layer to ML models built with Python and Flask, and deployed the models to the AWS Elastic Beanstalk environment using Docker containers.
    • Developed analytical dashboards using Looker.
    • Built aggregate and de-normalized tables, populating them via ETL to improve Looker dashboard performance and to help data scientists and analysts speed up ML model training and analysis.
    • Created new dashboards, reports, scheduled searches, and alerts using Splunk.
    • Integrated PagerDuty with Splunk to generate incidents from Splunk.
    • Developed and built data engineering pipelines using PySpark/Python on AWS with services such as Lambda, S3, Glue, Step Functions, DynamoDB, Athena, and the Glue Data Catalog.
    • Developed custom Jenkins jobs/pipelines containing Bash shell scripts that used the AWS CLI to automate infrastructure provisioning.
    • Developed a user-eligibility library in Python to accommodate partner filters and exclude those users from receiving credit products.
    • Built data pipelines to aggregate user click-stream session data with the Spark Streaming module, which reads click-stream data from Kinesis streams and stores the aggregated results in S3, from where they are eventually loaded into the AWS Redshift warehouse.
    • Knowledge and experience using Python NumPy, Pandas, scikit-learn, ONNX, and machine learning.
    • Built data pipelines using PySpark (AWS EMR), processing data files in S3 and loading them into Redshift.
    • Experience with tools like Airflow for scheduling jobs and running ad-hoc manual jobs (a hedged DAG sketch follows this entry).
    • Developed Spark applications in Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
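
A minimal Airflow DAG sketch of the job scheduling mentioned above, assuming Airflow 2.4+; the DAG id, schedule, and script paths are illustrative assumptions.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_clickstream_etl",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # Airflow 2.4+ keyword
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="python /opt/etl/extract.py",  # assumed script path
    )
    load = BashOperator(
        task_id="load",
        bash_command="python /opt/etl/load_redshift.py",  # assumed script path
    )
    extract >> load  # run the extract task before the load task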
  • Mars
    Data Warehouse Developer
    Mars Apr 2016 - Feb 2018
    McLean, Virginia
    • Leveraged Azure cloud components, including Databricks, Data Lake, Blob Storage, Data Factory, Storage Explorer, SQL DB, SQL DWH, and Cosmos DB, to store, process, and analyze data.
    • Utilized Databricks and Spark cluster capabilities to examine data from Azure data storage, contributing to data-driven decision-making.
    • Oversaw end-to-end development and optimization of Extract, Transform, Load (ETL) processes, ensuring seamless data extraction from diverse source systems, transformation to the OMOP format, and loading into the OMOP-compliant data repository.
    • Orchestrated data extraction, transformation, and loading through Azure Data Factory, Databricks, PySpark, Spark SQL, and U-SQL (Azure Data Lake Analytics), integrating smoothly with Azure Data Storage services.
    • Managed pipelines in Azure Data Factory, including Linked Services, Datasets, and Pipeline components, facilitating efficient ETL from Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
    • Demonstrated expertise in Snowflake and Azure, driving data-driven decision-making and supporting the organization's overall data strategy.
    • Leveraged Azure Blob and Data Lake storage to load data into Azure Synapse Analytics (DW), enhancing analytical capabilities.
    • Constructed efficient data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL.
    • Engineered a real-time data streaming solution using Azure Event Hubs, ensuring efficient and reliable data processing for timely insights.
    • Developed and deployed Spark Streaming applications to process real-time data from sources such as Kafka and Azure Event Hubs, enabling valuable real-time analytics.
    • Implemented partitioned and bucketed Hive tables in Parquet file format with Snappy compression, optimizing data storage and enabling faster querying (a hedged sketch follows this entry).
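
A minimal PySpark sketch of the partitioned, bucketed Hive tables in Snappy-compressed Parquet described above; the database, table, input path, and column names are illustrative assumptions.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-parquet-tables")
    .enableHiveSupport()  # register tables in the Hive metastore
    .getOrCreate()
)

# Hypothetical staged input (an abfss:// or s3:// path works the same way).
df = spark.read.parquet("/staging/orders/")

spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

(
    df.write.format("parquet")
    .option("compression", "snappy")  # Snappy-compressed Parquet files
    .partitionBy("order_date")        # one partition directory per date
    .bucketBy(16, "customer_id")      # 16 buckets on the frequent join key
    .sortBy("customer_id")
    .mode("overwrite")
    .saveAsTable("analytics.orders")  # managed Hive table
)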
  • IndusInd Bank
    Data Engineer
    IndusInd Bank May 2015 - Mar 2016
    Bengaluru, Karnataka, India
    • Used cloud-native tools and technology to drive data integrity and accessibility by optimizing data processing and storage.
    • Responsible for executing big data analytics, predictive analytics, and machine learning initiatives.
    • Built real-time data pipelines by developing Kafka producers and Spark Streaming applications to process large-scale data from oil and gas operations.
    • Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehousing, and business intelligence solutions.
    • Experience with AWS services such as EC2, EMR, S3, DynamoDB, Athena, Redshift, and Glue.
    • Developed Scala scripts and UDFs using DataFrames/SQL and RDDs in Spark for data aggregation, queries, and writing to an S3 bucket.
    • Filtered and cleaned data using Scala and SQL queries.
    • Automated data workflows using Python and Apache Airflow, increasing efficiency and reducing manual errors.
    • Collaborated with data analysts and data scientists to provide high-quality data for business intelligence and machine learning models.
    • Developed Spark SQL scripts using PySpark to perform transformations and actions on DataFrames and Datasets in Spark for faster data processing.
    • Implemented Spark RDD transformations and actions, and automated scripts and workflows using Apache Airflow and shell scripting.
    • Experience loading data from Hive to S3 and Redshift using the Spark API.
    • Implemented partitioning and bucketing in Hive for query optimization, and designed both managed and external tables in Hive to optimize performance.
    • Designed and implemented data pipelines and ETL processes using Apache NiFi.
    • Created data pipelines for extracting, transforming, and loading data from various sources, including internal and external APIs.
    • Processed batch and streaming data load pipelines using Snowpipe.
    • Created scripts to read and load JSON and Parquet files using Python (a hedged sketch follows this entry).
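
A minimal pandas sketch of the JSON/Parquet loading scripts mentioned above; the file names, columns, and join key are illustrative assumptions, and pyarrow is assumed to be installed for Parquet support.

import pandas as pd

# Read newline-delimited JSON and a Parquet file into DataFrames.
events = pd.read_json("events.json", lines=True)  # hypothetical event log
users = pd.read_parquet("users.parquet")          # hypothetical dimension file

# Join, derive a date column, and write the result as Snappy Parquet.
merged = events.merge(users, on="user_id", how="left")
merged["event_date"] = pd.to_datetime(merged["event_ts"]).dt.date
merged.to_parquet("merged.parquet", compression="snappy", index=False)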

Frequently Asked Questions about Anusha T

What company does Anusha T work for?

Anusha T works for Cardinal Health.

What is Anusha T's role at the current company?

Anusha T's current role is Data Engineer | Open to Work | MS IT and Management.

Who are Anusha T's colleagues?

Anusha T's colleagues are Troy Landry, Josh Edwards, Augustine Maestas, Ying Sixlin, Timothy Gabel, Afghan Samiullah, Dan Ourada.
