Anusha T
Actively seeking a job. Around 9 years of experience in Data Engineering, Data Pipeline Design, and Implementation as a Sr. Data Engineer / Data Developer and Data Modeler.
• Proficient in configuring and supporting Kafka, Spark, HBase, and HDFS with ZooKeeper.
• Experienced in setting up Azure infrastructure optimized for analytical workloads.
• Skilled in JSON script generation, UNIX shell scripting, and ETL with Sqoop.
• Proficient in data analysis and data wrangling with R and Python.
• Knowledgeable in NoSQL databases such as HBase, Cassandra, and MongoDB.
• Developed a log producer in Scala for a Kafka-based log collection platform.
• Experienced with Oozie, CI/CD with Azure DevOps, and version control with Git.
• Skilled in Hadoop, Hive, Spark SQL, and other ecosystem tools for big data processing.
• Proficient in building and maintaining environments on Azure IaaS and PaaS.
• Hands-on experience with Kafka producers and consumers for high-throughput streaming.
• Experienced in working with AWS EMR, S3, and CloudWatch for Hadoop and Spark jobs.
• Skilled in ETL with Sqoop to ingest data from RDBMS into Hive and HDFS.
• Implemented sentiment analysis and text analytics using Scala and Python.
• Proficient in installing and configuring Hadoop ecosystem components.
• Experienced in real-time data processing with Spark Streaming and Kafka.
• Skilled in pipeline development with Apache Airflow, Kafka, and NiFi.
• Proficient in Perl, Python, Scala, and Java.
• Successfully migrated projects from Cloudera Hadoop/Hive to Azure Data Lake Store.
• Familiar with various databases, version control systems, and web technologies.
• Knowledgeable in SDLC, Agile, UML, and design patterns.
• Experience with ETL tools such as DataStage and Informatica.
Cardinal Health (cardinalhealth.com) — 27,278 employees
Azure Data Engineer — Cardinal Health, Jan 2022 – Present, Dublin, Ohio, United States
• Built data pipelines (ELT/ETL scripts), extracting data from different sources (MySQL, AWS S3 files), transforming it, and loading it into the data warehouse (AWS Redshift).
• Added a REST API layer to ML models built with Python and Flask, and deployed the models to the AWS Elastic Beanstalk environment using Docker containers.
• Developed analytical dashboards using Looker.
• Built aggregate and de-normalized tables populated via ETL to improve Looker dashboard performance and help data scientists and analysts speed up ML model training and analysis.
• Created new dashboards, reports, scheduled searches, and alerts using Splunk.
• Integrated PagerDuty with Splunk to generate incidents from Splunk.
• Developed data engineering pipelines using PySpark/Python in AWS with services such as Lambda, S3, Glue, Step Functions, DynamoDB, Athena, and the Glue Data Catalog.
• Developed custom Jenkins jobs/pipelines containing Bash shell scripts that use the AWS CLI to automate infrastructure provisioning.
• Developed a user-eligibility library in Python to apply partner filters and exclude those users from receiving credit products.
• Built data pipelines to aggregate user clickstream session data with the Spark streaming module, reading clickstream data from Kinesis streams, storing aggregate results in S3, and eventually loading them into the AWS Redshift warehouse.
• Knowledge and experience with Python NumPy, pandas, scikit-learn, Onyx, and machine learning.
• Built data pipelines using PySpark (AWS EMR), processing data files in S3 and loading them into Redshift.
• Experience with tools like Airflow for scheduling jobs and running ad-hoc manual jobs.
• Developed Spark applications in Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
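The clickstream session aggregation described above runs in Spark streaming against Kinesis; its core per-session logic can be sketched in plain Python (the event fields `session_id`, `user_id`, and `ts` are hypothetical, assumed for illustration):

```python
from collections import defaultdict

def aggregate_sessions(events):
    """Roll raw clickstream events up into one summary row per session.

    Each event is a dict with hypothetical fields: session_id, user_id,
    and ts (epoch seconds). The summaries (click count and session
    duration) are the kind of aggregate that would be written to S3 and
    then loaded into Redshift.
    """
    sessions = defaultdict(list)
    for event in events:
        sessions[event["session_id"]].append(event)

    summaries = []
    for session_id, evts in sessions.items():
        times = [e["ts"] for e in evts]
        summaries.append({
            "session_id": session_id,
            "user_id": evts[0]["user_id"],
            "clicks": len(evts),
            "duration_s": max(times) - min(times),
        })
    return summaries
```

In the actual pipeline the same grouping would be expressed as a windowed `groupBy` over the stream; this sketch only shows the aggregation shape.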
AWS Data Engineer — HSBC, Feb 2018 – Dec 2021, McLean, Virginia, United States
• Built data pipelines (ELT/ETL scripts), extracting data from different sources (MySQL, AWS S3 files), transforming it, and loading it into the data warehouse (AWS Redshift).
• Added a REST API layer to ML models built with Python and Flask, and deployed the models to the AWS Elastic Beanstalk environment using Docker containers.
• Developed analytical dashboards using Looker.
• Built aggregate and de-normalized tables populated via ETL to improve Looker dashboard performance and help data scientists and analysts speed up ML model training and analysis.
• Created new dashboards, reports, scheduled searches, and alerts using Splunk.
• Integrated PagerDuty with Splunk to generate incidents from Splunk.
• Developed data engineering pipelines using PySpark/Python in AWS with services such as Lambda, S3, Glue, Step Functions, DynamoDB, Athena, and the Glue Data Catalog.
• Developed custom Jenkins jobs/pipelines containing Bash shell scripts that use the AWS CLI to automate infrastructure provisioning.
• Developed a user-eligibility library in Python to apply partner filters and exclude those users from receiving credit products.
• Built data pipelines to aggregate user clickstream session data with the Spark streaming module, reading clickstream data from Kinesis streams, storing aggregate results in S3, and eventually loading them into the AWS Redshift warehouse.
• Knowledge and experience with Python NumPy, pandas, scikit-learn, Onyx, and machine learning.
• Built data pipelines using PySpark (AWS EMR), processing data files in S3 and loading them into Redshift.
• Experience with tools like Airflow for scheduling jobs and running ad-hoc manual jobs.
• Developed Spark applications in Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
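The user-eligibility library mentioned above is described only at a high level; a minimal sketch of the idea in plain Python, assuming partner filters are supplied as exclusion predicates (the field names and rules below are hypothetical):

```python
def build_eligibility_filter(partner_filters):
    """Compose partner-supplied exclusion rules into one predicate.

    partner_filters is a list of functions that return True when a user
    should be EXCLUDED from receiving credit-product offers; a user is
    eligible only if no partner filter matches.
    """
    def is_eligible(user):
        return not any(excluded(user) for excluded in partner_filters)
    return is_eligible

def filter_eligible(users, partner_filters):
    """Return only the users that pass every partner filter."""
    is_eligible = build_eligibility_filter(partner_filters)
    return [u for u in users if is_eligible(u)]
```

Keeping each partner's rule as an independent predicate lets new filters be added without touching the composition logic.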
Data Warehouse Developer — Mars, Apr 2016 – Feb 2018, McLean, Virginia
• Leveraged Azure cloud components, including Databricks, Data Lake, Blob Storage, Data Factory, Storage Explorer, SQL DB, SQL DWH, and Cosmos DB, to store, process, and analyze data.
• Utilized Databricks and Spark cluster capabilities to examine data from Azure data storage, contributing to data-driven decision-making.
• Oversaw end-to-end development and optimization of ETL processes, ensuring seamless extraction from diverse source systems, transformation of data to the OMOP format, and loading into the OMOP-compliant data repository.
• Orchestrated data extraction, transformation, and loading through Azure Data Factory, Databricks, PySpark, Spark SQL, and U-SQL (Azure Data Lake Analytics), integrating smoothly with Azure Data Storage services.
• Managed pipelines in Azure Data Factory, including Linked Services, Datasets, and Pipeline components, facilitating efficient ETL from Azure SQL, Blob storage, and Azure SQL Data Warehouse.
• Applied expertise in Snowflake and Azure to drive data-driven decision-making and support the organization's overall data strategy.
• Loaded data from Azure Blob and Data Lake storage into Azure SQL Synapse Analytics (DW), enhancing analytical capabilities.
• Constructed efficient data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL.
• Engineered a real-time data streaming solution using Azure Event Hubs, ensuring efficient and reliable data processing for timely insights.
• Developed and deployed Spark Streaming applications to process real-time data from sources like Kafka and Azure Event Hubs, enabling valuable real-time analytics.
• Implemented partitioned and bucketed Hive tables in Parquet file format with Snappy compression, optimizing data storage and enabling faster querying.
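Partitioned Hive tables like those above work by encoding partition values in the directory layout, so the query engine can prune whole directories. A minimal sketch of deriving such paths in plain Python (the base URI and the `dt`/`country` partition keys are hypothetical):

```python
def partition_path(base, partition_keys, record):
    """Build a Hive-style partition directory (key=value/...) for a record.

    Parquet files written under these paths let the engine skip entire
    directories when a query filters on the partition columns.
    """
    parts = [f"{k}={record[k]}" for k in partition_keys]
    return "/".join([base.rstrip("/")] + parts)
```

Bucketing then further splits each partition into a fixed number of files by hashing a column, which this sketch does not show.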
Data Engineer — IndusInd Bank, May 2015 – Mar 2016, Bengaluru, Karnataka, India
• Used cloud-native tools and technology to drive data integrity and accessibility by optimizing data processing and storage.
• Responsible for the execution of big data analytics, predictive analytics, and machine learning initiatives.
• Built real-time data pipelines by developing Kafka producers and Spark Streaming applications to process large-scale data from oil and gas operations.
• Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehousing, and business intelligence solutions.
• Experience with AWS services such as EC2, EMR, S3, DynamoDB, Athena, Redshift, and Glue.
• Developed Scala scripts and UDFs using DataFrames/SQL and RDDs in Spark for data aggregation, queries, and writing to an S3 bucket.
• Filtered and cleaned data using Scala and SQL queries.
• Automated data workflows using Python and Apache Airflow, increasing efficiency and reducing manual errors.
• Collaborated with data analysts and data scientists to provide high-quality data for business intelligence and machine learning models.
• Developed Spark SQL scripts using PySpark to perform transformations and actions on DataFrames and Datasets for faster data processing.
• Implemented Spark RDD transformations and actions, and automated scripts and workflows using Apache Airflow and shell scripting.
• Experience loading data from Hive to S3 and Redshift using the Spark API.
• Implemented partitioning and bucketing in Hive for query optimization, and designed both managed and external tables in Hive to optimize performance.
• Designed and implemented data pipelines and ETL processes using Apache NiFi.
• Created data pipelines for extracting, transforming, and loading data from various sources, including internal and external APIs.
• Processed batch and streaming data load pipelines using Snowpipe.
• Created scripts to read and load JSON and Parquet files using Python.
Frequently Asked Questions about Anusha T
What company does Anusha T work for?
Anusha T works for Cardinal Health.
What is Anusha T's role at the current company?
Anusha T's current role is Data Engineer | Open to Work | MS IT and Management.
Who are Anusha T's colleagues?
Anusha T's colleagues are Troy Landry, Josh Edwards, Augustine Maestas, Ying Sixlin, Timothy Gabel, Afghan Samiullah, Dan Ourada.