Geetha M

Geetha M Email and Phone Number

Senior QA Analyst @ Thomson Reuters
United States
Geetha M's Location
Minneapolis, Minnesota, United States
About Geetha M

• 8+ years of IT experience in analysis, data engineering, and developing, optimizing, and maintaining scalable data pipelines for large datasets in cloud-based environments.
• Experience managing the full lifecycle of application and software development, including planning, design, implementation, testing, and deployment.
• Installed packages and set up a CDH cluster, coordinating Zookeeper, Spark, Kafka, and HDFS.
• Experience in data analysis using Hive, Pig Latin, HBase, and custom MapReduce programs in Java.
• Experience writing custom UDFs in Java and Scala to extend Hive and Pig functionality.
• Responsible for importing data to HDFS using Sqoop from different RDBMS servers, and exporting data back to the RDBMS servers with Sqoop after aggregations for other ETL operations.
• Proficient in leveraging cloud platforms and technologies such as Docker, Kubernetes, AWS, Azure, and Snowflake for scalable data storage, processing, and application deployment.
• Experience with Cloudera and Hortonworks distributions.
• 2+ years of experience with Spark, Scala, HBase, and Kafka.
• Developed analytical components using Kafka, Scala, Spark, HBase, and Spark Streaming.
• Experience working with Flume to load log data from multiple sources directly into HDFS.
• Good knowledge of Hortonworks administration and security, including Apache Ranger, Knox Gateway, and High Availability.
• Proficient in using industry-standard IDEs such as Eclipse, IntelliJ, and JBoss for efficient application development, debugging, and testing.
• Experience importing and exporting data using Sqoop between HDFS and relational database systems (RDBMS).
• Created an HDInsight cluster in the Microsoft Azure portal; also created Event Hubs and Azure SQL databases.
• Worked on a clustered Hadoop for Windows Azure using HDInsight and the Hortonworks Data Platform for Windows.
• Built a real-time pipeline for streaming data using Event Hubs / Microsoft Azure Queue and Spark Streaming.
• Read data from HBase into Spark to perform joins on different tables.
• Created HBase tables for validation, audit, and offset management.
• Created logical views instead of tables to improve the performance of Hive queries.
• Involved in developing Hive DDLs to create, alter, and drop Hive tables.
• Spark Streaming collects data from Kafka in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data to a Cassandra cluster (see the sketch below).
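
A minimal sketch of the Kafka-to-Cassandra streaming pattern described in the last bullet, using Spark Structured Streaming and the Spark Cassandra Connector. The broker address, topic, keyspace, table, and field names are illustrative placeholders, not details from the actual pipeline; writing each micro-batch through foreachBatch is one common way to persist a stream to Cassandra.

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("learner-model-stream")
      .getOrCreate()
    import spark.implicits._

    // Read learner events from Kafka in near real time (broker/topic are assumed)
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "learner-events")
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    // Illustrative on-the-fly transformation into a simple learner model
    val model = events.select(
      get_json_object($"json", "$.learnerId").as("learner_id"),
      get_json_object($"json", "$.score").cast("double").as("score"))

    // Persist each micro-batch to Cassandra via the Spark Cassandra Connector
    val query = model.writeStream
      .foreachBatch { (batch: DataFrame, batchId: Long) =>
        batch.write
          .format("org.apache.spark.sql.cassandra")
          .option("keyspace", "learning")    // assumed keyspace
          .option("table", "learner_model")  // assumed table
          .mode("append")
          .save()
      }
      .option("checkpointLocation", "/tmp/checkpoints/learner-model")
      .start()

    query.awaitTermination()
  }
}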

Geetha M's Current Company Details
Thomson Reuters

Senior QA Analyst
United States
Website:
tr.com
Employees:
35941
Geetha M Work Experience Details
  • Thomson Reuters
    Senior QA Analyst
    Thomson Reuters
    United States
  • Thomson Reuters
    Data Engineer
    Thomson Reuters Jul 2022 - Present
    Eagan, Minnesota, United States
    • New development and enhancements to ETL processes using Apache NiFi.
    • Worked in Agile environments, collaborating with cross-functional teams to deliver high-quality software through iterative development cycles and continuous improvement.
    • Wrote complex SQL to generate reports and extracts.
    • Involved in managing and reviewing Hadoop log files.
    • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
    • Developed scripts and batch jobs to schedule various Hadoop programs.
    • Wrote Hive queries for data analysis to meet business requirements.
    • Created Hive tables and worked on them using HiveQL.
    • Queried structured tables such as MySQL.
    • Imported and exported data into HDFS and Hive using Sqoop; experienced in defining job flows.
    • Troubleshot production issues.
    • Helped prepare test case designs, testing, and detailed enhancement documentation.
    • Worked on NiFi data pipelines to process large datasets and configured lookups for data validation and integrity.
    • Developed a framework to check the data quality of datasets against schemas defined in the cloud.
    • Worked on Amazon Web Services (AWS) to integrate EMR with Spark 2, S3 storage, and Snowflake.
    • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in AWS S3 using Scala (see the sketch below).
    • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself.
    • Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework.
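
    A minimal sketch of the Kafka-to-S3 streaming ingestion mentioned above, in Scala with Spark Structured Streaming. The broker, topic, and bucket names are assumed placeholders; partitioning the landed Parquet by ingest date is one common layout choice, not necessarily the one used here.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object KafkaToS3Stream {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-s3")
          .getOrCreate()

        // Consume real-time records from Kafka (broker and topic are assumed)
        val stream = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "ingest-topic")
          .option("startingOffsets", "latest")
          .load()
          .selectExpr(
            "CAST(key AS STRING) AS key",
            "CAST(value AS STRING) AS value",
            "timestamp")

        // Land the stream in S3 as Parquet, partitioned by ingest date
        val query = stream
          .withColumn("ingest_date", to_date(col("timestamp")))
          .writeStream
          .format("parquet")
          .partitionBy("ingest_date")
          .option("path", "s3a://example-bucket/raw/ingest-topic/")                  // assumed bucket
          .option("checkpointLocation", "s3a://example-bucket/checkpoints/ingest-topic/")
          .start()

        query.awaitTermination()
      }
    }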
  • Empower Retirement
    Data Engineer
    Empower Retirement Oct 2020 - Jun 2022
    Greenwood Village, Colorado, United States
    • Ingested data from relational databases into HDFS using Sqoop import/export; also created Sqoop jobs, eval, and incremental jobs.
    • Developed and executed data validation and verification processes, enhancing data integrity within the MDM environment.
    • Developed end-to-end ETL pipelines using DBT and Azure Data Factory.
    • Configured and monitored Hadoop cluster environments for data ingestion tasks.
    • Designed and maintained data pipelines integrating AWS Lambda and DynamoDB.
    • Automated data validation and anomaly detection using Python and Spark (see the sketch below).
    • Managed and monitored CI/CD pipelines for deployment consistency.
    • Built secure data solutions with AWS S3 and fine-grained access control policies.
    • Developed data governance frameworks to ensure compliance.
    • Designed real-time analytics using Kafka with Spark Streaming for dynamic reporting.
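
    A minimal sketch of the kind of automated data validation mentioned above, written in Scala for consistency with the other examples (the profile itself names Python and Spark). The input path, key column, and failure rules are assumptions for illustration; the idea is simply to compute basic integrity metrics and fail the job when they are violated.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._

    object DataQualityChecks {
      // Simple quality metrics for a dataset: row count, null keys, duplicate keys
      def profile(df: DataFrame, keyColumn: String): Map[String, Long] = {
        val total         = df.count()
        val nullKeys      = df.filter(col(keyColumn).isNull).count()
        val duplicateKeys = total - df.dropDuplicates(keyColumn).count()
        Map("rows" -> total, "null_keys" -> nullKeys, "duplicate_keys" -> duplicateKeys)
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("dq-checks").getOrCreate()

        // Assumed input path and key column for illustration
        val accounts = spark.read.parquet("/data/curated/accounts")
        val metrics  = profile(accounts, "account_id")

        // Fail the job when basic integrity rules are violated
        require(metrics("null_keys") == 0,
          s"Found ${metrics("null_keys")} null account_id values")
        require(metrics("duplicate_keys") == 0,
          s"Found ${metrics("duplicate_keys")} duplicate account_id values")

        println(s"Data quality metrics: $metrics")
        spark.stop()
      }
    }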
  • Nationwide
    Big Data Developer
    Nationwide Mar 2018 - Sep 2020
    • Gathered requirements from business and engineering stewards regarding the initial and final data dictionaries.
    • Migrated SQL Server and Oracle databases to Azure Data Lake and AWS S3 (see the sketch below).
    • Built ETL pipelines using Kafka and Spark for faster data processing.
    • Implemented CI/CD pipelines for data flow deployment and versioning.
    • Automated data quality checks using Hive queries and Python scripts.
    • Created detailed documentation for ingestion and processing workflows.
    • Designed custom dashboards in Power BI for real-time monitoring.
    • Developed secure access policies for managing sensitive data in cloud environments.
    • Conducted regression testing for ingestion pipelines and workflows.
    • Wrote Python and shell wrapper scripts based on input source data such as tar.gz, txt, and csv files.
    • Installed and configured Hadoop and ecosystem components in Cloudera and Hortonworks environments.
    • Designed and completed a new three-page ETL job dashboard with Power BI.
    • Developed SAS programs using SAS/BASE and SAS/SQL for preparing analysis and reports from databases.
    • Created MongoDB clusters; hands-on experience with complex MongoDB aggregation functions and mapping.
    • Spearheaded the development of ETL processes using .NET technologies to extract, transform, and load large volumes of data into the data warehouse, resulting in a 20% improvement in data processing efficiency.
    • Worked with Linux systems and a MySQL database on a regular basis.
    • Developed Jenkins pipelines for continuous integration and deployment purposes.
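
    A minimal sketch of a database-to-cloud-storage migration of the kind mentioned above, using Spark's JDBC reader and a partitioned Parquet write. The connection URL, credentials, table, and partition columns are placeholders; the same write works against an abfss:// path for Azure Data Lake.

    import org.apache.spark.sql.SparkSession

    object SqlServerToLake {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("sqlserver-to-lake").getOrCreate()

        // Read a source table over JDBC in parallel (host, database, table, and bounds are assumed)
        val policies = spark.read
          .format("jdbc")
          .option("url", "jdbc:sqlserver://src-host:1433;databaseName=claims")
          .option("dbtable", "dbo.policies")
          .option("user", sys.env("DB_USER"))
          .option("password", sys.env("DB_PASSWORD"))
          .option("numPartitions", "8")
          .option("partitionColumn", "policy_id")  // assumed numeric key
          .option("lowerBound", "1")
          .option("upperBound", "10000000")
          .load()

        // Write to cloud storage as partitioned Parquet (bucket and partition column are assumed)
        policies.write
          .mode("overwrite")
          .partitionBy("policy_year")
          .parquet("s3a://example-bucket/warehouse/policies/")

        spark.stop()
      }
    }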
  • Ceequence Technologies Pvt Ltd
    Java Developer
    Ceequence Technologies Pvt Ltd Jul 2015 - Dec 2017
    India
    • Designed, built, and launched efficient and reliable data pipelines to move data across several platforms, including the data warehouse, online caches, and real-time systems.
    • Experience working with Azure Function Apps and App Services.
    • Experience with Databricks and scripting languages such as Scala, Shell, and PowerShell.
    • Worked with Azure Databricks notebooks to validate inbound/outbound data from an external source like Amperity (see the sketch below).
    • Experience creating reliable data pipelines to move data across several platforms, including Snowflake, Azure Delta Lake, blob storage, and external dashboards.
    • Experience creating Spark/Scala applications for ETL operations.
    • Source-to-target mapping/streaming of different data transfers via API or Azure Data Factory (ADF) pipelines, and troubleshooting or implementing different logic based on requirements.
    • Wrote complex SQL queries to drive analysis and insights.
    • Built pipelines with Spark JAR / notebook activities.
    • Used Spark Streaming APIs to perform required transformations and actions on the learner data model, which gets its data from Kafka in near real time.
    • Worked on StreamSets, reading and writing continuously from the Kafka cluster.
    • Developed a data CI/CD pipeline using the Data Collector UI and Control Hub.
    • Performed required transformations by configuring an evaluator processor; worked on the installed Transformer, configuring Spark Streaming to perform required transformations.
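
    A minimal sketch of a Databricks-style validation step that reads an inbound extract from blob storage and lands it in Delta Lake, as described above. The storage account, container paths, column names, and the assumption that the inbound data arrives as headered CSV are all illustrative, not taken from the actual workload.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object InboundToDelta {
      def main(args: Array[String]): Unit = {
        // On Databricks a session already exists; getOrCreate reuses it
        val spark = SparkSession.builder().appName("inbound-to-delta").getOrCreate()

        // Read an inbound extract landed in blob storage (path and layout are assumed)
        val inbound = spark.read
          .option("header", "true")
          .csv("abfss://landing@exampleaccount.dfs.core.windows.net/amperity/customers/")

        // Basic validation: keep only rows with a customer id, tag the load timestamp
        val validated = inbound
          .filter(col("customer_id").isNotNull)
          .withColumn("load_ts", current_timestamp())

        // Persist to a Delta Lake table so downstream dashboards read a consistent snapshot
        validated.write
          .format("delta")
          .mode("append")
          .save("abfss://curated@exampleaccount.dfs.core.windows.net/delta/customers/")
      }
    }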

Geetha M Education Details
  • JNTUH College of Engineering Hyderabad

Frequently Asked Questions about Geetha M

What company does Geetha M work for?

Geetha M works for Thomson Reuters

What is Geetha M's role at the current company?

Geetha M's current role is Senior QA Analyst.

What schools did Geetha M attend?

Geetha M attended JNTUH College of Engineering Hyderabad.

Who are Geetha M's colleagues?

Geetha M's colleagues are Casey Teichman, Gordon Aitchison, Sahana Bhat, Harini V, Osiel Do Couto, Gomes Gilberto, Winnay Vuppula.
