Surya Teja


Data Engineer at AgFirst Farm Credit Bank
Columbia, South Carolina, United States
Surya Teja's Location
Edison, New Jersey, United States
About Surya Teja

• As a Data Engineer with 6+ years of experience, I specialize in hybrid-cloud (AWS, GCP) data warehousing, data engineering, feature engineering, Hadoop big data, ETL/ELT, and business intelligence. I have expertise in AWS and GCP data pipelines, Cloudera, the Hadoop ecosystem, Spark, Databricks, Redshift, Snowflake, relational databases, and tools such as Tableau, Airflow, and dbt. I also have expert-level programming skills in Python and SQL.
• My experience includes building data solutions using SQL Server, AWS, and GCP. I have hands-on experience on Google Cloud Platform (GCP) with its big data products, such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer (Airflow as a service).
• I have an in-depth understanding of the strategy and practical implementation of AWS cloud technologies such as EC2, EBS, S3, VPC, RDS, SES, ELB, EMR, ECS, CloudFront, CloudFormation, ElastiCache, CloudWatch, Redshift, Lambda, SNS, DynamoDB, SageMaker, and Kinesis.
• I have experience with Hadoop ecosystem components such as Hive, HDFS, Sqoop, Spark, and Kafka. I am skilled in the design, installation, configuration, and management of Apache Hadoop clusters and the MapR, Hortonworks, and Cloudera Hadoop distributions.
• I have a good understanding of Hadoop architecture and components such as the Resource Manager, Node Manager, NameNode, DataNode, MapReduce concepts, and HDFS.
• I have implemented data warehouse solutions using Snowflake and have been involved in all phases of the ETL life cycle, from scope analysis, design, and build through production support.
• I have also worked with machine learning algorithms, with a good understanding of various ML techniques and knowledge of high-level statistics.

Main Tech Stack:
Relational DB: Oracle, MySQL, MS SQL Server
Warehouses: BigQuery, Redshift, Snowflake
Cloud: AWS, Google Cloud
Orchestration: Airflow
ETL: SQL, Spark
Infrastructure: Terraform
CI/CD: Jenkins, AWS CodePipeline, Google Cloud Build
Languages: Python, SQL
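The ETL/ELT-with-SQL pattern mentioned above can be sketched in a few lines. This is a minimal illustration only: SQLite stands in for a warehouse such as Redshift, BigQuery, or Snowflake, and all table and column names are made up.

```python
import sqlite3

# Minimal ELT sketch: load raw rows into a staging table, then
# transform with SQL inside the "warehouse" (SQLite stands in for
# Redshift/BigQuery/Snowflake; all names here are hypothetical).
raw_orders = [
    ("2024-01-01", "widget", 3, 9.99),
    ("2024-01-01", "gadget", 1, 24.50),
    ("2024-01-02", "widget", 2, 9.99),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stg_orders (order_date TEXT, sku TEXT, qty INTEGER, unit_price REAL)"
)
conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?, ?)", raw_orders)

# Transform step: aggregate staged data into a reporting table.
conn.execute(
    """CREATE TABLE daily_revenue AS
       SELECT order_date, ROUND(SUM(qty * unit_price), 2) AS revenue
       FROM stg_orders
       GROUP BY order_date"""
)

rows = conn.execute("SELECT * FROM daily_revenue ORDER BY order_date").fetchall()
print(rows)
```

The same load-first, transform-in-SQL shape is what an orchestrator like Airflow would schedule step by step.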

Surya Teja's Current Company Details
AgFirst Farm Credit Bank

Data Engineer at AgFirst
Columbia, South Carolina, United States
Website:
agfirst.com
Employees:
511
Surya Teja Work Experience Details
  • AgFirst Farm Credit Bank
    Data Engineer
    AgFirst Farm Credit Bank Jun 2022 - Present
    Columbia, South Carolina, United States
    • Responsible for setting up technical and functional requirements, data pipelines, data preparation, design, development, modeling, testing, and deployment of advanced models in the AWS cloud.
    • Conducted an exhaustive study of available Transformer models, estimating cost, system load, and model performance, and used cost-to-performance metrics to select the model yielding the best results in terms of development, optimization, and maintenance.
    • Exported data from Snowflake to S3 and created on-demand tables using AWS Lambda functions and AWS Glue with Python and PySpark.
    • Designed and implemented ETL pipelines over S3 files in the data lake using AWS Glue.
    • Leveraged Pandas, NumPy, and other preprocessing libraries for data cleaning and feature engineering, and incorporated them into the pipeline component of the NLP workflow.
    • Worked with 7 TB of unstructured data for a classification problem and built NLP pipelines with preprocessing, modeling, and testing components.
    • Performed text preprocessing (tokenization, lemmatization, etc.) and feature engineering/extraction on multi-label imbalanced data, followed by various sampling techniques.
    • Built spaCy and Hugging Face Transformer models such as BERT, ALBERT, and RoBERTa on de-identified transcripts.
    • Built metrics such as the confusion matrix and model-performance dashboards, and integrated them into Sisense for real-time visibility.
    • Fine-tuned BERT models on de-identified client data to streamline development and deployment in the cloud.
    • Saved the client $500,000 by developing the advanced models and setting up CI/CD processing.
    • Achieved an overall model accuracy of 93%, a 25-percentage-point increase over the old models, resulting in significant cost savings for the client.
    Environment: AWS Cloud, Python, SQL, ETL, Jupyter Notebook, PySpark, Pandas, Snowflake, SageMaker, Hugging Face, spaCy, transfer learning, Git, JIRA, Agile, Windows, and Linux.
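Model-quality metrics like the ones named above (confusion matrix, accuracy) are straightforward to compute from scratch. A minimal sketch, using made-up labels for a hypothetical three-class classifier rather than the actual client data:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Return a {(true, pred): count} mapping over the given labels."""
    counts = Counter(zip(y_true, y_pred))
    return {(t, p): counts.get((t, p), 0) for t in labels for p in labels}

def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels for a 3-class text classifier (illustration only).
y_true = ["billing", "claims", "claims", "billing", "other", "claims"]
y_pred = ["billing", "claims", "billing", "billing", "other", "claims"]

cm = confusion_matrix(y_true, y_pred, ["billing", "claims", "other"])
print(cm[("claims", "billing")])  # 1 claim misclassified as billing
print(accuracy(y_true, y_pred))
```

In practice these counts would feed a dashboard rather than stdout, but the aggregation is the same.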
  • Macy's
    Data Scientist
    Macy's Aug 2021 - May 2022
    New York, United States
    • Estimated Kickstarter project consistency at 7% based on project descriptions by conducting a similarity analysis in Python.
    • Applied text preprocessing techniques to the data and ran a similarity analysis on two key evaluation features that reflect project-goal consistency on Kickstarter.
    • Estimated that 36% of Reddit content in 2020 concerned health issues, using NLP techniques in Python such as LDA and TF-IDF.
    • Applied text preprocessing techniques to the data and performed a topic-modeling analysis, which gave a clear picture of the nature of content posted on Reddit.
    • Developed automated scripts to scrape Kickstarter and Reddit text data and build a corpus.
    • Saved the scraped data to GCS and built an ETL process to automate data loading.
    • Employed multi-core processing and leveraged GPUs to reduce latency while scraping data and saving it to the cloud, and built data pipelines in Airflow on GCP.
    • Built and architected multiple data pipelines and end-to-end ETL processes for data ingestion and transformation in GCP, and coordinated tasks among the team.
    • Integrated Git with GCP for version control and collaborative development.
    • Leveraged Pandas, NumPy, and other preprocessing libraries for data cleaning and feature engineering, and incorporated them into the pipeline component of the NLP workflow.
    • Performed data preprocessing and vectorization using the NLTK and Gensim libraries in Python.
    • Built a data warehouse with 3 TB of unstructured data for a semantic-analysis problem and built NLP pipelines with preprocessing, modeling, and testing components.
    • Followed an Agile process with JIRA issue management to track sprint cycles, and prepared documentation for future quantitative analysis.
    Environment: Google Cloud, Python, SQL, ETL, Jupyter Notebook, Pandas, Snowflake, Gensim, NLTK, spaCy, Git, JIRA, Agile, Windows, and Linux.
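The TF-IDF similarity analysis mentioned above works roughly as follows. This is a pure-Python sketch standing in for Gensim/NLTK; the three documents are invented for illustration, not the Kickstarter corpus:

```python
import math
from collections import Counter

# Minimal TF-IDF + cosine-similarity sketch. The documents are made up.
docs = [
    "handmade wooden watch with leather strap",
    "smart watch with leather strap and heart monitor",
    "organic coffee beans roasted daily",
]

tokenized = [d.split() for d in docs]
n = len(tokenized)
vocab = sorted({t for doc in tokenized for t in doc})

def idf(term):
    """Inverse document frequency: rarer terms weigh more."""
    df = sum(term in doc for doc in tokenized)
    return math.log(n / df)

def tfidf_vector(doc):
    """Term frequency times IDF, over the shared vocabulary."""
    tf = Counter(doc)
    return [tf[t] / len(doc) * idf(t) for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vecs = [tfidf_vector(doc) for doc in tokenized]
print(round(cosine(vecs[0], vecs[1]), 3))  # watch descriptions overlap
print(cosine(vecs[0], vecs[2]))            # no shared terms
```

Real pipelines add stop-word removal and lemmatization before vectorization, which is what the NLTK/Gensim preprocessing bullets refer to.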
  • University of South Florida
    Data Engineer
    University of South Florida Jan 2021 - Jul 2021
    Tampa, Florida, United States
    • Responsible for building the data warehouse, data pipelines, data preparation, design, development, modeling, testing, deployment, and integration with a dashboarding tool in Google Cloud.
    • Developed automated scripts to scrape SEC 10-K and YouTube data and build a data warehouse for companies in the IT domain.
    • Saved the data to GCS, created tables in BigQuery, and built an ETL process to automate data loading.
    • Employed multi-core processing and leveraged GPUs to reduce latency while scraping data and saving it to the cloud, and built data pipelines in Airflow on GCP.
    • Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinated tasks among the team.
    • Developed Python code for tasks, dependencies, SLA watchers, and time sensors for each job, for workflow management and automation with Airflow.
    • Built data pipelines in Airflow on GCP for ETL jobs using a mix of older and newer Airflow operators.
    • Created BigQuery authorized views for row-level security and for exposing data to other teams.
    • Designed and implemented the various layers of the data lake and designed a star schema in BigQuery.
    • Used Cloud Functions with Python to load data into BigQuery on arrival of CSV files in a GCS bucket.
    • Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python.
    • Developed and demonstrated a POC to migrate on-prem workloads to Google Cloud Platform using GCS, BigQuery, Cloud SQL, and Cloud Dataproc.
    Environment: Google Cloud, BigQuery, Python, SQL, ETL, Jupyter Notebook, Pandas, Snowflake, spaCy, Git, JIRA, Agile, Windows, and Linux.
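A star schema like the one described above keeps a central fact table joined to small dimension tables. A minimal sketch, with SQLite standing in for BigQuery and all table, column, and company names invented:

```python
import sqlite3

# Star-schema sketch: one fact table keyed to dimension tables.
# SQLite stands in for BigQuery; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_company (company_id INTEGER PRIMARY KEY, name TEXT, sector TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE fact_filing (
    company_id INTEGER REFERENCES dim_company(company_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    revenue_musd REAL
);
INSERT INTO dim_company VALUES (1, 'Acme Soft', 'IT'), (2, 'Globex', 'IT');
INSERT INTO dim_date VALUES (10, 2020, 'Q4'), (11, 2021, 'Q1');
INSERT INTO fact_filing VALUES (1, 10, 120.5), (1, 11, 131.0), (2, 11, 98.7);
""")

# Typical star-schema query: join the fact to a dimension and aggregate.
rows = conn.execute("""
    SELECT d.year, ROUND(SUM(f.revenue_musd), 1)
    FROM fact_filing f
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY d.year ORDER BY d.year
""").fetchall()
print(rows)
```

In BigQuery the same shape applies, with authorized views layered on top of queries like this one to control row-level exposure.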
  • Tampa General Hospital
    Machine Learning Engineer
    Tampa General Hospital Feb 2020 - Dec 2020
    Tampa, Florida, United States
    • Extracted data from multiple source systems (S3, Redshift, RDS) and created tables and databases in the Glue Data Catalog using Glue crawlers.
    • Used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous and heterogeneous data sources, and built graphs for business decision-making with Python's Matplotlib library.
    • Responsible for requirements gathering, analysis, design, development, model building, testing, and model deployment in the AWS cloud environment.
    • Used the AWS Glue Data Catalog with crawlers to query S3 data with SQL, and defined table and column mappings from S3 to Redshift using a JSON schema.
    • Supported Customer Service in the Tableau development environment by designing ETL jobs and dashboards built on Redshift data.
    • Wrote scripts for schema validation, transformations, and logging, and connected them to a dashboarding tool.
    • Built a random forest on key features from patient data, which improved clinic operating efficiency by 38%.
    • Deployed the model with AWS SageMaker and built a simple front-end user interface to check the daily status of clinic performance.
    • Reduced insurance processing time by 7 days and improved efficiency by 200%.
    • Used GitHub for version control and team collaboration.
    • Followed an Agile process with JIRA issue management to track sprint cycles.
    Environment: AWS EC2, Python, MySQL, Jupyter Notebook, SageMaker, ensemble methods, Git, JIRA, Agile, Windows, and Linux.
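The random forest named above rests on bagging: many weak learners trained on bootstrap resamples, combined by majority vote. A toy sketch of that idea only, not the actual clinic model; the one-feature "wait time" data and threshold stumps are entirely made up:

```python
import random

# Toy bagging ensemble in the spirit of a random forest: decision
# stumps trained on bootstrap samples, combined by majority vote.
# Data and thresholds are made up (label 1 = "visit delayed").
data = [(wait, 1 if wait > 30 else 0) for wait in range(0, 61, 5)]

def train_stump(sample):
    """Pick the threshold that best separates the bootstrap sample."""
    best_t, best_acc = 0, -1.0
    for t in range(0, 61, 5):
        acc = sum((x > t) == bool(y) for x, y in sample) / len(sample)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

random.seed(0)
stumps = [
    train_stump([random.choice(data) for _ in data])  # bootstrap resample
    for _ in range(15)
]

def predict(wait_minutes):
    """Majority vote across the ensemble."""
    votes = sum(wait_minutes > t for t in stumps)
    return 1 if votes > len(stumps) / 2 else 0

print(predict(45), predict(10))
```

A library implementation (e.g. scikit-learn's `RandomForestClassifier`) adds random feature subsetting and full trees, but the vote-over-resamples core is the same.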
  • Amazon
    Data Engineer
    Amazon Sep 2017 - Jul 2019
    Hyderabad, Telangana, India
    • Translated functional and technical requirements into detailed specifications running on AWS, using services including EC2, ECS, RDS Aurora MySQL, SQS, SNS, KMS, and Athena.
    • Involved in gathering business requirements, logical modeling, database design, data sourcing and transformation, data loading, SQL, and performance tuning.
    • Structured multiple projects with object-oriented programming in Python, developed dashboards in Tableau, and managed cloud instances.
    • Created stored procedures, views, and user-defined functions to support the front-end application.
    • Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats.
    • Preprocessed data with Pandas DataFrames to generate throughput capacity and product-to-ZIP mappings based on inventory-breadth constraints, initial inventory, and other flow-related data.
    • Developed and maintained PAN-EU inbound analytics and reporting tools that facilitated freight planning and scheduling.
    • Predicted labor productivity with 82% accuracy, which led to freight optimization across EU regions.
    • Identified 50% more KPIs from queried data using SQL after implementing data-cleaning techniques in Python.
    • Collaborated with supply chain teams on the inception and deployment of 3PL sites with a 92% success rate.
    • Concatenated multiple data tables to forecast supply-chain dynamics (~27M orders) across various time frames.
    • Parallelized the data flow by leveraging AWS tools such as EC2, Redshift, Kinesis, and Glue to build data pipelines, and added logging functionality.
    • Used GitHub for version control and team collaboration, and followed an Agile process with JIRA issue management to track sprint cycles.
    Environment: Python, Jupyter Notebook, Tableau, Pandas, NumPy, JSON, CI/CD scripts, Agile, SQL, Unix, AWS services (S3, EC2, Redshift).
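The "concatenate tables, then forecast" step above can be sketched with a moving average. This is illustration only: the regional extracts, week keys, and order counts are invented, and a production forecast would use a richer model than a trailing mean:

```python
from collections import defaultdict

# Sketch: merge weekly order counts from regional extracts, then
# forecast the next week as a trailing mean. Numbers are made up.
region_tables = {
    "DE": [("2019-W01", 120), ("2019-W02", 140), ("2019-W03", 130)],
    "FR": [("2019-W01", 80),  ("2019-W02", 90),  ("2019-W03", 100)],
}

# Concatenate: sum orders per week across regions.
totals = defaultdict(int)
for table in region_tables.values():
    for week, orders in table:
        totals[week] += orders

series = [totals[w] for w in sorted(totals)]

def moving_average_forecast(values, window=3):
    """Forecast the next point as the mean of the last `window` points."""
    tail = values[-window:]
    return sum(tail) / len(tail)

print(series)
print(moving_average_forecast(series))
```

At ~27M orders the concatenation would run in Spark or Pandas rather than a Python dict, but the merge-then-aggregate logic is identical.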
  • Mahindra Group
    Junior Data Engineer
    Mahindra Group Jun 2015 - Aug 2017
    Hyderabad, Telangana, India
    • Used various AWS services, including S3, EC2, AWS Glue, Athena, Redshift, EMR, and Kinesis.
    • Extracted data from multiple source systems (S3, Redshift, RDS) and created tables and databases in the Glue Data Catalog using Glue crawlers.
    • Used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous and heterogeneous data sources, and built graphs for business decision-making with Python's Matplotlib library.
    • Saved over $90,000 in costs by applying Lean-Agile methods, and achieved a 15.3% reduction in rollout times by leveraging data.
    • Exported data from Redshift to S3 using AWS Glue jobs, the PySpark context, boto3, and repartitioning, converting to Parquet format with dynamic frames.
    • Visualized the data with Seaborn and Matplotlib to understand relationships between features and analyzed the statistical distribution of the data.
    • Designed and implemented ETL pipelines over S3 Parquet files in the data lake using AWS Glue.
    • Managed storage in AWS using Elastic Block Store and S3; created volumes and configured snapshots.
    • Configured AWS S3 lifecycle policies to back up files and archive them to Amazon Glacier.
    • Created and maintained databases in AWS using RDS.
    • Followed an Agile process with JIRA issue management to track sprint cycles.
    Environment: Python, Jupyter Notebook, Pandas, Amazon Web Services (AWS), Git, JIRA, SQL, Agile, Tableau, Windows, and Linux.
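An S3 lifecycle policy like the one described (archive to Glacier, later expire) is just a small configuration document. A sketch under stated assumptions: the bucket name, prefix, and day counts below are invented, and the dict follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the API call itself is shown commented out, not executed.

```python
# Sketch of an S3 lifecycle rule: transition old objects to Glacier,
# then expire them. Bucket, prefix, and day counts are illustrative.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Filter": {"Prefix": "exports/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

# With real credentials this would be applied roughly as:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-data-bucket", LifecycleConfiguration=lifecycle_config
# )
print(lifecycle_config["Rules"][0]["Transitions"][0]["StorageClass"])
```

Keeping the rule as data like this makes it easy to template per environment in Terraform or a deployment script.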

Surya Teja Education Details

Frequently Asked Questions about Surya Teja

What company does Surya Teja work for?

Surya Teja works for AgFirst Farm Credit Bank.

What is Surya Teja's role at the current company?

Surya Teja's current role is Data Engineer at AgFirst.

What schools did Surya Teja attend?

Surya Teja attended the University of South Florida and GITAM Deemed University.

Who are Surya Teja's colleagues?

Surya Teja's colleagues are Amber Young, Jake Loadholdt, Rob Nettles, Cj Ladson, Kimberly Johnson, Pam Ulmer, Steve Francis.
