Job Vacancies

Senior Data Engineer

    Shineteck Inc. seeks a Senior Data Engineer in Mechanicsburg, PA

    Job Description:

Shineteck Inc. seeks a Senior Data Engineer in Mechanicsburg, PA to:

- Establish database management systems, standards, guidelines, and quality assurance for database deliverables, such as conceptual design, logical database, capacity planning, external data interface specification, data loading plan, data maintenance plan, and security policy.
- Document and communicate database design.
- Evaluate and install database management systems.
- Code complex programs and derive logical processes on technical platforms.
- Build windows, screens, and reports.
- Assist in the design of user interfaces and business application prototypes.
- Participate in quality assurance and develop test application code in a client-server environment.
- Provide expertise in devising, negotiating, and defending the tables and fields provided in the database.
- Adapt business requirements developed by modeling/development staff and systems engineers, and develop the data, database specifications, and table and element attributes for an application.
- At more experienced levels, help develop an understanding of the client's original data and storage mechanisms; determine the appropriateness of data for storage and the optimum storage organization; determine how tables relate to each other and how fields interact within the tables for a relational model.
- Develop Spark RDD transformations, actions, DataFrames, case classes, and Datasets for the required input data, and perform data transformations using Spark Core.
- Migrate data pipeline orchestration from a self-hosted Airflow environment to a Kubernetes-based managed Airflow platform.
- Use DML and DDL commands such as SELECT, INSERT, UPDATE, subqueries, inner joins, outer joins, UNION, and advanced SQL for data retrieval and manipulation.
- Automate the nightly build to run quality control using Python with the boto3 library to ensure the pipeline does not fail, reducing effort by 70%.
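The SQL duties above (DDL, DML, joins, subqueries) can be illustrated with a minimal, self-contained sketch. The table names and data below are hypothetical, and SQLite stands in for whatever database engine the role actually uses:

```python
import sqlite3

# In-memory SQLite stands in for the production database (illustrative only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create two related tables (hypothetical schema).
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")

# DML: insert sample rows.
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)])

# Inner join plus aggregation: total order amount per customer.
cur.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
""")
totals = cur.fetchall()

# UPDATE driven by a subquery: apply a 10% discount to one customer's orders.
cur.execute("""
    UPDATE orders SET amount = amount * 0.9
    WHERE customer_id IN (SELECT id FROM customers WHERE name = 'Acme')
""")
discounted_total = cur.execute(
    "SELECT SUM(amount) FROM orders WHERE customer_id = 1"
).fetchone()[0]
conn.close()
```

The same SELECT/INSERT/UPDATE, join, and subquery patterns carry over directly to Redshift or any other SQL engine named in the posting.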
- Create AWS Lambda functions and provision EC2 instances in the AWS environment; implement security groups and administer Amazon VPCs.
- Build an on-demand secure EMR cluster launcher with custom spark-submit steps using S3 Events, SNS, KMS, and Lambda functions.
- Use AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3), DynamoDB, and Redshift.
- Create monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch; use AWS Glue for data transformation, validation, and cleansing.
- Handle conversion of file formats such as CSV, XML, and JSON, loading them into S3 to be queried from Athena.
- Write custom AWS Lambda functions to filter raw data from Kinesis Streams and put it into S3 buckets for Spark to read.
- Configure Step Functions to orchestrate multiple EMR tasks for data processing.
- Use Amazon MSK (Managed Streaming for Apache Kafka) as a distributed stream processing platform throughout the application to process real-time data.
- Build Tableau worksheets and dashboards for reporting by connecting to a customer-specific Redshift data mart.
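The Lambda-based Kinesis filtering described above might be sketched as follows. The record payload shape and the positive-`amount` filtering rule are assumptions for illustration, and the S3 write (which a real Lambda would do with boto3) is reduced to a comment so the sketch runs without AWS credentials:

```python
import base64
import json


def filter_records(event):
    """Decode Kinesis records from a Lambda event and keep only valid ones.

    The rule here (drop records without a positive 'amount') is a
    hypothetical stand-in for whatever validation the pipeline applies.
    """
    kept = []
    for record in event.get("Records", []):
        # Kinesis delivers each record's data base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if payload.get("amount", 0) > 0:
            kept.append(payload)
    return kept


def handler(event, context=None):
    """Lambda entry point: filter raw Kinesis data for downstream Spark jobs.

    In a real deployment this would write the kept records to an S3 bucket
    with boto3 (s3.put_object), where Spark reads them; that call is
    omitted here to keep the sketch self-contained.
    """
    kept = filter_records(event)
    return {"kept": len(kept), "records": kept}
```

Invoked with a standard Kinesis event (`{"Records": [{"kinesis": {"data": "<base64>"}}, ...]}`), the handler returns the decoded records that pass validation.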

Travel: Must be willing to travel and relocate to unanticipated client locations in the U.S.

Job Requirements: Must possess a Master’s Degree in Computer Science, Engineering, or a related field. Must also possess experience with Hadoop, Hive, Oozie, Spark, Shell, HBase, Kafka, Scala, Python, Elasticsearch, Splunk and JIRA.