Shineteck Inc. seeks a Senior Data Engineer in Mechanicsburg, PA
Shineteck Inc. seeks a Senior Data Engineer in Mechanicsburg, PA to establish database management
systems, standards, guidelines, and quality assurance for database deliverables, such as conceptual
design, logical database, capacity planning, external data interface specification, data loading plan, data
maintenance plan and security policy. Documents and communicates database design. Evaluates and
installs database management systems. Codes complex programs and derives logical processes on
technical platforms. Builds windows, screens, and reports. Assists in the design of user interface and
business application prototypes. Participates in quality assurance and develops test application code in
a client/server environment. Provides expertise in devising, negotiating, and defending the tables and fields
provided in the database. Adapts business requirements, developed by modeling/development staff and
systems engineers, and develops the data, database specifications, and table and element attributes for
an application. At more experienced levels, helps to develop an understanding of the client's original data and
storage mechanisms. Determines appropriateness of data for storage and optimum storage organization.
Determines how tables relate to each other and how fields interact within the tables for a relational model.
Develops Spark RDD transformations, actions, DataFrames, case classes, and Datasets for the required input data, and performs data transformations using Spark Core.
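For illustration, a minimal PySpark sketch of this kind of transformation work (the input path and column names are hypothetical; case classes and Datasets are the Scala-side analogue of the DataFrame API shown here):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("transform-example").getOrCreate()

# RDD transformations and an action on raw text input (hypothetical path).
lines = spark.sparkContext.textFile("s3://example-bucket/raw/events.txt")
parsed = lines.map(lambda line: line.split(",")).filter(lambda cols: len(cols) == 3)
print(parsed.count())  # count() is the action that triggers evaluation

# The equivalent DataFrame-based transformation.
df = spark.read.csv("s3://example-bucket/raw/events.txt")
cleaned = df.dropna().withColumnRenamed("_c0", "user_id")
cleaned.show()
```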
Migrates data pipeline orchestration from a self-hosted Airflow environment to a Kubernetes-based managed Airflow platform.
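As a rough sketch, a DAG such as the one below (DAG and task names are hypothetical) runs unchanged on either a self-hosted Airflow deployment or a Kubernetes-based managed platform, since the migration concerns the execution environment rather than the DAG code:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling source data")  # placeholder task body

with DAG(
    dag_id="example_pipeline",        # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
):
    PythonOperator(task_id="extract", python_callable=extract)
```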
Uses DML and DDL commands such as SELECT, INSERT, and UPDATE, along with subqueries, inner and outer joins, UNION, and other advanced SQL for data retrieval and manipulation.
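A hedged example of the kind of SQL involved, run here through Spark SQL against hypothetical in-memory tables:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-example").getOrCreate()

# Hypothetical in-memory tables standing in for real warehouse tables.
spark.createDataFrame(
    [(1, "Ada"), (2, "Grace")], ["customer_id", "customer_name"]
).createOrReplaceTempView("customers")
spark.createDataFrame(
    [(100, 1, 40.0), (101, 2, 75.0), (102, 3, 12.5)], ["order_id", "customer_id", "amount"]
).createOrReplaceTempView("orders")

# Above-average orders with customer names (inner join plus a subquery),
# unioned with orphaned orders surfaced by an outer join.
result = spark.sql("""
    SELECT c.customer_name, o.order_id, o.amount
    FROM orders o
    INNER JOIN customers c ON o.customer_id = c.customer_id
    WHERE o.amount > (SELECT AVG(amount) FROM orders)
    UNION
    SELECT 'unknown' AS customer_name, o.order_id, o.amount
    FROM orders o
    LEFT OUTER JOIN customers c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
""")
result.show()
```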
Automates a nightly quality-control run using Python with the Boto3 library to ensure the pipeline does not fail, reducing manual effort by 70%.
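One minimal sketch of such a Boto3-based quality-control check, assuming a hypothetical bucket layout where each night's run writes under a dated prefix:

```python
import datetime
import boto3

s3 = boto3.client("s3")

def nightly_output_exists(bucket: str, prefix: str) -> bool:
    """Return True if last night's pipeline wrote at least one object."""
    today = datetime.date.today().isoformat()
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=f"{prefix}/{today}/")
    return resp.get("KeyCount", 0) > 0

# Bucket and prefix names are hypothetical.
if not nightly_output_exists("example-pipeline-bucket", "daily-output"):
    raise SystemExit("Quality check failed: no output found for last night's run")
```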
Provisions AWS Lambda functions and EC2 instances in the AWS environment, implements security groups, and administers Amazon VPCs. Builds an on-demand secure EMR cluster launcher with custom spark-submit steps using S3 events, SNS, KMS, and Lambda functions.
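A simplified sketch of such an on-demand launcher, assuming an S3-event-triggered Lambda; SNS notification and KMS encryption settings are omitted for brevity, and all names are hypothetical:

```python
import boto3

emr = boto3.client("emr")

def handler(event, context):
    """Triggered by an S3 event; launches a transient EMR cluster with a spark-submit step."""
    key = event["Records"][0]["s3"]["object"]["key"]
    emr.run_job_flow(
        Name="on-demand-transform",      # hypothetical cluster name
        ReleaseLabel="emr-6.10.0",
        Instances={
            "InstanceCount": 3,
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate when the step finishes
        },
        Steps=[{
            "Name": "spark-transform",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://example-bucket/jobs/transform.py", key],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
```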
Uses AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3), DynamoDB, and Redshift.
Creates monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch, and uses AWS Glue for data transformation, validation, and cleansing.
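For example, a CloudWatch error alarm for an ETL Lambda might be created like this (function name and SNS topic ARN are hypothetical):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when the hypothetical ETL function records any errors in a 5-minute window.
cloudwatch.put_metric_alarm(
    AlarmName="example-etl-lambda-errors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "example-etl-function"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:example-alerts"],
)
```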
Handles conversion of different file formats such as CSV, XML, and JSON, loading them into S3, where they are queried from Athena.
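A minimal sketch of querying such S3 data from Athena via Boto3 (database, table, and output location are hypothetical):

```python
import boto3

athena = boto3.client("athena")

# Kick off an asynchronous Athena query over the hypothetical events table.
resp = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) FROM example_db.events GROUP BY event_type",
    QueryExecutionContext={"Database": "example_db"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
print(resp["QueryExecutionId"])  # poll this ID for results
```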
Writes custom AWS Lambda functions to filter raw data from Kinesis streams and put it into S3 buckets for Spark to read.
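A hedged sketch of such a filtering Lambda, assuming a hypothetical bucket and filter rule; Kinesis delivers records base64-encoded:

```python
import base64
import json
import uuid
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Kinesis-triggered Lambda: keep only records that pass a filter, land them in S3."""
    kept = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if payload.get("event_type") != "heartbeat":   # hypothetical filter rule
            kept.append(payload)
    if kept:
        s3.put_object(
            Bucket="example-filtered-bucket",          # hypothetical bucket
            Key=f"filtered/{uuid.uuid4()}.json",
            Body="\n".join(json.dumps(r) for r in kept),
        )
```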
Configures Step Functions to orchestrate multiple EMR tasks for data processing.
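As a sketch, a two-step Amazon States Language definition that runs successive spark-submit steps on an existing EMR cluster (cluster ID, role ARN, and job paths are hypothetical):

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Two chained EMR steps expressed in Amazon States Language.
definition = {
    "StartAt": "IngestStep",
    "States": {
        "IngestStep": {
            "Type": "Task",
            "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
            "Parameters": {
                "ClusterId": "j-EXAMPLE",
                "Step": {
                    "Name": "ingest",
                    "HadoopJarStep": {
                        "Jar": "command-runner.jar",
                        "Args": ["spark-submit", "s3://example-bucket/jobs/ingest.py"],
                    },
                },
            },
            "Next": "TransformStep",
        },
        "TransformStep": {
            "Type": "Task",
            "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
            "Parameters": {
                "ClusterId": "j-EXAMPLE",
                "Step": {
                    "Name": "transform",
                    "HadoopJarStep": {
                        "Jar": "command-runner.jar",
                        "Args": ["spark-submit", "s3://example-bucket/jobs/transform.py"],
                    },
                },
            },
            "End": True,
        },
    },
}

sfn.create_state_machine(
    name="example-emr-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/example-sfn-role",
)
```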
Uses Amazon MSK (Managed Streaming for Apache Kafka) as a distributed stream-processing platform throughout the application to process real-time data. Works on building Tableau worksheets and dashboards for reporting by connecting to a customer-specific Redshift data mart.
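A minimal consumer sketch using the kafka-python client (broker endpoint and topic name are hypothetical MSK values):

```python
import json
from kafka import KafkaConsumer  # kafka-python client

# Bootstrap broker and topic name are hypothetical MSK endpoints.
consumer = KafkaConsumer(
    "example-events",
    bootstrap_servers=["b-1.example.kafka.us-east-1.amazonaws.com:9092"],
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="latest",
)

for message in consumer:
    print(message.value)  # process each real-time event as it arrives
```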
Travel: Must be willing to travel and relocate to unanticipated client locations in the U.S.
Job Requirements: Must possess a Master’s Degree in Computer Science, Engineering, or a related
field. Must also possess experience with Hadoop, Hive, Oozie, Spark, Shell, HBase, Kafka, Scala,
Python, Elasticsearch, Splunk and JIRA.