As part of our Dive into Data initiative, we like to showcase the many routes into data. They go far beyond the perception that it's all about being a genius with maths, engineering or science qualifications: data brilliance is also fuelled by creative and business-oriented minds.
In this part of the series, we’d like to feature the background and career journey of Nicholas Bull, Data Engineer at Lloyds Banking Group.
Our chat offers a real-life example of how to get into data, and of how a career path in the field can unfold.
Nicholas: The data engineer role is relatively new, so I don't think it is well defined across the industry yet. I expect my experience and responsibilities as a data engineer are very different to others', and in fact my first experience as a data engineer, in the startup world, was vastly different to my time at Lloyds.
At the startup, I was responsible for the whole data journey: sourcing information, the ETL process and the data warehousing. I designed the systems and was responsible for maintaining them. I also did a lot of reporting and data visualisation work on top of this.
At Lloyds, the systems and ETL processes are mostly managed by IT. I'm also in a team with people who focus solely on data science, visualisation and reporting, so the job of a data engineer is really to facilitate their work. That often means bringing information from across the Group into one central, coherent place. It can also mean restructuring data we already control into something more useful: for example, picking out the most relevant pieces of data and creating an event layer for a particular project.
Nicholas: I did a Masters’ in Physics and Astronomy and as part of the course took some introduction to Programming in Python courses. At the time I didn’t think it would be useful outside of academia – looking back almost 10 years later, I couldn’t have been more wrong!
After university, I was hired as a data analyst at a startup, but quickly realised that aside from Google Analytics, there wasn’t much data readily available to analyse! A lot of what I needed was in a legacy database or in text files across the business, so I started writing rudimentary scripts to extract data and load it into a managed AWS database.
Without realising it at the time, I was essentially building data pipelines! As time progressed and the complexity of my code increased, I discovered a lot of the problems I was facing had already been addressed by great open-source data pipelining projects such as Luigi and Airflow. These tools and the community and support around them massively accelerated my development, and before I knew it, a large part of my job had become data engineering!
Nicholas: I’d like to say I chose data engineering over software engineering, but the reality is I fell into it. That said, I think the lines are quite blurred between data engineering and any developer role these days, especially with such a widely used language like Python. Indeed beyond data engineering, I’ve used it for creating websites, web scraping, automated testing, and many other smaller ad-hoc projects, so I definitely don’t feel pigeon-holed into data engineering for the rest of my career.
Nicholas: Curiosity, and not being daunted by software you're unfamiliar with. You'll often discover that the information you need is in a system you know nothing about. That could be a legacy database (FoxPro, anyone? No, thought not), a cloud vendor you're not familiar with, or no structured database at all: I learnt web scraping because that was the only source of the information we needed.
Nicholas: I’ve found that having good interpersonal skills is more beneficial than you might think for a data engineer. This is especially true at a large organisation where data is managed by hundreds, if not thousands of people! You need to engage with these people before you can start a project, and in some instances persuade them to prioritise getting extracts or access for you over and above what else they might be doing.
Teamwork is important; if you’re working in a data-focussed team you might have data scientists, data visualisation experts, other data engineers, and project managers. You’ll need to be able to communicate in broad big-picture terms with project managers or perhaps non-technical stakeholders, but then be able to dive into the minutiae to facilitate the data science or visualisation work.
Thanks so much for sharing!