Industry:

Financial Institution

Technologies:

Big Data, Apache Airflow, Amazon S3, AWS Glue

Company Introduction

RenMoney is a microfinance bank that offers financial services such as loans, savings, and investments. RenMoney is one of the biggest fintechs in the lending space, serving both individuals for personal use and businesses. It is licensed and regulated by the Central Bank of Nigeria (CBN) and insured by the Nigeria Deposit Insurance Corporation (NDIC).

The Challenge

RenMoney gives out soft loans to many of its customers. From time to time, it needs to collate all of this data to analyze loan patterns, identify the customers who are defaulting, and gain an in-depth understanding of the payments customers make. It also collects loan repayments through several different channels, and that data requires analysis as well.

Taken together, this amounts to thousands to millions of transactions, and the data is stored in different locations, systems, and formats. It is held in structured, semi-structured, and unstructured forms, and across different database technologies.

Solution

Create a 360-degree profile of customers by consolidating all the data that exists on different platforms, giving the business analytics team the data to make quick, well-informed decisions about customer standing, loan status, and other relevant business information.

We architected a robust data pipeline with several stages of data processing, built on the standard ETL (Extract, Transform, Load) process of data preparation. The ETL process used the following technologies:

  • Apache Airflow: extracts data from the different sources using DAGs
  • AWS Glue: transforms the extracted data, and compresses, coalesces, and cleans it
  • Amazon S3: serves as the data lake and the landing point for data
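
The division of labour above can be pictured as a small dependency graph. The sketch below is a plain-Python stand-in, not Airflow code: it uses the standard library's graphlib module to run three hypothetical tasks (extract, transform, load) in dependency order, which is the same idea an Airflow DAG expresses. The task names, the sample row, and the in-memory store dictionary are assumptions made purely for illustration.

```python
from graphlib import TopologicalSorter

# In-memory stand-in for the S3 landing zone and data lake (hypothetical).
store = {}


def extract():
    # Pretend these rows were pulled from a source system.
    store["raw"] = [{"customer_id": "c1", "amount": "1500.00"}]


def transform():
    # Cast amounts from strings to numbers, as a cleaning step might.
    store["clean"] = [
        {**row, "amount": float(row["amount"])} for row in store["raw"]
    ]


def load():
    # Publish the cleaned rows for the analytics layer.
    store["published"] = list(store["clean"])


TASKS = {"extract": extract, "transform": transform, "load": load}

# Each key maps to the set of tasks that must finish before it runs.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_pipeline():
    """Run every task once, respecting the dependency graph."""
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name]()
    return order
```

Calling run_pipeline() executes the tasks in the order extract, transform, load. In a real deployment each task would be an operator in a scheduled DAG rather than a local function.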

Methodology

The process involves four stages:

  • Extraction: data is pulled from RenMoney's various data sources, which include database systems, software applications, and third-party solutions used for the day-to-day analysis of customer loans and customer activities, and is dumped in a particular format in Amazon S3. The data was pulled periodically on a schedule.
  • Transformation: after the relational and non-relational data has been extracted, it goes to the next stage of processing. At this stage AWS Glue, a serverless ETL service from AWS, is used to compress the data into a format that is faster to query and analyze. We then look for relationships within the data that has been extracted into Amazon S3. Some normalization also happens at this point, to make the data consistent across the different platforms from which it was collected.
  • Loading/Storage: once the data has been transformed, it is loaded into a storage engine. For this solution, S3 was used again to store the cleaned data, in a partitioned layout for easy querying and analysis.
  • Analytics: RenMoney used Power BI to connect to the data through Amazon Athena, a serverless analytics service that uses SQL to query semi-structured data in flat files and can also connect to different relational database engines.
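
The normalization and partitioning steps above can be illustrated with a short sketch. It assumes two hypothetical source systems that name the same fields differently, maps both onto one common schema, and builds a Hive-style partitioned prefix (year=/month=) of the kind Athena can use to skip irrelevant data. The source names, field names, date formats, and the repayments prefix are all assumptions for illustration, not RenMoney's actual schema.

```python
from datetime import datetime

# Hypothetical per-source field mappings onto one common schema.
FIELD_MAPS = {
    "core_banking": {
        "cust_id": "customer_id",
        "amt": "amount",
        "paid_on": "paid_at",
    },
    "payment_gateway": {
        "customerId": "customer_id",
        "value": "amount",
        "timestamp": "paid_at",
    },
}

# Hypothetical date formats used by each source.
DATE_FORMATS = {
    "core_banking": "%d/%m/%Y",
    "payment_gateway": "%Y-%m-%dT%H:%M:%S",
}


def normalize(record, source):
    """Rename fields, cast the amount, and convert the date to ISO format."""
    mapping = FIELD_MAPS[source]
    row = {mapping[key]: value for key, value in record.items() if key in mapping}
    row["amount"] = float(row["amount"])
    paid = datetime.strptime(row["paid_at"], DATE_FORMATS[source])
    row["paid_at"] = paid.date().isoformat()
    row["source"] = source
    return row


def partition_prefix(row, prefix="repayments"):
    """Build a Hive-style partitioned prefix from the payment date."""
    year, month, _ = row["paid_at"].split("-")
    return f"{prefix}/year={year}/month={month}/"
```

Once records from every channel share one schema and land under date-based prefixes, a query that filters on year and month only needs to read the matching partitions.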

Results

With this solution, RenMoney was able to analyze data and get insights faster, because the data from different sources became more connected.

Technologies Used

AWS Glue, Amazon S3, AWS Lambda, Apache Airflow, Python, Amazon EC2