Discovering whether data are of acceptable quality is a measurement task, and not a very easy one. In this data quality project, I used Excel and Python to work with the Consumer Complaint Database, a collection of complaints about financial products and services. Data quality is important because, without high-quality data, you cannot understand or stay in contact with your customers.

Metadata Updated: Sep 26, 2015. Publisher: Consumer Financial Protection Bureau. I am not sure how to measure the size of the dataset.

Before we analyze the 608678 consumer complaints, we have to check the quality of the data. Otherwise, the research will end up as “garbage in, garbage out”. First of all, I used Python to get an overview of this dataset. <Click Here>Data Cleaning Original Python Code
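As a minimal sketch of what such an overview could look like, the snippet below reads a tiny made-up sample with pandas and reports the number of rows, the number of columns, and the memory footprint, which is one reasonable way to measure the size of a dataset. The sample rows, the company names, and the helper name `overview` are assumptions for illustration only; the real file would be loaded with `pd.read_csv` on its own path.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real complaints CSV.
SAMPLE_CSV = """Product,Sub-product,Issue,Sub-issue,Company
Mortgage,FHA mortgage,Loan servicing,,Bank A
Debt collection,Credit card,Disclosure,Not given enough info,Collector B
Consumer Loan,Vehicle loan,Managing the loan,,Lender C
"""


def overview(df):
    """Return basic size facts about a DataFrame (hypothetical helper)."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        # deep=True also counts the bytes held by string values
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }


a = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = overview(a)
print(summary["rows"], "rows,", summary["columns"], "columns")
print(a.dtypes)  # column types give a first hint of parsing problems
```

On the full dataset, `a.head()` and `a.columns` are also handy for a first look before any cleaning.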


50 companies

Eleven kinds of product and 21 types of sub-product.


The first part of data preparation is separating data into the fields that will be most useful to you. With pandas, calling .groupby() followed by .size() prints how many records each product has.

# a is the pandas DataFrame that holds the complaints
b = a.groupby('Product').size()
print(b)
Product
Bank account or service     68655
Consumer Loan               23608
Credit card                 72027
Credit reporting           105557
Debt collection            111680
Money transfers              4250
Mortgage                   198170
Other financial service       662
Payday loan                  4303
Prepaid card                 2826
Student loan                16940
dtype: int64
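A quick sanity check on this output, sketched under the assumption that the counts above are complete: by default `groupby` leaves out rows whose key is missing, so if the per-product counts add up to the total number of records, the Product column has no missing values. The variable names below are illustrative; the numbers are copied from the output above.

```python
# Per-product counts copied from the groupby output above.
product_counts = {
    "Bank account or service": 68655,
    "Consumer Loan": 23608,
    "Credit card": 72027,
    "Credit reporting": 105557,
    "Debt collection": 111680,
    "Money transfers": 4250,
    "Mortgage": 198170,
    "Other financial service": 662,
    "Payday loan": 4303,
    "Prepaid card": 2826,
    "Student loan": 16940,
}

total_records = 608678  # total number of complaints stated earlier

# If the grouped counts sum to the total, no record lacks a Product value.
grouped_total = sum(product_counts.values())
print(grouped_total == total_records)  # True

# The largest category and its share of all complaints.
largest = max(product_counts, key=product_counts.get)
share = product_counts[largest] / total_records
print(largest, round(share * 100, 1), "%")
```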

I used isnull().sum() to count the null values in every available column; a count of 0 means the column has no missing values. There are 370198 missing values in Sub-issue.
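A minimal sketch of that null count, run on a made-up four-row table rather than the real data (the rows and company names are assumptions): `isnull()` marks each missing cell as True, and `sum()` adds those up per column.

```python
import pandas as pd

# Hypothetical miniature of the complaints table; None marks a missing value.
a = pd.DataFrame({
    "Product": ["Mortgage", "Credit card", "Debt collection", "Mortgage"],
    "Sub-issue": [None, None, "Debt is not mine", None],
    "Company": ["Bank A", "Bank B", "Collector C", "Bank A"],
})

# Number of missing values per column; 0 means the column is complete.
missing = a.isnull().sum()
print(missing)

# Fraction of missing values per column, useful for comparing columns.
missing_share = a.isnull().mean()
print(missing_share)
```

Looking at the share as well as the raw count makes it easier to judge whether a column such as Sub-issue is still usable.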


If we are interested in Consumer Loan, we can use .loc[] to select the rows for that product.
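As a small illustration on made-up rows (the companies are hypothetical), a boolean mask on the Product column can be passed to `.loc` to pull out only the Consumer Loan complaints. Note that `.loc` is an indexer, so it takes square brackets rather than parentheses.

```python
import pandas as pd

# Hypothetical sample rows; only the Product values mirror the real data.
a = pd.DataFrame({
    "Product": ["Mortgage", "Consumer Loan", "Credit card", "Consumer Loan"],
    "Company": ["Bank A", "Lender B", "Bank C", "Lender D"],
})

# Boolean mask: True where the complaint is about a consumer loan.
is_consumer_loan = a["Product"] == "Consumer Loan"

# All columns for the matching rows.
consumer_loans = a.loc[is_consumer_loan]

# Only the Company column for the matching rows.
companies = a.loc[is_consumer_loan, "Company"]

print(consumer_loans)
```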


Now that we understand what data cleaning is for and what methods and approaches there are to shape up our dataset, there is still the question of what cleaning can and can’t catch. A general rule for cleaning a dataset where each column is a variable and the rows represent the records is:

  • if the number of incorrect or missing values in a row is greater than the number of correct values, it is recommended to exclude that row.
  • if the number of incorrect or missing values in a column is greater than the number of correct values in that column, it is recommended to exclude that column.
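The two rules above can be sketched in pandas as follows. This is only one possible reading: it treats missing values alone as "incorrect or missing" (detecting incorrect values needs domain-specific checks), and it applies the column rule before the row rule, which is an assumption. The function name `drop_mostly_missing` and the sample rows are hypothetical.

```python
import pandas as pd


def drop_mostly_missing(df):
    """Drop columns, then rows, where missing values outnumber present ones."""
    # Column rule: keep a column unless missing values outnumber present ones.
    col_missing = df.isnull().sum(axis=0)
    col_present = df.notnull().sum(axis=0)
    kept_cols = df.loc[:, col_missing <= col_present]

    # Row rule, applied to the remaining columns only.
    row_missing = kept_cols.isnull().sum(axis=1)
    row_present = kept_cols.notnull().sum(axis=1)
    return kept_cols.loc[row_missing <= row_present]


# Hypothetical sample: Sub-issue is mostly empty, and row 2 is entirely empty.
a = pd.DataFrame({
    "Product": ["Mortgage", "Credit card", None, "Payday loan"],
    "Sub-issue": [None, None, None, "Fees"],
    "Company": ["Bank A", "Bank B", None, "Lender C"],
})

cleaned = drop_mostly_missing(a)
print(cleaned)
```

Applying the column rule first means a sparse column does not cause otherwise good rows to be dropped; the opposite order can give a different result.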

A useful tutorial: https://g0v.gitbooks.io/data-design/content/book/ch10-what-data-cleaning-can-and-cant-catch.html
