Getting started with Python!

This week I wrote my first program in Python, which analyzes the frequency distribution of data. Unfortunately, the data I had selected did not lend itself well to frequency distributions, mainly because none of the variables in the data set was categorical. The results are therefore not very meaningful at this point. Nevertheless, my first program ran without any errors and displayed the intended results.

My program:

# -*- coding: utf-8 -*-
"""
Spyder Editor

This is a temporary script file.
"""
# This is my "Hello World Code"

import pandas
import numpy

# Load the data set (one row per country)
data = pandas.read_csv('data.csv')

# Older approach using the deprecated convert_objects, kept for reference:
#data['YISM2534'] = data['YISM2534'].convert_objects(convert_numeric=True)
#data['YISW2534'] = data['YISW2534'].convert_objects(convert_numeric=True)
#data['DBA'] = data['DBA'].convert_objects(convert_numeric=True)

# Make sure the three variables of interest are numeric
data['YISM2534'] = pandas.to_numeric(data['YISM2534'])
data['YISW2534'] = pandas.to_numeric(data['YISW2534'])
data['DBA'] = pandas.to_numeric(data['DBA'])

print("Number of rows in the data file")
print(len(data))
print("Number of variables")
print(len(data.columns))

print("All data is from the year 2008")

print("Frequency of countries in terms of the average number of years of school including primary, secondary and tertiary education attended by all men between 25 and 34 years.")
YISM2534FD = data.groupby('YISM2534').size()
print(YISM2534FD)

print("Percentage of countries in terms of the average number of years of school including primary, secondary and tertiary education attended by all men between 25 and 34 years.")
YISM2534Percentage = data.groupby('YISM2534').size() * 100 / len(data)
print(YISM2534Percentage)

print("Frequency of countries in terms of the average number of years of school including primary, secondary and tertiary education attended by all Women between 25 and 34 years.")
YISW2534FD = data.groupby('YISW2534').size()
print(YISW2534FD)

print("Percentage of countries in terms of the average number of years of school including primary, secondary and tertiary education attended by all Women between 25 and 34 years.")
YISW2534Percentage = data.groupby('YISW2534').size() * 100 / len(data)
print(YISW2534Percentage)

print("Frequency of countries in terms of the average number of deaths in new born due to asphyxia.")
dbaFD = data.groupby('DBA').size()
print(dbaFD)

print("Percentage of countries in terms of the average number of deaths in new born due to asphyxia.")
dbaPercentage = data.groupby('DBA').size() * 100 / len(data)
print(dbaPercentage)
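
As a possible alternative to the groupby calls above, pandas also offers value_counts, which can return counts or proportions directly. The following is only a minimal sketch: the small DataFrame is a made-up stand-in for data.csv, and the missing value is added purely to show how dropna=False treats it.

```python
import pandas

# Hypothetical stand-in for one column of data.csv; the values are made up.
data = pandas.DataFrame({"YISM2534": [2.6, 2.6, 2.9, 3.3, 3.3, None]})

# Count each distinct value and order the result by value.
# dropna=False keeps missing values as their own row instead of dropping them.
frequency = data["YISM2534"].value_counts(dropna=False).sort_index()

# normalize=True returns proportions; multiplying by 100 gives percentages.
percentage = data["YISM2534"].value_counts(normalize=True, dropna=False).sort_index() * 100

print(frequency)
print(percentage)
```

Because the missing value is counted as its own row here, the percentages add up to 100. With groupby, rows with a missing key are left out by default, so dividing by len(data) can give percentages that sum to less than 100.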

Results

runfile('C:/Bella/Data Analysis/Spyder Working Files/MyFirst.py', wdir='C:/Bella/Data Analysis/Spyder Working Files')

Number of rows in the data file

212

Number of variables

11

All data is from the year 2008

Frequency of countries in terms of the average number of years of school including primary, secondary and tertiary education attended by all men between 25 and 34 years.

YISM2534

2.6     2

2.9     1

3.3     2

3.6     1

3.9     1

4.1     1

4.2     1

4.3     1

4.4     1

4.5     2

4.8     1

5.1     1

5.5     1

5.7     3

5.8     1

5.9     1

6.0     2

6.1     2

6.2     1

6.3     2

6.4     2

6.5     2

6.6     1

6.7     3

6.8     2

6.9     1

7.0     1

7.1     1

7.2     3

7.3     3

..

10.7    4

10.9    2

11.0    2

11.1    1

11.2    5

11.3    4

11.4    1

11.5    3

11.7    2

11.9    3

12.0    5

12.1    3

12.2    3

12.3    2

12.4    5

12.5    3

12.6    3

12.8    1

12.9    3

13.0    3

13.1    2

13.2    1

13.3    1

13.4    3

13.5    1

13.6    2

13.7    1

14.3    1

14.7    1

14.8    1

dtype: int64

Percentage of countries in terms of the average number of years of school including primary, secondary and tertiary education attended by all men between 25 and 34 years.

YISM2534

2.6     0.943396

2.9     0.471698

3.3     0.943396

3.6     0.471698

3.9     0.471698

4.1     0.471698

4.2     0.471698

4.3     0.471698

4.4     0.471698

4.5     0.943396

4.8     0.471698

5.1     0.471698

5.5     0.471698

5.7     1.415094

5.8     0.471698

5.9     0.471698

6.0     0.943396

6.1     0.943396

6.2     0.471698

6.3     0.943396

6.4     0.943396

6.5     0.943396

6.6     0.471698

6.7     1.415094

6.8     0.943396

6.9     0.471698

7.0     0.471698

7.1     0.471698

7.2     1.415094

7.3     1.415094

..

10.7    1.886792

10.9    0.943396

11.0    0.943396

11.1    0.471698

11.2    2.358491

11.3    1.886792

11.4    0.471698

11.5    1.415094

11.7    0.943396

11.9    1.415094

12.0    2.358491

12.1    1.415094

12.2    1.415094

12.3    0.943396

12.4    2.358491

12.5    1.415094

12.6    1.415094

12.8    0.471698

12.9    1.415094

13.0    1.415094

13.1    0.943396

13.2    0.471698

13.3    0.471698

13.4    1.415094

13.5    0.471698

13.6    0.943396

13.7    0.471698

14.3    0.471698

14.7    0.471698

14.8    0.471698

dtype: float64

Frequency of countries in terms of the average number of years of school including primary, secondary and tertiary education attended by all Women between 25 and 34 years.

YISW2534

0.6     1

1.2     1

1.3     2

1.4     1

1.6     2

1.7     1

2.0     2

2.1     1

2.3     1

2.5     1

2.7     1

2.9     2

3.1     2

3.2     1

3.4     1

3.5     1

3.8     2

3.9     1

4.1     1

4.2     2

4.4     2

4.6     2

4.7     1

5.0     1

5.2     1

5.3     2

5.4     2

5.5     2

5.6     2

5.7     1

..

11.3    1

11.4    2

11.5    3

11.6    3

11.7    1

11.8    3

11.9    1

12.0    2

12.1    4

12.2    3

12.3    3

12.4    4

12.5    3

12.6    3

12.7    1

12.8    2

12.9    2

13.0    3

13.1    3

13.2    2

13.3    3

13.4    1

13.5    2

13.6    1

13.7    3

13.8    3

13.9    1

14.5    1

14.8    1

15.0    1

dtype: int64

Percentage of countries in terms of the average number of years of school including primary, secondary and tertiary education attended by all Women between 25 and 34 years.

YISW2534

0.6     0.471698

1.2     0.471698

1.3     0.943396

1.4     0.471698

1.6     0.943396

1.7     0.471698

2.0     0.943396

2.1     0.471698

2.3     0.471698

2.5     0.471698

2.7     0.471698

2.9     0.943396

3.1     0.943396

3.2     0.471698

3.4     0.471698

3.5     0.471698

3.8     0.943396

3.9     0.471698

4.1     0.471698

4.2     0.943396

4.4     0.943396

4.6     0.943396

4.7     0.471698

5.0     0.471698

5.2     0.471698

5.3     0.943396

5.4     0.943396

5.5     0.943396

5.6     0.943396

5.7     0.471698

..

11.3    0.471698

11.4    0.943396

11.5    1.415094

11.6    1.415094

11.7    0.471698

11.8    1.415094

11.9    0.471698

12.0    0.943396

12.1    1.886792

12.2    1.415094

12.3    1.415094

12.4    1.886792

12.5    1.415094

12.6    1.415094

12.7    0.471698

12.8    0.943396

12.9    0.943396

13.0    1.415094

13.1    1.415094

13.2    0.943396

13.3    1.415094

13.4    0.471698

13.5    0.943396

13.6    0.471698

13.7    1.415094

13.8    1.415094

13.9    0.471698

14.5    0.471698

14.8    0.471698

15.0    0.471698

dtype: float64

Frequency of countries in terms of the average number of deaths in new born due to asphyxia.

DBA

0         1

1         4

2         2

3         1

4         5

6         1

7         2

8         3

9         1

11        1

12        2

13        1

14        2

15        1

16        2

17        1

20        1

21        2

22        1

23        2

24        1

26        3

28        1

29        1

30        1

34        2

35        3

36        1

38        1

39        1

..

4908      1

4927      1

5440      1

5753      1

5845      1

6176      1

6308      1

6312      1

6382      1

6558      1

6598      1

6789      1

6803      1

6941      1

9358      1

9483      1

9528      1

12742     1

13109     1

14508     1

16084     1

16958     1

17428     1

32214     1

35865     1

36113     1

59591     1

67275     1

84990     1

189447    1

dtype: int64

Percentage of countries in terms of the average number of deaths in new born due to asphyxia.

DBA

0         0.471698

1         1.886792

2         0.943396

3         0.471698

4         2.358491

6         0.471698

7         0.943396

8         1.415094

9         0.471698

11        0.471698

12        0.943396

13        0.471698

14        0.943396

15        0.471698

16        0.943396

17        0.471698

20        0.471698

21        0.943396

22        0.471698

23        0.943396

24        0.471698

26        1.415094

28        0.471698

29        0.471698

30        0.471698

34        0.943396

35        1.415094

36        0.471698

38        0.471698

39        0.471698

..

4908      0.471698

4927      0.471698

5440      0.471698

5753      0.471698

5845      0.471698

6176      0.471698

6308      0.471698

6312      0.471698

6382      0.471698

6558      0.471698

6598      0.471698

6789      0.471698

6803      0.471698

6941      0.471698

9358      0.471698

9483      0.471698

9528      0.471698

12742     0.471698

13109     0.471698

14508     0.471698

16084     0.471698

16958     0.471698

17428     0.471698

32214     0.471698

35865     0.471698

36113     0.471698

59591     0.471698

67275     0.471698

84990     0.471698

189447    0.471698

dtype: float64

In the above program, I have analyzed the frequency distribution of three variables:

YISM2534:

The average number of years of school attended by men between 25 and 34 years of age, including primary, secondary and tertiary education.

Since the variable is continuous and takes many distinct values, each value occurs in only a few countries and the frequency table is too long to summarize. Taking only the first three lines of the results (frequency and percentage) as a sample, they can be interpreted as follows:

YISM2534 (frequency)

2.6     2 (In two of the countries, the average years of education for men in the 25 to 34 age group is 2.6 years)

2.9     1 (In one country, the average years of education for men in the 25 to 34 age group is 2.9 years)

3.3     2 (In two of the countries, the average years of education for men in the 25 to 34 age group is 3.3 years)

YISM2534 (Percentage)

2.6     0.943396 (In 0.94% of the countries, the average years of education for men is 2.6 years)

2.9     0.471698 (In 0.47% of the countries, the average years of education for men is 2.9 years)

3.3     0.943396 (In 0.94% of the countries, the average years of education for men is 3.3 years)
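
Each percentage is the count multiplied by 100 and divided by the 212 rows in the data file. A quick check of that arithmetic, using only the row count and the first three frequencies from the results:

```python
# Row count reported in the Results section.
total_rows = 212

# Frequencies from the first three rows of the YISM2534 output.
frequencies = {2.6: 2, 2.9: 1, 3.3: 2}

# Percentage of countries for each value: count * 100 / number of rows.
percentages = {value: count * 100 / total_rows for value, count in frequencies.items()}

for value, pct in percentages.items():
    print(f"{value} years: {pct:.6f}% of the countries")
```

A count of 1 therefore corresponds to about 0.47% of the countries and a count of 2 to about 0.94%.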

YISW2534:

The average number of years of school attended by women between 25 and 34 years of age, including primary, secondary and tertiary education.

Since the variable is continuous and takes many distinct values, each value occurs in only a few countries and the frequency table is too long to summarize. Taking only the first three lines of the results (frequency and percentage) as a sample, they can be interpreted as follows:

YISW2534 (frequency)

0.6     1 (In one country, the average years of education for women in the 25 to 34 age group is 0.6 years)

1.2     1 (In one country, the average years of education for women in the 25 to 34 age group is 1.2 years)

1.3     2 (In two of the countries, the average years of education for women in the 25 to 34 age group is 1.3 years)

YISW2534 (Percentage)

0.6     0.471698 (In 0.47% of the countries, the average years of education for women is 0.6 years)

1.2     0.471698 (In 0.47% of the countries, the average years of education for women is 1.2 years)

1.3     0.943396 (In 0.94% of the countries, the average years of education for women is 1.3 years)
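
For continuous variables like these two, summary statistics may be a more compact description than a frequency table. A minimal sketch using the describe method of pandas follows. The sample values are made up for illustration and are not the real contents of data.csv.

```python
import pandas

# Made-up sample values for illustration; not the real data.csv contents.
data = pandas.DataFrame({
    "YISM2534": [2.6, 5.7, 7.2, 11.2, 12.4, 14.8],
    "YISW2534": [0.6, 4.2, 5.5, 11.5, 12.1, 15.0],
})

# describe() reports count, mean, standard deviation, minimum,
# quartiles and maximum for each numeric column.
summary = data[["YISM2534", "YISW2534"]].describe()
print(summary)
```

This puts the men's and women's columns side by side, which makes the two groups easier to compare than two long frequency tables.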

DBA:

Number of newborns that died specifically of asphyxia.

Since this variable is a count that ranges from 0 to 189447, it also takes many distinct values and the frequency table is too long to summarize. Taking only the first three lines of the results (frequency and percentage) as a sample, they can be interpreted as follows:

DBA (frequency)

0     1 (In one country, the number of deaths by asphyxia is 0)

1     4 (In four of the countries, the number of deaths by asphyxia is 1)

2     2 (In two of the countries, the number of deaths by asphyxia is 2)

DBA (Percentage)

0         0.471698 (in 0.47% of the countries, there were 0 deaths by asphyxia)

1         1.886792 (in 1.89% of the countries, there was 1 death by asphyxia)

2         0.943396 (in 0.94% of the countries, there were 2 deaths by asphyxia)
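
Because the DBA counts are spread very unevenly between 0 and 189447, one option might be to group the countries by quantiles before counting. The sketch below uses pandas.qcut for this. The values are made up (chosen only to span a similar range) and the group labels are hypothetical.

```python
import pandas

# Made-up counts spanning a range similar to the DBA output; not the real data.
dba = pandas.Series(
    [0, 1, 4, 8, 26, 35, 4908, 6941, 16084, 36113, 84990, 189447],
    name="DBA",
)

# Split the countries into four equally sized groups (quartiles).
# The labels are hypothetical names chosen for this illustration.
groups = pandas.qcut(dba, q=4, labels=["lowest", "low", "high", "highest"])

# Frequency of countries in each group.
frequency = groups.value_counts(sort=False)
print(frequency)
```

Quantile groups always hold roughly the same number of countries, so they are more useful for comparing groups on other variables than for describing the shape of the distribution itself.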

As already mentioned, since none of the variables were categorical in nature, it was not possible to draw meaningful information from the data using only the frequency analysis.
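
One way to get a more readable frequency distribution from a continuous variable could be to group its values into a few categories first. The following is a minimal sketch using pandas.cut. The sample values, the bin edges and the labels are all assumptions made for illustration and are not taken from data.csv.

```python
import pandas

# Made-up sample of average years of schooling; not the real data.csv values.
years = pandas.Series(
    [2.6, 2.9, 5.7, 6.7, 7.2, 10.7, 11.2, 12.0, 12.4, 14.8],
    name="YISM2534",
)

# Hypothetical bin edges and labels, chosen only for illustration.
# By default each bin includes its right edge, so 12.0 falls in "9 to 12".
bins = [0, 6, 9, 12, 16]
labels = ["0 to 6", "6 to 9", "9 to 12", "12 to 16"]
category = pandas.cut(years, bins=bins, labels=labels)

# Frequency and percentage of countries per category.
frequency = category.value_counts(sort=False)
percentage = frequency * 100 / len(years)
print(frequency)
print(percentage)
```

With only four categories, both the frequency and the percentage tables fit in a few lines and can be read at a glance.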
