Real data



Dataset statistics

Number of variables 15
Number of observations 32560
Missing cells 0
Missing cells (%) 0.0%
Duplicate rows 24
Duplicate rows (%) 0.1%
Total size in memory 3.7 MiB
Average record size in memory 120.0 B
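The size figures above agree with one another if each of the 15 variables is assumed to occupy 8 bytes per row. That storage width is an assumption; the report does not state it. A minimal sketch of the arithmetic, using only numbers from the table:

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 15
n_observations = 32560
n_duplicates = 24

# Assumption: every column stores one 8-byte value per row.
BYTES_PER_VALUE = 8

record_bytes = n_variables * BYTES_PER_VALUE            # bytes per row
total_mib = record_bytes * n_observations / 2**20        # whole dataset, in MiB
column_kib = BYTES_PER_VALUE * n_observations / 2**10    # one column, in KiB
duplicate_pct = 100 * n_duplicates / n_observations      # share of duplicate rows

print(f"record size : {record_bytes} B")
print(f"total size  : {total_mib:.1f} MiB")
print(f"per column  : {column_kib:.1f} KiB")
print(f"duplicates  : {duplicate_pct:.1f}%")
```

Under that assumption the sketch reproduces the reported 120 B record size, the 3.7 MiB total, and the 254.4 KiB memory size listed for each variable.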

Variable types

CAT 9
NUM 6

Warnings

Duplicates: Dataset has 24 (0.1%) duplicate rows
Zeros: 2174 has 29849 (91.7%) zeros
Zeros: 0 has 31041 (95.3%) zeros
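The variable names in this report (39, State-gov, 77516, 2174, 0) look like data values rather than field names, which would be consistent with a header-less file whose first record was consumed as the header. That is an inference, not something the report states. A minimal sketch of the effect, using Python's standard csv module and a made-up two-record sample; the field names in FIELDS are hypothetical:

```python
import csv
import io

# Made-up sample shaped like a header-less file; not taken from the report.
SAMPLE = "39, State-gov, 77516\n40, Example, 12345\n"

# Hypothetical field names, supplied by the caller because the file has none.
FIELDS = ["age", "workclass", "fnlwgt"]

def read_assuming_header(text):
    """DictReader without fieldnames treats the first record as the header."""
    return list(csv.DictReader(io.StringIO(text), skipinitialspace=True))

def read_headerless(text, fieldnames):
    """Passing fieldnames keeps every record as data."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=fieldnames,
                            skipinitialspace=True)
    return list(reader)

lost = read_assuming_header(SAMPLE)
kept = read_headerless(SAMPLE, FIELDS)

print(len(lost), list(lost[0].keys()))  # one record; keys are data values
print(len(kept), list(kept[0].keys()))  # two records; keys are FIELDS
```

If the same thing happened here, the source file would hold one more record than the 32560 observations reported.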

Reproduction

Analysis started 2021-01-22 10:23:48.908520
Analysis finished 2021-01-22 10:23:57.692360
Duration 8.78 seconds
Software version pandas-profiling v2.10.0
Download configuration config.yaml

Variables

39
Real number (ℝ≥0)

Distinct 73
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 38.58163391
Minimum 17
Maximum 90
Zeros 0
Zeros (%) 0.0%
Memory size 254.4 KiB

Quantile statistics

Minimum 17
5-th percentile 19
Q1 28
median 37
Q3 48
95-th percentile 63
Maximum 90
Range 73
Interquartile range (IQR) 20

Descriptive statistics

Standard deviation 13.64064183
Coefficient of variation (CV) 0.3535527256
Kurtosis -0.1662122267
Mean 38.58163391
Median Absolute Deviation (MAD) 10
Skewness 0.5587376395
Sum 1256218
Variance 186.0671095
Monotonicity Not monotonic
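Several of the figures above are derived from one another, so they can be cross-checked. A minimal sketch using only the reported values: the observation count 32560 comes from the dataset overview, and the formulas (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1) are the standard definitions, assumed here to be the ones the report uses.

```python
import math

# Values copied from the statistics reported for variable "39".
n = 32560
mean = 38.58163391
std = 13.64064183
q1, q3 = 28, 48
minimum, maximum = 17, 90

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range
total = mean * n                  # sum of all observations

print(f"CV       {cv:.6f}")
print(f"variance {variance:.4f}")
print(f"IQR      {iqr}")
print(f"range    {value_range}")
print(f"sum      {total:.0f}")

# Compare against the reported figures within rounding error.
assert math.isclose(cv, 0.3535527256, rel_tol=1e-6)
assert math.isclose(variance, 186.0671095, rel_tol=1e-6)
```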
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
36 898 2.8%
31 888 2.7%
34 886 2.7%
23 877 2.7%
35 876 2.7%
33 875 2.7%
28 867 2.7%
30 861 2.6%
37 858 2.6%
25 841 2.6%
Other values (63) 23833 73.2%
Value Count Frequency (%)
17 395 1.2%
18 550 1.7%
19 712 2.2%
20 753 2.3%
21 720 2.2%
Value Count Frequency (%)
90 43 0.1%
88 3 < 0.1%
87 1 < 0.1%
86 1 < 0.1%
85 3 < 0.1%

State-gov
Categorical

Distinct 9
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 254.4 KiB

Length

Max length 17
Median length 8
Mean length 8.864373464
Min length 2

Characters and Unicode

Total characters 288624
Distinct characters 29
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
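As an illustration of the character properties mentioned above, Python's standard unicodedata module exposes the general category of each code point. The sketch below counts categories for a single sample string. The leading space in the sample is an assumption, suggested by the Space Separator counts in this report (32560) matching the number of observations. The mapping from two-letter category codes to the long names used in this report is written out by hand.

```python
import unicodedata
from collections import Counter

# Long names as they appear in this report, keyed by Unicode general category.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
}

def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    codes = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}

# Sample value; the leading space is assumed, not confirmed by the report.
counts = category_counts(" State-gov")
print(counts)
```

Summing such per-value counts over a whole column would give totals of the kind listed under "Most occurring categories".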

Unique

Unique 0
Unique (%) 0.0%
Value Count Frequency (%)
22696 69.7%
2541 7.8%
2093 6.4%
1836 5.6%
1297 4.0%
1116 3.4%
960 2.9%
14 < 0.1%
7 < 0.1%
Histogram of lengths of the category

Most occurring characters

Value Count Frequency (%)
33248 11.5%
32560 11.3%
27859 9.7%
27060 9.4%
27053 9.4%
26367 9.1%
23670 8.2%
22696 7.9%
14226 4.9%
9005 3.1%
Other values (19) 44880 15.5%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 209278 72.5%
Space Separator 32560 11.3%
Uppercase Letter 30724 10.6%
Dash Punctuation 14226 4.9%
Other Punctuation 1836 0.6%

Most frequent character per category

Value Count Frequency (%)
33248 15.9%
27859 13.3%
27060 12.9%
27053 12.9%
26367 12.6%
23670 11.3%
9005 4.3%
6710 3.2%
6198 3.0%
5750 2.7%
Other values (10) 16358 7.8%
Value Count Frequency (%)
22696 73.9%
4954 16.1%
2093 6.8%
960 3.1%
14 < 0.1%
7 < 0.1%
Value Count Frequency (%)
32560 100.0%
Value Count Frequency (%)
14226 100.0%
Value Count Frequency (%)
1836 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 240002 83.2%
Common 48622 16.8%

Most frequent character per script

Value Count Frequency (%)
33248 13.9%
27859 11.6%
27060 11.3%
27053 11.3%
26367 11.0%
23670 9.9%
22696 9.5%
9005 3.8%
6710 2.8%
6198 2.6%
Other values (16) 30136 12.6%
Value Count Frequency (%)
32560 67.0%
14226 29.3%
1836 3.8%

Most occurring blocks

Value Count Frequency (%)
ASCII 288624 100.0%

Most frequent character per block

Value Count Frequency (%)
33248 11.5%
32560 11.3%
27859 9.7%
27060 9.4%
27053 9.4%
26367 9.1%
23670 8.2%
22696 7.9%
14226 4.9%
9005 3.1%
Other values (19) 44880 15.5%

77516
Real number (ℝ≥0)

Distinct 21647
Distinct (%) 66.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 189781.8144
Minimum 12285
Maximum 1484705
Zeros 0
Zeros (%) 0.0%
Memory size 254.4 KiB

Quantile statistics

Minimum 12285
5-th percentile 39458.6
Q1 117831.5
median 178363
Q3 237054.5
95-th percentile 379686.3
Maximum 1484705
Range 1472420
Interquartile range (IQR) 119223

Descriptive statistics

Standard deviation 105549.7649
Coefficient of variation (CV) 0.5561637466
Kurtosis 6.218940105
Mean 189781.8144
Median Absolute Deviation (MAD) 59891.5
Skewness 1.446972243
Sum 6179295876
Variance 1.114075288 × 10^10
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
203488 13 < 0.1%
123011 13 < 0.1%
164190 13 < 0.1%
121124 12 < 0.1%
148995 12 < 0.1%
126675 12 < 0.1%
113364 12 < 0.1%
102308 11 < 0.1%
120277 11 < 0.1%
123983 11 < 0.1%
Other values (21637) 32440 99.6%
Value Count Frequency (%)
12285 1 < 0.1%
13769 1 < 0.1%
14878 1 < 0.1%
18827 1 < 0.1%
19214 1 < 0.1%
Value Count Frequency (%)
1484705 1 < 0.1%
1455435 1 < 0.1%
1366120 1 < 0.1%
1268339 1 < 0.1%
1226583 1 < 0.1%

Bachelors
Categorical

Distinct 16
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 254.4 KiB

Length

Max length 13
Median length 8
Mean length 9.433691646
Min length 4

Characters and Unicode

Total characters 307161
Distinct characters 32
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%
Value Count Frequency (%)
10501 32.3%
7291 22.4%
5354 16.4%
1723 5.3%
1382 4.2%
1175 3.6%
1067 3.3%
933 2.9%
646 2.0%
576 1.8%
Other values (6) 1912 5.9%
Histogram of lengths of the category

Most occurring characters

Value Count Frequency (%)
32560 10.6%
29414 9.6%
26423 8.6%
21964 7.2%
20563 6.7%
19058 6.2%
18618 6.1%
18583 6.0%
17792 5.8%
17792 5.8%
Other values (22) 84394 27.5%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 205888 67.0%
Uppercase Letter 38859 12.7%
Space Separator 32560 10.6%
Dash Punctuation 21964 7.2%
Decimal Number 7890 2.6%

Most frequent character per category

Value Count Frequency (%)
29414 14.3%
26423 12.8%
20563 10.0%
19058 9.3%
18618 9.0%
18583 9.0%
17792 8.6%
14493 7.0%
11568 5.6%
11162 5.4%
Other values (4) 18214 8.8%
Value Count Frequency (%)
3884 49.2%
933 11.8%
646 8.2%
646 8.2%
514 6.5%
433 5.5%
333 4.2%
333 4.2%
168 2.1%
Value Count Frequency (%)
17792 45.8%
10501 27.0%
5354 13.8%
2449 6.3%
1723 4.4%
627 1.6%
413 1.1%
Value Count Frequency (%)
32560 100.0%
Value Count Frequency (%)
21964 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 244747 79.7%
Common 62414 20.3%

Most frequent character per script

Value Count Frequency (%)
29414 12.0%
26423 10.8%
20563 8.4%
19058 7.8%
18618 7.6%
18583 7.6%
17792 7.3%
17792 7.3%
14493 5.9%
11568 4.7%
Other values (11) 50443 20.6%
Value Count Frequency (%)
32560 52.2%
21964 35.2%
3884 6.2%
933 1.5%
646 1.0%
646 1.0%
514 0.8%
433 0.7%
333 0.5%
333 0.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 307161 100.0%

Most frequent character per block

Value Count Frequency (%)
32560 10.6%
29414 9.6%
26423 8.6%
21964 7.2%
20563 6.7%
19058 6.2%
18618 6.1%
18583 6.0%
17792 5.8%
17792 5.8%
Other values (22) 84394 27.5%

13
Real number (ℝ≥0)

Distinct 16
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 10.08058968
Minimum 1
Maximum 16
Zeros 0
Zeros (%) 0.0%
Memory size 254.4 KiB

Quantile statistics

Minimum 1
5-th percentile 5
Q1 9
median 10
Q3 12
95-th percentile 14
Maximum 16
Range 15
Interquartile range (IQR) 3

Descriptive statistics

Standard deviation 2.572708968
Coefficient of variation (CV) 0.2552141343
Kurtosis 0.6235250276
Mean 10.08058968
Median Absolute Deviation (MAD) 1
Skewness -0.3116298916
Sum 328224
Variance 6.618831435
Monotonicity Not monotonic
Histogram with fixed size bins (bins=16)
Value Count Frequency (%)
9 10501 32.3%
10 7291 22.4%
13 5354 16.4%
14 1723 5.3%
11 1382 4.2%
7 1175 3.6%
12 1067 3.3%
6 933 2.9%
4 646 2.0%
15 576 1.8%
Other values (6) 1912 5.9%
Value Count Frequency (%)
1 51 0.2%
2 168 0.5%
3 333 1.0%
4 646 2.0%
5 514 1.6%
Value Count Frequency (%)
16 413 1.3%
15 576 1.8%
14 1723 5.3%
13 5354 16.4%
12 1067 3.3%

Never-married
Categorical

Distinct 7
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 254.4 KiB

Length

Max length 22
Median length 14
Mean length 15.41409705
Min length 8

Characters and Unicode

Total characters 501883
Distinct characters 25
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%
Value Count Frequency (%)
14976 46.0%
10682 32.8%
4443 13.6%
1025 3.1%
993 3.0%
418 1.3%
23 0.1%
Histogram of lengths of the category

Most occurring characters

Value Count Frequency (%)
70784 14.1%
68348 13.6%
46511 9.3%
41516 8.3%
33553 6.7%
32560 6.5%
31252 6.2%
30101 6.0%
28567 5.7%
20853 4.2%
Other values (15) 97838 19.5%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 395201 78.7%
Dash Punctuation 41516 8.3%
Uppercase Letter 32606 6.5%
Space Separator 32560 6.5%

Most frequent character per category

Value Count Frequency (%)
70784 17.9%
68348 17.3%
46511 11.8%
33553 8.5%
31252 7.9%
30101 7.6%
28567 7.2%
20853 5.3%
19419 4.9%
16442 4.2%
Other values (6) 29371 7.4%
Value Count Frequency (%)
15417 47.3%
10682 32.8%
4443 13.6%
1025 3.1%
993 3.0%
23 0.1%
23 0.1%
Value Count Frequency (%)
32560 100.0%
Value Count Frequency (%)
41516 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 427807 85.2%
Common 74076 14.8%

Most frequent character per script

Value Count Frequency (%)
70784 16.5%
68348 16.0%
46511 10.9%
33553 7.8%
31252 7.3%
30101 7.0%
28567 6.7%
20853 4.9%
19419 4.5%
16442 3.8%
Other values (13) 61977 14.5%
Value Count Frequency (%)
41516 56.0%
32560 44.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 501883 100.0%

Most frequent character per block

Value Count Frequency (%)
70784 14.1%
68348 13.6%
46511 9.3%
41516 8.3%
33553 6.7%
32560 6.5%
31252 6.2%
30101 6.0%
28567 5.7%
20853 4.2%
Other values (15) 97838 19.5%

Adm-clerical
Categorical

Distinct 15
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 254.4 KiB

Length

Max length 18
Median length 14
Mean length 13.20190418
Min length 2

Characters and Unicode

Total characters 429854
Distinct characters 33
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%
Value Count Frequency (%)
4140 12.7%
4099 12.6%
4066 12.5%
3769 11.6%
3650 11.2%
3295 10.1%
2002 6.1%
1843 5.7%
1597 4.9%
1370 4.2%
Other values (5) 2729 8.4%
Histogram of lengths of the category

Most occurring characters

Value Count Frequency (%)
42978 10.0%
40332 9.4%
39288 9.1%
32560 7.6%
29218 6.8%
28750 6.7%
25999 6.0%
22134 5.1%
20302 4.7%
17359 4.0%
Other values (23) 130934 30.5%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 335507 78.1%
Space Separator 32560 7.6%
Uppercase Letter 30726 7.1%
Dash Punctuation 29218 6.8%
Other Punctuation 1843 0.4%

Most frequent character per category

Value Count Frequency (%)
42978 12.8%
40332 12.0%
39288 11.7%
28750 8.6%
25999 7.7%
22134 6.6%
20302 6.1%
17359 5.2%
15992 4.8%
15696 4.7%
Other values (10) 66677 19.9%
Value Count Frequency (%)
4938 16.1%
4099 13.3%
4066 13.2%
3778 12.3%
3650 11.9%
3295 10.7%
2525 8.2%
2002 6.5%
1370 4.5%
1003 3.3%
Value Count Frequency (%)
32560 100.0%
Value Count Frequency (%)
29218 100.0%
Value Count Frequency (%)
1843 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 366233 85.2%
Common 63621 14.8%

Most frequent character per script

Value Count Frequency (%)
42978 11.7%
40332 11.0%
39288 10.7%
28750 7.9%
25999 7.1%
22134 6.0%
20302 5.5%
17359 4.7%
15992 4.4%
15696 4.3%
Other values (20) 97403 26.6%
Value Count Frequency (%)
32560 51.2%
29218 45.9%
1843 2.9%

Most occurring blocks

Value Count Frequency (%)
ASCII 429854 100.0%

Most frequent character per block

Value Count Frequency (%)
42978 10.0%
40332 9.4%
39288 9.1%
32560 7.6%
29218 6.8%
28750 6.7%
25999 6.0%
22134 5.1%
20302 4.7%
17359 4.0%
Other values (23) 130934 30.5%

Not-in-family
Categorical

Distinct 6
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 254.4 KiB

Length

Max length 15
Median length 10
Mean length 10.11962531
Min length 5

Characters and Unicode

Total characters 329495
Distinct characters 26
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%
Value Count Frequency (%)
13193 40.5%
8304 25.5%
5068 15.6%
3446 10.6%
1568 4.8%
981 3.0%
Histogram of lengths of the category

Most occurring characters

Value Count Frequency (%)
32560 9.9%
30011 9.1%
27671 8.4%
25924 7.9%
22657 6.9%
21707 6.6%
14353 4.4%
13193 4.0%
13193 4.0%
13193 4.0%
Other values (16) 115033 34.9%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 241718 73.4%
Space Separator 32560 9.9%
Uppercase Letter 32560 9.9%
Dash Punctuation 22657 6.9%

Most frequent character per category

Value Count Frequency (%)
30011 12.4%
27671 11.4%
25924 10.7%
21707 9.0%
14353 5.9%
13193 5.5%
13193 5.5%
13193 5.5%
11750 4.9%
10266 4.2%
Other values (9) 60457 25.0%
Value Count Frequency (%)
13193 40.5%
8304 25.5%
6049 18.6%
3446 10.6%
1568 4.8%
Value Count Frequency (%)
32560 100.0%
Value Count Frequency (%)
22657 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 274278 83.2%
Common 55217 16.8%

Most frequent character per script

Value Count Frequency (%)
30011 10.9%
27671 10.1%
25924 9.5%
21707 7.9%
14353 5.2%
13193 4.8%
13193 4.8%