Analyzing Average Kanji Complexity w/ Python

Video: https://youtu.be/wuXPPO2vNgU

GitHub: https://github.com/LexingtonWhalen/KanjiStrokesAnalysis

(I think you guys should be able to view the Jupyter notebook file. Let me know!)

Purpose:

I wanted to analyze kanji stroke counts (a measure of kanji complexity) against the frequency of the words that use each kanji. I couldn't find any existing data on this, so I did it myself! I thought it might shed some light on how languages balance expressive power against the memory limits of human beings (i.e., 8 strokes can form far more distinct characters than 1 stroke can, but they are harder to remember).

The GitHub repo has PNGs of the graphs shown in the video. They are pretty neat!
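Here's a rough sketch of the core idea (not the exact notebook code; the word-frequency data and stroke-count table below are made-up stand-ins for the real data files):

```python
from collections import Counter

# Stand-in data: word -> corpus frequency, and kanji -> stroke count.
# The real notebook builds these from frequency-list and kanji data files.
word_freqs = {"日本": 1200, "水": 950, "学校": 800}
stroke_counts = {"日": 4, "本": 5, "水": 4, "学": 8, "校": 10}

# Tally each kanji's stroke count, weighted by how often the words
# containing it appear in natural language.
weighted = Counter()
for word, freq in word_freqs.items():
    for char in word:
        if char in stroke_counts:  # skip kana, punctuation, etc.
            weighted[stroke_counts[char]] += freq

print(weighted.most_common())  # most common stroke counts in everyday text
```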

Features:

* Analysis of kanji stroke patterns by overall word frequency!

* Weighs each kanji by its average use in natural language, then finds the most common stroke counts in everyday language (see the sketch after this list)!

* Graphs! (Bar and pie right now.)

* Finds the mean and standard deviation of the most common stroke counts!
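The stats and graphs could look something like this (again a hedged sketch, not the notebook code; `weighted` stands in for the Counter built in the snippet above):

```python
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt

# Stand-in: stroke count -> weighted frequency (as built in the sketch above).
weighted = Counter({4: 2150, 5: 1200, 8: 800, 10: 800})

strokes = np.array(list(weighted.keys()))
weights = np.array(list(weighted.values()), dtype=float)

# Frequency-weighted mean and standard deviation of stroke counts.
mean = np.average(strokes, weights=weights)
std = np.sqrt(np.average((strokes - mean) ** 2, weights=weights))
print(f"mean strokes: {mean:.2f}, std: {std:.2f}")

# Bar chart of the weighted stroke-count distribution...
plt.bar(strokes, weights)
plt.xlabel("Stroke count")
plt.ylabel("Weighted frequency")
plt.show()

# ...and a pie chart of the most common stroke counts.
top = weighted.most_common(5)
plt.pie([w for _, w in top], labels=[str(s) for s, _ in top], autopct="%1.1f%%")
plt.show()
```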

Modules / Packages:

* numpy: https://numpy.org/devdocs/contents.html

* pandas: https://pandas.pydata.org/pandas-docs/stable/index.html

* re: https://docs.python.org/3/library/re.html

* matplotlib: https://matplotlib.org/

* scipy: https://www.scipy.org/

* math: https://docs.python.org/3/library/math.html

* random: https://docs.python.org/3/library/random.html

* collections: https://docs.python.org/3/library/collections.html

