Chinese Datasets

Text from Taiwanese newspapers
# Name Description URL Preview Citation
1 News corpus News collected from the finance sections of Liberty Times (自由時報) and Apple Daily (蘋果日報), and from Commercial Times (工商時報), between 2018/10/12 and 2019/2/09 (16,541 news articles, approximately 10M words). Files are named NAME-YYYYMMDD-ID.extension, where *.raw holds the raw news, *.txt holds the news with digits and punctuation removed, *.cut holds the tokenized news, and *.tag lists the companies mentioned in the news (empty if none) Link None
2 Market info Various financial measures for each firm listed on the TWSE over the period covered by the news corpus (see above) Link See link None
3 Translated sentiment words Traditional Chinese translation of each word in the Loughran and McDonald Sentiment Word Lists (2,041 words in total) Link See link
4 Expanded sentiment words Words similar to each Chinese sentiment word, found using word2vec (8,561 words in total) Link See link None
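
The file naming scheme of the news corpus (NAME-YYYYMMDD-ID.extension) can be split into its parts with a short helper. The following is only a sketch: the function name `parse_news_filename`, the `NewsFile` record, and the example file name are hypothetical, and it assumes that the ID contains no hyphen or dot and that the tag files use a `.tag` extension.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# Meaning of each extension, as described for the news corpus.
EXTENSIONS = {
    "raw": "raw news",
    "txt": "news with digits and punctuation removed",
    "cut": "tokenized news",
    "tag": "companies mentioned in the news (may be empty)",
}

# Assumption: ID has no '-' or '.'; NAME may contain hyphens.
_PATTERN = re.compile(
    r"^(?P<name>.+)-(?P<date>\d{8})-(?P<id>[^.-]+)\.(?P<ext>raw|txt|cut|tag)$"
)


class NewsFile(NamedTuple):
    name: str
    file_date: date
    news_id: str
    extension: str


def parse_news_filename(filename: str) -> Optional[NewsFile]:
    """Return the parts of a corpus file name, or None if it does not fit."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    try:
        # Reject impossible dates such as month 13.
        file_date = datetime.strptime(match["date"], "%Y%m%d").date()
    except ValueError:
        return None
    return NewsFile(match["name"], file_date, match["id"], match["ext"])


if __name__ == "__main__":
    # Hypothetical file name, used only to illustrate the scheme.
    parsed = parse_news_filename("example-20181012-001.cut")
    if parsed is not None:
        print(parsed.file_date, EXTENSIONS[parsed.extension])
```

Grouping the parsed results by `name`, `file_date`, and `news_id` is one way to line up the four variants of the same article.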