# | Name | Description | URL | Preview | Citation |
---|---|---|---|---|---|
1 | Original 10-K reports | Raw, complete 10-K reports that have not yet been processed. Contains 1996.full.tgz to 2013.full.tgz, one archive per year, in which each report is named in the format key-date.full, where key is the company's CUSIP code (10GB). | Link | View | TMIS 2016 |
2 | MD&A sections | Raw MD&A sections from the original 10-K reports. Contains 1996.mda.tgz to 2013.mda.tgz, one archive per year, in which each report is named in the format key-date.mda (760MB). | Link | View | See above |
3 | Tokenized MD&A sections | Tokenized MD&A sections from the original 10-K reports. Contains 1996.mda.tgz to 2013.mda.tgz, one archive per year, in which each report is named in the format key-date.mda (533MB). | Link | View | See above |
4 | Logarithm post-event return volatility | Contains 1996.logfama.txt to 2013.logfama.txt, which map each key (CUSIP code) to its post-event return volatility (Fama-French 3-factor model) in the following year. Logarithms are calculated using base \(e\). | Link | View | See above |
5 | Logarithm volatility | Contains 1996.logvol.[+-]12.txt to 2013.logvol.[+-]12.txt, which map each key to its stock price volatility (standard deviation) in the following (+12) and preceding (-12) year. Logarithms are calculated using base \(e\). | Link | View | See above |
6 | Abnormal trading volume | Contains 1996.abnormal.txt to 2013.abnormal.txt, which map each key to its abnormal trading volume in that year (see [Loughran and McDonald 2011] for the detailed definition). | Link | View | See above |
7 | Excess return | Contains 1996.excess.txt to 2013.excess.txt, which map each key to its excess return in that year (see [Loughran and McDonald 2011] for the detailed definition). | Link | View | See above |
8 | Meta | Information about a report, including the issue date, URL, SEC ID, and the company name. | Link | View | See above |
9 | README | Brief guide to the above resources. | Link | View | See above |
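
The key-date.suffix naming scheme described in rows 1 to 3 can be handled with a few lines of Python. The following is a minimal sketch, not part of the corpus tooling: the names `ReportName`, `parse_report_name`, and `iter_reports` are hypothetical, the date part is kept as an opaque string because its exact format is not stated here, and the archive is assumed to be a standard gzip-compressed tar file.

```python
import tarfile
from typing import Iterator, NamedTuple


class ReportName(NamedTuple):
    key: str   # company identifier (CUSIP code, per the table above)
    date: str  # date part, kept as an opaque string (format is an assumption)
    kind: str  # file suffix, e.g. "full" or "mda"


def parse_report_name(filename: str) -> ReportName:
    """Split a name of the form key-date.suffix into its three parts."""
    base = filename.rsplit("/", 1)[-1]       # drop any directory prefix
    stem, dot, kind = base.rpartition(".")   # suffix follows the last dot
    key, dash, date = stem.partition("-")    # key precedes the first hyphen
    if not (dot and dash and key and date and kind):
        raise ValueError(f"not in key-date.suffix form: {filename!r}")
    return ReportName(key, date, kind)


def iter_reports(archive_path: str) -> Iterator[ReportName]:
    """Yield the parsed name of every regular file in a yearly .tgz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if member.isfile():
                yield parse_report_name(member.name)
```

For example, `parse_report_name("123456789-19960301.mda")` (a made-up file name) returns the key `"123456789"`, the date `"19960301"`, and the suffix `"mda"`. The key can then be looked up in the per-year mapping files listed in rows 4 to 7.
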
# | Name | Description | URL | Preview | Citation |
---|---|---|---|---|---|
1 | Without POS tag | Pre-trained vectors via word2vec (with the CBOW model) trained on the above 10-K Corpus (40,708 reports from 18 years). Embedding dimension is 200. | Vector Binary | View | TMIS 2016 |
2 | With POS tag | Pre-trained vectors via word2vec (with the CBOW model) trained on the above 10-K Corpus (40,708 reports from 18 years). Embedding dimension is 200. | Vector Binary | View | See above |
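
As an illustration of how such vectors might be consumed, the sketch below reads the plain-text layout that the word2vec tool conventionally writes (a header line with the vocabulary size and dimension, then one word and its values per line) and compares two words by cosine similarity. It assumes the "Vector" download follows that layout, which is not confirmed here; `load_text_vectors` and `cosine` are hypothetical helper names, and the three toy vectors are invented for the demonstration.

```python
import io
import math
from typing import Dict, List, TextIO


def load_text_vectors(handle: TextIO) -> Dict[str, List[float]]:
    """Read vectors from the word2vec text layout: 'count dim', then one word per line."""
    _count, dim = (int(x) for x in handle.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values for {parts[0]!r}")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy 2-dimensional data standing in for the real 200-dimensional file.
toy = "3 2\nrisk 1.0 0.0\nloss 1.0 0.0\ngain 0.0 1.0\n"
vectors = load_text_vectors(io.StringIO(toy))
print(cosine(vectors["risk"], vectors["loss"]))
```

For a real download, the `io.StringIO` object would be replaced by a file opened in text mode. Loading everything into Python lists keeps the sketch dependency-free, although an array library would be more memory-efficient for a full vocabulary.
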
# | Name | Description | URL | Preview | Citation |
---|---|---|---|---|---|
1 | Label platform | An online platform where users can label MWEs in a sentence. | Link | See link | ICASSP 2019 |
2 | MWE attributes | Marks the category of the Loughran and McDonald Sentiment Word Lists to which each MWE belongs, indicated only by the first letter of the category (4,722 MWEs in total). | Link | See link | See above |
3 | Label source | Original sentences in which MWEs are labeled. | Link | See link | See above |
4 | MWE and POS labels | The first and second fields are the numbering and the raw sentence, respectively. The last field is a JSON object that stores the positions of strong and weak MWEs and the POS (part-of-speech) tag of each word. | Link | See link | See above |
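
The three-field layout described in row 4 could be read along the following lines. This is only a sketch under stated assumptions: the field separator is assumed to be a tab, and the JSON keys in the sample (`strong_mwe`, `weak_mwe`, `pos`) are invented for illustration, since the actual separator and key names are not given here. `parse_label_line` is a hypothetical helper.

```python
import json
from typing import Any, Tuple


def parse_label_line(line: str, sep: str = "\t") -> Tuple[str, str, Any]:
    """Split one record into (numbering, raw sentence, decoded JSON labels)."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) < 3:
        raise ValueError("expected numbering, sentence, and JSON fields")
    number = fields[0]
    labels = json.loads(fields[-1])     # last field holds the JSON labels
    sentence = sep.join(fields[1:-1])   # rejoin in case the sentence contains sep
    return number, sentence, labels


# Invented sample record; the key names and values are assumptions, not corpus data.
sample = (
    "1\tResults may differ materially.\t"
    '{"strong_mwe": [[2, 3]], "weak_mwe": [], "pos": ["NNS", "MD", "VB", "RB"]}'
)
number, sentence, labels = parse_label_line(sample)
print(number, sentence, sorted(labels))
```

Taking the first and last fields and rejoining the middle keeps the sentence intact even if it happens to contain the separator character.
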
# | Name | Description | URL | Preview | Citation |
---|---|---|---|---|---|
1 | Binary label result | Contains sentences extracted from the 10-K Corpus with binary risk labels (2,432 sentences in total). | Link | See link | ICASSP 2020 |