Instructions: News Analytics
News analytics typically involves several steps. First, you need to build a database of press releases or other news items. Second, you typically want to place your dataset in time by identifying the announcement date of each news item. Third, you characterize your news items along the dimensions you are interested in, e.g., through text analysis. Finally, you may quantify the capital market responses to the captured news. This website's news analytics research apps and services support you in all of these steps.
(1) Services for Database Building and Research App Use:
Larger news analytics research projects may require you to build large databases. If you then want to use our news analytics research apps, you need to feed your news items into them. We can assist you in developing the CSV files required by our EDI, CATA, and CATA_Filter apps. Please reach out to us if you need such support.
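As a rough illustration of preparing such a file yourself, the Python sketch below writes a Text CSV with the four data items listed for the input Text CSV in Table 1 (Text ID; Firm; Text; Cut-off value). The semicolon delimiter, the optional header row, and the sample records are assumptions for illustration only; please compare the result with the example files before uploading.

```python
import csv
import io

# Hypothetical sample records; column order follows the Text CSV row of Table 1.
records = [
    ("PR001", "Example Corp", "Example Corp announced a dividend increase.", 500),
    ("PR002", "Sample AG", "Sample AG reported higher net income.", 500),
]

def write_text_csv(rows, stream, include_header=True):
    """Write CATA-style text rows to a semicolon-separated CSV stream.

    Assumption: semicolon delimiter and a header row; neither is a confirmed spec.
    """
    writer = csv.writer(stream, delimiter=";")
    if include_header:
        writer.writerow(["Text ID", "Firm", "Text", "Cut-off value"])
    for text_id, firm, text, cutoff in rows:
        # Collapse line breaks and repeated whitespace so each release stays on one row
        writer.writerow([text_id, firm, " ".join(text.split()), cutoff])

buffer = io.StringIO()
write_text_csv(records, buffer)
print(buffer.getvalue())
```

To write to disk instead, open the target file with `open("texts.csv", "w", newline="", encoding="utf-8")` (file name hypothetical) and pass it as `stream`.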
(2) Event Date Identifier (EDI):
The EDI is a regular expression-based tool for identifying dates in texts. There is practically no limit to the number of text strings you can process, and the maximum length of a single text string is set high enough for most applications. To start the EDI, either enter your text into the dialogue box and press the 'continue' button, or upload a CSV file containing your texts. In the first case, the EDI directly displays the dates it has found; in the second case, it produces a CSV file that holds the text IDs, with the dates found in each text listed to the right of its ID.
> If you want to use the EDI in batch mode, please provide the system with a CSV file of the following structure: Text ID; Issuer; "Text-String" (review and use this CSV file if you need an example).
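To make the batch workflow concrete, here is a minimal Python sketch of regex-based date extraction over a CSV in the Text ID; Issuer; "Text-String" layout. The EDI's actual regular expressions are not described here, so the three patterns below (e.g., "March 5, 2021", "5 March 2021", "2021-03-05") are illustrative assumptions, as are the function names `find_dates` and `run_batch`. The output row layout (ID first, dates to its right) mirrors the description above.

```python
import csv
import re

# Illustrative patterns only; the EDI's real expressions are not published here.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_PATTERNS = [
    re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}}\b"),  # March 5, 2021
    re.compile(rf"\b\d{{1,2}}\s+(?:{MONTHS})\s+\d{{4}}\b"),   # 5 March 2021
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),                      # 2021-03-05
]

def find_dates(text):
    """Return all date strings matched by the illustrative patterns, in text order."""
    hits = []
    for pattern in DATE_PATTERNS:
        hits.extend((m.start(), m.group()) for m in pattern.finditer(text))
    return [date for _, date in sorted(hits)]

def run_batch(in_stream, out_stream):
    """Read 'Text ID; Issuer; "Text"' rows and write 'Text ID;date1;date2;...' rows."""
    # skipinitialspace lets the reader accept '; "quoted text"' with a space after ';'
    reader = csv.reader(in_stream, delimiter=";", skipinitialspace=True)
    writer = csv.writer(out_stream, delimiter=";")
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        text_id, _issuer, *rest = row
        text = ";".join(rest)  # re-join text that contained unquoted semicolons
        writer.writerow([text_id.strip()] + find_dates(text))
```

A real date identifier would also need to resolve relative expressions ("yesterday", "last Monday") and ambiguous numeric formats, which this sketch deliberately ignores.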
(3) Text Analyzer/Event Type Categorizer (CATA):
The text analyzer is a powerful and versatile content-analysis tool that can process large amounts of text. It allows you to apply a categorization scheme of your choice to a large corpus of individual texts (e.g., press releases) for the purposes of text scaling (e.g., Laver, Benoit et al. 2003) and text categorization. With these features, the text analyzer is a server-side alternative to existing CATA tools such as Yoshicoder. It supports you in several steps of a content analysis research project and integrates well with the other tools on this website. For further references on computer-aided text analysis, please refer to the content analysis website of the University of Georgia, or to this website, which focuses on the sentiment of texts (further links: 1).
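The basic idea behind applying a categorization scheme can be sketched as dictionary-based keyword counting. The Python example below assumes a hypothetical scheme (category label mapped to a keyword list, with a trailing `*` as a prefix wildcard) and assigns the category with the most keyword hits. This is a simplification for illustration: it does not reproduce CATA's likelihood KPIs, benchmark levels, or the scaling versus competing-category distinction listed in Table 1.

```python
import re
from collections import Counter

# Hypothetical categorization scheme: category label -> keyword list.
# A trailing * is treated as a prefix wildcard (e.g. "dividend*").
SCHEME = {
    "Dividend": ["dividend*", "payout"],
    "Earnings": ["earnings", "net income", "profit*"],
    "M&A": ["acqui*", "merger*", "takeover"],
}

def keyword_regex(keyword):
    """Compile a keyword (optionally ending in *) into a case-insensitive regex."""
    if keyword.endswith("*"):
        body = re.escape(keyword[:-1]) + r"\w*"
    else:
        body = re.escape(keyword)
    return re.compile(rf"\b{body}\b", re.IGNORECASE)

COMPILED = {cat: [keyword_regex(k) for k in kws] for cat, kws in SCHEME.items()}

def categorize(text):
    """Count keyword hits per category; return (most likely category or None, counts)."""
    counts = Counter({cat: sum(len(p.findall(text)) for p in pats)
                      for cat, pats in COMPILED.items()})
    best, hits = counts.most_common(1)[0]  # ties go to the category listed first
    return (best if hits > 0 else None), dict(counts)
```

For example, `categorize("The board approved a dividend increase and a higher payout.")` counts two hits for "Dividend" and none elsewhere, so it returns "Dividend" as the likely category.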
CATA allows you to apply a distance condition when counting keyphrases. If you add "income{1,7}increase*" (without the quotation marks) to your keyword list, the counter increases by one whenever the words income and increase* appear within 1 to 7 words of each other, in either order (income first or increase first). In this example, the wildcard * matches, e.g., increase, increases, or increased.
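The distance condition can be sketched in a few lines of Python. The example parses a rule such as `income{1,7}increase*`, tokenizes the text into lowercase words, and counts word pairs whose positions differ by 1 to 7 in either order. Two details are assumptions, since the description above does not pin them down: adjacent words are taken to be at distance 1, and every qualifying pair counts once (CATA's exact counting may differ).

```python
import re

def parse_proximity(rule):
    """Split a rule such as 'income{1,7}increase*' into (term_a, min_gap, max_gap, term_b)."""
    m = re.fullmatch(r"(\S+?)\{(\d+),(\d+)\}(\S+)", rule)
    if m is None:
        raise ValueError(f"not a proximity rule: {rule!r}")
    a, lo, hi, b = m.groups()
    return a, int(lo), int(hi), b

def term_matches(term, token):
    """Exact match, or prefix match when the term ends with the * wildcard."""
    if term.endswith("*"):
        return token.startswith(term[:-1])
    return token == term

def count_proximity(rule, text):
    """Count token pairs (a, b) whose positions differ by min_gap..max_gap, in either order."""
    a, lo, hi, b = parse_proximity(rule)
    tokens = re.findall(r"\w+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if term_matches(a.lower(), t)]
    pos_b = [j for j, t in enumerate(tokens) if term_matches(b.lower(), t)]
    return sum(1 for i in pos_a for j in pos_b if lo <= abs(i - j) <= hi)
```

With this rule, "Net income showed a strong increase this year." yields 1 (four words apart), "Increases in income were modest." also yields 1 (reverse order), and a text where the two words are twelve positions apart yields 0.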
For example input and output CSVs (i.e., texts, analysis scheme, and results at the text and event category level) that illustrate the analysis capability, please refer to this zip file. Table 1 summarizes the data items in the different input and output files. Note: You can zip the individual CSVs before uploading to reduce your upload time.
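If you prefer to script the zipping step, Python's standard `zipfile` module is enough. The sketch below compresses one CSV into a same-named `.zip` archive next to it; the helper name `zip_csv` and the one-CSV-per-archive layout are assumptions based on the note about zipping the individual CSVs.

```python
import zipfile
from pathlib import Path

def zip_csv(csv_path):
    """Compress a single CSV into '<name>.zip' next to it and return the archive path."""
    csv_path = Path(csv_path)
    archive = csv_path.with_suffix(".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path, arcname=csv_path.name)  # store the bare file name, no folders
    return archive
```

Usage, with hypothetical file names: `for name in ["texts.csv", "scheme.csv"]: zip_csv(name)`.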
(4) CATA Filter (CATA_Filter):
This tool allows you to generate a sub-sample from a larger (CATA) file holding press releases. Simply upload the source file and a CSV file holding the IDs of the releases you are interested in. The system will then generate a CSV file containing only the sub-sample. The CATA_Filter app can also generate individual CSV files for the sub-sample: if you check this option, the filter app produces a zip archive holding one CSV file per press release.
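The filtering logic can be approximated locally as follows. This Python sketch assumes the Text ID sits in the first column of each row, that the output uses semicolons, and that each per-release CSV in the zip archive is named after its Text ID; all three are assumptions, and the function names are hypothetical. IDs containing characters that are invalid in file names would need sanitizing.

```python
import csv
import io
import zipfile

def filter_releases(source_rows, wanted_ids):
    """Keep only rows whose first column (Text ID) is in wanted_ids, preserving order."""
    wanted = {i.strip() for i in wanted_ids}
    return [row for row in source_rows if row and row[0].strip() in wanted]

def rows_to_csv(rows):
    """Serialize rows to a semicolon-separated CSV string."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=";").writerows(rows)
    return buf.getvalue()

def split_to_zip(rows, zip_stream):
    """Write one CSV per release into a zip archive, named after its Text ID."""
    with zipfile.ZipFile(zip_stream, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for row in rows:
            zf.writestr(f"{row[0].strip()}.csv", rows_to_csv([row]))
```

Using a set for the wanted IDs keeps each membership check constant-time, which matters when the source file holds many releases.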
Table 1: Structure of CATA's In- and Output CSV Files
In-/Output | File Name | Data Items |
---|---|---|
Input | Text CSV | Text ID; Firm; Text; Cut-off value |
Input | Analysis scheme | Category ID; Category Label; S (Scaling) / CC (Competing Category); columns with keywords |
Output | Text-level results | Text String ID; Total Text String Length; Cut-off; Considered Text String Length; Considered words; Likely Event Category; Likelihood KPI1; Likelihood KPI2; Actual keywords occurrence; Expected keywords occurrence; Counter Keyword list <Label Category 1>; Counter Keyword list <Label Category 2> |
Output | Category-level results | Category ID; Label; Scaling (S) / Competing Categories (CC); Benchmark Level (= Normal Level of Occurrence); Number of Texts Assigned |