Research with XBRL Data, Firm Complexity, and Accounting Reporting Complexity

This website provides a repository of processed eXtensible Business Reporting Language (XBRL) data. XBRL is an open standard for reporting structured financial information that enables efficient data gathering and automated comparison of financial information over time and across firms. This website also provides access to firm complexity data and the SAS code used to generate the complexity measure.

The paper below provides a review of the XBRL literature and discusses directions for future research.

Hoitash, R., U. Hoitash, and L. Morris. 2021. eXtensible Business Reporting Language (XBRL): A Review and Implications for Future Research. Auditing: A Journal of Practice & Theory 40 (2): 107-132.

Firm Complexity and Accounting Reporting Complexity (ARC)

This data set provides several measures of firm complexity based on accounting reporting complexity (ARC). These measures have been extensively validated and have several advantages over traditional proxies for firm complexity. The data can be downloaded in Excel or SAS format from the accounting reporting complexity page.


Financial Statement Notes - Textual Data

XBRL enables accurate extraction and classification of financial statement notes, a task that is otherwise difficult to accomplish. Data on many prevalent footnotes, including the complete text and counts of words and numbers, are available for download in SAS format from the financial statement notes textual data page.



Measuring Firm Complexity with Accounting Reporting Complexity (ARC)

Background

We share a new measure of firm complexity based on accounting information. Accounting is the “language of business,” and the disclosure of most business activities is mandated. Relying on accounting disclosures is therefore the best approach for consistently capturing a wide range of firm activities across a large cross-section of firms. Firm complexity is measured with accounting reporting complexity (ARC), which is based on the count of accounting items disclosed in eXtensible Business Reporting Language (XBRL) filings. ARC is associated with lower financial reporting quality, higher audit fees, greater filing delays, poorer analyst performance, and various other outcomes that are correlated with firm complexity. ARC possesses the properties of a good measure of firm complexity: First, it is “valid,” i.e., it captures the intended construct of firm complexity. Second, it is “reliable,” meaning it is measured objectively and consistently across firms and over time. Third, it captures “variation” in the construct across firms and over time that is consistent with changes in complexity. Fourth, it is “usable,” meaning it is easily constructed for a broad population of public companies.
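As a rough illustration of the counting idea behind ARC, the sketch below assumes ARC can be approximated as the number of distinct XBRL concepts that carry a monetary value in a single filing. The `Fact` record, its fields, the `arc_count` function, and the sample tags are hypothetical stand-ins for already-parsed XBRL facts; the exact counting rules are those of Hoitash and Hoitash (2018) and the SAS code distributed on this site, not this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical, simplified XBRL fact: one tagged value in a filing."""
    concept: str   # e.g. "us-gaap:Revenues"
    unit: str      # e.g. "USD" for monetary facts
    value: float


def arc_count(facts: list[Fact], monetary_units: frozenset[str] = frozenset({"USD"})) -> int:
    """Hypothetical ARC approximation: count distinct monetary concepts.

    Assumption: each concept counts once, no matter how many periods
    or contexts it is reported in.
    """
    return len({f.concept for f in facts if f.unit in monetary_units})


if __name__ == "__main__":
    filing = [
        Fact("us-gaap:Revenues", "USD", 1_000.0),
        Fact("us-gaap:Revenues", "USD", 900.0),       # prior-year value, same concept
        Fact("us-gaap:NetIncomeLoss", "USD", 120.0),
        Fact("us-gaap:Goodwill", "USD", 300.0),
        Fact("dei:EntityCommonStockSharesOutstanding", "shares", 50.0),  # not monetary
    ]
    print(arc_count(filing))  # 3
```

Deduplicating by concept means an item reported for several periods is counted once, so the count reflects how many distinct accounting items a firm discloses rather than how many values it reports; this is a design choice of the sketch, not necessarily the published definition.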

The data includes filings from 2011 through April 30, 2025. The most recent data also includes ARC for recognized information (financial statements) vs. disclosed information (financial statement notes), ARC for each financial statement (e.g., the income statement), and several additional measures. Annual and quarterly measures of ARC are included in the dataset.

Relevant papers that use ARC:

If you decide to use the data, please consider citing one or more of the following papers. These papers develop, validate, provide code for, review the literature on, and/or use ARC:

Data and Code Downloads

Below we provide links to files that include ARC and several related measures. The Firm Complexity files contain only the primary ARC measure (Hoitash and Hoitash 2018). The ARC files include several other measures. We are happy to help. Please feel free to reach out to us with questions!


XBRL and Firm Complexity in the Media

Accounting Today:

Professors propose new measure of accounting complexity


FEI Daily:

Measuring Accounting Disclosure Complexity with XBRL


Compliance Week:

As Complexity Rises, Quality Slides, XBRL-Based Study Says


Textual Data on Financial Statement Notes

Background

We share textual data on the most prevalent financial statement notes in annual 10-K and quarterly 10-Q filings. The data includes the text of each financial statement note along with the count of numbers in each note. The data is available for fiscal years 2011 through 2024.

Accurately extracting financial statement notes from financial reports is challenging because companies use different terminology to describe their footnotes. The XBRL mandate alleviates this challenge by requiring companies to tag each footnote in its entirety, using standardized TextBlock tags when available. For example, the most common fair value TextBlock tag is “us-gaap:FairValueDisclosuresTextBlock,” which is used by 73% of firms that report fair value (Ahn et al. 2020).
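The TextBlock approach described above can be sketched in a few lines, assuming a filing's facts have already been parsed into a mapping from concept name to tagged text. The sketch keeps only concepts whose name ends in `TextBlock`, strips HTML markup with a simple regular expression, and counts words and numbers. The function names and the word and number patterns are hypothetical and may differ from the definitions used in the distributed data, which are documented in the “TextBlock data dictionary.docx” file.

```python
import re

# Hypothetical patterns; the distributed data may define words/numbers differently.
TAG_RE = re.compile(r"<[^>]+>")                      # crude HTML tag stripper
NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")  # e.g. 1,250 or 3.5
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")    # alphabetic tokens only


def extract_textblocks(facts: dict[str, str]) -> dict[str, str]:
    """Keep only TextBlock-tagged facts, with HTML markup removed."""
    blocks = {}
    for concept, raw in facts.items():
        if concept.endswith("TextBlock"):
            text = TAG_RE.sub(" ", raw)
            blocks[concept] = " ".join(text.split())  # collapse whitespace
    return blocks


def note_counts(text: str) -> tuple[int, int]:
    """Return (word count, number count) for one note."""
    return len(WORD_RE.findall(text)), len(NUMBER_RE.findall(text))


if __name__ == "__main__":
    facts = {
        "us-gaap:FairValueDisclosuresTextBlock":
            "<p>Level 3 assets totaled $1,250 million, up 3.5% from 2023.</p>",
        "us-gaap:Revenues": "1000",  # numeric fact, not a TextBlock
    }
    for concept, text in extract_textblocks(facts).items():
        print(concept, note_counts(text))
    # us-gaap:FairValueDisclosuresTextBlock (6, 4)
```

Selecting notes by tag name rather than by heading text is what sidesteps the terminology problem: two firms may title the note differently, but both map it to the same standardized concept.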

For more details on the data, please refer to the “TextBlock data dictionary.docx” and the “TextBlock Categories.xlsx” files.

Relevant papers that use XBRL TextBlocks:

If you decide to use the data, please consider citing at least one of the following papers. These papers explain and/or use XBRL TextBlock tags:

Data and Document Downloads

Below we provide links to footnote text data and several related documents.

A partial list of studies that use ARC or other related XBRL measures

  • Ahn, J., Hoitash, R. and Hoitash, U., 2022. Are Words Beneficial to the Consumption of Numbers in Financial Reports? Working paper, Northeastern and Bentley University.
  • Ai, X., 2021. The Auditor’s Application of Professional Judgment: Evidence from M&A-related Critical Audit Matters. Working paper, University of Tennessee.
  • Akamah, H. and Shu, S.Q., 2021. Large Shareholder Portfolio Diversification and Voluntary Disclosure. Contemporary Accounting Research, 38(4): 2918-2950.
  • Asiri, M., 2021. Three Essays in Investment Efficiency, Accounting Reporting Complexity, and Cybersecurity Breaches: Evidence from Corporate Tax Avoidance. Doctoral dissertation, Curtin University.
  • Brown, N.C., Huffman, A.A. and Cohen, S., 2022. Accounting reporting complexity and non-GAAP earnings disclosure. Working paper, University of Illinois.
  • Burd, C., 2021. More Numbers, Less Problems: Analysts’ Use of Tax-Related XBRL Data for ETR Forecasting. Working paper, Boston University.
  • Burke, J.J., Hoitash, R. and Hoitash, U., 2020. The Disclosure and Consequences of U.S. Critical Audit Matters. Contemporary Accounting Research, 37(4): 2398-2437.
  • Burke, J.J., Hoitash, R., Hoitash, U. and Xiao, S.X., 2021. The costs and benefits of retirement policies at US audit firms. Journal of Accounting and Public Policy, 40(4): 1-21.
  • Burke, J.J., Hoitash, R., Hoitash, U. and Xiao, S.X., 2022. The disclosure and consequences of U.S. critical audit matters. Working paper, University of Colorado, Denver.
  • Cahan, S.F., Chang, S., Siqueira, W.Z. and Tam, K., 2021. The roles of XBRL and processed XBRL in 10‐K readability. Journal of Business Finance & Accounting, 49(1-2): 33-68.
  • Cheng, X., Huang, F., Palmon, D. and Yin, C., 2021. How does information processing efficiency relate to investment efficiency? Evidence from XBRL adoption. Journal of Information Systems, 35(1): 1-25.
  • Chychyla, R., Leone, A.J. and Minutti-Meza, M., 2019. Complexity of financial reporting standards and accounting expertise. Journal of Accounting and Economics, 67(1): 226-253.
  • Cohen, S., 2020. Accounting Reporting Complexity and Firm-Level Investment Efficiency. Working paper, San Diego State University.
  • Docimo, W.M., Gunn, J.L., Li, C. and Michas, P.N., 2021. Do Foreign Component Auditors Harm Financial Reporting Quality? A Subsidiary‐Level Analysis of Foreign Component Auditor Use. Contemporary Accounting Research, 38(4): 3113-3145.
  • Garcia, J., de Villiers, C. and Li, L., 2021. Is a client’s corporate social responsibility performance a source of audit complexity? International Journal of Auditing, 25(1): 75-102.
  • Griffin, P., Enache, L. and Moldovan, R., 2021. Characteristics and Consequences of ASC 842 Lease Transition Disclosures. Working paper, University of California, Davis.
  • Guo, F., Luo, X., Wheeler, P.R., Yang, L., Zhao, X. and Zhang, Y., 2021a. Enterprise Resource Planning Systems and XBRL Reporting Quality. Journal of Information Systems, 35(3): 77-106.
  • Guo, F., Walton, S., Wheeler, P.R. and Zhang, Y., 2021b. Early Disruptors: Examining the Determinants and Consequences of Blockchain Early Adoption. Journal of Information Systems, 35(2): 219-242.
  • He, C. and Kohlbeck, M.J., 2021. Federal Government Contracts and Financial Reporting Quality. Working paper, Marquette and Florida Atlantic University.
  • Hoitash, R. and Hoitash, U., 2018. Measuring accounting reporting complexity with XBRL. The Accounting Review, 93(1): 259-287.
  • Hoitash, R., Hoitash, U. and Morris, L., 2021a. eXtensible Business Reporting Language (XBRL): A Review and Implications for Future Research. Auditing: A Journal of Practice & Theory, 40(2): 107-132.
  • Hoitash, R., Hoitash, U., Morris, L. and Yezegel, A., 2021b. Quarterly Footnote Disclosures as a Leading Indicator of Audit Risk. Working paper, Bentley and Northeastern University.
  • Hoitash, R., Hoitash, U. and Yezegel, A., 2021c. Can sell-side analysts’ experience, expertise and qualifications help mitigate the adverse effects of accounting reporting complexity? Review of Quantitative Finance and Accounting, 57: 1-39.
  • Huang, F., No, W.G. and Vasarhelyi, M.A., 2019. Do managers use extension elements strategically in the SEC’s tagged data for financial statements? Evidence from XBRL complexity. Journal of Information Systems, 33(3): 61-74.
  • Johnston, J., 2020. Extended XBRL tags and financial analysts’ forecast error and dispersion. Journal of Information Systems, 34(3): 105-131.
  • Li, M., 2020. Legal Intensity of Financial Reports, Corporate Governance, and Financial Reporting Quality. Doctoral dissertation, Temple University.
  • Seavey, S.E., Whitworth, J.D. and Imhof, M.J., 2021. Early Earnings Releases and the Role of Accounting Quality. Auditing: A Journal of Practice & Theory.
  • Smith, A.L., Zhang, Y. and Kipp, P.C., 2019. Cloud-computing risk disclosure and ICFR material weakness: The moderating role of accounting reporting complexity. Journal of Information Systems, 33(3): 1-17.
  • Walton, S., Yang, L. and Zhang, Y., 2021. XBRL Tag Extensions and Tax Accrual Quality. Journal of Information Systems, 35(2): 91-114.
  • Zhou, J., 2020. Does one size fit all? Evidence on XBRL adoption and 10‐K filing lag. Accounting & Finance, 60(3): 3183-3213.
  • Zimmerman, A., Barr-Pulliam, D., Lee, J.S. and Minutti-Meza, M., 2021. Auditors’ use of in-house specialists. Working paper, Florida State University.

Unlock the Power of Financial Text with Analytext.com

Your ultimate resource for research-ready textual data from corporate financial reports. Stop cleaning, start analyzing.


What We Provide

Everything you need to accelerate your textual analysis research.

Ready-to-Use Datasets

Access decades of parsed textual data from 10-Ks, 10-Qs, and 8-Ks. Data is cleaned and structured for immediate use in Stata, SAS, R, or Python.

Powerful Linguistic Features

Leverage pre-computed metrics including readability scores (Fog, Flesch-Kincaid), sentiment analysis, word counts, and topic modeling data.

Python Code

Access the open-source Python code used to generate our metrics for your own data processing and analysis.

Based on Academic Research

Our data repository and tools are built upon rigorous academic research. Please cite the following paper if you use our data:

Codesso, M., Hoitash, R., & Hoitash, U. (2025). Textual Financial Data Repository and Python Code for Machine Learning, AI, and Textual Analyses.

The Researchers


Udi Hoitash, Ph.D.
Lilian L. and Harry A. Cowan Research Professor
D'Amore-McKim School of Business, Northeastern University

Email: u.hoitash@neu.edu


Udi Hoitash is the Lilian L. and Harry A. Cowan Endowed Professor of Accounting at the D’Amore-McKim School of Business, Northeastern University. Professor Hoitash received his Ph.D. in Accounting and Information Systems from Rutgers University, his MBA from Tel Aviv University, and his B.A. in Computer Science from the College of Tel Aviv-Yaffo. His primary research interests include auditing, disclosure quality, XBRL, and corporate governance, and his research often uses textual analysis and other data analytics tools. Professor Hoitash currently serves as an Editor for Auditing: A Journal of Practice & Theory. He has published multiple peer-reviewed papers, including papers in top accounting and finance journals such as The Accounting Review, Contemporary Accounting Research, the Journal of Accounting and Economics, the Journal of Financial Economics, and the Journal of Financial and Quantitative Analysis. His work has been frequently featured in news outlets such as the WSJ, CFO magazine, Bloomberg Radio, and CFO.com. Professor Hoitash’s teaching interests include managerial accounting and corporate governance.



Rani Hoitash, Ph.D., CISA
John E. Rhodes Professor of Accountancy
Bentley University

Email: rhoitash@bentley.edu


Rani Hoitash is the John E. Rhodes Professor of Accountancy at Bentley University. He received his Ph.D. in Accounting and Information Systems from Rutgers University and a Bachelor of Science in Economics from the College of Management in Tel Aviv, Israel. His research concentrates on corporate governance, internal controls, and auditing. His work is published in The Accounting Review, the Journal of Accounting Research, the Journal of Accounting and Economics, the Journal of Financial Economics, Contemporary Accounting Research, Auditing: A Journal of Practice & Theory, Sloan Management Review, and several other journals. Professor Hoitash’s teaching interests include financial accounting, accounting information systems, and auditing. Professor Hoitash recently served as an Editor of Auditing: A Journal of Practice & Theory and on the editorial boards of Contemporary Accounting Research and the Journal of Business Research. He is a member of the Information Systems Audit and Control Association and the American Accounting Association and is a Certified Information Systems Auditor (CISA).