Natural language processing (NLP) and machine learning (ML) at Thomson Reuters

Natural language processing focuses on designing algorithms to parse, analyze, mine, and ultimately understand and generate human language. We build our capabilities on the latest breakthroughs in deep learning (DL) and other machine learning techniques to support our customers' work in information-heavy segments.

Language enables us to communicate, collaborate, negotiate, and socialize with each other. It is how we record our experiences, learn from others, share knowledge, and preserve and advance civilization. At Thomson Reuters, we operate in language-rich industries: laws, regulations, news, disputes, and business transactions are all captured in text. The amount of text is growing exponentially, and processing and acting upon it is a competitive advantage for all our customers.

The ability to process massive amounts of text, to mine it for insights and information nuggets, to organize it, to connect it, to contrast it, to understand it, and to answer questions about it, is of utmost importance for our customers and for us. This is why the combination of NLP and natural language understanding (NLU) has been one of our core research areas for the last 20 years.

The objectives of our NLP research span our editorial processes as well as our customer-facing products. On the editorial front, the primary focus is on building tools for mining, enhancing, and organizing content. Products such as Westlaw or Practical Law include artificial intelligence (AI) components that enable our customers to extract or retrieve information at scale.

As many of our data sources are rich text collections, it should not come as a surprise that we solve many of our text-related problems via commonly used NLP techniques, such as named entity recognition and resolution, classification, and natural language generation.
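
As a simplified illustration of one of these techniques, the sketch below runs named entity recognition with an off-the-shelf model from the Hugging Face transformers library. The checkpoint and example sentence are illustrative assumptions, not our production pipeline.

```python
# A minimal sketch of named entity recognition (NER) with Hugging Face
# transformers. "dslim/bert-base-NER" is a public checkpoint chosen purely
# for illustration; it is not the model used in our products.
from transformers import pipeline

ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

text = "Thomson Reuters acquired Practical Law Company, headquartered in London."
for entity in ner(text):
    # Each result carries the entity type, the matched span, and a confidence score.
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```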

Recent breakthroughs in deep learning also enable us to utilize language models such as Bidirectional Encoder Representations from Transformers (BERT) or Generative Pre-trained Transformer 3 (GPT-3), as described in Custis et al. 2019, Shaghaghian et al. 2020, and Song et al. 2022, to give many of our products, such as Westlaw Precision, HighQ Contract Analysis, and Litigation Analytics, better question answering and text classification capabilities. High-quality content is ensured by our human-in-the-loop approach: we always test and verify machine-generated content.
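
As a simplified sketch of what such a question answering capability looks like, the example below uses a public extractive question-answering checkpoint via the Hugging Face transformers library; the model and the legal passage are assumptions for illustration, not the systems behind these products.

```python
# A minimal sketch of extractive question answering with a pre-trained
# transformer. "deepset/roberta-base-squad2" is a public checkpoint used
# here purely for illustration; the passage below is invented.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

context = (
    "The court held that the arbitration clause was unenforceable "
    "because it was unconscionable under state contract law."
)
result = qa(question="Why was the arbitration clause unenforceable?", context=context)

# The pipeline returns the most likely answer span plus a confidence score.
print(result["answer"], round(result["score"], 3))
```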

Our work:

Tonya Custis, Frank Schilder, Thomas Vacek, Gayle McElvain, and Hector Martinez Alonso. Westlaw Edge AI Features Demo: KeyCite Overruling Risk, Litigation Analytics, and WestSearch Plus. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, ICAIL ’19, pages 256–257, Montreal (Québec), Canada, 2019. ACM.

Shohreh Shaghaghian, Luna Yue Feng, Borna Jafarpour, and Nicolai Pogrebnyakov. Customizing Contextualized Language Models for Legal Document Reviews. In 2020 IEEE International Conference on Big Data (Big Data), pages 2139–2148. IEEE, 2020.

Dezhao Song, Sally Gao, Baosheng He, and Frank Schilder. On the Effectiveness of Pre-Trained Language Models for Legal Natural Language Processing: An Empirical Study. IEEE Access, 10:75835–75858, 2022.