Artificial Intelligence

Information Retrieval and QA at Thomson Reuters

Our customers need the right information, in the right context, and often under tight time constraints. We take a comprehensive approach to the information findability problem, combining search technologies, recommendation systems, and navigation-based discovery.

Thomson Reuters is a text-heavy organization, so our information retrieval research focuses on natural language search, which combines techniques from Natural Language Processing (NLP) and Information Retrieval (IR).

Formally, Information Retrieval is the science (and engineering) of searching for information in content repositories at different levels of granularity (e.g., documents, passages, metadata) and across different content types (documents, social media, images, video, sound), both at rest and in motion (e.g., streaming data). Our information retrieval research is somewhat broader than this formal definition in that it also includes recommender systems and navigation-based discovery.

From a technology perspective, our scientists and engineers have significant expertise in classical NLP and IR methodologies as well as in more recent advances, including the use of deep learning and language models for IR and question answering problems.

Our scientists and engineers are pioneers of IR. Within the legal domain, for example, we have fundamentally transformed how legal research is done. Example products include ResultsPlus (a large-scale, content- and behavior-based recommender system with personalization), Medical Litigator (a vertical search engine covering the medical domain for lawyers), Westlaw Next and its patented WestSearch (which comprises 13 vertical search engines, each designed for a target content set), Westlaw Edge (which includes robust, open-ended question answering for the law), and Checkpoint Edge (a state-of-the-art search engine for the tax domain).

Information retrieval and search will continue to play an important role in what we do as a team and in how we satisfy our customers' varied and often complex information needs. Looking ahead, we see no sharp distinction between finding information and understanding it, and we aim to develop experiences that accept more varied input (a query, a document, a question, session interactions, etc.) while producing more focused output (an answer, a document, a dynamically generated report, etc.).

Our Work:

Wenhui Liao, Sriharsha Veeramachaneni. 2010. “Unsupervised Learning for Reranking-based Patent Retrieval”. In 3rd International Workshop on Patent Information Retrieval, at the 19th ACM Conference on Information and Knowledge Management (CIKM).

Howard R. Turtle. 1994. “Natural Language vs. Boolean Query Evaluation: A Comparison of Retrieval Performance”. In Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Pages 212-220. Dublin, Ireland: Special Issue of the SIGIR Forum.

Tonya Custis, Frank Schilder, Thomas Vacek, Gayle McElvain, Hector Martinez Alonso. 2019. “Westlaw Edge AI Features Demo: KeyCite Overruling Risk, Litigation Analytics, and WestSearch Plus”. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law. Pages 256-257. Montreal (Quebec), Canada.

Gayle McElvain, George Sanchez, Sean Matthews, Don Teo, Filippo Pompili, Tonya Custis. 2019. “WestSearch Plus: A Non-factoid Question-Answering System for the Legal Domain”. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. Pages 1361-1364. New York, NY, USA.

Dezhao Song, Frank Schilder, Charese Smiley, Chris Brew, Tom Zielund, Hiroko Bretz, Robert Martin, Chris Dale, John Duprey, Tim Miller, Johanna Harrison. 2015. “TR Discover: A Natural Language Interface for Querying and Analyzing Interlinked Datasets”. The Semantic Web - ISWC 2015, volume 9367, pages 21-37. Springer International Publishing.