Open Information Extraction from the Web by Oren Etzioni @VideoLectures

Open Information Extraction from the Web

Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries. We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER’s 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.
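
To make the idea of probability-weighted, indexed relational tuples concrete, here is a minimal Python sketch. It is not TEXTRUNNER's implementation: the names `Extraction` and `TupleIndex`, the token-based inverted index, and the two sample tuples with their scores are all hypothetical, chosen only to illustrate how tuples of the form (argument, relation, argument) might be stored and queried.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    """One relational tuple; the field names are illustrative assumptions."""
    arg1: str
    relation: str
    arg2: str
    probability: float  # hypothetical confidence score in [0, 1]


class TupleIndex:
    """Toy inverted index mapping lowercase tokens to extractions."""

    def __init__(self):
        self._by_token = defaultdict(set)

    def add(self, extraction):
        # Index every whitespace-separated token of all three fields.
        for field in (extraction.arg1, extraction.relation, extraction.arg2):
            for token in field.lower().split():
                self._by_token[token].add(extraction)

    def query(self, text, min_probability=0.0):
        # Return extractions that contain every query token,
        # ordered from highest to lowest probability.
        tokens = text.lower().split()
        if not tokens:
            return []
        postings = [self._by_token.get(token, set()) for token in tokens]
        hits = set.intersection(*postings)
        kept = [e for e in hits if e.probability >= min_probability]
        return sorted(kept, key=lambda e: (-e.probability, e.arg1, e.relation, e.arg2))


# Made-up sample tuples and scores, for illustration only.
index = TupleIndex()
index.add(Extraction("Edison", "invented", "the phonograph", 0.9))
index.add(Extraction("Edison", "was born in", "Ohio", 0.4))

for hit in index.query("edison", min_probability=0.5):
    print(hit.arg1, "|", hit.relation, "|", hit.arg2, "|", hit.probability)
```

Nothing in this sketch names a relation in advance: any relation string can be added and queried, which is the property the open paradigm relies on. A real system would also need the extractor and the probability model, both of which are outside the scope of this sketch.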

Attribution: The Open Education Consortium
http://www.ocwconsortium.org/courses/view/b10bf389d747461d5b223c3b5807ca5a/
Course Home http://videolectures.net/akbcwekex2012_etzioni_information_extraction/