Epstein Files Full PDF

CLICK HERE
Technopedia Center
PMB University Brochure
Faculty of Engineering and Computer Science
S1 Informatics S1 Information Systems S1 Information Technology S1 Computer Engineering S1 Electrical Engineering S1 Civil Engineering

faculty of Economics and Business
S1 Management S1 Accountancy

Faculty of Letters and Educational Sciences
S1 English literature S1 English language education S1 Mathematics education S1 Sports Education
teknopedia

  • Registerasi
  • Brosur UTI
  • Kip Scholarship Information
  • Performance
Flag Counter
  1. World Encyclopedia
  2. Sentence extraction - Wikipedia
Sentence extraction - Wikipedia
From Wikipedia, the free encyclopedia
Text summarization technique

Sentence extraction is a technique used for automatic summarization of a text. In this shallow approach, statistical heuristics are used to identify the most salient sentences of a text. Sentence extraction is a low-cost approach compared to more knowledge-intensive deeper approaches which require additional knowledge bases such as ontologies or linguistic knowledge. In short, sentence extraction works as a filter that allows only meaningful sentences to pass.

The major downside of applying sentence-extraction techniques to the task of summarization is the loss of coherence in the resulting summary. Nevertheless, sentence extraction summaries can give valuable clues to the main points of a document and are frequently sufficiently intelligible to human readers.

Procedure

[edit]

Usually, a combination of heuristics is used to determine the most important sentences within the document. Each heuristic assigns a (positive or negative) score to the sentence. After all heuristics have been applied, the highest-scoring sentences are included in the summary. The individual heuristics are weighted according to their importance.

Early approaches and some sample heuristics

[edit]

Seminal papers which laid the foundations for many techniques used today have been published by Hans Peter Luhn in 1958[1] and H. P Edmundson in 1969.[2]

Luhn proposed to assign more weight to sentences at the beginning of the document or a paragraph. Edmundson stressed the importance of title-words for summarization and was the first to employ stop-lists in order to filter uninformative words of low semantic content (e.g. most grammatical words such as of, the, a). He also distinguished between bonus words and stigma words, i.e. words that probably occur together with important (e.g. the word form significant) or unimportant information. His idea of using key-words, i.e. words which occur significantly frequently in the document, is still one of the core heuristics of today's summarizers. With large linguistic corpora available today, the tf–idf value which originated in information retrieval, can be successfully applied to identify the key words of a text: If for example the word cat occurs significantly more often in the text to be summarized (TF = "term frequency") than in the corpus (IDF means "inverse document frequency"; here the corpus is meant by document), then cat is likely to be an important word of the text; the text may in fact be a text about cats.

See also

[edit]
  • Sentence boundary disambiguation
  • Text segmentation

References

[edit]
  1. ^ Hans Peter Luhn (April 1958). "The Automatic Creation of Literature Abstracts" (PDF). IBM Journal: 159–165.
  2. ^ H. P. Edmundson (1969). "New Methods in Automatic Extracting" (PDF). Journal of the ACM. 16 (2): 264–285. doi:10.1145/321510.321519. S2CID 1177942.
  • v
  • t
  • e
Natural language processing
General terms
  • AI-complete
  • Bag-of-words
  • n-gram
    • Bigram
    • Trigram
  • Computational linguistics
  • Natural language understanding
  • Stop words
  • Text processing
Text analysis
  • Argument mining
  • Collocation extraction
  • Concept mining
  • Coreference resolution
  • Deep linguistic processing
  • Distant reading
  • Information extraction
  • Named-entity recognition
  • Ontology learning
  • Parsing
    • semantic
    • syntactic
  • Part-of-speech tagging
  • Semantic analysis
  • Semantic role labeling
  • Semantic decomposition
  • Semantic similarity
  • Sentiment analysis
  • Terminology extraction
  • Text mining
  • Textual entailment
  • Truecasing
  • Word-sense disambiguation
  • Word-sense induction
Text segmentation
  • Compound-term processing
  • Lemmatisation
  • Lexical analysis
  • Text chunking
  • Stemming
  • Sentence segmentation
  • Word segmentation
Automatic summarization
  • Multi-document summarization
  • Sentence extraction
  • Text simplification
Machine translation
  • Computer-assisted
  • Example-based
  • Rule-based
  • Statistical
  • Transfer-based
  • Neural
Distributional semantics models
  • BERT
  • Document-term matrix
  • Explicit semantic analysis
  • fastText
  • GloVe
  • Language model
    • large
    • small
  • Latent semantic analysis
  • Long short-term memory
  • Seq2seq
  • Transformer
  • Word embedding
  • Word2vec
Language resources,
datasets and corpora
Types and
standards
  • Corpus linguistics
  • Lexical resource
  • Linguistic Linked Open Data
  • Machine-readable dictionary
  • Parallel text
  • PropBank
  • Semantic network
  • Simple Knowledge Organization System
  • Speech corpus
  • Text corpus
  • Thesaurus (information retrieval)
  • Treebank
  • Universal Dependencies
Data
  • BabelNet
  • Bank of English
  • DBpedia
  • FrameNet
  • Google Ngram Viewer
  • UBY
  • WordNet
  • Wikidata
Automatic identification
and data capture
  • Speech recognition
  • Speech segmentation
  • Speech synthesis
  • Natural language generation
  • Optical character recognition
Topic model
  • Document classification
  • Latent Dirichlet allocation
  • Pachinko allocation
Computer-assisted
reviewing
  • Automated essay scoring
  • Concordancer
  • Grammar checker
  • Predictive text
  • Pronunciation assessment
  • Spell checker
Natural language
user interface
  • Chatbot
  • Interactive fiction
  • Question answering
  • Virtual assistant
  • Voice user interface
Related
  • Formal semantics
  • Hallucination
  • Natural Language Toolkit
  • spaCy
Retrieved from "https://teknopedia.ac.id/w/index.php?title=Sentence_extraction&oldid=1258006795"
Categories:
  • Computational linguistics
  • Natural language processing
Hidden categories:
  • Articles with short description
  • Short description matches Wikidata

  • indonesia
  • Polski
  • العربية
  • Deutsch
  • English
  • Español
  • Français
  • Italiano
  • مصرى
  • Nederlands
  • 日本語
  • Português
  • Sinugboanong Binisaya
  • Svenska
  • Українська
  • Tiếng Việt
  • Winaray
  • 中文
  • Русский
Sunting pranala
url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url
Pusat Layanan

UNIVERSITAS TEKNOKRAT INDONESIA | ASEAN's Best Private University
Jl. ZA. Pagar Alam No.9 -11, Labuhan Ratu, Kec. Kedaton, Kota Bandar Lampung, Lampung 35132
Phone: (0721) 702022
Email: pmb@teknokrat.ac.id