Language resource
From Wikipedia, the free encyclopedia
Linguistic material used for various types of language research and processing

In linguistics and language technology, a language resource is a "[composition] of linguistic material used in the construction, improvement and/or evaluation of language processing applications, (...) in language and language-mediated research studies and applications."[1]

According to Bird & Simons (2003),[2] this includes:

  1. data, i.e. "any information that documents or describes a language, such as a published monograph, a computer data file, or even a shoebox full of handwritten index cards. The information could range in content from unanalyzed sound recordings to fully transcribed and annotated texts to a complete descriptive grammar",[2]
  2. tools, i.e., "computational resources that facilitate creating, viewing, querying, or otherwise using language data",[2] and
  3. advice, i.e., "any information about what data sources are reliable, what tools are appropriate in a given situation, what practices to follow when creating new data". The latter aspect is usually referred to as "best practices" or "(community) standards".[2]

In a narrower sense, the term language resource is applied specifically to resources that are available in digital form, "encompassing (a) data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form, and (b) tools/technologies/services used for their processing and management".[1]

Typology

As of May 2020, no widely used standard typology of language resources has been established. Current proposals include the LRE Map,[3] META-SHARE,[4] and, for data, the LLOD classification. Important classes of language resources include:

  1. data
    1. lexical resources, e.g., machine-readable dictionaries,
    2. linguistic corpora, i.e., digital collections of natural language data,
    3. linguistic databases such as the Cross-Linguistic Linked Data collection,
  2. tools
    1. linguistic annotations and tools for creating such annotations in a manual or semiautomated fashion (e.g., tools for annotating interlinear glossed text such as Toolbox and FLEx, or other language documentation tools),
    2. applications for search and retrieval over such data (corpus management systems) and for automated annotation (part-of-speech tagging, syntactic parsing, semantic parsing, etc.),
  3. metadata and vocabularies
    1. vocabularies, repositories of linguistic terminology and language metadata, e.g., META-SHARE (for language resource metadata),[4] the ISO 12620 data category registry (for linguistic features, data structures and annotations within a language resource),[5] or the Glottolog database (identifiers for language varieties and bibliographical database).[6]
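
The three top-level classes above can be sketched as a small data structure. This is only an illustration, not an established typology: the names ResourceClass, LanguageResource, CATALOGUE and by_class are hypothetical, and the subclass labels are free-text assumptions. The example entries are resources named in the list above.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceClass(Enum):
    """Top-level classes taken from the list above (illustrative only)."""
    DATA = "data"
    TOOL = "tools"
    METADATA = "metadata and vocabularies"


@dataclass(frozen=True)
class LanguageResource:
    name: str
    resource_class: ResourceClass
    subclass: str  # free-text label, an assumption of this sketch


# Example entries drawn from resources named in the list above.
CATALOGUE = [
    LanguageResource("Cross-Linguistic Linked Data", ResourceClass.DATA, "linguistic database"),
    LanguageResource("Toolbox", ResourceClass.TOOL, "annotation tool"),
    LanguageResource("FLEx", ResourceClass.TOOL, "annotation tool"),
    LanguageResource("Glottolog", ResourceClass.METADATA, "vocabulary"),
]


def by_class(resources, resource_class):
    """Return the names of all resources in the given top-level class."""
    return [r.name for r in resources if r.resource_class is resource_class]


print(by_class(CATALOGUE, ResourceClass.TOOL))
```

A flat catalogue with an explicit class field keeps the sketch short. A real typology such as those proposed by the LRE Map or META-SHARE would need finer-grained and possibly overlapping categories.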

Language resource publication, dissemination and creation

A major concern of the language resource community has been to develop infrastructures and platforms to present, discuss and disseminate language resources. Selected contributions in this regard include:

  • a series of International Conferences on Language Resources and Evaluation (LREC),
  • the European Language Resources Association (ELRA, EU-based), and the Linguistic Data Consortium (LDC, US-based), which represent commercial hosting and dissemination platforms for language resources,
  • the Open Language Archives Community (OLAC), which provides and aggregates language resource metadata,
  • the Language Resources and Evaluation Journal (LREJ),[7]
  • the European Language Grid, a European platform for language technologies (e.g., services), data and resources.

The development of standards and best practices for language resources is the subject of several community groups and standardization efforts, including:

  • ISO Technical Committee 37: Terminology and other language and content resources (ISO/TC 37), developing standards for all aspects of language resources,
  • W3C Community Group Best Practices for Multilingual Linked Open Data (BPMLOD),[8] working on best practice recommendations for publishing language resources as Linked Data or in RDF,
  • W3C Community Group Linked Data for Language Technology (LD4LT),[9] working on linguistic annotations on the web and language resource metadata,
  • W3C Community Group Ontology-Lexica (OntoLex),[10] working on lexical resources,
  • the Open Linguistics working group of the Open Knowledge Foundation, working on conventions for publishing and linking open language resources, developing the Linguistic Linked Open Data cloud,[11]
  • the Text Encoding Initiative (TEI),[12] working on XML-based specifications for language resources and digitally edited text.
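
As a rough illustration of what publishing language resource metadata in RDF can look like, the following sketch serializes a flat metadata record as N-Triples using Dublin Core terms. It is not the recommendation of any of the groups listed above. The resource URI, the record contents, and the function names escape_literal and to_ntriples are assumptions made for this example, and the values are written as plain string literals for simplicity.

```python
# Namespace of the DCMI Metadata Terms vocabulary.
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(subject_uri, fields):
    """Serialize a flat metadata record as one N-Triples statement per field."""
    lines = []
    for name, value in sorted(fields.items()):
        lines.append(f'<{subject_uri}> <{DCTERMS}{name}> "{escape_literal(value)}" .')
    return "\n".join(lines)


# Hypothetical record; the subject URI and the values are made up for illustration.
record = {
    "title": "Example corpus",
    "language": "en",
    "type": "Dataset",
}
print(to_ntriples("http://example.org/resource/example-corpus", record))
```

The sketch avoids third-party libraries so that it stays self-contained. In practice an RDF library and a dedicated metadata vocabulary would normally be used instead of hand-built strings.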


References

  1. LD4LT (2020). The Metashare Ontology as Created by the LD4LT Community Group. W3C Community Group Linked Data for Language Technology (LD4LT), development branch, version of Mar 10, 2020. Archived 2023-02-10 at the Wayback Machine.
  2. Bird, Steven; Simons, Gary (2003-11-01). "Extending Dublin Core Metadata to Support the Description and Discovery of Language Resources". Computers and the Humanities. 37 (4): 375–388. arXiv:cs/0308022. Bibcode:2003cs........8022B. doi:10.1023/A:1025720518994. ISSN 1572-8412. S2CID 5969663.
  3. Calzolari, N.; Del Gratta, R.; Francopoulo, G.; Mariani, J.; Rubino, F.; Russo, I.; Soria, C. (May 2012). "The LRE Map. Harmonising Community Descriptions of Resources". In LREC. pp. 1084–1089.
  4. McCrae, John P.; Labropoulou, Penny; Gracia, Jorge; Villegas, Marta; Rodríguez-Doncel, Víctor; Cimiano, Philipp (2015). "One Ontology to Bind Them All: The META-SHARE OWL Ontology for the Interoperability of Linguistic Datasets on the Web". In Gandon, Fabien; Guéret, Christophe; Villata, Serena; Breslin, John; Faron-Zucker, Catherine; Zimmermann, Antoine (eds.). The Semantic Web: ESWC 2015 Satellite Events. Lecture Notes in Computer Science. Vol. 9341. Cham: Springer International Publishing. pp. 271–282. doi:10.1007/978-3-319-25639-9_42. ISBN 978-3-319-25639-9.
  5. Kemps-Snijders, M.; Windhouwer, M.; Wittenburg, P.; Wright, S. E. (2008). "ISOcat: Corralling data categories in the wild". In 6th International Conference on Language Resources and Evaluation (LREC 2008).
  6. Nordhoff, Sebastian (2012). "Linked Data for Linguistic Diversity Research: Glottolog/Langdoc and ASJP Online". In Chiarcos, Christian; Nordhoff, Sebastian; Hellmann, Sebastian (eds.). Linked Data in Linguistics: Representing and Connecting Language Data and Language Metadata. Springer. pp. 191–200. doi:10.1007/978-3-642-28249-2_18. ISBN 978-3-642-28249-2.
  7. "Language Resources and Evaluation". Springer. Retrieved 2020-05-13.
  8. "Best Practices for Multilingual Linked Open Data Community Group". www.w3.org. 2 October 2015. Retrieved 2020-05-13.
  9. "Linked Data for Language Technology Community Group". www.w3.org. 26 June 2015. Retrieved 2020-05-13.
  10. "Ontology-Lexica Community Group". www.w3.org. 10 May 2016. Retrieved 2020-05-13.
  11. "Linguistic Linked Open Data".
  12. "TEI: Text Encoding Initiative". tei-c.org. Retrieved 2020-05-13.