HLRnet.com: Languages: Resources in language teaching: Applied Linguistics
Applied linguistics and editors
Teaching / studying
Evaluation / Assessment
Text-to-speech
Translation / see also online translation
Screen scraping
Translation software
Spelling software
Subtitles
Voice recognition / Speech recognition
Handwriting recognition
Other
Open software initiatives
- Atlantida
is a multilingual cross-platform dictionary. Currently it has 310,000
definitions, and knows how to pronounce 21,000 English words.
- Calenco
is a
collaborative editing Web platform. It allows remote teams of writers,
translators, and designers to create multi-lingual content and publish
it in various formats: PDF, HTML, etc. It is based on XML technology to
store and transform content.
- ccextractor
is a fast closed caption extractor for MPEG files. It can generate
.srt/.smi (subtitles) files directly from your TV recordings (both
analog and digital) or DVDs, Tivo, etc.
- ClipSpeak
is a portable, lightweight text-to-speech tool that speaks text copied
to the clipboard. It is intended to be as transparent as possible to
the user, keeping user interface and interaction to a minimum.
- The Computational
Linguistics Toolset
is a set of tools for computational linguistics. It contains re-usable
code for cleaning, splitting, refining, and taking samples from corpora
(ICE, Penn, and a native one), for tagging them using the TnT-tagger,
for doing permutation statistics on N-grams (useful for finding
statistically significant syntactical differences between any two sets
of tagged texts), and various examination-tools. The tools themselves
are well documented.
- Diqt
is a Web-based multilingual dictionary reference tool. That is,
dictionaries of many languages can be searched using a Web browser. Any
language is available if you have its dictionary data. For example, you
can search English-Japanese, English-German, English-French, and
Japanese-English dictionaries at the same time.
- DM Dictionary
is a PocketPC program that allows you to use a simple text file
(.txt) to find the translation of every word you need. It finds a word
in 3 seconds in a file with 11,000 words.
- Esperantilo
("Tool for Esperanto") is a UTF-8 editor with linguistics functions for
the language Esperanto, and is also a system for computer aided
translation. It contains a spell checker and grammar checker for the
Esperanto language. It can translate Esperanto text in different
formats into Polish, German, and English. It also supports computer
aided translation by interactive machine translation. Translation
memory can also be used for any language pair. It is an XLIFF editor.
It supports XLIFF and TMX (Level 1) formats. Machine translation uses
direct translation at syntax level (dictionary-based translation with
some grammar transformations).
- Experience-Based Language Acquisition
is a computational model of human language acquisition. It is written
entirely in Java and currently acquires a protolanguage of nouns and
verbs based on visual perception.
- teca
- Glossword
is a system to publish dictionaries, glossaries, and encyclopedias. It
features an installation wizard, support for multiple languages, visual
themes, multi-domain installation, an administrative interface with
multi-user support, built-in search and cache engines, the ability to
export/import dictionaries in XML format, and W3C-validated code.
Glossword is useful for any sort of dictionary-like content, including
sites with game cheat codes, online translators, references, and
various kinds of CMS solutions.
- Gnuspeech
is
an extensible text-to-speech package, based on real-time, articulatory,
speech-synthesis-by-rules. It converts text strings into phonetic
descriptions, aided by a pronouncing dictionary, letter-to-sound rules,
and rhythm and intonation models. It then transforms the phonetic
descriptions into parameters for a low-level articulatory synthesiser.
It uses these to drive an articulatory model of the human vocal tract,
producing output suitable for sound output devices. The system
currently deals with spoken English.
- IPA Zounds
models language sound changes by applying a given set of sound change
rules to a given lexicon. It has a built-in model of the International
Phonetic Alphabet, allowing users to write input words in IPA
characters and rules using those characters or the distinctive features
of the model.
- JVoiceXML
is an implementation of VoiceXML 2.1, the Voice Extensible Markup
Language. VoiceXML is designed for
creating audio dialogs that
feature synthesized speech, digitized audio, recognition of spoken and
DTMF key input, recording of spoken input, telephony, and mixed
initiative conversations.
- Jubler
is a tool to edit text-based subtitles. It can be used as authoring
software for new subtitles or as a tool to convert, transform, correct
and refine existing subtitles.
- Julius
is a high-performance large vocabulary continuous speech recognition
(LVCSR) engine for speech-related research and development. You can
construct your own speech recognition system, but you need a separate
English acoustic model and language model or grammar file.
- minpair
generates a complete list of minimal pairs (words differing in exactly
one segment) for use in linguistic research from a list of words in
UTF-8 Unicode. By default, it only searches for pairs of words of the same
length differing in exactly one segment. Command line options allow the
addition of single insertions or deletions and single transpositions.
In order to find all minimal pairs it is normally necessary for the
input notation to use one character for each segment. Even in IPA
transcription, this is often not the case. minpair provides for this
situation by accepting definitions of multigraphs.
- moz-hocr-edit:
hOCR is a file format for representing the output of Optical Character
Recognition (OCR) programs such as OCRopus. OCR programs are not
perfect at recognizing text, so human editing is often necessary.
moz-hocr-edit provides a line-by-line user interface for people to edit
and proofread hOCR documents.
- OmegaT
is a free and open source multiplatform Computer Assisted Translation
tool with fuzzy matching, translation memory, keyword search,
glossaries, and translation leveraging into updated projects.
- Open
Subtitle Editor
is an open-source subtitle editor that provides an easy solution to
various editing jobs such as translation, resyncing, adding and
removing subtitles, as well as to creating subtitles for any video file
from scratch.
- Open
Translation Engine (OTE) is a project developing
language translation and dictionary tools for the Internet community.
- Poedit
is a gettext translation (.po file) editor for Unix, Windows, and OS X.
It aims to provide translators with a simple, easy to use user
interface with all the essential tools such as spellchecker or
translation memory. It can also be used to manage translations for
small projects.
- Pootle
is a Web-based translation and translation management tool. It provides
a rich set of features for managing a translation project. It
integrates components of the Translate Toolkit to provide error
checkers for translation messages and the ability to download files in
a number of formats: PO, XLIFF, CSV. Pootle can also provide compiled
PO files for download. You can use it to assign work to translators in
your team, and you can define goals to help focus the efforts of your
translation. Pootle can run without a Web server or be proxied through
your existing Apache server.
- pyECTOR
is a chatterbot which learns from what people say. It is based on an
artificial intelligence architecture that is inspired by Copycat, an AI
system from Mitchell and Hofstadter. The Concept Network it uses is a
mix between neural and semantic networks. It uses co-occurrences to
compute the influence of one semantic node on another. The links are
statistically weighted.
- Subtitle
Processor:
Subtitle editor for editing, repairing and translating subtitles for
movies. Contains integrated movie and DVD player for easy
synchronization of subtitles with the movie. Many simple and advanced
editing functions.
- The Translate
Toolkit
is a set of tools designed to assist translators and localisers, with a
specific focus on the Gettext PO and XLIFF formats. These tools
currently include converters for Mozilla to PO and OpenOffice to PO,
and checkers for punctuation, accelerators, etc.
- The Language Machine
is a free software toolkit for language and grammar. It includes a
shared library, a main program, and several metalanguage compilers with
one frontend. The system is easy to use on its own or as a component.
It directly implements unrestricted rule-based grammars with actions
and external interfaces. A unique diagram shows rulesets in action.
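To illustrate the minimal-pair search described in the minpair entry above, here is a short sketch in Python. It is not minpair's own code: the function names `segment` and `minimal_pairs`, the sample word list, and the "ch" multigraph are all assumptions made for this example. It covers only the default behaviour described (words of the same length differing in exactly one segment) plus user-defined multigraphs, and compares every pair of words, which is fine for small lists but slow for large lexicons.

```python
from itertools import combinations


def segment(word, multigraphs=()):
    """Split a word into segments, treating each listed multigraph as one segment."""
    # Try longer multigraphs first so that a longer match wins over a shorter one.
    # Empty strings are dropped because they would never advance the cursor.
    ordered = sorted((m for m in multigraphs if m), key=len, reverse=True)
    segments = []
    i = 0
    while i < len(word):
        for m in ordered:
            if word.startswith(m, i):
                segments.append(m)
                i += len(m)
                break
        else:
            # No multigraph matched here: a single character is one segment.
            segments.append(word[i])
            i += 1
    return tuple(segments)


def minimal_pairs(words, multigraphs=()):
    """Return pairs of words with equal segment counts that differ in exactly one segment."""
    segmented = {w: segment(w, multigraphs) for w in words}
    pairs = []
    for a, b in combinations(sorted(segmented), 2):
        sa, sb = segmented[a], segmented[b]
        if len(sa) == len(sb) and sum(x != y for x, y in zip(sa, sb)) == 1:
            pairs.append((a, b))
    return pairs


# Hypothetical sample data, in ordinary spelling rather than IPA.
words = ["pat", "bat", "chat", "pit", "cat"]
for pair in minimal_pairs(words, multigraphs=["ch"]):
    print(pair)
```

With "ch" declared as a multigraph, "chat" counts as three segments and so pairs with "bat", "cat", and "pat". Without that definition it counts as four characters and matches none of the three-letter words, which is the situation the multigraph definitions are meant to handle.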
Bookstores and Editors
See also: