Concordancers: taking a peek into the linguistic context of a word

Natural language processing (NLP) is the branch of computer science concerned with turning human language (as opposed to formal languages, such as math or computer code) into something a machine can read and process. It has produced many applications that are commonplace today: spelling correction, word prediction, search engines, automatic translation apps (a.k.a. machine translation), and chatbots, among others.

Although these apps can be useful for the language learner (who hasn’t googled a word?), they are designed with the general digital user in mind. However, some apps, while using similar NLP technology, are more directly concerned with language and can be a better aid for writers who want to broaden their vocabulary and polish their style. In this article, I’ll tell you about one particular language technology that I believe is both very simple and tremendously useful for language learning: concordancers.

Concordance and Concordancer

When we talk about a concordance, we basically mean the context of a word. Note that we are taking “context” in the simplest possible sense: the words that directly surround a given word in a text.

So, if we considered a collection of only three sentences to build our concordances, say:

“That dog is black”

“That dog is cute”

“That dog is lazy”

Given the word “is”, we would find only three possible combinations: “is black”, “is cute”, and “is lazy”. Of course, with a collection of only three sentences, this seems silly. But when you have a huge collection of texts, that is, a fully-fledged corpus, concordances can be quite enlightening. Also, in our example we could easily identify the concordances by hand, but when you have to deal with huge volumes of text, the only feasible way forward is to let a computer program do the job for you. That’s what a concordancer is: a computer program that analyzes text and looks for concordances.
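To make the idea concrete, here is a minimal sketch of a concordancer in Python, run on the three example sentences. The function name `concordance` and the `window` parameter are my own choices for illustration, and the tokenization is deliberately naive; real tools are far more sophisticated.

```python
import re

def concordance(sentences, keyword, window=1):
    """Collect (left context, keyword, right context) for every hit of keyword.

    Hypothetical helper for illustration: `window` is the number of
    words kept on each side of the keyword.
    """
    hits = []
    for sentence in sentences:
        # Naive tokenization: lowercase the sentence and keep runs of word characters.
        tokens = re.findall(r"\w+", sentence.lower())
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((left, token, right))
    return hits

sentences = ["That dog is black", "That dog is cute", "That dog is lazy"]

# Print the keyword in the middle with its context on each side.
for left, word, right in concordance(sentences, "is"):
    print(f"{left:>10} [{word}] {right}")
```

With a window of one word, each hit shows “dog” on the left and “black”, “cute”, or “lazy” on the right. Widening `window` shows more of the surrounding text.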

Simple, right? Well, the strength of a concordancer depends on the volume and quality of the documents you feed it. Concordancers work through a very interesting mixture of maths and language and are, in good part, the basis for word prediction. I mention maths because when a full corpus informs your concordancer, it will not show you every possible combination, but only a few of the most frequent ones (and thus the ones most likely to be relevant to you). This can be a great help when you are struggling with word choices or are not entirely sure about a collocation, since it lets you find recurring phrases, grammatical patterns, and the like.
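The “maths” part can be as simple as counting. The sketch below is an assumption about how such a ranking might work, not how any particular concordancer does it: it counts which words follow a keyword and keeps only the most frequent ones. The function name `top_right_neighbors` and the sample text are made up for the example.

```python
import re
from collections import Counter

def top_right_neighbors(text, keyword, n=3):
    """Return the n words that most often follow keyword, with their counts."""
    # Naive tokenization; as a simplification, sentence boundaries are ignored.
    tokens = re.findall(r"\w+", text.lower())
    # Pair every token with the one after it and count the followers of keyword.
    followers = Counter(
        nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == keyword
    )
    return followers.most_common(n)

sample = "The dog is black. The cat is black. The bird is small. The sky is black."
print(top_right_neighbors(sample, "is", n=2))
```

In this tiny sample, “is black” occurs three times and “is small” once, so “black” is ranked first. With a real corpus, the same counting is what brings common collocations to the top.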

There are quite a few concordancers out there. Some are paid, while others are free to use. I’d love for you to go and experiment with one of these tools, so let’s check out a free one, shall we?

AntConc

AntConc is a reliable freeware concordancer developed by Dr. Laurence Anthony of Waseda University, Japan. He has quite an interesting résumé, holding both a B.Sc. in maths and a Ph.D. in applied linguistics (just the mixture we talked about), and he takes part in a line of language education research sometimes referred to as data-driven learning, which is concerned with using linguistic data (corpora) to better tackle the language learning process.

AntConc is cross-platform software (Windows, Mac, Linux) that lets you build your own collection of documents to analyze.

AntConc can be tremendously useful if you have the time to build your own collection of documents, or if you want to work only with the concordances of a specific knowledge domain. For instance, you could feed it only biology papers if you wanted to learn to write “biology-ish”.

That same flexibility is also its main drawback. Even though you can download ready-made corpora online (such as the Brown corpus, COCA, etc.), it can be time-consuming to load, process, and work with these collections. Furthermore, AntConc’s design is a little old-fashioned and not really user-friendly. Finally, and I’d say crucially, it is a stand-alone app that cannot be integrated with your word processor of choice. Overall, if you want to dive into language research, AntConc is definitely a tool to consider.

WriteBetter: a concordancer integrated into your word processor

WriteBetter is another concordancer worth mentioning. Its main advantages are that it is integrated into your word processor and that it makes the stored corpora easy to work with. Unlike with AntConc, you don’t need to go looking for text collections: WriteBetter stores more than 60 GB of corpora that are available to its users.

Another advantage is its usability. You don’t need to be a linguist to use WriteBetter. Just open it in your editor (Word, Google Docs, Overleaf) to start seeing real-time suggestions based on the available corpora. You can also select a couple of words to see their concordances and learn how they work in context.

In brief, if you’re interested in using language technology to improve your writing, WriteBetter can be a good choice, inasmuch as it draws from a ready-made, carefully constructed collection of documents and can be integrated into the most widely used word processors to give you relevant word suggestions.
